- cross-posted to:
- technology@hexbear.net
MiMo-7B is a series of reasoning-focused language models trained from scratch, demonstrating that small models can achieve exceptional mathematical and code reasoning capabilities, even outperforming larger 32B models. Key innovations include:
- Pre-training optimizations: Enhanced data pipelines, multi-dimensional filtering, and a three-stage data mixture (25T tokens) with Multiple-Token Prediction for improved reasoning.
- Post-training techniques: Curated 130K math/code problems with rule-based rewards, a difficulty-driven code reward for sparse tasks, and data re-sampling to stabilize RL training.
- RL infrastructure: A Seamless Rollout Engine accelerates training and validation by 2.29× and 1.96×, respectively, paired with robust inference support.

MiMo-7B-RL matches OpenAI’s o1-mini on reasoning tasks, and all models (base, SFT, RL) are open-sourced to advance the community’s development of powerful reasoning LLMs.
An in-depth discussion of MiMo-7B: https://www.youtube.com/watch?v=y6mSdLgJYQY
I finally figured out the ramalama runes to run stuff from Hugging Face.
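If anyone else wants the runes: as far as I can tell, ramalama accepts a `huggingface://` (a.k.a. `hf://`) model spec, so something like the sketch below should work. The repo and file names are placeholders for whatever GGUF quant you actually want, not the real MiMo upload.

```sh
# Pull a GGUF build from Hugging Face, then chat with it locally.
# <user>/<repo>/<file>.gguf are placeholders; substitute a real repo and quant.
ramalama pull huggingface://<user>/<repo>/<file>.gguf
ramalama run huggingface://<user>/<repo>/<file>.gguf
```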
It sure does like to “think” through its problems:

How to use it:

Output:

Explanation: `strlen()` gets the string’s length, and the two pointers (`left` and `right`) start at the beginning and end of the string. This approach reverses the string efficiently with minimal memory usage (in-place reversal).
I’ve been using ollama; it can pull straight from the Hugging Face URL.
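For reference, the pattern is roughly `hf.co/` plus the repo path; a sketch with placeholder names, assuming the repo hosts GGUF weights:

```sh
# Ollama can fetch GGUF repos straight from Hugging Face via the hf.co/ prefix.
# <user>/<repo> is a placeholder; an optional :<quant> tag selects a quantization.
ollama run hf.co/<user>/<repo>
ollama run hf.co/<user>/<repo>:Q4_K_M
```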