MiMo-V2-Flash Release Note 2026/02/04

  1. Upgraded Coding Capabilities in Thinking Mode: Specifically optimized for programming scenarios, the Thinking Mode now achieves a score of 78.6 on SWE-Bench Verified. Both the resolution rate and the quality of code generation have been significantly improved.

  2. Substantial Boost in Tool Calling Accuracy: Stability issues regarding tool usage have been resolved. Tool calling accuracy in Thinking Mode has surged from 64% to 97.0%, greatly enhancing execution reliability in Agent scenarios.

  3. Enhanced Instruction Following & Reduced Hallucinations:

  • Instruction Following: Improved adherence to specific instructions, with the AA-IFBench score rising to 72 (from 64 for mimo-v2-flash).

  • Factuality: Enhanced rigor in factual responses, with the AA-Omniscience Non-Hallucination Rate rising to 52% (from 9% for mimo-v2-flash).

  4. Optimized Handling of Complex Tasks: Performance on Arena-Hard (Hard Prompt) in Thinking Mode has improved, with the score rising from 58.3 to 60.6. The model now performs better on high-difficulty logic problems.

  5. More Efficient Chain-of-Thought (CoT): By optimizing CoT generation strategies, the consumption of redundant tokens has been significantly reduced. Across AIME25, HMMT, LiveCodeBench-AA, and GPQA-Diamond, the average generation length has decreased by roughly 14% to 30%. This lowers latency and token costs while largely maintaining accuracy.

| Benchmark | Mode | mimo-v2-flash-0204 | mimo-v2-flash-0112 | mimo-v2-flash |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified | Non-Thinking | 73.7 | 73.3 | 73.4 |
| SWE-Bench Verified | Thinking | 78.6 | 74.2 | - |
| Arena-Hard (Hard Prompt) | Non-Thinking | 49.3 | 52.7 | 46.0 |
| Arena-Hard (Creative Writing) | Non-Thinking | 85.0 | 86.0 | 78.3 |
| Arena-Hard (Hard Prompt) | Thinking | 60.6 | 58.3 | 54.1 |
| Arena-Hard (Creative Writing) | Thinking | 85.8 | 90.4 | 86.2 |
| AA-IFBench | | 72 | - | 64 |
| AA-Omniscience Accuracy | | 19 | - | 27 |
| AA-Omniscience Non-Hallucination Rate | | 52% | - | 9% |
| Tool call success rate | Thinking | 97.0% | 64% | 44% |
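
To make the release-over-release changes easier to read, the Thinking-mode figures in the comparison above can be turned into point differences between mimo-v2-flash-0204 and mimo-v2-flash-0112. This is a minimal sketch: the scores are copied from the comparison, while the dictionary name `THINKING_SCORES` and the helper `point_delta` are illustrative names introduced here, not part of any MiMo tooling.

```python
# Thinking-mode scores copied from the comparison above.
# Each entry: metric -> (mimo-v2-flash-0204, mimo-v2-flash-0112)
THINKING_SCORES = {
    "SWE-Bench Verified": (78.6, 74.2),
    "Arena-Hard (Hard Prompt)": (60.6, 58.3),
    "Arena-Hard (Creative Writing)": (85.8, 90.4),
    "Tool call success rate (%)": (97.0, 64.0),
}


def point_delta(new: float, old: float) -> float:
    """Difference in points, rounded to one decimal to hide float noise."""
    return round(new - old, 1)


for metric, (new, old) in THINKING_SCORES.items():
    print(f"{metric}: {point_delta(new, old):+.1f} points")
```

A positive value means mimo-v2-flash-0204 scores higher than mimo-v2-flash-0112 on that metric; a negative value means it scores lower.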

| Benchmark | mimo-v2-flash (Acc) | mimo-v2-flash (Avg Tokens) | mimo-v2-flash-0204 (Acc) | mimo-v2-flash-0204 (Avg Tokens) | Length Reduction Ratio (%) |
| --- | --- | --- | --- | --- | --- |
| AIME25 | 94.8 | 26984 | 91.1 | 18879 | 30.04% |
| HMMT_Feb_25 | 94.2 | 29294 | 92.9 | 21470 | 26.71% |
| LiveCodeBench-AA | 83.2 | 21488 | 84.9 | 18335 | 14.67% |
| GPQA-Diamond | 83.7 | 15862 | 83.8 | 13659 | 13.89% |
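
The release note does not state how the Length Reduction Ratio is computed. A plausible reading, assumed here, is the relative decrease in average tokens from mimo-v2-flash to mimo-v2-flash-0204. The sketch below applies that assumed formula to the token counts above; `AVG_TOKENS` and `length_reduction_pct` are illustrative names.

```python
# Average generation lengths copied from the table above.
# Each entry: benchmark -> (mimo-v2-flash avg tokens, mimo-v2-flash-0204 avg tokens)
AVG_TOKENS = {
    "AIME25": (26984, 18879),
    "HMMT_Feb_25": (29294, 21470),
    "LiveCodeBench-AA": (21488, 18335),
    "GPQA-Diamond": (15862, 13659),
}


def length_reduction_pct(baseline_tokens: int, new_tokens: int) -> float:
    """Assumed formula: relative decrease in average length, in percent."""
    return (baseline_tokens - new_tokens) / baseline_tokens * 100


for benchmark, (baseline, new) in AVG_TOKENS.items():
    print(f"{benchmark}: {length_reduction_pct(baseline, new):.2f}%")
```

Under this assumption the computed values round to the ratios reported in the last column of the table.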

Note: The model API call method and model name remain unchanged.

Update Time May 28, 2026