Model Release

2026-06-02 mimo-v2.5-asr Released

Model Introduction:

Bilingual & Dialects: Supports Chinese, English, code-switching, and various regional dialects (Wu, Cantonese, Minnan, Sichuanese).
Lyrics Transcription: High-accuracy Chinese/English lyrics transcription in mixed vocal-instrumental tracks.
Robust in Complex Audio: Excels in challenging environments (high noise, far-field, multi-speaker).
Knowledge-Intensive AI: Pinpoint accuracy for classical poetry, jargon, and proper nouns, with auto-punctuation.

Model Introduction:

Trillion parameters, efficient architecture: 1T total parameters | 42B active parameters | 1M-token ultra-long context
Ultimate Agent Performance: In high-intensity agent scenarios, it performs comparably to Claude Opus 4.6

Model Introduction:

Native full-modal perception + 1M context: Natively understands images, video, audio, and text, enabling precise cross-modal perception and long-range reasoning; its overall perception capabilities rank among the best in the industry
Powerful full-modal Agent capabilities: Native Agent execution lets it efficiently complete complex tasks such as browsing, understanding, reasoning, and operation, with everyday-task performance comparable to that of mimo-v2.5-pro
Combining Performance and Efficiency: Maintains leading capabilities while achieving superior token efficiency, placing it on the Pareto frontier of performance and efficiency

Model Introduction:

Premium Voice TTS: Ships with multiple high-quality built-in voices, understands and follows style instructions closely, and supports fine-grained control over speech rate, emotion, tone, and more, meeting expression needs across many scenarios
Timbre Design: Defines and generates a new timbre from a single sentence, making timbre creation more intuitive and efficient
Timbre Cloning: Reproduces a target timbre with high fidelity from a small number of audio samples, keeping timbre characteristics consistent while generalizing well and remaining stable

Model overview:

Uses a hybrid architecture with a 1:7 ratio of Global Attention to Sliding Window Attention (SWA);
1T total parameters, with 42B active parameters;
Supports an ultra-long context window of 1M tokens.
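
As a rough illustration of the figures above, the sketch below computes the share of parameters active per token and lays out one possible layer schedule with a 1:7 ratio of Global Attention to SWA layers. The placement of the global layer at the end of each group of eight, the 16-layer depth, and the function names are assumptions made for illustration; the overview does not state the layer order or layer count.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the fraction of parameters used per token."""
    return active_params / total_params


def layer_schedule(num_layers: int, swa_per_global: int) -> list:
    """Build a list of layer types with `swa_per_global` SWA layers per global layer.

    Assumption (not from the release notes): each group ends with the
    global-attention layer.
    """
    group = swa_per_global + 1
    return ["global" if (i + 1) % group == 0 else "swa" for i in range(num_layers)]


# Figures from the overview: 1T total, 42B active, 1:7 global-to-SWA ratio.
# The 16-layer depth is a hypothetical value chosen to show two full groups.
print(f"active share per token: {active_fraction(1e12, 42e9):.1%}")
print(layer_schedule(num_layers=16, swa_per_global=7))
```

The same helper can be called with `swa_per_global=5` to sketch a 1:5 layout.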

Model details: https://platform.xiaomimimo.com/#/docs/news/v2-pro-release

Model overview:

Model details: https://platform.xiaomimimo.com/#/docs/news/v2-omni-release

Model overview:

Pretrained on over 100 million hours of data, using a self-developed multi-codebook speech modeling architecture;
Offers unique capabilities such as style control, singing, and voice cloning.

Pricing: free for a limited time.

Model details: https://platform.xiaomimimo.com/#/docs/news/v2-tts-release

Upgraded Coding Capabilities in Thinking Mode: Specifically optimized for programming scenarios, the Thinking Mode now achieves a score of 78.6 on SWE-Bench Verified. Both the resolution rate and the quality of code generation have been significantly improved.
Substantial Boost in Tool Calling Accuracy: Stability issues regarding tool usage have been resolved. Tool calling accuracy in Thinking Mode has surged from 64% to 97.0%, greatly enhancing execution reliability in Agent scenarios.
Enhanced Instruction Following & Reduced Hallucinations:

Instruction Following: Improved adherence to specific instructions, achieving an AA-IFBench score of 72.
Factuality: Enhanced rigor in factual responses, with the Non-Hallucination Rate rising to 52%.

Optimized Handling of Complex Tasks: Performance on Arena-Hard (Hard Prompts) in Thinking Mode has been strengthened, with the score rising to 60.6. The model now demonstrates superior performance when handling high-difficulty logic problems.
More Efficient Chain-of-Thought (CoT): By optimizing CoT generation strategies, the consumption of redundant tokens has been significantly reduced. In benchmarks such as AIME25 and HMMT, the average generation length has decreased by 13% to 30%. This effectively lowers latency and token costs while maintaining model performance.

	mimo-v2-flash-0204	mimo-v2-flash-0112	mimo-v2-flash
SWE-Bench Verified Non-Thinking	73.7	73.3	73.4
SWE-Bench Verified Thinking	78.6	74.2	-
Arena-Hard(Hard Prompt) Non-Thinking	49.3	52.7	46.0
Arena-Hard(Creative Writing) Non-Thinking	85.0	86.0	78.3
Arena-Hard(Hard Prompt) Thinking	60.6	58.3	54.1
Arena-Hard(Creative Writing) Thinking	85.8	90.4	86.2
AA-IFBench	72	-	64
AA-Omniscience Accuracy	19	-	27
AA-Omniscience Non-Hallucination Rate	52%	-	9%
Tool call success rate Thinking	97.0%	64%	44%

Benchmark	mimo-v2-flash (Acc)	mimo-v2-flash (Avg Tokens)	mimo-v2-flash-0204 (Acc)	mimo-v2-flash-0204 (Avg Tokens)	Length Reduction Ratio
AIME25	94.8	26984	91.1	18879	30.04%
HMMT_Feb_25	94.2	29294	92.9	21470	26.71%
LiveCodeBench-AA	83.2	21488	84.9	18335	14.67%
GPQA-Diamond	83.7	15862	83.8	13659	13.89%
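
The last column of the table can be reproduced from the two average-token columns. The sketch below assumes the ratio is defined as (old average length - new average length) / old average length; the function name is illustrative.

```python
def length_reduction(before_tokens: int, after_tokens: int) -> float:
    """Percentage drop in average generated tokens, rounded to two decimals."""
    return round((before_tokens - after_tokens) / before_tokens * 100, 2)


# (benchmark, mimo-v2-flash avg tokens, mimo-v2-flash-0204 avg tokens), from the table
rows = [
    ("AIME25", 26984, 18879),
    ("HMMT_Feb_25", 29294, 21470),
    ("LiveCodeBench-AA", 21488, 18335),
    ("GPQA-Diamond", 15862, 13659),
]

for name, before, after in rows:
    print(f"{name}: {length_reduction(before, after):.2f}%")
```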

Note: The model API call method and model name remain unchanged

Enhanced general capabilities: Improved the model’s performance on a wide range of general-purpose tasks.
Upgraded coding performance in Thinking mode: Strengthened code generation quality in Thinking mode, especially for programming scenarios.
Deep integration with Claude Code: Fully supports using Thinking mode in Claude Code.
- Best practice: Set Thinking as the default mode to achieve more stable, higher-quality code generation.
Optimized Experience for Other Code Agents: Synchronized improvements to the interaction experience and generation quality across code assistant tools (Code Scaffolds) such as Kilo, Cline, and Roo.
Improved Stability & Instruction Following: Enhanced output stability and significantly improved adherence to specific output formats.

	mimo-v2-flash-0112	mimo-v2-flash
SWE-Bench Verified Non-Thinking	73.3	73.4
SWE-Bench Verified Thinking	74.2	-
Arena-Hard(Hard Prompt) Non-Thinking	52.7	46.0
Arena-Hard(Creative Writing) Non-Thinking	86.0	78.3
Arena-Hard(Hard Prompt) Thinking	58.3	54.1
Arena-Hard(Creative Writing) Thinking	90.4	86.2

Note: The model API call method and model name remain unchanged

Model overview:

Uses a hybrid architecture with a 1:5 ratio of Global Attention to Sliding Window Attention (SWA), a window size of 128, a native 32K context, and extended training up to 256K;
Introduces 3 MTP layers, delivering 2.5 to 3.7× faster inference.
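
To make the Sliding Window Attention figures more concrete, the sketch below counts how many query-key pairs a single causal layer evaluates under global attention versus a window of 128 at the native 32K context. It assumes the window includes the current token and that 32K means 32,768 tokens; the function names are illustrative, and this is not the model's actual implementation.

```python
from typing import Optional


def visible_keys(position: int, window: Optional[int]) -> range:
    """Key positions a causal query at `position` (0-indexed) can attend to.

    window=None means global attention. Assumption: a window of size w
    covers the current token plus the w - 1 tokens before it.
    """
    start = 0 if window is None else max(0, position - window + 1)
    return range(start, position + 1)


def attention_pairs(seq_len: int, window: Optional[int]) -> int:
    """Total query-key pairs evaluated by one causal attention layer."""
    return sum(len(visible_keys(p, window)) for p in range(seq_len))


SEQ_LEN = 32 * 1024  # native context, assuming 32K = 32,768 tokens
WINDOW = 128         # window size from the overview

global_pairs = attention_pairs(SEQ_LEN, None)
swa_pairs = attention_pairs(SEQ_LEN, WINDOW)
print(f"global layer pairs: {global_pairs}")
print(f"SWA layer pairs:    {swa_pairs}")
print(f"global / SWA:       {global_pairs / swa_pairs:.1f}x")
```

Because an SWA layer's cost grows linearly with sequence length while a global layer's grows quadratically, using mostly SWA layers keeps long-context attention cost down.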

Pricing: input 0.1/M tokens, output 0.3/M tokens.
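
A minimal sketch of estimating the cost of a request from these rates. The currency is not stated above, so the result is in whatever unit the prices are quoted in; the constant and function names are illustrative.

```python
INPUT_PRICE_PER_M = 0.1   # price per 1M input tokens (currency not stated in the source)
OUTPUT_PRICE_PER_M = 0.3  # price per 1M output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of a request, in the same unit as the listed prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a request with 200K input tokens and 4K output tokens.
print(f"{estimate_cost(200_000, 4_000):.4f}")
```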

Model details: mimo-v2-flash: High-Efficiency Inference, Code & Agent Foundation Model

Usage guide: First API call

Update Time: June 02, 2026