# Xiaomi MiMo API Open Platform

> Xiaomi MiMo API Open Platform provides high-performance inference services for Xiaomi's AI models and is compatible with the OpenAI and Anthropic API formats. The platform offers comprehensive API documentation, integration guides, and detailed update logs for Xiaomi's AI models. It is designed to help developers easily build and deploy next-generation intelligent applications and agents.

--- DOCUMENT: First API Call ---
URL: https://platform.xiaomimimo.com/static/docs/quick-start/first-api-call.md

# First API Call

## Supported API Types

Xiaomi MiMo API Open Platform is compatible with the OpenAI API and Anthropic API formats, so you can use existing SDKs to access model inference services.

## Preparation Before Calling

### Log in to Xiaomi MiMo API Open Platform

The platform currently supports only personal account login, using a Xiaomi account. If you already have a Xiaomi account, log in directly. If you don't, you can register from the [Console](https://platform.xiaomimimo.com/#/console/usage) or register in advance at [id.mi.com](https://id.mi.com/).

### Get API Key

Create an API Key in [Console-API Keys](https://platform.xiaomimimo.com/#/console/api-keys). Keep your API Key safe, because a leaked key can be used to consume your quota. We recommend storing the API Key in an environment variable.

## Quick Integration Examples

You can copy the following example code and replace the API Key value to make calls quickly.

We highly recommend using one of the following system prompts, in either Chinese or English:

> Chinese version
>
> ```text
> 你是MiMo(中文名称也是MiMo),是小米公司研发的AI智能助手。
> 今天的日期:{date} {week},你的知识截止日期是2024年12月。
> ```

> English version
>
> ```text
> You are MiMo, an AI assistant developed by Xiaomi.
> Today's date: {date} {week}. Your knowledge cutoff date is December 2024.
> ```

### Python SDK Examples

#### OpenAI API Format Example

Install the OpenAI Python SDK by running the following command:

```shell
# If the command fails, replace pip with pip3 and run it again
pip install -U openai
```

Call the API:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today's date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": "please introduce yourself"
        }
    ],
    max_completion_tokens=1024,
    temperature=1.0,
    top_p=0.95,
    stream=False,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0
)

print(completion.model_dump_json())
```

#### Anthropic API Format Example

Install the Anthropic Python SDK by running the following command:

```shell
# If the command fails, replace pip with pip3 and run it again
pip install -U anthropic
```

Call the API:

```python
import os
from anthropic import Anthropic

client = Anthropic(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/anthropic"
)

message = client.messages.create(
    model="mimo-v2.5-pro",
    max_tokens=1024,
    system="You are MiMo, an AI assistant developed by Xiaomi. Today's date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "please introduce yourself"
                }
            ]
        }
    ],
    top_p=0.95,
    stream=False,
    temperature=1.0,
    stop_sequences=None
)

print(message.content)
```

### Curl Examples

#### OpenAI API Format Example

```bash
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "mimo-v2.5-pro",
    "messages": [
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today'"'"'s date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": "please introduce yourself"
        }
    ],
    "max_completion_tokens": 1024,
    "temperature": 1.0,
    "top_p": 0.95,
    "stream": false,
    "stop": null,
    "frequency_penalty": 0,
    "presence_penalty": 0
}'
```

#### Anthropic API Format Example

```bash
curl --location --request POST 'https://api.xiaomimimo.com/anthropic/v1/messages' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "mimo-v2.5-pro",
    "max_tokens": 1024,
    "system": "You are MiMo, an AI assistant developed by Xiaomi. Today'"'"'s date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024.",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "please introduce yourself"
                }
            ]
        }
    ],
    "top_p": 0.95,
    "stream": false,
    "temperature": 1.0,
    "stop_sequences": null
}'
```

### Make Multi-turn Tool Calls in Thinking Mode

In thinking mode, during multi-turn tool calls the model returns a `reasoning_content` field alongside `tool_calls`. For best performance when continuing the conversation, we recommend keeping all previous `reasoning_content` values in the `messages` array of every subsequent request.
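The bookkeeping described above can be sketched offline. The following is a minimal sketch that works on plain dicts shaped like an OpenAI-format response message. It assumes the reply carries `reasoning_content` and `tool_calls` as described. The helper names, the `run_tool` callback and the fake weather tool are hypothetical. With the OpenAI SDK, `completion.choices[0].message.model_dump()` should give a dict of this shape, but verify that your SDK version keeps the non-standard `reasoning_content` field.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]

def assistant_turn_to_message(reply: Message) -> Message:
    """Turn an assistant reply into a request message, keeping reasoning_content."""
    message: Message = {"role": "assistant", "content": reply.get("content") or ""}
    # Carry the model's reasoning forward so later turns see the full thinking history.
    if reply.get("reasoning_content"):
        message["reasoning_content"] = reply["reasoning_content"]
    if reply.get("tool_calls"):
        message["tool_calls"] = reply["tool_calls"]
    return message

def extend_with_tool_results(messages: list[Message], reply: Message,
                             run_tool: Callable[[str, dict], Any]) -> list[Message]:
    """Return a new history: old messages + assistant turn + one tool message per call."""
    history = messages + [assistant_turn_to_message(reply)]
    for call in reply.get("tool_calls") or []:
        arguments = json.loads(call["function"]["arguments"] or "{}")
        result = run_tool(call["function"]["name"], arguments)
        history.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result, ensure_ascii=False),
        })
    return history

# Offline usage with a hand-written reply (hypothetical values, not real API output).
reply = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "The user wants the weather in Hebei, so I should call the tool.",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_current_weather", "arguments": '{"location": "Hebei"}'},
    }],
}

def fake_weather_tool(name: str, args: dict) -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": args["location"], "temperature": 20, "unit": "celsius"}

history = extend_with_tool_results(
    [{"role": "user", "content": "What is the weather like in Hebei?"}], reply, fake_weather_tool)
# `history` is what you would send as `messages` in the next chat.completions request.
```

Returning a new list instead of mutating the input keeps each turn's history easy to log or replay.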
The following example request passes a previous assistant turn, including its `reasoning_content`, back to the API:

```bash
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "messages": [
        {
            "role": "assistant",
            "content": "Hello! I am MiMo.",
            "reasoning_content": "Okay, the user just asked me to introduce myself. That is a pretty straightforward request, but I should think about why they are asking this."
        },
        {
            "role": "user",
            "content": "What is the weather like in Hebei?"
        }
    ],
    "model": "mimo-v2.5-pro",
    "max_completion_tokens": 1024,
    "temperature": 1.0,
    "stream": false,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    "tool_choice": "auto"
}'
```

## Check Usage Information

On the [Usage Information](https://platform.xiaomimimo.com/#/console/usage) page, you can view and export your account's model token usage and request counts, broken down by date.

--- DOCUMENT: Model Hyperparameters ---
URL: https://platform.xiaomimimo.com/static/docs/quick-start/model-hyperparameters.md

# Model Hyperparameters

`temperature` is the sampling temperature. Higher values (such as 0.8) make the output more random, while lower values (such as 0.2) make it more deterministic.

`top_p` is the probability threshold for nucleus sampling and controls the diversity of the generated text. The higher the value, the more diverse the output.
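The following generic sketch shows what the two parameters do. It applies temperature scaling to a toy set of logits and then nucleus (top-p) filtering. It illustrates the standard technique these parameter names refer to and is not MiMo's server-side sampler. The function names and the toy logits are hypothetical.

```python
import math
import random

def token_probs(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax of logits / temperature (max subtracted for numerical stability)."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

def sample_token(logits: dict[str, float], temperature: float, top_p: float,
                 rng: random.Random) -> str:
    """Temperature scaling followed by nucleus (top-p) sampling."""
    if temperature == 0:
        # Temperature 0 is conventionally treated as greedy decoding.
        return max(logits, key=logits.get)
    ranked = sorted(token_probs(logits, temperature).items(),
                    key=lambda item: item[1], reverse=True)
    # Keep the smallest prefix of most-likely tokens whose total probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in ranked:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Draw from the nucleus in proportion to the renormalised probabilities.
    r = rng.random() * sum(p for _, p in nucleus)
    for tok, p in nucleus:
        r -= p
        if r <= 0:
            return tok
    return nucleus[-1][0]

logits = {"sunny": 2.0, "cloudy": 1.0, "rainy": 0.5}
# Lower temperature concentrates probability on the most likely token.
print(token_probs(logits, 0.2)["sunny"] > token_probs(logits, 1.0)["sunny"])  # True
# A tiny top_p keeps only the single most likely token.
print(sample_token(logits, temperature=1.0, top_p=0.01, rng=random.Random(0)))  # sunny
```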
In thinking mode, `mimo-v2.5-pro` and `mimo-v2.5` do not support a custom `temperature`. Any value you pass is overridden with the model's recommended default of `1.0`.
The default values and parameter ranges of `temperature` and `top_p` for different models are as follows:
| **Model Name** | **temperature** | **top_p** |
|---|---|---|
| `mimo-v2.5-pro`, `mimo-v2-pro` | Default value: 1.0<br>Range: [0, 1.5] | Default value: 0.95<br>Range: [0.01, 1.0] |
| `mimo-v2.5`, `mimo-v2-omni` | Default value: 1.0<br>Range: [0, 1.5] | Default value: 0.95<br>Range: [0.01, 1.0] |
| `mimo-v2.5-tts`, `mimo-v2.5-tts-voicedesign`, `mimo-v2.5-tts-voiceclone`, `mimo-v2-tts` | Default value: 0.6<br>Range: [0, 1.5] | Default value: 0.95<br>Range: [0.01, 1.0] |
| `mimo-v2-flash` | Default value: 0.3<br>Range: [0, 1.5] | Default value: 0.95<br>Range: [0.01, 1.0] |
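A client can check values against the documented ranges before sending a request. The following minimal sketch hard-codes the defaults and inclusive ranges from the table above. It also applies the thinking-mode rule for `mimo-v2.5-pro` and `mimo-v2.5`. The dictionary and function names are hypothetical, and the server remains the source of truth.

```python
from typing import Optional

# Documented (default, min, max) per parameter; ranges are inclusive.
TEXT_SPEC = {"temperature": (1.0, 0.0, 1.5), "top_p": (0.95, 0.01, 1.0)}
TTS_SPEC = {"temperature": (0.6, 0.0, 1.5), "top_p": (0.95, 0.01, 1.0)}
FLASH_SPEC = {"temperature": (0.3, 0.0, 1.5), "top_p": (0.95, 0.01, 1.0)}
SAMPLING_SPECS = {
    **dict.fromkeys(["mimo-v2.5-pro", "mimo-v2-pro", "mimo-v2.5", "mimo-v2-omni"], TEXT_SPEC),
    **dict.fromkeys(["mimo-v2.5-tts", "mimo-v2.5-tts-voicedesign",
                     "mimo-v2.5-tts-voiceclone", "mimo-v2-tts"], TTS_SPEC),
    "mimo-v2-flash": FLASH_SPEC,
}
# Models whose temperature is forced to 1.0 in thinking mode.
THINKING_FIXED_TEMPERATURE = {"mimo-v2.5-pro", "mimo-v2.5"}

def effective_sampling_params(model: str, temperature: Optional[float] = None,
                              top_p: Optional[float] = None,
                              thinking: bool = False) -> dict[str, float]:
    """Validate the values and return the temperature/top_p expected to take effect."""
    if model not in SAMPLING_SPECS:
        raise ValueError(f"unknown model: {model}")
    spec = SAMPLING_SPECS[model]
    result = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        default, low, high = spec[name]
        if value is None:
            value = default  # omitted parameters fall back to the documented default
        elif not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}] for {model}")
        result[name] = value
    if thinking and model in THINKING_FIXED_TEMPERATURE:
        result["temperature"] = 1.0  # overridden server-side in thinking mode
    return result

print(effective_sampling_params("mimo-v2.5-pro", temperature=0.2, thinking=True))
# {'temperature': 1.0, 'top_p': 0.95}
```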
We recommend setting these parameters according to the task type. The recommended values for the `mimo-v2-flash` model are as follows:
| **Task Type** | **temperature** | **top_p** |
|---|---|---|
| Vibe Coding | 0.3 | 0.95 |
| Function Call | 0.3 | 0.95 |
| General Conversation | 0.8 | 0.95 |
| Creative Writing | 0.8 | 0.95 |
| WebDev | 0.8 | 0.95 |
| Mathematical Reasoning | 1 | 0.95 |
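The following minimal sketch keeps these recommendations as a small lookup table in client code. It encodes the `mimo-v2-flash` table above and the note that follows for the other text models, which use 1 / 0.95 for every task. The task keys and the function name are hypothetical.

```python
# Recommended (temperature, top_p) per task type for mimo-v2-flash.
FLASH_RECOMMENDED = {
    "vibe_coding": (0.3, 0.95),
    "function_call": (0.3, 0.95),
    "general_conversation": (0.8, 0.95),
    "creative_writing": (0.8, 0.95),
    "webdev": (0.8, 0.95),
    "mathematical_reasoning": (1.0, 0.95),
}
# These models use the same recommendation for every task in the table.
UNIFORM_MODELS = {"mimo-v2.5-pro", "mimo-v2.5", "mimo-v2-pro", "mimo-v2-omni"}

def recommended_params(model: str, task: str) -> dict[str, float]:
    """Look up the recommended sampling parameters for a model and task type."""
    if task not in FLASH_RECOMMENDED:
        raise ValueError(f"unknown task type: {task}")
    if model == "mimo-v2-flash":
        temperature, top_p = FLASH_RECOMMENDED[task]
    elif model in UNIFORM_MODELS:
        temperature, top_p = 1.0, 0.95
    else:
        raise ValueError(f"no task recommendations documented for {model}")
    return {"temperature": temperature, "top_p": top_p}

# The result can be spread into a request, e.g. create(..., **recommended_params(...)).
print(recommended_params("mimo-v2-flash", "function_call"))  # {'temperature': 0.3, 'top_p': 0.95}
```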
For the `mimo-v2.5-pro`, `mimo-v2.5`, `mimo-v2-pro`, and `mimo-v2-omni` models, the recommended `temperature` and `top_p` for all of the above tasks are 1.0 and 0.95, respectively.

--- DOCUMENT: Error Codes ---
URL: https://platform.xiaomimimo.com/static/docs/quick-start/error-codes.md

# Error Codes

The following table lists common error codes returned when calling MiMo models through the API, together with their causes and solutions:
| **Error Code** | **Causes** | **Solutions** |
|---|---|---|
| 400 - Invalid Format | Invalid request format | • Check that the JSON is well-formed<br>• Check that all required parameters are included<br>• Check that parameter values are within the valid range<br>• Check that the message format meets the interface requirements<br>• Check that the model exists<br>• Check that field names are entered correctly<br>• Check that multimodal file input complies with the format, size, and other restrictions<br>• Check that multimodal file input is publicly accessible<br>• In multi-turn conversations in thinking mode, pass the `reasoning_content` field back to the API in full |
| 401 - Authentication Fails | • Missing or invalid API Key, or an incorrectly formatted Authorization request header<br>• A Token Plan API Key used with the pay-as-you-go API, or vice versa | • Check that the API Key and request header format are correct<br>• When using the Token Plan, make sure you use its dedicated Base URL and API Key |
| 402 - Insufficient Balance | Insufficient account balance | Check your account balance and top up promptly |
| 403 - Forbidden Access | The service is not available in your region, or the API Key has been restricted by risk control | Create a new API Key and make sure your input content is safe |
| 404 - Not Found | The requested endpoint or model does not support image input | Verify that the model or endpoint you are using supports image input |
| 421 - Content Filter | The request was blocked by content moderation | Avoid submitting unsafe or sensitive content |
| 429 - Too Many Requests | Requests are too frequent, or the Token Plan quota has been exhausted | • Implement exponential backoff and retry logic, or reduce the request frequency<br>• Upgrade your Token Plan or switch to the pay-as-you-go API |
| 500 - Server Error | Our server encountered an issue | Try again later, or contact us for help |
| 503 - Server Overloaded | The server is overloaded due to high traffic | Try again later |
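The 429 row recommends exponential backoff. The following minimal sketch assumes that the request is wrapped in a callable and that this callable raises an exception carrying the HTTP status code. The `ApiError` class and `call_with_backoff` function are hypothetical. Retrying 500 and 503 as well follows the "try again later" advice in the table. Codes such as 400, 401 and 402 need a fix to the request or the account, so they are raised immediately.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes the table says can be resolved by trying again later.
RETRYABLE_STATUS = {429, 500, 503}

class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed call."""
    def __init__(self, status: int, message: str = ""):
        super().__init__(f"{status} {message}".strip())
        self.status = status

def call_with_backoff(request: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, retrying retryable errors with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ApiError as err:
            if err.status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # Upper bound doubles each attempt; "full jitter" picks a random wait below it.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

With the OpenAI Python SDK, you could catch `openai.APIStatusError` and read its `status_code` instead of using the hypothetical `ApiError`. The SDK client also accepts a `max_retries` option of its own.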
--- DOCUMENT: Pricing and Rate Limits ---
URL: https://platform.xiaomimimo.com/static/docs/pricing.md

# Pricing and Rate Limits

The platform applies per-account concurrency limits to each model. When server load is high, you may see response delays or 429 errors. The RPM and TPM limits of each model are listed in the tables below; please plan your request rate accordingly.

> RPM (Requests Per Minute): the maximum number of requests you can send to us within one minute, counted as the total across all API Keys of a single account when calling a given model.
>
> TPM (Tokens Per Minute): the maximum number of tokens you can exchange with us within one minute, counted as the total across all API Keys of a single account when calling a given model.

## Pricing

### Domestic Pricing of the Model
| Model | Input ≤ 256K: Input (Cache Hit) | Input ≤ 256K: Input (Cache Miss) | Input ≤ 256K: Output | Input 256K - 1M: Input (Cache Hit) | Input 256K - 1M: Input (Cache Miss) | Input 256K - 1M: Output |
|---|---|---|---|---|---|---|
| `mimo-v2.5-pro`, `mimo-v2-pro` | ¥1.40 | ¥7.00 | ¥21.00 | ¥2.80 | ¥14.00 | ¥42.00 |
| `mimo-v2.5` | ¥0.56 | ¥2.80 | ¥14.00 | ¥1.12 | ¥5.60 | ¥28.00 |
| `mimo-v2-omni` | ¥0.56 | ¥2.80 | ¥14.00 | N/A | N/A | N/A |
| `mimo-v2-flash` | ¥0.07 | ¥0.70 | ¥2.10 | N/A | N/A | N/A |
| `mimo-v2.5-tts`, `mimo-v2.5-tts-voiceclone`, `mimo-v2.5-tts-voicedesign`, `mimo-v2-tts` | Limited-time free | Limited-time free | Limited-time free | Limited-time free | Limited-time free | Limited-time free |

> Note: Cache writing is currently free of charge for a limited time. N/A means the model's context limit is 256K, so this range does not apply. Unit: yuan / 1M tokens.

### Overseas Pricing of the Model
| Model | Input ≤ 256K: Input (Cache Hit) | Input ≤ 256K: Input (Cache Miss) | Input ≤ 256K: Output | Input 256K - 1M: Input (Cache Hit) | Input 256K - 1M: Input (Cache Miss) | Input 256K - 1M: Output |
|---|---|---|---|---|---|---|
| `mimo-v2.5-pro`, `mimo-v2-pro` | $0.20 | $1.00 | $3.00 | $0.40 | $2.00 | $6.00 |
| `mimo-v2.5` | $0.08 | $0.40 | $2.00 | $0.16 | $0.80 | $4.00 |
| `mimo-v2-omni` | $0.08 | $0.40 | $2.00 | N/A | N/A | N/A |
| `mimo-v2-flash` | $0.01 | $0.10 | $0.30 | N/A | N/A | N/A |
| `mimo-v2.5-tts`, `mimo-v2.5-tts-voiceclone`, `mimo-v2.5-tts-voicedesign`, `mimo-v2-tts` | Limited-time free | Limited-time free | Limited-time free | Limited-time free | Limited-time free | Limited-time free |

> Note: Cache writing is currently free of charge for a limited time. N/A means the model's context limit is 256K, so this range does not apply. Unit: $ / 1M tokens.

### Pricing for Network Service Plugins
| Service Item | Price | Description |
|---|---|---|
| Domestic Internet Connectivity Service | ¥25 / 1,000 calls | Includes web search and web page parsing; used to search the web for relevant content in the domestic region |
| Overseas Internet Connectivity Service | $5 / 1,000 calls | Includes web search and web page parsing; used to search the web for relevant content in overseas regions |
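The following worked example shows how to read the overseas (USD) price tables. It rests on two assumptions that the tables do not state explicitly. First, a request's total input length selects the tier for all of its input and output tokens. Second, "256K" means 262,144 tokens. Cache-hit and cache-miss input are priced separately, and web-search calls are added at the overseas plugin rate. The names are hypothetical.

```python
# Overseas prices in USD per 1M tokens: (cache-hit input, cache-miss input, output).
PRICES_USD = {
    "mimo-v2.5-pro": {"<=256K": (0.20, 1.00, 3.00), "256K-1M": (0.40, 2.00, 6.00)},
    "mimo-v2-pro": {"<=256K": (0.20, 1.00, 3.00), "256K-1M": (0.40, 2.00, 6.00)},
    "mimo-v2.5": {"<=256K": (0.08, 0.40, 2.00), "256K-1M": (0.16, 0.80, 4.00)},
    "mimo-v2-omni": {"<=256K": (0.08, 0.40, 2.00)},
    "mimo-v2-flash": {"<=256K": (0.01, 0.10, 0.30)},
}
TIER_BOUNDARY = 256 * 1024  # assumption: "256K" = 262,144 tokens
SEARCH_USD_PER_CALL = 5.00 / 1000  # Overseas Internet Connectivity Service

def estimate_cost_usd(model: str, cached_input: int, uncached_input: int,
                      output: int, search_calls: int = 0) -> float:
    """Estimate one request's cost, assuming its total input length picks the tier."""
    total_input = cached_input + uncached_input
    tier = "<=256K" if total_input <= TIER_BOUNDARY else "256K-1M"
    if tier not in PRICES_USD[model]:
        raise ValueError(f"{model} has a 256K context limit; got {total_input} input tokens")
    hit, miss, out = PRICES_USD[model][tier]
    token_cost = (cached_input * hit + uncached_input * miss + output * out) / 1_000_000
    return token_cost + search_calls * SEARCH_USD_PER_CALL

# 100K input tokens (20K served from cache) plus 10K output tokens on mimo-v2.5-pro:
# (20,000 x 0.20 + 80,000 x 1.00 + 10,000 x 3.00) / 1,000,000 = $0.114
print(round(estimate_cost_usd("mimo-v2.5-pro", 20_000, 80_000, 10_000), 6))  # 0.114
```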
## Model Details

### Pro Series

| **Model Name** | `mimo-v2.5-pro`, `mimo-v2-pro` |
|---|---|
| **Category** | Text Generation - General Large Language Model |
| **Context Length** | 1M |
| **Maximum Output Length** | 128K |
| **Model Capability** | Text generation, deep thinking, streaming output, function call, structured output, internet search |
| **Flow Control** | RPM: 100<br>TPM: 10M |
### Omni Series

| **Model Name** | `mimo-v2.5` | `mimo-v2-omni` |
|---|---|---|
| **Category** | Text Generation - Full-Modal Understanding Model | Text Generation - Full-Modal Understanding Model |
| **Context Length** | 1M | 256K |
| **Maximum Output Length** | 128K | 128K |
| **Model Capability** | Full-modal understanding, deep thinking, streaming output, function call, structured output, internet search | Full-modal understanding, deep thinking, streaming output, function call, structured output, internet search |
| **Flow Control** | RPM: 100<br>TPM: 10M | RPM: 100<br>TPM: 10M |
### TTS Series

| **Model Name** | `mimo-v2.5-tts` | `mimo-v2.5-tts-voiceclone` | `mimo-v2.5-tts-voicedesign` | `mimo-v2-tts` |
|---|---|---|---|---|
| **Category** | Speech Synthesis Model | Speech Synthesis Model | Speech Synthesis Model | Speech Synthesis Model |
| **Context Length** | 8K | 8K | 8K | 8K |
| **Maximum Output Length** | 8K | 8K | 8K | 8K |
| **Model Capability** | Speech Synthesis | Timbre Cloning | Timbre Design | Speech Synthesis |
| **Flow Control** | RPM: 100<br>TPM: 10M | RPM: 100<br>TPM: 10M | RPM: 100<br>TPM: 10M | RPM: 100<br>TPM: 10M |
### MiMo-V2-Flash

| **Model Name** | `mimo-v2-flash` |
|---|---|
| **Category** | Text Generation - General Large Language Model |
| **Context Length** | 256K |
| **Maximum Output Length** | 64K |
| **Model Capability** | Text generation, deep thinking, streaming output, function call, structured output, internet search |
| **Flow Control** | RPM: 100<br>TPM: 10M |
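RPM and TPM are counted across all API Keys of an account. A process that shares one account may therefore want to throttle itself below the documented 100 RPM / 10M TPM. The following minimal sliding-window sketch assumes you can estimate each request's token count up front. The class name and that estimate are hypothetical, and the server's own accounting may differ.

```python
import time
from collections import deque
from typing import Callable

class SlidingWindowLimiter:
    """Client-side limiter for requests-per-minute and tokens-per-minute."""

    def __init__(self, rpm: int = 100, tpm: int = 10_000_000,
                 clock: Callable[[], float] = time.monotonic, window: float = 60.0):
        self.rpm, self.tpm, self.clock, self.window = rpm, tpm, clock, window
        self.events: deque = deque()  # (timestamp, tokens) of recent requests

    def _prune(self, now: float) -> None:
        # Drop requests that have left the one-minute window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait before a request of `tokens` tokens fits both limits."""
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used + tokens <= self.tpm:
            return 0.0
        # Otherwise find the oldest request whose expiry frees enough room.
        freed_requests, freed_tokens = 0, 0
        for ts, t in self.events:
            freed_requests += 1
            freed_tokens += t
            if (len(self.events) - freed_requests < self.rpm
                    and used - freed_tokens + tokens <= self.tpm):
                return ts + self.window - now
        raise ValueError("a single request larger than the TPM limit can never fit")

    def record(self, tokens: int) -> None:
        """Register a request that has just been sent."""
        self.events.append((self.clock(), tokens))
```

In use, you would call `wait_time`, sleep for the returned number of seconds, send the request, and then call `record`. Injecting `clock` keeps the limiter testable without real waiting.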
--- DOCUMENT: Xiaomi MiMo-V2.5 series open-sourced & Orbit 100 trillion token plan launched ---
URL: https://platform.xiaomimimo.com/static/docs/news/v2.5-open-sourced.md

# Xiaomi MiMo-V2.5 series open-sourced & Orbit 100 trillion token plan launched

Today, we are officially open-sourcing the Xiaomi MiMo-V2.5 series under the MIT license, which permits commercial inference deployment and secondary training without additional authorization.

## Open license, fully open source

The MiMo-V2.5 series entered public testing on April 23. We thank all users for their enthusiastic feedback and encouragement during this period. The series includes two models, both supporting a 1-million-token context window:

- MiMo-V2.5-Pro: Designed for complex tasks and deeply optimized for Agent and Coding applications. It ranks first among open-source models globally on the GDPVal-AA and ClawEval leaderboards.
- MiMo-V2.5: A native full-modal model that understands text, images, video, and audio, with strong Agent capabilities.

![Image](https://platform.xiaomimimo.com/static/VZxrbdHSUoqx63x5RtycLBnznfe.cb0a305d.png)

We believe that a model's true value lies not in its leaderboard ranking but in how efficiently it helps developers solve real-world problems. On the Claw-Eval leaderboard, MiMo-V2.5 sits on the optimal frontier of task completion rate versus token efficiency.

![Image](https://platform.xiaomimimo.com/static/BhnNbzCq5oBDXCxWPOcc9umtnPe.aeb7e48a.jpeg)

After refinement and verification during the public beta, the series has further improved in intelligence and stability and is now ready for release.
Today, we are releasing the model weights of the MiMo-V2.5 series to developers worldwide under the MIT License. We are also working with chip manufacturers and inference frameworks to provide adaptation code, and we hope this contributes to the open-source community and developer ecosystem. The weights of both models (including the Base models) are fully open-sourced under the permissive MIT license, which allows free commercial use, secondary training, and fine-tuning without additional authorization.

> Model weight collection: [https://huggingface.co/collections/XiaomiMiMo/mimo-v25](https://huggingface.co/collections/XiaomiMiMo/mimo-v25)

For more details, see the model blog: https://mimo.xiaomi.com/index#blog

## MiMo Orbit Program

We believe the value of open source lies not only in publishing weights but, more importantly, in building the ecosystem together. To this end, we are officially launching the MiMo Orbit Program. It has two parts: the **Creator Trillion Token Incentive Program** for AI builders and the **Agent Ecosystem Co-construction Initiative** for Agent framework teams.

### Creator Trillion Token Incentive Program

![Image](https://platform.xiaomimimo.com/static/QzbXbqNIlou0rYxl8NjcZgQKn9f.49f56c9c.png)

Xiaomi MiMo will give away a total of **100 trillion (100T) Tokens** to users worldwide within 30 days, and distribution ends once all tokens have been given out. The event is application-based. Approved applicants can receive up to the Max-tier Token Plan, which includes 1.6 billion Credits and is worth 659 yuan.

**Event Time**

From 00:00 on April 28, 2026, to 00:00 on May 28, 2026 (Beijing Time)

**Participation Method**

You can submit an application via the following link or QR code.
We will carefully review each application and match benefits to your usage scenarios and needs. Successful applicants will receive a follow-up email.

Application URL: [100t.xiaomimimo.com](http://100t.xiaomimimo.com)

Application QR Code:

![Image](https://platform.xiaomimimo.com/static/ZDwPbXuHSoURfdxMjW7cxqnmnMe.c8e38d08.png)

### Agent Ecosystem Co-construction Initiative

Xiaomi MiMo provides dedicated support to Agent framework teams worldwide. We offer limited-time free access for Agent frameworks, so that your users can try the MiMo series models with zero barriers. During ecosystem adaptation, we worked closely with Agent framework vendors such as OpenCode, Hermes Agent, and KiloCode, and received a great deal of positive feedback and recognition.
![Image](https://platform.xiaomimimo.com/static/BFeBbuvrVoxBXfxmpkPcSSASnCx.40e64d37.png)

![Image](https://platform.xiaomimimo.com/static/GByLbKMNGoOqutxZcybcY9D8nOt.eda23ad9.png)

![Image](https://platform.xiaomimimo.com/static/GjbfbWQbho1sE2xayLIc3TCunSd.116ef429.png)

![Image](https://platform.xiaomimimo.com/static/WlzwbOfDXorn7OxU45UczO0xnae.4621aefb.png)
We welcome like-minded Agent framework developers and vendors to contact us: [business-mimo@xiaomi.com](mailto:business-mimo@xiaomi.com)

## Chip ecosystem and inference framework adaptation

MiMo-V2.5-Pro was integrated and adapted by multiple chip manufacturers on the day it was open-sourced. The following is a partial list of manufacturers:

- Ali T-HEAD
  > The T-HEAD Zhenwu 810E achieves deep adaptation through a full-stack, self-developed AI software stack.
- Amazon Web Services
  > Amazon Web Services (AWS) has completed in-depth adaptation of MiMo-V2.5-Pro based on its self-developed Trainium2 chip, the Neuron SDK, and the vLLM inference framework, making the model available globally on the day it was open-sourced. The next-generation 3nm Trainium3 will further unlock the model's Agentic performance.
- AMD
  > Relying on the ROCm open-source software stack, AMD provides Day-0 adaptation and comprehensive optimization for MiMo-V2.5-Pro, helping developers and enterprise users deploy the model and go live efficiently.
- Baidu Kunlun Chip
  > Kunlun Chip relies on its self-developed architecture, using low-level operator optimization and software-hardware co-acceleration to keep models running stably and efficiently on its platform and to provide a solid compute foundation for upper-layer applications.
- Suiyuan Technology
  > Suiyuan Technology performs in-depth optimization with its self-developed Yusuan TopsRider software stack. MiMo-V2.5-Pro is fully adapted on the Suiyuan L600 and runs stably with high throughput and low latency, maintaining excellent performance on complex tasks and long-sequence scenarios.
- Muxi
  > The Muxi Xiyun C Series relies on the full-stack, self-developed MXMACA software stack to provide end-to-end native support from Triton syntax down to the Muxi GPU instruction set, with better performance.
- Tianshu Zhixin (Iluvatar CoreX)
  > Tianshu Zhixin achieves Day-0 deep adaptation of models, building high-quality compute on full-stack, self-developed software and hardware. Its efficient adaptation and easy migration precisely unlock model performance and ensure stable operation.

In addition, the MiMo-V2.5 series models have completed Day-0 adaptation for the mainstream inference frameworks SGLang and vLLM.

![Image](https://platform.xiaomimimo.com/static/F6upbwMkIol7iex8R0Sc81XYnhh.3167ebd9.png)

From the first-generation model to today's full open-sourcing of MiMo-V2.5, every step of MiMo's growth has depended on community feedback and co-construction. We will keep investing in model capabilities and the ecosystem, and we will work with developers worldwide to bring Agents into every application scenario.

--- DOCUMENT: Xiaomi MiMo-V2.5-TTS-Series + ASR Officially Launched: Your Voice, Under Your Control ---
URL: https://platform.xiaomimimo.com/static/docs/news/v2.5-tts-release.md

# Xiaomi MiMo-V2.5-TTS-Series + ASR Officially Launched: Your Voice, Under Your Control

Speech technology is undergoing a transformation: from "being able to listen and read" to "precise understanding and flexible expression". In real creative and interactive scenarios, machines must cut through complex spoken-language conditions (dialect accents, environmental noise, several people speaking at once). They must also use voice to shape characters and convey emotion, so that speech carries feelings as well as words.
Content creators and businesses that depend on speech technology both need the same thing: a speech system that can be freely controlled with language. Give it a noisy meeting recording and it transcribes accurately. Give it a director's note such as "this part should be low and angry" and it produces a fitting performance. It understands everything and can express everything.

To this end, today we officially release the **MiMo-V2.5-TTS Series** and **MiMo-V2.5-ASR**, an end-to-end speech model series for the Agent era. It covers the two core capabilities of recognition and synthesis, so that both speech input and output can be freely directed by language.

- The MiMo-V2.5-TTS Series includes three models, now available on the **Xiaomi MiMo Open Platform** and **free for a limited time**. The three models share the same style-instruction following, audio tag control, and text understanding capabilities, so that voice performance can be precisely controlled with language. Each covers a typical creative need:
  - **MiMo-V2.5-TTS:** Comes with multiple high-quality premium voices and supports fine-grained control over speech rate, emotion, tone, and more. It works out of the box for expression needs across many scenarios.
  - **MiMo-V2.5-TTS-VoiceDesign:** Defines and generates a brand-new voice from a single sentence, making voice creation more intuitive and efficient.
  - **MiMo-V2.5-TTS-VoiceClone:** Replicates a target timbre with high fidelity from a small number of samples, while keeping stable style-instruction following and audio tag control.

  **Try it in MiMo Studio:** https://aistudio.xiaomimimo.com/#/c
- **MiMo-V2.5-ASR is officially open-sourced.** Its speech recognition performance reaches an industry-leading level in complex real-world scenarios such as Chinese-English bilingual speech, Chinese dialects, code-switching, heavy noise, and multiple speakers. It gives Agents clear and reliable transcription, so that every interaction starts from accurate understanding.

## MiMo-V2.5-TTS: Let Voice Become Everyone's Creativity

### Core Features of TTS Series

#### Precise ability to follow style instructions

The model consistently understands and follows instructions, from a short single sentence to a full set of director's notes, across emotion, tone, speaking speed, vocalization style, and language style. Instructions do not need to be written as structured parameters. Simply describe the desired feeling as if briefing an actor, and the model translates it into the corresponding performance.

Some scenarios demand higher consistency, such as audio dramas, game NPCs, and character-based dialogue. For these, the model also supports **director script-level** structured input: **characters**, **scenes**, and **detailed instructions** are described in separate layers, and each layer can be updated at its own pace and combined freely. This layering keeps a character's timbre consistent throughout while allowing the performance of each sentence to be controlled individually.
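The layered director-script input can be assembled programmatically. The following sketch reproduces the `CHARACTER` / `SCENE` / `DIRECTION` layout of Case 2 below, so the character layer stays fixed while the scene and direction change from line to line. It assumes each layer header sits on its own line followed by that layer's text. The function name and the example content are hypothetical. How the resulting instruction is sent to the TTS API is covered in the platform's speech-synthesis usage guide, not here.

```python
def build_director_script(character: str, scene: str, direction: list[str]) -> str:
    """Join the three layers into one instruction, one section per non-empty layer."""
    sections = [
        ("CHARACTER", character.strip()),
        ("SCENE", scene.strip()),
        ("DIRECTION", "\n".join(line.strip() for line in direction if line.strip())),
    ]
    return "\n".join(f"{header}\n{body}" for header, body in sections if body)

# The character layer is reused; scene and direction vary per line of dialogue.
character = "A retired starship captain, weary but warm."
script_lines = [
    ("Docking bay at night, speaking to a young cadet.",
     ["Low, steady voice.", "Long pauses between sentences."]),
    ("Alarms blaring on the bridge.",
     ["Sharp and urgent.", "Raise the volume on the last word."]),
]
for scene, direction in script_lines:
    print(build_director_script(character, scene, direction))
    print()
```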
**Case 1**

Instruct: 声音低沉沙哑一点,像个历经沧桑的老前辈在讲述传奇人物。语气里带点由衷的敬佩,娓娓道来。

Text: 街口那个老周啊,媳妇走得早,一个人拉扯俩娃,白天蹬三轮,晚上还去夜市摆摊修鞋。现在俩孩子都有出息喽,想接他去城里享福——他不去,就守着那间小铺子。哎,人哪,骨头硬,心里头就踏实。

Voice: 冰糖

**Case 2**

Instruct:

```text
CHARACTER
曾是守护九天的神祇,见证了凡人的无药可救后,决定以灭世来完成最终的净化。他的心中装满悲悯,但手段是绝对的屠戮。
SCENE
悬浮于崩塌的祭坛之上,俯视下方在火海中哀嚎、曾奉他为信仰的信徒。他在降下最后的毁灭前,发出神圣却残忍的叹息。
DIRECTION
发声机制与共鸣:充分打开胸腔共鸣,制造一种神圣的回音感。声音位置靠后,音色如古钟般低沉且带有金属质感的磁性。
声调与韵律:四声(去声)的下落要极其平缓,不要砸实,带有一种吟诵古籍般的从容与宏大。字句之间的停顿拉长,展现出视万物为刍狗的威压。
气声与实声的较量:在说前两句时,实声饱满,高高在上;但在说出“闭上眼吧”时,声音突然混入大量疲惫的气息,神性开始出现裂痕,流露出勉强的残忍。
咬字细节:古风词汇(如“垂怜”、“沉疴”、“剔骨刮毒”)咬字要深,声母起音圆润而不尖锐。结尾的最后半句,几乎全部转化为气声,像是在哄睡一个婴儿,将残酷包裹在极致的悲哀之中。
```

Text: 你们求我垂怜,求我降下甘霖洗净这浊世。可这世间的沉疴,唯有烈火能剔骨刮毒。闭上眼吧。这业火烧起来的时候,一点也不疼。

Voice: 白桦

#### Flexible audio tag control capabilities

In addition to paragraph-level natural-language instructions, the model supports inline audio tags that precisely control emotion, state, or style at specific positions in the text. Tags can be written in Chinese or English, accept free-form descriptions, and can be mixed freely within the same paragraph. The model renders everything from simple emotion labels to complex arrangements with stacked tags and fine-grained placement stably, performing well in both tag expressiveness and the stability of tag combinations.

Text: (调侃) 老张你当时不是说这条航线稳得很吗…… (模仿自信,提高音量) “系统全绿,放心走。” (突然停顿) ……现在呢? (爆发,愤怒压不住) 现在整艘船都在报警!你管这叫“放心”?! (声音变轻) 不过……你看那外面,裂开的星云像在呼吸一样。 (急促|呼喊) 别断通讯!喂!再撑十秒!十秒!! (低声|情绪塌陷般平静) ……算了。 (轻笑|带点释然) 也挺好,至少是一起看的。

#### Rich text comprehension ability

Even without any instruction or tag, given only plain text, the model directly conveys the rhythm and emotion within it.
Punctuation pauses and the rise and fall of sentence structure come through naturally. The model actively captures the emotional arc hidden in the text, from calm narration to sudden twists. Even the speaker identity implied between the lines (age, temperament, character type) is reflected in the voice automatically. In other words, even the simplest plain text comes back as a vivid, lifelike performance.

Text: Ten... nine... eight... seven... six... five... four... three... TWO... ONE... ZERO! LAUNCH! LAUNCH! WE HAVE LIFTOFF! GO GO GO! SHE'S CLIMBING! ALTITUDE 1,000... 5,000... 10,000 FEET AND CLIMBING! BEAUTIFUL! AB-SO-LUTE-LY BEAUTIFUL!

### Model Series

#### MiMo-V2.5-TTS

MiMo-V2.5-TTS comes with multiple high-quality voices covering a variety of usage scenarios. Each voice has been professionally tuned for natural pronunciation and emotional resonance, so you get high-quality speech synthesis right out of the box.

**Preview the voices in Xiaomi MiMo Studio:** https://aistudio.xiaomimimo.com/#/c

#### MiMo-V2.5-TTS-VoiceDesign

Voice design targets scenarios where "**I have a voice in mind, but it doesn't exist yet**": game NPCs, animated characters, virtual streamers, brand IPs, and unusual voices for audio dramas. Such voices are hard to pick from a voice library and are not suitable for cloning a real person.

The model **generates a brand-new timbre from scratch based on a natural-language description**, without any reference audio. Users can describe any dimensions they like, such as age, gender, accent, timbre, vocalization style, and personality. For example, "an elderly Eastern European scholar, with a deep, slightly hoarse voice and a slow speaking rhythm" or "a vibrant young girl, with a clear voice and a slight upward inflection at the end of sentences" each produce a matching character voice. Thanks to large-scale pre-training, the model can also sensibly interpret **complex, ambiguous, or even contradictory** descriptions instead of being limited to coarse labels such as "male/female/young/old". As a result, voice design can create unique voices that real people would find hard to provide, and it can also accurately reproduce the typical voice of a given character type.

**Case 1**

Instruct: 一位中年男性,说标准普通话,嗓音低沉有磁性,带有轻微的沙哑质感,像纪录片旁白解说员,沉稳而有感染力。

Text: 当最后一缕阳光消失在地平线之下,这片沉睡了亿万年的大地开始显露它真正的面貌。在这寂静的荒野中,每一块岩石都记录着时间的流逝,每一阵风都在诉说着古老的故事。

**Case 2**

Instruct: 一位年迈的老先生,说带北方口音的普通话,语速缓慢而沉稳,嗓音略带沙哑和沧桑感,仿佛一位饱经风霜的老爷爷在讲故事,充满岁月的智慧。

Text: 我这辈子啊,走南闯北六十多年。见过最热闹的集市,也见过最安静的戈壁。到头来才明白一个道理——这人哪,不在走了多远的路,在于记住了多少风景。年轻人,别光顾着赶路,偶尔也停下来看看天。

#### MiMo-V2.5-TTS-VoiceClone

Voice cloning lets the model **speak in the voice you specify**, replicating a real podcaster, voice actor, brand spokesperson, or the user themselves. **Provide a reference clip as short as a few seconds.** With no additional training, annotation, or fine-tuning, the model reproduces the speaker's timbre and is ready to use immediately. The cloned voice keeps the original speaker's timbre as well as personal traits such as breathing, rhythm, and habitual pauses.

The cloned timbre can **reuse all of this model series' control capabilities**: natural-language instructions, audio tags, and director-level scripts can all still be combined freely. The cloned voice not only "sounds like the original person" but can also perform in the style and emotion you specify.
Prompt:

Instruct: 用尖锐刻薄的嗓音,带着狐假虎威的得意感说话,在提到大人物的身份时故意放慢语速并加重语气,营造压迫感。 (Speak in a sharp, caustic voice with the smug air of a fox borrowing the tiger's might; when mentioning the important figure's identity, deliberately slow down and stress the words to create a sense of intimidation.)

Text: 你以为我是谁,也敢在这儿跟我耍横?我告诉你,站在我身后的那个人,说出来吓死你——是当今的——万岁爷!你今天要是不给我个说法,我让你这铺子明天就开不了门。 (Who do you think I am, that you dare act tough with me here? Let me tell you, the person standing behind me would scare you to death if I named him: it's none other than His Majesty the Emperor! If you don't give me a proper answer today, I'll make sure this shop of yours can't open tomorrow.)

## MiMo-V2.5-ASR: Understands every expression, no matter how complex

If TTS turns voice into a creative tool at the "output" end, ASR opens the door at the "input" end. In real-world scenarios, what makes a speech recognition system truly good is understanding speech clearly and accurately despite language switching, background noise, and speakers with strong dialect accents.

As the auditory foundation of the full-pipeline speech model series, MiMo-V2.5-ASR achieves industry-leading results in complex real-world scenarios such as Chinese-English bilingual speech, Chinese dialects, code-switching, heavy noise, multiple speakers, and knowledge-dense content. The goal is not only to convert clean speech into text, but also to let agents capture every meaningful word in noisy real-world audio.

### Core Features

- **Chinese dialects**: Supports dialects such as Wu, Cantonese, Minnan, and Sichuanese.
- **Complex English scenarios**: Achieves leading performance on the Open ASR Leaderboard in complex English scenarios such as AMI.
- **Code-switch**: Transcribes Chinese-English code-switched speech freely and smoothly, with no need to preset language labels.
- **Song recognition**: Recognizes the lyrics of Chinese and English songs, maintaining high accuracy when accompaniment and vocals are mixed.
- **Heavy noise**: Remains robust in complex acoustic environments such as high noise and far-field pickup.
- **Multiple speakers**: Accurately transcribes multi-party conversations with cross-talk, such as meetings.
- **Knowledge-intensive content**: Precisely recognizes ancient poems, technical terms, personal names, and place names.
- **Native punctuation**: Generates punctuation natively from speech prosody and semantics, so transcripts are ready to use without post-processing.

### Performance

MiMo-V2.5-ASR achieves state-of-the-art or highly competitive results across multiple dimensions, including general Chinese and English, Chinese dialects, code-switching, and lyrics recognition, demonstrating consistent strength across scenarios and languages. Representative evaluation results:

[Figure: MiMo-V2.5-ASR representative evaluation results]

For agent applications, content creation tools, conferencing systems, and voice interaction products, MiMo-V2.5-ASR provides an auditory foundation proven on complex real-world speech.
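Leaderboards such as the Open ASR Leaderboard rank English systems by word error rate (WER), while Chinese results are usually reported as character error rate (CER). For Chinese-English code-switched speech, a common convention is a mixed error rate that counts each Chinese character and each English word as one unit. The section above does not state which metric or text normalization MiMo's own evaluation uses, so the following is a generic sketch of that convention (lowercasing and dropping punctuation are assumptions), not the official scoring script.

```python
import re

# One token per CJK ideograph (basic block U+4E00-U+9FFF), one per English word/number.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[a-z0-9']+")


def tokenize_mixed(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split Chinese-English text into scoring units."""
    return _TOKEN_RE.findall(text.lower())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference token r
                cur[j - 1] + 1,          # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (cost 0 when tokens match)
            )
        prev = cur
    return prev[-1]


def mixed_error_rate(reference: str, hypothesis: str) -> float:
    """Errors / reference length: WER on English-only text, CER on Chinese-only text."""
    ref, hyp = tokenize_mixed(reference), tokenize_mixed(hypothesis)
    if not ref:
        raise ValueError("reference has no scorable tokens")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # English only: one substitution out of three words.
    print(mixed_error_rate("The cat sat.", "the bat sat"))
    # Code-switched: one deleted character out of seven units.
    print(mixed_error_rate("明天的 meeting 取消了", "明天 meeting 取消了"))
```

Treating each Chinese character as a unit avoids depending on a word segmenter, which is one reason character-level scoring is the usual choice for Chinese.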
## How to Use

### MiMo-V2.5-TTS Series

To help developers explore more scenarios, **MiMo-V2.5-TTS, MiMo-V2.5-TTS-VoiceDesign, and MiMo-V2.5-TTS-VoiceClone** are all available **free for a limited time** on the **Xiaomi MiMo API** Open Platform: https://platform.xiaomimimo.com/docs/usage-guide/speech-synthesis-v2.5

You are also welcome to try them quickly in **Xiaomi MiMo Studio**: https://aistudio.xiaomimimo.com/#/c

For more cases, see [https://mimo.xiaomi.com/mimo-v2-5-tts](https://mimo.xiaomi.com/mimo-v2-5-tts)

### MiMo-V2.5-ASR

**The model weights and code of MiMo-V2.5-ASR are now open source**, so developers and researchers can use them directly or build on them.

> Demo page: [https://mimo.xiaomi.com/mimo-v2-5-asr](https://mimo.xiaomi.com/mimo-v2-5-asr)
>
> Project source code: [https://github.com/XiaomiMiMo/MiMo-V2.5-ASR](https://github.com/XiaomiMiMo/MiMo-V2.5-ASR)
>
> Model weights: https://huggingface.co/XiaomiMiMo/MiMo-V2.5-ASR
>
> Hugging Face Space: https://huggingface.co/spaces/XiaomiMiMo/MiMo-V2.5-ASR

## Agent Tool Call Support

To help you quickly integrate speech capabilities into agent applications, we have fully open-sourced the Skill for accessing the MiMo-V2.5-TTS models. You can pull it from the repository: [https://github.com/XiaomiMiMo/MiMo-Skills](https://github.com/XiaomiMiMo/MiMo-Skills)

### Sound is just the starting point

Beyond the MiMo-V2.5-TTS Series, we want to answer a question: what will audio creation look like when MiMo-V2.5-TTS understands "expression", MiMo-V2.5-Pro understands "planning", and MiMo-V2.5 understands "listening"?

**The answer: a complete, closed-loop, agent-style creative chain.**

- MiMo-V2.5-Pro: planning and screenwriting, including breaking down tasks, writing scripts, arranging the rhythm, and deciding the editing sequence.
- MiMo-V2.5-TTS Series: timbre and creative work, with Voice Design generating the timbre and Voice Clone synthesizing the content.
- MiMo-V2.5: listening back and evaluation, checking whether the characters stay consistent, whether the rhythm is right, and whether the result deviates from the user's original intent.

An example:

> Create a scene of a summer afternoon lasting about 2 minutes. Grandpa (in his 70s, with a Beijing hutong accent, a hoarse voice, and drawn-out speech; he lowers his voice when concentrating on chess and gives a booming laugh with a slap on the table) is playing chess under a pagoda tree. His 5-year-old grandson squats beside him watching ants and occasionally interrupts with childish questions (clear voice, rising intonation at the end, higher when excited, occasionally unclear pronunciation). Grandpa's tone turns solemn when he gets serious, but immediately softens into a laughing scold when his grandson interrupts.

The user provides only this single request, and the finished product is generated automatically: