Speech Recognition (MiMo-V2.5-ASR)

Speech recognition converts input audio into text. It is suited to meeting transcription, lyrics recognition, dialect transcription, recordings made in noisy environments, and similar tasks. Specifying the audio language through the asr_options.language parameter can improve recognition accuracy.

Core Capabilities

  • Broad Language and Dialect Coverage: Supports bilingual Chinese-English recognition with automatic language detection. Natively recognizes Cantonese, Wu, Minnan, Sichuan, and other Chinese dialects.

  • Robust in Complex Scenarios: Maintains stable recognition in noisy environments, far-field pickup, and multi-speaker overlapping conversations. Also supports lyrics transcription with background music.

  • Precise Handling of Specialized Content: Accurately recognizes knowledge-intensive content such as classical poetry, technical terminology, proper nouns, and place names. Automatically generates punctuation without post-processing.

Supported Models

Currently, only the mimo-v2.5-asr model is supported.

Prerequisites

For API Key setup and other prerequisites, please refer to First API Call.

Supported Audio Formats

Currently, only wav and mp3 audio files are supported. Before passing audio to the API, encode the file as a Base64 string. The encoded string must not exceed 10 MB.

The audio must be passed in data URL format: data:{MIME_TYPE};base64,$BASE64_AUDIO

Supported formats and their MIME types:

Format    MIME Type
wav       audio/wav
mp3       audio/mpeg or audio/mp3
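The encoding and size rules above can be sketched as a small helper. This is a minimal illustration, not part of the official SDK: the function names, the extension-to-MIME lookup, and the reading of "10 MB" as 10 × 1024 × 1024 characters are all assumptions.

```python
import base64
from pathlib import Path

# MIME types come from the table above; choosing them by file extension
# is an assumption made for this sketch.
MIME_BY_EXTENSION = {".wav": "audio/wav", ".mp3": "audio/mpeg"}

# Assumption: the 10 MB limit is interpreted as 10 MiB of encoded text.
MAX_BASE64_CHARS = 10 * 1024 * 1024


def audio_bytes_to_data_url(audio_bytes: bytes, extension: str) -> str:
    """Encode raw audio bytes as a data URL (hypothetical helper)."""
    ext = extension.lower()
    if ext not in MIME_BY_EXTENSION:
        raise ValueError(f"unsupported audio format: {extension}")
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    # The limit applies to the encoded string, not to the raw file.
    if len(encoded) > MAX_BASE64_CHARS:
        raise ValueError("Base64-encoded audio exceeds the 10 MB limit")
    return f"data:{MIME_BY_EXTENSION[ext]};base64,{encoded}"


def audio_file_to_data_url(path: str) -> str:
    """Read a local file and encode it as a data URL (hypothetical helper)."""
    file_path = Path(path)
    return audio_bytes_to_data_url(file_path.read_bytes(), file_path.suffix)
```

Because Base64 expands data by roughly one third, a raw file of about 7.5 MB already approaches the limit on the encoded string.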

Code Sample

Notes

  • Audio data must be passed via the input_audio.data field in data URL format.
  • Use asr_options.language to specify the language. Supported values: auto, zh, en. If the parameter is omitted, the language is detected automatically. When the language is known in advance, set it explicitly to improve recognition accuracy.
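The two notes above can be combined into a request body as follows. This is a sketch only: build_asr_request and its validation checks are hypothetical, while the field names mirror the request samples in this section.

```python
# Language values listed in the notes above.
SUPPORTED_LANGUAGES = ("auto", "zh", "en")


def build_asr_request(data_url: str, language: str = "auto") -> dict:
    """Assemble a chat-completions request body (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {SUPPORTED_LANGUAGES}")
    # Client-side sanity check only; it is an assumption of this sketch,
    # not a documented server-side rule.
    if not data_url.startswith("data:audio/"):
        raise ValueError("audio must be passed as a data URL")
    return {
        "model": "mimo-v2.5-asr",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "input_audio", "input_audio": {"data": data_url}}
                ],
            }
        ],
        "asr_options": {"language": language},
    }
```

Validating the language value before sending the request surfaces typos such as "cn" locally rather than through an API error.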

Non-streaming Call

Curl

curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "mimo-v2.5-asr",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "data:{MIME_TYPE};base64,$BASE64_AUDIO"
                    }
                }
            ]
        }
    ],
    "asr_options": {
        "language": "en"
    }
}'

Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Replace with the actual local file path. The MIME type in the data URL
# below (audio/wav) must match the format of this file.
with open("audio_file.wav", "rb") as f:
    audio_bytes = f.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")

completion = client.chat.completions.create(
    model="mimo-v2.5-asr",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:audio/wav;base64,{audio_base64}"
                    }
                }
            ]
        }
    ],
    extra_body={
        "asr_options": {
            "language": "en"
        }
    }
)

print(completion.model_dump_json())
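To print only the recognized text rather than the full JSON, the response can be reduced to a string. The sketch below assumes the response follows the usual OpenAI-compatible chat completion shape, with the transcript in choices[0].message.content. This section does not show the response schema, so treat that field path as an assumption; extract_transcript is a hypothetical helper.

```python
def extract_transcript(response: dict) -> str:
    """Return the transcript from a response dict, or "" if absent.

    Assumes an OpenAI-compatible shape:
    {"choices": [{"message": {"content": "..."}}]}
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""
```

With the client code above, this could be called as extract_transcript(completion.model_dump()).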

Streaming Call

Curl

curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "mimo-v2.5-asr",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "data:{MIME_TYPE};base64,$BASE64_AUDIO"
                    }
                }
            ]
        }
    ],
    "asr_options": {
        "language": "auto"
    },
    "stream": true
}'

Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Replace with the actual local file path. The MIME type in the data URL
# below (audio/wav) must match the format of this file.
with open("audio_file.wav", "rb") as f:
    audio_bytes = f.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")

completion = client.chat.completions.create(
    model="mimo-v2.5-asr",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:audio/wav;base64,{audio_base64}"
                    }
                }
            ]
        }
    ],
    extra_body={
        "asr_options": {
            "language": "auto"
        }
    },
    stream=True
)

for chunk in completion:
    print(chunk.model_dump_json())
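The loop above prints each raw chunk. To assemble a single transcript, the incremental text can be concatenated. This sketch assumes each chunk follows the OpenAI-compatible streaming shape, with text fragments in choices[i].delta.content. The streaming schema is not shown in this section, so that path is an assumption, and join_stream_text is a hypothetical helper.

```python
from typing import Iterable


def join_stream_text(chunks: Iterable[dict]) -> str:
    """Concatenate text fragments from streamed chunk dicts.

    Assumes an OpenAI-compatible chunk shape:
    {"choices": [{"delta": {"content": "..."}}]}
    """
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:  # skip chunks without text, e.g. role-only or final chunks
                parts.append(text)
    return "".join(parts)
```

With the streaming client code above, this could be called as join_stream_text(chunk.model_dump() for chunk in completion).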

Update Time June 02, 2026