Speech Recognition (MiMo-V2.5-ASR)
Speech recognition converts input audio into text. It is suited to meeting transcription, lyrics recognition, dialect transcription, recordings made in noisy environments, and more. If the language of the audio is known, specifying it can improve recognition accuracy.
Core Capabilities
- Broad Language and Dialect Coverage: Supports bilingual Chinese-English recognition with automatic language detection. Natively recognizes Cantonese, Wu, Minnan, Sichuan, and other Chinese dialects.
- Robust in Complex Scenarios: Maintains stable recognition in noisy environments, far-field pickup, and multi-speaker overlapping conversations. Also supports lyrics transcription with background music.
- Precise Handling of Specialized Content: Accurately recognizes knowledge-intensive content such as classical poetry, technical terminology, proper nouns, and place names. Automatically generates punctuation without post-processing.
Supported Models
Currently, only the mimo-v2.5-asr model is supported.
Prerequisites
For API Key setup and other prerequisites, please refer to First API Call.
Supported Audio Formats
Currently, only wav and mp3 audio files are supported. Before passing audio to the API, convert the file to a Base64-encoded string. The encoded string must not exceed 10 MB.
The audio must be passed in data URL format: data:{MIME_TYPE};base64,$BASE64_AUDIO
Supported formats and their MIME types:
| Format | MIME Type |
|---|---|
| wav | audio/wav |
| mp3 | audio/mpeg or audio/mp3 |
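The encoding steps above can be sketched as a small helper. This is a minimal illustration, not part of the official SDK: the function name audio_to_data_url and the constants are hypothetical, the MIME type is chosen from the file extension, and the 10 MB limit is assumed to mean 10 * 1024 * 1024 characters of Base64 text.

```python
import base64
from pathlib import Path

# MIME types from the table above, keyed by file extension.
# For mp3 the table allows audio/mpeg or audio/mp3; audio/mpeg is used here.
MIME_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 Base64 characters.
MAX_BASE64_SIZE = 10 * 1024 * 1024


def audio_to_data_url(path: str) -> str:
    """Encode a wav or mp3 file as a data URL, enforcing the size limit."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported audio format: {suffix or '(none)'}")

    # Base64 output is pure ASCII, so decoding as ASCII is safe.
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    if len(encoded) > MAX_BASE64_SIZE:
        raise ValueError("Base64-encoded audio exceeds the 10 MB limit")

    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

The returned string can be placed directly in the input_audio.data field. Checking the size of the encoded string rather than the raw file matters because Base64 inflates the data by roughly one third.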
Code Sample
Notes
- Audio data must be passed via the input_audio.data field in data URL format.
- Use asr_options.language to specify the language. Supported values: auto, zh, en. If this parameter is not set, auto-detection is applied. When the language is known, set it explicitly to improve recognition accuracy.
Non-streaming Call
Curl
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
  --header "api-key: $MIMO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "mimo-v2.5-asr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "data:{MIME_TYPE};base64,$BASE64_AUDIO"
            }
          }
        ]
      }
    ],
    "asr_options": {
      "language": "en"
    }
  }'
Python
import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Replace with the actual local file path
with open("audio_file.wav", "rb") as f:
    audio_bytes = f.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")

completion = client.chat.completions.create(
    model="mimo-v2.5-asr",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:audio/wav;base64,{audio_base64}"
                    }
                }
            ]
        }
    ],
    extra_body={
        "asr_options": {
            "language": "en"
        }
    }
)

print(completion.model_dump_json())
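The sample above prints the entire response as JSON. To pull out only the recognized text, something like the following may work. It is a sketch under one unconfirmed assumption: that the response follows the OpenAI-compatible chat completion shape, with the transcript in choices[0].message.content. This page does not document the response schema, and extract_transcript is a hypothetical helper name.

```python
from typing import Any, Mapping


def extract_transcript(response: Mapping[str, Any]) -> str:
    """Return the recognized text from a chat-completion-style response dict.

    Assumption: the transcript is stored at choices[0].message.content.
    Adjust the path if the actual payload differs.
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    # content may be missing or None; fall back to an empty string.
    return message.get("content") or ""
```

With the OpenAI Python client, the response object can be converted first: extract_transcript(completion.model_dump()).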
Streaming Call
Curl
curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
  --header "api-key: $MIMO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "mimo-v2.5-asr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "data:{MIME_TYPE};base64,$BASE64_AUDIO"
            }
          }
        ]
      }
    ],
    "asr_options": {
      "language": "auto"
    },
    "stream": true
  }'
Python
import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Replace with the actual local file path
with open("audio_file.wav", "rb") as f:
    audio_bytes = f.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")

completion = client.chat.completions.create(
    model="mimo-v2.5-asr",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:audio/wav;base64,{audio_base64}"
                    }
                }
            ]
        }
    ],
    extra_body={
        "asr_options": {
            "language": "auto"
        }
    },
    stream=True
)

for chunk in completion:
    print(chunk.model_dump_json())
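The loop above prints each raw chunk. To assemble the chunks into a single transcript, a helper along these lines may be useful. It assumes, without confirmation from this page, that each chunk follows the OpenAI-compatible streaming shape with incremental text in choices[0].delta.content. The name join_stream_text is hypothetical.

```python
from typing import Any, Iterable, Mapping


def join_stream_text(chunks: Iterable[Mapping[str, Any]]) -> str:
    """Concatenate text deltas from chat-completion-style stream chunks.

    Assumption: incremental text is stored at choices[0].delta.content.
    Chunks without choices or without text are skipped.
    """
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        text = delta.get("content")
        if text:
            parts.append(text)
    return "".join(parts)
```

With the OpenAI Python client, the stream can be passed as a generator of dicts: join_stream_text(chunk.model_dump() for chunk in completion).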
Price
- Billing: Please refer to Pay‑As‑You‑Go API.
- View Bill: You can view your usage on the Billing page in the Console.