音频理解

音频理解模型可以根据您传入的音频进行回答，支持音频 URL 和 Base64 编码两种传入方式，适用于音频分析等场景。

快速开始

注意：获取 API Key 等准备工作，请参考首次调用API。

通过音频 URL 方式传入模型快速体验音频理解效果，示例代码如下。

Curl

curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "mimo-v2.5",
    "messages": [
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://example-files.cnbj1.mi-fds.com/example-files/audio/audio_example.wav"
                    }
                },
                {
                    "type": "text",
                    "text": "please describe the content of the audio"
                }
            ]
        }
    ],
    "max_completion_tokens": 1024
}'

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://example-files.cnbj1.mi-fds.com/example-files/audio/audio_example.wav"
                    }
                },
                {
                    "type": "text",
                    "text": "please describe the content of the audio"
                }
            ]
        }
    ],
    max_completion_tokens=1024
)

print(completion.model_dump_json())

响应结果

{
    "id": "550a678a6c2046a29128883eaaf849e7",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "",
                "role": "assistant",
                "tool_calls": null,
                "reasoning_content": "Good morning. Could you tell me what the weather will be like today?"
            }
        }
    ],
    "created": 1776850627,
    "model": "mimo-v2.5",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 17,
        "prompt_tokens": 86,
        "total_tokens": 103,
        "completion_tokens_details": {
            "reasoning_tokens": 15
        },
        "prompt_tokens_details": {
            "audio_tokens": 25,
            "cached_tokens": 82
        }
    }
}

支持的模型列表

当前仅支持 mimo-v2.5，mimo-v2-omni 模型。

音频传入方式

支持的音频传入方式如下：

音频 URL 传入：需提供公网可访问的音频 URL 地址。
Base64 编码传入：将音频转换为 Base64 编码字符串后再传入。

音频 URL 传入

通过公网可访问的音频 URL 地址直接传入音频文件，适用于音频文件已存储在公网可访问环境的场景。单个音频文件大小不能超过 100 MB。

Curl

curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "mimo-v2.5",
    "messages": [
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://example-files.cnbj1.mi-fds.com/example-files/audio/audio_example.wav"
                    }
                },
                {
                    "type": "text",
                    "text": "please describe the content of the audio"
                }
            ]
        }
    ],
    "max_completion_tokens": 1024
}'

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://example-files.cnbj1.mi-fds.com/example-files/audio/audio_example.wav"
                    }
                },
                {
                    "type": "text",
                    "text": "please describe the content of the audio"
                }
            ]
        }
    ],
    max_completion_tokens=1024
)

print(completion.model_dump_json())

Base64 编码传入

将音频文件转换为 Base64 编码字符串后传入，适用于音频文件无法通过公网 URL 访问的场景。转换后的 Base64 编码的字符串大小不能超过 50 MB。

请在 Base64 编码前携带前缀：data:{MIME_TYPE};base64,$BASE64_AUDIO

{MIME_TYPE}：音频的 MIME 类型（媒体类型），用于标识音频格式，需替换为实际音频对应的 MIME 值。
$BASE64_AUDIO：音频文件的纯 Base64 编码字符串（不含任何前缀）。

Curl

curl --location --request POST 'https://api.xiaomimimo.com/v1/chat/completions' \
--header "api-key: $MIMO_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "model": "mimo-v2.5",
    "messages": [
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "data:{MIME_TYPE};base64,$BASE64_AUDIO"
                    }
                },
                {
                    "type": "text",
                    "text": "please describe the content of the audio"
                }
            ]
        }
    ],
    "max_completion_tokens": 1024
}'

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2.5",
    messages=[
        {
            "role": "system",
            "content": "You are MiMo, an AI assistant developed by Xiaomi. Today is date: Tuesday, December 16, 2025. Your knowledge cutoff date is December 2024."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "data:{MIME_TYPE};base64,$BASE64_AUDIO"
                    }
                },
                {
                    "type": "text",
                    "text": "please describe the content of the audio"
                }
            ]
        }
    ],
    max_completion_tokens=1024
)

print(completion.model_dump_json())

音频限制

音频格式：MP3，WAV，FLAC，M4A，OGG。

音频文件格式变种较多，不能保证所有文件都能被识别，请通过测试验证文件能够被正常识别。

音频大小：
- 以 URL 方式传入时：单个音频文件大小不超过 100 MB。
- 以 Base64 编码传入时：单个音频的 Base64 编码字符串大小不超过 50 MB。
音频数量：传入多个音频时，音频数量受模型上下文长度限制，所有音频和文本的总 Token 数必须小于模型的上下文长度。

注：计算音频的 Token 请参考音频 Token 用量说明。模型上下文长度请参考定价与限速。

音频 Token 用量说明

音频的 Token 转化请参考以下代码。估算结果仅供参考，实际用量以 API 响应为准。

总 Tokens 数 ≈ 音频时长（单位：秒，例如：10.6 秒）* 6.25

计费说明

计费：总费用根据输入、输入（命中缓存）和输出 Token 数计算；价格请参考定价与限速。
- 可通过音频 Token 用量说明计算音频的 Token 消耗。估算结果仅供参考，实际用量以 API 响应为准。
查看账单：您可以在控制台的账单明细页面查看账单及用量。

常见问题

是否支持本地文件上传？

mimo-v2.5 和 mimo-v2-omni 模型暂不支持音频本地文件上传。支持的上传方式请参考音频传入方式。

更新时间 2026 年 04 月 22 日