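AI Voice Backend API Documentation V0.1

Basic Information

Base URL: http://localhost:8000
API prefix: /api/audio
Authentication: the user must be logged in; the backend calls the Bailian (百炼) platform with the API key configured by that user.

Unified Response Format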

{
  "code": 200,
  "message": "success",
  "data": {}
}

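Field	Type	Description
code	int	Status code: 200 success, 400 invalid parameters, 401 unauthorized, 403 API key not configured, 404 resource not found, 429 rate limited, 500 server error, 502 upstream error, 504 timeout
message	string	Response message
data	object	Response payload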
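
Because every endpoint wraps its payload in this envelope, client code can check `code` in one place. The following is a minimal TypeScript sketch that assumes only the envelope shape shown above; `ApiEnvelope`, `ApiError`, and `unwrap` are hypothetical names introduced for illustration and are not part of the API.

```typescript
// Envelope shape taken from the unified response format above.
interface ApiEnvelope<T> {
  code: number;
  message: string;
  data: T | null;
}

// Hypothetical error class (not part of the API) that keeps the numeric code.
class ApiError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = 'ApiError';
    this.code = code;
  }
}

// Return `data` when code is 200; throw ApiError for any other code.
function unwrap<T>(envelope: ApiEnvelope<T>): T {
  if (envelope.code !== 200) {
    throw new ApiError(envelope.code, envelope.message);
  }
  return envelope.data as T;
}
```

A caller would typically write `unwrap(await response.json())` and branch on `error.code` in a single `catch` block.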

Feature Overview

The AI voice module provides four features:

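Feature	Description	Models
Speech synthesis (TTS)	Converts text to speech	cosyvoice-v3-flash, cosyvoice-v3-plus, cosyvoice-v2
Speech recognition (ASR)	Converts speech to text	qwen3-asr-flash, qwen-audio-asr, qwen3-asr-flash-filetrans
Voice cloning	Creates a custom voice from an audio sample	voice-enrollment
Voice management	Query, update, and delete voices	-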

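API Reference

I. Speech Synthesis (TTS)

1.1 Speech Synthesis

POST /api/audio/tts/synthesize

Converts text to speech. Both streaming and non-streaming modes are supported.

Request Headers

Parameter	Type	Required	Description
Content-Type	string	Yes	application/json
Authorization	string	Yes	Bearer {token}

Request Body

Parameter	Type	Required	Default	Description
model	string	Yes	-	Speech synthesis model: `cosyvoice-v3-flash`, `cosyvoice-v3-plus`, `cosyvoice-v2`
voice	string	Yes	-	Voice ID, either a system voice or a cloned voice
text	string	Yes	-	Text to synthesize, at most 2000 characters per request
stream	boolean	No	false	Whether to stream the output
format	string	No	mp3	Audio format: `mp3`, `wav`, `pcm`, `opus`
sample_rate	int	No	22050	Sample rate in Hz: 8000, 16000, 22050, 24000, 44100, 48000
volume	int	No	50	Volume, range [0, 100]
speech_rate	float	No	1.0	Speech rate, range [0.5, 2.0]
pitch_rate	float	No	1.0	Pitch, range [0.5, 2.0]
instruction	string	No	-	Instruction (emotion, scenario, etc.); supported only by some voices

Request Example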

{
  "model": "cosyvoice-v3-flash",
  "voice": "longanyang",
  "text": "你好，欢迎使用语音合成服务！",
  "stream": false,
  "format": "mp3",
  "volume": 50,
  "speech_rate": 1.0,
  "pitch_rate": 1.0
}
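
The limits in the request body table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The sketch below is illustrative only: `TTSParams` and `validateTTSParams` are hypothetical names, and it assumes the server counts characters the same way JavaScript's `String.length` does, which the document does not state.

```typescript
// Subset of the request body that carries documented limits.
interface TTSParams {
  text: string;
  format?: string;
  sample_rate?: number;
  volume?: number;
  speech_rate?: number;
  pitch_rate?: number;
}

const TTS_FORMATS = ['mp3', 'wav', 'pcm', 'opus'];
const TTS_SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000];

// True when the value is omitted or lies inside [min, max].
const withinRange = (value: number | undefined, min: number, max: number): boolean =>
  value === undefined || (value >= min && value <= max);

// Returns a list of problems; an empty list means the parameters look valid.
function validateTTSParams(p: TTSParams): string[] {
  const errors: string[] = [];
  // Assumption: the server's character count matches String.length.
  if (p.text.length === 0 || p.text.length > 2000) {
    errors.push('text must be 1 to 2000 characters');
  }
  if (p.format !== undefined && TTS_FORMATS.indexOf(p.format) === -1) {
    errors.push('unsupported format');
  }
  if (p.sample_rate !== undefined && TTS_SAMPLE_RATES.indexOf(p.sample_rate) === -1) {
    errors.push('unsupported sample_rate');
  }
  if (!withinRange(p.volume, 0, 100)) errors.push('volume must be in [0, 100]');
  if (!withinRange(p.speech_rate, 0.5, 2.0)) errors.push('speech_rate must be in [0.5, 2.0]');
  if (!withinRange(p.pitch_rate, 0.5, 2.0)) errors.push('pitch_rate must be in [0.5, 2.0]');
  return errors;
}
```

The server remains the authority on validity; a check like this only catches obvious mistakes early.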

Non-streaming Response Example (stream=false)

{
  "code": 200,
  "message": "success",
  "data": {
    "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/tts/20251230/xxx.mp3",
    "duration": 3.5,
    "format": "mp3",
    "sample_rate": 22050,
    "characters": 15,
    "bill": "0.0021",
    "record_id": 1
  }
}

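📦 Storage: synthesized audio is uploaded to OSS automatically, under the path audio/tts/{date}/{uuid}.{format}

Streaming Response Example (stream=true)

A streaming response returns a binary audio stream. The Content-Type is audio/mpeg for mp3, or the type that matches the requested format.

Error Response Example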

{
  "code": 403,
  "message": "未配置API密钥，请在用户设置中配置apikey",
  "data": null
}

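1.2 List Speech Synthesis Models

GET /api/audio/tts/models

Returns all available speech synthesis models.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Request Parameters

None

Response Example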

{
  "code": 200,
  "message": "success",
  "data": [
    {
      "model_id": "cosyvoice-v3-flash",
      "model_name": "CosyVoice V3 Flash",
      "description": "平衡效果与成本，性价比高",
      "price_per_10k_chars": "0.14335",
      "features": ["快速合成", "支持SSML", "支持Instruct"]
    },
    {
      "model_id": "cosyvoice-v3-plus",
      "model_name": "CosyVoice V3 Plus",
      "description": "最高质量，最佳表现力",
      "price_per_10k_chars": "0.286706",
      "features": ["高质量", "支持SSML", "支持Instruct"]
    },
    {
      "model_id": "cosyvoice-v2",
      "model_name": "CosyVoice V2",
      "description": "兼容旧版，稳定可靠",
      "price_per_10k_chars": "0.286706",
      "features": ["稳定", "支持SSML"]
    }
  ]
}

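1.3 Long-Text Speech Synthesis

POST /api/audio/tts/synthesize-long

Converts long text to speech by splitting it automatically and synthesizing each part. Use it for text longer than 2000 characters.

How It Works

1. The long text is split into segments at sentence boundaries (。！？； and similar marks).
2. Each segment is at most 2000 characters.
3. The speech synthesis API is called for each segment in order.
4. All audio segments are merged into one audio file, which is returned.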
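
The splitting step described above can be sketched as follows. This is not the backend's actual implementation, which the document does not show. It assumes that the boundary set is exactly 。！？；, that sentences are packed greedily into segments, and that a single sentence longer than the limit is cut at the limit; `splitLongText` is a hypothetical name.

```typescript
// Split text into segments of at most maxLen characters, preferring
// sentence boundaries. Joining the segments reproduces the input.
function splitLongText(text: string, maxLen = 2000): string[] {
  // Each match is one sentence with its trailing end mark(s) attached.
  const sentences = text.match(/[^。！？；]+[。！？；]*|[。！？；]+/g) ?? [];
  const segments: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // A sentence longer than maxLen is cut into maxLen-sized pieces.
    for (let i = 0; i < sentence.length; i += maxLen) {
      const piece = sentence.slice(i, i + maxLen);
      if (current.length + piece.length > maxLen) {
        segments.push(current);
        current = '';
      }
      current += piece;
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}
```

Greedy packing keeps the number of segments, and therefore the number of upstream synthesis calls, low while never exceeding the per-request limit.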

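Request Headers

Parameter	Type	Required	Description
Content-Type	string	Yes	application/json
Authorization	string	Yes	Bearer {token}

Request Body

Parameter	Type	Required	Default	Description
model	string	Yes	-	Speech synthesis model
voice	string	Yes	-	Voice ID
text	string	Yes	-	Text to synthesize; may exceed 2000 characters
format	string	No	mp3	Audio format
volume	int	No	50	Volume [0, 100]
speech_rate	float	No	1.0	Speech rate [0.5, 2.0]
pitch_rate	float	No	1.0	Pitch [0.5, 2.0]

Request Example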

{
  "model": "cosyvoice-v3-flash",
  "voice": "longanyang",
  "text": "这是一段很长的文本...（超过2000字符的内容）",
  "format": "mp3",
  "volume": 50,
  "speech_rate": 1.0
}

Response Example

{
  "code": 200,
  "message": "success",
  "data": {
    "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/tts/20251230/xxx.mp3",
    "duration": 120.5,
    "format": "mp3",
    "total_characters": 5000,
    "segments": 3,
    "bill": "0.0717",
    "record_id": 2
  }
}

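📦 Storage: synthesized audio is uploaded to OSS automatically, under the path audio/tts/{date}/{uuid}.{format}

Response Fields

Field	Type	Description
audio_url	string	URL of the audio file on OSS
duration	float	Total audio duration (seconds)
format	string	Audio format
total_characters	int	Total number of characters
segments	int	Number of segments the text was split into
bill	decimal	Cost of this synthesis (CNY), serialized as a string
record_id	int	ID of the generated record

1.4 Bidirectional Streaming Speech Synthesis (Not Yet Implemented)

WebSocket /api/audio/tts/stream

⚠️ Note: this endpoint is not implemented yet. The definition is reserved for future development.

It is intended to accept text in fragments and return synthesized audio in real time, for real-time conversation scenarios.

Connection Parameters

Parameter	Type	Required	Description
model	string	Yes	Speech synthesis model
voice	string	Yes	Voice ID
format	string	No	Audio format, default pcm
sample_rate	int	No	Sample rate, default 22050

Outgoing Message Format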

{
  "type": "text",
  "content": "待合成的文本片段"
}

To signal the end of input:

{
  "type": "end"
}

Incoming Message Format

Binary audio data, or a JSON status message:

{
  "type": "complete",
  "characters": 100,
  "duration": 5.2
}

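II. Speech Recognition (ASR)

2.0 List Speech Recognition Models

GET /api/audio/asr/models

Returns all available speech recognition models.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Request Parameters

None

Response Example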

{
  "code": 200,
  "message": "success",
  "data": [
    {
      "model_id": "qwen3-asr-flash",
      "model_name": "通义千问3-ASR-Flash",
      "description": "快速识别，支持上下文增强",
      "call_type": "sync",
      "features": ["上下文增强", "情感识别", "多语种"]
    },
    {
      "model_id": "qwen-audio-asr",
      "model_name": "通义千问Audio ASR",
      "description": "通用语音识别",
      "call_type": "sync",
      "features": ["通用识别", "多语种"]
    },
    {
      "model_id": "qwen3-asr-flash-filetrans",
      "model_name": "通义千问3-ASR-Flash-Filetrans",
      "description": "长音频转写，支持多音轨",
      "call_type": "async",
      "features": ["长音频", "多音轨", "时间戳"]
    }
  ]
}

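2.1 Synchronous Speech Recognition

POST /api/audio/asr/recognize

Converts an audio file to text. Suitable for short audio (up to 60 seconds).

Request Headers

Parameter	Type	Required	Description
Content-Type	string	Yes	multipart/form-data or application/json
Authorization	string	Yes	Bearer {token}

Request Body (JSON)

Parameter	Type	Required	Default	Description
model	string	Yes	-	Recognition model: `qwen3-asr-flash`, `qwen-audio-asr`
audio_url	string	Yes*	-	Audio file URL (provide either this or audio_base64)
audio_base64	string	Yes*	-	Base64-encoded audio data (provide either this or audio_url)
language	string	No	-	Language code: zh, en, ja, ko, etc.; detected automatically if omitted
enable_itn	boolean	No	false	Whether to enable inverse text normalization (Chinese and English only)
context	string	No	-	Context hint that improves accuracy in specific scenarios, at most 10000 tokens

Request Body (FormData)

Parameter	Type	Required	Description
model	string	Yes	Recognition model
file	File	Yes	Audio file
language	string	No	Language code
enable_itn	boolean	No	Whether to enable ITN

Request Example (JSON)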

{
  "model": "qwen3-asr-flash",
  "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/input.mp3",
  "language": "zh",
  "enable_itn": true
}

Response Example

{
  "code": 200,
  "message": "success",
  "data": {
    "text": "欢迎使用语音识别服务。",
    "language": "zh",
    "emotion": "neutral",
    "duration": 3,
    "usage": {
      "input_tokens": 0,
      "output_tokens": 8,
      "seconds": 3
    },
    "bill": "0.0012",
    "record_id": 3
  }
}

Error Response Example

{
  "code": 403,
  "message": "未配置API密钥，请在用户设置中配置apikey",
  "data": null
}

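2.2 Asynchronous Speech Recognition (Long Audio)

POST /api/audio/asr/transcribe

Submits a long-audio transcription task. Suitable for audio files longer than 60 seconds.

Request Headers

Parameter	Type	Required	Description
Content-Type	string	Yes	application/json
Authorization	string	Yes	Bearer {token}

Request Body

Parameter	Type	Required	Default	Description
model	string	Yes	-	Recognition model: `qwen3-asr-flash-filetrans`
file_url	string	Yes	-	Audio file URL; must be reachable from the public internet
language	string	No	-	Language code
enable_itn	boolean	No	false	Whether to enable ITN
context	string	No	-	Context hint
channel_id	array	No	[0]	Audio track indices for multi-track files

Request Example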

{
  "model": "qwen3-asr-flash-filetrans",
  "file_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/long-audio.mp3",
  "language": "zh",
  "enable_itn": true
}

Response Example

{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "8fab76d0-0eed-4d20-929f-xxxx",
    "task_status": "PENDING",
    "record_id": 4
  }
}

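2.3 Query Transcription Task Status

GET /api/audio/asr/task/{task_id}

Returns the status and result of an asynchronous transcription task.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Path Parameters

Parameter	Type	Required	Description
task_id	string	Yes	Task ID

Response Example (In Progress)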

{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "8fab76d0-0eed-4d20-929f-xxxx",
    "task_status": "RUNNING",
    "submit_time": "2025-01-01 10:00:00",
    "scheduled_time": "2025-01-01 10:00:01"
  }
}

Response Example (Completed)

{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "8fab76d0-0eed-4d20-929f-xxxx",
    "task_status": "SUCCEEDED",
    "submit_time": "2025-01-01 10:00:00",
    "scheduled_time": "2025-01-01 10:00:01",
    "end_time": "2025-01-01 10:00:05",
    "result": {
      "transcription_url": "https://xxx/result.json",
      "transcripts": [
        {
          "channel_id": 0,
          "text": "今天天气还行吧。",
          "sentences": [
            {
              "begin_time": 100,
              "end_time": 3820,
              "text": "今天天气还行吧。",
              "sentence_id": 0,
              "language": "zh",
              "emotion": "neutral"
            }
          ]
        }
      ]
    },
    "usage": {
      "seconds": 4
    },
    "bill": "0.0016"
  }
}
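
A completed result nests sentences inside per-channel transcripts, so a client that wants subtitle-style output has to flatten it. The sketch below shows one way to do that. It assumes `begin_time` and `end_time` are in milliseconds, which the sample values suggest but the document does not state; `formatMs` and `toTimestampedLines` are hypothetical helper names.

```typescript
// Shapes taken from the completed-task response example above.
interface Sentence {
  begin_time: number;
  end_time: number;
  text: string;
}

interface Transcript {
  channel_id: number;
  text: string;
  sentences: Sentence[];
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = '0' + s;
  return s;
}

// Format a millisecond offset as mm:ss.mmm (assumes the input is in ms).
function formatMs(ms: number): string {
  const minutes = Math.floor(ms / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${zeroPad(minutes, 2)}:${zeroPad(seconds, 2)}.${zeroPad(millis, 3)}`;
}

// One line per sentence, prefixed with channel and time range.
function toTimestampedLines(transcripts: Transcript[]): string[] {
  const lines: string[] = [];
  for (const t of transcripts) {
    for (const s of t.sentences) {
      lines.push(`[ch${t.channel_id} ${formatMs(s.begin_time)}-${formatMs(s.end_time)}] ${s.text}`);
    }
  }
  return lines;
}
```

Passing `data.result.transcripts` from a `SUCCEEDED` response yields one display line per recognized sentence.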

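III. Voice Cloning

3.1 Create a Cloned Voice

POST /api/audio/voice/create

Uploads an audio sample to create a custom cloned voice.

Request Headers

Parameter	Type	Required	Description
Content-Type	string	Yes	multipart/form-data or application/json
Authorization	string	Yes	Bearer {token}

Request Body (JSON)

Parameter	Type	Required	Default	Description
target_model	string	Yes	-	Speech synthesis model that drives the voice: `cosyvoice-v3-plus`, `cosyvoice-v3-flash`, `cosyvoice-v2`
prefix	string	Yes	-	Voice name prefix; digits, letters, and underscores only, at most 10 characters
audio_url	string	Yes*	-	Audio file URL (provide either this or file)
language_hints	array	No	-	Language hints: en, fr, de, ja, ko, ru

Request Body (FormData)

Parameter	Type	Required	Description
target_model	string	Yes	Speech synthesis model that drives the voice
prefix	string	Yes	Voice name prefix
file	File	Yes	Audio file (10-60 seconds, ≤ 10 MB)
language_hints	string	No	Language hints, comma-separated

Audio Requirements

Item	Requirement
Supported formats	WAV (16-bit), MP3, M4A
Duration	10 to 20 seconds recommended, 60 seconds maximum
File size	≤ 10 MB
Sample rate	≥ 16 kHz
Channels	Mono or stereo
Content	At least 5 seconds of continuous, clear speech, with no background music, noise, or other voices

Request Example (JSON)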

{
  "target_model": "cosyvoice-v3-plus",
  "prefix": "myvoice",
  "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/sample.mp3"
}
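
The prefix rule and the audio requirements above can be checked before uploading, so that an unsuitable sample is rejected without a network call. The sketch below is illustrative: `SampleInfo` and `checkVoiceSample` are hypothetical names, the caller is assumed to have measured duration and sample rate already, 10 MB is assumed to mean 10 × 1024 × 1024 bytes, and the 10-second lower bound is taken from the FormData `file` description.

```typescript
// Facts about the sample that the caller has already measured.
interface SampleInfo {
  fileName: string;
  sizeBytes: number;
  durationSec: number;
  sampleRateHz: number;
}

const SAMPLE_EXTENSIONS = ['wav', 'mp3', 'm4a'];
const MAX_SAMPLE_BYTES = 10 * 1024 * 1024; // assumption: binary megabytes

// Returns a list of problems; an empty list means the sample looks acceptable.
function checkVoiceSample(prefix: string, s: SampleInfo): string[] {
  const problems: string[] = [];
  if (!/^[A-Za-z0-9_]{1,10}$/.test(prefix)) {
    problems.push('prefix must be 1 to 10 letters, digits, or underscores');
  }
  const dot = s.fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : s.fileName.slice(dot + 1).toLowerCase();
  if (SAMPLE_EXTENSIONS.indexOf(ext) === -1) problems.push('format must be WAV, MP3, or M4A');
  if (s.sizeBytes > MAX_SAMPLE_BYTES) problems.push('file must be at most 10 MB');
  if (s.durationSec < 10 || s.durationSec > 60) problems.push('duration must be 10 to 60 seconds');
  if (s.sampleRateHz < 16000) problems.push('sample rate must be at least 16 kHz');
  return problems;
}
```

Content requirements such as the absence of background noise cannot be verified this way and are still enforced by the platform's review.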

Response Example

{
  "code": 200,
  "message": "success",
  "data": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
    "status": "DEPLOYING",
    "target_model": "cosyvoice-v3-plus",
    "record_id": 5
  }
}

Error Response Example

{
  "code": 403,
  "message": "未配置API密钥，请在用户设置中配置apikey",
  "data": null
}

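3.2 List Voices

GET /api/audio/voice/list

Returns a paginated list of the cloned voices that have been created. Filtering by model is supported.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Request Parameters

Parameter	Type	Required	Default	Description
prefix	string	No	-	Filter by prefix
page	int	No	0	Page number, starting from 0
page_size	int	No	10	Items per page
model	string	No	-	Filter by target model: `cosyvoice-v3-flash`, `cosyvoice-v3-plus`, `cosyvoice-v2`

Request Examples

List all voices

GET /api/audio/voice/list?page=0&page_size=10

Filter by model

GET /api/audio/voice/list?model=cosyvoice-v3-flash&page=0&page_size=10

Filter by prefix

GET /api/audio/voice/list?prefix=myvoice&page=0&page_size=10

Combined filter (model and prefix)

GET /api/audio/voice/list?model=cosyvoice-v3-plus&prefix=test&page=0&page_size=10

Response Example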

{
  "code": 200,
  "message": "success",
  "data": {
    "total": 2,
    "voices": [
      {
        "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
        "status": "OK",
        "target_model": "cosyvoice-v3-plus",
        "gmt_create": "2025-01-01 10:00:00",
        "gmt_modified": "2025-01-01 10:00:05"
      },
      {
        "voice_id": "cosyvoice-v3-flash-test-yyyyyyyy",
        "status": "DEPLOYING",
        "target_model": "cosyvoice-v3-flash",
        "gmt_create": "2025-01-01 11:00:00",
        "gmt_modified": "2025-01-01 11:00:00"
      }
    ]
  }
}

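Response Fields

Field	Type	Description
total	int	Total number of matching voices
voices	array	Voice list
voices[].voice_id	string	Voice ID
voices[].status	string	Voice status: `DEPLOYING` (under review), `OK` (available), `UNDEPLOYED` (rejected)
voices[].target_model	string	Target model (the model specified when the voice was created)
voices[].gmt_create	string	Creation time
voices[].gmt_modified	string	Last modification time

Usage Notes

Filter by model: pass the model parameter to return only the cloned voices created for that model, which makes it easy to load the matching voice list when the user switches models.
Pagination: use the page and page_size parameters; the default is 10 items per page.
Combined filter: the prefix and model parameters can be used together.

3.3 Get a Voice

GET /api/audio/voice/{voice_id}

Returns the details of the specified voice.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Path Parameters

Parameter	Type	Required	Description
voice_id	string	Yes	Voice ID

Response Example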

{
  "code": 200,
  "message": "success",
  "data": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
    "status": "OK",
    "target_model": "cosyvoice-v3-plus",
    "resource_link": "https://xxx/audio.wav",
    "gmt_create": "2025-01-01 10:00:00",
    "gmt_modified": "2025-01-01 10:00:05"
  }
}

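3.4 Update a Voice

PUT /api/audio/voice/{voice_id}

Updates an existing voice with a new audio file.

Request Headers

Parameter	Type	Required	Description
Content-Type	string	Yes	multipart/form-data or application/json
Authorization	string	Yes	Bearer {token}

Path Parameters

Parameter	Type	Required	Description
voice_id	string	Yes	Voice ID

Request Body

Parameter	Type	Required	Description
audio_url	string	Yes*	URL of the new audio file (provide either this or file)
file	File	Yes*	New audio file (provide either this or audio_url)

Response Example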

{
  "code": 200,
  "message": "success",
  "data": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
    "status": "DEPLOYING"
  }
}

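3.5 Delete a Voice

DELETE /api/audio/voice/{voice_id}

Deletes the specified cloned voice. This operation cannot be undone.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Path Parameters

Parameter	Type	Required	Description
voice_id	string	Yes	Voice ID

Response Example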

{
  "code": 200,
  "message": "success",
  "data": null
}

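IV. System Voices

4.1 List System Voices

GET /api/audio/voice/system

Returns all available preset system voices.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Request Parameters

Parameter	Type	Required	Default	Description
model	string	No	-	Filter by model: cosyvoice-v3-flash, cosyvoice-v3-plus
category	string	No	-	Filter by scenario: 社交陪伴 (social companionship), 童声 (child voice), 客服 (customer service), 语音助手 (voice assistant), 有声书 (audiobook), etc.

Response Example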

{
  "code": 200,
  "message": "success",
  "data": [
    {
      "voice_id": "longanyang",
      "name": "龙安洋",
      "trait": "阳光大男孩",
      "age": "20~30岁",
      "category": "社交陪伴",
      "languages": ["中文（普通话）", "英文"],
      "models": ["cosyvoice-v3-flash", "cosyvoice-v3-plus"],
      "features": {
        "ssml": true,
        "instruct": true,
        "timestamp": false
      }
    },
    {
      "voice_id": "longanhuan",
      "name": "龙安欢",
      "trait": "欢脱元气女",
      "age": "20~30岁",
      "category": "社交陪伴",
      "languages": ["中文（普通话）", "英文"],
      "models": ["cosyvoice-v3-flash", "cosyvoice-v3-plus"],
      "features": {
        "ssml": true,
        "instruct": true,
        "timestamp": false
      }
    },
    {
      "voice_id": "longyingjing_v3",
      "name": "龙应静",
      "trait": "低调冷静女",
      "age": "20~30岁",
      "category": "客服",
      "languages": ["中文（普通话）", "英文"],
      "models": ["cosyvoice-v3-flash", "cosyvoice-v3-plus"],
      "features": {
        "ssml": true,
        "instruct": false,
        "timestamp": true
      }
    }
  ]
}
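
Since the `instruction` parameter of the synthesis endpoint works only with some voices, a client can use the `features` flags in this response to offer only compatible voices. The sketch below filters an already-fetched list; `SystemVoiceLite` and `voicesSupporting` are hypothetical names, and the field names are taken from the response example above.

```typescript
type VoiceFeature = 'ssml' | 'instruct' | 'timestamp';

// Only the fields this filter needs from a system voice entry.
interface SystemVoiceLite {
  voice_id: string;
  models: string[];
  features: Record<VoiceFeature, boolean>;
}

// Voices that can be used with `model` and have `feature` enabled.
function voicesSupporting(
  voices: SystemVoiceLite[],
  model: string,
  feature: VoiceFeature
): SystemVoiceLite[] {
  return voices.filter(v => v.models.indexOf(model) !== -1 && v.features[feature]);
}
```

For the three voices in the example, filtering on `instruct` would keep `longanyang` and `longanhuan` and drop `longyingjing_v3`.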

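V. History

5.1 Get History

GET /api/audio/history

Returns the current user's voice operation history, with pagination and filtering by type.

Request Headers

Parameter	Type	Required	Description
Authorization	string	Yes	Bearer {token}

Request Parameters

Parameter	Type	Required	Default	Description
page	int	No	1	Page number
page_size	int	No	20	Items per page
type	string	No	-	Filter by operation type: tts (speech synthesis), asr (speech recognition), voice (voice cloning)

Request Examples

Get all history records

GET /api/audio/history?page=1&page_size=20

Filter by type

GET /api/audio/history?type=tts&page=1&page_size=10

Response Example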

{
  "code": 200,
  "message": "success",
  "data": {
    "items": [
      {
        "id": 1,
        "type": "tts",
        "model_name": "cosyvoice-v3-flash",
        "input_data": "{\"text\": \"你好，欢迎使用语音合成服务！\", \"voice\": \"longanyang\"}",
        "output_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/tts/20251230/xxx.mp3",
        "duration": 3.5,
        "bill": "0.0021",
        "created_at": "2025-12-30T10:30:00"
      },
      {
        "id": 2,
        "type": "asr",
        "model_name": "qwen3-asr-flash",
        "input_data": "{\"audio_url\": \"https://xxx/input.mp3\"}",
        "output_text": "欢迎使用语音识别服务。",
        "duration": 3,
        "bill": "0.0012",
        "created_at": "2025-12-30T10:25:00"
      },
      {
        "id": 3,
        "type": "voice",
        "model_name": "cosyvoice-v3-plus",
        "input_data": "{\"prefix\": \"myvoice\"}",
        "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
        "status": "OK",
        "bill": "0",
        "created_at": "2025-12-30T10:20:00"
      }
    ],
    "total": 3,
    "page": 1,
    "page_size": 20
  }
}
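
In the example above, `input_data` is a JSON document encoded as a string, so it needs a second parse before its fields can be read. The sketch below does that defensively; `parseInputData` is a hypothetical helper, and falling back to an empty object on malformed input is a design choice of this sketch, not documented API behavior.

```typescript
// Decode the JSON string stored in a history item's input_data field.
// Returns an empty object when the string is not a JSON object.
function parseInputData(item: { input_data: string }): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(item.input_data);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return {};
  } catch {
    return {};
  }
}
```

For the first item in the example, `parseInputData(item).voice` would be `"longanyang"`.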

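Data Field Reference

TTSResponse Fields

Field	Type	Description
audio_url	string	URL of the audio file on OSS
duration	float	Audio duration (seconds)
format	string	Audio format
sample_rate	int	Sample rate
characters	int	Number of characters synthesized
bill	decimal	Cost of this call (CNY)
record_id	int	ID of the generated record

ASRResponse Fields

Field	Type	Description
text	string	Recognized text
language	string	Detected language
emotion	string	Emotion type
duration	int	Audio duration (seconds)
usage	object	Usage information
bill	decimal	Cost of this call (CNY)
record_id	int	ID of the generated record

AudioHistoryItem Fields

Field	Type	Description
id	int	Record ID
type	string	Operation type: tts, asr, voice
model_name	string	Name of the model used
input_data	string	Input data, as a JSON string
output_url	string	Output audio URL (TTS)
output_text	string	Output text (ASR)
voice_id	string	Voice ID (voice cloning)
duration	float	Duration (seconds)
bill	decimal	Cost
created_at	datetime	Creation time

TTSModelInfo Fields

Field	Type	Description
model_id	string	Model ID, used in API calls
model_name	string	Display name of the model
description	string	Model description
price_per_10k_chars	decimal	Price per 10,000 characters (CNY)
features	array	List of supported features

ASRModelInfo Fields

Field	Type	Description
model_id	string	Model ID, used in API calls
model_name	string	Display name of the model
description	string	Model description
call_type	string	Call type: sync (synchronous), async (asynchronous)
features	array	List of supported features

Voice Status (VoiceStatus)

Status	Description
DEPLOYING	Under review; wait after creation
OK	Approved and ready to use
UNDEPLOYED	Rejected and unusable

Task Status (TaskStatus)

Status	Description
PENDING	Queued
RUNNING	Processing
SUCCEEDED	Completed successfully
FAILED	Task failed
UNKNOWN	Task does not exist or its state is unknown

Emotion Types (Emotion)

Value	Description
neutral	Calm
happy	Happy
sad	Sad
angry	Angry
fearful	Fearful
surprised	Surprised
disgusted	Disgusted

Supported Languages (Language)

Code	Language
zh	Chinese (Mandarin)
yue	Cantonese
en	English
ja	Japanese
ko	Korean
de	German
fr	French
ru	Russian
es	Spanish
it	Italian
pt	Portuguese
ar	Arabic
th	Thai
vi	Vietnamese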

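Model Reference

Speech Synthesis Models

Model	Characteristics	Price
cosyvoice-v3-plus	Highest quality, most expressive	¥0.286706 per 10,000 characters
cosyvoice-v3-flash	Balances quality and cost; good value	¥0.14335 per 10,000 characters
cosyvoice-v2	Compatible with the previous generation; stable and reliable	¥0.286706 per 10,000 characters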
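
The prices above allow a rough cost estimate before synthesis. The sketch below assumes billing is linear in the character count and rounded to four decimal places. That assumption reproduces the long-text example in section 1.3 (5000 characters on cosyvoice-v3-flash gives 0.0717), but the document does not state the formula, so the `bill` value returned by the API remains authoritative. `PRICE_PER_10K_CHARS` and `estimateBill` are hypothetical names.

```typescript
// Prices in CNY per 10,000 characters, copied from the table above.
const PRICE_PER_10K_CHARS: Record<string, number> = {
  'cosyvoice-v3-plus': 0.286706,
  'cosyvoice-v3-flash': 0.14335,
  'cosyvoice-v2': 0.286706,
};

// Estimated cost as a 4-decimal string, matching the format of `bill`.
// Assumption: cost = characters / 10000 * price, rounded to 4 decimals.
function estimateBill(model: string, characters: number): string {
  const price = PRICE_PER_10K_CHARS[model];
  if (price === undefined) throw new Error(`unknown model: ${model}`);
  return ((characters / 10000) * price).toFixed(4);
}
```

In a client, the per-model price would more naturally come from `price_per_10k_chars` in the model list response than from a hard-coded table.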

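Speech Recognition Models

Model	Call Type	Characteristics
qwen3-asr-flash	Synchronous	Fast recognition with context enhancement
qwen-audio-asr	Synchronous	General-purpose recognition
qwen3-asr-flash-filetrans	Asynchronous	Long-audio transcription with multi-track support

Error Codes

Error Code	HTTP Status	Description
NO_API_KEY	403	The user has not configured an API key
INVALID_MODEL	400	Invalid model name
INVALID_VOICE	400	Invalid voice ID, or the voice is unavailable
INVALID_AUDIO	400	Unsupported audio format, or audio quality does not meet the requirements
TEXT_TOO_LONG	400	Text exceeds the length limit
AUDIO_TOO_LONG	400	Audio exceeds the duration limit
VOICE_LIMIT_EXCEEDED	400	Voice quota reached (1000 per account)
MODEL_VOICE_MISMATCH	400	Model and voice do not match
UNAUTHORIZED	401	API key missing or invalid
TASK_NOT_FOUND	404	Task not found
VOICE_NOT_FOUND	404	Voice not found
RATE_LIMITED	429	Request rate limit exceeded
UPSTREAM_ERROR	502	The Bailian platform returned an error
TIMEOUT	504	Request timed out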
Frontend Integration Examples

List speech synthesis models

interface TTSModel {
  model_id: string;
  model_name: string;
  description: string;
  price_per_10k_chars: string;
  features: string[];
}

const getTTSModels = async (token: string): Promise<TTSModel[]> => {
  const response = await fetch('/api/audio/tts/models', {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

List speech recognition models

interface ASRModel {
  model_id: string;
  model_name: string;
  description: string;
  call_type: 'sync' | 'async';
  features: string[];
}

const getASRModels = async (token: string): Promise<ASRModel[]> => {
  const response = await fetch('/api/audio/asr/models', {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

Speech synthesis (non-streaming)

interface TTSRequest {
  model: string;
  voice: string;
  text: string;
  stream?: boolean;
  format?: string;
  sample_rate?: number;
  volume?: number;
  speech_rate?: number;
  pitch_rate?: number;
  instruction?: string;
}

interface TTSResponse {
  audio_url: string;
  duration: number;
  format: string;
  sample_rate: number;
  characters: number;
  bill: string;
  record_id: number;
}

const synthesize = async (token: string, request: TTSRequest): Promise<TTSResponse> => {
  const response = await fetch('/api/audio/tts/synthesize', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify(request)
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example
const result = await synthesize(token, {
  model: 'cosyvoice-v3-flash',
  voice: 'longanyang',
  text: '你好，欢迎使用语音合成服务！'
});

// Play the audio directly from the OSS URL
const audio = new Audio(result.audio_url);
audio.play();
console.log('Cost:', result.bill);

Long-text speech synthesis

interface LongTTSResponse {
  audio_url: string;
  duration: number;
  format: string;
  total_characters: number;
  segments: number;
  bill: string;
  record_id: number;
}

const synthesizeLongText = async (token: string, request: TTSRequest): Promise<LongTTSResponse> => {
  const response = await fetch('/api/audio/tts/synthesize-long', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify(request)
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example: synthesize long text
const longText = `这是一段很长的文本...（超过2000字符）`;
const result = await synthesizeLongText(token, {
  model: 'cosyvoice-v3-flash',
  voice: 'longanyang',
  text: longText
});
console.log(`Done: ${result.segments} segments, ${result.duration} s in total, cost ¥${result.bill}`);

Speech synthesis (streaming)

const synthesizeStream = async (
  token: string,
  request: TTSRequest,
  onAudioChunk: (chunk: ArrayBuffer) => void
): Promise<void> => {
  const response = await fetch('/api/audio/tts/synthesize', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify({ ...request, stream: true })
  });

  // Errors carry a non-2xx HTTP status (see the error code table), not audio data
  if (!response.ok) throw new Error(`TTS stream failed: HTTP ${response.status}`);

  const reader = response.body?.getReader();
  if (!reader) throw new Error('No response body');

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Copy the chunk so the ArrayBuffer holds exactly this chunk's bytes
    onAudioChunk(value.slice().buffer);
  }
};

Speech recognition (file upload)

interface ASRResponse {
  text: string;
  language: string;
  emotion: string;
  duration: number;
  usage: { input_tokens: number; output_tokens: number; seconds: number };
  bill: string;
  record_id: number;
}

const recognizeAudio = async (token: string, file: File, model: string): Promise<ASRResponse> => {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('model', model);

  const response = await fetch('/api/audio/asr/recognize', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example
const fileInput = document.querySelector<HTMLInputElement>('#audioFile');
const file = fileInput?.files?.[0];
if (file) {
  const result = await recognizeAudio(token, file, 'qwen3-asr-flash');
  console.log('Recognized text:', result.text);
  console.log('Cost:', result.bill);
}

Create a cloned voice

interface CreateVoiceRequest {
  target_model: string;
  prefix: string;
  audio_url?: string;
}

interface VoiceInfo {
  voice_id: string;
  status: 'DEPLOYING' | 'OK' | 'UNDEPLOYED';
  target_model: string;
  record_id?: number; // returned by the create call only
}

const createVoice = async (token: string, file: File, prefix: string, targetModel: string): Promise<VoiceInfo> => {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('prefix', prefix);
  formData.append('target_model', targetModel);

  const response = await fetch('/api/audio/voice/create', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Poll the voice status until the voice is ready
const waitForVoiceReady = async (token: string, voiceId: string, maxAttempts = 30): Promise<VoiceInfo> => {
  for (let i = 0; i < maxAttempts; i++) {
    const response = await fetch(`/api/audio/voice/${voiceId}`, {
      headers: { 'Authorization': `Bearer ${token}` }
    });
    const data = await response.json();
    // Stop on API errors; data.data is null in that case
    if (data.code !== 200) throw new Error(data.message);

    if (data.data.status === 'OK') return data.data;
    if (data.data.status === 'UNDEPLOYED') throw new Error('Voice was rejected during review');

    await new Promise(resolve => setTimeout(resolve, 10000)); // wait 10 seconds
  }
  throw new Error('Timed out waiting for the voice to become ready');
};

List system voices

interface SystemVoice {
  voice_id: string;
  name: string;
  trait: string;
  age: string;
  category: string;
  languages: string[];
  models: string[];
  features: {
    ssml: boolean;
    instruct: boolean;
    timestamp: boolean;
  };
}

const getSystemVoices = async (token: string, model?: string, category?: string): Promise<SystemVoice[]> => {
  const params = new URLSearchParams();
  if (model) params.append('model', model);
  if (category) params.append('category', category);

  const response = await fetch(`/api/audio/voice/system?${params}`, {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

Get history

interface AudioHistoryItem {
  id: number;
  type: 'tts' | 'asr' | 'voice';
  model_name: string;
  input_data: string;
  output_url?: string;
  output_text?: string;
  voice_id?: string;
  duration?: number;
  bill: string;
  created_at: string;
}

interface HistoryResponse {
  items: AudioHistoryItem[];
  total: number;
  page: number;
  page_size: number;
}

const getAudioHistory = async (
  token: string,
  page: number = 1,
  pageSize: number = 20,
  type?: string
): Promise<HistoryResponse> => {
  const params = new URLSearchParams();
  params.append('page', page.toString());
  params.append('page_size', pageSize.toString());
  if (type) params.append('type', type);

  const response = await fetch(`/api/audio/history?${params}`, {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example
const history = await getAudioHistory(token, 1, 10, 'tts');
console.log('History:', history.items);
console.log('Total:', history.total);

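Notes

Authentication: every endpoint requires the Authorization: Bearer {token} request header.
Model and voice matching: when synthesizing with a cloned voice, the model parameter must equal the target_model used when the voice was created.
Voice quota: each account can create at most 1000 cloned voices; voices that go unused for one year are cleaned up automatically.
Text length limit: a single synthesis request accepts at most 2000 characters; use /api/audio/tts/synthesize-long for longer text.
Audio quality: the quality of a cloned voice depends on the input audio, so make sure the sample is clear and free of noise.
Async task validity: the result URL of an asynchronous transcription task is valid for 24 hours.
OSS storage:
- Synthesized audio files are stored on OSS under audio/tts/{date}/{uuid}.{format}
- Voice cloning audio samples are stored on OSS under audio/voice/{date}/{uuid}.{format}
- The returned audio_url is a publicly accessible OSS URL that can be used directly for playback or download
Billing: each billable API call returns a bill field with the cost of that call; record_id identifies the call in the history.

Online Documentation

After starting the service, open: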

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
