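# AI Voice Backend API Documentation V0.1

## Basic Information

- **Base URL**: `http://localhost:8000`
- **API prefix**: `/api/audio`
- **Authentication**: the user must be logged in; calls to the Bailian (百炼) platform use the API key configured by that user

## Unified Response Format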

```json
{
  "code": 200,
  "message": "success",
  "data": {}
}
```

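| Field | Type | Description |
|------|------|------|
| code | int | Status code: 200 success, 400 invalid parameters, 401 unauthorized, 403 API key not configured, 404 resource not found, 429 rate limited, 500 server error, 502 upstream error, 504 timeout |
| message | string | Response message |
| data | object, array, or null | Response payload; an object or array on success depending on the endpoint, `null` on errors and for endpoints that return no data |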
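
Because every JSON endpoint shares this envelope, a client can unwrap it in one place. The following is a minimal sketch; `ApiResponse`, `ApiError`, and `unwrap` are hypothetical names introduced for illustration and are not part of the API.

```typescript
// Envelope shared by the JSON endpoints described in this document.
interface ApiResponse<T> {
  code: number;
  message: string;
  data: T | null;
}

// Hypothetical error type that keeps the numeric code next to the message.
class ApiError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = 'ApiError';
    this.code = code;
  }
}

// Returns `data` when code is 200 and throws ApiError otherwise.
function unwrap<T>(body: ApiResponse<T>): T | null {
  if (body.code !== 200) {
    throw new ApiError(body.code, body.message);
  }
  return body.data;
}
```

Keeping the code on the error lets callers branch on it, for example to send the user to the settings page on 403.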

---

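## Feature Overview

The AI voice module provides four main features:

| Feature | Description | Models |
|------|------|------|
| Text-to-speech (TTS) | Converts text to speech | cosyvoice-v3-flash, cosyvoice-v3-plus, cosyvoice-v2 |
| Speech recognition (ASR) | Converts speech to text | qwen3-asr-flash, qwen-audio-asr, qwen3-asr-flash-filetrans |
| Voice cloning | Creates a custom voice from an audio sample | voice-enrollment |
| Voice management | Query, update, and delete voices | - |

---

## Endpoints

## 1. Text-to-Speech (TTS)

### 1.1 Speech Synthesis

**POST** `/api/audio/tts/synthesize`

Converts text to speech in either streaming or non-streaming mode.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Content-Type | string | Yes | application/json |
| Authorization | string | Yes | Bearer {token} |

#### Request Body

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| model | string | Yes | - | TTS model: `cosyvoice-v3-flash`, `cosyvoice-v3-plus`, `cosyvoice-v2` |
| voice | string | Yes | - | Voice ID; either a system voice or a cloned voice |
| text | string | Yes | - | Text to synthesize; at most 2000 characters per request |
| stream | boolean | No | false | Whether to stream the output |
| format | string | No | mp3 | Audio format: `mp3`, `wav`, `pcm`, `opus` |
| sample_rate | int | No | 22050 | Sample rate in Hz: 8000, 16000, 22050, 24000, 44100, 48000 |
| volume | int | No | 50 | Volume, range [0, 100] |
| speech_rate | float | No | 1.0 | Speech rate, range [0.5, 2.0] |
| pitch_rate | float | No | 1.0 | Pitch, range [0.5, 2.0] |
| instruction | string | No | - | Instruction (emotion, scenario, etc.); supported only by some voices |

#### Request Example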

```json
{
  "model": "cosyvoice-v3-flash",
  "voice": "longanyang",
  "text": "你好，欢迎使用语音合成服务！",
  "stream": false,
  "format": "mp3",
  "volume": 50,
  "speech_rate": 1.0,
  "pitch_rate": 1.0
}
```
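
The limits in the request body table can be checked on the client before a request is sent, which avoids a round trip for obviously invalid input. This is a sketch only: `TTSParams` and `validateTTSParams` are hypothetical names, and counting the 2000-character limit by Unicode code points is an assumption, since the document does not say how characters are counted.

```typescript
interface TTSParams {
  text: string;
  format?: string;
  sample_rate?: number;
  volume?: number;
  speech_rate?: number;
  pitch_rate?: number;
}

const FORMATS = ['mp3', 'wav', 'pcm', 'opus'];
const SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000];

// True when the value is omitted or lies inside [min, max].
function inRange(v: number | undefined, min: number, max: number): boolean {
  return v === undefined || (v >= min && v <= max);
}

// Returns a list of problems; an empty list means the request looks valid.
function validateTTSParams(p: TTSParams): string[] {
  const errors: string[] = [];
  const length = Array.from(p.text).length; // assumption: count code points
  if (length === 0 || length > 2000) errors.push('text must be 1 to 2000 characters');
  if (p.format !== undefined && FORMATS.indexOf(p.format) === -1) errors.push('unsupported format');
  if (p.sample_rate !== undefined && SAMPLE_RATES.indexOf(p.sample_rate) === -1) errors.push('unsupported sample_rate');
  if (!inRange(p.volume, 0, 100)) errors.push('volume must be in [0, 100]');
  if (!inRange(p.speech_rate, 0.5, 2.0)) errors.push('speech_rate must be in [0.5, 2.0]');
  if (!inRange(p.pitch_rate, 0.5, 2.0)) errors.push('pitch_rate must be in [0.5, 2.0]');
  return errors;
}
```

The server remains the authority on validation; a check like this only improves feedback in the UI.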

#### Non-Streaming Response Example (stream=false)

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/tts/20251230/xxx.mp3",
    "duration": 3.5,
    "format": "mp3",
    "sample_rate": 22050,
    "characters": 15,
    "bill": "0.0021",
    "record_id": 1
  }
}
```

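> 📦 **Storage**: synthesized audio is uploaded to OSS automatically, under the path `audio/tts/{date}/{uuid}.{format}`

#### Streaming Response Example (stream=true)

A streaming response returns a binary audio stream. The Content-Type is `audio/mpeg` for `mp3`, or the type corresponding to the requested format.

#### Error Response Example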

```json
{
  "code": 403,
  "message": "未配置API密钥，请在用户设置中配置apikey",
  "data": null
}
```

---

### 1.2 List TTS Models

**GET** `/api/audio/tts/models`

Returns all available TTS models.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Request Parameters

None

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": [
    {
      "model_id": "cosyvoice-v3-flash",
      "model_name": "CosyVoice V3 Flash",
      "description": "平衡效果与成本，性价比高",
      "price_per_10k_chars": "0.14335",
      "features": ["快速合成", "支持SSML", "支持Instruct"]
    },
    {
      "model_id": "cosyvoice-v3-plus",
      "model_name": "CosyVoice V3 Plus",
      "description": "最高质量，最佳表现力",
      "price_per_10k_chars": "0.286706",
      "features": ["高质量", "支持SSML", "支持Instruct"]
    },
    {
      "model_id": "cosyvoice-v2",
      "model_name": "CosyVoice V2",
      "description": "兼容旧版，稳定可靠",
      "price_per_10k_chars": "0.286706",
      "features": ["稳定", "支持SSML"]
    }
  ]
}
```
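
The `price_per_10k_chars` field makes it possible to show a rough cost estimate before synthesis. The sketch below assumes that the bill equals `characters × price_per_10k_chars / 10000` rounded to four decimals; the document does not state the formula, but the long-text example in section 1.3 (5000 characters with `cosyvoice-v3-flash`, bill `0.0717`) is consistent with it. `estimateTTSBill` is a hypothetical helper, and the `bill` returned by the server remains authoritative.

```typescript
// Estimates the cost in CNY as a string with four decimals.
// Assumption: cost scales linearly with the billed character count.
function estimateTTSBill(characters: number, pricePer10kChars: string): string {
  const price = Number(pricePer10kChars);
  if (!Number.isFinite(price) || characters < 0) {
    throw new RangeError('invalid characters or price');
  }
  return ((characters * price) / 10000).toFixed(4);
}
```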

---

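### 1.3 Long-Text Speech Synthesis

**POST** `/api/audio/tts/synthesize-long`

Converts long text to speech by splitting it automatically and synthesizing each part. Intended for text longer than 2000 characters.

#### How It Works

1. The long text is split into segments at sentence boundaries (。！？； and similar punctuation)
2. Each segment is at most 2000 characters
3. The synthesis endpoint is called for each segment in order
4. All audio segments are merged into one complete audio file, which is returned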
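
Steps 1 and 2 can be sketched as a greedy splitter: take up to the maximum length, then back up to the last sentence terminator inside that window. This illustrates the idea and is not the server's actual implementation; `splitText`, the exact terminator set, and counting by Unicode code points are assumptions.

```typescript
// Sentence terminators used as preferred cut points (assumed set).
const SENTENCE_END = /[。！？；!?;\n]/;

// Splits text into segments of at most maxLen characters, cutting after a
// sentence terminator when one exists in the window, else cutting hard.
function splitText(text: string, maxLen: number = 2000): string[] {
  if (maxLen < 1) throw new RangeError('maxLen must be at least 1');
  const chars = Array.from(text); // count by code point
  const segments: string[] = [];
  let start = 0;
  while (start < chars.length) {
    let end = Math.min(start + maxLen, chars.length);
    if (end < chars.length) {
      let cut = end;
      while (cut > start && !SENTENCE_END.test(chars[cut - 1])) cut--;
      if (cut > start) end = cut; // otherwise keep the hard cut at maxLen
    }
    segments.push(chars.slice(start, end).join(''));
    start = end;
  }
  return segments;
}
```

Joining the segments always reproduces the original text, so no characters are lost or billed twice.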

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Content-Type | string | Yes | application/json |
| Authorization | string | Yes | Bearer {token} |

#### Request Body

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| model | string | Yes | - | TTS model |
| voice | string | Yes | - | Voice ID |
| text | string | Yes | - | Text to synthesize; may exceed 2000 characters |
| format | string | No | mp3 | Audio format |
| volume | int | No | 50 | Volume [0, 100] |
| speech_rate | float | No | 1.0 | Speech rate [0.5, 2.0] |
| pitch_rate | float | No | 1.0 | Pitch [0.5, 2.0] |

#### Request Example

```json
{
  "model": "cosyvoice-v3-flash",
  "voice": "longanyang",
  "text": "这是一段很长的文本...（超过2000字符的内容）",
  "format": "mp3",
  "volume": 50,
  "speech_rate": 1.0
}
```

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/tts/20251230/xxx.mp3",
    "duration": 120.5,
    "format": "mp3",
    "total_characters": 5000,
    "segments": 3,
    "bill": "0.0717",
    "record_id": 2
  }
}
```

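> 📦 **Storage**: synthesized audio is uploaded to OSS automatically, under the path `audio/tts/{date}/{uuid}.{format}`

#### Response Fields

| Field | Type | Description |
|------|------|------|
| audio_url | string | URL of the audio file on OSS |
| duration | float | Total audio duration (seconds) |
| format | string | Audio format |
| total_characters | int | Total character count |
| segments | int | Number of segments the text was split into |
| bill | decimal | Cost of this synthesis (CNY), serialized as a string |
| record_id | int | ID of the generated record |

---

### 1.4 Bidirectional Streaming Speech Synthesis (not yet implemented)

**WebSocket** `/api/audio/tts/stream`

> ⚠️ **Note**: this endpoint is not implemented yet; the definition is reserved for future development.

Text can be sent in segments while synthesized audio is received in real time, which suits real-time conversation scenarios.

#### Connection Parameters

| Parameter | Type | Required | Description |
|------|------|------|------|
| model | string | Yes | TTS model |
| voice | string | Yes | Voice ID |
| format | string | No | Audio format, default pcm |
| sample_rate | int | No | Sample rate, default 22050 |

#### Outgoing Message Format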

```json
{
  "type": "text",
  "content": "待合成的文本片段"
}
```

To finish sending:

```json
{
  "type": "end"
}
```

#### Incoming Message Format

Binary audio data, or a JSON status message:

```json
{
  "type": "complete",
  "characters": 100,
  "duration": 5.2
}
```

---

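## 2. Speech Recognition (ASR)

### 2.0 List ASR Models

**GET** `/api/audio/asr/models`

Returns all available ASR models.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Request Parameters

None

#### Response Example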

```json
{
  "code": 200,
  "message": "success",
  "data": [
    {
      "model_id": "qwen3-asr-flash",
      "model_name": "通义千问3-ASR-Flash",
      "description": "快速识别，支持上下文增强",
      "call_type": "sync",
      "features": ["上下文增强", "情感识别", "多语种"]
    },
    {
      "model_id": "qwen-audio-asr",
      "model_name": "通义千问Audio ASR",
      "description": "通用语音识别",
      "call_type": "sync",
      "features": ["通用识别", "多语种"]
    },
    {
      "model_id": "qwen3-asr-flash-filetrans",
      "model_name": "通义千问3-ASR-Flash-Filetrans",
      "description": "长音频转写，支持多音轨",
      "call_type": "async",
      "features": ["长音频", "多音轨", "时间戳"]
    }
  ]
}
```

---

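### 2.1 Synchronous Speech Recognition

**POST** `/api/audio/asr/recognize`

Transcribes an audio file to text. Intended for short audio (up to 60 seconds).

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Content-Type | string | Yes | multipart/form-data or application/json |
| Authorization | string | Yes | Bearer {token} |

#### Request Body (JSON)

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| model | string | Yes | - | Recognition model: `qwen3-asr-flash`, `qwen-audio-asr` |
| audio_url | string | Yes* | - | Audio file URL (provide either audio_url or audio_base64) |
| audio_base64 | string | Yes* | - | Base64-encoded audio data (provide either audio_url or audio_base64) |
| language | string | No | - | Language code such as zh, en, ja, ko; detected automatically if omitted |
| enable_itn | boolean | No | false | Whether to enable inverse text normalization (Chinese and English only) |
| context | string | No | - | Context hint that improves accuracy in specific scenarios; at most 10000 tokens |

#### Request Body (FormData)

| Parameter | Type | Required | Description |
|------|------|------|------|
| model | string | Yes | Recognition model |
| file | File | Yes | Audio file |
| language | string | No | Language code |
| enable_itn | boolean | No | Whether to enable ITN |

#### Request Example (JSON)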

```json
{
  "model": "qwen3-asr-flash",
  "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/input.mp3",
  "language": "zh",
  "enable_itn": true
}
```
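
Since the JSON body takes either `audio_url` or `audio_base64`, a small builder can make that choice explicit in the type system. This is a sketch; `AudioSource` and `buildASRRequest` are hypothetical names, not part of the API.

```typescript
interface ASRJsonRequest {
  model: string;
  audio_url?: string;
  audio_base64?: string;
  language?: string;
  enable_itn?: boolean;
  context?: string;
}

// The union forces callers to pick exactly one audio source.
type AudioSource = { url: string } | { base64: string };

interface ASROptions {
  language?: string;
  enable_itn?: boolean;
  context?: string;
}

// Builds the JSON body with exactly one of audio_url / audio_base64 set.
function buildASRRequest(model: string, source: AudioSource, options: ASROptions = {}): ASRJsonRequest {
  const body: ASRJsonRequest = { model, ...options };
  if ('url' in source) {
    body.audio_url = source.url;
  } else {
    body.audio_base64 = source.base64;
  }
  return body;
}
```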

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "text": "欢迎使用语音识别服务。",
    "language": "zh",
    "emotion": "neutral",
    "duration": 3,
    "usage": {
      "input_tokens": 0,
      "output_tokens": 8,
      "seconds": 3
    },
    "bill": "0.0012",
    "record_id": 3
  }
}
```

#### Error Response Example

```json
{
  "code": 403,
  "message": "未配置API密钥，请在用户设置中配置apikey",
  "data": null
}
```

---

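### 2.2 Asynchronous Speech Recognition (Long Audio)

**POST** `/api/audio/asr/transcribe`

Submits a long-audio transcription task. Intended for audio files longer than 60 seconds.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Content-Type | string | Yes | application/json |
| Authorization | string | Yes | Bearer {token} |

#### Request Body

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| model | string | Yes | - | Recognition model: `qwen3-asr-flash-filetrans` |
| file_url | string | Yes | - | Audio file URL; must be publicly accessible |
| language | string | No | - | Language code |
| enable_itn | boolean | No | false | Whether to enable ITN |
| context | string | No | - | Context hint |
| channel_id | array | No | [0] | Track indexes to transcribe in a multi-track file |

#### Request Example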

```json
{
  "model": "qwen3-asr-flash-filetrans",
  "file_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/long-audio.mp3",
  "language": "zh",
  "enable_itn": true
}
```

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "8fab76d0-0eed-4d20-929f-xxxx",
    "task_status": "PENDING",
    "record_id": 4
  }
}
```

---

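### 2.3 Query Transcription Task Status

**GET** `/api/audio/asr/task/{task_id}`

Returns the execution status and result of an asynchronous transcription task.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Path Parameters

| Parameter | Type | Required | Description |
|------|------|------|------|
| task_id | string | Yes | Task ID |

#### Response Example (in progress)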

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "8fab76d0-0eed-4d20-929f-xxxx",
    "task_status": "RUNNING",
    "submit_time": "2025-01-01 10:00:00",
    "scheduled_time": "2025-01-01 10:00:01"
  }
}
```

#### Response Example (completed)

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "8fab76d0-0eed-4d20-929f-xxxx",
    "task_status": "SUCCEEDED",
    "submit_time": "2025-01-01 10:00:00",
    "scheduled_time": "2025-01-01 10:00:01",
    "end_time": "2025-01-01 10:00:05",
    "result": {
      "transcription_url": "https://xxx/result.json",
      "transcripts": [
        {
          "channel_id": 0,
          "text": "今天天气还行吧。",
          "sentences": [
            {
              "begin_time": 100,
              "end_time": 3820,
              "text": "今天天气还行吧。",
              "sentence_id": 0,
              "language": "zh",
              "emotion": "neutral"
            }
          ]
        }
      ]
    },
    "usage": {
      "seconds": 4
    },
    "bill": "0.0016"
  }
}
```
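
The `sentences` array in a completed result carries per-sentence timing, which is convenient for rendering a timed transcript. The sketch below assumes `begin_time` and `end_time` are in milliseconds (the document does not state the unit, but the sample values fit a roughly four-second clip); `formatTimestamp` and `toTimedLines` are hypothetical helpers.

```typescript
interface Sentence {
  begin_time: number; // assumed to be milliseconds
  end_time: number;   // assumed to be milliseconds
  text: string;
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = '0' + s;
  return s;
}

// Formats milliseconds as mm:ss.SSS.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const millis = Math.floor(ms % 1000);
  return `${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`;
}

// Produces one "[start - end] text" line per sentence.
function toTimedLines(sentences: Sentence[]): string[] {
  return sentences.map(
    s => `[${formatTimestamp(s.begin_time)} - ${formatTimestamp(s.end_time)}] ${s.text}`
  );
}
```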

---

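## 3. Voice Cloning

### 3.1 Create a Cloned Voice

**POST** `/api/audio/voice/create`

Uploads an audio sample to create a custom cloned voice.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Content-Type | string | Yes | multipart/form-data or application/json |
| Authorization | string | Yes | Bearer {token} |

#### Request Body (JSON)

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| target_model | string | Yes | - | TTS model that will drive the voice: `cosyvoice-v3-plus`, `cosyvoice-v3-flash`, `cosyvoice-v2` |
| prefix | string | Yes | - | Voice name prefix; digits, letters, and underscores only, at most 10 characters |
| audio_url | string | Yes* | - | Audio file URL (provide either audio_url or file) |
| language_hints | array | No | - | Language hints: en, fr, de, ja, ko, ru |

#### Request Body (FormData)

| Parameter | Type | Required | Description |
|------|------|------|------|
| target_model | string | Yes | TTS model that will drive the voice |
| prefix | string | Yes | Voice name prefix |
| file | File | Yes | Audio file (10-60 seconds, ≤10MB) |
| language_hints | string | No | Language hints, comma-separated |

#### Audio Requirements

| Item | Requirement |
|------|------|
| Supported formats | WAV (16bit), MP3, M4A |
| Duration | 10~20 seconds recommended, 60 seconds maximum |
| File size | ≤ 10 MB |
| Sample rate | ≥ 16 kHz |
| Channels | Mono / stereo |
| Content | At least 5 seconds of continuous, clear reading, with no background music, noise, or other voices |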
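
Most of these requirements can be checked in the browser before upload. The sketch below is illustrative: `VoiceSample` and `validateVoiceSample` are hypothetical names, the caller is assumed to have measured the duration and sample rate already, 10 MB is taken as 10 × 1024 × 1024 bytes, and the 10 to 60 second window follows the FormData table above.

```typescript
interface VoiceSample {
  fileName: string;
  sizeBytes: number;
  durationSeconds: number; // measured by the caller
  sampleRateHz: number;    // measured by the caller
}

const PREFIX_PATTERN = /^[A-Za-z0-9_]{1,10}$/;
const ALLOWED_EXTENSIONS = ['wav', 'mp3', 'm4a'];
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // assumption: binary megabytes

// Returns a list of problems; an empty list means the sample looks acceptable.
function validateVoiceSample(prefix: string, sample: VoiceSample): string[] {
  const errors: string[] = [];
  if (!PREFIX_PATTERN.test(prefix)) errors.push('prefix must be 1-10 letters, digits, or underscores');
  const ext = (sample.fileName.split('.').pop() || '').toLowerCase();
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) errors.push('file must be WAV, MP3, or M4A');
  if (sample.sizeBytes > MAX_SIZE_BYTES) errors.push('file must be at most 10 MB');
  if (sample.durationSeconds < 10 || sample.durationSeconds > 60) errors.push('duration must be 10-60 seconds');
  if (sample.sampleRateHz < 16000) errors.push('sample rate must be at least 16 kHz');
  return errors;
}
```

Content quality (clear reading, no background noise) cannot be verified this way and is still judged by the upstream review.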

#### Request Example (JSON)

```json
{
  "target_model": "cosyvoice-v3-plus",
  "prefix": "myvoice",
  "audio_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/sample.mp3"
}
```

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
    "status": "DEPLOYING",
    "target_model": "cosyvoice-v3-plus",
    "record_id": 5
  }
}
```

#### Error Response Example

```json
{
  "code": 403,
  "message": "未配置API密钥，请在用户设置中配置apikey",
  "data": null
}
```

---

### 3.2 List Voices

**GET** `/api/audio/voice/list`

Returns a paginated list of the cloned voices that have been created, optionally filtered by model or prefix.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Request Parameters

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| prefix | string | No | - | Filter by prefix |
| page | int | No | 0 | Page number, starting from 0 (note that `/api/audio/history` defaults to page 1) |
| page_size | int | No | 10 | Items per page |
| model | string | No | - | Filter by target model: `cosyvoice-v3-flash`, `cosyvoice-v3-plus`, `cosyvoice-v2` |

#### Request Examples

**All voices**
```
GET /api/audio/voice/list?page=0&page_size=10
```

**Filter by model**
```
GET /api/audio/voice/list?model=cosyvoice-v3-flash&page=0&page_size=10
```

**Filter by prefix**
```
GET /api/audio/voice/list?prefix=myvoice&page=0&page_size=10
```

**Combined filter (model and prefix)**
```
GET /api/audio/voice/list?model=cosyvoice-v3-plus&prefix=test&page=0&page_size=10
```

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "total": 2,
    "voices": [
      {
        "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
        "status": "OK",
        "target_model": "cosyvoice-v3-plus",
        "gmt_create": "2025-01-01 10:00:00",
        "gmt_modified": "2025-01-01 10:00:05"
      },
      {
        "voice_id": "cosyvoice-v3-flash-test-yyyyyyyy",
        "status": "DEPLOYING",
        "target_model": "cosyvoice-v3-flash",
        "gmt_create": "2025-01-01 11:00:00",
        "gmt_modified": "2025-01-01 11:00:00"
      }
    ]
  }
}
```

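#### Response Fields

| Field | Type | Description |
|------|------|------|
| total | int | Total number of voices matching the filters |
| voices | array | List of voices |
| voices[].voice_id | string | Voice ID |
| voices[].status | string | Voice status: `DEPLOYING` (under review), `OK` (available), `UNDEPLOYED` (rejected) |
| voices[].target_model | string | Target model (the model specified when the voice was created) |
| voices[].gmt_create | string | Creation time |
| voices[].gmt_modified | string | Last modification time |

#### Usage Notes

1. **Filter by model**: pass `model` to fetch only the cloned voices created for that model, which makes it easy to reload the matching voice list when the user switches models
2. **Pagination**: use `page` and `page_size` to paginate; the default is 10 items per page
3. **Combined filter**: `prefix` and `model` can be used together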
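
The filters and zero-based pagination described above can be combined in a small path builder. This is a sketch; `VoiceListFilter` and `voiceListPath` are hypothetical names, and the defaults mirror the request parameter table.

```typescript
interface VoiceListFilter {
  model?: string;
  prefix?: string;
  page?: number;      // zero-based
  page_size?: number;
}

// Builds the request path, omitting filters that are not set.
function voiceListPath(filter: VoiceListFilter = {}): string {
  const page = filter.page === undefined ? 0 : filter.page;
  const pageSize = filter.page_size === undefined ? 10 : filter.page_size;
  const pairs: [string, string][] = [];
  if (filter.model) pairs.push(['model', filter.model]);
  if (filter.prefix) pairs.push(['prefix', filter.prefix]);
  pairs.push(['page', String(page)], ['page_size', String(pageSize)]);
  const query = pairs.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join('&');
  return `/api/audio/voice/list?${query}`;
}
```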

---

### 3.3 Get a Voice

**GET** `/api/audio/voice/{voice_id}`

Returns the details of the specified voice.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Path Parameters

| Parameter | Type | Required | Description |
|------|------|------|------|
| voice_id | string | Yes | Voice ID |

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
    "status": "OK",
    "target_model": "cosyvoice-v3-plus",
    "resource_link": "https://xxx/audio.wav",
    "gmt_create": "2025-01-01 10:00:00",
    "gmt_modified": "2025-01-01 10:00:05"
  }
}
```

---

### 3.4 Update a Voice

**PUT** `/api/audio/voice/{voice_id}`

Updates an existing voice with a new audio file.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Content-Type | string | Yes | multipart/form-data or application/json |
| Authorization | string | Yes | Bearer {token} |

#### Path Parameters

| Parameter | Type | Required | Description |
|------|------|------|------|
| voice_id | string | Yes | Voice ID |

#### Request Body

| Parameter | Type | Required | Description |
|------|------|------|------|
| audio_url | string | Yes* | URL of the new audio file (provide either audio_url or file) |
| file | File | Yes* | New audio file (provide either audio_url or file) |

#### Response Example

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
    "status": "DEPLOYING"
  }
}
```

---

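### 3.5 Delete a Voice

**DELETE** `/api/audio/voice/{voice_id}`

Deletes the specified cloned voice. This operation cannot be undone.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Path Parameters

| Parameter | Type | Required | Description |
|------|------|------|------|
| voice_id | string | Yes | Voice ID |

#### Response Example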

```json
{
  "code": 200,
  "message": "success",
  "data": null
}
```

---

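## 4. System Voices

### 4.1 List System Voices

**GET** `/api/audio/voice/system`

Returns all available preset system voices.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Request Parameters

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| model | string | No | - | Filter by model: cosyvoice-v3-flash, cosyvoice-v3-plus |
| category | string | No | - | Filter by scenario: 社交陪伴 (social companionship), 童声 (child voice), 客服 (customer service), 语音助手 (voice assistant), 有声书 (audiobook), etc. |

#### Response Example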

```json
{
  "code": 200,
  "message": "success",
  "data": [
    {
      "voice_id": "longanyang",
      "name": "龙安洋",
      "trait": "阳光大男孩",
      "age": "20~30岁",
      "category": "社交陪伴",
      "languages": ["中文（普通话）", "英文"],
      "models": ["cosyvoice-v3-flash", "cosyvoice-v3-plus"],
      "features": {
        "ssml": true,
        "instruct": true,
        "timestamp": false
      }
    },
    {
      "voice_id": "longanhuan",
      "name": "龙安欢",
      "trait": "欢脱元气女",
      "age": "20~30岁",
      "category": "社交陪伴",
      "languages": ["中文（普通话）", "英文"],
      "models": ["cosyvoice-v3-flash", "cosyvoice-v3-plus"],
      "features": {
        "ssml": true,
        "instruct": true,
        "timestamp": false
      }
    },
    {
      "voice_id": "longyingjing_v3",
      "name": "龙应静",
      "trait": "低调冷静女",
      "age": "20~30岁",
      "category": "客服",
      "languages": ["中文（普通话）", "英文"],
      "models": ["cosyvoice-v3-flash", "cosyvoice-v3-plus"],
      "features": {
        "ssml": true,
        "instruct": false,
        "timestamp": true
      }
    }
  ]
}
```
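
The TTS `instruction` parameter is supported only by some voices, so a UI may want to offer it only where it applies. The sketch below assumes that `features.instruct` in this response is the flag that indicates such support; `SystemVoiceLite` and `voicesSupportingInstruct` are hypothetical names.

```typescript
// Subset of the system voice fields needed for filtering.
interface SystemVoiceLite {
  voice_id: string;
  models: string[];
  features: { ssml: boolean; instruct: boolean; timestamp: boolean };
}

// Keeps voices that list the given model and have features.instruct set.
function voicesSupportingInstruct(voices: SystemVoiceLite[], model: string): SystemVoiceLite[] {
  return voices.filter(v => v.models.indexOf(model) !== -1 && v.features.instruct);
}
```

Applied to the three voices in the example response with `cosyvoice-v3-flash`, this keeps `longanyang` and `longanhuan` and drops `longyingjing_v3`.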

---

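## 5. History

### 5.1 Get History

**GET** `/api/audio/history`

Returns the current user's voice operation history, with pagination and filtering by type.

#### Request Headers

| Parameter | Type | Required | Description |
|------|------|------|------|
| Authorization | string | Yes | Bearer {token} |

#### Request Parameters

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| page | int | No | 1 | Page number |
| page_size | int | No | 20 | Items per page |
| type | string | No | - | Filter by operation type: tts (speech synthesis), asr (speech recognition), voice (voice cloning) |

#### Request Examples

**All history records**
```
GET /api/audio/history?page=1&page_size=20
```

**Filter by type**
```
GET /api/audio/history?type=tts&page=1&page_size=10
```

#### Response Example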

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "items": [
      {
        "id": 1,
        "type": "tts",
        "model_name": "cosyvoice-v3-flash",
        "input_data": "{\"text\": \"你好，欢迎使用语音合成服务！\", \"voice\": \"longanyang\"}",
        "output_url": "https://your-bucket.oss-cn-beijing.aliyuncs.com/audio/tts/20251230/xxx.mp3",
        "duration": 3.5,
        "bill": "0.0021",
        "created_at": "2025-12-30T10:30:00"
      },
      {
        "id": 2,
        "type": "asr",
        "model_name": "qwen3-asr-flash",
        "input_data": "{\"audio_url\": \"https://xxx/input.mp3\"}",
        "output_text": "欢迎使用语音识别服务。",
        "duration": 3,
        "bill": "0.0012",
        "created_at": "2025-12-30T10:25:00"
      },
      {
        "id": 3,
        "type": "voice",
        "model_name": "cosyvoice-v3-plus",
        "input_data": "{\"prefix\": \"myvoice\"}",
        "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxxxx",
        "status": "OK",
        "bill": "0",
        "created_at": "2025-12-30T10:20:00"
      }
    ],
    "total": 3,
    "page": 1,
    "page_size": 20
  }
}
```

---

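## Data Field Reference

### TTSResponse Fields

| Field | Type | Description |
|------|------|------|
| audio_url | string | URL of the audio file on OSS |
| duration | float | Audio duration (seconds) |
| format | string | Audio format |
| sample_rate | int | Sample rate (Hz) |
| characters | int | Number of characters synthesized |
| bill | decimal | Cost of this call (CNY), serialized as a string |
| record_id | int | ID of the generated record |

### ASRResponse Fields

| Field | Type | Description |
|------|------|------|
| text | string | Recognized text |
| language | string | Detected language |
| emotion | string | Emotion type |
| duration | int | Audio duration (seconds) |
| usage | object | Usage information |
| bill | decimal | Cost of this call (CNY), serialized as a string |
| record_id | int | ID of the generated record |

### AudioHistoryItem Fields

| Field | Type | Description |
|------|------|------|
| id | int | Record ID |
| type | string | Operation type: tts, asr, voice |
| model_name | string | Name of the model used |
| input_data | string | Input data as a JSON string |
| output_url | string | Output audio URL (TTS) |
| output_text | string | Output text (ASR) |
| voice_id | string | Voice ID (voice cloning) |
| duration | float | Duration (seconds) |
| bill | decimal | Cost (CNY), serialized as a string |
| created_at | datetime | Creation time |

### TTSModelInfo Fields

| Field | Type | Description |
|------|------|------|
| model_id | string | Model ID, used in API calls |
| model_name | string | Display name of the model |
| description | string | Model description |
| price_per_10k_chars | decimal | Price per 10,000 characters (CNY), serialized as a string |
| features | array | List of supported features |

### ASRModelInfo Fields

| Field | Type | Description |
|------|------|------|
| model_id | string | Model ID, used in API calls |
| model_name | string | Display name of the model |
| description | string | Model description |
| call_type | string | Call type: sync (synchronous), async (asynchronous) |
| features | array | List of supported features |

### Voice Status (VoiceStatus)

| Status | Description |
|------|------|
| DEPLOYING | Under review; the voice is not usable until review completes |
| OK | Review passed; ready to use |
| UNDEPLOYED | Review failed; cannot be used |

### Task Status (TaskStatus)

| Status | Description |
|------|------|
| PENDING | Queued |
| RUNNING | In progress |
| SUCCEEDED | Completed successfully |
| FAILED | Task failed |
| UNKNOWN | Task does not exist or its status is unknown |

### Emotion Types (Emotion)

| Value | Description |
|------|------|
| neutral | Calm |
| happy | Happy |
| sad | Sad |
| angry | Angry |
| fearful | Fearful |
| surprised | Surprised |
| disgusted | Disgusted |

### Supported Languages (Language)

| Code | Language |
|------|------|
| zh | Chinese (Mandarin) |
| yue | Cantonese |
| en | English |
| ja | Japanese |
| ko | Korean |
| de | German |
| fr | French |
| ru | Russian |
| es | Spanish |
| it | Italian |
| pt | Portuguese |
| ar | Arabic |
| th | Thai |
| vi | Vietnamese |

---

## Models

### TTS Models

| Model | Characteristics | Price |
|------|------|------|
| cosyvoice-v3-plus | Highest quality, most expressive | ¥0.286706 per 10,000 characters |
| cosyvoice-v3-flash | Balances quality and cost; good value | ¥0.14335 per 10,000 characters |
| cosyvoice-v2 | Compatible with the previous version; stable and reliable | ¥0.286706 per 10,000 characters |

### ASR Models

| Model | Call type | Characteristics |
|------|---------|------|
| qwen3-asr-flash | Synchronous | Fast recognition with context enhancement |
| qwen-audio-asr | Synchronous | General-purpose recognition |
| qwen3-asr-flash-filetrans | Asynchronous | Long-audio transcription with multi-track support |

---

## Error Codes

| Error code | HTTP status | Description |
|--------|-----------|------|
| NO_API_KEY | 403 | The user has not configured an API key |
| INVALID_MODEL | 400 | Invalid model name |
| INVALID_VOICE | 400 | Invalid voice ID, or the voice is unavailable |
| INVALID_AUDIO | 400 | Audio format not supported, or quality does not meet the requirements |
| TEXT_TOO_LONG | 400 | Text length exceeds the limit |
| AUDIO_TOO_LONG | 400 | Audio duration exceeds the limit |
| VOICE_LIMIT_EXCEEDED | 400 | Voice quota reached (1000 per account) |
| MODEL_VOICE_MISMATCH | 400 | Model and voice do not match |
| UNAUTHORIZED | 401 | API key missing or invalid |
| TASK_NOT_FOUND | 404 | Task does not exist |
| VOICE_NOT_FOUND | 404 | Voice does not exist |
| RATE_LIMITED | 429 | Request rate limit exceeded |
| UPSTREAM_ERROR | 502 | The Bailian platform returned an error |
| TIMEOUT | 504 | Request timed out |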
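
Some of these errors are transient, and a client may choose to retry them with a delay. The sketch below treats 429, 502, and 504 as retryable, which is a judgment call rather than documented API behavior; `isRetryable` and `backoffDelayMs` are hypothetical helpers, and the delays are arbitrary defaults.

```typescript
// Statuses treated as transient (assumption): RATE_LIMITED, UPSTREAM_ERROR, TIMEOUT.
const RETRYABLE_STATUS = [429, 502, 504];

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUS.indexOf(status) !== -1;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

Errors in the 400 range other than 429 describe a problem with the request itself, so repeating the same request is unlikely to help.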

---

## Frontend Integration Examples

### List TTS Models

```typescript
interface TTSModel {
  model_id: string;
  model_name: string;
  description: string;
  price_per_10k_chars: string;
  features: string[];
}

const getTTSModels = async (token: string): Promise<TTSModel[]> => {
  const response = await fetch('/api/audio/tts/models', {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};
```

### List ASR Models

```typescript
interface ASRModel {
  model_id: string;
  model_name: string;
  description: string;
  call_type: 'sync' | 'async';
  features: string[];
}

const getASRModels = async (token: string): Promise<ASRModel[]> => {
  const response = await fetch('/api/audio/asr/models', {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};
```

### Speech Synthesis (non-streaming)

```typescript
interface TTSRequest {
  model: string;
  voice: string;
  text: string;
  stream?: boolean;
  format?: string;
  volume?: number;
  speech_rate?: number;
  pitch_rate?: number;
}

interface TTSResponse {
  audio_url: string;
  duration: number;
  format: string;
  sample_rate: number;
  characters: number;
  bill: string;
  record_id: number;
}

const synthesize = async (token: string, request: TTSRequest): Promise<TTSResponse> => {
  const response = await fetch('/api/audio/tts/synthesize', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify(request)
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example
const result = await synthesize(token, {
  model: 'cosyvoice-v3-flash',
  voice: 'longanyang',
  text: '你好，欢迎使用语音合成服务！'
});

// Play the audio (the OSS URL can be used directly)
const audio = new Audio(result.audio_url);
audio.play();
console.log('Cost:', result.bill);
```

### Long-Text Speech Synthesis

```typescript
interface LongTTSResponse {
  audio_url: string;
  duration: number;
  format: string;
  total_characters: number;
  segments: number;
  bill: string;
  record_id: number;
}

const synthesizeLongText = async (token: string, request: TTSRequest): Promise<LongTTSResponse> => {
  const response = await fetch('/api/audio/tts/synthesize-long', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify(request)
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example: synthesize long text
const longText = `这是一段很长的文本...（超过2000字符）`;
const result = await synthesizeLongText(token, {
  model: 'cosyvoice-v3-flash',
  voice: 'longanyang',
  text: longText
});
console.log(`Done: ${result.segments} segments, ${result.duration} s in total, cost ${result.bill} CNY`);
```

### Speech Synthesis (streaming)

```typescript
const synthesizeStream = async (
  token: string,
  request: TTSRequest,
  onAudioChunk: (chunk: ArrayBuffer) => void
): Promise<void> => {
  const response = await fetch('/api/audio/tts/synthesize', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify({ ...request, stream: true })
  });

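  // Assumes errors carry a non-2xx HTTP status, as the error code table suggests
  if (!response.ok) throw new Error(`TTS stream request failed with HTTP ${response.status}`);

  const reader = response.body?.getReader();
  if (!reader) throw new Error('No response body');

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Copy the chunk: value.buffer may be larger than the bytes in this view
    onAudioChunk(value.slice().buffer);
  }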
};
```

### Speech Recognition (file upload)

```typescript
interface ASRResponse {
  text: string;
  language: string;
  emotion: string;
  duration: number;
  usage: { input_tokens: number; output_tokens: number; seconds: number };
  bill: string;
  record_id: number;
}

const recognizeAudio = async (token: string, file: File, model: string): Promise<ASRResponse> => {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('model', model);

  const response = await fetch('/api/audio/asr/recognize', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example
const fileInput = document.querySelector<HTMLInputElement>('#audioFile');
const file = fileInput?.files?.[0];
if (file) {
  const result = await recognizeAudio(token, file, 'qwen3-asr-flash');
  console.log('Recognized text:', result.text);
  console.log('Cost:', result.bill);
}
```

### Create a Cloned Voice

```typescript
interface CreateVoiceRequest {
  target_model: string;
  prefix: string;
  audio_url?: string;
}

interface VoiceInfo {
  voice_id: string;
  status: 'DEPLOYING' | 'OK' | 'UNDEPLOYED';
  target_model: string;
  record_id: number;
}

const createVoice = async (token: string, file: File, prefix: string, targetModel: string): Promise<VoiceInfo> => {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('prefix', prefix);
  formData.append('target_model', targetModel);

  const response = await fetch('/api/audio/voice/create', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Poll the voice status until the voice is usable
const waitForVoiceReady = async (token: string, voiceId: string, maxAttempts = 30): Promise<VoiceInfo> => {
  for (let i = 0; i < maxAttempts; i++) {
    const response = await fetch(`/api/audio/voice/${voiceId}`, {
      headers: { 'Authorization': `Bearer ${token}` }
    });
    const data = await response.json();
    // Surface API errors instead of reading fields from a null payload
    if (data.code !== 200) throw new Error(data.message);

    if (data.data.status === 'OK') return data.data;
    if (data.data.status === 'UNDEPLOYED') throw new Error('Voice review was rejected');

    await new Promise(resolve => setTimeout(resolve, 10000)); // wait 10 seconds
  }
  throw new Error('Timed out waiting for the voice');
};
```

### List System Voices

```typescript
interface SystemVoice {
  voice_id: string;
  name: string;
  trait: string;
  age: string;
  category: string;
  languages: string[];
  models: string[];
  features: {
    ssml: boolean;
    instruct: boolean;
    timestamp: boolean;
  };
}

const getSystemVoices = async (token: string, model?: string, category?: string): Promise<SystemVoice[]> => {
  const params = new URLSearchParams();
  if (model) params.append('model', model);
  if (category) params.append('category', category);

  const response = await fetch(`/api/audio/voice/system?${params}`, {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};
```

### Get History

```typescript
interface AudioHistoryItem {
  id: number;
  type: 'tts' | 'asr' | 'voice';
  model_name: string;
  input_data: string;
  output_url?: string;
  output_text?: string;
  voice_id?: string;
  duration?: number;
  bill: string;
  created_at: string;
}

interface HistoryResponse {
  items: AudioHistoryItem[];
  total: number;
  page: number;
  page_size: number;
}

const getAudioHistory = async (
  token: string,
  page: number = 1,
  pageSize: number = 20,
  type?: string
): Promise<HistoryResponse> => {
  const params = new URLSearchParams();
  params.append('page', page.toString());
  params.append('page_size', pageSize.toString());
  if (type) params.append('type', type);

  const response = await fetch(`/api/audio/history?${params}`, {
    headers: { 'Authorization': `Bearer ${token}` }
  });
  const data = await response.json();
  if (data.code !== 200) throw new Error(data.message);
  return data.data;
};

// Usage example
const history = await getAudioHistory(token, 1, 10, 'tts');
console.log('History items:', history.items);
console.log('Total:', history.total);
```

---

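## Notes

1. **Authentication**: every endpoint requires `Authorization: Bearer {token}` in the request headers
2. **Model and voice must match**: when synthesizing with a cloned voice, the `model` parameter must equal the `target_model` the voice was created with
3. **Voice quota**: each account can create at most 1000 cloned voices; voices that go unused for one year are cleaned up automatically
4. **Text length limit**: a single synthesis request accepts at most 2000 characters; for longer text use `/api/audio/tts/synthesize-long`
5. **Audio quality**: the quality of a cloned voice depends on the input audio, so make sure the sample is clear and free of noise
6. **Async result expiry**: the result URL of an asynchronous transcription task is valid for 24 hours
7. **OSS storage**:
   - Synthesized audio is stored on OSS under `audio/tts/{date}/{uuid}.{format}`
   - Voice cloning samples are stored on OSS under `audio/voice/{date}/{uuid}.{format}`
   - The returned `audio_url` is a publicly accessible OSS URL that can be played or downloaded directly
8. **Billing**: synthesis and recognition responses include a `bill` field with the cost of that call; `record_id` identifies the matching entry in the history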

---

## Online Documentation

After starting the service, open:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`