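Document AI Chat: Streaming Output API Documentation

Change note: the backend now supports streaming output. The LLM's generation is pushed to the frontend in real time, and the frontend can render a typing effect as needed.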

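Basic interface information

| Item | Value |
| --- | --- |
| URL | POST /sgbx/document_chat |
| Non-streaming | Query parameter stream=false (default); returns a complete JSON response |
| Streaming | Query parameter stream=true, or response_mode="sse" in the request body; returns an SSE event stream |
| Content-Type | application/json |
| Response Content-Type | text/event-stream (streaming) / application/json (non-streaming) |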

Request body (shared by streaming and non-streaming modes)

{
  "user_id": "string, required",
  "message": "string, required, the user's question",
  "conversation_id": "string, optional, conversation history ID",
  "task_id": "string, optional, task ID",
  "response_mode": "sse or blocking, optional, defaults to blocking",
  "project_info": {
    "tenant_id": "string",
    "project_id": "string"
  },
  "selected_section": {
    "index": "string, required, section index",
    "title": "string, required, section title",
    "code": "string, optional, section number",
    "content": "string, required, section body text"
  },
  "document_context": {
    "full_text": "string, optional, full document text",
    "previous_section": { "title": "...", "content": "..." },
    "next_section": { "title": "...", "content": "..." }
  },
  "conversation_history": [
    { "role": "user/assistant", "content": "string" }
  ]
}
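
As a rough illustration of how a client might assemble this body, the sketch below builds a streaming request with Python's standard library. The host `https://api.example.com` and the helper name `build_chat_request` are assumptions made for illustration; only the path, the `stream` query parameter, and the field names come from this document. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from this document


def build_chat_request(user_id, message, section, stream=True):
    """Build (but do not send) a POST request for /sgbx/document_chat."""
    body = {
        "user_id": user_id,
        "message": message,
        "response_mode": "sse" if stream else "blocking",
        "selected_section": section,
    }
    url = f"{BASE_URL}/sgbx/document_chat?stream={'true' if stream else 'false'}"
    return urllib.request.Request(
        url,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "u1",
    "Explain this section",
    {"index": "2", "title": "Site preparation", "content": "..."},
)
print(req.full_url)
```

The sketch sets both the query parameter and `response_mode`; according to the table above either one alone selects streaming, so setting both is only a defensive choice.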

Streaming SSE event format

Each event follows the standard SSE protocol:

event: <event type>
data: <JSON object>
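
The event/data framing can be parsed with a few lines of code. The following is a minimal sketch, assuming each event's data field holds one JSON object and that events are separated by a blank line, as standard SSE prescribes. The function name `parse_sse` and the sample lines are illustrative and not part of the API.

```python
import json


def parse_sse(lines):
    """Yield (event_type, payload) pairs from an iterable of SSE text lines."""
    event_type, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data_lines:
                yield event_type or "message", json.loads("\n".join(data_lines))
            event_type, data_lines = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the protocol strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # Comment lines (empty field name) and other fields are ignored.


sample = [
    "event: connected\n",
    'data: {"callback_task_id": "doc_chat_abc123def456", "status": "connected"}\n',
    "\n",
    "event: chunk\n",
    'data: {"callback_task_id": "doc_chat_abc123def456", "chunk": "Hello"}\n',
    "\n",
]
events = list(parse_sse(sample))
```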

Event order overview

connected → processing(workflow_started) → reasoning(recognize_intent) → intent
→ reasoning(rerank_context) → retrieval_result
→ reasoning(run_answer_skill / run_modify_skill)
→ [chunk] → [chunk] → ...  ← real-time generation stream
→ answer_completed / proposal_completed
→ completed
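
For integration tests, the sequence above could be checked mechanically. The sketch below encodes the successful path as a regular expression over event type names. It assumes that zero or more chunk events may occur and does not model the error path; `follows_documented_order` is a hypothetical helper name.

```python
import re

# Happy-path order from the overview; the chunk event may repeat any number of times.
HAPPY_PATH = re.compile(
    r"connected processing reasoning intent reasoning retrieval_result "
    r"reasoning( chunk)* (answer_completed|proposal_completed) completed"
)


def follows_documented_order(event_types):
    """Return True if the list of event type names matches the documented order."""
    return HAPPY_PATH.fullmatch(" ".join(event_types)) is not None


observed = [
    "connected", "processing", "reasoning", "intent", "reasoning",
    "retrieval_result", "reasoning", "chunk", "chunk",
    "answer_completed", "completed",
]
print(follows_documented_order(observed))
```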

1. connected: connection established

{
  "callback_task_id": "doc_chat_abc123def456",
  "status": "connected",
  "timestamp": 1748150000
}

2. processing: workflow started

{
  "callback_task_id": "doc_chat_abc123def456",
  "stage_name": "workflow_started",
  "status": "processing",
  "message": "文档 AI 对话工作流已启动"
}

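3. reasoning: stage progress (3 events per request)

| stage_name | message | Meaning |
| --- | --- | --- |
| recognize_intent | "已完成用户意图识别" | User intent recognition completed |
| rerank_context | "知识库内容检索重排完成" | Knowledge base retrieval and reranking completed |
| run_answer_skill | "已生成章节问答结果" | Section Q&A result generated |
| run_modify_skill | "已生成章节修改草案" | Section modification draft generated |

Only one of run_answer_skill and run_modify_skill is sent per request, depending on the recognized intent, so the four stage names yield three events.
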
{
  "callback_task_id": "doc_chat_abc123def456",
  "stage_name": "recognize_intent",
  "status": "processing",
  "message": "已完成用户意图识别"
}

On failure, status is "failed".
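
One way a client might turn reasoning payloads into progress text is sketched below. The step numbering and the `progress_line` helper are illustrative assumptions; only the stage names, the `status` values, and the `message` field come from this document.

```python
# Both skill stages share slot 3 because only one of them runs per request.
STAGE_SLOTS = {
    "recognize_intent": 1,
    "rerank_context": 2,
    "run_answer_skill": 3,
    "run_modify_skill": 3,
}


def progress_line(payload):
    """Format a reasoning payload as a user-visible progress line."""
    step = STAGE_SLOTS.get(payload.get("stage_name"))
    prefix = f"[{step}/3]" if step else "[?/3]"
    message = payload.get("message", "")
    if payload.get("status") == "failed":
        return f"{prefix} failed: {message}"
    return f"{prefix} {message}"


print(progress_line({
    "stage_name": "rerank_context",
    "status": "processing",
    "message": "retrieval reranked",
}))
```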

4. intent: intent recognition result

Sent immediately after reasoning(recognize_intent).

{
  "callback_task_id": "doc_chat_abc123def456",
  "intent_result": {
    "intent": "answer",
    "skill_name": "document-answer",
    "confidence": 0.92,
    "normalized_instruction": "请解释施工准备的内容",
    "operation": null
  }
}

5. retrieval_result: RAG retrieval result

Sent immediately after reasoning(rerank_context).

{
  "callback_task_id": "doc_chat_abc123def456",
  "retrieval_status": "reranked",
  "retrieval_method": "hybrid",
  "retrieval_metrics": {
    "recall_count": 12,
    "rerank_count": 8
  },
  "rerank_count": 8,
  "references": [
    {
      "source": "向量知识库",
      "content": "施工准备包括...",
      "vector_similarity": 0.87,
      "metadata": {
        "tenant_id": "t1",
        "project_id": "p1",
        "chapter_level_1": "第一章 施工准备",
        "source_scope_valid": true
      }
    }
  ],
  "warnings": []
}

references contains at most 8 entries, and each entry's content is truncated to its first 600 characters.

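6. chunk: real-time generated text (new in this change)

Pushed continuously during the LLM generation stage. The frontend should concatenate the chunks into the complete answer and render it with a typing effect.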

{
  "callback_task_id": "doc_chat_abc123def456",
  "chunk": "施工准备是项目实施前的关键环节"
}

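Concatenating the chunk values received yields the complete text. That text is JSON-wrapped, so the frontend must extract the answer or proposed_content field from it as the content to display.

Thinking content (<think>...</think> and similar) is filtered out by the backend and is never pushed.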
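
The concatenate-then-extract step might look like the sketch below. It assumes the wrapped text is a single JSON object with `answer` or `proposed_content` as a top-level key; the exact wrapper shape is not specified in this document, and `ChunkAssembler` is a hypothetical name.

```python
import json


class ChunkAssembler:
    """Accumulate chunk payloads and extract the display text once parseable."""

    def __init__(self):
        self.parts = []

    def feed(self, payload):
        """Append the chunk field of one chunk event payload."""
        self.parts.append(payload.get("chunk", ""))

    @property
    def raw_text(self):
        return "".join(self.parts)

    def display_text(self):
        """Return answer or proposed_content, or None while the JSON is incomplete."""
        try:
            wrapped = json.loads(self.raw_text)
        except json.JSONDecodeError:
            return None
        if not isinstance(wrapped, dict):
            return None
        return wrapped.get("answer") or wrapped.get("proposed_content")


assembler = ChunkAssembler()
assembler.feed({"chunk": '{"answer": "Site prep'})
partial = assembler.display_text()   # JSON not yet complete
assembler.feed({"chunk": ' matters"}'})
final = assembler.display_text()
```

This version only yields text once the JSON closes. A typing effect that updates on every chunk would need incremental extraction from the partial JSON, which is left out here.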

7. answer_completed / proposal_completed: final result

Q&A scenario: answer_completed

{
  "callback_task_id": "doc_chat_abc123def456",
  "response_type": "answer",
  "intent_result": { "intent": "answer", "skill_name": "document-answer", "confidence": 0.92 },
  "answer": "施工准备包括...(full answer)",
  "references": [
    { "source": "...", "content": "...", "metadata": {}, "vector_similarity": 0.87 }
  ],
  "retrieval_status": "reranked",
  "retrieval_metrics": { "recall_count": 12, "rerank_count": 8, "approved_count": 5 },
  "warnings": [],
  "selected_section": { "index": "2", "code": "SP-02", "title": "施工准备" },
  "error_message": null
}

Modification scenario: proposal_completed

{
  "callback_task_id": "doc_chat_abc123def456",
  "response_type": "proposal",
  "intent_result": { "intent": "modify", "skill_name": "document-modify", "confidence": 0.88 },
  "answer": null,
  "proposed_content": "full section body after modification...",
  "change_summary": ["调整了施工准备流程描述", "补充了安全要求"],
  "references": [],
  "retrieval_status": "reranked",
  "retrieval_metrics": { "recall_count": 12, "rerank_count": 8, "approved_count": 5 },
  "warnings": [],
  "selected_section": { "index": "2", "code": "SP-02", "title": "施工准备" },
  "error_message": null
}

Note on diffs: in the modification scenario the frontend computes the diff itself; the backend no longer returns a diff result.
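
Since the diff is now computed on the client, a line-based comparison of the original section content against proposed_content is one possible approach. The sketch below uses Python's `difflib` as a stand-in for whatever diff library the frontend actually uses; `section_diff` is a hypothetical helper.

```python
import difflib


def section_diff(original, proposed):
    """Return unified-diff lines between the original and proposed section text."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            proposed.splitlines(),
            fromfile="original",
            tofile="proposed",
            lineterm="",
        )
    )


diff_lines = section_diff(
    "Prepare the site.\nCheck equipment.",
    "Prepare the site.\nCheck equipment and safety gear.",
)
for line in diff_lines:
    print(line)
```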

8. completed: workflow finished

Sent only when response_type != "error".

{
  "callback_task_id": "doc_chat_abc123def456",
  "status": "completed",
  "duration": 12.345
}

9. error: failure

{
  "callback_task_id": "doc_chat_abc123def456",
  "status": "error",
  "message": "error details"
}

After an error event is sent, no completed event follows.
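
Because error and completed are mutually exclusive terminal events, a consumer can stop on whichever arrives first. The following is a minimal sketch, assuming events have already been parsed into (event_type, payload) pairs; `consume` and the shape of its result dict are illustrative choices.

```python
def consume(events):
    """Fold (event_type, payload) pairs into a summary of the stream outcome."""
    result = {"final": None, "error": None, "finished": False}
    for event_type, payload in events:
        if event_type == "error":
            result["error"] = payload.get("message")
            break  # no completed event will follow an error
        if event_type in ("answer_completed", "proposal_completed"):
            result["final"] = payload
        elif event_type == "completed":
            result["finished"] = True
            break
    return result


ok = consume([
    ("connected", {}),
    ("answer_completed", {"answer": "A"}),
    ("completed", {"status": "completed"}),
])
bad = consume([
    ("connected", {}),
    ("error", {"status": "error", "message": "boom"}),
])
```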

Non-streaming response (stream=false)

{
  "code": 200,
  "message": "success",
  "data": {
    "callback_task_id": "doc_chat_abc123def456",
    "response_type": "answer",
    "intent_result": { ... },
    "answer": "施工准备包括...",
    "proposed_content": null,
    "change_summary": [],
    "references": [ ... ],
    "retrieval_status": "reranked",
    "retrieval_metrics": { ... },
    "warnings": [],
    "selected_section": { "index": "2", "code": "SP-02", "title": "施工准备" },
    "error_message": null
  }
}

code: 500 indicates a failure, and message carries the error information.
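
A client might unwrap this envelope as sketched below. Treating every code other than 200 as a failure is an assumption (the document only names 200 and 500), and `DocumentChatError` and `unwrap` are hypothetical names.

```python
class DocumentChatError(Exception):
    """Raised when the non-streaming envelope reports a failure."""


def unwrap(response):
    """Return the display text from a non-streaming response envelope."""
    if response.get("code") != 200:
        raise DocumentChatError(response.get("message", "unknown error"))
    data = response["data"]
    if data.get("response_type") == "proposal":
        return data.get("proposed_content")
    return data.get("answer")


text = unwrap({
    "code": 200,
    "message": "success",
    "data": {"response_type": "answer", "answer": "A", "proposed_content": None},
})
```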

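Frontend integration notes

  1. Choosing streaming: add ?stream=true to the request URL, or set response_mode: "sse" in the body
  2. Chunk concatenation: concatenate the chunk field of every chunk event, then extract answer or proposed_content from the resulting JSON for display
  3. Diff view: in the modification scenario, the frontend diffs proposed_content against the original section content itself
  4. Progress display: use the message field of reasoning events as the user-visible progress hint
  5. Error handling: stop as soon as an error event arrives; do not wait for completed
  6. Health check: GET /sgbx/document_chat/health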