# Design Document: External API
## Overview
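This design document describes the technical implementation of the annotation platform's external API and the internal management features that support it. It covers:
1. **External API**: served under the `/api/external/` prefix and called by external systems such as the Sample Center
2. **Internal management features**: pre-dispatch project management, project configuration, one-click task dispatch, and related functions

External systems call the external endpoints with an admin token to initialize projects, query progress, and export data. Administrators configure projects and dispatch tasks through the internal UI.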
## Architecture
### System Architecture Diagram
```mermaid
graph TB
    subgraph "External Systems"
        SC[Sample Center]
    end
    subgraph "Annotation Platform"
        subgraph "Frontend"
            PM[Project Management Page]
            PC[Project Configuration Page]
            TD[Task Dispatch Page]
        end
        subgraph "API Layer"
            EA["External API Router<br/>/api/external/*"]
            PA["Project API Router<br/>/api/projects/*"]
            TA["Task API Router<br/>/api/tasks/*"]
        end
        subgraph "Middleware"
            AM[Auth Middleware]
        end
        subgraph "Services"
            ES[External Service]
            PS[Project Service]
            TS[Task Service]
            AS[Assignment Service]
            EXS[Export Service]
        end
        subgraph "Data Layer"
            DB[(Database)]
        end
    end
    SC -->|Admin Token| EA
    PM --> PA
    PC --> PA
    TD --> TA
    EA --> AM
    PA --> AM
    TA --> AM
    AM --> ES
    AM --> PS
    AM --> TS
    ES --> PS
    ES --> TS
    ES --> EXS
    TS --> AS
    PS --> DB
    TS --> DB
    AS --> DB
    EXS --> DB
```
### API Route Structure
```
/api/external/                    # External API
├── /projects/init                POST - Initialize a project
├── /projects/{id}/progress       GET  - Query progress
└── /projects/{id}/export         POST - Export data
/api/projects/                    # Internal project API (extended)
├── /{id}/config                  PUT  - Update project configuration
├── /{id}/status                  PUT  - Update project status
└── /{id}/dispatch                POST - One-click task dispatch
/api/tasks/                       # Internal task API (extended)
├── /preview-assignment           POST - Preview task assignment
└── /batch-assign                 POST - Batch assignment (existing)
```
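As a rough illustration of how an external caller might address these routes, the sketch below builds (but does not send) a project initialization request using only Python's standard library. The host name, the `Authorization: Bearer` header scheme, and the helper name `build_init_request` are assumptions made for illustration; this document only states that an admin token is required, not how it is transmitted.

```python
import json
import urllib.request

BASE_URL = "https://annotation.example.com"  # hypothetical host


def build_init_request(token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for /api/external/projects/init (nothing is sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/external/projects/init",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed scheme: the document does not specify how the token is passed.
            "Authorization": f"Bearer {token}",
        },
    )


payload = {
    "name": "Sentiment batch 1",
    "task_type": "text_classification",
    "data": [{"id": "ext_123", "content": "A piece of text to be annotated"}],
}
req = build_init_request("admin-token", payload)
print(req.get_method(), req.full_url)
```

The progress and export routes would follow the same pattern, with the project ID returned by the initialization call substituted into the path.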
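### Project Status Transition Diagram
```mermaid
stateDiagram-v2
    [*] --> draft: External system creates the project
    draft --> configuring: Admin starts configuration
    configuring --> ready: Admin completes configuration
    ready --> in_progress: Admin dispatches tasks
    in_progress --> completed: All tasks completed
    configuring --> draft: Reset configuration
    ready --> configuring: Modify configuration
```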
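The diagram can be read as a lookup table of allowed transitions. The following is a minimal sketch of such a table and a validity check; the names `ALLOWED_TRANSITIONS` and `can_transition` are illustrative and not part of the existing codebase.

```python
# Allowed transitions, read directly off the state diagram above.
ALLOWED_TRANSITIONS = {
    "draft": {"configuring"},
    "configuring": {"ready", "draft"},
    "ready": {"in_progress", "configuring"},
    "in_progress": {"completed"},
    "completed": set(),  # terminal state
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram permits moving from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("ready", "configuring"))
print(can_transition("draft", "ready"))
```

A status update endpoint could reject any pair for which this check fails with the `INVALID_STATUS_TRANSITION` error code.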
## Components and Interfaces
### 1. External API Router (`routers/external.py`)
The router module that handles all external API requests.
```python
from fastapi import APIRouter, HTTPException, status, Request

from schemas.external import (
    ProjectInitRequest, ProjectInitResponse,
    ProgressResponse, ExternalExportRequest, ExternalExportResponse,
)
from services.external_service import ExternalService

router = APIRouter(
    prefix="/api/external",
    tags=["external"],
)
```
### 2. External Service (`services/external_service.py`)
Encapsulates the business logic of the external API.
```python
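from schemas.external import (
    ProjectInitRequest, ProjectInitResponse,
    ProgressResponse, ExternalExportRequest, ExternalExportResponse,
)


class ExternalService:
    @staticmethod
    def init_project(request: ProjectInitRequest, user_id: str) -> ProjectInitResponse:
        """Initialize a project and create its tasks."""
        pass

    @staticmethod
    def get_project_progress(project_id: str) -> ProgressResponse:
        """Return the progress of a project."""
        pass

    @staticmethod
    def export_project_data(project_id: str, request: ExternalExportRequest) -> ExternalExportResponse:
        """Export the data of a project."""
        pass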
```
### 3. Assignment Service (`services/assignment_service.py`)
Encapsulates the business logic of task assignment.
```python
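from typing import List

# The result types used below (AssignmentPreview, DispatchResult,
# AnnotatorWorkload) are assumed to be defined alongside the schemas.


class AssignmentService:
    @staticmethod
    def preview_assignment(
        project_id: str,
        user_ids: List[str]
    ) -> AssignmentPreview:
        """Preview the result of a task assignment."""
        pass

    @staticmethod
    def dispatch_tasks(
        project_id: str,
        user_ids: List[str],
        mode: str = "equal"
    ) -> DispatchResult:
        """Perform one-click task dispatch."""
        pass

    @staticmethod
    def get_annotator_workload(user_id: str) -> AnnotatorWorkload:
        """Return the annotator's current workload."""
        pass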
```
### 4. Schema Definitions (`schemas/external.py`)
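#### Project Initialization Request
```python
from datetime import datetime
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel


class TaskType(str, Enum):
    TEXT_CLASSIFICATION = "text_classification"
    IMAGE_CLASSIFICATION = "image_classification"
    OBJECT_DETECTION = "object_detection"
    NER = "ner"


class TaskDataItem(BaseModel):
    """A single task data item."""
    id: Optional[str] = None  # Data ID in the external system
    content: str  # Text content or image URL
    metadata: Optional[dict] = None  # Additional metadata


class ProjectInitRequest(BaseModel):
    """Project initialization request."""
    name: str
    description: Optional[str] = ""
    task_type: TaskType
    data: List[TaskDataItem]
    config: Optional[str] = None  # Custom XML config; if empty, the default empty template is used
    external_id: Optional[str] = None  # Project ID in the external system, used for cross-referencing
```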
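#### Project Initialization Response
```python
class ProjectInitResponse(BaseModel):
    """Project initialization response."""
    project_id: str  # Project ID on the annotation platform; the Sample Center must store it for follow-up calls
    project_name: str
    task_count: int
    status: str  # "draft"
    created_at: datetime
    config: str  # The XML config template actually used
    external_id: Optional[str] = None  # External ID passed in by the Sample Center, if any
```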
#### Progress Query Response
```python
class AnnotatorProgress(BaseModel):
    """Progress of a single annotator."""
    user_id: str
    username: str
    assigned_count: int
    completed_count: int
    in_progress_count: int
    completion_rate: float


class ProgressResponse(BaseModel):
    """Project progress response."""
    project_id: str
    project_name: str
    total_tasks: int
    completed_tasks: int
    in_progress_tasks: int
    pending_tasks: int
    completion_percentage: float
    annotators: List[AnnotatorProgress]
    last_updated: datetime
```
#### Data Export Request/Response
```python
class ExternalExportFormat(str, Enum):
    JSON = "json"
    CSV = "csv"
    SHAREGPT = "sharegpt"  # ShareGPT conversation format
    YOLO = "yolo"  # YOLO object detection format
    COCO = "coco"  # COCO dataset format
    ALPACA = "alpaca"  # Alpaca instruction fine-tuning format


class ExternalExportRequest(BaseModel):
    """Export request."""
    format: ExternalExportFormat = ExternalExportFormat.JSON
    completed_only: bool = True  # Whether to export only completed tasks
    callback_url: Optional[str] = None  # Callback URL used to notify the Sample Center when the export finishes


class ExportedTaskData(BaseModel):
    """Exported task data."""
    task_id: str
    external_id: Optional[str]  # Data ID in the external system
    original_data: dict
    annotations: List[dict]
    status: str
    annotator: Optional[str]
    completed_at: Optional[datetime]


class ExternalExportResponse(BaseModel):
    """Export response."""
    project_id: str
    format: str
    total_exported: int
    file_url: str  # Download URL of the exported file
    file_name: str  # File name
    file_size: Optional[int] = None  # File size in bytes
    expires_at: Optional[datetime] = None  # Expiry time of the download link


class ExportCallbackPayload(BaseModel):
    """Payload of the export-completed callback."""
    project_id: str
    export_id: str
    status: str  # "completed" or "failed"
    format: str
    total_exported: int
    file_url: str
    file_name: str
    file_size: int
    error_message: Optional[str] = None
```
### 5. Internal Management Schemas (extension of `schemas/project.py`)
#### Project Status Update
```python
class ProjectStatus(str, Enum):
    DRAFT = "draft"
    CONFIGURING = "configuring"
    READY = "ready"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


class ProjectSource(str, Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


class ProjectStatusUpdate(BaseModel):
    """Project status update request."""
    status: ProjectStatus


# LabelConfig is defined before ProjectConfigUpdate, which references it.
class LabelConfig(BaseModel):
    """Label configuration."""
    name: str
    color: Optional[str] = None
    hotkey: Optional[str] = None


class ProjectConfigUpdate(BaseModel):
    """Project configuration update request."""
    config: str  # XML config
    labels: Optional[List[LabelConfig]] = None
```
#### Task Dispatch
```python
class DispatchRequest(BaseModel):
    """One-click dispatch request."""
    user_ids: List[str]
    mode: str = "equal"  # "equal" or "round_robin"


class AssignmentPreviewRequest(BaseModel):
    """Assignment preview request."""
    user_ids: List[str]


class AnnotatorAssignment(BaseModel):
    """Assignment information for a single annotator."""
    user_id: str
    username: str
    task_count: int
    percentage: float
    current_workload: int  # Number of tasks the annotator already holds


class AssignmentPreviewResponse(BaseModel):
    """Assignment preview response."""
    project_id: str
    total_tasks: int
    assignments: List[AnnotatorAssignment]


class DispatchResponse(BaseModel):
    """Dispatch result response."""
    project_id: str
    success: bool
    total_assigned: int
    assignments: List[AnnotatorAssignment]
    project_status: str  # Project status after the update
```
### 6. Extended Project Response
```python
class ProjectResponseExtended(BaseModel):
    """Extended project response (includes status and source)."""
    id: str
    name: str
    description: str
    config: str
    task_type: Optional[str] = None
    status: ProjectStatus = ProjectStatus.DRAFT
    source: ProjectSource = ProjectSource.INTERNAL
    created_at: datetime
    updated_at: Optional[datetime] = None
    task_count: int = 0
    completed_task_count: int = 0
    assigned_task_count: int = 0
```
## Data Models
### Database Table Extensions
The following columns need to be added to the `projects` table:
```sql
ALTER TABLE projects ADD COLUMN status VARCHAR(20) DEFAULT 'draft';
ALTER TABLE projects ADD COLUMN source VARCHAR(20) DEFAULT 'internal';
ALTER TABLE projects ADD COLUMN task_type VARCHAR(50);
ALTER TABLE projects ADD COLUMN updated_at TIMESTAMP;
ALTER TABLE projects ADD COLUMN external_id VARCHAR(100); -- Project ID in the external system
```
### Task Type to Default Config Mapping
```python
# Default config templates (no labels; the administrator configures them later)
DEFAULT_CONFIGS = {
    "text_classification": """
""",
    "image_classification": """
""",
    "object_detection": """
""",
    "ner": """
"""
}
```
### Task Data Format
The storage format of task data depends on the task type:
```python
# Text classification / NER
{
    "items": [
        {
            "id": "item_001",
            "external_id": "ext_123",
            "text": "A piece of text to be annotated"
        }
    ]
}
# Image classification / object detection
{
    "items": [
        {
            "id": "item_001",
            "external_id": "ext_123",
            "image": "https://example.com/image.jpg"
        }
    ]
}
```
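One way to produce these layouts from incoming data items is a small mapping from task type to payload key. The sketch below is only an illustration: the helper `to_stored_item`, the `CONTENT_KEY` table, and the `item_NNN` ID scheme (modeled on the `item_001` example) are assumptions, not the platform's actual implementation.

```python
from typing import Optional

# Which key holds the payload for each task type, per the formats above.
CONTENT_KEY = {
    "text_classification": "text",
    "ner": "text",
    "image_classification": "image",
    "object_detection": "image",
}


def to_stored_item(task_type: str, index: int, content: str,
                   external_id: Optional[str] = None) -> dict:
    """Map one incoming data item to the stored item layout."""
    key = CONTENT_KEY[task_type]  # raises KeyError for unsupported task types
    return {
        "id": f"item_{index:03d}",  # assumed ID scheme, mirroring "item_001"
        "external_id": external_id,
        key: content,
    }


stored = {"items": [to_stored_item("ner", 1, "A piece of text", "ext_123")]}
print(stored)
```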
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
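### Property 1: Token Validation Consistency
*For any* API request, if the supplied token is invalid or expired, the system should return a 401 Unauthorized error; if the token is valid but the user is not an administrator, the system should return a 403 Forbidden error.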
**Validates: Requirements 1.5, 2.6, 3.8, 4.2, 4.4, 4.5, 4.6**
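The ordering matters here: token validity is checked before the admin role. A minimal sketch of that decision, with token verification and role lookup abstracted into two booleans (the function name `check_admin_token` is hypothetical), might look like this:

```python
from typing import Optional, Tuple


def check_admin_token(token_valid: bool, is_admin: bool) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, error code), or (200, None) if the request may proceed."""
    if not token_valid:
        # Invalid or expired token: the role is never consulted.
        return 401, "INVALID_TOKEN"
    if not is_admin:
        return 403, "PERMISSION_DENIED"
    return 200, None


print(check_admin_token(token_valid=True, is_admin=False))
```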
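### Property 2: Project Creation Completeness
*For any* valid project initialization request, the number of tasks created should equal the number of data items in the request, and the response should contain a valid project ID and the correct task count.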
**Validates: Requirements 1.2, 1.3, 1.4**
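### Property 3: Progress Calculation Correctness
*For any* project, the returned completion percentage should equal the number of completed tasks divided by the total number of tasks, and the total number of tasks should equal the sum of completed, in-progress, and pending tasks.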
**Validates: Requirements 2.2, 2.3**
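The property can be made concrete with a small counting function. This sketch makes several assumptions beyond the text: task statuses are the strings `"completed"`, `"in_progress"`, and `"pending"`; the percentage is expressed on a 0 to 100 scale; and an empty project reports 0.0 instead of dividing by zero.

```python
from collections import Counter
from typing import Iterable


def compute_progress(statuses: Iterable[str]) -> dict:
    """Aggregate task statuses into the counters used by ProgressResponse."""
    counts = Counter(statuses)
    total = sum(counts.values())
    completed = counts["completed"]
    return {
        "total_tasks": total,
        "completed_tasks": completed,
        "in_progress_tasks": counts["in_progress"],
        "pending_tasks": counts["pending"],
        # Assumption: 0-100 scale, and 0.0 for a project with no tasks.
        "completion_percentage": (completed / total * 100) if total else 0.0,
    }


progress = compute_progress(["completed", "completed", "pending", "in_progress"])
print(progress["completion_percentage"])
```

The sum invariant holds only if every task is in one of the three assumed statuses; any additional status would need its own counter.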
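### Property 4: Annotator Statistics Consistency
*For any* project progress query, each annotator's total task count (`assigned_count`) in the returned statistics should equal `completed_count` + `in_progress_count` + `pending_count`, where `pending_count` is the number of tasks assigned to that annotator but not yet started.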
**Validates: Requirements 2.4**
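### Property 5: Export Filter Correctness
*For any* export request, if `completed_only=true` is specified, every exported task should have the status "completed"; if `completed_only=false` is specified, the number of exported tasks should equal the project's total task count.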
**Validates: Requirements 3.5, 3.6**
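A minimal sketch of the filter this property describes, using plain dicts in place of task records (the function name `select_tasks_for_export` is illustrative):

```python
from typing import List


def select_tasks_for_export(tasks: List[dict], completed_only: bool = True) -> List[dict]:
    """Return the tasks to export, optionally restricted to completed ones."""
    if completed_only:
        return [t for t in tasks if t["status"] == "completed"]
    return list(tasks)  # shallow copy of every task in the project


tasks = [
    {"task_id": "t1", "status": "completed"},
    {"task_id": "t2", "status": "in_progress"},
    {"task_id": "t3", "status": "pending"},
]
print(len(select_tasks_for_export(tasks)), len(select_tasks_for_export(tasks, completed_only=False)))
```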
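### Property 6: Export Data Completeness
*For any* exported task, the data should contain the original data (`original_data`) and the corresponding annotation results (`annotations`), and `original_data` should match the input data supplied at creation time.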
**Validates: Requirements 3.4**
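### Property 7: Config Generation Correctness
*For any* project initialization request, the system should generate the default XML config template (without labels) that corresponds to `task_type`; labels are configured later by the administrator.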
**Validates: Requirements 5.5**
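Combined with the `config` field of the initialization request (a custom XML config that falls back to the default template when empty), the selection logic could be sketched as follows. The template strings here are placeholders standing in for the real `DEFAULT_CONFIGS` entries, and `resolve_config` is a hypothetical helper name.

```python
from typing import Optional

# Placeholder templates: stand-ins for the real DEFAULT_CONFIGS entries.
DEFAULT_CONFIGS = {
    "text_classification": "<!-- default text_classification template -->",
    "image_classification": "<!-- default image_classification template -->",
    "object_detection": "<!-- default object_detection template -->",
    "ner": "<!-- default ner template -->",
}


def resolve_config(task_type: str, custom_config: Optional[str] = None) -> str:
    """Prefer a caller-supplied config; otherwise use the task type's default."""
    if custom_config:
        return custom_config
    try:
        return DEFAULT_CONFIGS[task_type]
    except KeyError:
        # Would surface to the caller as the INVALID_TASK_TYPE error code.
        raise ValueError("INVALID_TASK_TYPE") from None


print(resolve_config("ner"))
```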
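### Property 8: Handling of Nonexistent Resources
*For any* progress query or export request, if the project ID does not exist, the system should return a 404 error.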
**Validates: Requirements 2.5, 3.7**
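### Property 9: Project Status Transition Correctness
*For any* project status update, the system should allow only transitions that follow the status transition rules: the forward path draft→configuring→ready→in_progress→completed, plus the two backward transitions shown in the status transition diagram (configuring→draft and ready→configuring).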
**Validates: Requirements 10.2, 10.3, 10.4, 10.5, 10.6, 10.7**
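### Property 10: One-click Dispatch Task Count Consistency
*For any* one-click dispatch operation, the total number of tasks assigned across all annotators should equal the project's total task count, and the number of tasks assigned to any two annotators should differ by at most 1.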
**Validates: Requirements 8.6, 8.7**
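Both dispatch modes named in `DispatchRequest` can satisfy this property. The document does not define how `equal` and `round_robin` differ, so the sketch below shows one plausible reading: `equal` hands out contiguous chunks, `round_robin` deals tasks one at a time. The function names are illustrative, and user IDs are assumed to be unique.

```python
from typing import Dict, List


def split_equal(task_ids: List[str], user_ids: List[str]) -> Dict[str, List[str]]:
    """Assign contiguous chunks whose sizes differ by at most 1."""
    if not user_ids:
        raise ValueError("NO_USERS_SELECTED")
    base, extra = divmod(len(task_ids), len(user_ids))
    result: Dict[str, List[str]] = {}
    start = 0
    for i, uid in enumerate(user_ids):
        size = base + (1 if i < extra else 0)  # the first `extra` users get one more
        result[uid] = task_ids[start:start + size]
        start += size
    return result


def split_round_robin(task_ids: List[str], user_ids: List[str]) -> Dict[str, List[str]]:
    """Deal tasks to users one at a time, cycling through the user list."""
    if not user_ids:
        raise ValueError("NO_USERS_SELECTED")
    result: Dict[str, List[str]] = {uid: [] for uid in user_ids}
    for i, tid in enumerate(task_ids):
        result[user_ids[i % len(user_ids)]].append(tid)
    return result


task_ids = [f"t{i}" for i in range(7)]
equal = split_equal(task_ids, ["a", "b", "c"])
rr = split_round_robin(task_ids, ["a", "b", "c"])
print({u: len(ts) for u, ts in equal.items()})
```

Because both functions are pure, a preview endpoint and the actual dispatch could call the same function with the same arguments, which is one way to keep the preview consistent with the final assignment.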
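### Property 11: Preview and Actual Assignment Consistency
*For any* task dispatch operation, given the same parameters, the per-annotator task counts in the preview should match those of the actual assignment.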
**Validates: Requirements 9.1, 9.2, 9.5**
## Error Handling
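### Error Response Format
```python
class ErrorResponse(BaseModel):
    """Unified error response format."""
    error_code: str
    message: str
    details: Optional[dict] = None


# Error code definitions
ERROR_CODES = {
    "INVALID_TOKEN": "Token is invalid or has expired",
    "PERMISSION_DENIED": "Insufficient permissions; administrator access is required",
    "PROJECT_NOT_FOUND": "Project does not exist",
    "INVALID_REQUEST": "Invalid request parameters",
    "INVALID_TASK_TYPE": "Unsupported task type",
    "INVALID_STATUS_TRANSITION": "Invalid status transition",
    "PROJECT_NOT_READY": "Project is not ready; tasks cannot be dispatched",
    "NO_TASKS_TO_ASSIGN": "No tasks available to assign",
    "NO_USERS_SELECTED": "No annotators selected",
    "EXPORT_FAILED": "Export failed",
    "INTERNAL_ERROR": "Internal server error"
}
```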
```
### HTTP Status Code Mapping
| Error code | HTTP status | Description |
|--------|-----------|------|
| INVALID_TOKEN | 401 | Token validation failed |
| PERMISSION_DENIED | 403 | User is not an administrator |
| PROJECT_NOT_FOUND | 404 | Project does not exist |
| INVALID_REQUEST | 400 | Invalid request parameters |
| INVALID_TASK_TYPE | 400 | Task type not supported |
| INVALID_STATUS_TRANSITION | 400 | Illegal status transition |
| PROJECT_NOT_READY | 400 | Project is not ready |
| NO_TASKS_TO_ASSIGN | 400 | No tasks available to assign |
| NO_USERS_SELECTED | 400 | No users selected |
| EXPORT_FAILED | 500 | Error during export |
| INTERNAL_ERROR | 500 | Internal server error |
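The table translates directly into a lookup that pairs each error code with its HTTP status. The sketch below builds a response body in the `ErrorResponse` shape using plain dicts; the helper name `build_error` and the fallback to 500 for unknown codes are assumptions.

```python
from typing import Optional, Tuple

# HTTP status for each error code, copied from the mapping table above.
HTTP_STATUS = {
    "INVALID_TOKEN": 401,
    "PERMISSION_DENIED": 403,
    "PROJECT_NOT_FOUND": 404,
    "INVALID_REQUEST": 400,
    "INVALID_TASK_TYPE": 400,
    "INVALID_STATUS_TRANSITION": 400,
    "PROJECT_NOT_READY": 400,
    "NO_TASKS_TO_ASSIGN": 400,
    "NO_USERS_SELECTED": 400,
    "EXPORT_FAILED": 500,
    "INTERNAL_ERROR": 500,
}


def build_error(error_code: str, message: str,
                details: Optional[dict] = None) -> Tuple[int, dict]:
    """Return (HTTP status, body) for an error in the ErrorResponse shape."""
    status = HTTP_STATUS.get(error_code, 500)  # assumption: unknown codes map to 500
    body = {"error_code": error_code, "message": message, "details": details}
    return status, body


print(build_error("PROJECT_NOT_FOUND", "Project does not exist", {"project_id": "p_1"}))
```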
## Frontend Components
### 1. Pre-dispatch Project List Page
```typescript
// Extended project list item that shows project status and source
interface ProjectListItem {
  id: string;
  name: string;
  description: string;
  taskType: string;
  status: 'draft' | 'configuring' | 'ready' | 'in_progress' | 'completed';
  source: 'internal' | 'external';
  taskCount: number;
  completedTaskCount: number;
  createdAt: string;
}
// Status filter
type StatusFilter = 'all' | 'draft' | 'configuring' | 'ready' | 'in_progress' | 'completed';
```
### 2. Project Configuration Page
```typescript
// Label editor component
interface LabelEditorProps {
  labels: LabelConfig[];
  onLabelsChange: (labels: LabelConfig[]) => void;
  taskType: string;
}
// XML config preview component
interface ConfigPreviewProps {
  config: string;
  onConfigChange: (config: string) => void;
}
```
### 3. One-click Dispatch Dialog
```typescript
// Dispatch dialog component
interface DispatchDialogProps {
  projectId: string;
  totalTasks: number;
  onDispatch: (userIds: string[]) => Promise<void>; // assumes no resolved value
}
// Annotator selector component
interface AnnotatorSelectorProps {
  annotators: AnnotatorInfo[];
  selectedIds: string[];
  onSelectionChange: (ids: string[]) => void;
}
interface AnnotatorInfo {
  id: string;
  username: string;
  currentWorkload: number; // Current number of tasks
  completedToday: number; // Number completed today
}
// Assignment preview component
interface AssignmentPreviewProps {
  preview: AssignmentPreviewResponse;
}
```
## Testing Strategy
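### Unit Tests
1. **Config generation tests**: verify that each task type produces the correct XML config
2. **Progress calculation tests**: verify the progress percentage calculation logic
3. **Data format conversion tests**: verify serialization and deserialization of task data
### Property-Based Testing
Property tests use the `hypothesis` library:
1. **Property 1 test**: generate random tokens and verify the authentication logic
2. **Property 2 test**: generate data lists of random length and verify the number of tasks created
3. **Property 3 test**: generate random task status distributions and verify the progress calculation
4. **Property 5 test**: generate tasks with mixed statuses and verify export filtering
### Integration Tests
1. **End-to-end flow test**: project initialization → progress query → data export
2. **Authentication flow test**: valid token, invalid token, non-admin token
3. **Error handling test**: verify responses for the various error scenarios
### Test Configuration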
```python
# pytest.ini settings
[pytest]
testpaths = test
python_files = test_*.py
python_functions = test_*
# hypothesis settings (a Python dict, kept separately from pytest.ini)
hypothesis_settings = {
    "max_examples": 100,
    "deadline": None
}
```