
Design Document: Annotation Platform

Overview

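The annotation platform is a complete data-annotation management system that supports the full workflow from project creation and task assignment through to manual annotation. The system uses a decoupled frontend/backend architecture:

  • Frontend: React + TypeScript in an Nx monorepo, with Jotai for state management and the @humansignal/editor annotation editor integrated
  • Backend: Python FastAPI + SQLite, exposing a RESTful API
  • Design principles: componentized, modular, and reusable; view components are self-contained so that other platforms can integrate them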

Architecture

System Architecture

graph TB
    subgraph "Frontend (web/apps/lq_label)"
        A[React App] --> B[Layout Component]
        B --> C[Project View]
        B --> D[Task View]
        B --> E[Annotation View]
        A --> F[Jotai State Management]
        E --> G["@humansignal/editor"]
        A --> H["@humansignal/ui Components"]
    end
    
    subgraph "Backend (backend/)"
        I[FastAPI Server] --> J[Project API]
        I --> K[Task API]
        I --> L[Annotation API]
        J --> M[SQLite Database]
        K --> M
        L --> M
    end
    
    A -->|HTTP/REST| I

Frontend Architecture

graph TB
    subgraph "App Structure"
        A[main.tsx] --> B[App.tsx]
        B --> C[Layout]
        C --> D[Sidebar Navigation]
        C --> E[Main Content Area]
        E --> F[Router]
        F --> G[ProjectListView]
        F --> H[ProjectDetailView]
        F --> I[TaskListView]
        F --> J[AnnotationView]
    end
    
    subgraph "State Management"
        K[projectsAtom] --> G
        K --> H
        L[tasksAtom] --> I
        M[currentAnnotationAtom] --> J
    end
    
    subgraph "Shared Components"
        N["@humansignal/ui"]
        O[Custom Components]
    end

Backend Architecture

graph TB
    subgraph "API Layer"
        A[main.py] --> B[Project Router]
        A --> C[Task Router]
        A --> D[Annotation Router]
    end
    
    subgraph "Data Layer"
        E[Database Models]
        F[SQLite Connection]
        E --> F
    end
    
    subgraph "Business Logic"
        G[Project Service]
        H[Task Service]
        I[Annotation Service]
    end
    
    B --> G
    C --> H
    D --> I
    G --> E
    H --> E
    I --> E

Components and Interfaces

Frontend Components

1. Layout Component

Location: web/apps/lq_label/src/components/Layout/Layout.tsx

Responsibility: Provides the main layout, styled as an admin dashboard

Interface:

interface LayoutProps {
  children: React.ReactNode;
}

Structure:

  • Sidebar navigation (fixed left)
  • Top header bar
  • Main content area (scrollable)
  • Responsive design

2. Sidebar Component

Location: web/apps/lq_label/src/components/Layout/Sidebar.tsx

Responsibility: Navigation menu

Interface:

interface SidebarProps {
  activeRoute: string;
}

interface MenuItem {
  id: string;
  label: string;
  icon: React.ReactNode;
  path: string;
}

Menu Items:

  • Projects (project management)
  • Tasks (task management)
  • Annotations (my annotations)

3. ProjectListView

Location: web/apps/lq_label/src/views/ProjectListView/ProjectListView.tsx

Responsibility: Displays the project list and supports creating, editing, and deleting projects

Interface:

interface ProjectListViewProps {}

interface Project {
  id: string;
  name: string;
  description: string;
  config: string;
  created_at: string;
  task_count: number;
}

Features:

  • Project list display (using the DataTable component)
  • Create-project button
  • Project search and filtering
  • Project actions (view details, edit, delete)

4. ProjectDetailView

Location: web/apps/lq_label/src/views/ProjectDetailView/ProjectDetailView.tsx

Responsibility: Displays project details and the list of associated tasks

Interface:

interface ProjectDetailViewProps {
  projectId: string;
}

Features:

  • Basic project information
  • Project editing
  • Associated task list
  • Create-task button

5. TaskListView

Location: web/apps/lq_label/src/views/TaskListView/TaskListView.tsx

Responsibility: Displays the task list and supports filtering and task actions

Interface:

interface TaskListViewProps {}

interface Task {
  id: string;
  project_id: string;
  name: string;
  data: any;
  status: 'pending' | 'in_progress' | 'completed';
  assigned_to: string | null;
  created_at: string;
  progress: number;
}

Features:

  • Task list display
  • Status filtering
  • Task actions (start annotating, view details, delete)

6. AnnotationView

Location: web/apps/lq_label/src/views/AnnotationView/AnnotationView.tsx

Responsibility: Annotation interface that integrates the LabelStudio editor

Interface:

interface AnnotationViewProps {
  taskId: string;
}

interface AnnotationData {
  id: string;
  task_id: string;
  user_id: string;
  result: any;
  created_at: string;
  updated_at: string;
}

Features:

  • LabelStudio editor integration
  • Saving and submitting annotations
  • Skip function
  • Progress display

7. ProjectForm Component

Location: web/apps/lq_label/src/components/ProjectForm/ProjectForm.tsx

Responsibility: Form for creating and editing projects

Interface:

interface ProjectFormProps {
  project?: Project;
  onSubmit: (data: ProjectFormData) => void;
  onCancel: () => void;
}

interface ProjectFormData {
  name: string;
  description: string;
  config: string;
}

8. TaskForm Component

Location: web/apps/lq_label/src/components/TaskForm/TaskForm.tsx

Responsibility: Form for creating tasks

Interface:

interface TaskFormProps {
  projectId: string;
  onSubmit: (data: TaskFormData) => void;
  onCancel: () => void;
}

interface TaskFormData {
  name: string;
  data: any;
  assigned_to: string | null;
}

Backend API Endpoints

Project API

Router: backend/routers/project.py

Endpoints:

GET    /api/projects              # List all projects
POST   /api/projects              # Create project
GET    /api/projects/{id}         # Get project by ID
PUT    /api/projects/{id}         # Update project
DELETE /api/projects/{id}         # Delete project

Models:

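from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class ProjectCreate(BaseModel):
    name: str
    description: str
    config: str

class ProjectUpdate(BaseModel):
    # Every field defaults to None so clients can send partial updates
    name: Optional[str] = None
    description: Optional[str] = None
    config: Optional[str] = None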

class ProjectResponse(BaseModel):
    id: str
    name: str
    description: str
    config: str
    created_at: datetime
    task_count: int
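
The database schema in this document has no task_count column, so the field presumably has to be derived when a project is read. A minimal sketch of one way to do that with the built-in sqlite3 module, assuming the projects and tasks tables from the Database Schema section; the function name list_projects_with_task_count is hypothetical:

```python
import sqlite3


def list_projects_with_task_count(conn: sqlite3.Connection) -> list[dict]:
    """Return every project with a task_count computed at query time."""
    cur = conn.execute(
        """
        SELECT p.id, p.name, p.description, p.config, p.created_at,
               COUNT(t.id) AS task_count
        FROM projects AS p
        LEFT JOIN tasks AS t ON t.project_id = p.id
        GROUP BY p.id
        ORDER BY p.created_at, p.id
        """
    )
    # Build plain dicts from the column names reported by the cursor.
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

The LEFT JOIN keeps projects that have no tasks yet, and COUNT(t.id) reports 0 for them because it ignores the NULLs produced by the join.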

Task API

Router: backend/routers/task.py

Endpoints:

GET    /api/tasks                 # List all tasks (with filters)
POST   /api/tasks                 # Create task
GET    /api/tasks/{id}            # Get task by ID
PUT    /api/tasks/{id}            # Update task
DELETE /api/tasks/{id}            # Delete task
GET    /api/projects/{id}/tasks   # Get tasks by project
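
The list endpoint is described only as accepting filters. As a hedged sketch, the filters could be turned into a parameterized query as follows; the filter names status, project_id, and assigned_to and the function build_task_query are assumptions for illustration, not part of the documented API:

```python
from typing import Optional


def build_task_query(
    status: Optional[str] = None,
    project_id: Optional[str] = None,
    assigned_to: Optional[str] = None,
) -> tuple[str, list[str]]:
    """Build a parameterized SELECT for the tasks table from optional filters."""
    clauses: list[str] = []
    params: list[str] = []
    # Column names are fixed here; only the values come from the caller,
    # and they are always passed as bound parameters, never interpolated.
    for column, value in (
        ("status", status),
        ("project_id", project_id),
        ("assigned_to", assigned_to),
    ):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT * FROM tasks"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY created_at", params
```

The returned pair can be passed straight to sqlite3, for example conn.execute(*build_task_query(status="pending")).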

Models:

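from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class TaskCreate(BaseModel):
    project_id: str
    name: str
    data: dict
    assigned_to: Optional[str] = None  # Unassigned by default

class TaskUpdate(BaseModel):
    # Every field defaults to None so clients can send partial updates
    name: Optional[str] = None
    data: Optional[dict] = None
    status: Optional[str] = None
    assigned_to: Optional[str] = None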

class TaskResponse(BaseModel):
    id: str
    project_id: str
    name: str
    data: dict
    status: str
    assigned_to: Optional[str]
    created_at: datetime
    progress: float

Annotation API

Router: backend/routers/annotation.py

Endpoints:

GET    /api/annotations           # List annotations (with filters)
POST   /api/annotations           # Create annotation
GET    /api/annotations/{id}      # Get annotation by ID
PUT    /api/annotations/{id}      # Update annotation
GET    /api/tasks/{id}/annotations # Get annotations by task

Models:

from datetime import datetime

from pydantic import BaseModel

class AnnotationCreate(BaseModel):
    task_id: str
    user_id: str
    result: dict

class AnnotationUpdate(BaseModel):
    result: dict

class AnnotationResponse(BaseModel):
    id: str
    task_id: str
    user_id: str
    result: dict
    created_at: datetime
    updated_at: datetime

Data Models

Database Schema

-- Projects table
CREATE TABLE projects (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    config TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Tasks table
CREATE TABLE tasks (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    name TEXT NOT NULL,
    data TEXT NOT NULL,  -- JSON string
    status TEXT DEFAULT 'pending',
    assigned_to TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
);

-- Annotations table
CREATE TABLE annotations (
    id TEXT PRIMARY KEY,
    task_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    result TEXT NOT NULL,  -- JSON string
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
);
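
The ON DELETE CASCADE clauses above only take effect when foreign-key enforcement is enabled, which SQLite leaves off by default in most builds and which must be switched on per connection. A minimal sketch with the built-in sqlite3 module, using a trimmed copy of the schema; the helper names connect and demo_cascade are hypothetical:

```python
import sqlite3

# Trimmed copy of the schema: only the columns needed to show the cascade.
SCHEMA = """
CREATE TABLE projects (id TEXT PRIMARY KEY, name TEXT NOT NULL, config TEXT NOT NULL);
CREATE TABLE tasks (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
);
CREATE TABLE annotations (
    id TEXT PRIMARY KEY,
    task_id TEXT NOT NULL,
    FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign-key enforcement switched on."""
    conn = sqlite3.connect(path)
    # The setting is per connection, so every new connection needs it.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def demo_cascade() -> tuple[int, int]:
    """Delete a project and return the remaining (task, annotation) counts."""
    conn = connect()
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO projects VALUES ('p1', 'demo', '<View/>')")
    conn.execute("INSERT INTO tasks VALUES ('t1', 'p1')")
    conn.execute("INSERT INTO annotations VALUES ('a1', 't1')")
    conn.execute("DELETE FROM projects WHERE id = 'p1'")
    tasks = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
    annotations = conn.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]
    conn.close()
    return tasks, annotations
```

Without the PRAGMA, the DELETE would succeed but leave the task and annotation rows orphaned.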

State Management (Jotai Atoms)

Location: web/apps/lq_label/src/atoms/

// Project, Task, and AnnotationData are the interfaces defined under Frontend Components.

// projectAtoms.ts
import { atom } from 'jotai';

export const projectsAtom = atom<Project[]>([]);
export const currentProjectAtom = atom<Project | null>(null);
export const projectLoadingAtom = atom<boolean>(false);
export const projectErrorAtom = atom<string | null>(null);

// taskAtoms.ts
import { atom } from 'jotai';

// Filter shape inferred from the default value below
interface TaskFilter {
  status: Task['status'] | null;
  projectId: string | null;
}

export const tasksAtom = atom<Task[]>([]);
export const currentTaskAtom = atom<Task | null>(null);
export const taskLoadingAtom = atom<boolean>(false);
export const taskErrorAtom = atom<string | null>(null);
export const taskFilterAtom = atom<TaskFilter>({
  status: null,
  projectId: null,
});

// annotationAtoms.ts
import { atom } from 'jotai';

export const currentAnnotationAtom = atom<AnnotationData | null>(null);
export const annotationLoadingAtom = atom<boolean>(false);
export const annotationErrorAtom = atom<string | null>(null);
export const lsfInstanceAtom = atom<any>(null);

Correctness Properties

A property is a characteristic or behavior that should hold across all valid executions of a system; in essence, it is a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Project creation adds to list

For any valid project data (non-empty name and config), creating a project should result in the project appearing in the projects list with a unique ID.

Validates: Requirements 1.3

Property 2: Empty project name rejection

For any project creation attempt with an empty or whitespace-only name, the frontend should prevent submission and display a validation error.

Validates: Requirements 1.4

Property 3: Project deletion cascades

For any project with associated tasks, deleting the project should also delete all associated tasks and their annotations.

Validates: Requirements 1.7

Property 4: Task creation associates with project

For any valid task data with a valid project_id, creating a task should result in the task being associated with that project and appearing in the project's task list.

Validates: Requirements 2.2

Property 5: Task status filtering

For any task status filter value, the displayed task list should only contain tasks matching that status.

Validates: Requirements 2.4

Property 6: Task completion updates status

For any task where all data items have been annotated, the task status should automatically update to 'completed'.

Validates: Requirements 2.7

Property 7: User task assignment filtering

For any user, the task list view should only display tasks assigned to that user.

Validates: Requirements 3.1

Property 8: Annotation saves update progress

For any annotation save operation, the associated task's progress should be updated to reflect the number of completed annotations.

Validates: Requirements 3.3
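
Properties 6 and 8 together imply that progress and status can both be derived from two counts. The following is a sketch under stated assumptions: progress is taken to be a fraction between 0.0 and 1.0 (the document only says it is a float), and the function name summarize_task and its parameters are hypothetical:

```python
def summarize_task(total_items: int, annotated_items: int) -> tuple[float, str]:
    """Return (progress, status) for a task.

    progress is the annotated fraction in [0.0, 1.0]; status is one of the
    three values from the Task interface: pending, in_progress, completed.
    """
    if total_items < 0 or annotated_items < 0:
        raise ValueError("counts must be non-negative")
    if total_items == 0:
        # Assumption: a task with no data items has nothing to complete yet.
        return 0.0, "pending"
    done = min(annotated_items, total_items)  # Never report more than 100%
    progress = done / total_items
    if done == 0:
        return progress, "pending"
    if done == total_items:
        return progress, "completed"
    return progress, "in_progress"
```

Calling this after every annotation save would keep both fields consistent, since neither is stored independently of the counts.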

Property 9: Empty annotation rejection

For any annotation submission attempt with empty or null result data, the frontend should prevent submission and display an error message.

Validates: Requirements 3.4

Property 10: LabelStudio config initialization

For any annotation view load, the LabelStudio editor should be initialized with the project's annotation config.

Validates: Requirements 3.7

Property 11: API error responses

For any invalid API request (missing required fields, invalid IDs, etc.), the backend should return a 4xx status code with a descriptive error message.

Validates: Requirements 5.6

Property 12: Project ID validation on task creation

For any task creation request, if the project_id does not exist in the database, the backend should reject the request with a 404 error.

Validates: Requirements 6.4

Property 13: Task ID validation on annotation creation

For any annotation creation request, if the task_id does not exist in the database, the backend should reject the request with a 404 error.

Validates: Requirements 6.5
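
Properties 12 and 13 describe the same check against different tables. A minimal sketch of a shared existence check using the built-in sqlite3 module; NotFoundError and ensure_exists are hypothetical names, and a FastAPI handler would still need to translate the exception into a 404 response:

```python
import sqlite3


class NotFoundError(Exception):
    """Raised when a referenced row is missing; intended to map to HTTP 404."""

    status_code = 404


# Whitelist of tables, because table names cannot be bound as SQL parameters.
_TABLES = {"project": "projects", "task": "tasks"}


def ensure_exists(conn: sqlite3.Connection, resource: str, resource_id: str) -> None:
    """Raise NotFoundError unless a row with this id exists."""
    table = _TABLES[resource]
    row = conn.execute(
        f"SELECT 1 FROM {table} WHERE id = ?", (resource_id,)
    ).fetchone()
    if row is None:
        raise NotFoundError(f"{resource} '{resource_id}' not found")
```

A task-creation handler would call ensure_exists(conn, "project", project_id) before the INSERT, and an annotation-creation handler would call ensure_exists(conn, "task", task_id).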

Property 14: JSON serialization round-trip

For any valid annotation result object, serializing to JSON and deserializing should produce an equivalent object.

Validates: Requirements 6.7
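
Because the annotations table stores result as a JSON string in a TEXT column, the round trip can be sketched with the standard json module. The helper names to_column and from_column are hypothetical:

```python
import json


def to_column(result: dict) -> str:
    """Serialize an annotation result for the TEXT column."""
    # ensure_ascii=False keeps non-ASCII labels readable in the database.
    return json.dumps(result, ensure_ascii=False, sort_keys=True)


def from_column(raw: str) -> dict:
    """Parse a stored annotation result back into a dict."""
    return json.loads(raw)
```

The property only holds for JSON-native values: tuples come back as lists and non-string dictionary keys come back as strings, so a property test should generate dicts, lists, strings, numbers, booleans, and None only.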

Property 15: Navigation menu highlighting

For any active route, the corresponding menu item in the sidebar should be visually highlighted.

Validates: Requirements 7.3

Property 16: Editor cleanup on unmount

For any LabelStudio editor instance, when the annotation view unmounts, all editor resources should be properly cleaned up (event listeners, DOM references, MST subscriptions).

Validates: Requirements 8.8

Error Handling

Frontend Error Handling

  1. API Request Errors

    • Use try-catch blocks for all API calls
    • Display user-friendly error messages using Toast notifications
    • Log errors to console for debugging
    • Set error atoms for component-level error display
  2. Form Validation Errors

    • Validate inputs before submission
    • Display inline validation errors
    • Prevent form submission until valid
  3. Editor Errors

    • Wrap LabelStudio initialization in try-catch
    • Display error state if editor fails to load
    • Implement cleanup to prevent memory leaks
  4. Error Boundary

    • Implement React Error Boundary at app level
    • Display fallback UI for unhandled errors
    • Log errors for monitoring

Backend Error Handling

  1. Validation Errors

    • Use Pydantic models for request validation
    • Return 422 status code with validation details
    • Provide clear error messages
  2. Not Found Errors

    • Return 404 status code for missing resources
    • Include resource type and ID in error message
  3. Database Errors

    • Catch SQLite exceptions
    • Return 500 status code for database errors
    • Log errors for debugging
  4. CORS Errors

    • Configure CORS middleware properly
    • Allow frontend origin in development and production
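
The status-code rules above can be sketched as one translation function. This is an illustration only: ValidationFailed and ResourceNotFound are hypothetical exception classes, and FastAPI already returns 422 on its own when a Pydantic request model fails validation.

```python
import sqlite3


class ValidationFailed(Exception):
    """Hypothetical error for a request that fails a business rule."""


class ResourceNotFound(Exception):
    """Hypothetical error for a missing project, task, or annotation."""


def to_http_error(exc: Exception) -> tuple[int, dict]:
    """Translate an exception into (status_code, JSON body)."""
    if isinstance(exc, ValidationFailed):
        return 422, {"detail": str(exc)}
    if isinstance(exc, ResourceNotFound):
        return 404, {"detail": str(exc)}
    if isinstance(exc, sqlite3.Error):
        # Do not leak SQL details to the client; log them server-side instead.
        return 500, {"detail": "database error"}
    return 500, {"detail": "internal server error"}
```

Keeping the mapping in one place makes Property 11 easier to test, because every handler produces error bodies of the same shape.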

Testing Strategy

Unit Testing

Frontend:

  • Test individual components with React Testing Library
  • Test utility functions and hooks
  • Test form validation logic
  • Test state management atoms
  • Mock API calls with MSW (Mock Service Worker)

Backend:

  • Test API endpoints with pytest
  • Test database operations
  • Test request/response models
  • Test error handling

Example Unit Tests:

  • ProjectForm validates empty name
  • TaskListView filters by status
  • API returns 404 for invalid project ID
  • Database cascade deletes work correctly

Property-Based Testing

Configuration:

  • Use fast-check for frontend property tests
  • Use Hypothesis for backend property tests
  • Run minimum 100 iterations per property test
  • Tag each test with feature name and property number

Frontend Property Tests:

  • Property 2: Empty project name rejection
  • Property 5: Task status filtering
  • Property 7: User task assignment filtering
  • Property 9: Empty annotation rejection
  • Property 15: Navigation menu highlighting

Backend Property Tests:

  • Property 1: Project creation adds to list
  • Property 3: Project deletion cascades
  • Property 4: Task creation associates with project
  • Property 6: Task completion updates status
  • Property 8: Annotation saves update progress
  • Property 11: API error responses
  • Property 12: Project ID validation on task creation
  • Property 13: Task ID validation on annotation creation
  • Property 14: JSON serialization round-trip

Integration Property Tests:

  • Property 10: LabelStudio config initialization
  • Property 16: Editor cleanup on unmount

Test Tag Format:

// Frontend example
import fc from 'fast-check';

test('Feature: annotation-platform, Property 2: Empty project name rejection', () => {
  fc.assert(
    fc.property(fc.string(), (name) => {
      // Test that whitespace-only names are rejected
    }),
    { numRuns: 100 } // Minimum iterations per property test
  );
});

# Backend example
from hypothesis import given, settings, strategies as st

@settings(max_examples=100)  # Minimum iterations per property test
@given(st.text())
def test_property_12_project_id_validation(project_id):
    """Feature: annotation-platform, Property 12: Project ID validation on task creation"""
    # Test that invalid project IDs are rejected

Integration Testing

  • Test complete user flows (create project → create task → annotate)
  • Test API integration with frontend
  • Test database transactions
  • Test LabelStudio editor integration

End-to-End Testing

  • Use Cypress for E2E tests
  • Test critical user journeys
  • Test across different browsers
  • Test responsive design

Implementation Notes

Frontend Development

  1. Initialize Nx App

    cd web
    nx generate @nx/react:application lq_label --style=scss --bundler=webpack
    
  2. Project Structure

    web/apps/lq_label/
    ├── src/
    │   ├── app/
    │   │   ├── App.tsx
    │   │   └── App.module.scss
    │   ├── components/
    │   │   ├── Layout/
    │   │   ├── ProjectForm/
    │   │   └── TaskForm/
    │   ├── views/
    │   │   ├── ProjectListView/
    │   │   ├── ProjectDetailView/
    │   │   ├── TaskListView/
    │   │   └── AnnotationView/
    │   ├── atoms/
    │   │   ├── projectAtoms.ts
    │   │   ├── taskAtoms.ts
    │   │   └── annotationAtoms.ts
    │   ├── services/
    │   │   └── api.ts
    │   ├── utils/
    │   │   └── helpers.ts
    │   ├── main.tsx
    │   └── index.html
    
  3. Key Dependencies

    • react-router-dom (routing)
    • jotai (state management)
    • @humansignal/ui (UI components)
    • @humansignal/editor (annotation editor)
    • axios (HTTP client)
  4. Styling Approach

    • Use Tailwind CSS utility classes
    • Use SCSS modules for component-specific styles
    • Follow semantic token naming from design tokens

Backend Development

  1. Project Structure

    backend/
    ├── main.py
    ├── database.py
    ├── models.py
    ├── routers/
    │   ├── project.py
    │   ├── task.py
    │   └── annotation.py
    ├── services/
    │   ├── project_service.py
    │   ├── task_service.py
    │   └── annotation_service.py
    ├── schemas/
    │   ├── project.py
    │   ├── task.py
    │   └── annotation.py
    └── requirements.txt
    
  2. Key Dependencies

    • fastapi
    • uvicorn
    • pydantic
    • sqlite3 (built-in)
    • python-multipart (for file uploads)
  3. Database Initialization

    • Create database on startup
    • Run migrations if needed
    • Use context manager for connections
  4. CORS Configuration

    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware

    app = FastAPI()

    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:4200"],  # Frontend dev server
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
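
The Database Initialization step above (create on startup, context manager for connections) could look roughly like the following. This is a sketch under stated assumptions: the environment variable name LQ_LABEL_DB, the default file name, and the helpers get_connection and init_db are all hypothetical, and only the projects table is shown.

```python
import os
import sqlite3
from contextlib import contextmanager
from typing import Iterator

# Assumption: the database path comes from an environment variable.
DB_PATH = os.environ.get("LQ_LABEL_DB", "lq_label.db")


@contextmanager
def get_connection(path: str = DB_PATH) -> Iterator[sqlite3.Connection]:
    """Yield a connection that commits on success and rolls back on error."""
    conn = sqlite3.connect(path)
    # Needed for the ON DELETE CASCADE clauses in the schema to take effect.
    conn.execute("PRAGMA foreign_keys = ON")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def init_db(path: str = DB_PATH) -> None:
    """Create the tables on startup if they do not exist yet."""
    with get_connection(path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS projects ("
            "id TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT, "
            "config TEXT NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
        )
```

Because of IF NOT EXISTS, init_db is safe to call on every startup.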
    

LabelStudio Integration

  1. Dynamic Import

    • Import @humansignal/editor dynamically in AnnotationView
    • Prevents loading editor code until needed
  2. Instance Lifecycle

    • Create instance when task loads
    • Destroy instance on unmount or task change
    • Clean up MST subscriptions
  3. Config Loading

    • Fetch project config from API
    • Pass config to LabelStudio constructor
    • Handle config errors gracefully
  4. Annotation Serialization

    • Use MST onSnapshot to observe changes
    • Serialize annotation to JSON
    • Store in Jotai atom for display/save

Deployment Considerations

Frontend Deployment

  • Build with nx build lq_label --configuration=production (older Nx versions also accept --prod)
  • Output to dist/apps/lq_label
  • Serve as static files
  • Configure base href if not at root

Backend Deployment

  • Run with uvicorn main:app --host 0.0.0.0 --port 8000
  • Use environment variables for configuration
  • Set up proper CORS for production domain
  • Consider running uvicorn workers under gunicorn for production

Database

  • SQLite file location configurable via environment variable
  • Backup strategy for production
  • Consider migration to PostgreSQL for scale