Add code sync and remote file transfer

lxylxy123321 1 week ago
parent
commit
8f80e3d08b
9 changed files with 152 additions and 511 deletions
  1. CLAUDE.md (+4 -118)
  2. backend/Dockerfile (+5 -1)
  3. backend/app/core/job_queue.py (+7 -2)
  4. backend/app/core/remote_executor.py (+39 -3)
  5. backend/app/engines/remote_train.py (+6 -23)
  6. backend/entrypoint.sh (+26 -0)
  7. deploy.sh (+24 -0)
  8. deploy_remote.sh (+21 -0)
  9. result.txt (+20 -364)

+ 4 - 118
CLAUDE.md

@@ -1,124 +1,6 @@
 # CLAUDE.md
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
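-## Project overview
-
-A PEFT-based fine-tuning platform with a decoupled frontend and backend. It supports three model families: text (LLaMA/Qwen), vision (ViT/CLIP), and multimodal (LLaVA/Qwen-VL), with a complete MLOps pipeline.
-
-## Common commands
-
-### Backend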
-
-```bash
-cd backend
-pip install -r requirements.txt
-uvicorn main:app --host 0.0.0.0 --port 8000 --reload
-```
-
-API docs: `http://192.168.91.253:8000/docs`
-
-### Frontend
-
-```bash
-cd frontend
-npm install
-npm run dev          # development
-npm run build        # build
-npm run preview      # preview the build output
-```
-
-## Architecture
-
-### Three-tier environment variables
-
-| File | Scope | Contents |
-|------|------|------|
-| `.env` | Shared globally | HF Token, GPU, ModelScope |
-| `backend/.env` | Backend only | HOST/PORT/CORS, database, training parameters |
-| `frontend/.env` | Frontend only | VITE_API_BASE_URL, VITE_WS_BASE_URL |
-
-The `Settings` class in `backend/app/config.py` loads `backend/.env` via `pydantic-settings`.
-
-### Backend layers
-
-```
-main.py → FastAPI entry point (mounts routers + CORS + health check)
-├── api/        REST routing layer (thin, delegates to services)
-│   ├── models.py       GET /api/v1/models, POST /download
-│   ├── datasets.py     upload / preview / validate / list / delete
-│   ├── training.py     training job CRUD + cancel + log streaming
-│   ├── evaluation.py   evaluation runs + result queries
-│   └── deployment.py   adapter merge + export
-├── core/       infrastructure
-│   ├── db.py           SQLAlchemy async + aiosqlite
-│   ├── job_queue.py    training job state model + state machine
-│   ├── websocket.py    WebSocket broadcast (progress / errors / heartbeat)
-│   └── logging.py      structured logging
-├── services/   business logic layer
-│   ├── model_service.py     model download / caching
-│   ├── dataset_service.py   upload / format detection
-│   ├── training_service.py  job orchestration
-│   ├── eval_service.py      evaluation
-│   └── deploy_service.py    adapter export
-├── engines/    per-model-type training engines (BaseEngine abstract interface)
-│   ├── base.py             load_model / get_peft_config / preprocess_dataset / train
-│   ├── text_engine.py      LLaMA/Qwen
-│   ├── vision_engine.py    ViT/CLIP
-│   └── multimodal_engine.py LLaVA/Qwen-VL
-├── peft/       PEFT config factories (LoRA/QLoRA/IA3/AdaLoRA/PrefixTuning)
-├── schemas/    Pydantic request/response models
-└── preprocessors/ data preprocessing (not yet implemented)
-```
-
-### Frontend structure
-
-```
-src/
-├── api/
-│   ├── client.ts      Axios instance + interceptors
-│   └── websocket.ts   WebSocket manager + auto-reconnect
-├── stores/
-│   └── trainingStore.ts  Zustand training job state
-├── components/layout/Layout.tsx  sidebar navigation + content area
-├── pages/            Dashboard / Models / Datasets / Training / Evaluation / Deployment
-└── App.tsx           React Router route definitions
-```
-
-### Key data flow
-
-1. The frontend creates a training job with a POST to `/api/v1/training/jobs`
-2. The backend `training_service` creates the job record and adds it to the job_queue
-3. A training engine derived from `BaseEngine` (text/vision/multimodal) runs the training
-4. `app/core/websocket.py` pushes progress to the frontend in real time over `/ws/training/{job_id}`
-5. The frontend's `wsManager.subscribe()` receives the messages and updates `trainingStore`
-
-### WebSocket message types
-
-| type | Description | Key fields |
-|------|------|----------|
-| progress | Training progress | epoch, step, total_steps, loss, learning_rate |
-| epoch_done | Epoch finished | epoch, eval_loss, eval_accuracy |
-| completed | Training finished | total_time_seconds, adapter_path |
-| error | Error | message |
-| heartbeat | Keep-alive heartbeat | timestamp |
-
-### Task state machine
-
-```
-pending → queued → preprocessing → training → completed
-                                                   ↓
-                            cancelled ←── any state ← evaluating → evaluation_done
-                            failed   ←── any state ←
-```
-
-### Deployment info
-
-- Server address: `192.168.91.253`
-- Project path: `/root/Fine-tuning`
-- Data directory: `/root/Fine-tuning/backend/data`
-- Database: `/root/Fine-tuning/backend/data/finetuning.db`
-
 ## Language requirements
 
 ### Basic rules
@@ -134,3 +16,7 @@ pending → queued → preprocessing → training → completed
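 - Explanatory text: Simplified Chinese
 - Code blocks, terminal commands, JSON, YAML, error logs: keep in the original English, unchanged
 - Lists, steps, and conclusions are always written in Chinese
+
+### Security requirements
+
+- Do not attempt to connect to remote hosts over SSH; give the commands to the user to run instead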

+ 5 - 1
backend/Dockerfile

@@ -8,7 +8,7 @@ WORKDIR /app
 RUN sed -i 's|deb.debian.org|mirrors.aliyun.com|g' /etc/apt/sources.list.d/debian.sources && \
     sed -i 's|security.debian.org|mirrors.aliyun.com|g' /etc/apt/sources.list.d/debian.sources
 
-RUN apt-get update && apt-get install -y git openssh-client sshpass && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y git openssh-client sshpass rsync && rm -rf /var/lib/apt/lists/*
 
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
@@ -21,3 +21,7 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
     CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8010/health')" || exit 1
 
 CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8010"]
+
+COPY entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]

+ 7 - 2
backend/app/core/job_queue.py

@@ -190,12 +190,17 @@ class JobQueue:
 
             # Decide whether to run remotely
             if settings.use_remote_compute:
-                # Remote training mode
+                # Remote training mode: the dataset path has already been resolved by the code above
+                if not dataset_path:
+                    dataset_path = self._find_dataset_path(dataset_id)
+                if not dataset_path:
+                    raise FileNotFoundError(f"Dataset not found: {dataset_id}")
+
                 self.update_job(job_id, status=JobStatus.TRAINING)
                 await self._notify_callbacks()
 
                 from app.core.remote_executor import run_training_remote, is_process_running
-                pid = run_training_remote(job_id, model_id, model_type, dataset_id, config)
+                pid = run_training_remote(job_id, model_id, model_type, dataset_path, config)
 
                 if not pid:
                     raise RuntimeError("Failed to launch remote training")

+ 39 - 3
backend/app/core/remote_executor.py

@@ -36,6 +36,26 @@ def scp_to_remote(local_path: str, remote_path: str) -> tuple[int, str, str]:
         return -1, "", str(e)
 
 
+def scp_to_remote_dir(local_path: str, remote_path: str) -> tuple[int, str, str]:
+    """Recursively copy a local directory to the remote host over SCP."""
+    target = f"{settings.compute_node_ssh_user}@{settings.compute_node_host}"
+    scp_args = ["scp", "-r", *_get_ssh_prefix(), "-P", str(settings.compute_node_ssh_port)]
+    if settings.compute_node_ssh_key:
+        scp_args += ["-i", settings.compute_node_ssh_key]
+    elif settings.compute_node_ssh_password:
+        scp_args = ["sshpass", "-p", settings.compute_node_ssh_password] + scp_args
+    scp_args += [local_path, f"{target}:{remote_path}"]
+
+    try:
+        proc = subprocess.run(scp_args, capture_output=True, text=True, timeout=120)
+        clean_stderr = "\n".join(line for line in proc.stderr.split("\n")
+                                  if not line.startswith("Warning:"))
+        return proc.returncode, proc.stdout, clean_stderr
+    except Exception as e:
+        logger.error(f"SCP dir failed: {e}")
+        return -1, "", str(e)
+
+
 def ssh_exec(cmd: str, timeout: int | None = None) -> tuple[int, str, str]:
     """Run a command on the compute node over SSH and return (exit_code, stdout, stderr)."""
     if not settings.use_remote_compute:
@@ -79,12 +99,13 @@ def run_training_remote(
     job_id: str,
     model_id: str,
     model_type: str,
-    dataset_id: str,
+    dataset_path: str,
     config: dict[str, Any],
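 ) -> str | None:
     """Launch a training job on the compute node (via docker exec, in the background).
 
     Copies the config file to the remote host over SCP, then starts training inside the container.
+    dataset_path is resolved in advance by the main node and passed straight to the remote script.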
     """
     import tempfile
 
@@ -102,13 +123,28 @@ def run_training_remote(
         logger.error(f"SCP config file failed: ret_code={ret_code}")
         return None
 
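-    # Start training inside the container (no longer relies on a stdin pipe)
+    # Also copy the dataset to the remote node (SCP into the datasets/ directory under compute_node_remote_data_dir)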
+    remote_dataset_name = os.path.basename(dataset_path)
+    remote_dataset_path = f"{settings.compute_node_remote_data_dir}/datasets/{remote_dataset_name}"
+
+    if os.path.isdir(dataset_path):
+        # Directory: use scp -r
+        ret_code, _, _ = scp_to_remote_dir(dataset_path, remote_dataset_path)
+    else:
+        # Single file: plain scp
+        ret_code, _, _ = scp_to_remote(dataset_path, remote_dataset_path)
+
+    if ret_code != 0:
+        logger.error(f"SCP dataset failed: ret_code={ret_code}")
+        return None
+
+    # Start training inside the container
     remote_cmd = (
         f"docker exec -w {settings.compute_node_workdir} "
         f"{settings.compute_node_docker_container} "
         f"bash -c '"
         f"nohup {settings.compute_node_python} -m app.engines.remote_train "
-        f"{job_id} {model_id} {model_type} {dataset_id} {remote_config_path} "
+        f"{job_id} {model_id} {model_type} {remote_dataset_path} {remote_config_path} "
         f"</dev/null >/tmp/train_{job_id}.log 2>&1 & echo $!'"
     )
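
The `remote_cmd` string above interpolates `job_id`, `model_id`, and the dataset path directly into a `bash -c '...'` argument, so a dataset file name containing a space or a quote would split or break the command. A minimal sketch of one way to guard against that with `shlex.quote` from the standard library; the helper name `build_remote_train_cmd` and its parameters are hypothetical stand-ins for the `settings.compute_node_*` values and are not part of this commit:

```python
import shlex


def build_remote_train_cmd(
    workdir: str,
    container: str,
    python_bin: str,
    job_id: str,
    model_id: str,
    model_type: str,
    remote_dataset_path: str,
    remote_config_path: str,
) -> str:
    """Build the docker exec command with every dynamic value shell-quoted."""
    # Quote each positional argument passed to app.engines.remote_train.
    train_args = " ".join(
        shlex.quote(arg)
        for arg in (job_id, model_id, model_type, remote_dataset_path, remote_config_path)
    )
    log_path = shlex.quote(f"/tmp/train_{job_id}.log")
    # The inner script runs inside the container; `$!` prints the background PID.
    inner = (
        f"nohup {shlex.quote(python_bin)} -m app.engines.remote_train {train_args} "
        f"</dev/null >{log_path} 2>&1 & echo $!"
    )
    # Quote the whole inner script once more so the remote shell passes it to bash -c intact.
    return (
        f"docker exec -w {shlex.quote(workdir)} {shlex.quote(container)} "
        f"bash -c {shlex.quote(inner)}"
    )
```

The returned string could then be handed to `ssh_exec` in place of the hand-built f-string, assuming `ssh_exec` forwards it to the remote shell unchanged.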
 

+ 6 - 23
backend/app/engines/remote_train.py

@@ -72,7 +72,7 @@ class FileProgressCallback:
                        eval_accuracy=metrics.get("eval_accuracy"))
 
 
-async def run_training(job_id: str, model_id: str, model_type: str, dataset_id: str, config: dict):
+async def run_training(job_id: str, model_id: str, model_type: str, dataset_path: str, config: dict):
     """Run a single training job (entry point for remote invocation)."""
     from app.config import get_settings
     from app.core.logging import logger
@@ -81,26 +81,9 @@ async def run_training(job_id: str, model_id: str, model_type: str, dataset_id:
     _init_log_file(settings.data_dir, job_id)
 
     try:
-        # Look up the dataset
-        from app.core.db import async_session, DatasetRecord
-        from sqlalchemy import select
-
-        dataset_path = None
-        async with async_session() as session:
-            result = await session.execute(select(DatasetRecord).where(
-                (DatasetRecord.id == dataset_id) | (DatasetRecord.name == dataset_id)
-            ))
-            record = result.scalar_one_or_none()
-            if record:
-                dataset_path = record.file_path
-
-        if not dataset_path:
-            upload_path = settings.uploads_dir / dataset_id
-            if upload_path.exists():
-                dataset_path = str(upload_path)
-
-        if not dataset_path:
-            raise FileNotFoundError(f"Dataset not found: {dataset_id}")
+        # dataset_path is passed in directly by the main node
+        if not dataset_path or not Path(dataset_path).exists():
+            raise FileNotFoundError(f"Dataset not found: {dataset_path}")
 
         _write_log(type="status", status="preprocessing")
 
@@ -159,9 +142,9 @@ async def run_training(job_id: str, model_id: str, model_type: str, dataset_id:
 
 
 def main():
-    """Command-line entry point: python -m app.engines.remote_train <job_id> <model_id> <model_type> <dataset_id> <config_file>"""
+    """Command-line entry point: python -m app.engines.remote_train <job_id> <model_id> <model_type> <dataset_path> <config_file>"""
     if len(sys.argv) < 6:
-        print("Usage: python -m app.engines.remote_train <job_id> <model_id> <model_type> <dataset_id> <config_file>")
+        print("Usage: python -m app.engines.remote_train <job_id> <model_id> <model_type> <dataset_path> <config_file>")
         sys.exit(1)
 
     job_id = sys.argv[1]
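
The entry point reads five positional arguments from `sys.argv` and prints a hand-written usage string. As an illustration only, the same interface could be expressed with `argparse`, which generates the usage message and the missing-argument error automatically. `parse_cli` is a hypothetical helper and not part of this commit:

```python
import argparse


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse the five positional arguments that remote_train expects."""
    parser = argparse.ArgumentParser(prog="python -m app.engines.remote_train")
    parser.add_argument("job_id")
    parser.add_argument("model_id")
    parser.add_argument("model_type")
    # After this commit the fourth argument is a path on the compute node, not a dataset ID.
    parser.add_argument("dataset_path")
    parser.add_argument("config_file")
    return parser.parse_args(argv)
```

Calling `parse_cli(sys.argv[1:])` would exit with a usage error when an argument is missing, which matches the intent of the `len(sys.argv) < 6` check.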

+ 26 - 0
backend/entrypoint.sh

@@ -0,0 +1,26 @@
+#!/bin/bash
+# On container start, automatically sync the backend code to the 253 training node
+
+REMOTE_USER="${COMPUTE_NODE_SSH_USER:-root}"
+REMOTE_HOST="${COMPUTE_NODE_HOST}"
+REMOTE_PASS="${COMPUTE_NODE_SSH_PASSWORD}"
+REMOTE_DIR="/root/Fine-tuning/backend"
+
+if [ -n "$REMOTE_HOST" ]; then
+  echo "=> Syncing backend code to compute node ${REMOTE_HOST} ..."
+  if [ -n "$REMOTE_PASS" ]; then
+    sshpass -p "$REMOTE_PASS" rsync -avz --delete \
+      -e "ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5" \
+      /app/ ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}/
+  else
+    rsync -avz --delete \
+      -e "ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5" \
+      /app/ ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}/
+  fi
+  echo "=> Sync done."
+else
+  echo "=> No compute node configured, skipping code sync."
+fi
+
+# Start the main process
+exec "$@"

+ 24 - 0
deploy.sh

@@ -0,0 +1,24 @@
+#!/bin/bash
+# One-step deploy: pull latest code → build backend → sync to 253 → restart
+
+set -e
+
+PROJECT_DIR="/root/Fine-tuning"
+REMOTE_USER="root"
+REMOTE_HOST="192.168.91.253"
+REMOTE_PASS="ictrek"
+
+cd ${PROJECT_DIR}
+
+echo "=== Step 1: Git pull ==="
+git pull
+
+echo "=== Step 2: Build backend ==="
+docker compose up -d --build backend
+
+echo "=== Step 3: Sync backend to 253 ==="
+sshpass -p "${REMOTE_PASS}" rsync -avz --delete \
+  -e "ssh -o StrictHostKeyChecking=no -p 22" \
+  ${PROJECT_DIR}/backend/ ${REMOTE_USER}@${REMOTE_HOST}:/root/Fine-tuning/backend/
+
+echo "=== Deploy done ==="

+ 21 - 0
deploy_remote.sh

@@ -0,0 +1,21 @@
+#!/bin/bash
+# Sync the backend code from 151 to the 253 training node
+
+REMOTE_USER="root"
+REMOTE_HOST="192.168.91.253"
+REMOTE_PASS="ictrek"
+REMOTE_DIR="/root/Fine-tuning"
+LOCAL_BACKEND="./backend"
+
+echo "=> Syncing backend to ${REMOTE_HOST}:${REMOTE_DIR}/backend ..."
+
+sshpass -p "$REMOTE_PASS" rsync -avz --delete \
+  -e "ssh -o StrictHostKeyChecking=no -p 22" \
+  ${LOCAL_BACKEND}/ ${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}/backend/
+
+if [ $? -eq 0 ]; then
+  echo "=> Sync done."
+else
+  echo "=> Sync failed!"
+  exit 1
+fi

+ 20 - 364
result.txt

@@ -1,364 +1,20 @@
-finetune-backend  | 2026-05-20 05:13:34 | INFO     | peft-platform | Remote training launched in container: job=a52d395e-d3c8-40d2-9be3-1839f597dc7f, container_pid=12699
-finetune-backend  | INFO:     127.0.0.1:59032 - "GET /health HTTP/1.1" 200 OK
-finetune-backend  | INFO:     172.20.0.4:56196 - "GET /api/v1/training/jobs HTTP/1.0" 500 Internal Server Error
-finetune-backend  | ERROR:    Exception in ASGI application
-finetune-backend  | Traceback (most recent call last):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 2443, in connect
-finetune-backend  |     return await connect_utils._connect(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1218, in _connect
-finetune-backend  |     conn = await _connect_addr(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1054, in _connect_addr
-finetune-backend  |     return await __connect_addr(params, True, *args)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1102, in __connect_addr
-finetune-backend  |     await connected
-finetune-backend  | asyncio.exceptions.CancelledError
-finetune-backend  | 
-finetune-backend  | During handling of the above exception, another exception occurred:
-finetune-backend  | 
-finetune-backend  | Traceback (most recent call last):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 421, in run_asgi
-finetune-backend  |     result = await app(  # type: ignore[func-returns-value]
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 56, in __call__
-finetune-backend  |     return await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1159, in __call__
-finetune-backend  |     await super().__call__(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 90, in __call__
-finetune-backend  |     await self.middleware_stack(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
-finetune-backend  |     await self.app(scope, receive, _send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 88, in __call__
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
-finetune-backend  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
-finetune-backend  |     await app(scope, receive, sender)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 660, in __call__
-finetune-backend  |     await self.middleware_stack(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 680, in app
-finetune-backend  |     await route.handle(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 134, in app
-finetune-backend  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
-finetune-backend  |     await app(scope, receive, sender)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 120, in app
-finetune-backend  |     response = await f(request)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 674, in app
-finetune-backend  |     raw_response = await run_endpoint_function(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 328, in run_endpoint_function
-finetune-backend  |     return await dependant.call(**values)
-finetune-backend  |   File "/app/app/api/training.py", line 20, in list_training_jobs
-finetune-backend  |     items = await training_service.list_training_jobs()
-finetune-backend  |   File "/app/app/services/training_service.py", line 87, in list_training_jobs
-finetune-backend  |     result = await session.execute(select(TrainingJobModel).order_by(TrainingJobModel.created_at.desc()))
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 449, in execute
-finetune-backend  |     result = await greenlet_spawn(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 201, in greenlet_spawn
-finetune-backend  |     result = context.throw(*sys.exc_info())
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
-finetune-backend  |     return self._execute_internal(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2239, in _execute_internal
-finetune-backend  |     conn = self._connection_for_bind(bind)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2108, in _connection_for_bind
-finetune-backend  |     return trans._connection_for_bind(engine, execution_options)
-finetune-backend  |   File "<string>", line 2, in _connection_for_bind
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
-finetune-backend  |     ret_value = fn(self, *arg, **kw)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1187, in _connection_for_bind
-finetune-backend  |     conn = bind.connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3293, in connect
-finetune-backend  |     return self._connection_cls(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 143, in __init__
-finetune-backend  |     self._dbapi_connection = engine.raw_connection()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3317, in raw_connection
-finetune-backend  |     return self.pool.connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 448, in connect
-finetune-backend  |     return _ConnectionFairy._checkout(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1272, in _checkout
-finetune-backend  |     fairy = _ConnectionRecord.checkout(pool)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
-finetune-backend  |     rec = pool._do_get()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
-finetune-backend  |     with util.safe_reraise():
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
-finetune-backend  |     raise exc_value.with_traceback(exc_tb)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 175, in _do_get
-finetune-backend  |     return self._create_connection()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 389, in _create_connection
-finetune-backend  |     return _ConnectionRecord(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
-finetune-backend  | INFO:     172.20.0.4:56212 - "GET /api/v1/models/ HTTP/1.0" 500 Internal Server Error
-finetune-backend  |     self.__connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
-finetune-backend  |     with util.safe_reraise():
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
-finetune-backend  |     raise exc_value.with_traceback(exc_tb)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
-finetune-backend  |     self.dbapi_connection = connection = pool._invoke_creator(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 667, in connect
-finetune-backend  |     return dialect.connect(*cargs_tup, **cparams)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 630, in connect
-finetune-backend  |     return self.loaded_dbapi.connect(*cargs, **cparams)  # type: ignore[no-any-return]  # NOQA: E501
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 955, in connect
-finetune-backend  |     await_only(creator_fn(*arg, **kw)),
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 132, in await_only
-finetune-backend  |     return current.parent.switch(awaitable)  # type: ignore[no-any-return,attr-defined] # noqa: E501
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
-finetune-backend  |     value = await result
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 2442, in connect
-finetune-backend  |     async with compat.timeout(timeout):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 179, in __aexit__
-finetune-backend  |     self._do_exit(exc_type)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 265, in _do_exit
-finetune-backend  |     raise asyncio.TimeoutError
-finetune-backend  | asyncio.exceptions.TimeoutError
-finetune-backend  | ERROR:    Exception in ASGI application
-finetune-backend  | Traceback (most recent call last):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 2443, in connect
-finetune-backend  |     return await connect_utils._connect(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1218, in _connect
-finetune-backend  |     conn = await _connect_addr(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1054, in _connect_addr
-finetune-backend  |     return await __connect_addr(params, True, *args)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1102, in __connect_addr
-finetune-backend  |     await connected
-finetune-backend  | asyncio.exceptions.CancelledError
-finetune-backend  | 
-finetune-backend  | During handling of the above exception, another exception occurred:
-finetune-backend  | 
-finetune-backend  | Traceback (most recent call last):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 421, in run_asgi
-finetune-backend  |     result = await app(  # type: ignore[func-returns-value]
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 56, in __call__
-finetune-backend  |     return await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1159, in __call__
-finetune-backend  |     await super().__call__(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 90, in __call__
-finetune-backend  |     await self.middleware_stack(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
-finetune-backend  |     await self.app(scope, receive, _send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 88, in __call__
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
-finetune-backend  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
-finetune-backend  |     await app(scope, receive, sender)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  | INFO:     172.20.0.4:56228 - "GET /api/v1/datasets/ HTTP/1.0" 500 Internal Server Error
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 660, in __call__
-finetune-backend  |     await self.middleware_stack(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 680, in app
-finetune-backend  |     await route.handle(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 134, in app
-finetune-backend  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
-finetune-backend  |     await app(scope, receive, sender)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 120, in app
-finetune-backend  |     response = await f(request)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 674, in app
-finetune-backend  |     raw_response = await run_endpoint_function(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 328, in run_endpoint_function
-finetune-backend  |     return await dependant.call(**values)
-finetune-backend  |   File "/app/app/api/models.py", line 13, in list_models
-finetune-backend  |     models = await model_service.list_cached_models()
-finetune-backend  |   File "/app/app/services/model_service.py", line 123, in list_cached_models
-finetune-backend  |     result = await session.execute(select(ModelCache).order_by(ModelCache.created_at.desc()))
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 449, in execute
-finetune-backend  |     result = await greenlet_spawn(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 201, in greenlet_spawn
-finetune-backend  |     result = context.throw(*sys.exc_info())
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
-finetune-backend  |     return self._execute_internal(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2239, in _execute_internal
-finetune-backend  |     conn = self._connection_for_bind(bind)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2108, in _connection_for_bind
-finetune-backend  |     return trans._connection_for_bind(engine, execution_options)
-finetune-backend  |   File "<string>", line 2, in _connection_for_bind
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
-finetune-backend  |     ret_value = fn(self, *arg, **kw)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1187, in _connection_for_bind
-finetune-backend  |     conn = bind.connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3293, in connect
-finetune-backend  |     return self._connection_cls(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 143, in __init__
-finetune-backend  |     self._dbapi_connection = engine.raw_connection()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3317, in raw_connection
-finetune-backend  |     return self.pool.connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 448, in connect
-finetune-backend  |     return _ConnectionFairy._checkout(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1272, in _checkout
-finetune-backend  |     fairy = _ConnectionRecord.checkout(pool)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
-finetune-backend  |     rec = pool._do_get()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
-finetune-backend  |     with util.safe_reraise():
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
-finetune-backend  |     raise exc_value.with_traceback(exc_tb)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 175, in _do_get
-finetune-backend  |     return self._create_connection()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 389, in _create_connection
-finetune-backend  |     return _ConnectionRecord(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
-finetune-backend  |     self.__connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
-finetune-backend  |     with util.safe_reraise():
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
-finetune-backend  |     raise exc_value.with_traceback(exc_tb)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
-finetune-backend  |     self.dbapi_connection = connection = pool._invoke_creator(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 667, in connect
-finetune-backend  |     return dialect.connect(*cargs_tup, **cparams)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 630, in connect
-finetune-backend  |     return self.loaded_dbapi.connect(*cargs, **cparams)  # type: ignore[no-any-return]  # NOQA: E501
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 955, in connect
-finetune-backend  |     await_only(creator_fn(*arg, **kw)),
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 132, in await_only
-finetune-backend  |     return current.parent.switch(awaitable)  # type: ignore[no-any-return,attr-defined] # noqa: E501
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
-finetune-backend  |     value = await result
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 2442, in connect
-finetune-backend  |     async with compat.timeout(timeout):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 179, in __aexit__
-finetune-backend  |     self._do_exit(exc_type)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 265, in _do_exit
-finetune-backend  |     raise asyncio.TimeoutError
-finetune-backend  | asyncio.exceptions.TimeoutError
-finetune-backend  | ERROR:    Exception in ASGI application
-finetune-backend  | Traceback (most recent call last):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 2443, in connect
-finetune-backend  |     return await connect_utils._connect(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1218, in _connect
-finetune-backend  |     conn = await _connect_addr(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1054, in _connect_addr
-finetune-backend  |     return await __connect_addr(params, True, *args)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 1102, in __connect_addr
-finetune-backend  |     await connected
-finetune-backend  | asyncio.exceptions.CancelledError
-finetune-backend  | 
-finetune-backend  | During handling of the above exception, another exception occurred:
-finetune-backend  | 
-finetune-backend  | Traceback (most recent call last):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 421, in run_asgi
-finetune-backend  |     result = await app(  # type: ignore[func-returns-value]
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 56, in __call__
-finetune-backend  |     return await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1159, in __call__
-finetune-backend  |     await super().__call__(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 90, in __call__
-finetune-backend  |     await self.middleware_stack(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
-finetune-backend  |     await self.app(scope, receive, _send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 88, in __call__
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
-finetune-backend  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
-finetune-backend  |     await app(scope, receive, sender)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 660, in __call__
-finetune-backend  |     await self.middleware_stack(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 680, in app
-finetune-backend  |     await route.handle(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
-finetune-backend  |     await self.app(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 134, in app
-finetune-backend  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
-finetune-backend  |     raise exc
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
-finetune-backend  |     await app(scope, receive, sender)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 120, in app
-finetune-backend  |     response = await f(request)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 674, in app
-finetune-backend  |     raw_response = await run_endpoint_function(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 328, in run_endpoint_function
-finetune-backend  |     return await dependant.call(**values)
-finetune-backend  |   File "/app/app/api/datasets.py", line 48, in list_datasets
-finetune-backend  |     items = await dataset_service.list_datasets()
-finetune-backend  |   File "/app/app/services/dataset_service.py", line 361, in list_datasets
-finetune-backend  |     result = await session.execute(select(DatasetRecord).order_by(DatasetRecord.created_at.desc()))
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 449, in execute
-finetune-backend  |     result = await greenlet_spawn(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 201, in greenlet_spawn
-finetune-backend  |     result = context.throw(*sys.exc_info())
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
-finetune-backend  |     return self._execute_internal(
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2239, in _execute_internal
-finetune-backend  |     conn = self._connection_for_bind(bind)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2108, in _connection_for_bind
-finetune-backend  |     return trans._connection_for_bind(engine, execution_options)
-finetune-backend  |   File "<string>", line 2, in _connection_for_bind
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py", line 137, in _go
-finetune-backend  |     ret_value = fn(self, *arg, **kw)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1187, in _connection_for_bind
-finetune-backend  |     conn = bind.connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3293, in connect
-finetune-backend  |     return self._connection_cls(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 143, in __init__
-finetune-backend  |     self._dbapi_connection = engine.raw_connection()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3317, in raw_connection
-finetune-backend  |     return self.pool.connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 448, in connect
-finetune-backend  |     return _ConnectionFairy._checkout(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1272, in _checkout
-finetune-backend  |     fairy = _ConnectionRecord.checkout(pool)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
-finetune-backend  |     rec = pool._do_get()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
-finetune-backend  |     with util.safe_reraise():
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
-finetune-backend  |     raise exc_value.with_traceback(exc_tb)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 175, in _do_get
-finetune-backend  |     return self._create_connection()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 389, in _create_connection
-finetune-backend  |     return _ConnectionRecord(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
-finetune-backend  |     self.__connect()
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
-finetune-backend  |     with util.safe_reraise():
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
-finetune-backend  |     raise exc_value.with_traceback(exc_tb)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
-finetune-backend  |     self.dbapi_connection = connection = pool._invoke_creator(self)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 667, in connect
-finetune-backend  |     return dialect.connect(*cargs_tup, **cparams)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 630, in connect
-finetune-backend  |     return self.loaded_dbapi.connect(*cargs, **cparams)  # type: ignore[no-any-return]  # NOQA: E501
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 955, in connect
-finetune-backend  |     await_only(creator_fn(*arg, **kw)),
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 132, in await_only
-finetune-backend  |     return current.parent.switch(awaitable)  # type: ignore[no-any-return,attr-defined] # noqa: E501
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
-finetune-backend  |     value = await result
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 2442, in connect
-finetune-backend  |     async with compat.timeout(timeout):
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 179, in __aexit__
-finetune-backend  |     self._do_exit(exc_type)
-finetune-backend  |   File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 265, in _do_exit
-finetune-backend  |     raise asyncio.TimeoutError
-finetune-backend  | asyncio.exceptions.TimeoutError
-finetune-backend  | INFO:     127.0.0.1:44850 - "GET /health HTTP/1.1" 200 OK
-finetune-backend  | 2026-05-20 05:14:27 | INFO     | peft-platform | Remote training launched for job a52d395e-d3c8-40d2-9be3-1839f597dc7f
+(base) [root@localhost ~]# docker exec -w /root/Fine-tuning/backend finetune-trainer /opt/conda/bin/python -m app.engines.remote_train "test-manual-001" "Qwen/Qwen3.5-0.8B" "text" "/root/Fine-tuning/backend/data/processed/ms_yanalong_yanalong/distill_r1_sft.json" "/root/Fine-tuning/backend/data/config_92a0a9cd-46aa-48bc-b7ad-bd5a18270c51.json"
+2026-05-20 14:08:47 | ERROR    | peft-platform | Remote training failed: test-manual-001 - No module named 'sqlalchemy'
+Traceback (most recent call last):
+  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
+    return _run_code(code, main_globals, None,
+  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
+    exec(code, run_globals)
+  File "/root/Fine-tuning/backend/app/engines/remote_train.py", line 179, in <module>
+    main()
+  File "/root/Fine-tuning/backend/app/engines/remote_train.py", line 175, in main
+    asyncio.run(run_training(job_id, model_id, model_type, dataset_id, config))
+  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
+    return loop.run_until_complete(main)
+  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
+    return future.result()
+  File "/root/Fine-tuning/backend/app/engines/remote_train.py", line 85, in run_training
+    from app.core.db import async_session, DatasetRecord
+  File "/root/Fine-tuning/backend/app/core/db.py", line 3, in <module>
+    from sqlalchemy import Column, DateTime, Float, Integer, String, Text
+ModuleNotFoundError: No module named 'sqlalchemy'