INFO:     172.20.0.4:39360 - "POST /api/oauth/exchange-code HTTP/1.0" 200 OK
INFO:     172.20.0.4:39364 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:39378 - "GET /api/v1/datasets/ HTTP/1.0" 200 OK
INFO:     172.20.0.4:39394 - "GET /api/v1/models/ HTTP/1.0" 200 OK
INFO:     172.20.0.4:50946 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:50944 - "GET /api/v1/models/ HTTP/1.0" 200 OK
INFO:     172.20.0.4:50952 - "GET /api/v1/datasets/ HTTP/1.0" 200 OK
INFO:     172.20.0.4:50958 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
2026-05-27 02:30:41 | INFO     | peft-platform | Training job 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf: num_gpus=1, batch_size=16
2026-05-27 02:30:41 | INFO     | peft-platform | Job 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf enqueued
2026-05-27 02:30:41 | INFO     | peft-platform | Training job created: 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf
INFO:     172.20.0.4:50972 - "POST /api/v1/training/jobs HTTP/1.0" 200 OK
2026-05-27 02:30:41 | INFO     | app.engines.text_engine | Preprocessed 0 samples for ppo/alpaca
INFO:     172.20.0.4:50998 - "GET /api/v1/models/ HTTP/1.0" 200 OK
INFO:     172.20.0.4:50984 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:51000 - "GET /api/v1/datasets/ HTTP/1.0" 200 OK
INFO:     172.20.0.4:51012 - "WebSocket /ws/training/4fd86f1d-3f2f-48ac-92a4-8e236159d1cf?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhZjgyN2IxZC0wM2IxLTQwZGMtOTliMC1jOGRjYTEzNWEwNmUiLCJ1c2VybmFtZSI6InN1cGVyX2FkbWluIiwicm9sZXMiOlsic3VwZXJfYWRtaW4iXSwiZXhwIjoxNzc5ODUwMjMzLCJpYXQiOjE3Nzk4NDkwMzMsInR5cGUiOiJhY2Nlc3MifQ.WvY2rgy_lvYhdR4UGaXA6x1X5MiMFvWKwqk3JzQdpOY" [accepted]
2026-05-27 02:30:41 | INFO     | peft-platform | Client connected to training WebSocket (job 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf)
INFO:     connection open
INFO:     172.20.0.4:35710 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:35720 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:43638 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     127.0.0.1:40052 - "GET /health HTTP/1.1" 200 OK
INFO:     172.20.0.4:43646 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:59604 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
2026-05-27 02:31:07 | INFO     | peft-platform | Remote cleanup result: true
cleaned 147 processes
2026-05-27 02:32:00 | INFO     | peft-platform | Created remote dataset directory: /root/Fine-tuning/backend/data/datasets
2026-05-27 02:32:00 | INFO     | peft-platform | Uploading dataset file: /root/Fine-tuning/backend/data/uploads/ppo_sample.jsonl -> /root/Fine-tuning/backend/data/datasets/ppo_sample.jsonl
2026-05-27 02:32:18 | INFO     | peft-platform | Dataset uploaded successfully: /root/Fine-tuning/backend/data/datasets/ppo_sample.jsonl
2026-05-27 02:32:53 | INFO     | peft-platform | Remote training launched in container: job=4fd86f1d-3f2f-48ac-92a4-8e236159d1cf, container_pid=26886
INFO:     127.0.0.1:57260 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:59094 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:55910 - "GET /health HTTP/1.1" 200 OK
INFO:     172.20.0.4:37264 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:59616 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:37248 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42048 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:45268 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42050 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42172 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42170 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42188 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42186 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42194 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42198 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42202 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42218 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42234 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42252 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42262 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42246 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42270 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42272 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:42284 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     172.20.0.4:44220 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     127.0.0.1:52000 - "GET /health HTTP/1.1" 200 OK
2026-05-27 02:33:46 | ERROR    | peft-platform | Remote job 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf failed: num_samples should be a positive integer value, but got num_samples=0
INFO:     172.20.0.4:51606 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO:     127.0.0.1:54618 - "GET /health HTTP/1.1" 200 OK
INFO:     172.20.0.4:47416 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
2026-05-27 02:33:56 | ERROR    | peft-platform | SSH command timeout after 10s: docker exec finetune-trainer bash -c 'kill -9 26886 2>/dev/null; pkill -9 -P 26886 2>/dev/null'
2026-05-27 02:33:56 | INFO     | peft-platform | Killed remote process 26886 via docker exec
2026-05-27 02:33:56 | INFO     | peft-platform | Remote training launched for job 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf
2026-05-27 02:33:56 | INFO     | peft-platform | Client disconnected from training WebSocket (job 4fd86f1d-3f2f-48ac-92a4-8e236159d1cf)
INFO:     connection closed
