This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Command to rebuild the finetune-trainer container on server 253 (192.168.91.253):
docker stop finetune-trainer && docker rename finetune-trainer finetune-trainer-old && docker run -d --name finetune-trainer --privileged --network host --shm-size 4g -e MACA_MPS_MODE=1 -v /root/Fine-tuning/backend:/root/Fine-tuning/backend 5334348e7a9b tail -f /dev/null
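- Image: 5334348e7a9b (image ID of the official MetaX (沐曦) image)
- --privileged: lets the container access the MetaX GPU devices
- --network host: uses the host's network
- --shm-size 4g: the MetaX driver's ring buffer needs enough shared memory
- -e MACA_MPS_MODE=1: enables MetaX MPS mode
- Code mount: /root/Fine-tuning/backend (synced from 151 via rsync)
- Python: /opt/conda/bin/python (conda environment)
After the container is created, install the dependencies inside it: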
docker exec -it finetune-trainer /opt/conda/bin/pip install peft trl accelerate bitsandbytes datasets
docker exec -it finetune-trainer /opt/conda/bin/pip install --no-deps --upgrade transformers huggingface-hub
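After the two install commands above, it can help to confirm that the packages actually import inside the container. The following is a minimal sketch, not part of the original setup: the helper names `build_check_code` and `check_imports` and the `DRY_RUN` variable are hypothetical, and the script only prints the command unless `DRY_RUN=0` is set. Whether `bitsandbytes` imports cleanly on this hardware is not something this note confirms.

```shell
#!/bin/sh
# Sketch: check that the packages installed above import inside the container.
set -eu

CONTAINER="finetune-trainer"
PYTHON="/opt/conda/bin/python"
# Import names; the pip package huggingface-hub is imported as huggingface_hub.
PACKAGES="peft trl accelerate bitsandbytes datasets transformers huggingface_hub"

# Emit a small Python program that imports each package and prints its version.
build_check_code() {
  echo "import importlib"
  for pkg in "$@"; do
    echo "m = importlib.import_module('$pkg')"
    echo "print('$pkg', getattr(m, '__version__', 'unknown'))"
  done
}

# Dry run by default: print the docker command. Set DRY_RUN=0 to execute it.
check_imports() {
  # Word splitting of PACKAGES is intended here.
  code="$(build_check_code $PACKAGES)"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'docker exec %s %s -c <code below>\n%s\n' "$CONTAINER" "$PYTHON" "$code"
  else
    docker exec "$CONTAINER" "$PYTHON" -c "$code"
  fi
}

check_imports
```

Running it with `DRY_RUN=0` on server 253 would execute the import check through `docker exec`; a failing import makes the Python process, and therefore the script, exit with a non-zero status.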
Note: the container on 253 does not need fastapi/uvicorn. The inference worker (inference_worker.py) uses only the Python standard library plus torch/transformers; the API proxy is provided by the main node, 151.
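The rebuild one-liner above can also be wrapped in a small script with a dry-run mode, which makes it easier to review before touching the running container. This is a sketch under assumptions: the `run` and `rebuild` helpers and the `DRY_RUN` and `BACKUP_NAME` variables are hypothetical and do not exist in the repository. Note that `docker rename` fails if a container named finetune-trainer-old already exists, so a second rebuild needs that old container removed or a different `BACKUP_NAME`.

```shell
#!/bin/sh
# Sketch: the rebuild steps for finetune-trainer, with a dry-run default.
set -eu

NAME="finetune-trainer"
IMAGE="5334348e7a9b"
BACKEND_DIR="/root/Fine-tuning/backend"

# Print the command in dry-run mode (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

rebuild() {
  # Hypothetical override; defaults to the name used in the original command.
  backup="${BACKUP_NAME:-${NAME}-old}"
  run docker stop "$NAME"
  run docker rename "$NAME" "$backup"
  run docker run -d --name "$NAME" --privileged --network host \
    --shm-size 4g -e MACA_MPS_MODE=1 \
    -v "${BACKEND_DIR}:${BACKEND_DIR}" "$IMAGE" tail -f /dev/null
}

rebuild
```

With the default `DRY_RUN=1` the script only prints the three docker commands; `DRY_RUN=0` runs them, and `set -e` stops the script at the first command that fails.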