result.txt 4.4 KB

INFO: 172.19.0.3:52548 - "POST /api/v1/datasets/download HTTP/1.0" 200 OK
INFO: 127.0.0.1:46426 - "GET /health HTTP/1.1" 200 OK
INFO: 172.19.0.3:48310 - "GET /api/v1/models/ HTTP/1.0" 200 OK
INFO: 172.19.0.3:48320 - "GET /api/v1/datasets/ HTTP/1.0" 200 OK
INFO: 172.19.0.3:48332 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
2026-05-15 17:24:03 | INFO | peft-platform | Job 5999c2df-0b6a-4ec2-a99a-9894ef923a85 enqueued
2026-05-15 17:24:03 | INFO | peft-platform | Training job created: 5999c2df-0b6a-4ec2-a99a-9894ef923a85
INFO: 172.19.0.3:48340 - "POST /api/v1/training/jobs HTTP/1.0" 200 OK
2026-05-15 17:24:03 | INFO | peft-platform | Preprocessed 60 samples for sft/alpaca
INFO: 172.19.0.3:48356 - "GET /api/v1/training/jobs HTTP/1.0" 200 OK
INFO: 172.19.0.3:48362 - "GET /api/v1/models/ HTTP/1.0" 200 OK
INFO: 172.19.0.3:48360 - "GET /api/v1/datasets/ HTTP/1.0" 200 OK
2026-05-15 17:24:13 | INFO | peft-platform | CUDA available: True
2026-05-15 17:24:13 | INFO | peft-platform | CUDA device count: 1
2026-05-15 17:24:13 | INFO | peft-platform | GPU 0: MetaX N260
2026-05-15 17:24:13 | INFO | peft-platform | GPU 0 memory: 63.78 GB
[transformers] `torch_dtype` is deprecated! Use `dtype` instead!
2026-05-15 17:24:14 | WARNING | fla.utils | Current Triton version 3.0.0 is below the recommended 3.2.0 version. Errors may occur and these issues will not be fixed. Please consider upgrading Triton.
2026-05-15 17:24:14 | WARNING | fla.utils | Current Python version 3.10 is below the recommended 3.11 version. It is recommended to upgrade to Python 3.11 or higher for the best experience.
2026-05-15 17:24:20 | WARNING | fla.ops.rwkv7.fused_addcmul | torch.compile is not available in Python 3.10, using identity decorator instead
/opt/conda/lib/python3.10/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
Loading weights: 100%|██████████| 320/320 [00:00<00:00, 382.46it/s]
2026-05-15 17:24:21 | INFO | peft-platform | Loaded model: Qwen/Qwen3.5-0.8B
Map: 100%|██████████| 60/60 [00:00<00:00, 2212.59 examples/s]
/opt/conda/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:1348: UserWarning: Model has `tie_word_embeddings=True` and a tied layer is part of the adapter, but `ensure_weight_tying` is not set to True. This can lead to complications, for example when merging the adapter or converting your model to formats other than safetensors. Check the discussion here: https://github.com/huggingface/peft/issues/2777
  warnings.warn(msg)
[transformers] warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
trainable params: 5,070,848 || all params: 757,463,872 || trainable%: 0.6695
0%| | 0/12 [00:00<?, ?it/s]
2026-05-15 17:27:03 | ERROR | peft-platform | Training failed for job 5999c2df-0b6a-4ec2-a99a-9894ef923a85: out of resource: shared memory, Required: 106496, Hardware limit: 65536. Reducing block sizes or `num_stages` may help.
2026-05-15 17:27:03 | ERROR | peft-platform | Job 5999c2df-0b6a-4ec2-a99a-9894ef923a85 failed: out of resource: shared memory, Required: 106496, Hardware limit: 65536. Reducing block sizes or `num_stages` may help.