This tutorial guides you through the process of configuring and running the original DeepSeek R1 671B using Distributed Ascend MindIE on a GPUStack cluster.
Due to the extremely large size of the model, distributed inference across multiple workers is usually required.
GPUStack enables easy setup and orchestration of distributed inference using Ascend MindIE, making it possible to run massive models like DeepSeek R1 with minimal manual configuration.
Before you begin, make sure the following requirements are met:
!!! note
- In this tutorial, we assume a setup of 4 nodes, each equipped with 8 910B3 NPUs and connected via 200G Huawei Cache Conherence Network (HCCN).
- Altas NPUs do not support the FP8 precision originally used by DeepSeek R1. Hence, we use the BF16 version from [Unsloth](https://huggingface.co/unsloth/DeepSeek-R1-BF16).
According to the Installation, you can use the following command to start the GPUStack server:
sudo docker run -d --name gpustack \
--restart unless-stopped \
-p 80:80 \
--volume gpustack-data:/var/lib/gpustack \
--volume /path/to/your/model:/path/to/your/model \
gpustack/gpustack
!!! note
- Replace `/path/to/your/model` with the actual path on your system where the DeepSeek R1 model files are stored.
After GPUStack server is up and running, run the following commands to get the initial admin password:
sudo docker exec gpustack \
cat /var/lib/gpustack/initial_admin_password
Login to the GPUStack UI using the admin user and the obtained password.
http://your_gpustack_server_ip_or_hostname
Navigate to the Workers page in the GPUStack UI, click Add Worker button to get the command for adding workers.
And then on each worker node, run the worker adding command to start a GPUStack worker:
sudo docker run -d --name gpustack \
--restart unless-stopped \
--privileged \
--env "ASCEND_VISIBLE_DEVICES=$(sudo ls /dev/davinci* | head -1 | grep -o '[0-9]\+' || echo "0")" \
--network host \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume gpustack-data:/var/lib/gpustack \
--volume /path/to/your/model:/path/to/your/model \
--runtime ascend \
gpustack/gpustack \
--server-url http://your_gpustack_server_ip_or_hostname \
--token your_gpustack_cluster_token
!!! note
- Replace the placeholder paths, IP address/hostname, and cluster token accordingly.
- Replace `/path/to/your/model` with the actual path on your system where the DeepSeek R1 model files are stored.
- Ensure the `hccn_tool` tool is installed and configured correctly on your system. This is required for discvoring the HCCN network communication.
After all workers are added, return to the GPUStack UI.
Navigate to the Workers page to verify that all workers are in the Ready state and their GPUs are listed.
Deployments page.Deploy Model.Local Path as your source.DeepSeek-R1) in the Name field.Model Path as the directory that contains the DeepSeek R1 model files on each worker node.Backend is set to Ascend MindIE.Advanced, append following parameters to Backend Parameters:
--data-parallel-size=4--tensor-parallel-size=8--moe-tensor-parallel-size=1--moe-expert-parallel-size=32--npu-memory-fraction=0.95, since we are using Data Parallelism, the memory fraction should be set to 0.95 to ensure efficient memory usage across all NPUs.Save to deploy.You can monitor the deployment status on the Deployments page. Hover over distributed across workers to view GPU and worker usage. Click View Logs to see real-time logs showing model loading progress. It may take a few minutes to load the model.
After the model is running, navigate to the Workers page to check GPU utilization. Ascend MindIE uses 95% of NPU memory with above settings.
Once the model is deployed and running, you can test it using the GPUStack Playground.
Playground -> Chat.DeepSeek-R1.You can also use the Compare tab to test concurrent inference scenarios.
You have now successfully deployed and run DeepSeek R1 671B using Distributed Ascend MindIE on a GPUStack cluster. Explore the model's performance and capabilities in your own applications.
For further assistance, feel free to reach out to the GPUStack community or support team.