# Running DeepSeek R1 671B with Distributed vLLM
This tutorial shows how to configure and run the original **DeepSeek R1 671B** model using **Distributed vLLM** on a GPUStack cluster. Because the model is extremely large, distributed inference across multiple workers is usually required.
MaaS-Base enables easy setup and orchestration of distributed inference using vLLM, making it possible to run massive models like DeepSeek R1 with minimal manual configuration.
## Prerequisites
Before you begin, make sure the following requirements are met:
- You have access to a sufficient number of Linux nodes, each equipped with the required GPUs. For example:
| **GPUs per Node (type:count)** | **Number of Nodes** |
| ------------------------------ | ------------------- |
| H100/H800:8 | 2 |
| A100/A800-80GB:8 | 4 |
| A100/A800:8 | 8 |
- High-speed interconnects such as NVLink or InfiniBand are recommended for optimal performance.
- Model files should be downloaded to the same path on each node. While MaaS-Base supports on-the-fly model downloading, pre-downloading is recommended because the download can be time-consuming, depending on network speed.
!!! note

    - In this tutorial, we assume a setup of 4 nodes, each equipped with 8 A800-80GB GPUs and connected via 200G InfiniBand.
    - A100/A800 GPUs do not support the FP8 precision originally used by DeepSeek R1. Hence, we use the BF16 version from [Unsloth](https://huggingface.co/unsloth/DeepSeek-R1-BF16).
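
The node counts in the table above are consistent with a back-of-the-envelope memory estimate. The following is a minimal sketch, assuming weights only (no KV cache, activations, or runtime overhead), 2 bytes per parameter for BF16, and 1 GB = 10^9 bytes. The helper names `weights_gb` and `cluster_gb` are illustrative and not part of any tool:

```bash
# Rough estimate only: ignores KV cache, activations, and runtime overhead.

# weights_gb PARAMS_IN_BILLIONS BYTES_PER_PARAM -> approximate weight size in GB
weights_gb() {
  echo $(( $1 * $2 ))
}

# cluster_gb NODES GPUS_PER_NODE GB_PER_GPU -> total GPU memory in GB
cluster_gb() {
  echo $(( $1 * $2 * $3 ))
}

echo "BF16 weights (671B params): $(weights_gb 671 2) GB"
echo "4 nodes x 8 GPUs x 80 GB:   $(cluster_gb 4 8 80) GB"
```

Under these assumptions, the BF16 weights take roughly 1342 GB of the 2560 GB available across 4 nodes, which leaves headroom for the KV cache and runtime overhead.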
## Step 1: Install GPUStack Server
Following the [Installation](../installation/installation.md) guide, start the MaaS-Base server with the command below:
```bash
sudo docker run -d --name gpustack \
--restart unless-stopped \
-p 80:80 \
--volume gpustack-data:/var/lib/gpustack \
--volume /path/to/your/model:/path/to/your/model \
gpustack/gpustack
```
!!! note

    - Replace `/path/to/your/model` with the actual path.
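
The container may need some time before the server responds. As a hedged sketch, a small retry helper can poll until a command succeeds. `retry` is a hypothetical helper defined here, not a MaaS-Base or Docker command:

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}
```

For example, `retry 30 2 curl -fsS -o /dev/null http://your_gpustack_server_ip_or_hostname` would poll the UI address up to 30 times, 2 seconds apart, assuming `curl` is installed. If the server never becomes reachable, `sudo docker logs gpustack` shows the container output.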
After the MaaS-Base server is up and running, run the following command to retrieve the initial admin password:
```bash
sudo docker exec gpustack \
cat /var/lib/gpustack/initial_admin_password
```
## Step 2: Access MaaS-Base UI
Open the following URL in your browser and log in to the MaaS-Base UI as the `admin` user with the password obtained in Step 1:
```
http://your_gpustack_server_ip_or_hostname
```
## Step 3: Install MaaS-Base Workers
Navigate to the `Workers` page in the MaaS-Base UI and click the `Add Worker` button to get the command for adding workers.
Then, on **each worker node**, run that command to start a MaaS-Base worker:
```bash
sudo docker run -d --name gpustack \
--restart unless-stopped \
--privileged \
--network host \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume gpustack-data:/var/lib/gpustack \
--volume /path/to/your/model:/path/to/your/model \
--runtime nvidia \
gpustack/gpustack \
--server-url http://your_gpustack_server_ip_or_hostname \
--token your_gpustack_cluster_token
```
!!! note

    - Replace the placeholder paths, IP address/hostname, and cluster token accordingly.
    - Replace `/path/to/your/model` with the actual path on your system where the DeepSeek R1 model files are stored.
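
Because every worker mounts the same model path, it can be worth confirming on each node that the directory is populated before starting the worker. The following is a minimal sketch, assuming the model is stored in the usual Hugging Face layout (a `config.json` plus `*.safetensors` shards). `check_model_dir` is a hypothetical helper, and it checks only for the presence of files, not their completeness or integrity:

```bash
# check_model_dir DIR
# Succeeds if DIR exists and contains config.json plus at least one
# *.safetensors file; otherwise prints the reason to stderr and fails.
check_model_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "missing config.json in $dir" >&2
    return 1
  fi
  # If the glob matches nothing, $f is the literal pattern and -f is false.
  for f in "$dir"/*.safetensors; do
    if [ -f "$f" ]; then
      echo "ok: $dir"
      return 0
    fi
  done
  echo "no *.safetensors files in $dir" >&2
  return 1
}
```

Running `check_model_dir /path/to/your/model` on each node (with the real path substituted) gives a quick signal that the path exists and is not empty.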
After all workers are added, return to the MaaS-Base UI.
Navigate to the `Workers` page to verify that all workers are in the Ready state and their GPUs are listed.
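
If a worker lists fewer GPUs than expected, it can help to cross-check on the node itself. The following is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is installed on the host and that each node should expose 8 GPUs, as in this tutorial's setup. `count_gpus` is a hypothetical helper that counts the lines printed by `nvidia-smi -L`:

```bash
# count_gpus: reads `nvidia-smi -L` output on stdin and prints the GPU count.
count_gpus() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c '^GPU [0-9]' || true
}

expected=8  # GPUs per node assumed in this tutorial

# Only query the driver when nvidia-smi is available on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  found=$(nvidia-smi -L | count_gpus)
  echo "found $found GPU(s), expected $expected"
fi
```

A mismatch between the count on the host and the count shown on the `Workers` page would suggest looking at how the worker container was started rather than at the hardware.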

## Step 4: Deploy the DeepSeek R1 Model
1. Go to the `Deployments` page.
2. Click `Deploy Model`.
3. Select `Local Path` as your source.
4. Enter a name (e.g., `DeepSeek-R1`) in the `Name` field.
5. Specify the `Model Path` as the directory that contains the DeepSeek R1 model files on each worker node.
6. Ensure the `Backend` is set to `vLLM`.
7. After the compatibility check passes, click `Save` to deploy.

## Step 5: Monitor Deployment
You can monitor the deployment status on the `Deployments` page. Hover over `distributed across workers` to view GPU and worker usage. Click `View Logs` to see real-time logs showing model loading progress. It may take a few minutes to load the model.

After the model is running, navigate to the `Workers` page to check GPU utilization. By default, vLLM uses up to 90% of each GPU's memory. You can adjust this in the model configuration settings.
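
To put the 90% default in concrete terms for the 80 GB GPUs assumed in this tutorial, here is a small arithmetic sketch. In vLLM itself the corresponding engine option is `--gpu-memory-utilization` (default `0.9`); where that option is exposed in the model configuration UI is not covered here and should be checked in your version. `gpu_budget_gb` is a hypothetical helper that uses integer arithmetic:

```bash
# gpu_budget_gb TOTAL_GB PERCENT -> memory vLLM may use on one GPU, in GB
# Integer arithmetic: the result is rounded down.
gpu_budget_gb() {
  echo $(( $1 * $2 / 100 ))
}

per_gpu=$(gpu_budget_gb 80 90)
echo "per GPU:        ${per_gpu} GB"
echo "across 32 GPUs: $(( per_gpu * 32 )) GB"
```

With these numbers, each GPU contributes 72 GB and the 4 x 8 GPU cluster about 2304 GB in total, shared between model weights, KV cache, and other runtime allocations.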

## Step 6: Run Inference via Playground
Once the model is deployed and running, you can test it using the MaaS-Base Playground.
1. Navigate to `Playground` -> `Chat`.
2. If only one model is deployed, it will be selected by default. Otherwise, use the dropdown menu to choose `DeepSeek-R1`.
3. Enter prompts and interact with the model.

You can also use the `Compare` tab to test concurrent inference scenarios.

You have now successfully deployed and run DeepSeek R1 671B using Distributed vLLM on a MaaS-Base cluster. Explore the model’s performance and capabilities in your own applications.
For further assistance, feel free to reach out to the MaaS-Base community or support team.