2 هفته پیش · be9005acd1
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
				 
			
 
				 ## Project Overview
			
 
				 
			
 
				-GPUStack is an open-source GPU cluster manager for AI model deployment. It orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, etc.) across GPU clusters, providing multi-cluster management, load balancing, monitoring, and access control.
			
 
				+MASS-Base is an open-source GPU cluster manager for AI model deployment. It orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, etc.) across GPU clusters, providing multi-cluster management, load balancing, monitoring, and access control.
			
 
				 
			
 
				 **Tech stack:** Python 3.10–3.12, FastAPI, SQLModel, Pydantic, uv (package manager), hatchling (build), Alembic (migrations), pytest, Higress (API gateway).
			
 
				 
			
@@ -38,7 +38,7 @@ gpustack/
 
				 
			
 
				 **Key components:**
			
 
				 - **Server:** API Server (FastAPI) + Scheduler + Controllers. Handles model instance assignment and resource state management.
			
 
				-- **Worker:** GPUStack Runtime + Serving Manager + Metric Exporter. Manages model instance lifecycle on GPU nodes.
			
 
				+- **Worker:** MASS-Base Runtime + Serving Manager + Metric Exporter. Manages model instance lifecycle on GPU nodes.
			
 
				 - **AI Gateway:** Uses Higress for API routing and load balancing.
			
 
				 - **Database:** Embedded PostgreSQL by default; external PostgreSQL/MySQL supported. Alembic for migrations under `gpustack/migrations/`.
			
 
				 
			
--- a/README.md
+++ b/README.md
@@ -1,37 +1,52 @@
 
				 # MASS-Base
			
 
				 
			
 
				-MASS-Base 是一个开源的模型服务（Model-as-a-Service）基础平台，用于高效管理和调度 AI 模型推理服务。它支持多种推理引擎（vLLM、SGLang、TensorRT-LLM 等），可跨多节点进行性能优化与资源编排。
			
 
				+MASS-Base（Model-as-a-Service Base）是一个开源的模型服务（Model-as-a-Service）基础平台，用于高效管理和调度 AI 模型推理服务。它支持多种推理引擎（vLLM、SGLang、TensorRT-LLM 等），可跨多节点进行性能优化与资源编排。
			
 
				 
			
 
				 ## 核心特性
			
 
				 
			
 
				-- **多集群管理**：统一管理多个环境中的计算节点，支持本地服务器和云平台。
			
 
				-- **可插拔推理引擎**：自动配置 vLLM、SGLang、TensorRT-LLM 等高性能推理引擎，也支持自定义引擎接入。
			
 
				-- **开箱即用的模型部署**：新模型发布即可快速部署。
			
 
				-- **性能优化配置**：内置低延迟与高吞吐预调优模式，支持扩展 KV Cache（如 LMCache、HiCache）以降低 TTFT，并内置投机解码（EAGLE3、MTP、N-grams）支持。
			
 
				-- **企业级运维能力**：支持自动故障恢复、负载均衡、监控、认证与访问控制。
			
 
				+- **多集群管理：** 统一管理多个环境中的计算节点，支持本地服务器和云平台。
			
 
				+- **可插拔推理引擎：** 自动配置 vLLM、SGLang、TensorRT-LLM 等高性能推理引擎，也支持自定义引擎接入。
			
 
				+- **开箱即用的模型部署：** 新模型发布即可快速部署，内置模型目录和兼容性检查。
			
 
				+- **性能优化配置：** 内置低延迟与高吞吐预调优模式，支持扩展 KV Cache（如 LMCache、HiCache）以降低 TTFT，并内置投机解码（EAGLE3、MTP、N-grams）支持。
			
 
				+- **企业级运维能力：** 支持自动故障恢复、负载均衡、监控（Prometheus + Grafana）、认证与访问控制。
			
 
				+- **多硬件支持：** NVIDIA GPU、AMD GPU、Ascend NPU、Hygon DCU、MThreads GPU、Iluvatar GPU、MetaX GPU、Cambricon MLU、T-Head PPU。
			
 
				 
			
 
				 ## 架构
			
 
				 
			
 
				 MASS-Base 由以下核心组件构成：
			
 
				 
			
 
				-- **API Server**：基于 FastAPI 构建的 RESTful 接口层，处理认证与授权。
			
 
				-- **Scheduler**：负责将模型实例调度分配到工作节点。
			
 
				-- **Controllers**：管理系统资源状态，处理模型实例的扩缩容。
			
 
				-- **Worker**：检测 GPU 设备，管理模型实例的生命周期并导出性能指标。
			
 
				-- **AI Gateway**：基于 Higress 构建，负责 API 路由与负载均衡。
			
 
				-- **SQL Database**：默认使用嵌入式 PostgreSQL，也支持外部 PostgreSQL 或 MySQL。
			
 
				+- **API Server：** 基于 FastAPI 构建的 RESTful 接口层，处理认证与授权。
			
 
				+- **Scheduler：** 负责将模型实例调度分配到工作节点。
			
 
				+- **Controllers：** 管理系统资源状态，处理模型实例的扩缩容。
			
 
				+- **Worker：** 检测 GPU 设备，管理模型实例的生命周期并导出性能指标。
			
 
				+- **AI Gateway：** 基于 [Higress](https://github.com/alibaba/higress) 构建，负责 API 路由与负载均衡。
			
 
				+- **SQL Database：** 默认使用嵌入式 PostgreSQL，也支持外部 PostgreSQL 或 MySQL。
			
 
				 
			
 
				 ![architecture](docs/assets/gpustack-v2-architecture.png)
			
 
				 
			
 
				+详细架构说明请参见 [架构文档](docs/architecture.md)。
			
 
				+
			
 
				+## 部署方式
			
 
				+
			
 
				+MASS-Base 支持多种部署方式，请根据场景选择：
			
 
				+
			
 
				+| 部署方式 | 适合场景 | 文档 |
			
 
				+|----------|---------|------|
			
 
				+| **Docker 单容器** | 快速体验、演示、单节点 | [下方快速开始](#快速开始) |
			
 
				+| **Docker Compose** | 开发测试、小团队、含监控部署 | [Docker Compose 部署指南](docs/deployment/docker-compose.md) |
			
 
				+| **Kubernetes (Helm)** | 生产环境、大规模、多节点 | [Kubernetes 部署指南](docs/deployment/kubernetes.md) |
			
 
				+
			
 
				 ## 快速开始
			
 
				 
			
 
				 ### 前置要求
			
 
				 
			
 
				-1. 至少一台 Linux 节点（支持 NVIDIA GPU、AMD GPU、Ascend NPU、Hygon DCU、MThreads GPU、Iluvatar GPU、MetaX GPU、Cambricon MLU、T-Head PPU 等加速器）。
			
 
				-2. 工作节点需安装驱动、[Docker](https://docs.docker.com/engine/install/) 和 [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)。
			
 
				-3. 服务端可运行在无 GPU 的 CPU 节点上，需安装 Docker。
			
 
				+1. Linux 节点（支持 NVIDIA GPU、AMD GPU、Ascend NPU 等加速器）。
			
 
				+2. 已安装 [Docker](https://docs.docker.com/engine/install/)。
			
 
				+3. GPU 节点需安装对应驱动和 [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)（NVIDIA GPU 场景）。
			
 
				+
			
 
				+### 1. 部署 Server
			
 
				 
			
 
				-### 安装服务端
			
 
				+#### 方式 A：无 GPU 的 CPU 节点（纯 Server）
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name mass-base \
			
@@ -41,27 +56,40 @@ sudo docker run -d --name mass-base \
 
				     mass-base/mass-base
			
 
				 ```
			
 
				 
			
 
				-启动后查看日志：
			
 
				+#### 方式 B：有 GPU 的节点（Server + Worker 合一）
			
 
				 
			
 
				 ```bash
			
 
				-sudo docker logs -f mass-base
			
 
				+sudo docker run -d --name mass-base \
			
 
				+    --restart unless-stopped \
			
 
				+    --privileged \
			
 
				+    --network host \
			
 
				+    --ipc host \
			
 
				+    -v /var/run/docker.sock:/var/run/docker.sock \
			
 
				+    -v /var/run/cdi:/var/run/cdi \
			
 
				+    -v mass-base-data:/var/lib/mass-base \
			
 
				+    -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
			
 
				+    -e NVIDIA_VISIBLE_DEVICES=all \
			
 
				+    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
			
 
				+    mass-base/mass-base
			
 
				 ```
			
 
				 
			
 
				-获取默认管理员密码：
			
 
				+### 2. 获取管理员密码
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker exec mass-base cat /var/lib/mass-base/initial_admin_password
			
 
				 ```
			
 
				 
			
 
				-在浏览器中访问 `http://your_host_ip`，使用用户名 `admin` 和获取到的密码登录。
			
 
				+### 3. 访问 UI
			
 
				+
			
 
				+在浏览器中打开 `http://<服务器IP>`，使用用户名 `admin` 和上一步获取的密码登录。
			
 
				 
			
 
				-### 部署模型
			
 
				+### 4. 部署模型
			
 
				 
			
 
				 1. 在 MASS-Base UI 中进入 **Catalog** 页面。
			
 
				 2. 选择可用模型，通过兼容性检查后点击 **Save** 部署。
			
 
				 3. 部署状态变为 **Running** 后即可通过 UI Playground 或 API 调用。
			
 
				 
			
 
				-### 使用 API
			
 
				+### 5. 使用 API
			
 
				 
			
 
				 1. 在 UI 中进入 **API Keys** 页面，创建新的 API Key。
			
 
				 2. 使用 API Key 调用 OpenAI 兼容接口：
			
@@ -81,8 +109,18 @@ curl http://your_mass_base_server_url/v1/chat/completions \
 
				   }'
			
 
				 ```
			
 
				 
			
 
				+## 完整部署指南
			
 
				+
			
 
				+如需更详细的部署说明（含 Docker Compose 完整监控栈、Worker 节点独立部署、Kubernetes Helm 部署等），请参见：
			
 
				+
			
 
				+- [Docker Compose 部署指南](docs/deployment/docker-compose.md) — 最小部署 + 完整监控部署
			
 
				+- [Worker 节点部署指南](docs/deployment/worker.md) — 多 GPU 节点 Worker 部署、多厂商 GPU 支持
			
 
				+- [Kubernetes (Helm) 部署指南](docs/deployment/kubernetes.md) — 生产环境大规模部署
			
 
				+
			
 
				 ## 构建
			
 
				 
			
 
				+### 构建 Wheel 包
			
 
				+
			
 
				 1. 安装 Python 3.10 ~ 3.12。
			
 
				 
			
 
				 2. 执行构建：
			
@@ -93,8 +131,18 @@ make build
 
				 
			
 
				 构建产物位于 `dist` 目录。
			
 
				 
			
 
				+### 构建容器镜像
			
 
				+
			
 
				+```bash
			
 
				+make package
			
 
				+```
			
 
				+
			
 
				+> **注意：** 镜像构建仅支持 Linux/macOS。
			
 
				+
			
 
				 ## 开发
			
 
				 
			
 
				+### 本地开发
			
 
				+
			
 
				 ```bash
			
 
				 # 安装开发依赖
			
 
				 make install
			
@@ -106,12 +154,31 @@ uv run gpustack start \
 
				   --api-port 80
			
 
				 ```
			
 
				 
			
 
				-更多开发指南请参考 [Development Guide](docs/development.md)。
			
 
				+### 常用命令
			
 
				+
			
 
				+| 命令 | 说明 |
			
 
				+|------|------|
			
 
				+| `make install` | 安装 uv、同步依赖、设置 pre-commit hooks |
			
 
				+| `make deps` | 同步锁定依赖 |
			
 
				+| `make generate` | 生成代码（OpenAPI Client 等） |
			
 
				+| `make lint` | 运行代码检查（flake8、black 等） |
			
 
				+| `make test` | 运行单元测试 |
			
 
				+| `make build` | 构建 wheel 包 |
			
 
				+| `make ci` | 完整 CI 流水线 |
			
 
				+
			
 
				+详细开发指南请参见 [Development Guide](docs/development.md)。
			
 
				 
			
 
				 ## 文档
			
 
				 
			
 
				 完整文档请访问 [官方文档站点](https://docs.gpustack.ai)。
			
 
				 
			
 
				+项目内文档：
			
 
				+
			
 
				+- [架构文档](docs/architecture.md)
			
 
				+- [Docker Compose 部署指南](docs/deployment/docker-compose.md)
			
 
				+- [Worker 节点部署指南](docs/deployment/worker.md)
			
 
				+- [Kubernetes (Helm) 部署指南](docs/deployment/kubernetes.md)
			
 
				+
			
 
				 ## 加入社区
			
 
				 
			
 
				 有任何问题或建议，欢迎加入我们的 [Discord 社区](https://discord.gg/VXYJzuaqwD) 获取支持。
			
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -1,5 +1,5 @@
 
				 # API Reference
			
 
				 
			
 
				-GPUStack provides a built-in Swagger UI. You can access it by navigating to `<gpustack-server-url>/docs` in your browser to view and interact with the APIs.
			
 
				+MASS-Base provides a built-in Swagger UI. You can access it by navigating to `<gpustack-server-url>/docs` in your browser to view and interact with the APIs.
			
 
				 
			
 
				 ![Swagger UI](assets/swagger-ui.png)
			
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,6 +1,6 @@
 
				 # Architecture
			
 
				 
			
 
				-The diagram below provides a high-level view of the GPUStack architecture.
			
 
				+The diagram below provides a high-level view of the MASS-Base architecture.
			
 
				 
			
 
				 ![gpustack-v2-architecture](assets/gpustack-v2-architecture.png)
			
 
				 
			
@@ -10,7 +10,7 @@ The diagram below details the internal components and their interactions.
 
				 
			
 
				 ### Server
			
 
				 
			
 
				-The GPUStack server consists of the following components:
			
 
				+The MASS-Base server consists of the following components:
			
 
				 
			
 
				 - **API Server:** Provides a RESTful interface for clients to interact with the system. It handles authentication and authorization.
			
 
				 - **Scheduler:** Responsible for assigning model instances to workers.
			
@@ -18,24 +18,24 @@ The GPUStack server consists of the following components:
 
				 
			
 
				 ### Worker
			
 
				 
			
 
				-The GPUStack worker consists of the following components:
			
 
				+The MASS-Base worker consists of the following components:
			
 
				 
			
 
				-- **GPUStack Runtime:** Detects GPU devices and interacts with the container runtime to deploy model instances.
			
 
				+- **MASS-Base Runtime:** Detects GPU devices and interacts with the container runtime to deploy model instances.
			
 
				 - **Serving Manager:** Manages the lifecycle of model instances on the worker.
			
 
				 - **Metric Exporter:** Exports metrics about the model instances and their performance.
			
 
				 
			
 
				 ### AI Gateway
			
 
				 
			
 
				-The AI Gateway handles incoming API requests from clients. It routes requests to the appropriate model instances based on the requested model. GPUStack uses [Higress](https://github.com/alibaba/higress) for API routing and load balancing.
			
 
				+The AI Gateway handles incoming API requests from clients. It routes requests to the appropriate model instances based on the requested model. MASS-Base uses [Higress](https://github.com/alibaba/higress) for API routing and load balancing.
			
 
				 
			
 
				 ### SQL Database
			
 
				 
			
 
				-The GPUStack server connects to a SQL database as the datastore. GPUStack uses an Embedded PostgreSQL by default, but you can configure it to use an external PostgreSQL or MySQL as well.
			
 
				+The MASS-Base server connects to a SQL database as the datastore. MASS-Base uses an Embedded PostgreSQL by default, but you can configure it to use an external PostgreSQL or MySQL as well.
			
 
				 
			
 
				 ### Inference Server
			
 
				 
			
 
				-Inference servers are the backends that perform the inference tasks. GPUStack supports [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [Ascend MindIE](https://www.hiascend.com/en/software/mindie) and [VoxBox](https://github.com/gpustack/vox-box) as the built-in inference server. You can also add custom inference backends.
			
 
				+Inference servers are the backends that perform the inference tasks. MASS-Base supports [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [Ascend MindIE](https://www.hiascend.com/en/software/mindie) and [VoxBox](https://github.com/gpustack/vox-box) as the built-in inference server. You can also add custom inference backends.
			
 
				 
			
 
				 ### Ray
			
 
				 
			
 
				-[Ray](https://ray.io) is a distributed computing framework that GPUStack utilizes to run distributed vLLM. GPUStack bootstraps Ray cluster on-demand to run distributed vLLM across multiple workers.
			
 
				+[Ray](https://ray.io) is a distributed computing framework that MASS-Base utilizes to run distributed vLLM. MASS-Base bootstraps Ray cluster on-demand to run distributed vLLM across multiple workers.
			
--- a/docs/cli-reference/reload-config.md
+++ b/docs/cli-reference/reload-config.md
@@ -19,5 +19,5 @@ gpustack reload-config [OPTIONS]
 
				 | `--file` value                      | (empty)                                | Load configuration from a YAML file. Only whitelisted fields are applied. Keys are normalized to snake_case. Values provided via `--set` override those from the file.                                                                                                                                                                         |
			
 
				 | `--list`                            | `False`                                | Show whitelisted fields and values. When present, other options are ignored.                                                                                                                                                                                                                                                                   |
			
 
				 | `--api-key` value                   | (empty)                                | When force-auth-localhost is enabled, provide an API key for server-side authentication as an admin user.                                                                                                                                                                                                                                      |
			
 
				-| `--server-port` value               | `30080`                                | Target port of the GPUStack API server for applying or listing runtime config. When omitted, defaults to `GPUSTACK_API_PORT` if set, otherwise `30080`.                                                                                                                                                                                        |
			
 
				-| `--worker-port` value               | `10150`                                | Target port of the GPUStack worker for applying or listing runtime config. When omitted, defaults to `GPUSTACK_WORKER_PORT` if set, otherwise `10150`.                                                                                                                                                                                         |
			
 
				+| `--server-port` value               | `30080`                                | Target port of the MASS-Base API server for applying or listing runtime config. When omitted, defaults to `GPUSTACK_API_PORT` if set, otherwise `30080`.                                                                                                                                                                                        |
			
 
				+| `--worker-port` value               | `10150`                                | Target port of the MASS-Base worker for applying or listing runtime config. When omitted, defaults to `GPUSTACK_WORKER_PORT` if set, otherwise `10150`.                                                                                                                                                                                         |
			
--- a/docs/cli-reference/start.md
+++ b/docs/cli-reference/start.md
@@ -5,7 +5,7 @@ hide:
 
				 
			
 
				 # gpustack start
			
 
				 
			
 
				-Run GPUStack server or worker.
			
 
				+Run MASS-Base server or worker.
			
 
				 
			
 
				 ```bash
			
 
				 gpustack start [OPTIONS]
			
@@ -44,9 +44,9 @@ gpustack start [OPTIONS]
 
				 | `--huggingface-token` value                 | (empty)                                | User Access Token to authenticate to the Hugging Face Hub. Can also be configured via the `HF_TOKEN` environment variable.            |
			
 
				 | `--bin-dir` value                           | (empty)                                | Directory to store additional binaries, e.g., versioned backend executables.                                                          |
			
 
				 | `--pipx-path` value                         | (empty)                                | Path to the pipx executable, used to install versioned backends.                                                                      |
			
 
				-| `--system-default-container-registry` value | `docker.io`                            | Default container registry for GPUStack to pull system and inference images.                                                          |
			
 
				-| `--image-name-override` value               | (empty)                                | Override the default image name for the GPUStack container.                                                                           |
			
 
				-| `--image-repo` value                        | `gpustack/gpustack`                    | Override the default image repository for the GPUStack container.                                                                     |
			
 
				+| `--system-default-container-registry` value | `docker.io`                            | Default container registry for MASS-Base to pull system and inference images.                                                          |
			
 
				+| `--image-name-override` value               | (empty)                                | Override the default image name for the MASS-Base container.                                                                           |
			
 
				+| `--image-repo` value                        | `gpustack/gpustack`                    | Override the default image repository for the MASS-Base container.                                                                     |
			
 
				 | `--gateway-mode` value                      | `auto`                                 | Gateway running mode. Options: embedded, in-cluster, external, disabled, or auto (default).                                           |
			
 
				 | `--gateway-kubeconfig` value                | (empty)                                | Path to the kubeconfig file for gateway. Only useful for external gateway-mode.                                                       |
			
 
				 | `--gateway-namespace` value                 | `higress-system`                       | The namespace where the gateway component is deployed.                                                                                |
			
@@ -59,13 +59,13 @@ gpustack start [OPTIONS]
 
				 | ------------------------------------------------ | ------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
			
 
				 | `--port` value                                   | `80`                                             | Port to bind the server to.                                                                                                                                                                                                                                                                                                                   |
			
 
				 | `--tls-port` value                               | `443`                                            | Port to bind the TLS server to.                                                                                                                                                                                                                                                                                                               |
			
 
				-| `--api-port` value                               | `30080`                                          | Port to bind the GPUStack API server to.                                                                                                                                                                                                                                                                                                      |
			
 
				-| `--proxy-port` value                             | `30079`                                          | Port to bind the GPUStack proxy server to.                                                                                                                                                                                                                                                                                                    |
			
 
				+| `--api-port` value                               | `30080`                                          | Port to bind the MASS-Base API server to.                                                                                                                                                                                                                                                                                                      |
			
 
				+| `--proxy-port` value                             | `30079`                                          | Port to bind the MASS-Base proxy server to.                                                                                                                                                                                                                                                                                                    |
			
 
				 | `--database-port` value                          | `5432`                                           | Port of the embedded PostgresSQL database.                                                                                                                                                                                                                                                                                                    |
			
 
				 | `--metrics-port` value                           | `10161`                                          | Port to expose server metrics.                                                                                                                                                                                                                                                                                                                |
			
 
				 | `--disable-metrics`                              | `False`                                          | Disable server metrics.                                                                                                                                                                                                                                                                                                                       |
			
 
				-| `--disable-worker`                               | (empty)                                          | (DEPRECATED) Disable the embedded worker for the GPUStack server. New installations will not have the embedded worker by default. Use '--enable-worker' to enable the embedded worker if needed. If neither flag is set, for backward compatibility, the embedded worker will be enabled by default for legacy installations prior to v2.0.1. |
			
 
				-| `--enable-worker`                                | `False`                                          | Enable the embedded worker for the GPUStack server.                                                                                                                                                                                                                                                                                           |
			
 
				+| `--disable-worker`                               | (empty)                                          | (DEPRECATED) Disable the embedded worker for the MASS-Base server. New installations will not have the embedded worker by default. Use '--enable-worker' to enable the embedded worker if needed. If neither flag is set, for backward compatibility, the embedded worker will be enabled by default for legacy installations prior to v2.0.1. |
			
 
				+| `--enable-worker`                                | `False`                                          | Enable the embedded worker for the MASS-Base server.                                                                                                                                                                                                                                                                                           |
			
 
				 | `--bootstrap-password` value                     | Auto-generated.                                  | Initial password for the default admin user.                                                                                                                                                                                                                                                                                                  |
			
 
				 | `--database-url` value                           | Embedded PostgreSQL.                             | URL of the database. Supports PostgreSQL 13.0+, and MySQL 8.0.36+. Example: postgresql://user:password@host:port/db_name or mysql://user:password@host:port/db_name                                                                                                                                                                           |
			
 
				 | `--ssl-keyfile` value                            | (empty)                                          | Path to the SSL key file.                                                                                                                                                                                                                                                                                                                     |
			
@@ -129,7 +129,7 @@ gpustack start [OPTIONS]
 
				 | `--enable-hf-xet`                        | `False`                                | [Deprecated] Enable downloading model files using Hugging Face Xet.                                                                                                                             |
			
 
				 | `--worker-ifname` value                  | (empty)                                | Network interface name of the worker node. Auto-detected by default.                                                                                                                            |
			
 
				 | `--proxy-mode` value                     | (empty)                                | Proxy mode for server accessing model instances: direct (server connects directly) or worker (via worker proxy). Default value is direct for embedded worker, and worker for standalone worker. |
			
 
				-| `--benchmark-image-repo` value           | `gpustack/benchmark-runner`            | Override the default benchmark image repo for the GPUStack benchmark container.                                                                                                                 |
			
 
				+| `--benchmark-image-repo` value           | `gpustack/benchmark-runner`            | Override the default benchmark image repo for the MASS-Base benchmark container.                                                                                                                 |
			
 
				 | `--benchmark-dir` value                  | `<data-dir>/benchmarks`                | Directory to store benchmark results.                                                                                                                                                           |
			
 
				 | `--benchmark-max-duration-seconds` value | (empty)                                | Max duration for a benchmark before timeout. Disabled when empty.                                                                                                                               |
			
 
				 
			
@@ -141,7 +141,7 @@ For environment variables beyond the command-line parameters mentioned above, pl
 
				 
			
 
				 ## Config File
			
 
				 
			
 
				-You can configure start options using a YAML-format config file when starting GPUStack server or worker. Here is a complete example:
			
 
				+You can configure start options using a YAML-format config file when starting MASS-Base server or worker. Here is a complete example:
			
 
				 
			
 
				 ```yaml
			
 
				 # Common Options
			
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -1,6 +1,6 @@
 
				-# Contributing to GPUStack
			
 
				+# Contributing to MASS-Base
			
 
				 
			
 
				-Thanks for taking the time to contribute to GPUStack!
			
 
				+Thanks for taking the time to contribute to MASS-Base!
			
 
				 
			
 
				 Please review and follow the [Code of Conduct](./code-of-conduct.md).
			
 
				 
			
@@ -10,7 +10,7 @@ If you find any bugs or are having any trouble, please search the reported issue
 
				 
			
 
				 If you can't find anything related to your issue, contact us by filing an issue. To help us diagnose and resolve, please include as much information as possible, including:
			
 
				 
			
 
				-- Software: GPUStack version, installation method, operating system info, etc.
			
 
				+- Software: MASS-Base version, installation method, operating system info, etc.
			
 
				 - Hardware: Node info, GPU info, etc.
			
 
				 - Steps to reproduce: Provide as much detail on how you got into the reported situation.
			
 
				 - Logs: Please include any relevant logs, such as server logs, worker logs, etc.
			
--- a/docs/deployment/docker-compose.md
+++ b/docs/deployment/docker-compose.md
@@ -0,0 +1,219 @@
 
				+# Docker Compose 部署指南
			
 
				+
			
 
				+本文档介绍如何使用 Docker Compose 部署 MASS-Base 平台。Docker Compose 方式适合单机部署、开发测试和小团队使用。
			
 
				+
			
 
				+## 前置要求
			
 
				+
			
 
				+| 要求 | 说明 |
			
 
				+|------|------|
			
 
				+| 操作系统 | Linux（推荐 Ubuntu 20.04+ / Debian 12+） |
			
 
				+| Docker | 20.10+，已安装 [Docker Compose](https://docs.docker.com/compose/install/) |
			
 
				+| CPU 节点 | 至少 4 核 CPU，8GB 内存（仅 Server 节点） |
			
 
				+| 磁盘空间 | 至少 20GB 可用空间 |
			
 
				+
			
 
				+> **注意：** Server 可运行在无 GPU 的 CPU 节点上。GPU 节点用于部署 Worker，参见 [Worker 节点部署指南](../deployment/worker.md)。
			
 
				+
			
 
				+## 部署模式
			
 
				+
			
 
				+项目提供两套 Docker Compose 配置，可根据需求选择：
			
 
				+
			
 
				+| 配置文件 | 包含组件 | 适用场景 |
			
 
				+|----------|---------|---------|
			
 
				+| `docker-compose.server.yaml` | PostgreSQL + Server | 最小部署，已有外部监控体系 |
			
 
				+| `docker-compose.external-observability.yaml` | PostgreSQL + Server + Prometheus + Grafana | 完整部署，内置可观测性 |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 模式一：最小部署（PostgreSQL + Server）
			
 
				+
			
 
				+### 1. 克隆项目
			
 
				+
			
 
				+```bash
			
 
				+git clone https://github.com/your-org/maas-base.git
			
 
				+cd maas-base/docker-compose
			
 
				+```
			
 
				+
			
 
				+### 2. 启动服务
			
 
				+
			
 
				+```bash
			
 
				+docker compose -f docker-compose.server.yaml up -d
			
 
				+```
			
 
				+
			
 
				+该命令会启动以下两个容器：
			
 
				+
			
 
				+- **`gpustack-db`** — PostgreSQL 16 数据库
			
 
				+- **`gpustack-server`** — MASS-Base Server（从 `pack/Dockerfile` 自动构建镜像）
			
 
				+
			
 
				+### 3. 验证部署
			
 
				+
			
 
				+```bash
			
 
				+# 查看容器状态
			
 
				+docker compose -f docker-compose.server.yaml ps
			
 
				+
			
 
				+# 查看 Server 日志
			
 
				+docker compose -f docker-compose.server.yaml logs -f gpustack-server
			
 
				+```
			
 
				+
			
 
				+### 4. 获取初始管理员密码
			
 
				+
			
 
				+```bash
			
 
				+docker exec gpustack-server cat /var/lib/mass-base/initial_admin_password
			
 
				+```
			
 
				+
			
 
				+### 5. 访问 UI
			
 
				+
			
 
				+在浏览器中打开 `http://<服务器IP>`，使用用户名 `admin` 和上一步获取的密码登录。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 模式二：完整部署（含 Prometheus + Grafana）
			
 
				+
			
 
				+### 1. 克隆项目
			
 
				+
			
 
				+```bash
			
 
				+git clone https://github.com/your-org/maas-base.git
			
 
				+cd maas-base/docker-compose
			
 
				+```
			
 
				+
			
 
				+### 2. 启动服务
			
 
				+
			
 
				+```bash
			
 
				+docker compose -f docker-compose.external-observability.yaml up -d
			
 
				+```
			
 
				+
			
 
				+该命令会启动以下四个容器：
			
 
				+
			
 
				+- **`gpustack-db`** — PostgreSQL 16 数据库
			
 
				+- **`gpustack-server`** — MASS-Base Server
			
 
				+- **`gpustack-prometheus`** — Prometheus 指标采集
			
 
				+- **`gpustack-grafana`** — Grafana 监控面板
			
 
				+
			
 
				+### 3. 访问服务
			
 
				+
			
 
				+| 服务 | 地址 | 默认凭据 |
			
 
				+|------|------|---------|
			
 
				+| MASS-Base UI | `http://<服务器IP>:80` | admin / 初始密码 |
			
 
				+| Prometheus | `http://<服务器IP>:9090` | 无 |
			
 
				+| Grafana | `http://<服务器IP>:3000` | admin / grafana |
			
 
				+
			
 
				+### 4. 获取初始管理员密码
			
 
				+
			
 
				+```bash
			
 
				+docker exec gpustack-server cat /var/lib/mass-base/initial_admin_password
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 环境变量配置
			
 
				+
			
 
				+所有环境变量均可通过 `.env` 文件或 `--env-file` 参数传入。
			
 
				+
			
 
				+| 变量 | 默认值 | 说明 |
			
 
				+|------|--------|------|
			
 
				+| `POSTGRES_PASSWORD` | `gpustack` | PostgreSQL 数据库密码 |
			
 
				+| `GPUSTACK_GRAFANA_URL` | `http://localhost:3000` | Grafana 访问地址 |
			
 
				+| `IMAGE_REGISTRY` | `docker.io` | 镜像仓库地址（可用于替换为国内镜像源） |
			
 
				+| `PROMETHEUS_IMAGE_NAMESPACE` | `prom` | Prometheus 镜像命名空间 |
			
 
				+| `PROMETHEUS_TAG` | `latest` | Prometheus 镜像标签 |
			
 
				+| `GRAFANA_IMAGE_NAMESPACE` | `grafana` | Grafana 镜像命名空间 |
			
 
				+| `GRAFANA_TAG` | `latest` | Grafana 镜像标签 |
			
 
				+| `GRAFANA_PASSWORD` | `grafana` | Grafana 管理员密码 |
			
 
				+
			
 
				+### 示例：使用自定义密码
			
 
				+
			
 
				+创建 `.env` 文件：
			
 
				+
			
 
				+```env
			
 
				+POSTGRES_PASSWORD=my_secure_db_password_123
			
 
				+GRAFANA_PASSWORD=my_secure_grafana_password
			
 
				+```
			
 
				+
			
 
				+然后启动：
			
 
				+
			
 
				+```bash
			
 
				+docker compose -f docker-compose.external-observability.yaml --env-file .env up -d
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 数据持久化
			
 
				+
			
 
				+Docker Compose 使用命名卷（named volumes）进行数据持久化：
			
 
				+
			
 
				+| 卷名 | 用途 | 挂载路径 |
			
 
				+|------|------|---------|
			
 
				+| `postgres-data` | PostgreSQL 数据 | `/var/lib/postgresql/data` |
			
 
				+| `prom_data` | Prometheus 指标数据 | `/prometheus` |
			
 
				+| `gpustack-data` | Server 数据（含日志、配置、嵌入式数据库备份） | `/var/lib/gpustack` |
			
 
				+
			
 
				+如需将数据映射到宿主机目录，可修改 `volumes` 配置：
			
 
				+
			
 
				+```yaml
			
 
				+volumes:
			
 
				+  - /data/postgres:/var/lib/postgresql/data
			
 
				+  - /data/prometheus:/prometheus
			
 
				+  - /data/gpustack:/var/lib/gpustack
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 端口说明
			
 
				+
			
 
				+| 端口 | 组件 | 用途 |
			
 
				+|------|------|------|
			
 
				+| `80` | Server API | REST API、Web UI、模型推理接口 |
			
 
				+| `10161` | Server Metrics | Prometheus 抓取指标 |
			
 
				+| `9090` | Prometheus | Prometheus Web UI |
			
 
				+| `3000` | Grafana | Grafana 监控面板 |
			
 
				+
			
 
				+> **安全建议：** 生产环境中建议仅开放 `80` 端口，其他端口通过反向代理或内网访问。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 外部数据库
			
 
				+
			
 
				+如果已有 PostgreSQL 或 MySQL，可以不部署内嵌数据库，通过修改 `GPUSTACK_DATABASE_URL` 环境变量连接：
			
 
				+
			
 
				+```yaml
			
 
				+environment:
			
 
				+  GPUSTACK_DATABASE_URL: postgresql://user:password@host:5432/gpustack
			
 
				+```
			
 
				+
			
 
				+此时可从 `docker-compose.*.yaml` 中移除 `postgres` 服务及其 `depends_on` 依赖。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 停止与清理
			
 
				+
			
 
				+```bash
			
 
				+# 停止服务（保留数据卷）
			
 
				+docker compose -f docker-compose.server.yaml down
			
 
				+
			
 
				+# 停止服务并删除数据卷（数据将丢失）
			
 
				+docker compose -f docker-compose.server.yaml down -v
			
 
				+
			
 
				+# 查看日志
			
 
				+docker compose -f docker-compose.server.yaml logs -f
			
 
				+
			
 
				+# 仅查看 Server 日志
			
 
				+docker compose -f docker-compose.server.yaml logs -f gpustack-server
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 更新镜像
			
 
				+
			
 
				+```bash
			
 
				+# 拉取最新代码
			
 
				+git pull
			
 
				+
			
 
				+# 重新构建并启动
			
 
				+docker compose -f docker-compose.external-observability.yaml up -d --build
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 下一步
			
 
				+
			
 
				+- 部署 Worker 节点以提供 GPU 推理能力 → [Worker 节点部署指南](../deployment/worker.md)
			
 
				+- 使用 Kubernetes 进行大规模部署 → [Kubernetes (Helm) 部署指南](../deployment/kubernetes.md)
			
--- a/docs/deployment/kubernetes.md
+++ b/docs/deployment/kubernetes.md
@@ -0,0 +1,321 @@
 
				+# Kubernetes (Helm) 部署指南
			
 
				+
			
 
				+本文档介绍如何使用 Helm 在 Kubernetes 集群中部署 MASS-Base 平台。Helm 方式适合生产环境和大规模部署。
			
 
				+
			
 
				+> **注意：** Kubernetes 部署模式下，内置 Higress 网关目前为实验性阶段，详见[限制](#限制)部分。
			
 
				+
			
 
				+## 前置要求
			
 
				+
			
 
				+| 组件 | 版本要求 |
			
 
				+|------|---------|
			
 
				+| Kubernetes | >= v1.30.0 |
			
 
				+| Helm | v3.18.4+ |
			
 
				+| 存储 | 默认 StorageClass（用于 PVC）或配置 hostPath |
			
 
				+| GPU 节点 | Linux 节点，已安装 GPU 驱动和对应 Container Runtime |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 1. 安装 Kubernetes 集群
			
 
				+
			
 
				+以下以 k3s 为例，其他发行版（RKE2、kubeadm、云厂商托管 Kubernetes）同样适用。
			
 
				+
			
 
				+```bash
			
 
				+# 安装 k3s v1.30.11，禁用 Traefik（使用 Higress 作为 Ingress 控制器）
			
 
				+curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.11+k3s1 \
			
 
				+    INSTALL_K3S_EXEC="--disable=traefik" sh -
			
 
				+```
			
 
				+
			
 
				+验证安装：
			
 
				+
			
 
				+```bash
			
 
				+kubectl version
			
 
				+```
			
 
				+
			
 
				+> **HA 集群：** 对于高可用 k3s 集群，请参考 [k3s 文档](https://docs.k3s.io/datastore/ha)。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 2. 安装 Helm
			
 
				+
			
 
				+如果尚未安装 Helm：
			
 
				+
			
 
				+```bash
			
 
				+curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 3. 获取 Helm Chart
			
 
				+
			
 
				+```bash
			
 
				+git clone https://github.com/your-org/maas-base.git
			
 
				+cd maas-base/charts
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 4. 部署 MASS-Base
			
 
				+
			
 
				+### 4.1 默认部署（内置 Higress 网关）
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace
			
 
				+```
			
 
				+
			
 
				+此命令会部署以下组件：
			
 
				+
			
 
				+- **`gpustack-server`** — StatefulSet（1 副本），含嵌入式 PostgreSQL
			
 
				+- **`gpustack-worker`** — DaemonSet，在每个 GPU 节点上运行 Worker
			
 
				+- **`higress`** — Higress 网关（子 Chart），负责 API 路由和负载均衡
			
 
				+- **`higress-plugins`** — Higress 插件服务 Deployment
			
 
				+- **RBAC** — Server 和 Worker 的 ServiceAccount、ClusterRole、ClusterRoleBinding
			
 
				+
			
 
				+### 4.2 使用已安装的 Higress
			
 
				+
			
 
				+如果集群中已有 Higress，跳过内置网关部署：
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set higress-core.enabled=false \
			
 
				+  --set gateway.ingressClassname=<your-higress-ingressclass>
			
 
				+```
			
 
				+
			
 
				+验证 Higress IngressClass 是否存在：
			
 
				+
			
 
				+```bash
			
 
				+kubectl get ingressclass higress
			
 
				+# NAME      CONTROLLER                      PARAMETERS   AGE
			
 
				+# higress   higress.io/higress-controller   <none>       3m46s
			
 
				+```
			
 
				+
			
 
				+### 4.3 独立安装 Higress
			
 
				+
			
 
				+如需单独安装兼容版本的 Higress：
			
 
				+
			
 
				+```bash
			
 
				+# 添加 Higress Helm 仓库
			
 
				+helm repo add higress.io https://higress.io/helm-charts
			
 
				+
			
 
				+# 安装 higress-core v2.1.9
			
 
				+helm install higress higress.io/higress-core -n higress-system --create-namespace --version 2.1.9
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 5. 访问 MASS-Base
			
 
				+
			
 
				+### 5.1 获取初始管理员密码
			
 
				+
			
 
				+```bash
			
 
				+kubectl exec -it -n gpustack-system gpustack-server-0 -- cat /var/lib/gpustack/initial_admin_password
			
 
				+```
			
 
				+
			
 
				+### 5.2 获取访问地址
			
 
				+
			
 
				+如果配置了 `server.ingress.hostname`：
			
 
				+
			
 
				+```bash
			
 
				+kubectl get ingress -n gpustack-system gpustack
			
 
				+```
			
 
				+
			
 
				+否则，通过 Service 的 LoadBalancer IP 访问：
			
 
				+
			
 
				+```bash
			
 
				+kubectl get svc -n gpustack-system gpustack-server
			
 
				+```
			
 
				+
			
 
				+或使用端口转发临时访问：
			
 
				+
			
 
				+```bash
			
 
				+kubectl port-forward -n gpustack-system svc/gpustack-server 8080:80
			
 
				+# 然后访问 http://localhost:8080
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 6. 自定义配置
			
 
				+
			
 
				+### 6.1 使用外部数据库
			
 
				+
			
 
				+推荐生产环境使用外部 PostgreSQL 而非内置数据库：
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set server.externalDatabaseURL="postgresql://user:password@db-host:5432/gpustack"
			
 
				+```
			
 
				+
			
 
				+### 6.2 配置 TLS
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set server.ingress.hostname=maas.example.com \
			
 
				+  --set server.ingress.tls.cert="$(cat tls.crt)" \
			
 
				+  --set server.ingress.tls.key="$(cat tls.key)"
			
 
				+```
			
 
				+
			
 
				+### 6.3 配置 Worker GPU 厂商
			
 
				+
			
 
				+```bash
			
 
				+# NVIDIA GPU（默认）
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set worker.gpuVendor=nvidia
			
 
				+
			
 
				+# AMD GPU
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set worker.gpuVendor=amd
			
 
				+
			
 
				+# Ascend NPU
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set worker.gpuVendor=ascend
			
 
				+```
			
 
				+
			
 
				+支持的 GPU 厂商见 [values.yaml 配置参考](#7-valuesyaml-配置参考)。
			
 
				+
			
 
				+### 6.4 使用 hostPath 替代 PVC
			
 
				+
			
 
				+如果没有默认 StorageClass，可以使用 hostPath：
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set server.dataVolume.hostPath=/data/gpustack/server
			
 
				+```
			
 
				+
			
 
				+### 6.5 使用自定义镜像
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set image.repository=my-registry/mass-base \
			
 
				+  --set image.tag=v2.2.0 \
			
 
				+  --set image.pullPolicy=Always
			
 
				+```
			
 
				+
			
 
				+### 6.6 完整的自定义 values 文件
			
 
				+
			
 
				+```bash
			
 
				+cat > my-values.yaml << 'EOF'
			
 
				+debug: false
			
 
				+server:
			
 
				+  externalDatabaseURL: postgresql://user:password@db-host:5432/gpustack
			
 
				+  ingress:
			
 
				+    hostname: maas.example.com
			
 
				+  apiPort: 30080
			
 
				+  metricsPort: 10161
			
 
				+worker:
			
 
				+  gpuVendor: nvidia
			
 
				+  port: 10150
			
 
				+  metricsPort: 10151
			
 
				+higress-core:
			
 
				+  enabled: true
			
 
				+  global:
			
 
				+    hub: docker.io/gpustack
			
 
				+EOF
			
 
				+
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace -f my-values.yaml
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 7. values.yaml 配置参考
			
 
				+
			
 
				+| 参数 | 默认值 | 说明 |
			
 
				+|------|--------|------|
			
 
				+| `debug` | `false` | 启用调试模式 |
			
 
				+| `registrationToken` | `null` | Worker 注册令牌，null 时自动生成 |
			
 
				+| `systemDefaultContainerRegistry` | `null` | 默认镜像仓库前缀 |
			
 
				+| `image.repository` | `gpustack/gpustack` | 镜像仓库 |
			
 
				+| `image.tag` | `null` | 镜像标签，默认使用 Chart 的 appVersion |
			
 
				+| `image.pullPolicy` | `IfNotPresent` | 镜像拉取策略 |
			
 
				+| `server.ingress.hostname` | `null` | Ingress 主机名 |
			
 
				+| `server.ingress.tls.cert` | `null` | TLS 证书内容 |
			
 
				+| `server.ingress.tls.key` | `null` | TLS 私钥内容 |
			
 
				+| `server.externalDatabaseURL` | `null` | 外部数据库连接串 |
			
 
				+| `server.dataVolume.hostPath` | `null` | Server 数据 hostPath |
			
 
				+| `server.dataVolume.size` | `10Gi` | Server 数据 PVC 大小 |
			
 
				+| `server.apiPort` | `30080` | API 服务端口 |
			
 
				+| `server.metricsPort` | `10161` | 指标端口 |
			
 
				+| `server.environmentConfig` | `{}` | Server 额外环境变量 |
			
 
				+| `gateway.ingressClassname` | `higress` | Higress IngressClass 名称 |
			
 
				+| `higress-core.enabled` | `true` | 是否部署 Higress 子 Chart |
			
 
				+| `higress-core.global.ingressClass` | `higress` | 需匹配 `gateway.ingressClassname` |
			
 
				+| `higress-core.global.enablePluginServer` | `false` | GPUStack 自行管理插件 |
			
 
				+| `higress-core.global.hub` | `docker.io/gpustack` | Higress 镜像仓库 |
			
 
				+| `higressPlugins.image.repository` | `gpustack/higress-plugins` | Higress 插件镜像仓库 |
			
 
				+| `higressPlugins.image.tag` | `"0.2.0"` | Higress 插件镜像标签 |
			
 
				+| `worker.enabled` | `true` | 启用 Worker 节点 |
			
 
				+| `worker.gpuVendor` | `nvidia` | GPU 厂商 |
			
 
				+| `worker.port` | `10150` | Worker 服务端口 |
			
 
				+| `worker.metricsPort` | `10151` | Worker 指标端口 |
			
 
				+| `worker.environmentConfig` | `{}` | Worker 额外环境变量 |
			
 
				+| `worker.dataDir` | `/var/lib/gpustack` | Worker 数据目录 |
			
 
				+| `worker.extraVolumeMounts` | `[]` | Worker 额外卷挂载 |
			
 
				+| `worker.extraVolumes` | `[]` | Worker 额外卷 |
			
 
				+
			
 
				+### GPU 厂商配置值
			
 
				+
			
 
				+| `worker.gpuVendor` | 适用硬件 | 特殊处理 |
			
 
				+|---------------------|---------|---------|
			
 
				+| `nvidia` | NVIDIA GPU | runtimeClassName: nvidia |
			
 
				+| `mthreads` | Moore Threads GPU | runtimeClassName: mthreads |
			
 
				+| `amd` | AMD GPU (ROCm) | 挂载 `/opt/rocm` |
			
 
				+| `ascend` | Huawei Ascend NPU | 挂载 `/usr/local/Ascend/driver` 和 `/usr/local/Ascend/ascend-toolkit` |
			
 
				+| `hygon` | Hygon DCU | 挂载 `/opt/hyhal` 和 `/opt/dtk` |
			
 
				+| `metax` | MetaX GPU | 挂载 `/opt/mxdriver` 和 `/opt/maca` |
			
 
				+| `iluvatar` | Iluvatar GPU | 挂载 `/usr/local/corex` |
			
 
				+| `cambricon` | Cambricon MLU | 挂载 `/usr/bin/cnmon` 和 `/usr/local/neuware` |
			
 
				+| `thead` | T-Head PPU | 挂载 `/usr/local/PPU_SDK` |
			
 
				+| `null` | CPU-only | 无特殊处理 |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 8. 更新与卸载
			
 
				+
			
 
				+### 8.1 更新
			
 
				+
			
 
				+```bash
			
 
				+# 更新 Chart
			
 
				+helm upgrade -n gpustack-system gpustack ./gpustack
			
 
				+
			
 
				+# 使用新 values 更新
			
 
				+helm upgrade -n gpustack-system gpustack ./gpustack -f my-values.yaml
			
 
				+```
			
 
				+
			
 
				+### 8.2 卸载
			
 
				+
			
 
				+```bash
			
 
				+helm uninstall -n gpustack-system gpustack
			
 
				+```
			
 
				+
			
 
				+> **注意：** 卸载 Helm release 不会删除 PVC 中的数据。如需完全清理：
			
 
				+>
			
 
				+> ```bash
			
 
				+> kubectl delete pvc -n gpustack-system -l app=gpustack-server
			
 
				+> ```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 9. 限制
			
 
				+
			
 
				+- **Server 不支持多副本：** Server 以 StatefulSet 部署，目前仅支持 1 个副本
			
 
				+- **内置 PostgreSQL：** 默认使用内置嵌入式 PostgreSQL，推荐使用 `server.externalDatabaseURL` 配置外部数据库
			
 
				+- **PVC 要求：** StatefulSet 使用 `volumeClaimTemplates`（默认 10Gi），需配置默认 StorageClass；或使用 `server.dataVolume.hostPath` 指定 hostPath
			
 
				+- **Higress 插件依赖：** Higress 网关重启时会从 `gpustack/higress-plugins` Deployment 下载插件，该服务不可用时会阻塞网关启动
			
 
				+- **现有 Ingress 控制器冲突：** 如果集群中已有其他 Ingress 控制器，需设置 `higress-core.enabled=false` 并配置 `gateway.ingressClassname`
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 10. 端口说明
			
 
				+
			
 
				+| 端口 | 组件 | 用途 |
			
 
				+|------|------|------|
			
 
				+| `30080` | Server API | REST API + Web UI（NodePort/Ingress） |
			
 
				+| `10161` | Server Metrics | Prometheus 指标 |
			
 
				+| `10150` | Worker | Worker 通信端口 |
			
 
				+| `10151` | Worker Metrics | Worker 指标 |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 下一步
			
 
				+
			
 
				+- 部署模型 → [快速开始](../README.md#快速开始)
			
 
				+- Docker Compose 部署方式 → [Docker Compose 部署指南](./docker-compose.md)
			
 
				+- Worker 节点独立部署 → [Worker 节点部署指南](./worker.md)
			
--- a/docs/deployment/worker.md
+++ b/docs/deployment/worker.md
@@ -0,0 +1,273 @@
 
				+# Worker 节点部署指南
			
 
				+
			
 
				+本文档介绍如何部署 MASS-Base Worker 节点，为平台提供 GPU 推理能力。
			
 
				+
			
 
				+## 概述
			
 
				+
			
 
				+Worker 是 MASS-Base 的实际推理执行单元，负责：
			
 
				+
			
 
				+- 检测 GPU 设备并上报资源信息
			
 
				+- 管理模型实例的生命周期（启动、停止、重启）
			
 
				+- 导出性能指标（GPU 利用率、显存、推理延迟等）
			
 
				+- 向 Server 发送心跳并同步状态
			
 
				+
			
 
				+Worker 必须运行在 **Linux 节点**上，且该节点需配备 GPU/NPU 等加速器。Server 可以运行在无 GPU 的 CPU 节点上。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 前置要求
			
 
				+
			
 
				+| 要求 | 说明 |
			
 
				+|------|------|
			
 
				+| 操作系统 | Linux（推荐 Ubuntu 20.04+ / Debian 12+） |
			
 
				+| 加速器 | NVIDIA GPU、AMD GPU、Ascend NPU、Hygon DCU、MThreads GPU、Iluvatar GPU、MetaX GPU、Cambricon MLU、T-Head PPU |
			
 
				+| 驱动 | 已安装对应加速器厂商的驱动 |
			
 
				+| Docker | 20.10+ |
			
 
				+| NVIDIA Container Toolkit | NVIDIA GPU 必需，[安装指南](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) |
			
 
				+| 网络连接 | Worker 需能访问 Server 的 API 端口（默认 80） |
			
 
				+
			
 
				+### GPU 驱动版本参考
			
 
				+
			
 
				+| 厂商 | 驱动 / 工具包 |
			
 
				+|------|--------------|
			
 
				+| NVIDIA | NVIDIA Driver + CUDA |
			
 
				+| AMD | ROCm |
			
 
				+| Ascend | CANN + Ascend Driver |
			
 
				+| Hygon | DTK + HyHal |
			
 
				+| MThreads | MThreads Driver |
			
 
				+| Iluvatar | CoreX |
			
 
				+| MetaX | MACA + MX Driver |
			
 
				+| Cambricon | Neuware + cnmon |
			
 
				+| T-Head | PPU SDK |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 方式一：Docker 直接部署 Worker
			
 
				+
			
 
				+### 1. 在 Server 端获取注册令牌
			
 
				+
			
 
				+```bash
			
 
				+# 如果 Server 是通过 Docker Compose 部署的
			
 
				+docker exec gpustack-server cat /var/lib/gpustack/registration_token
			
 
				+```
			
 
				+
			
 
				+### 2. 在 Worker 节点上启动
			
 
				+
			
 
				+```bash
			
 
				+docker run -d --name gpustack-worker \
			
 
				+    --restart unless-stopped \
			
 
				+    --privileged \
			
 
				+    --network host \
			
 
				+    --ipc host \
			
 
				+    -v /var/run/docker.sock:/var/run/docker.sock \
			
 
				+    -v /var/run/cdi:/var/run/cdi \
			
 
				+    -v /var/lib/gpustack:/var/lib/gpustack \
			
 
				+    -e GPUSTACK_SERVER_URL=http://<SERVER_IP>:80 \
			
 
				+    -e GPUSTACK_RUNTIME_DEPLOY=Docker \
			
 
				+    -e GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT=true \
			
 
				+    -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
			
 
				+    gpustack/gpustack:latest \
			
 
				+    gpustack start \
			
 
				+    --gateway-mode disabled \
			
 
				+    --worker
			
 
				+```
			
 
				+
			
 
				+> **参数说明：**
			
 
				+> - `--privileged`：允许 Worker 访问 GPU 设备
			
 
				+> - `--network host`：使用宿主机网络，简化端口管理
			
 
				+> - `--ipc host`：共享 IPC 命名空间，某些推理引擎需要
			
 
				+> - `-v /var/run/docker.sock:/var/run/docker.sock`：让 Worker 能调度模型容器
			
 
				+> - `--gateway-mode disabled`：Worker 不启动网关，仅做推理
			
 
				+> - `--worker`：以 Worker 模式运行
			
 
				+
			
 
				+### 3. 验证部署
			
 
				+
			
 
				+```bash
			
 
				+# 查看 Worker 日志
			
 
				+docker logs -f gpustack-worker
			
 
				+
			
 
				+# 在 Server UI 中查看节点是否上线
			
 
				+# 访问 http://<SERVER_IP> -> Clusters 页面
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 方式二：Server 与 Worker 合一（嵌入式模式）
			
 
				+
			
 
				+如果单机部署且节点本身有 GPU，可以直接在 Server 容器中启用 Worker 模式：
			
 
				+
			
 
				+```bash
			
 
				+docker run -d --name mass-base \
			
 
				+    --restart unless-stopped \
			
 
				+    --privileged \
			
 
				+    --network host \
			
 
				+    --ipc host \
			
 
				+    -v /var/run/docker.sock:/var/run/docker.sock \
			
 
				+    -v /var/run/cdi:/var/run/cdi \
			
 
				+    -v mass-base-data:/var/lib/mass-base \
			
 
				+    -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
			
 
				+    -e NVIDIA_VISIBLE_DEVICES=all \
			
 
				+    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
			
 
				+    mass-base/mass-base \
			
 
				+    gpustack start \
			
 
				+    --gateway-mode disabled \
			
 
				+    --api-port 80
			
 
				+```
			
 
				+
			
 
				+该模式下，同一个容器既作为 Server 也作为 Worker，适合单 GPU 节点的快速部署。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 方式三：通过 Docker Compose 附加 Worker
			
 
				+
			
 
				+可在 `docker-compose` 文件中追加 Worker 服务：
			
 
				+
			
 
				+```yaml
			
 
				+  gpustack-worker:
			
 
				+    image: gpustack/gpustack:latest
			
 
				+    container_name: gpustack-worker
			
 
				+    restart: unless-stopped
			
 
				+    privileged: true
			
 
				+    network_mode: host
			
 
				+    ipc: host
			
 
				+    environment:
			
 
				+      GPUSTACK_SERVER_URL: http://<SERVER_IP>:80
			
 
				+      GPUSTACK_RUNTIME_DEPLOY: "Docker"
			
 
				+      GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT: "true"
			
 
				+      # NVIDIA GPU 环境变量
			
 
				+      NVIDIA_VISIBLE_DEVICES: all
			
 
				+      NVIDIA_DRIVER_CAPABILITIES: compute,utility
			
 
				+    volumes:
			
 
				+      - /var/run/docker.sock:/var/run/docker.sock
			
 
				+      - /var/run/cdi:/var/run/cdi
			
 
				+      - /var/lib/gpustack:/var/lib/gpustack
			
 
				+      - /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins
			
 
				+    command: ["gpustack", "start", "--gateway-mode", "disabled", "--worker"]
			
 
				+```
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 方式四：通过 Kubernetes (Helm) 部署 Worker
			
 
				+
			
 
				+在 Kubernetes 环境中，Worker 以 DaemonSet 形式自动部署到每个 GPU 节点。
			
 
				+
			
 
				+```bash
			
 
				+helm install -n gpustack-system gpustack ./gpustack --create-namespace \
			
 
				+  --set worker.enabled=true \
			
 
				+  --set worker.gpuVendor=nvidia
			
 
				+```
			
 
				+
			
 
				+支持的 GPU 厂商配置：
			
 
				+
			
 
				+| `worker.gpuVendor` | 适用硬件 |
			
 
				+|---------------------|---------|
			
 
				+| `nvidia` | NVIDIA GPU |
			
 
				+| `amd` | AMD GPU (ROCm) |
			
 
				+| `ascend` | Huawei Ascend NPU |
			
 
				+| `hygon` | Hygon DCU |
			
 
				+| `mthreads` | Moore Threads GPU |
			
 
				+| `iluvatar` | Iluvatar GPU |
			
 
				+| `metax` | MetaX GPU |
			
 
				+| `cambricon` | Cambricon MLU |
			
 
				+| `thead` | T-Head PPU |
			
 
				+| `null` | CPU-only（无 GPU 推理） |
			
 
				+
			
 
				+详细说明请参见 [Kubernetes (Helm) 部署指南](./kubernetes.md)。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 环境变量参考
			
 
				+
			
 
				+Worker 容器支持以下环境变量：
			
 
				+
			
 
				+| 变量 | 说明 | 默认值 |
			
 
				+|------|------|--------|
			
 
				+| `GPUSTACK_SERVER_URL` | Server API 地址（必填） | - |
			
 
				+| `GPUSTACK_WORKER_NAME` | Worker 名称 | 主机名 |
			
 
				+| `GPUSTACK_WORKER_IP` | Worker IP 地址 | 自动检测 |
			
 
				+| `GPUSTACK_RUNTIME_DEPLOY` | 部署模式（`Docker` / `Kubernetes`） | 自动检测 |
			
 
				+| `GPUSTACK_RUNTIME_DEPLOY_MIRRORED_DEPLOYMENT` | 启用镜像部署模式 | `true` |
			
 
				+
			
 
				+### GPU 厂商环境变量
			
 
				+
			
 
				+| 厂商 | 环境变量 | 说明 |
			
 
				+|------|---------|------|
			
 
				+| NVIDIA | `NVIDIA_VISIBLE_DEVICES` | 可见 GPU 设备，`all` 表示全部 |
			
 
				+| NVIDIA | `NVIDIA_DRIVER_CAPABILITIES` | 驱动能力，推荐 `compute,utility` |
			
 
				+| NVIDIA | `NVIDIA_DISABLE_REQUIRE` | 不强制要求特定 runtime |
			
 
				+| AMD | `AMD_VISIBLE_DEVICES` | AMD GPU 设备，`all` 表示全部 |
			
 
				+| Ascend | `ASCEND_HOME_PATH` | Ascend 工具包路径 |
			
 
				+| Hygon | `ROCM_PATH` / `ROCM_SMI_LIB_PATH` | Hygon 驱动路径 |
			
 
				+| Iluvatar | `COREX_HOME` | CoreX 工具包路径 |
			
 
				+| MThreads | `MTHREADS_VISIBLE_DEVICES` | MThreads GPU 设备 |
			
 
				+| Cambricon | `CAMBRICON_VISIBLE_DEVICES` | Cambricon MLU 设备 |
			
 
				+| T-Head | `PPU_HOME` | PPU SDK 路径 |
			
 
				+| MetaX | `LD_LIBRARY_PATH` | 需包含 MACA 和 MX 驱动库路径 |
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 多 Worker 部署
			
 
				+
			
 
				+平台支持多个 Worker 节点同时接入 Server，Server 会自动调度和负载均衡：
			
 
				+
			
 
				+```
			
 
				+                    ┌──────────┐
			
 
				+   ┌───────────────▶│ Worker 1 │ (GPU Node A)
			
 
				+   │                └──────────┘
			
 
				+┌──────────┐        ┌──────────┐
			
 
				+│  Server  ├────────▶ Worker 2 │ (GPU Node B)
			
 
				+└──────────┘        └──────────┘
			
 
				+   │                ┌──────────┐
			
 
				+   └───────────────▶│ Worker 3 │ (GPU Node C)
			
 
				+                    └──────────┘
			
 
				+```
			
 
				+
			
 
				+只需在每个 Worker 节点上执行相同的部署命令，使用相同的 `GPUSTACK_SERVER_URL` 和注册令牌即可。
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 常见问题
			
 
				+
			
 
				+### 1. Worker 注册失败
			
 
				+
			
 
				+```
			
 
				+Error: Worker not registered with server
			
 
				+```
			
 
				+
			
 
				+- 检查 `GPUSTACK_SERVER_URL` 是否正确
			
 
				+- 检查 Worker 能否访问 Server 的网络（`curl http://<SERVER_IP>:80`）
			
 
				+- 确认注册令牌有效
			
 
				+
			
 
				+### 2. 无法检测到 GPU
			
 
				+
			
 
				+```
			
 
				+No GPU devices detected
			
 
				+```
			
 
				+
			
 
				+- 确认驱动已正确安装（`nvidia-smi` / `rocm-smi` / `npu-smi` 等）
			
 
				+- 确认使用了 `--privileged` 参数
			
 
				+- NVIDIA 用户确认 `nvidia-container-toolkit` 已安装
			
 
				+- 尝试添加 `-v /dev:/dev:ro` 卷挂载
			
 
				+
			
 
				+### 3. 模型推理失败
			
 
				+
			
 
				+```
			
 
				+Error: CUDA out of memory
			
 
				+```
			
 
				+
			
 
				+- 检查 GPU 显存是否充足（通过 Grafana 或 `nvidia-smi` 查看）
			
 
				+- 在 Server UI 中调整模型的资源配置
			
 
				+- 减少并发请求或选择更小的模型
			
 
				+
			
 
				+### 4. Worker 频繁离线
			
 
				+
			
 
				+- 检查网络稳定性
			
 
				+- 调整 Worker 心跳间隔（通过 `GPUSTACK_WORKER_HEARTBEAT_INTERVAL` 环境变量）
			
 
				+- 检查 Docker 资源限制（CPU、内存）
			
 
				+
			
 
				+---
			
 
				+
			
 
				+## 下一步
			
 
				+
			
 
				+- 在 Server UI 中部署模型 → [快速开始](../README.md#快速开始)
			
 
				+- 使用 Kubernetes 进行大规模部署 → [Kubernetes (Helm) 部署指南](./kubernetes.md)
			
--- a/docs/environment-variables.md
+++ b/docs/environment-variables.md
@@ -1,6 +1,6 @@
 
				 # Environment Variables
			
 
				 
			
 
				-GPUStack supports various environment variables for configuration.
			
 
				+MASS-Base supports various environment variables for configuration.
			
 
				 
			
 
				 Most command line parameters can also be set via environment variables with the `GPUSTACK_` prefix and in uppercase format (e.g., `--data-dir` can be set via `GPUSTACK_DATA_DIR`).
			
 
				 
			
@@ -17,14 +17,14 @@ Configuration values are applied in the following priority order (highest to low
 
				 
			
 
				 This means that command line arguments will always override environment variables, and environment variables will override values in the configuration file.
			
 
				 
			
 
				-## GPUStack Core Environment Variables
			
 
				+## MASS-Base Core Environment Variables
			
 
				 
			
 
				 These environment variables are typically used for third-party service integrations.
			
 
				 
			
 
				 The **Applies to** column indicates where the environment variable should be set:
			
 
				 
			
 
				-- **Server** - Applies to the GPUStack server.
			
 
				-- **Worker** - Applies to GPUStack workers.
			
 
				+- **Server** - Applies to the MASS-Base server.
			
 
				+- **Worker** - Applies to MASS-Base workers.
			
 
				 - **Model** - Applies to model deployment configurations.
			
 
				 
			
 
				 ### Proxy Configuration
			
@@ -76,8 +76,8 @@ The **Applies to** column indicates where the environment variable should be set
 
				 | Variable                                              | Description                                                                                                                        | Default                              | Applies to |
			
 
				 | ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------ | ---------- |
			
 
				 | `GPUSTACK_HIGRESS_EXT_AUTH_TIMEOUT_MS`                | Higress external authentication timeout in milliseconds.                                                                           | `30000`                              | Server     |
			
 
				-| `GPUSTACK_GATEWAY_PORT_CHECK_INTERVAL`                | The interval in seconds of GPUStack Server checking embedded gateway listening port                                                | `2`                                  | Server     |
			
 
				-| `GPUSTACK_GATEWAY_PORT_CHECK_RETRY_COUNT`             | The retry count of GPUStack Server checking embedded gateway listening port                                                        | `300`                                | Server     |
			
 
				+| `GPUSTACK_GATEWAY_PORT_CHECK_INTERVAL`                | The interval in seconds of MASS-Base Server checking embedded gateway listening port                                                | `2`                                  | Server     |
			
 
				+| `GPUSTACK_GATEWAY_PORT_CHECK_RETRY_COUNT`             | The retry count of MASS-Base Server checking embedded gateway listening port                                                        | `300`                                | Server     |
			
 
				 | `GPUSTACK_GATEWAY_AI_STATISTICS_PLUGIN_CONTENT_TYPES` | Comma-separated list of content-types to be monitored by the ai-statistics plugin. Each value should be a valid HTTP Content-Type. | `application/json,text/event-stream` | Server     |
			
 
				 
			
 
				 ### Usage Tracking Configuration
			
@@ -135,7 +135,7 @@ The **Applies to** column indicates where the environment variable should be set
 
				 
			
 
				 !!! note
			
 
				 
			
 
				-    These environment variables are **not** set when starting GPUStack. Instead, they should be configured in the **Advanced Options > Environment Variables** section when deploying a model. They are used to customize the model serving behavior.
			
 
				+    These environment variables are **not** set when starting MASS-Base. Instead, they should be configured in the **Advanced Options > Environment Variables** section when deploying a model. They are used to customize the model serving behavior.
			
 
				 
			
 
				 | <div style="width:180px">Variable</div>                   | Description                                                                                                                                                                                    | Default | Applies to |
			
 
				 | --------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ---------- |
			
@@ -150,7 +150,7 @@ The **Applies to** column indicates where the environment variable should be set
 
				 | `GPUSTACK_MODEL_RUNTIME_UID`                              | Control the user permissions of processes running inside the container.                                                                                                                        | (empty) | Model      |
			
 
				 | `GPUSTACK_MODEL_RUNTIME_GID`                              | Control the group permissions of processes running inside the container.                                                                                                                       | (empty) | Model      |
			
 
				 | `GPUSTACK_MODEL_RUNTIME_SHM_SIZE_GIB`                     | Shared memory size for the container in GiB.                                                                                                                                                   | `10.0`  | Model      |
			
 
				-| `GPUSTACK_MODEL_INFERENCE_HEALTH_CHECK_ENABLED`           | Enable inference health check for this model. When enabled, GPUStack periodically sends minimal inference requests to verify the model is responding correctly.                                | `false` | Model      |
			
 
				+| `GPUSTACK_MODEL_INFERENCE_HEALTH_CHECK_ENABLED`           | Enable inference health check for this model. When enabled, MASS-Base periodically sends minimal inference requests to verify the model is responding correctly.                                | `false` | Model      |
			
 
				 | `GPUSTACK_MODEL_INFERENCE_HEALTH_CHECK_INTERVAL`          | Inference health check interval in seconds (minimum `60`). If recent successful inference traffic is observed within this interval, the active check is skipped.                               | `300`   | Model      |
			
 
				 | `GPUSTACK_MODEL_INFERENCE_HEALTH_CHECK_TIMEOUT`           | Timeout in seconds for each inference health check request.                                                                                                                                    | `15`    | Model      |
			
 
				 | `GPUSTACK_MODEL_INFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD` | Number of consecutive inference health check failures before marking the instance as unhealthy.                                                                                                | `3`     | Model      |
			
@@ -173,9 +173,9 @@ The serving command script automatically handles:
 
				 - Supporting both `uv pip` and `pip` for package installation
			
 
				 - Handling custom PyPI indices via `PIP_INDEX_URL` and `PIP_EXTRA_INDEX_URL`
			
 
				 
			
 
				-## GPUStack Runtime Environment Variables
			
 
				+## MASS-Base Runtime Environment Variables
			
 
				 
			
 
				-These environment variables are used by GPUStack runtime. Commonly used to adjust the behavior of inference backends running in Docker/Kubernetes.
			
 
				+These environment variables are used by MASS-Base runtime. Commonly used to adjust the behavior of inference backends running in Docker/Kubernetes.
			
 
				 
			
 
				 They are only usable within workers. Please set the environment variables in the workers’ containers to ensure they take effect properly.
			
 
				 
			
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -4,7 +4,7 @@
 
				 
			
 
				 ### Hybrid Cluster Support
			
 
				 
			
 
				-GPUStack supports heterogeneous clusters spanning NVIDIA, AMD, Ascend NPUs, Hygon DCUs, Moore Threads, Iluvatar, MetaX, Cambricon and T-head PPUs, and works across both AMD64 and ARM64 architectures.
			
 
				+MASS-Base supports heterogeneous clusters spanning NVIDIA, AMD, Ascend NPUs, Hygon DCUs, Moore Threads, Iluvatar, MetaX, Cambricon and T-head PPUs, and works across both AMD64 and ARM64 architectures.
			
 
				 
			
 
				 ### Distributed Inference Support
			
 
				 
			
@@ -34,7 +34,7 @@ GPUStack supports heterogeneous clusters spanning NVIDIA, AMD, Ascend NPUs, Hygo
 
				 
			
 
				 ### How can I change the registered worker name?
			
 
				 
			
 
				-You can set it to a custom name using the `--worker-name` flag when running GPUStack:
			
 
				+You can set it to a custom name using the `--worker-name` flag when running MASS-Base:
			
 
				 
			
 
				 ```diff
			
 
				 sudo docker run -d --name gpustack \
			
@@ -45,7 +45,7 @@ sudo docker run -d --name gpustack \
 
				 
			
 
				 ### How can I change the registered worker IP?
			
 
				 
			
 
				-You can set it to a custom IP using the `--worker-ip` flag when running GPUStack:
			
 
				+You can set it to a custom IP using the `--worker-ip` flag when running MASS-Base:
			
 
				 
			
 
				 ```diff
			
 
				 sudo docker run -d --name gpustack \
			
@@ -54,9 +54,9 @@ sudo docker run -d --name gpustack \
 
				 +    --worker-ip xx.xx.xx.xx
			
 
				 ```
			
 
				 
			
 
				-### Where are GPUStack's data stored?
			
 
				+### Where are MASS-Base's data stored?
			
 
				 
			
 
				-When running the GPUStack container, the Docker volume is mounted using `--volume/-v` parameter. The default data path is under the Docker data directory, specifically in the volumes subdirectory, and the default path is:
			
 
				+When running the MASS-Base container, the Docker volume is mounted using `--volume/-v` parameter. The default data path is under the Docker data directory, specifically in the volumes subdirectory, and the default path is:
			
 
				 
			
 
				 ```bash
			
 
				 /var/lib/docker/volumes/gpustack-data/_data
			
@@ -83,7 +83,7 @@ sudo docker run -d --name gpustack \
 
				 
			
 
				 ### Where are model files stored?
			
 
				 
			
 
				-When running the GPUStack container, the Docker volume is mounted using `--volume/-v` parameter. The default cache path is under the Docker data directory, specifically in the volumes subdirectory, and the default path is:
			
 
				+When running the MASS-Base container, the Docker volume is mounted using `--volume/-v` parameter. The default cache path is under the Docker data directory, specifically in the volumes subdirectory, and the default path is:
			
 
				 
			
 
				 ```bash
			
 
				 /var/lib/docker/volumes/gpustack-data/_data/cache
			
@@ -164,7 +164,7 @@ If the allocatable GPU memory is less than 90%, but you are sure the model can r
 
				 
			
 
				 **Note**: If the model encounters an error after running and the logs show `CUDA: out of memory`, it means the allocated GPU memory is insufficient. You will need to further adjust `--gpu-memory-utilization`, add more resources, or deploy a smaller model.
			
 
				 
			
 
				-The context size for the model also affects the required GPU memory. You can adjust the `--max-model-len` parameter to set a smaller context. In GPUStack, if this parameter is not set, its default value is 8192. If it is specified in the backend parameters, the actual setting will take effect.
			
 
				+The context size for the model also affects the required GPU memory. You can adjust the `--max-model-len` parameter to set a smaller context. In MASS-Base, if this parameter is not set, its default value is 8192. If it is specified in the backend parameters, the actual setting will take effect.
			
 
				 
			
 
				 You can adjust it to a smaller context as needed, for example, `--max-model-len=2048`. However, keep in mind that the max tokens for each inference request cannot exceed the value of `--max-model-len`. Therefore, setting a very small context may cause inference truncation.
			
 
				 
			
@@ -176,7 +176,7 @@ The `--enforce-eager` parameter also helps reduce GPU memory usage. However, thi
 
				 
			
 
				 ### What should I do if the model is stuck in `Scheduled` state?
			
 
				 
			
 
				-Try restarting the GPUStack container where the model is scheduled. If the issue persists, check the worker logs [here](troubleshooting.md#view-gpustack-logs) to analyze the cause.
			
 
				+Try restarting the MASS-Base container where the model is scheduled. If the issue persists, check the worker logs [here](troubleshooting.md#view-gpustack-logs) to analyze the cause.
			
 
				 
			
 
				 ### What should I do if the model is stuck in `Error` state?
			
 
				 
			
@@ -214,13 +214,13 @@ This is a limitation of vLLM. You can adjust the `--limit-mm-per-prompt` paramet
 
				 
			
 
				 ---
			
 
				 
			
 
				-## Managing GPUStack
			
 
				+## Managing MASS-Base
			
 
				 
			
 
				-### How do I use GPUStack behind a proxy?
			
 
				+### How do I use MASS-Base behind a proxy?
			
 
				 
			
 
				-We recommend passing standard proxy environment variables when running GPUStack.
			
 
				+We recommend passing standard proxy environment variables when running MASS-Base.
			
 
				 
			
 
				-The following case demonstrates how to configure GPUStack to forward all requests to the target proxy, except for requests to addresses specified in the NO_PROXY environment variable.
			
 
				+The following case demonstrates how to configure MASS-Base to forward all requests to the target proxy, except for requests to addresses specified in the NO_PROXY environment variable.
			
 
				 
			
 
				 ```bash
			
 
				 docker run -d --name gpustack \
			
--- a/docs/installation/air-gapped.md
+++ b/docs/installation/air-gapped.md
@@ -1,6 +1,6 @@
 
				 # Air-Gapped Installation
			
 
				 
			
 
				-GPUStack can be installed in an air-gapped (offline) environment with no internet access.
			
 
				+MASS-Base can be installed in an air-gapped (offline) environment with no internet access.
			
 
				 
			
 
				 ## Prerequisites
			
 
				 
			
@@ -18,7 +18,7 @@ If your system supports a container toolkit, install and configure it as needed
 
				 
			
 
				 ### Container Images
			
 
				 
			
 
				-GPUStack offers an [Image Selector](https://docs.gpustack.ai/latest/image-selector/) site to help users easily pick the images they want to download. For more advanced or automated syncing, GPUStack also provides image management commands:
			
 
				+MASS-Base offers an [Image Selector](https://docs.gpustack.ai/latest/image-selector/) site to help users easily pick the images they want to download. For more advanced or automated syncing, MASS-Base also provides image management commands:
			
 
				 
			
 
				 - `gpustack copy-images`: Sync images from one registry to another
			
 
				 - `gpustack save-images`: Download images and save them locally
			
@@ -29,9 +29,9 @@ Below are the details on how to use these CLI commands.
 
				 
			
 
				 - **Copy Images**
			
 
				 
			
 
				-GPUStack provides various container images for different components and inference backends, available on [Docker Hub](https://hub.docker.com/u/gpustack) and [Quay.io](https://quay.io/user/gpustack/).
			
 
				+MASS-Base provides various container images for different components and inference backends, available on [Docker Hub](https://hub.docker.com/u/gpustack) and [Quay.io](https://quay.io/user/gpustack/).
			
 
				 
			
 
				-To transfer the required container images to your internal registry from a machine with internet access, use the GPUStack `copy-images` command:
			
 
				+To transfer the required container images to your internal registry from a machine with internet access, use the MASS-Base `copy-images` command:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run --rm -it --entrypoint "" gpustack/gpustack \
			
@@ -78,7 +78,7 @@ The displayed image list includes all supported accelerators, inference backends
 
				 
			
 
				 If your target environment is air-gapped or does not have internet access, you can first download the required images on a machine with internet connectivity, then transfer and load them into the offline environment.
			
 
				 
			
 
				-GPUStack provides the `save-images` and `load-images` commands for this workflow.
			
 
				+MASS-Base provides the `save-images` and `load-images` commands for this workflow.
			
 
				 
			
 
				 **Copy Images**
			
 
				 
			
@@ -126,7 +126,7 @@ sudo docker run --rm -it --entrypoint "" \
 
				     /gpustack-air-gapped
			
 
				 ```
			
 
				 
			
 
				-This command imports all image packages from the specified directory into the local Docker daemon, making them available for GPUStack.
			
 
				+This command imports all image packages from the specified directory into the local Docker daemon, making them available for MASS-Base.
			
 
				 
			
 
				 !!! note
			
 
				 
			
@@ -136,7 +136,7 @@ For more details on `load-images`, see the [CLI Reference](../cli-reference/load
 
				 
			
 
				 ## Installation
			
 
				 
			
 
				-After preparing the internal container registry with the required images, you can install GPUStack in the air-gapped environment.
			
 
				+After preparing the internal container registry with the required images, you can install MASS-Base in the air-gapped environment.
			
 
				 
			
 
				 ```diff
			
 
				  sudo docker run -d --name gpustack \
			
@@ -152,7 +152,7 @@ After preparing the internal container registry with the required images, you ca
 
				 ### Pulling Inference Backend Images from a Secure Registry
			
 
				 
			
 
				 If your internal container registry requires authentication,  
			
 
				-set the following environment variables when starting the GPUStack worker to allow it to pull the runner image.
			
 
				+set the following environment variables when starting the MASS-Base worker to allow it to pull the runner image.
			
 
				 
			
 
				 ```diff
			
 
				  sudo docker run -d --name gpustack \
			
@@ -167,7 +167,7 @@ set the following environment variables when starting the GPUStack worker to all
 
				 ### Pulling Inference Backend Images from non-default Namespace
			
 
				 
			
 
				 If your internal container registry uses a different namespace than the default `gpustack`,  
			
 
				-set the following environment variable when starting the GPUStack worker to allow it to pull the runner image.
			
 
				+set the following environment variable when starting the MASS-Base worker to allow it to pull the runner image.
			
 
				 
			
 
				 ```diff
			
 
				  sudo docker run -d --name gpustack \
			
--- a/docs/installation/installation.md
+++ b/docs/installation/installation.md
@@ -2,19 +2,19 @@
 
				 
			
 
				 ## Prerequisites
			
 
				 
			
 
				-**GPUStack server:**
			
 
				+**MASS-Base server:**
			
 
				 
			
 
				 - [Docker](https://docs.docker.com/engine/install/) must be installed. Docker Desktop (Windows and macOS) is also supported.
			
 
				 
			
 
				-**GPUStack workers:**
			
 
				+**MASS-Base workers:**
			
 
				 
			
 
				 - [Docker](https://docs.docker.com/engine/install/) must be installed. Docker Desktop is **not** supported.
			
 
				-- Only Linux is supported for GPUStack worker nodes. If you use Windows, consider using WSL2 and avoid using Docker Desktop. macOS is not supported for GPUStack worker nodes.
			
 
				+- Only Linux is supported for MASS-Base worker nodes. If you use Windows, consider using WSL2 and avoid using Docker Desktop. macOS is not supported for MASS-Base worker nodes.
			
 
				 - Ensure the appropriate GPU drivers and container toolkits are installed for your hardware. See the [Installation Requirements](./requirements.md) for details.
			
 
				 
			
 
				-## Install GPUStack Server
			
 
				+## Install MASS-Base Server
			
 
				 
			
 
				-Run the following command to install and start the GPUStack server using Docker:
			
 
				+Run the following command to install and start the MASS-Base server using Docker:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name gpustack \
			
@@ -26,17 +26,17 @@ sudo docker run -d --name gpustack \
 
				 
			
 
				 !!! note
			
 
				 
			
 
				-    GPUStack v2 uses a single unified container image for all GPU device types.
			
 
				+    MASS-Base v2 uses a single unified container image for all GPU device types.
			
 
				 
			
 
				 ## Startup
			
 
				 
			
 
				-Check the GPUStack container logs:
			
 
				+Check the MASS-Base container logs:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker logs -f gpustack
			
 
				 ```
			
 
				 
			
 
				-If everything is normal, open `http://your_host_ip` in a browser to access the GPUStack UI.
			
 
				+If everything is normal, open `http://your_host_ip` in a browser to access the MASS-Base UI.
			
 
				 
			
 
				 Log in with username `admin` and the default password. Retrieve the initial password with:
			
 
				 
			
@@ -51,7 +51,7 @@ Please follow the UI instructions on the `Clusters` and `Workers` pages to add G
 
				 
			
 
				 ## Custom Configuration
			
 
				 
			
 
				-The following sections describe examples of custom configuration options when starting the GPUStack server container. For a full list of available options, refer to the [CLI Reference](../cli-reference/start.md).
			
 
				+The following sections describe examples of custom configuration options when starting the MASS-Base server container. For a full list of available options, refer to the [CLI Reference](../cli-reference/start.md).
			
 
				 
			
 
				 ### Enable HTTPS with Custom Certificate
			
 
				 
			
@@ -71,7 +71,7 @@ The following sections describe examples of custom configuration options when st
 
				 
			
 
				 ### Using an External Database
			
 
				 
			
 
				-By default, GPUStack uses an embedded PostgreSQL database. To use an external database such as PostgreSQL or MySQL, set the `GPUSTACK_DATABASE_URL` environment variable or use the `--database-url` argument when starting the GPUStack container:
			
 
				+By default, MASS-Base uses an embedded PostgreSQL database. To use an external database such as PostgreSQL or MySQL, set the `GPUSTACK_DATABASE_URL` environment variable or use the `--database-url` argument when starting the MASS-Base container:
			
 
				 
			
 
				 ```diff
			
 
				  sudo docker run -d --name gpustack \
			
@@ -96,7 +96,7 @@ sudo docker run -d --name gpustack \
 
				 
			
 
				 ### Additional Trusted CAs
			
 
				 
			
 
				-If GPUStack needs to communicate with services that use certificates issued by a private or corporate CA (e.g., a self-hosted Identity Provider, a Hugging Face mirror, or an internal API endpoint), mount the CA certificate into the container under `/usr/local/share/ca-certificates/`. GPUStack will automatically import the mounted CA certificates during startup and add them to the system trust store.
			
 
				+If MASS-Base needs to communicate with services that use certificates issued by a private or corporate CA (e.g., a self-hosted Identity Provider, a Hugging Face mirror, or an internal API endpoint), mount the CA certificate into the container under `/usr/local/share/ca-certificates/`. MASS-Base will automatically import the mounted CA certificates during startup and add them to the system trust store.
			
 
				 
			
 
				 ```diff
			
 
				  sudo docker run -d --name gpustack \
			
@@ -137,13 +137,13 @@ git clone -b "$LATEST_TAG" https://github.com/gpustack/gpustack.git
 
				 cd gpustack/docker-compose
			
 
				 ```
			
 
				 
			
 
				-Start the GPUStack server:
			
 
				+Start the MASS-Base server:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker compose -f docker-compose.server.yaml up -d
			
 
				 ```
			
 
				 
			
 
				-If everything is normal, open `http://your_host_ip` in a browser to access the GPUStack UI.
			
 
				+If everything is normal, open `http://your_host_ip` in a browser to access the MASS-Base UI.
			
 
				 
			
 
				 Log in with username `admin` and the default password. Retrieve the initial password with:
			
 
				 
			
--- a/docs/installation/requirements.md
+++ b/docs/installation/requirements.md
@@ -1,10 +1,10 @@
 
				 # Installation Requirements
			
 
				 
			
 
				-This page outlines the software and networking requirements for nodes running GPUStack.
			
 
				+This page outlines the software and networking requirements for nodes running MASS-Base.
			
 
				 
			
 
				 ## Operating System Requirements
			
 
				 
			
 
				-GPUStack supports most modern Linux distributions on **AMD64** and **ARM64** architectures.
			
 
				+MASS-Base supports most modern Linux distributions on **AMD64** and **ARM64** architectures.
			
 
				 
			
 
				 !!! note
			
 
				 
			
@@ -13,7 +13,7 @@ GPUStack supports most modern Linux distributions on **AMD64** and **ARM64** arc
 
				 
			
 
				 ## Accelerator Runtime Requirements
			
 
				 
			
 
				-GPUStack supports a variety of General-Purpose Accelerators as inference backends, including:
			
 
				+MASS-Base supports a variety of General-Purpose Accelerators as inference backends, including:
			
 
				 
			
 
				 - [x] NVIDIA GPU
			
 
				 - [x] AMD GPU
			
@@ -25,7 +25,7 @@ GPUStack supports a variety of General-Purpose Accelerators as inference backend
 
				 - [x] Cambricon MLU (Experimental)
			
 
				 - [x] T-Head PPU (Experimental)
			
 
				 
			
 
				-Ensure all required drivers and toolkits are installed before running GPUStack.
			
 
				+Ensure all required drivers and toolkits are installed before running MASS-Base.
			
 
				 
			
 
				 ### NVIDIA GPU
			
 
				 
			
@@ -235,7 +235,7 @@ sudo ppu-smi
 
				 
			
 
				 ### Connectivity Requirements
			
 
				 
			
 
				-The following network connectivity is required for GPUStack to function properly:
			
 
				+The following network connectivity is required for MASS-Base to function properly:
			
 
				 
			
 
				 **Server-to-Worker:** The server must be able to reach workers to proxy inference requests.
			
 
				 
			
@@ -245,23 +245,23 @@ The following network connectivity is required for GPUStack to function properly
 
				 
			
 
				 ### Port Requirements
			
 
				 
			
 
				-GPUStack uses these ports for communication:
			
 
				+MASS-Base uses these ports for communication:
			
 
				 
			
 
				 #### Server Ports
			
 
				 
			
 
				 | Port      | Description                                                  |
			
 
				 | --------- | ------------------------------------------------------------ |
			
 
				-| TCP 80    | Default port for GPUStack UI and API endpoints               |
			
 
				-| TCP 443   | Default port for GPUStack UI and API endpoints (TLS enabled) |
			
 
				+| TCP 80    | Default port for MASS-Base UI and API endpoints               |
			
 
				+| TCP 443   | Default port for MASS-Base UI and API endpoints (TLS enabled) |
			
 
				 | TCP 10161 | Default port for server metrics endpoint                     |
			
 
				-| TCP 30080 | Default port for GPUStack server internal API                |
			
 
				+| TCP 30080 | Default port for MASS-Base server internal API                |
			
 
				 | TCP 5432  | Default port for embedded Postgres Database                  |
			
 
				 
			
 
				 #### Worker Ports
			
 
				 
			
 
				 | Port            | Description                                                    |
			
 
				 | --------------- | -------------------------------------------------------------- |
			
 
				-| TCP 10150       | Default port for GPUStack worker                               |
			
 
				+| TCP 10150       | Default port for MASS-Base worker                               |
			
 
				 | TCP 10151       | Default port for worker metrics endpoint                       |
			
 
				 | TCP 40000-40063 | Port range for inference services                              |
			
 
				 | TCP 41000-41999 | Port range for Ray services(vLLM distributed deployment using) |
			
--- a/docs/installation/uninstallation.md
+++ b/docs/installation/uninstallation.md
@@ -1,9 +1,9 @@
 
				 # Uninstallation
			
 
				 
			
 
				-GPUStack is typically installed using containerization, 
			
 
				+MASS-Base is typically installed using containerization, 
			
 
				 so uninstallation mainly involves removing the container and any associated data volumes.
			
 
				 
			
 
				-For example, if GPUStack is running in a Docker container named `gpustack`, run:
			
 
				+For example, if MASS-Base is running in a Docker container named `gpustack`, run:
			
 
				 
			
 
				 ```bash
			
 
				 docker rm -f gpustack
			
--- a/docs/integrations/inference-apis.md
+++ b/docs/integrations/inference-apis.md
@@ -2,9 +2,9 @@
 
				 
			
 
				 ## OpenAI-Compatible APIs
			
 
				 
			
 
				-GPUStack provides [OpenAI-compatible APIs](https://platform.openai.com/docs/api-reference) at the `/v1` endpoint.
			
 
				+MASS-Base provides [OpenAI-compatible APIs](https://platform.openai.com/docs/api-reference) at the `/v1` endpoint.
			
 
				 
			
 
				-You can integrate and use models deployed on GPUStack with any application or framework that supports the OpenAI-compatible API, simply by pointing it to GPUStack's OpenAI-compatible endpoint.
			
 
				+You can integrate and use models deployed on MASS-Base with any application or framework that supports the OpenAI-compatible API, simply by pointing it to MASS-Base's OpenAI-compatible endpoint.
			
 
				 
			
 
				 ### Supported Endpoints
			
 
				 
			
@@ -47,7 +47,7 @@ curl http://your_gpustack_server_url/v1/chat/completions \
 
				 
			
 
				 ## Anthropic-Compatible APIs
			
 
				 
			
 
				-GPUStack provides the Anthropic-compatible [`/v1/messages` API](https://platform.claude.com/docs/en/api/messages/create).
			
 
				+MASS-Base provides the Anthropic-compatible [`/v1/messages` API](https://platform.claude.com/docs/en/api/messages/create).
			
 
				 
			
 
				 ### Usage
			
 
				 
			
@@ -72,7 +72,7 @@ curl http://your_gpustack_server_url/v1/messages \
 
				 
			
 
				 In the context of Retrieval-Augmented Generation (RAG), reranking refers to the process of selecting the most relevant information from retrieved documents or knowledge sources before presenting them to the user or utilizing them for answer generation.
			
 
				 
			
 
				-Note that the OpenAI-compatible APIs **do not** provide a `rerank` endpoint. To fill this gap, GPUStack provides a [Jina-compatible Rerank API](https://jina.ai/reranker/) at the `/v1/rerank` path.
			
 
				+Note that the OpenAI-compatible APIs **do not** provide a `rerank` endpoint. To fill this gap, MASS-Base provides a [Jina-compatible Rerank API](https://jina.ai/reranker/) at the `/v1/rerank` path.
			
 
				 
			
 
				 ### Usage
			
 
				 
			
@@ -133,7 +133,7 @@ Example output:
 
				 
			
 
				 ## Other APIs
			
 
				 
			
 
				-For other API types, GPUStack allows you to enable the **Generic Proxy** feature when deploying a model.
			
 
				+For other API types, MASS-Base allows you to enable the **Generic Proxy** feature when deploying a model.
			
 
				 
			
 
				 Once enabled, there are two ways to address the target model:
			
 
				 
			
--- a/docs/integrations/integrate-with-cherrystudio.md
+++ b/docs/integrations/integrate-with-cherrystudio.md
@@ -1,10 +1,10 @@
 
				 # Integrate with CherryStudio
			
 
				 
			
 
				-CherryStudio integrates with GPUStack to leverage locally hosted LLMs, embeddings and reranking capabilities.
			
 
				+CherryStudio integrates with MASS-Base to leverage locally hosted LLMs, embeddings and reranking capabilities.
			
 
				 
			
 
				 ## Deploying Models
			
 
				 
			
 
				-1. In GPUStack UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				+1. In MASS-Base UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				 
			
 
				     - qwen3-instruct-2507
			
 
				     - qwen2.5-vl-7b
			
@@ -25,9 +25,9 @@ CherryStudio integrates with GPUStack to leverage locally hosted LLMs, embedding
 
				 
			
 
				 3. Copy the API key and save it for later use.
			
 
				 
			
 
				-## Integrating GPUStack into CherryStudio
			
 
				+## Integrating MASS-Base into CherryStudio
			
 
				 
			
 
				-1. Open CherryStudio, go to `Settings` → `Model Provider`, find GPUStack, enable it, and configure it as shown:
			
 
				+1. Open CherryStudio, go to `Settings` → `Model Provider`, find MASS-Base, enable it, and configure it as shown:
			
 
				 
			
 
				     - `API Key`: Input the API key you copied from previous steps.
			
 
				 
			
--- a/docs/integrations/integrate-with-claude-code.md
+++ b/docs/integrations/integrate-with-claude-code.md
@@ -1,11 +1,11 @@
 
				 # Integrate with Claude Code
			
 
				 
			
 
				-Claude Code is an agentic coding tool from Anthropic. Since model deployments on GPUStack are compatible with the Anthropic API, you can easily connect Claude Code to your GPUStack deployment and use it for code generation tasks. In this guide, we will walk through the steps to integrate Claude Code with GPUStack and test the integration by asking Claude to create a Flappy Bird game.
			
 
				+Claude Code is an agentic coding tool from Anthropic. Since model deployments on MASS-Base are compatible with the Anthropic API, you can easily connect Claude Code to your MASS-Base deployment and use it for code generation tasks. In this guide, we will walk through the steps to integrate Claude Code with MASS-Base and test the integration by asking Claude to create a Flappy Bird game.
			
 
				 
			
 
				 ## Prerequisites
			
 
				 
			
 
				 - One or more GPUs with at least 100 GB of VRAM in total
			
 
				-- GPUStack installed and running
			
 
				+- MASS-Base installed and running
			
 
				 - Access to Hugging Face or ModelScope to download model files
			
 
				 
			
 
				 !!! note
			
@@ -14,7 +14,7 @@ Claude Code is an agentic coding tool from Anthropic. Since model deployments on
 
				 
			
 
				 ## Deploy the Model
			
 
				 
			
 
				-1. In the GPUStack UI, navigate to the **Model Catalog** page.
			
 
				+1. In the MASS-Base UI, navigate to the **Model Catalog** page.
			
 
				 
			
 
				 2. Search for `Qwen3-Coder-Next` and deploy the model using the default configuration.
			
 
				 
			
@@ -44,12 +44,12 @@ To easily switch between different model providers, you can use CC-Switch or sim
 
				 
			
 
				 Install [CC-Switch](https://github.com/farion1231/cc-switch) following its documentation.
			
 
				 
			
 
				-## Configure Claude Code with GPUStack
			
 
				+## Configure Claude Code with MASS-Base
			
 
				 
			
 
				 1. Open CC-Switch and add a custom provider with the following settings:
			
 
				 
			
 
				-   - **Provider Name**: `GPUStack`
			
 
				-   - **API Endpoint**: Your GPUStack server URL
			
 
				+   - **Provider Name**: `MASS-Base`
			
 
				+   - **API Endpoint**: Your MASS-Base server URL
			
 
				    - **API Key**: The API key you created earlier
			
 
				 
			
 
				 2. Configure all models to use `qwen3-coder-next`.
			
@@ -81,4 +81,4 @@ Install [CC-Switch](https://github.com/farion1231/cc-switch) following its docum
 
				 
			
 
				 ## Conclusion
			
 
				 
			
 
				-In this guide, we successfully integrated Claude Code with GPUStack and used it to generate a Flappy Bird game. You can now explore more complex coding tasks with Claude Code and leverage the power of GPUStack for efficient model serving.
			
 
				+In this guide, we successfully integrated Claude Code with MASS-Base and used it to generate a Flappy Bird game. You can now explore more complex coding tasks with Claude Code and leverage the power of MASS-Base for efficient model serving.
			
--- a/docs/integrations/integrate-with-dify.md
+++ b/docs/integrations/integrate-with-dify.md
@@ -1,10 +1,10 @@
 
				 # Integrate with Dify
			
 
				 
			
 
				-Dify can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, image generation, Speech-to-Text and Text-to-Speech capabilities.
			
 
				+Dify can integrate with MASS-Base to leverage locally deployed LLMs, embeddings, reranking, image generation, Speech-to-Text and Text-to-Speech capabilities.
			
 
				 
			
 
				 ## Deploying Models
			
 
				 
			
 
				-1. In GPUStack UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				+1. In MASS-Base UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				 
			
 
				 - qwen3-8b
			
 
				 - qwen2.5-vl-3b-instruct
			
@@ -25,9 +25,9 @@ Dify can integrate with GPUStack to leverage locally deployed LLMs, embeddings,
 
				 
			
 
				 3. Copy the API key and save it for later use.
			
 
				 
			
 
				-## Integrating GPUStack into Dify
			
 
				+## Integrating MASS-Base into Dify
			
 
				 
			
 
				-1. Access the Dify UI, go to the top right corner and click on `PLUGINS`, select `Install from Marketplace`, search for the GPUStack plugin, and choose to install it.
			
 
				+1. Access the Dify UI, go to the top right corner and click on `PLUGINS`, select `Install from Marketplace`, search for the MASS-Base plugin, and choose to install it.
			
 
				 
			
 
				 ![dify-install-gpustack-plugin](../assets/integrations/integration-dify-install-gpustack-plugin.png)
			
 
				 
			
@@ -35,7 +35,7 @@ Dify can integrate with GPUStack to leverage locally deployed LLMs, embeddings,
 
				 
			
 
				 - Model Type: Select the model type based on the model.
			
 
				 
			
 
				-- Model Name: The name must match the model name deployed on GPUStack.
			
 
				+- Model Name: The name must match the model name deployed on MASS-Base.
			
 
				 
			
 
				 - Server URL: `http://your-gpustack-url`, do not use `localhost`, as it refers to the container’s internal network. If you’re using a custom port, make sure to include it. Also, ensure the URL is accessible from inside the Dify container (you can test this with `curl`).
			
 
				 
			
--- a/docs/integrations/integrate-with-maxkb.md
+++ b/docs/integrations/integrate-with-maxkb.md
@@ -1,10 +1,10 @@
 
				 # Integrate with MaxKB
			
 
				 
			
 
				-MaxKB can integrate with GPUStack to leverage locally deployed **LLMs, embedding models, and reranking models** for building knowledge-based AI assistants.
			
 
				+MaxKB can integrate with MASS-Base to leverage locally deployed **LLMs, embedding models, and reranking models** for building knowledge-based AI assistants.
			
 
				 
			
 
				 ## Deploying Models
			
 
				 
			
 
				-1. In GPUStack UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				+1. In MASS-Base UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				 
			
 
				 * `qwen3.5-35b-a3b`
			
 
				 
			
@@ -29,7 +29,7 @@ MaxKB can integrate with GPUStack to leverage locally deployed **LLMs, embedding
 
				 
			
 
				 ## Obtain Model Access Information
			
 
				 
			
 
				-1. In the GPUStack sidebar, open the **Routes** page.
			
 
				+1. In the MASS-Base sidebar, open the **Routes** page.
			
 
				 
			
 
				 2. Click the **More actions menu** next to the route and select **API Access Info**.
			
 
				 
			
@@ -85,7 +85,7 @@ admin / MaxKB@123..
 
				 
			
 
				 After logging in for the first time, follow the prompt to change the password.
			
 
				 
			
 
				-## Integrating GPUStack into MaxKB
			
 
				+## Integrating MASS-Base into MaxKB
			
 
				 
			
 
				 1. In the MaxKB UI, navigate to **Model** in the top navigation bar.
			
 
				 
			
@@ -99,9 +99,9 @@ After logging in for the first time, follow the prompt to change the password.
 
				 
			
 
				 When configuring the model:
			
 
				 
			
 
				-* **Base Model**: Must match the model name deployed in GPUStack.
			
 
				+* **Base Model**: Must match the model name deployed in MASS-Base.
			
 
				 * **API URL**: `http://your-gpustack-url/v1`
			
 
				-* **API Key**: The API key created in GPUStack.
			
 
				+* **API Key**: The API key created in MASS-Base.
			
 
				 
			
 
				 !!! note
			
 
				 
			
@@ -173,4 +173,4 @@ Open the chat interface to start interacting with the assistant.
 
				 ![](../assets/integrations/maxkb/65.png)
			
 
				 ![](../assets/integrations/maxkb/66.png)
			
 
				 
			
 
				-The assistant can now answer questions based on the connected knowledge base and models deployed on GPUStack.
			
 
				+The assistant can now answer questions based on the connected knowledge base and models deployed on MASS-Base.
			
--- a/docs/integrations/integrate-with-n8n.md
+++ b/docs/integrations/integrate-with-n8n.md
@@ -4,11 +4,11 @@
 
				 
			
 
				 ## Deploy the Model
			
 
				 
			
 
				-Please refer to the **[Model Deployment](../user-guide/model-deployment-management.md#deploy-model)** section in the GPUStack documentation to complete model deployment.
			
 
				+Please refer to the **[Model Deployment](../user-guide/model-deployment-management.md#deploy-model)** section in the MASS-Base documentation to complete model deployment.
			
 
				 
			
 
				 ## API Access Info
			
 
				 
			
 
				-1. Log in to the GPUStack Web UI
			
 
				+1. Log in to the MASS-Base Web UI
			
 
				 2. Navigate to the **Routes** page
			
 
				 3. From the menu on the right side of the target model, select **API Access Info**
			
 
				 
			
@@ -25,9 +25,9 @@ Record the following information (if an API Key has not been created yet, follow
 
				 Follow the official n8n documentation to complete a self-hosted installation, or use the n8n Cloud service directly:
			
 
				 [https://docs.n8n.io/hosting/](https://docs.n8n.io/hosting/)
			
 
				 
			
 
				-## Integrating GPUStack in n8n
			
 
				+## Integrating MASS-Base in n8n
			
 
				 
			
 
				-Since GPUStack provides an OpenAI-compatible API, you can directly use the OpenAI nodes in n8n for configuration:
			
 
				+Since MASS-Base provides an OpenAI-compatible API, you can directly use the OpenAI nodes in n8n for configuration:
			
 
				 
			
 
				 1. Add a **Credential** in n8n
			
 
				 
			
@@ -39,7 +39,7 @@ Since GPUStack provides an OpenAI-compatible API, you can directly use the OpenA
 
				 
			
 
				    ![](../assets/integrations/n8n-05-02.png)
			
 
				 
			
 
				-2. Use the GPUStack Credential
			
 
				+2. Use the MASS-Base Credential
			
 
				 
			
 
				    ![](../assets/integrations/n8n-06.png)
			
 
				    ![](../assets/integrations/n8n-07.png)
			
--- a/docs/integrations/integrate-with-openclaw.md
+++ b/docs/integrations/integrate-with-openclaw.md
@@ -4,11 +4,11 @@
 
				 
			
 
				 ## Deploy the Model
			
 
				 
			
 
				-Please refer to the [**Model Deployment**](../user-guide/model-deployment-management.md#deploy-model) section in the GPUStack documentation to complete model deployment.
			
 
				+Please refer to the [**Model Deployment**](../user-guide/model-deployment-management.md#deploy-model) section in the MASS-Base documentation to complete model deployment.
			
 
				 
			
 
				 ## API Access Info
			
 
				 
			
 
				-1. Log in to the GPUStack Web UI
			
 
				+1. Log in to the MASS-Base Web UI
			
 
				 2. Navigate to the **Routes** page
			
 
				 3. From the menu on the right side of the target model, select **API Access Info**
			
 
				 
			
@@ -27,7 +27,7 @@ Record the following information (if an API Key has not been created yet, follow
 
				 Follow the official OpenClaw documentation to complete the installation:
			
 
				 [https://docs.openclaw.ai/install](https://docs.openclaw.ai/install)
			
 
				 
			
 
				-## Configure GPUStack in OpenClaw
			
 
				+## Configure MASS-Base in OpenClaw
			
 
				 
			
 
				 1. Start the interactive configuration wizard:
			
 
				 
			
@@ -39,7 +39,7 @@ Follow the official OpenClaw documentation to complete the installation:
 
				 
			
 
				    ![](../assets/integrations/openclaw-03.png)
			
 
				 
			
 
				-3. Fill in the information provided by GPUStack as prompted:
			
 
				+3. Fill in the information provided by MASS-Base as prompted:
			
 
				 
			
 
				     * **API Base URL**: Access URL
			
 
				     * **API Key**: API Key
			
@@ -47,7 +47,7 @@ Follow the official OpenClaw documentation to complete the installation:
 
				 
			
 
				    ![](../assets/integrations/openclaw-04.png)
			
 
				 
			
 
				-After completing these steps, OpenClaw will use GPUStack to invoke the corresponding model for inference.
			
 
				+After completing these steps, OpenClaw will use MASS-Base to invoke the corresponding model for inference.
			
 
				 
			
 
				 ## Configure Channels
			
 
				 
			
--- a/docs/integrations/integrate-with-ragflow.md
+++ b/docs/integrations/integrate-with-ragflow.md
@@ -1,10 +1,10 @@
 
				 # Integrate with RAGFlow
			
 
				 
			
 
				-RAGFlow can integrate with GPUStack to leverage locally deployed LLMs, embeddings, reranking, Speech-to-Text and Text-to-Speech capabilities.
			
 
				+RAGFlow can integrate with MASS-Base to leverage locally deployed LLMs, embeddings, reranking, Speech-to-Text and Text-to-Speech capabilities.
			
 
				 
			
 
				 ## Deploying Models
			
 
				 
			
 
				-1. In GPUStack UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				+1. In MASS-Base UI, navigate to the `Deployments` page and click on `Deploy Model` to deploy the models you need. Here are some example models:
			
 
				 
			
 
				 - qwen3-8b
			
 
				 - qwen2.5-vl-3b-instruct
			
@@ -25,13 +25,13 @@ RAGFlow can integrate with GPUStack to leverage locally deployed LLMs, embedding
 
				 
			
 
				 3. Copy the API key and save it for later use.
			
 
				 
			
 
				-## Integrating GPUStack into RAGFlow
			
 
				+## Integrating MASS-Base into RAGFlow
			
 
				 
			
 
				 1. Access the RAGFlow UI, go to the top right corner and click the avatar, select `Model Providers > GPUStack`, then select `Add the model` and fill in:
			
 
				 
			
 
				 - Model type: Select the model type based on the model.
			
 
				 
			
 
				-- Model name: The name must match the model name deployed on GPUStack.
			
 
				+- Model name: The name must match the model name deployed on MASS-Base.
			
 
				 
			
 
				 - Base URL: `http://your-gpustack-url/v1`, the URL should not include the path and do not use `localhost`, as it refers to the container’s internal network. If you’re using a custom port, make sure to include it. Also, ensure the URL is accessible from inside the RAGFlow container (you can test this with `curl`).
			
 
				 
			
--- a/docs/migration.md
+++ b/docs/migration.md
@@ -2,9 +2,9 @@
 
				 
			
 
				 !!! note
			
 
				 
			
 
				-    Since v2.0.0, GPUStack Worker officially supports only Linux. If you are using Windows or macOS, please move your data directory to a Linux system to perform the migration.
			
 
				+    Since v2.0.0, MASS-Base Worker officially supports only Linux. If you are using Windows or macOS, please move your data directory to a Linux system to perform the migration.
			
 
				 
			
 
				-    On Windows and macOS, GPUStack Server (without the embedded worker) can still be run using Docker Desktop.
			
 
				+    On Windows and macOS, MASS-Base Server (without the embedded worker) can still be run using Docker Desktop.
			
 
				 
			
 
				 ## Before Migration
			
 
				 
			
@@ -12,7 +12,7 @@
 
				 
			
 
				 #### 1. Removal of Ollama Model Source (since v0.7.x)
			
 
				 
			
 
				-- **Change:** Starting from version 0.7, GPUStack no longer supports `ollama` as a model source.
			
 
				+- **Change:** Starting from version 0.7, MASS-Base no longer supports `ollama` as a model source.
			
 
				 - **Impact:** Models, Model Files, and Model Instances whose source is `ollama` will not be preserved during the upgrade process.
			
 
				 - **Action Required:**  If you are upgrading from a version earlier than v0.7 and currently have models deployed from the `ollama` source, you must migrate these models manually before upgrading.  
			
 
				   We recommend re-deploying affected models using one of the supported sources:
			
@@ -20,7 +20,7 @@
 
				     - ModelScope
			
 
				     - Local path
			
 
				 
			
 
				-    You can perform this migration by re-deploying the models through the **GPUStack UI** before initiating the upgrade.
			
 
				+    You can perform this migration by re-deploying the models through the **MASS-Base UI** before initiating the upgrade.
			
 
				 
			
 
				 ### Backup Your Data
			
 
				 
			
@@ -28,7 +28,7 @@
 
				 
			
 
				       **Backup First:** Before starting the server migration, it’s strongly recommended to back up your database.
			
 
				 
			
 
				-      For default installations on v0.7 or earlier, stop the GPUStack server and create a backup of data dir located inside the container at:
			
 
				+      For default installations on v0.7 or earlier, stop the MASS-Base server and create a backup of data dir located inside the container at:
			
 
				 
			
 
				       ```
			
 
				       /var/lib/gpustack
			
@@ -36,15 +36,15 @@
 
				 
			
 
				 Please go through to the [Installation Requirements](./installation/requirements.md) before starting the migration.
			
 
				 
			
 
				-If you used GPUStack **without Docker** in versions prior to v0.7.1(for example, via pip install or an installation script), please install Docker by following the Docker Engine [Installation Guide](https://docs.docker.com/engine/install/) before proceeding with the migration.
			
 
				+If you used MASS-Base **without Docker** in versions prior to v0.7.1(for example, via pip install or an installation script), please install Docker by following the Docker Engine [Installation Guide](https://docs.docker.com/engine/install/) before proceeding with the migration.
			
 
				 
			
 
				-If you used GPU acceleration for inference in GPUStack prior to v0.7.1, please check whether you need to install the corresponding accelerator runtime’s Container Toolkit or Container Runtime after installing Docker. You can follow the steps in the **Installation Requirements** to check and install them.
			
 
				+If you used GPU acceleration for inference in MASS-Base prior to v0.7.1, please check whether you need to install the corresponding accelerator runtime’s Container Toolkit or Container Runtime after installing Docker. You can follow the steps in the **Installation Requirements** to check and install them.
			
 
				 
			
 
				 ## Migration Steps
			
 
				 
			
 
				 ### Identify Your Legacy Data Directory
			
 
				 
			
 
				-Locate the data directory used by your previous GPUStack installation. The default path is:
			
 
				+Locate the data directory used by your previous MASS-Base installation. The default path is:
			
 
				 
			
 
				 ```
			
 
				 /var/lib/gpustack
			
@@ -70,9 +70,9 @@ Since v2.0.0, you no longer need to specify the GPU computing platform or versio
 
				 
			
 
				 #### Embedded Database Migration (SQLite → PostgreSQL)
			
 
				 
			
 
				-In v0.7 and earlier, GPUStack used an embedded SQLite database by default to store management data. Starting from v2.0.0, GPUStack dropped SQLite support and now uses an embedded PostgreSQL database by default for improved performance and scalability.
			
 
				+In v0.7 and earlier, MASS-Base used an embedded SQLite database by default to store management data. Starting from v2.0.0, MASS-Base dropped SQLite support and now uses an embedded PostgreSQL database by default for improved performance and scalability.
			
 
				 
			
 
				-Start the GPUStack with the `GPUSTACK_DATA_MIGRATION=true` to enable the embedded database migration. Replace `${your-data-dir}` with your legacy data directory containing the original SQLite database and related files:
			
 
				+Start the MASS-Base with the `GPUSTACK_DATA_MIGRATION=true` to enable the embedded database migration. Replace `${your-data-dir}` with your legacy data directory containing the original SQLite database and related files:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name gpustack \
			
@@ -101,7 +101,7 @@ Also customizing the `--data-dir`, `GPUSTACK_DATA_DIR` is also supported in data
 
				 
			
 
				 #### External Database Migration
			
 
				 
			
 
				-GPUStack supports using an external database to store the management data. If you previously deployed GPUStack with an external database, start the server with the following command:
			
 
				+MASS-Base supports using an external database to store the management data. If you previously deployed MASS-Base with an external database, start the server with the following command:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name gpustack-server \
			
@@ -134,7 +134,7 @@ sudo docker run -d --name gpustack-worker \
 
				 
			
 
				 Please make sure both `--volume /var/run/docker.sock:/var/run/docker.sock` and `--runtime nvidia` are added to the docker command. Those are not required for previous version. For different accelerator runtime, Refer to [Other GPU Architectures](#other-gpu-architectures) to use different option from `--runtime nvidia`.
			
 
				 
			
 
				-This will launch the GPUStack worker using your existing data and connect it to the specified server.
			
 
				+This will launch the MASS-Base worker using your existing data and connect it to the specified server.
			
 
				 
			
 
				 ### Other GPU Architectures
			
 
				 
			
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -27,7 +27,7 @@
 
				 
			
 
				 ## Overview
			
 
				 
			
 
				-GPUStack is an open-source GPU cluster manager designed for efficient AI model deployment. It configures and orchestrates inference engines — vLLM, SGLang, TensorRT-LLM, or your own — to optimize performance across GPU clusters.
			
 
				+MASS-Base is an open-source GPU cluster manager designed for efficient AI model deployment. It configures and orchestrates inference engines — vLLM, SGLang, TensorRT-LLM, or your own — to optimize performance across GPU clusters.
			
 
				 
			
 
				 <div class="grid cards" markdown>
			
 
				 
			
@@ -47,7 +47,7 @@ GPUStack is an open-source GPU cluster manager designed for efficient AI model d
 
				 
			
 
				     ---
			
 
				 
			
 
				-    GPUStack's pluggable engine architecture enables you to deploy new models on the day they are released.
			
 
				+    MASS-Base's pluggable engine architecture enables you to deploy new models on the day they are released.
			
 
				 
			
 
				 -   :material-speedometer:{ .lg .middle .icon-red } __Performance-Optimized__
			
 
				 
			
@@ -65,15 +65,15 @@ GPUStack is an open-source GPU cluster manager designed for efficient AI model d
 
				 
			
 
				 ## Architecture
			
 
				 
			
 
				-GPUStack enables development teams, IT organizations, and service providers to deliver Model-as-a-Service at scale. It supports industry-standard APIs for LLM, voice, image, and video models. The platform includes built-in user authentication and access control, real-time monitoring of GPU performance and utilization, and detailed metering of token usage and API request rates.
			
 
				+MASS-Base enables development teams, IT organizations, and service providers to deliver Model-as-a-Service at scale. It supports industry-standard APIs for LLM, voice, image, and video models. The platform includes built-in user authentication and access control, real-time monitoring of GPU performance and utilization, and detailed metering of token usage and API request rates.
			
 
				 
			
 
				-The figure below illustrates how a single GPUStack server can manage multiple GPU clusters across both on-premises and cloud environments. The GPUStack scheduler allocates GPUs to maximize resource utilization and selects the appropriate inference engines for optimal performance. Administrators also gain full visibility into system health and metrics through integrated Grafana and Prometheus dashboards.
			
 
				+The figure below illustrates how a single MASS-Base server can manage multiple GPU clusters across both on-premises and cloud environments. The MASS-Base scheduler allocates GPUs to maximize resource utilization and selects the appropriate inference engines for optimal performance. Administrators also gain full visibility into system health and metrics through integrated Grafana and Prometheus dashboards.
			
 
				 
			
 
				 ![gpustack-v2-architecture](assets/gpustack-v2-architecture.png)
			
 
				 
			
 
				 ## Optimized Inference Performance
			
 
				 
			
 
				-GPUStack's automated engine selection and parameter optimization deliver strong inference performance out of the box. The following figure shows throughput improvements over default vLLM configurations:
			
 
				+MASS-Base's automated engine selection and parameter optimization deliver strong inference performance out of the box. The following figure shows throughput improvements over default vLLM configurations:
			
 
				 
			
 
				 ![a100-throughput-comparison](assets/a100-throughput-comparison.png)
			
 
				 
			
@@ -81,7 +81,7 @@ For detailed benchmarking methods and results, visit our [Inference Performance
 
				 
			
 
				 ## Supported Accelerators
			
 
				 
			
 
				-GPUStack supports a wide range of accelerators for AI inference:
			
 
				+MASS-Base supports a wide range of accelerators for AI inference:
			
 
				 
			
 
				 <div class="logo-tile-grid">
			
 
				     <div class="logo-tile" data-tooltip="NVIDIA GPU">
			
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -1,17 +1,17 @@
 
				 # Quickstart
			
 
				 
			
 
				-This guide will walk you through running GPUStack on your own self-hosted GPU servers. To use [cloud GPUs](./tutorials/adding-gpucluster-using-digitalocean.md), or integrating with an [existing Kubernetes cluster](./tutorials/adding-gpucluster-using-kubernetes.md), see the relevant tutorials.
			
 
				+This guide will walk you through running MASS-Base on your own self-hosted GPU servers. To use [cloud GPUs](./tutorials/adding-gpucluster-using-digitalocean.md), or integrating with an [existing Kubernetes cluster](./tutorials/adding-gpucluster-using-kubernetes.md), see the relevant tutorials.
			
 
				 
			
 
				 !!! info "Prerequisites"
			
 
				 
			
 
				-    1. A node with at least one NVIDIA GPU. For other GPU types, please check the guidelines in the GPUStack UI when adding a worker, or refer to the [Installation documentation](./installation/requirements.md) for more details.
			
 
				+    1. A node with at least one NVIDIA GPU. For other GPU types, please check the guidelines in the MASS-Base UI when adding a worker, or refer to the [Installation documentation](./installation/requirements.md) for more details.
			
 
				     2. Ensure the NVIDIA driver, [Docker](https://docs.docker.com/engine/install/) and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) are installed on the worker node.
			
 
				-    3. **(Optional)** A CPU node for hosting the GPUStack server. The GPUStack server does not require a GPU and can run on a CPU-only machine. [Docker](https://docs.docker.com/engine/install/) must be installed. Docker Desktop (for Windows and macOS) is also supported. If no dedicated CPU node is available, the GPUStack server can be installed on the same machine as a GPU worker node.
			
 
				-    4. Only Linux is supported for GPUStack worker nodes. If you use Windows, consider using WSL2 and avoid using Docker Desktop. macOS is not supported for GPUStack worker nodes.
			
 
				+    3. **(Optional)** A CPU node for hosting the MASS-Base server. The MASS-Base server does not require a GPU and can run on a CPU-only machine. [Docker](https://docs.docker.com/engine/install/) must be installed. Docker Desktop (for Windows and macOS) is also supported. If no dedicated CPU node is available, the MASS-Base server can be installed on the same machine as a GPU worker node.
			
 
				+    4. Only Linux is supported for MASS-Base worker nodes. If you use Windows, consider using WSL2 and avoid using Docker Desktop. macOS is not supported for MASS-Base worker nodes.
			
 
				 
			
 
				-## Install GPUStack
			
 
				+## Install MASS-Base
			
 
				 
			
 
				-Run the following command to install and start the GPUStack server using [Docker](https://docs.docker.com/engine/install/):
			
 
				+Run the following command to install and start the MASS-Base server using [Docker](https://docs.docker.com/engine/install/):
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name gpustack \
			
@@ -34,23 +34,23 @@ sudo docker run -d --name gpustack \
 
				         --system-default-container-registry quay.io
			
 
				     ```
			
 
				 
			
 
				-Check the GPUStack startup logs:
			
 
				+Check the MASS-Base startup logs:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker logs -f gpustack
			
 
				 ```
			
 
				 
			
 
				-After GPUStack starts, run the following command to get the default admin password:
			
 
				+After MASS-Base starts, run the following command to get the default admin password:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker exec gpustack cat /var/lib/gpustack/initial_admin_password
			
 
				 ```
			
 
				 
			
 
				-Open your browser and navigate to `http://your_host_ip` to access the GPUStack UI. Use the default username `admin` and the password you retrieved above to log in.
			
 
				+Open your browser and navigate to `http://your_host_ip` to access the MASS-Base UI. Use the default username `admin` and the password you retrieved above to log in.
			
 
				 
			
 
				 ## Set Up a GPU Cluster
			
 
				 
			
 
				-1. On the GPUStack UI, navigate to the `Clusters` page.
			
 
				+1. On the MASS-Base UI, navigate to the `Clusters` page.
			
 
				 
			
 
				 2. Click the `Add Cluster` button.
			
 
				 
			
@@ -74,13 +74,13 @@ sudo docker run -d --name gpustack-worker \
 
				       --advertise-address 192.168.1.2
			
 
				 ```
			
 
				 
			
 
				-6. Execute the command on the worker node to connect it to the GPUStack server.
			
 
				+6. Execute the command on the worker node to connect it to the MASS-Base server.
			
 
				 
			
 
				-7. After the worker node connects successfully, it will appear on the `Workers` page in the GPUStack UI.
			
 
				+7. After the worker node connects successfully, it will appear on the `Workers` page in the MASS-Base UI.
			
 
				 
			
 
				 ## Deploy a Model
			
 
				 
			
 
				-1. Navigate to the `Catalog` page in the GPUStack UI.
			
 
				+1. Navigate to the `Catalog` page in the MASS-Base UI.
			
 
				 
			
 
				 2. Select the `Qwen3-0.6B` model from the list of available models.
			
 
				 
			
@@ -92,7 +92,7 @@ sudo docker run -d --name gpustack-worker \
 
				 
			
 
				 !!! note
			
 
				 
			
 
				-    GPUStack uses containers to run models. The first-time model deployment may take some time to download the model files and container images. You can click `View Logs` in the UI to monitor the deployment progress.
			
 
				+    MASS-Base uses containers to run models. The first-time model deployment may take some time to download the model files and container images. You can click `View Logs` in the UI to monitor the deployment progress.
			
 
				 
			
 
				 ![model is running](assets/quick-start/model-running.png)
			
 
				 
			
@@ -108,7 +108,7 @@ sudo docker run -d --name gpustack-worker \
 
				 
			
 
				 3. Copy the generated API key and save it somewhere safe. Please note that you can only see it once on creation.
			
 
				 
			
 
				-4. You can now use the API key to access the OpenAI-compatible API endpoints provided by GPUStack. For example, use curl as the following:
			
 
				+4. You can now use the API key to access the OpenAI-compatible API endpoints provided by MASS-Base. For example, use curl as the following:
			
 
				 
			
 
				 ```bash
			
 
				 # Replace `your_api_key` and `your_gpustack_server_url`
			
@@ -135,4 +135,4 @@ curl http://your_gpustack_server_url/v1/chat/completions \
 
				 
			
 
				 ## Cleanup
			
 
				 
			
 
				-After you complete using the deployed model, you can go to the `Deployments` page in the GPUStack UI and delete the model to free up resources.
			
 
				+After you complete using the deployed model, you can go to the `Deployments` page in the MASS-Base UI and delete the model to free up resources.
			
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -1,8 +1,8 @@
 
				 # Troubleshooting
			
 
				 
			
 
				-## View GPUStack Logs
			
 
				+## View MASS-Base Logs
			
 
				 
			
 
				-You can view GPUStack logs with the following commands for the default setup:
			
 
				+You can view MASS-Base logs with the following commands for the default setup:
			
 
				 
			
 
				 ```bash
			
 
				 docker logs -f gpustack
			
@@ -10,7 +10,7 @@ docker logs -f gpustack
 
				 
			
 
				 ## Enable Debug Mode
			
 
				 
			
 
				-You can enable the `DEBUG` mode by setting the `--debug` flag when running GPUStack:
			
 
				+You can enable the `DEBUG` mode by setting the `--debug` flag when running MASS-Base:
			
 
				 
			
 
				 ```diff
			
 
				 sudo docker run -d --name gpustack \
			
@@ -20,7 +20,7 @@ sudo docker run -d --name gpustack \
 
				     ...
			
 
				 ```
			
 
				 
			
 
				-You can also enable GPUStack's debug mode at runtime by running the following command inside the **server container**:
			
 
				+You can also enable MASS-Base's debug mode at runtime by running the following command inside the **server container**:
			
 
				 
			
 
				 ```bash
			
 
				 gpustack reload-config --set debug=true
			
@@ -28,13 +28,13 @@ gpustack reload-config --set debug=true
 
				 
			
 
				 ## Configure Log Level
			
 
				 
			
 
				-You can configure log level of the GPUStack server at runtime by running the following command inside the **server container**:
			
 
				+You can configure log level of the MASS-Base server at runtime by running the following command inside the **server container**:
			
 
				 
			
 
				 ```bash
			
 
				 curl -X PUT http://localhost/debug/log_level -d "debug"
			
 
				 ```
			
 
				 
			
 
				-The same applies to GPUStack workers:
			
 
				+The same applies to MASS-Base workers:
			
 
				 
			
 
				 ```bash
			
 
				 curl -X PUT http://localhost:10150/debug/log_level -d "debug"
			
@@ -50,7 +50,7 @@ In case you forgot the admin password, you can reset it by running the following
 
				 gpustack reset-admin-password
			
 
				 ```
			
 
				 
			
 
				-If you changed the default port using `--port` when starting GPUStack, specify the GPUStack URL using the `--server-url` parameter. It must be run locally on the server and accessed via `localhost`:
			
 
				+If you changed the default port using `--port` when starting MASS-Base, specify the MASS-Base URL using the `--server-url` parameter. It must be run locally on the server and accessed via `localhost`:
			
 
				 
			
 
				 ```bash
			
 
				 gpustack reset-admin-password --server-url http://localhost:9090
			
@@ -58,9 +58,9 @@ gpustack reset-admin-password --server-url http://localhost:9090
 
				 
			
 
				 ## Assist in Accelerators Detection Diagnosis
			
 
				 
			
 
				-After successfully deploying the GPUStack Worker as described in the [installation guide](./installation/requirements.md),  
			
 
				+After successfully deploying the MASS-Base Worker as described in the [installation guide](./installation/requirements.md),  
			
 
				 if the Worker fails to detect any devices,  
			
 
				-please enter the corresponding Worker container, run the following command, and report the results to [GPUStack](https://github.com/gpustack/gpustack/issues).
			
 
				+please enter the corresponding Worker container, run the following command, and report the results to [MASS-Base](https://github.com/gpustack/gpustack/issues).
			
 
				 
			
 
				 ```bash
			
 
				 time GPUSTACK_RUNTIME_LOG_LEVEL=debug GPUSTACK_RUNTIME_LOG_EXCEPTION=1 gpustack-runtime detect --format json
			
@@ -69,7 +69,7 @@ time GPUSTACK_RUNTIME_LOG_LEVEL=debug GPUSTACK_RUNTIME_LOG_EXCEPTION=1 gpustack-
 
				 ## Assist in Model Deployment Diagnosis
			
 
				 
			
 
				 If you experience issues after deploying a model, 
			
 
				-please enter the corresponding Worker container, run the following command, and report the results to [GPUStack](https://github.com/gpustack/gpustack/issues).
			
 
				+PLEASE enter the corresponding Worker container, run the following command, and report the results to [MASS-Base](https://github.com/gpustack/gpustack/issues).
			
 
				 
			
 
				 ```bash
			
 
				 gpustack-runtime inspect <model instance name>
			
--- a/docs/tutorials/adding-gpucluster-using-digitalocean.md
+++ b/docs/tutorials/adding-gpucluster-using-digitalocean.md
@@ -10,7 +10,7 @@ You need to sign up for a DigitalOcean account and create a Personal Access Toke
 
				 
			
 
				 > Note: The token scope must be set to Full Access. If you select permissions using Custom Scopes, you may encounter issues deleting droplets.
			
 
				 
			
 
				-When starting the GPUStack Server, you need to specify the `--server-external-url` parameter. This parameter is used to configure the worker's `--server-url` after the droplet is created and the worker is started. If your server is running behind a proxy, please set the proxy address to ensure that droplets running on the public network can access the GPUStack Server API using this address after startup.
			
 
				+When starting the MASS-Base Server, you need to specify the `--server-external-url` parameter. This parameter is used to configure the worker's `--server-url` after the droplet is created and the worker is started. If your server is running behind a proxy, please set the proxy address to ensure that droplets running on the public network can access the MASS-Base Server API using this address after startup.
			
 
				 
			
 
				 ## Create DigitalOcean Cluster
			
 
				 
			
--- a/docs/tutorials/inference-on-cpus.md
+++ b/docs/tutorials/inference-on-cpus.md
@@ -1,6 +1,6 @@
 
				 # Running Inference on CPUs
			
 
				 
			
 
				-GPUStack supports inference on CPUs, offering flexibility when GPU resources are limited or when model sizes exceed allocatable GPU memory. The following CPU inference modes are available:
			
 
				+MASS-Base supports inference on CPUs, offering flexibility when GPU resources are limited or when model sizes exceed allocatable GPU memory. The following CPU inference modes are available:
			
 
				 
			
 
				 - **Hybrid CPU+GPU Inference**: Enables partial acceleration by offloading portions of large models to the CPU when VRAM capacity is insufficient.
			
 
				 - **Full CPU Inference**: Runs entirely on CPU when no GPU resources are available.
			
@@ -9,7 +9,7 @@ GPUStack supports inference on CPUs, offering flexibility when GPU resources are
 
				 
			
 
				     Available for custom backends only.
			
 
				 
			
 
				-    When CPU offloading is enabled, GPUStack will allocate CPU memory if GPU resources are insufficient. You must correctly configure the inference backend to use hybrid CPU+GPU or full CPU inference.
			
 
				+    When CPU offloading is enabled, MASS-Base will allocate CPU memory if GPU resources are insufficient. You must correctly configure the inference backend to use hybrid CPU+GPU or full CPU inference.
			
 
				 
			
 
				     It is strongly recommended to use CPU inference only on CPU workers.
			
 
				 
			
@@ -31,9 +31,9 @@ Execution Command: `--model-id BAAI/bge-large-en-v1.5 --huggingface-hub-cache /v
 
				 
			
 
				     `ghcr.io/huggingface/text-embeddings-inference:cpu-1.8` is the CPU inference image for TEI. See: [TEI Supported Hardware](https://huggingface.co/docs/text-embeddings-inference/supported_models#supported-hardware).
			
 
				 
			
 
				-    `--huggingface-hub-cache /var/lib/gpustack/cache/huggingface` sets the location of the HuggingFace Hub cache for TEI to the path where GPUStack stores downloaded HuggingFace models. The default path is `/var/lib/gpustack/cache/huggingface`. See: [TEI CLI Arguments](https://huggingface.co/docs/text-embeddings-inference/cli_arguments).
			
 
				+    `--huggingface-hub-cache /var/lib/gpustack/cache/huggingface` sets the location of the HuggingFace Hub cache for TEI to the path where MASS-Base stores downloaded HuggingFace models. The default path is `/var/lib/gpustack/cache/huggingface`. See: [TEI CLI Arguments](https://huggingface.co/docs/text-embeddings-inference/cli_arguments).
			
 
				 
			
 
				-    `{{port}}` is a placeholder that represents the port automatically assigned by GPUStack.
			
 
				+    `{{port}}` is a placeholder that represents the port automatically assigned by MASS-Base.
			
 
				 
			
 
				 ![TEI CPU Inference](../assets/tutorials/inference-on-cpus/tei-cpu-inference.png)
			
 
				 
			
--- a/docs/tutorials/inference-with-tool-calling.md
+++ b/docs/tutorials/inference-with-tool-calling.md
@@ -2,7 +2,7 @@
 
				 
			
 
				 Tool calling allows you to connect models to external tools and systems. This is useful for many things such as empowering AI assistants with capabilities, or building deep integrations between your applications and the models.
			
 
				 
			
 
				-In this tutorial, you’ll learn how to set up and use tool calling within GPUStack to extend your AI’s capabilities.
			
 
				+In this tutorial, you’ll learn how to set up and use tool calling within MASS-Base to extend your AI’s capabilities.
			
 
				 
			
 
				 !!! note
			
 
				 
			
@@ -13,7 +13,7 @@ In this tutorial, you’ll learn how to set up and use tool calling within GPUSt
 
				 
			
 
				 Before proceeding, ensure the following:
			
 
				 
			
 
				-- GPUStack is installed and running.
			
 
				+- MASS-Base is installed and running.
			
 
				 - A Linux worker node with a GPU is available. We'll use [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as the model for this tutorial. The model requires a GPU with at least 18GB VRAM.
			
 
				 - Access to Hugging Face for downloading the model files.
			
 
				 
			
@@ -27,7 +27,7 @@ LLMs that support tool calling are marked with the `tools` capability in the cat
 
				 
			
 
				 When you deploy GGUF models using llama-box, tool calling is enabled by default for models that support it.
			
 
				 
			
 
				-1. Navigate to the `Deployments` page in the GPUStack UI and click the `Deploy Model` button. In the dropdown, select `Hugging Face` as the source for your model.
			
 
				+1. Navigate to the `Deployments` page in the MASS-Base UI and click the `Deploy Model` button. In the dropdown, select `Hugging Face` as the source for your model.
			
 
				 2. Enable the `GGUF` checkbox to filter models by GGUF format.
			
 
				 3. Use the search bar to find the `Qwen/Qwen2.5-7B-Instruct-GGUF` model.
			
 
				 4. Click the `Save` button to deploy the model.
			
@@ -38,7 +38,7 @@ When you deploy GGUF models using llama-box, tool calling is enabled by default
 
				 
			
 
				 When you deploy models using vLLM, you need to enable tool calling with additional parameters.
			
 
				 
			
 
				-1. Navigate to the `Deployments` page in the GPUStack UI and click the `Deploy Model` button. In the dropdown, select `Hugging Face` as the source for your model.
			
 
				+1. Navigate to the `Deployments` page in the MASS-Base UI and click the `Deploy Model` button. In the dropdown, select `Hugging Face` as the source for your model.
			
 
				 2. Use the search bar to find the `Qwen/Qwen2.5-7B-Instruct` model.
			
 
				 3. Expand the `Advanced` section in configurations and scroll down to the `Backend Parameters` section.
			
 
				 4. Click on the `Add Parameter` button and add the following parameters:
			
@@ -54,7 +54,7 @@ After deployment, you can monitor the model's status on the `Deployments` page.
 
				 
			
 
				 ## Step 2: Generate an API Key
			
 
				 
			
 
				-We will use the GPUStack API to interact with the model. To do this, you need to generate an API key:
			
 
				+We will use the MASS-Base API to interact with the model. To do this, you need to generate an API key:
			
 
				 
			
 
				 1. Hover over the user avatar and navigate to the `API Keys` page.
			
 
				 2. Click the `New API Key` button.
			
@@ -63,7 +63,7 @@ We will use the GPUStack API to interact with the model. To do this, you need to
 
				 
			
 
				 ## Step 3: Do Inference
			
 
				 
			
 
				-With the model deployed and an API key, you can call the model via the GPUStack API. Here is an example script using `curl` (replace `<your-server-url>` with your GPUStack server URL and `<your-api-key>` with the API key generated in the previous step):
			
 
				+With the model deployed and an API key, you can call the model via the MASS-Base API. Here is an example script using `curl` (replace `<your-server-url>` with your GPUStack server URL and `<your-api-key>` with the API key generated in the previous step):
			
 
				 
			
 
				 ```bash
			
 
				 export GPUSTACK_SERVER_URL=<your-server-url>
			
--- a/docs/tutorials/running-deepseek-r1-671b-with-distributed-vllm.md
+++ b/docs/tutorials/running-deepseek-r1-671b-with-distributed-vllm.md
@@ -2,7 +2,7 @@
 
				 
			
 
				 This tutorial guides you through the process of configuring and running the original **DeepSeek R1 671B** using **Distributed vLLM** on a GPUStack cluster. Due to the extremely large size of the model, distributed inference across multiple workers is usually required.
			
 
				 
			
 
				-GPUStack enables easy setup and orchestration of distributed inference using vLLM, making it possible to run massive models like DeepSeek R1 with minimal manual configuration.
			
 
				+MASS-Base enables easy setup and orchestration of distributed inference using vLLM, making it possible to run massive models like DeepSeek R1 with minimal manual configuration.
			
 
				 
			
 
				 ## Prerequisites
			
 
				 
			
@@ -20,7 +20,7 @@ Before you begin, make sure the following requirements are met:
 
				 
			
 
				 </div>
			
 
				 - High-speed interconnects such as NVLink or InfiniBand are recommended for optimal performance.
			
 
				-- Model files should be downloaded to the same path on each node. While GPUStack supports on-the-fly model downloading, pre-downloading is recommended as it can be time consuming depending on the network speed.
			
 
				+- Model files should be downloaded to the same path on each node. While MASS-Base supports on-the-fly model downloading, pre-downloading is recommended as it can be time consuming depending on the network speed.
			
 
				 
			
 
				 !!! note
			
 
				 
			
@@ -29,7 +29,7 @@ Before you begin, make sure the following requirements are met:
 
				 
			
 
				 ## Step 1: Install GPUStack Server
			
 
				 
			
 
				-According to the [Installation](../installation/installation.md), you can use the following command to start the GPUStack server:
			
 
				+According to the [Installation](../installation/installation.md), you can use the following command to start the MASS-Base server:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name gpustack \
			
@@ -45,7 +45,7 @@ sudo docker run -d --name gpustack \
 
				 
			
 
				     - Replace `/path/to/your/model` with the actual path.
			
 
				 
			
 
				-After GPUStack server is up and running, run the following commands to get the initial admin password:
			
 
				+After MASS-Base server is up and running, run the following commands to get the initial admin password:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker exec gpustack \
			
@@ -53,19 +53,19 @@ sudo docker exec gpustack \
 
				 
			
 
				 ```
			
 
				 
			
 
				-## Step 2: Access GPUStack UI
			
 
				+## Step 2: Access MASS-Base UI
			
 
				 
			
 
				-Login to the GPUStack UI using the `admin` user and the obtained password.
			
 
				+Login to the MASS-Base UI using the `admin` user and the obtained password.
			
 
				 
			
 
				 ```
			
 
				 http://your_gpustack_server_ip_or_hostname
			
 
				 ```
			
 
				 
			
 
				-## Step 3: Install GPUStack Workers
			
 
				+## Step 3: Install MASS-Base Workers
			
 
				 
			
 
				-Navigate to the `Workers` page in the GPUStack UI, click `Add Worker` button to get the command for adding workers.
			
 
				+Navigate to the `Workers` page in the MASS-Base UI, click `Add Worker` button to get the command for adding workers.
			
 
				 
			
 
				-And then on **each worker node**, run the worker adding command to start a GPUStack worker:
			
 
				+And then on **each worker node**, run the worker adding command to start a MASS-Base worker:
			
 
				 
			
 
				 ```bash
			
 
				 sudo docker run -d --name gpustack \
			
@@ -87,7 +87,7 @@ sudo docker run -d --name gpustack \
 
				     - Replace the placeholder paths, IP address/hostname, and cluster token accordingly.
			
 
				     - Replace `/path/to/your/model` with the actual path on your system where the DeepSeek R1 model files are stored.
			
 
				 
			
 
				-After all workers are added, return to the GPUStack UI.
			
 
				+After all workers are added, return to the MASS-Base UI.
			
 
				 
			
 
				 Navigate to the `Workers` page to verify that all workers are in the Ready state and their GPUs are listed.
			
 
				 
			
@@ -117,7 +117,7 @@ After the model is running, navigate to the `Workers` page to check GPU utilizat
 
				 
			
 
				 ## Step 6: Run Inference via Playground
			
 
				 
			
 
				-Once the model is deployed and running, you can test it using the GPUStack Playground.
			
 
				+Once the model is deployed and running, you can test it using the MASS-Base Playground.
			
 
				 
			
 
				 1. Navigate to the `Playground` -> `Chat`.
			
 
				 2. If only one model is deployed, it will be selected by default. Otherwise, use the dropdown menu to choose `DeepSeek-R1`.
			
@@ -129,6 +129,6 @@ You can also use the `Compare` tab to test concurrent inference scenarios.
 
				 
			
 
				 ![playground-compare](../assets/tutorials/running-deepseek-r1-671b-with-distributed-vllm/playground-compare.png)
			
 
				 
			
 
				-You have now successfully deployed and run DeepSeek R1 671B using Distributed vLLM on a GPUStack cluster. Explore the model’s performance and capabilities in your own applications.
			
 
				+You have now successfully deployed and run DeepSeek R1 671B using Distributed vLLM on a MASS-Base cluster. Explore the model’s performance and capabilities in your own applications.
			
 
				 
			
 
				-For further assistance, feel free to reach out to the GPUStack community or support team.
			
 
				+For further assistance, feel free to reach out to the MASS-Base community or support team.
			
--- a/docs/tutorials/using-custom-backends.md
+++ b/docs/tutorials/using-custom-backends.md
@@ -1,14 +1,14 @@
 
				 # Using Custom Inference Backends
			
 
				 
			
 
				-This guide explains how to add custom inference backends in GPUStack, including using verified community configurations and creating your own from scratch.
			
 
				+This guide explains how to add custom inference backends in MASS-Base, including using verified community configurations and creating your own from scratch.
			
 
				 
			
 
				 For parameter descriptions, see the [User Guide](../user-guide/inference-backend-management.md).
			
 
				 
			
 
				 ## Backend Types
			
 
				 
			
 
				-GPUStack supports three types of inference backends:
			
 
				+MASS-Base supports three types of inference backends:
			
 
				 
			
 
				-- **Built-in**: Pre-configured backends (vLLM, MindIE, VoxBox, SGLang...) maintained by GPUStack, automatically optimized for different hardware.
			
 
				+- **Built-in**: Pre-configured backends (vLLM, MindIE, VoxBox, SGLang...) maintained by MASS-Base, automatically optimized for different hardware.
			
 
				 - **Community**: Pre-verified custom backend configurations. These are essentially CustomBackends labeled "community" to simplify manual setup.
			
 
				 - **Custom**: Backends you configure yourself with custom Docker images and commands.
			
 
				 
			
--- a/docs/upgrade.md
+++ b/docs/upgrade.md
@@ -1,14 +1,14 @@
 
				 # Upgrade
			
 
				 
			
 
				-You can upgrade GPUStack by pulling and running a newer Docker image.
			
 
				+You can upgrade MASS-Base by pulling and running a newer Docker image.
			
 
				 
			
 
				-The following upgrade instructions apply only to GPUStack v2.0 and later.
			
 
				+The following upgrade instructions apply only to MASS-Base v2.0 and later.
			
 
				 
			
 
				 For installations prior to v0.7, please refer to the [migration guide](migration.md).
			
 
				 
			
 
				 !!! note
			
 
				 
			
 
				-    1. When upgrading, upgrade the GPUStack server first, then upgrade the workers.
			
 
				+    1. When upgrading, upgrade the MASS-Base server first, then upgrade the workers.
			
 
				 
			
 
				     2. Please **DO NOT** upgrade from/to the main(dev) version or a release candidate(rc) version, as they may contain breaking changes. Use a fresh installation if you want to try the main or rc versions.
			
 
				 
			
@@ -16,7 +16,7 @@ For installations prior to v0.7, please refer to the [migration guide](migration
 
				 
			
 
				     **Backup First:** Before proceeding with an upgrade, it’s strongly recommended to back up your database.
			
 
				 
			
 
				-    For default installations, stop the GPUStack server and create a backup of the PostgreSQL database directory located inside the container at:
			
 
				+    For default installations, stop the MASS-Base server and create a backup of the PostgreSQL database directory located inside the container at:
			
 
				 
			
 
				     ```
			
 
				     /var/lib/gpustack/postgresql/data
			
--- a/docs/user-guide/benchmarking.md
+++ b/docs/user-guide/benchmarking.md
@@ -1,6 +1,6 @@
 
				 # Benchmarking
			
 
				 
			
 
				-GPUStack can run benchmarks against running model instances. Benchmarks are executed by workers in a dedicated benchmark container image, with results and logs stored on the worker.
			
 
				+MASS-Base can run benchmarks against running model instances. Benchmarks are executed by workers in a dedicated benchmark container image, with results and logs stored on the worker.
			
 
				 
			
 
				 ## Prerequisites
			
 
				 
			
--- a/docs/user-guide/built-in-inference-backends.md
+++ b/docs/user-guide/built-in-inference-backends.md
@@ -1,6 +1,6 @@
 
				 # Built-in Inference Backends
			
 
				 
			
 
				-GPUStack supports the following inference backends:
			
 
				+MASS-Base supports the following inference backends:
			
 
				 
			
 
				 - [vLLM](#vllm)
			
 
				 - [SGLang](#sglang)
			
@@ -31,13 +31,13 @@ vLLM seamlessly supports most state-of-the-art open-source models, including:
 
				 - Embedding Models (e.g. `Qwen3-Embedding`)
			
 
				 - Reranker Models (e.g. `Qwen3-Reranker`)
			
 
				 
			
 
				-By default, GPUStack estimates the VRAM requirement for the model instance based on the model's metadata.
			
 
				+By default, MASS-Base estimates the VRAM requirement for the model instance based on the model's metadata.
			
 
				 
			
 
				 You can customize the parameters to fit your needs. The following vLLM parameters might be useful:
			
 
				 
			
 
				 - `--gpu-memory-utilization` (default: 0.9): The fraction of GPU memory to use for the model instance.
			
 
				-- `--max-model-len`: Model context length. For large-context models, GPUStack automatically sets this parameter to `8192` to simplify model deployment, especially in resource constrained environments. You can customize this parameter to fit your needs.
			
 
				-- `--tensor-parallel-size`: Number of tensor parallel replicas. By default, GPUStack sets this parameter given the GPU resources available and the estimation of the model's memory requirement. You can customize this parameter to fit your needs.
			
 
				+- `--max-model-len`: Model context length. For large-context models, MASS-Base automatically sets this parameter to `8192` to simplify model deployment, especially in resource constrained environments. You can customize this parameter to fit your needs.
			
 
				+- `--tensor-parallel-size`: Number of tensor parallel replicas. By default, MASS-Base sets this parameter given the GPU resources available and the estimation of the model's memory requirement. You can customize this parameter to fit your needs.
			
 
				 
			
 
				 For more details, please refer to [vLLM CLI Reference](https://docs.vllm.ai/en/stable/cli/serve/).
			
 
				 
			
@@ -56,11 +56,11 @@ Please refer to the vLLM [documentation](https://docs.vllm.ai/en/stable/models/s
 
				 - **Video Tasks**: Video generation and editing (e.g., `Wan2.2`)
			
 
				 - **Audio Tasks**: Speech synthesis, voice cloning, and more (e.g., `Qwen3-TTS`)
			
 
				 
			
 
				-GPUStack integrates with vLLM-Omni to deliver a seamless experience for deploying and managing omni-modal models. When a model is deployed via the vLLM backend, GPUStack automatically detects whether it is omni-modal based on its metadata and sets the required parameters for vLLM-Omni.
			
 
				+MASS-Base integrates with vLLM-Omni to deliver a seamless experience for deploying and managing omni-modal models. When a model is deployed via the vLLM backend, GPUStack automatically detects whether it is omni-modal based on its metadata and sets the required parameters for vLLM-Omni.
			
 
				 
			
 
				 #### Distributed Inference Across Workers (Experimental)
			
 
				 
			
 
				-vLLM supports distributed inference across multiple workers using [Ray](https://ray.io). You can enable a Ray cluster in GPUStack by checking the `Allow Distributed Inference Across Workers` option when deploying a model. This allows vLLM to run distributed inference across multiple workers.
			
 
				+vLLM supports distributed inference across multiple workers using [Ray](https://ray.io). You can enable a Ray cluster in MASS-Base by checking the `Allow Distributed Inference Across Workers` option when deploying a model. This allows vLLM to run distributed inference across multiple workers.
			
 
				 
			
 
				 !!! warning "Known Limitations"
			
 
				 
			
@@ -86,15 +86,15 @@ See the full list of supported parameters for vLLM [here](https://docs.vllm.ai/e
 
				 
			
 
				 It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.
			
 
				 
			
 
				-By default, GPUStack estimates the VRAM requirement for the model instance based on model metadata.
			
 
				+By default, MASS-Base estimates the VRAM requirement for the model instance based on model metadata.
			
 
				 
			
 
				-When needed, GPUStack also sets several parameters automatically for large-context models. Common SGLang parameters include:
			
 
				+When needed, MASS-Base also sets several parameters automatically for large-context models. Common SGLang parameters include:
			
 
				 
			
 
				 - `--mem-fraction-static` (default: `0.9`): The per-GPU allocatable VRAM fraction. The scheduler uses this value for resource matching and candidate selection. You can override it via the model's `backend_parameters`.
			
 
				-- `--context-length`: Model context length. For large-context models, if the automatically estimated context length exceeds device capability, GPUStack sets this parameter to `8192` to simplify deployment in resource-constrained environments. You can customize this parameter as needed.
			
 
				-- `--tp-size`: Tensor parallel size. When not explicitly provided, GPUStack infers and sets this parameter based on the selected GPUs.
			
 
				-- `--pp-size`: Pipeline parallel size. In multi-node deployments, GPUStack determines a combination of `--tp-size` and `--pp-size` according to the model and cluster configuration.
			
 
				-- Multi-node arguments: `--nnodes`, `--node-rank`, `--dist-init-addr`. When distributed inference is enabled, GPUStack injects these arguments to initialize multi-node communication.
			
 
				+- `--context-length`: Model context length. For large-context models, if the automatically estimated context length exceeds device capability, MASS-Base sets this parameter to `8192` to simplify deployment in resource-constrained environments. You can customize this parameter as needed.
			
 
				+- `--tp-size`: Tensor parallel size. When not explicitly provided, MASS-Base infers and sets this parameter based on the selected GPUs.
			
 
				+- `--pp-size`: Pipeline parallel size. In multi-node deployments, MASS-Base determines a combination of `--tp-size` and `--pp-size` according to the model and cluster configuration.
			
 
				+- Multi-node arguments: `--nnodes`, `--node-rank`, `--dist-init-addr`. When distributed inference is enabled, MASS-Base injects these arguments to initialize multi-node communication.
			
 
				 
			
 
				 For more details, please refer to [SGLang documentation](https://docs.sglang.ai/index.html).
			
 
				 
			
@@ -108,7 +108,7 @@ SGLang also supports image models. The ones we have verified include: Qwen-Image
 
				 
			
 
				 #### Distributed Inference Across Workers (Experimental)
			
 
				 
			
 
				-You can enable distributed SGLang inference across multiple workers in GPUStack.
			
 
				+You can enable distributed SGLang inference across multiple workers in MASS-Base.
			
 
				 
			
 
				 !!! warning "Known Limitations"
			
 
				 
			
@@ -151,7 +151,7 @@ See the full list of supported parameters for SGLang [here](https://docs.sglang.
 
				 
			
 
				 MindIE supports various models listed [here](https://www.hiascend.com/software/mindie/modellist).
			
 
				 
			
 
				-Within GPUStack, support [large language models (LLMs)](https://www.hiascend.com/software/mindie/modellist) and [multimodal language models (VLMs)](https://www.hiascend.com/software/mindie/modellist).
			
 
				+Within MASS-Base, support [large language models (LLMs)](https://www.hiascend.com/software/mindie/modellist) and [multimodal language models (VLMs)](https://www.hiascend.com/software/mindie/modellist).
			
 
				 
			
 
				 However, _embedding models_ and _multimodal generation models_ are not supported yet.
			
 
				 
			
@@ -159,7 +159,7 @@ However, _embedding models_ and _multimodal generation models_ are not supported
 
				 
			
 
				 MindIE owns a variety of features outlined [here](https://www.hiascend.com/document/detail/zh/mindie/22RC1/mindiellm/llmdev/mindie_llm0001.html).
			
 
				 
			
 
				-At present, GPUStack supports a subset of these capabilities, including
			
 
				+At present, MASS-Base supports a subset of these capabilities, including
			
 
				 [Quantization](https://www.hiascend.com/document/detail/zh/mindie/22RC1/mindiellm/llmdev/mindie_llm0279.html),
			
 
				 [Extending Context Size](https://www.hiascend.com/document/detail/zh/mindie/22RC1/mindiellm/llmdev/mindie_llm0295.html),
			
 
				 [Distributed Inference](https://www.hiascend.com/document/detail/zh/mindie/22RC1/mindiellm/llmdev/mindie_llm0296.html),
			
@@ -189,7 +189,7 @@ At present, GPUStack supports a subset of these capabilities, including
 
				 
			
 
				 MindIE has configurable [parameters](https://www.hiascend.com/document/detail/zh/mindie/22RC1/mindiellm/llmdev/mindie_service0285.html) and [environment variables](https://www.hiascend.com/document/detail/zh/mindie/22RC1/mindiellm/llmdev/mindie_llm0416.html).
			
 
				 
			
 
				-To avoid directly configuring JSON, GPUStack provides a set of command line parameters as below.
			
 
				+To avoid directly configuring JSON, MASS-Base provides a set of command line parameters as below.
			
 
				 
			
 
				 | Parameter                                            | Default | Range                    | Scope                                  | Description                                                                                                                                                                                                                                                                     |
			
 
				 |------------------------------------------------------|---------|--------------------------|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
			
@@ -253,9 +253,9 @@ To avoid directly configuring JSON, GPUStack provides a set of command line para
 
				 
			
 
				 !!! note
			
 
				 
			
 
				-    GPUStack allows users to inject custom environment variables during model deployment, however, some variables may be conflicted with GPUStack managment.
			
 
				+    MASS-Base allows users to inject custom environment variables during model deployment, however, some variables may be conflicted with MASS-Base managment.
			
 
				 
			
 
				-    Hence, GPUStack will override/prevent those variables. Please compare the model instance logs' output with your expectations.
			
 
				+    Hence, MASS-Base will override/prevent those variables. Please compare the model instance logs' output with your expectations.
			
 
				 
			
 
				 ## VoxBox
			
 
				 
			
--- a/docs/user-guide/cloud-credential-management.md
+++ b/docs/user-guide/cloud-credential-management.md
@@ -1,6 +1,6 @@
 
				 # Cloud Credential Management
			
 
				 
			
 
				-GPUStack supports cloud credential management, allowing secure connections to external cloud providers. Cloud credentials contain provider information, keys, and options required for API access.
			
 
				+MASS-Base supports cloud credential management, allowing secure connections to external cloud providers. Cloud credentials contain provider information, keys, and options required for API access.
			
 
				 
			
 
				 ## Supported Providers
			
 
				 
			
--- a/docs/user-guide/cluster-management.md
+++ b/docs/user-guide/cluster-management.md
@@ -1,6 +1,6 @@
 
				 # Cluster Management
			
 
				 
			
 
				-GPUStack supports cluster-based worker management and provides multiple cluster types. You can provision a cluster through a `Cloud Provider` such as `DigitalOcean`, or create a self-hosted cluster and add workers using `Docker` run commands. Alternatively, you can register all nodes in a self-hosted `Kubernetes` cluster as GPUStack workers.
			
 
				+MASS-Base supports cluster-based worker management and provides multiple cluster types. You can provision a cluster through a `Cloud Provider` such as `DigitalOcean`, or create a self-hosted cluster and add workers using `Docker` run commands. Alternatively, you can register all nodes in a self-hosted `Kubernetes` cluster as MASS-Base workers.
			
 
				 
			
 
				 ## Create Cluster
			
 
				 
			
@@ -47,7 +47,7 @@ The kubernetes can be registerred after the cluster is created.
 
				 
			
 
				 ### Creating DigitalOcean Cluster
			
 
				 
			
 
				-1. In the `Basic Configuration` step, the `Name` field is required and `Description` is optional. Create or select a Cloud Credential for communicating with the DigitalOcean API. Select a Region that supports GPU Droplets. You must also configure the `GPUStack Server URL`, which will be accessible from the newly created DigitalOcean Droplets.
			
 
				+1. In the `Basic Configuration` step, the `Name` field is required and `Description` is optional. Create or select a Cloud Credential for communicating with the DigitalOcean API. Select a Region that supports GPU Droplets. You must also configure the `MASS-Base Server URL`, which will be accessible from the newly created DigitalOcean Droplets.
			
 
				 2. Click `Next`.
			
 
				 3. Adding one or more `Worker Pools`. For each pool, `Name`, `Instance Type`, `OS Image`, `Replicas`, `Batch Size`, `Labels` and `Volumes` can be specified.
			
 
				 4. Click `Save` after the worker pools are configured.
			
@@ -120,4 +120,4 @@ huggingface_token: xxxxxx
 
				 enable_hf_transfer: false
			
 
				 ```
			
 
				 
			
 
				-The above YAML lists all currently supported options for the `Worker Configuration YAML`. For the meaning of each option, refer to the full GPUStack [config file documentation](../cli-reference/start.md#config-file).
			
 
				+The above YAML lists all currently supported options for the `Worker Configuration YAML`. For the meaning of each option, refer to the full MASS-Base [config file documentation](../cli-reference/start.md#config-file).
			
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,9 +1,9 @@
 
				 # Project information
			
 
				-site_name: GPUStack
			
 
				+site_name: MASS-Base
			
 
				 site_url: https://docs.gpustack.ai
			
 
				-site_author: GPUStack.ai
			
 
				+site_author: MASS-Base
			
 
				 site_description: >-
			
 
				-  GPUStack is an open-source GPU cluster manager designed for efficient AI model deployment.
			
 
				+  MASS-Base is an open-source GPU cluster manager designed for efficient AI model deployment.
			
 
				   It lets you run models efficiently on your own GPU hardware by choosing the best inference engines,
			
 
				   scheduling GPU resources, analyzing model architectures, and automatically configuring deployment parameters.