2 weeks ago · 490af8e669
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,94 @@
 
				+# CLAUDE.md
			
 
				+
			
 
				+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
			
 
				+
			
 
				+## Project Overview
			
 
				+
			
 
				+GPUStack is an open-source GPU cluster manager for AI model deployment. It orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, etc.) across GPU clusters, providing multi-cluster management, load balancing, monitoring, and access control.
			
 
				+
			
 
				+**Tech stack:** Python 3.10–3.12, FastAPI, SQLModel, Pydantic, uv (package manager), hatchling (build), Alembic (migrations), pytest, Higress (API gateway).
			
 
				+
			
 
				+## Code Architecture
			
 
				+
			
 
				+```
			
 
				+gpustack/
			
 
				+├── api/            # REST API layer (auth, middlewares, tenant, OpenAI extensions)
			
 
				+├── client/         # Generated + custom HTTP clients for server/worker communication
			
 
				+├── cloud_providers/ # Cloud provider integrations (DigitalOcean, etc.)
			
 
				+├── cmd/            # CLI subcommands (version, db migration, admin reset, etc.)
			
 
				+├── codegen/        # OpenAPI client code generation
			
 
				+├── config/         # Configuration and registration logic
			
 
				+├── detectors/      # GPU/device detection (fastfetch, runtime, custom)
			
 
				+├── envs/           # Environment variable management
			
 
				+├── exporter/       # Prometheus metrics exporting
			
 
				+├── gateway/        # Higress AI gateway integration (routing, plugins, k8s CRDs)
			
 
				+├── http_proxy/     # Load balancing and proxy strategies
			
 
				+├── k8s/            # Kubernetes manifest templates
			
 
				+├── migrations/     # Alembic database migrations
			
 
				+├── mixins/         # SQLAlchemy mixins (active record, timestamps)
			
 
				+├── policies/       # Scheduling policies (resource fit selectors for various backends)
			
 
				+├── routes/         # HTTP route handlers
			
 
				+├── schemas/        # Database models / SQLModel schemas
			
 
				+├── server/         # Server components (scheduler, controllers, API server)
			
 
				+├── worker/         # Worker components (runtime, serving manager, metric exporter)
			
 
				+├── websocket_proxy/ # WebSocket proxying
			
 
				+├── main.py         # Entry point (`gpustack` CLI command)
			
 
				+└── security.py     # Security utilities
			
 
				+```
			
 
				+
			
 
				+**Key components:**
			
 
				+- **Server:** API Server (FastAPI) + Scheduler + Controllers. Handles model instance assignment and resource state management.
			
 
				+- **Worker:** GPUStack Runtime + Serving Manager + Metric Exporter. Manages model instance lifecycle on GPU nodes.
			
 
				+- **AI Gateway:** Uses Higress for API routing and load balancing.
			
 
				+- **Database:** Embedded PostgreSQL by default; external PostgreSQL/MySQL supported. Alembic for migrations under `gpustack/migrations/`.
			
 
				+
			
 
				+## Commands
			
 
				+
			
 
				+### Prerequisites
			
 
				+
			
 
				+- Python 3.10–3.12
			
 
				+- `uv` package manager (auto-installed via `make install`)
			
 
				+- A database (PostgreSQL or MySQL) for development
			
 
				+
			
 
				+### Development Commands
			
 
				+
			
 
				+| Command | Description |
			
 
				+|---------|-------------|
			
 
				+| `make install` | Install uv, sync dependencies, set up pre-commit hooks |
			
 
				+| `make deps` | Sync and lock dependencies with uv |
			
 
				+| `make generate` | Generate code (OpenAPI client, etc.) |
			
 
				+| `make lint` | Run pre-commit checks (flake8, black, etc.) |
			
 
				+| `make test` | Run pytest |
			
 
				+| `make build` | Build wheel package (outputs to `dist/`) |
			
 
				+| `make build-docs` | Build documentation (Linux/macOS only) |
			
 
				+| `make serve-docs` | Serve documentation locally (Linux/macOS only) |
			
 
				+| `make package` | Build container images (Linux/macOS only) |
			
 
				+| `make ci` | Full CI pipeline: install → deps → lint → test → build |
			
 
				+
			
 
				+### Running Locally
			
 
				+
			
 
				+```bash
			
 
				+# Start in disabled gateway mode for development
			
 
				+uv run gpustack start --database-url postgresql://postgres:mysecretpassword@localhost:5432/postgres --gateway-mode disabled --api-port 80
			
 
				+```
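
The `--database-url` value in the command above follows the usual `scheme://user:password@host:port/database` layout. As a minimal sketch (not how GPUStack itself parses the flag), the standard library can split such a URL to sanity-check it before starting the server. The helper name `describe_database_url` is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a database URL into its parts (hypothetical helper, illustration only)."""
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,                     # e.g. "postgresql" or "mysql"
        "user": parts.username,
        "password_set": parts.password is not None,  # avoid echoing the secret itself
        "host": parts.hostname,
        "port": parts.port,                          # None when the URL omits a port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Same example URL as in the command above.
    print(describe_database_url(
        "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
    ))
```

Reporting only whether a password is present, rather than the password, keeps the check safe to paste into logs.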
			
 
				+
			
 
				+### Adding Dependencies
			
 
				+
			
 
				+```bash
			
 
				+uv add <package>          # runtime dependency
			
 
				+uv add --dev <package>    # dev/test dependency
			
 
				+```
			
 
				+
			
 
				+### Running a Single Test
			
 
				+
			
 
				+```bash
			
 
				+uv run pytest tests/path/to/test_file.py -k test_name
			
 
				+```
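
The `-k` option selects tests whose names match the given expression. The sketch below shows a test module it could be applied to; the file name `tests/test_ports.py` and the helper `is_valid_port` are hypothetical and do not come from the repository.

```python
# tests/test_ports.py (hypothetical file, for illustration only)


def is_valid_port(port):
    """Hypothetical helper: True when `port` is a usable TCP port number."""
    return isinstance(port, int) and 1 <= port <= 65535


def test_port_accepts_default_http():
    # Port 80 matches the --api-port value used in the local run command.
    assert is_valid_port(80)


def test_port_rejects_out_of_range():
    assert not is_valid_port(0)
    assert not is_valid_port(70000)
```

With this file in place, `uv run pytest tests/test_ports.py -k rejects` would run only `test_port_rejects_out_of_range`, and `-k "port and not rejects"` would run only the first test.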
			
 
				+
			
 
				+## Important Notes
			
 
				+
			
 
				+- The project uses `uv` for dependency management (not pip directly). `pyproject.toml` is the source of truth.
			
 
				+- Database migrations live in `gpustack/migrations/versions/`. Use Alembic for schema changes.
			
 
				+- The UI is downloaded at install time from a CDN — not committed to the repo.
			
 
				+- Windows support exists via `hack/windows/*.ps1` scripts, but worker nodes require Linux.
			
 
				+- Community inference backends are pulled from the `gpustack/community-inference-backends` repo during `make install`.
			
--- a/README.md
+++ b/README.md
@@ -1,234 +1,125 @@
 
				-<br>
			
 
				+# MASS-Base
			
 
				 
			
 
				-<p align="center">
			
 
				-    <img alt="GPUStack" src="https://raw.githubusercontent.com/gpustack/gpustack/main/docs/assets/gpustack-logo.png" width="300px"/>
			
 
				-</p>
			
 
				-<br>
			
 
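				+MASS-Base is an open-source Model-as-a-Service foundation platform for efficiently managing and scheduling AI model inference services. It supports multiple inference engines (vLLM, SGLang, TensorRT-LLM, and others) and provides performance optimization and resource orchestration across multiple nodes.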
			
 
				 
			
 
				-<p align="center">
			
 
				-    <a href="https://docs.gpustack.ai" target="_blank">
			
 
				-        <img alt="Documentation" src="https://img.shields.io/badge/Docs-GPUStack-blue?logo=readthedocs&logoColor=white"></a>
			
 
				-    <a href="./LICENSE" target="_blank">
			
 
				-        <img alt="License" src="https://img.shields.io/github/license/gpustack/gpustack?logo=github&logoColor=white&label=License&color=blue"></a>
			
 
				-    <a href="./docs/assets/wechat-group-qrcode.jpg" target="_blank">
			
 
				-        <img alt="WeChat" src="https://img.shields.io/badge/WeChat-GPUStack-blue?logo=wechat&logoColor=white"></a>
			
 
				-    <a href="https://discord.gg/VXYJzuaqwD" target="_blank">
			
 
				-        <img alt="Discord" src="https://img.shields.io/badge/Discord-GPUStack-blue?logo=discord&logoColor=white"></a>
			
 
				-    <a href="https://twitter.com/intent/follow?screen_name=gpustack_ai" target="_blank">
			
 
				-        <img alt="Follow on X(Twitter)" src="https://img.shields.io/twitter/follow/gpustack_ai?logo=X"></a>
			
 
				-</p>
			
 
				-<br>
			
 
				+## Core Features
			
 
				 
			
 
				-<p align="center">
			
 
				-  <a href="./README.md">English</a> |
			
 
				-  <a href="./README_CN.md">简体中文</a> |
			
 
				-  <a href="./README_JP.md">日本語</a>
			
 
				-</p>
			
 
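				+- **Multi-cluster management**: Manages compute nodes across multiple environments in one place, including on-premises servers and cloud platforms.
			
 
				+- **Pluggable inference engines**: Automatically configures high-performance inference engines such as vLLM, SGLang, and TensorRT-LLM, and supports plugging in custom engines.
			
 
				+- **Out-of-the-box model deployment**: Newly released models can be deployed quickly.
			
 
				+- **Performance-optimized configurations**: Built-in pre-tuned modes for low latency and high throughput, support for extended KV caches (such as LMCache and HiCache) to reduce TTFT, and built-in support for speculative decoding (EAGLE3, MTP, N-grams).
			
 
				+- **Enterprise-grade operations**: Supports automated failure recovery, load balancing, monitoring, authentication, and access control.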
			
 
				 
			
 
				-<br>
			
 
				+## Architecture
			
 
				 
			
 
				-## Overview
			
 
				+MASS-Base consists of the following core components:
			
 
				 
			
 
				-GPUStack is an open-source GPU cluster manager designed for efficient AI model deployment. It configures and orchestrates inference engines — vLLM, SGLang, TensorRT-LLM, or your own — to optimize performance across GPU clusters. Its core features include:
			
 
				-- **Multi-Cluster GPU Management.** Manages GPU clusters across multiple environments. This includes on-premises servers, Kubernetes clusters, and cloud providers.
			
 
				-- **Pluggable Inference Engines.** Automatically configures high-performance inference engines such as vLLM, SGLang, and TensorRT-LLM. You can also add custom inference engines as needed.
			
 
				-- **Day 0 Model Support.** GPUStack's pluggable engine architecture enables you to deploy new models on the day they are released.
			
 
				-- **Performance-Optimized Configurations.** Offers pre-tuned modes for low latency or high throughput. GPUStack supports extended KV cache systems like LMCache and HiCache to reduce TTFT. It also includes built-in support for speculative decoding methods such as EAGLE3, MTP, and N-grams.
			
 
				-- **Enterprise-Grade Operations.** Offers support for automated failure recovery, load balancing, monitoring, authentication, and access control.
			
 
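				+- **API Server**: A RESTful interface layer built on FastAPI that handles authentication and authorization.
			
 
				+- **Scheduler**: Assigns model instances to worker nodes.
			
 
				+- **Controllers**: Manage the state of system resources and handle scaling of model instances.
			
 
				+- **Worker**: Detects GPU devices, manages the lifecycle of model instances, and exports performance metrics.
			
 
				+- **AI Gateway**: Built on Higress; handles API routing and load balancing.
			
 
				+- **SQL Database**: Uses embedded PostgreSQL by default; external PostgreSQL or MySQL is also supported.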
			
 
				 
			
 
				-## Architecture
			
 
				+![architecture](docs/assets/gpustack-v2-architecture.png)
			
 
				 
			
 
				-GPUStack enables development teams, IT organizations, and service providers to deliver Model-as-a-Service at scale. It supports industry-standard APIs for LLM, voice, image, and video models. The platform includes built-in user authentication and access control, real-time monitoring of GPU performance and utilization, and detailed metering of token usage and API request rates.
			
 
				+## Quick Start
			
 
				 
			
 
				-The figure below illustrates how a single GPUStack server can manage multiple GPU clusters across both on-premises and cloud environments. The GPUStack scheduler allocates GPUs to maximize resource utilization and selects the appropriate inference engines for optimal performance. Administrators also gain full visibility into system health and metrics through integrated Grafana and Prometheus dashboards.
			
 
				+### Prerequisites
			
 
				 
			
 
				-![gpustack-v2-architecture](docs/assets/gpustack-v2-architecture.png)
			
 
				+1. At least one Linux node (supported accelerators include NVIDIA GPU, AMD GPU, Ascend NPU, Hygon DCU, MThreads GPU, Iluvatar GPU, MetaX GPU, Cambricon MLU, and T-Head PPU).
			
 
				+2. Worker nodes need the driver, [Docker](https://docs.docker.com/engine/install/) and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed.
			
 
				+3. The server can run on a CPU-only node without a GPU; Docker must be installed.
			
 
				 
			
 
				-## Optimized Inference Performance
			
 
				-
			
 
				-GPUStack's automated engine selection and parameter optimization deliver strong inference performance out of the box. The following figure shows throughput improvements over default vLLM configurations:
			
 
				-
			
 
				-![a100-throughput-comparison](docs/assets/a100-throughput-comparison.png)
			
 
				-
			
 
				-For detailed benchmarking methods and results, visit our [Inference Performance Lab](https://docs.gpustack.ai/latest/performance-lab/overview/).
			
 
				-
			
 
				-## Supported Accelerators
			
 
				-
			
 
				-GPUStack supports a wide range of accelerators for AI inference:
			
 
				-
			
 
				-- **NVIDIA GPU**
			
 
				-- **AMD GPU**
			
 
				-- **Ascend NPU**
			
 
				-- **Hygon DCU**
			
 
				-- **MThreads GPU**
			
 
				-- **Iluvatar GPU**
			
 
				-- **MetaX GPU**
			
 
				-- **Cambricon MLU**
			
 
				-- **T-Head PPU**
			
 
				-
			
 
				-For detailed requirements and setup instructions, see the [Installation Requirements](https://docs.gpustack.ai/latest/installation/requirements/) documentation.
			
 
				-
			
 
				-## Quick Start
			
 
				-
			
 
				-### Prerequisites
			
 
				-
			
 
				-1. A node with at least one NVIDIA GPU. For other GPU types, please check the guidelines in the GPUStack UI when adding a worker, or refer to the [Installation documentation](https://docs.gpustack.ai/latest/installation/requirements/) for more details.
			
 
				-2. Ensure the NVIDIA driver, [Docker](https://docs.docker.com/engine/install/) and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) are installed on the worker node.
			
 
				-3. (Optional) A CPU node for hosting the GPUStack server. The GPUStack server does not require a GPU and can run on a CPU-only machine. [Docker](https://docs.docker.com/engine/install/) must be installed. Docker Desktop (for Windows and macOS) is also supported. If no dedicated CPU node is available, the GPUStack server can be installed on the same machine as a GPU worker node.
			
 
				-4. Only Linux is supported for GPUStack worker nodes. If you use Windows, consider using WSL2 and avoid using Docker Desktop. macOS is not supported for GPUStack worker nodes.
			
 
				-
			
 
				-### Install GPUStack
			
 
				-
			
 
				-Run the following command to install and start the GPUStack server using Docker:
			
 
				+### Install the Server
			
 
				 
			
 
				 ```bash
			
 
				-sudo docker run -d --name gpustack \
			
 
				+sudo docker run -d --name mass-base \
			
 
				     --restart unless-stopped \
			
 
				     -p 80:80 \
			
 
				-    --volume gpustack-data:/var/lib/gpustack \
			
 
				-    gpustack/gpustack
			
 
				+    --volume mass-base-data:/var/lib/mass-base \
			
 
				+    mass-base/mass-base
			
 
				 ```
			
 
				 
			
 
				-<details>
			
 
				-<summary>Alternative: Use Quay Container Registry Mirror</summary>
			
 
				-
			
 
				-If you cannot pull images from `Docker Hub` or the download is very slow, you can use our `Quay.io` mirror by pointing your registry to `quay.io`:
			
 
				+After startup, check the logs:
			
 
				 
			
 
				 ```bash
			
 
				-sudo docker run -d --name gpustack \
			
 
				-    --restart unless-stopped \
			
 
				-    -p 80:80 \
			
 
				-    --volume gpustack-data:/var/lib/gpustack \
			
 
				-    quay.io/gpustack/gpustack \
			
 
				-    --system-default-container-registry quay.io
			
 
				+sudo docker logs -f mass-base
			
 
				 ```
			
 
				-</details>
			
 
				 
			
 
				-Check the GPUStack startup logs:
			
 
				+Get the default admin password:
			
 
				 
			
 
				 ```bash
			
 
				-sudo docker logs -f gpustack
			
 
				+sudo docker exec mass-base cat /var/lib/mass-base/initial_admin_password
			
 
				 ```
			
 
				 
			
 
				-After GPUStack starts, run the following command to get the default admin password:
			
 
				-
			
 
				-```bash
			
 
				-sudo docker exec gpustack cat /var/lib/gpustack/initial_admin_password
			
 
				-```
			
 
				-
			
 
				-Open your browser and navigate to `http://your_host_ip` to access the GPUStack UI. Use the default username `admin` and the password you retrieved above to log in.
			
 
				-
			
 
				-### Set Up a GPU Cluster
			
 
				-
			
 
				-1. On the GPUStack UI, navigate to the `Clusters` page.
			
 
				-
			
 
				-2. Click the `Add Cluster` button.
			
 
				+Open `http://your_host_ip` in your browser and log in with the username `admin` and the password you retrieved.
			
 
				 
			
 
				-3. Select `Docker` as the cluster provider.
			
 
				+### Deploy a Model
			
 
				 
			
 
				-4. Fill in the `Name` and `Description` fields for the new cluster, then click the `Save` button.
			
 
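				+1. In the MASS-Base UI, go to the **Catalog** page.
			
 
				+2. Select an available model and, once the compatibility checks pass, click **Save** to deploy it.
			
 
				+3. Once the deployment status changes to **Running**, the model can be used through the UI Playground or the API.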
			
 
				 
			
 
				-5. Follow the UI guidelines to configure the new worker node. You will need to run a Docker command on the worker node to connect it to the GPUStack server. The command will look similar to the following:
			
 
				+### Use the API
			
 
				 
			
 
				-    ```bash
			
 
				-    sudo docker run -d --name gpustack-worker \
			
 
				-          --restart=unless-stopped \
			
 
				-          --privileged \
			
 
				-          --network=host \
			
 
				-          --volume /var/run/docker.sock:/var/run/docker.sock \
			
 
				-          --volume gpustack-data:/var/lib/gpustack \
			
 
				-          --runtime nvidia \
			
 
				-          gpustack/gpustack \
			
 
				-          --server-url http://your_gpustack_server_url \
			
 
				-          --token your_worker_token \
			
 
				-          --advertise-address 192.168.1.2
			
 
				-    ```
			
 
				-
			
 
				-6. Execute the command on the worker node to connect it to the GPUStack server.
			
 
				-
			
 
				-7. After the worker node connects successfully, it will appear on the `Workers` page in the GPUStack UI.
			
 
				-
			
 
				-### Deploy a Model
			
 
				-
			
 
				-1. Navigate to the `Catalog` page in the GPUStack UI.
			
 
				-
			
 
				-2. Select the `Qwen3 0.6B` model from the list of available models.
			
 
				-
			
 
				-3. After the deployment compatibility checks pass, click the `Save` button to deploy the model.
			
 
				-
			
 
				-![deploy qwen3 from catalog](docs/assets/quick-start/quick-start-qwen3.png)
			
 
				-
			
 
				-4. GPUStack will start downloading the model files and deploying the model. When the deployment status shows `Running`, the model has been deployed successfully.
			
 
				-
			
 
				-![model is running](docs/assets/quick-start/model-running.png)
			
 
				-
			
 
				-5. Click `Playground - Chat` in the navigation menu, check that the model `qwen3-0.6b` is selected from the top-right `Model` dropdown. Now you can chat with the model in the UI playground.
			
 
				-
			
 
				-![quick chat](docs/assets/quick-start/quick-chat.png)
			
 
				-
			
 
				-### Use the model via API
			
 
				-
			
 
				-1. Hover over the user avatar and navigate to the `API Keys` page, then click the `New API Key` button.
			
 
				-
			
 
				-2. Fill in the `Name` and click the `Save` button.
			
 
				-
			
 
				-3. Copy the generated API key and save it somewhere safe. Please note that you can only see it once on creation.
			
 
				-
			
 
				-4. You can now use the API key to access the OpenAI-compatible API endpoints provided by GPUStack. For example, use curl as follows:
			
 
				+1. In the UI, go to the **API Keys** page and create a new API key.
			
 
				+2. Use the API key to call the OpenAI-compatible API:
			
 
				 
			
 
				 ```bash
			
 
				-# Replace `your_api_key` and `your_gpustack_server_url`
			
 
				-# with your actual API key and GPUStack server URL.
			
 
				-export GPUSTACK_API_KEY=your_api_key
			
 
				-curl http://your_gpustack_server_url/v1/chat/completions \
			
 
				+export MASS_API_KEY=your_api_key
			
 
				+curl http://your_mass_base_server_url/v1/chat/completions \
			
 
				   -H "Content-Type: application/json" \
			
 
				-  -H "Authorization: Bearer $GPUSTACK_API_KEY" \
			
 
				+  -H "Authorization: Bearer $MASS_API_KEY" \
			
 
				   -d '{
			
 
				-    "model": "qwen3-0.6b",
			
 
				+    "model": "your-model-name",
			
 
				     "messages": [
			
 
				-      {
			
 
				-        "role": "system",
			
 
				-        "content": "You are a helpful assistant."
			
 
				-      },
			
 
				-      {
			
 
				-        "role": "user",
			
 
				-        "content": "Tell me a joke."
			
 
				-      }
			
 
				+      { "role": "system", "content": "You are a helpful assistant." },
			
 
				+      { "role": "user", "content": "Tell me a joke." }
			
 
				     ],
			
 
				     "stream": true
			
 
				   }'
			
 
				 ```
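
The same streaming request can be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server streams OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), which the README does not spell out. The function names and the `MASS_BASE_URL` environment variable are hypothetical; `MASS_API_KEY` and `your-model-name` are taken from the curl example above.

```python
import json
import os
import urllib.request


def build_request(base_url, api_key, model, messages):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": messages, "stream": True}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_sse_line(raw_line):
    """Return the text delta carried by one SSE line, or None if it has none."""
    line = raw_line.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None  # blank keep-alive lines and other SSE fields
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # assumed OpenAI-style end-of-stream marker
    chunk = json.loads(data)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def stream_chat(base_url, api_key, model, messages):
    """Yield text fragments as the server streams them back."""
    request = build_request(base_url, api_key, model, messages)
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            text = parse_sse_line(raw_line)
            if text:
                yield text


if __name__ == "__main__":
    api_key = os.environ.get("MASS_API_KEY")
    base_url = os.environ.get("MASS_BASE_URL")  # hypothetical variable name
    if api_key and base_url:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke."},
        ]
        for piece in stream_chat(base_url, api_key, "your-model-name", messages):
            print(piece, end="", flush=True)
        print()
```

Keeping request construction and line parsing in separate functions lets both be checked without a running server.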
			
 
				 
			
 
				-## Documentation
			
 
				+## Build
			
 
				 
			
 
				-Please see the [official docs site](https://docs.gpustack.ai) for complete documentation.
			
 
				+1. Install Python 3.10 to 3.12.
			
 
				 
			
 
				-## Build
			
 
				+2. Run the build:
			
 
				 
			
 
				-1. Install Python (version 3.10 to 3.12).
			
 
				+```bash
			
 
				+make build
			
 
				+```
			
 
				+
			
 
				+The build artifacts are placed in the `dist` directory.
			
 
				+
			
 
				+## Development
			
 
				 
			
 
				-2. Run `make build`.
			
 
				+```bash
			
 
				+# Install development dependencies
			
 
				+make install
			
 
				+
			
 
				+# Start locally for development (a database must already be running)
			
 
				+uv run gpustack start \
			
 
				+  --database-url postgresql://postgres:mysecretpassword@localhost:5432/postgres \
			
 
				+  --gateway-mode disabled \
			
 
				+  --api-port 80
			
 
				+```
			
 
				 
			
 
				-You can find the built wheel package in `dist` directory.
			
 
				+For more development guidance, see the [Development Guide](docs/development.md).
			
 
				 
			
 
				-## Contributing
			
 
				+## Documentation
			
 
				 
			
 
				-Please read the [Contributing Guide](./docs/contributing.md) if you're interested in contributing to GPUStack.
			
 
				+For complete documentation, visit the [official docs site](https://docs.gpustack.ai).
			
 
				 
			
 
				-## Join Community
			
 
				+## Join the Community
			
 
				 
			
 
				-If you have any issues or suggestions, feel free to join our [Community](https://discord.gg/VXYJzuaqwD) for support.
			
 
				+If you have any questions or suggestions, join our [Discord community](https://discord.gg/VXYJzuaqwD) for support.
			
 
				 
			
 
				 ## License
			
 
				 
			
 
				-Copyright (c) 2024-2026 The GPUStack authors
			
 
				+Copyright (c) 2024-2026 The MASS-Base authors
			
 
				 
			
 
				-Licensed under the Apache License, Version 2.0 (the "License");
			
 
				-you may not use this file except in compliance with the License.
			
 
				-You may obtain a copy of the License at [LICENSE](./LICENSE) file for details.
			
 
				+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at [LICENSE](./LICENSE).
			
 
				 
			
 
				-Unless required by applicable law or agreed to in writing, software
			
 
				-distributed under the License is distributed on an "AS IS" BASIS,
			
 
				-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
			
 
				-See the License for the specific language governing permissions and
			
 
				-limitations under the License.
			
 
				+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
			
--- a/README_CN.md
+++ b/README_CN.md
@@ -1,222 +0,0 @@
 
				-<br>
			
 
				-
			
 
				-<p align="center">
			
 
				-    <img alt="GPUStack" src="https://raw.githubusercontent.com/gpustack/gpustack/main/docs/assets/gpustack-logo.png" width="300px"/>
			
 
				-</p>
			
 
				-<br>
			
 
				-
			
 
				-<p align="center">
			
 
				-    <a href="https://docs.gpustack.ai" target="_blank">
			
 
				-        <img alt="Documentation" src="https://img.shields.io/badge/文档-GPUStack-blue?logo=readthedocs&logoColor=white"></a>
			
 
				-    <a href="./LICENSE" target="_blank">
			
 
				-        <img alt="License" src="https://img.shields.io/github/license/gpustack/gpustack?logo=github&logoColor=white&label=License&color=blue"></a>
			
 
				-    <a href="./docs/assets/wechat-group-qrcode.jpg" target="_blank">
			
 
				-        <img alt="WeChat" src="https://img.shields.io/badge/微信群-GPUStack-blue?logo=wechat&logoColor=white"></a>
			
 
				-    <a href="https://discord.gg/VXYJzuaqwD" target="_blank">
			
 
				-        <img alt="Discord" src="https://img.shields.io/badge/Discord-GPUStack-blue?logo=discord&logoColor=white"></a>
			
 
				-    <a href="https://twitter.com/intent/follow?screen_name=gpustack_ai" target="_blank">
			
 
				-        <img alt="Follow on X(Twitter)" src="https://img.shields.io/twitter/follow/gpustack_ai?logo=X"></a>
			
 
				-</p>
			
 
				-<br>
			
 
				-
			
 
				-<p align="center">
			
 
				-  <a href="./README.md">English</a> |
			
 
				-  <a href="./README_CN.md">简体中文</a> |
			
 
				-  <a href="./README_JP.md">日本語</a>
			
 
				-</p>
			
 
				-
			
 
				-<br>
			
 
				-
			
 
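				-## Overview
			
 
				-
			
 
				-GPUStack is an open-source GPU cluster manager designed for efficient AI model deployment. It configures and orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, or your own custom engine) to optimize performance across GPU clusters. Its core features include:
			
 
				-- **Multi-cluster GPU management.** Manages GPU clusters across multiple environments, including on-premises servers, Kubernetes clusters, and cloud providers.
			
 
				-- **Pluggable inference engines.** Automatically configures high-performance inference engines such as vLLM, SGLang, and TensorRT-LLM. You can also add custom inference engines as needed.
			
 
				-- **Day 0 model support.** GPUStack's pluggable engine architecture lets you deploy new models on the day they are released.
			
 
				-- **Performance-optimized configurations.** Provides pre-tuned modes for low latency or high throughput. GPUStack supports extended KV cache systems such as LMCache and HiCache to reduce TTFT. It also includes built-in support for speculative decoding methods such as EAGLE3, MTP, and N-grams.
			
 
				-- **Enterprise-grade operations.** Supports automated failure recovery, load balancing, monitoring, authentication, and access control.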
			
 
				-
			
 
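				-## Architecture
			
 
				-
			
 
				-GPUStack enables development teams, IT organizations, and service providers to deliver Model-as-a-Service at scale. It supports industry-standard APIs for LLM, voice, image, and video models. The platform has built-in user authentication and access control, real-time monitoring of GPU performance and utilization, and detailed metering of token usage and API request rates.
			
 
				-
			
 
				-The figure below shows how a single GPUStack server can manage multiple GPU clusters across on-premises and cloud environments. The GPUStack scheduler allocates GPUs to maximize resource utilization and selects the appropriate inference engine for optimal performance. Administrators also get full visibility into system health and metrics through integrated Grafana and Prometheus dashboards.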
			
 
				-
			
 
				-![gpustack-v2-architecture](docs/assets/gpustack-v2-architecture.png)
			
 
				-
			
 
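				-## Optimized Inference Performance
			
 
				-
			
 
				-GPUStack's automated engine selection and parameter optimization deliver strong inference performance out of the box. The figure below shows throughput improvements over the default vLLM configuration: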
			
 
				-
			
 
				-![a100-throughput-comparison](docs/assets/a100-throughput-comparison.png)
			
 
				-
			
 
				-For detailed benchmarking methods and results, visit our [Inference Performance Lab](https://docs.gpustack.ai/latest/performance-lab/overview/).
			
 
				-
			
 
				-## Supported Accelerators
			
 
				-
			
 
				-GPUStack supports a wide range of accelerators for AI inference:
			
 
				-
			
 
				-- **NVIDIA GPU**
			
 
				-- **AMD GPU**
			
 
				-- **Ascend NPU**
			
 
				-- **Hygon DCU**
			
 
				-- **MThreads GPU**
			
 
				-- **Iluvatar GPU**
			
 
				-- **MetaX GPU**
			
 
				-- **Cambricon MLU**
			
 
				-- **T-Head PPU**
			
 
				-
			
 
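				-For detailed requirements and setup instructions, see the [Installation Requirements](https://docs.gpustack.ai/latest/installation/requirements/) documentation.
			
 
				-
			
 
				-## Quick Start
			
 
				-
			
 
				-### Prerequisites
			
 
				-
			
 
				-1.  A node with at least one NVIDIA GPU. For other GPU types, check the guidelines in the GPUStack UI when adding a worker, or refer to the [Installation documentation](https://docs.gpustack.ai/latest/installation/requirements/) for more details.
			
 
				-2.  Make sure the NVIDIA driver, [Docker](https://docs.docker.com/engine/install/) and [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) are installed on the worker node.
			
 
				-3.  (Optional) A CPU node for hosting the GPUStack server. The GPUStack server does not require a GPU and can run on a CPU-only machine. [Docker](https://docs.docker.com/engine/install/) must be installed. Docker Desktop (for Windows and macOS) is also supported. If no dedicated CPU node is available, the GPUStack server can be installed on the same machine as a GPU worker node.
			
 
				-4.  Only Linux is supported for GPUStack worker nodes. If you use Windows, consider using WSL2 and avoid Docker Desktop. macOS is not supported for GPUStack worker nodes.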
			
 
				-
			
 
				-### Install GPUStack
			
 
				-
			
 
				-Run the following command to install and start the GPUStack server using Docker:
			
 
				-
			
 
				-```bash
			
 
				-sudo docker run -d --name gpustack \
			
 
				-    --restart unless-stopped \
			
 
				-    -p 80:80 \
			
 
				-    --volume gpustack-data:/var/lib/gpustack \
			
 
				-    gpustack/gpustack
			
 
				-```
			
 
				-
			
 
				-<details>
			
 
				-<summary>Alternative: Use the Quay Container Registry Mirror</summary>
			
 
				-
			
 
				-If you cannot pull images from `Docker Hub` or the download is very slow, you can use our mirror by pointing to `quay.io`:
			
 
				-
			
 
				-```bash
			
 
				-sudo docker run -d --name gpustack \
			
 
				-    --restart unless-stopped \
			
 
				-    -p 80:80 \
			
 
				-    --volume gpustack-data:/var/lib/gpustack \
			
 
				-    quay.io/gpustack/gpustack \
			
 
				-    --system-default-container-registry quay.io
			
 
				-```
			
 
				-</details>
			
 
				-
			
 
				-Check the GPUStack startup logs:
			
 
				-
			
 
				-```bash
			
 
				-sudo docker logs -f gpustack
			
 
				-```
			
 
				-
			
 
				-After GPUStack starts, run the following command to get the default admin password:
			
 
				-
			
 
				-```bash
			
 
				-sudo docker exec gpustack cat /var/lib/gpustack/initial_admin_password
			
 
				-```
			
 
				-
			
 
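				-Open your browser and go to `http://your_host_ip` to access the GPUStack UI. Log in with the default username `admin` and the password retrieved above.
			
 
				-
			
 
				-### Set Up a GPU Cluster
			
 
				-
			
 
				-1.  In the GPUStack UI, navigate to the `Clusters` page.
			
 
				-2.  Click the `Add Cluster` button.
			
 
				-3.  Select `Docker` as the cluster provider.
			
 
				-4.  Fill in the `Name` and `Description` fields for the new cluster, then click the `Save` button.
			
 
				-5.  Follow the UI guidelines to configure the new worker node. You will need to run a Docker command on the worker node to connect it to the GPUStack server. The command will look similar to the following: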
			
 
				-    ```bash
			
 
				-    sudo docker run -d --name gpustack-worker \
			
 
				-          --restart=unless-stopped \
			
 
				-          --privileged \
			
 
				-          --network=host \
			
 
				-          --volume /var/run/docker.sock:/var/run/docker.sock \
			
 
				-          --volume gpustack-data:/var/lib/gpustack \
			
 
				-          --runtime nvidia \
			
 
				-          gpustack/gpustack \
			
 
				-          --server-url http://your_gpustack_server_url \
			
 
				-          --token your_worker_token \
			
 
				-          --advertise-address 192.168.1.2
			
 
				-    ```
			
 
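				-6.  Run the command on the worker node to connect it to the GPUStack server.
			
 
				-7.  After the worker node connects successfully, it appears on the `Workers` page in the GPUStack UI.
			
 
				-
			
 
				-### Deploy a Model
			
 
				-
			
 
				-1.  Navigate to the `Catalog` page in the GPUStack UI.
			
 
				-2.  Select the `Qwen3 0.6B` model from the list of available models.
			
 
				-3.  After the deployment compatibility checks pass, click the `Save` button to deploy the model.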
			
 
				-
			
 
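				-![deploy qwen3 from catalog](docs/assets/quick-start/quick-start-qwen3.png)
			
 
				-
			
 
				-4.  GPUStack starts downloading the model files and deploying the model. When the deployment status shows `Running`, the model has been deployed successfully.
			
 
				-
			
 
				-![model is running](docs/assets/quick-start/model-running.png)
			
 
				-
			
 
				-5.  Click `Playground - Chat` in the navigation menu and check that the `qwen3-0.6b` model is selected in the `Model` dropdown at the top right. You can now chat with the model in the UI playground.
			
 
				-
			
 
				-![quick chat](docs/assets/quick-start/quick-chat.png)
			
 
				-
			
 
				-### Use the Model via API
			
 
				-
			
 
				-1.  Hover over the user avatar, navigate to the `API Keys` page, then click the `New API Key` button.
			
 
				-2.  Fill in the `Name` and click the `Save` button.
			
 
				-3.  Copy the generated API key and store it somewhere safe. Note that the key is shown only once, at creation.
			
 
				-4.  You can now use the API key to access the OpenAI-compatible API endpoints provided by GPUStack. For example, using curl: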
			
 
				-
			
 
				-```bash
			
 
				-# Replace `your_api_key` and `your_gpustack_server_url`
			
 
				-# with your actual API key and GPUStack server URL.
			
 
				-export GPUSTACK_API_KEY=your_api_key
			
 
				-curl http://your_gpustack_server_url/v1/chat/completions \
			
 
				-  -H "Content-Type: application/json" \
			
 
				-  -H "Authorization: Bearer $GPUSTACK_API_KEY" \
			
 
				-  -d '{
			
 
				-    "model": "qwen3-0.6b",
			
 
				-    "messages": [
			
 
				-      {
			
 
				-        "role": "system",
			
 
				-        "content": "You are a helpful assistant."
			
 
				-      },
			
 
				-      {
			
 
				-        "role": "user",
			
 
				-        "content": "Tell me a joke."
			
 
				-      }
			
 
				-    ],
			
 
				-    "stream": true
			
 
				-  }'
			
 
				-```
			
 
				-
			
 
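				-## Documentation
			
 
				-
			
 
				-See the [official docs site](https://docs.gpustack.ai) for complete documentation.
			
 
				-
			
 
				-## Build
			
 
				-
			
 
				-1.  Install Python (version 3.10 to 3.12).
			
 
				-2.  Run `make build`.
			
 
				-
			
 
				-You can find the built wheel package in the `dist` directory.
			
 
				-
			
 
				-## Contributing
			
 
				-
			
 
				-If you are interested in contributing to GPUStack, please read the [Contributing Guide](./docs/contributing.md).
			
 
				-
			
 
				-## Join the Community
			
 
				-
			
 
				-Scan the QR code to join the community group: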
			
 
				-
			
 
				-<p align="left">
			
 
				-    <img alt="Wechat-group" src="./docs/assets/wechat-group-qrcode.jpg" width="300px"/>
			
 
				-</p>
			
 
				-
			
 
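				-## License
			
 
				-
			
 
				-Copyright (c) 2024-2026 The GPUStack authors
			
 
				-
			
 
				-Licensed under the Apache License, Version 2.0 (the "License");
			
 
				-you may not use this file except in compliance with the License.
			
 
				-You may obtain a copy of the License in the [LICENSE](./LICENSE) file.
			
 
				-
			
 
				-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
			
 
				-See the License for the specific language governing permissions and limitations under the License.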
			
--- a/README_JP.md
+++ b/README_JP.md
@@ -1,224 +0,0 @@
 
				-<br>
			
 
				-
			
 
				-<p align="center">
			
 
				-    <img alt="GPUStack" src="https://raw.githubusercontent.com/gpustack/gpustack/main/docs/assets/gpustack-logo.png" width="300px"/>
			
 
				-</p>
			
 
				-<br>
			
 
				-
			
 
				-<p align="center">
			
 
				-    <a href="https://docs.gpustack.ai" target="_blank">
			
 
				-        <img alt="Documentation" src="https://img.shields.io/badge/ドキュメント-GPUStack-blue?logo=readthedocs&logoColor=white"></a>
			
 
				-    <a href="./LICENSE" target="_blank">
			
 
				-        <img alt="License" src="https://img.shields.io/github/license/gpustack/gpustack?logo=github&logoColor=white&label=License&color=blue"></a>
			
 
				-    <a href="./docs/assets/wechat-group-qrcode.jpg" target="_blank">
			
 
				-        <img alt="WeChat" src="https://img.shields.io/badge/微信群-GPUStack-blue?logo=wechat&logoColor=white"></a>
			
 
				-    <a href="https://discord.gg/VXYJzuaqwD" target="_blank">
			
 
				-        <img alt="Discord" src="https://img.shields.io/badge/Discord-GPUStack-blue?logo=discord&logoColor=white"></a>
			
 
				-    <a href="https://twitter.com/intent/follow?screen_name=gpustack_ai" target="_blank">
			
 
				-        <img alt="Follow on X(Twitter)" src="https://img.shields.io/twitter/follow/gpustack_ai?logo=X"></a>
			
 
				-</p>
			
 
				-<br>
			
 
				-
			
 
				-<p align="center">
			
 
				-  <a href="./README.md">English</a> |
			
 
				-  <a href="./README_CN.md">简体中文</a> |
			
 
				-  <a href="./README_JP.md">日本語</a>
			
 
				-</p>
			
 
				-
			
 
				-<br>
			
 
				-
			
 
				-## 概要
			
 
				-
			
 
				-GPUStackは、効率的なAIモデルデプロイメントのために設計されたオープンソースのGPUクラスタマネージャーです。推論エンジン（vLLM、SGLang、TensorRT-LLM、またはカスタムエンジン）を構成・オーケストレーションし、GPUクラスタ全体のパフォーマンスを最適化します。主な機能は以下の通りです：
			
 
				-- **マルチクラスタGPU管理。** 複数の環境にわたるGPUクラスタを管理します。これには、オンプレミスサーバー、Kubernetesクラスタ、およびクラウドプロバイダが含まれます。
			
 
				-- **プラグ可能な推論エンジン。** vLLM、SGLang、TensorRT-LLMなどの高性能推論エンジンを自動的に設定します。必要に応じてカスタム推論エンジンを追加することもできます。
			
 
				-- **Day 0モデルサポート。** GPUStackのプラグ可能なエンジンアーキテクチャにより、新しいモデルがリリースされた当日にデプロイできます。
			
 
				-- **パフォーマンス最適化設定。** 低レイテンシまたは高スループット向けの事前調整済みモードを提供します。GPUStackは、LMCacheやHiCacheなどの拡張KVキャッシュシステムをサポートし、TTFTを削減します。また、EAGLE3、MTP、N-gramなどの投機的デコード手法の組み込みサポートも含まれます。
			
 
				-- **エンタープライズグレードの運用。** 自動化された障害回復、負荷分散、監視、認証、およびアクセス制御のサポートを提供します。
			
 
				-
			
 
				-## アーキテクチャ
			
 
				-
			
 
				-GPUStackは、開発チーム、IT組織、およびサービスプロバイダーが大規模なモデルサービスを提供できるようにします。LLM、音声、画像、ビデオモデル向けの業界標準APIをサポートしています。このプラットフォームには、組み込みのユーザー認証とアクセス制御、GPUパフォーマンスと使用率のリアルタイム監視、トークン使用量とAPIリクエストレートの詳細なメータリングが含まれています。
			
 
				-
			
 
				-以下の図は、単一のGPUStackサーバーがオンプレミスとクラウド環境の両方にまたがる複数のGPUクラスタをどのように管理できるかを示しています。GPUStackスケジューラは、リソース使用率を最大化するためにGPUを割り当て、最適なパフォーマンスを得るために適切な推論エンジンを選択します。管理者は、統合されたGrafanaおよびPrometheusダッシュボードを通じて、システムの健全性とメトリクスに関する完全な可視性も得ます。
			
 
				-
			
 
				-![gpustack-v2-architecture](docs/assets/gpustack-v2-architecture.png)
			
 
				-
			
 
				-## 最適化された推論パフォーマンス
			
 
				-
			
 
				-GPUStackの自動化されたエンジン選択とパラメータ最適化により、すぐに使える強力な推論パフォーマンスを提供します。以下の図は、デフォルトのvLLM設定と比較したスループットの向上を示しています：
			
 
				-
			
 
				-![a100-throughput-comparison](docs/assets/a100-throughput-comparison.png)
			
 
				-
			
 
				-詳細なベンチマーク方法と結果については、[推論パフォーマンスラボ](https://docs.gpustack.ai/latest/performance-lab/overview/)をご覧ください。
			
 
				-
			
 
				-## サポートされているアクセラレータ
			
 
				-
			
 
				-GPUStack は AI 推論用の幅広いアクセラレータをサポートしています：
			
 
				-
			
 
				-- **NVIDIA GPU**
			
 
				-- **AMD GPU**
			
 
				-- **Ascend NPU**
			
 
				-- **Hygon DCU**
			
 
				-- **MThreads GPU**
			
 
				-- **Iluvatar GPU**
			
 
				-- **MetaX GPU**
			
 
				-- **Cambricon MLU**
			
 
				-- **T-Head PPU**
			
 
				-
			
 
				-詳細な要件とセットアップ手順については、[インストール要件](https://docs.gpustack.ai/latest/installation/requirements/)ドキュメントを参照してください。
			
 
				-
			
 
				-## クイックスタート
			
 
				-
			
 
				-### 前提条件
			
 
				-
			
 
				-1.  少なくとも1つの NVIDIA GPU を搭載したノード。他の GPU タイプについては、GPUStack UI で worker を追加する際のガイドラインを参照するか、詳細については[インストールドキュメント](https://docs.gpustack.ai/latest/installation/requirements/)を参照してください。
			
 
				-2.  worker ノードに NVIDIA ドライバー、[Docker](https://docs.docker.com/engine/install/)、[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) がインストールされていることを確認してください。
			
 
				-3.  （オプション）GPUStack server をホストするための CPU ノード。GPUStack server は GPU を必要とせず、CPU のみのマシンで実行できます。[Docker](https://docs.docker.com/engine/install/) がインストールされている必要があります。Docker Desktop（Windows および macOS 用）もサポートされています。専用の CPU ノードがない場合は、GPU worker ノードと同じマシンに GPUStack server をインストールできます。
			
 
				-4.  GPUStack worker ノードは Linux のみをサポートしています。Windows を使用する場合は、WSL2 の使用を検討し、Docker Desktop の使用は避けてください。macOS は GPUStack worker ノードとしてサポートされていません。
			
 
				-
			
 
				-### GPUStack のインストール
			
 
				-
			
 
				-以下のコマンドを実行して、Docker を使用して GPUStack server をインストールし起動します：
			
 
				-
			
 
				-```bash
			
 
				-sudo docker run -d --name gpustack \
			
 
				-    --restart unless-stopped \
			
 
				-    -p 80:80 \
			
 
				-    --volume gpustack-data:/var/lib/gpustack \
			
 
				-    gpustack/gpustack
			
 
				-```
			
 
				-
			
 
				-<details>
			
 
				-<summary>代替案：Quay コンテナレジストリミラーの使用</summary>
			
 
				-
			
 
				-`Docker Hub` からイメージをプルできない場合やダウンロードが非常に遅い場合は、`quay.io` を指定することで当社のミラーを使用できます：
			
 
				-
			
 
				-```bash
			
 
				-sudo docker run -d --name gpustack \
			
 
				-    --restart unless-stopped \
			
 
				-    -p 80:80 \
			
 
				-    --volume gpustack-data:/var/lib/gpustack \
			
 
				-    quay.io/gpustack/gpustack \
			
 
				-    --system-default-container-registry quay.io
			
 
				-```
			
 
				-</details>
			
 
				-
			
 
				-GPUStack の起動ログを確認します：
			
 
				-
			
 
				-```bash
			
 
				-sudo docker logs -f gpustack
			
 
				-```
			
 
				-
			
 
				-GPUStack が起動したら、以下のコマンドを実行してデフォルトの管理者パスワードを取得します：
			
 
				-
			
 
				-```bash
			
 
				-sudo docker exec gpustack cat /var/lib/gpustack/initial_admin_password
			
 
				-```
			
 
				-
			
 
				-ブラウザを開き、`http://あなたのホストIP` にアクセスして GPUStack UI にアクセスします。デフォルトのユーザー名 `admin` と上記で取得したパスワードを使用してログインします。
			
 
				-
			
 
				-### GPU クラスターのセットアップ
			
 
				-
			
 
				-1.  GPUStack UI で、`Clusters` ページに移動します。
			
 
				-2.  `Add Cluster` ボタンをクリックします。
			
 
				-3.  クラスタープロバイダーとして `Docker` を選択します。
			
 
				-4.  新しいクラスターの `Name` と `Description` フィールドに入力し、`Save` ボタンをクリックします。
			
 
				-5.  UI のガイドラインに従って新しい worker ノードを設定します。worker ノードを GPUStack server に接続するには、worker ノードで Docker コマンドを実行する必要があります。コマンドは以下のようになります：
			
 
				-    ```bash
			
 
				-    sudo docker run -d --name gpustack-worker \
			
 
				-          --restart=unless-stopped \
			
 
				-          --privileged \
			
 
				-          --network=host \
			
 
				-          --volume /var/run/docker.sock:/var/run/docker.sock \
			
 
				-          --volume gpustack-data:/var/lib/gpustack \
			
 
				-          --runtime nvidia \
			
 
				-          gpustack/gpustack \
			
 
				-          --server-url http://your_gpustack_server_url \
			
 
				-          --token your_worker_token \
			
 
				-          --advertise-address 192.168.1.2
			
 
				-    ```
			
 
				-6.  worker ノードでこのコマンドを実行して GPUStack server に接続します。
			
 
				-7.  worker ノードが正常に接続されると、GPUStack UI の `Workers` ページに表示されます。
			
 
				-
			
 
				-### モデルのデプロイ
			
 
				-
			
 
				-1. GPUStack UIの`Catalog`ページに移動します。
			
 
				-
			
 
				-2. 利用可能なモデルのリストから`Qwen3 0.6B`モデルを選択します。
			
 
				-
			
 
				-3. デプロイ互換性チェックが通過した後、`Save`ボタンをクリックしてモデルをデプロイします。
			
 
				-
			
 
				-![カタログからqwen3をデプロイ](docs/assets/quick-start/quick-start-qwen3.png)
			
 
				-
			
 
				-4. GPUStackはモデルファイルのダウンロードとモデルのデプロイを開始します。デプロイステータスが`Running`と表示されたら、モデルは正常にデプロイされています。
			
 
				-
			
 
				-![モデルが実行中](docs/assets/quick-start/model-running.png)
			
 
				-
			
 
				-5. ナビゲーションメニューで`Playground - Chat`をクリックし、右上の`Model`ドロップダウンからモデル`qwen3-0.6b`が選択されていることを確認します。これでUIプレイグラウンドでモデルとチャットできるようになります。
			
 
				-
			
 
				-![クイックチャット](docs/assets/quick-start/quick-chat.png)
			
 
				-
			
 
				-### API経由でモデルを使用
			
 
				-
			
 
				-1. ユーザーアバターにカーソルを合わせて`API Keys`ページに移動し、`New API Key`ボタンをクリックします。
			
 
				-
			
 
				-2. `Name`を入力し、`Save`ボタンをクリックします。
			
 
				-
			
 
				-3. 生成されたAPIキーをコピーし、安全な場所に保存します。このキーは作成時に一度しか確認できないことに注意してください。
			
 
				-
			
 
				-4. これで、このAPIキーを使用して、GPUStackが提供するOpenAI互換のAPIエンドポイントにアクセスできます。例えば、以下のようにcurlを使用します：
			
 
				-
			
 
				-```bash
			
 
				-# `your_api_key` と `your_gpustack_server_url` を
			
 
				-# 実際のAPIキーとGPUStackサーバーのURLに置き換えてください。
			
 
				-export GPUSTACK_API_KEY=your_api_key
			
 
				-curl http://your_gpustack_server_url/v1/chat/completions \
			
 
				-  -H "Content-Type: application/json" \
			
 
				-  -H "Authorization: Bearer $GPUSTACK_API_KEY" \
			
 
				-  -d '{
			
 
				-    "model": "qwen3-0.6b",
			
 
				-    "messages": [
			
 
				-      {
			
 
				-        "role": "system",
			
 
				-        "content": "あなたは役立つアシスタントです。"
			
 
				-      },
			
 
				-      {
			
 
				-        "role": "user",
			
 
				-        "content": "ジョークを教えてください。"
			
 
				-      }
			
 
				-    ],
			
 
				-    "stream": true
			
 
				-  }'
			
 
				-```
			
 
				-
			
 
				-## ドキュメント
			
 
				-
			
 
				-完全なドキュメントについては、[公式ドキュメントサイト](https://docs.gpustack.ai)を参照してください。
			
 
				-
			
 
				-## ビルド
			
 
				-
			
 
				-1. Python（バージョン3.10から3.12）をインストールします。
			
 
				-
			
 
				-2. `make build`を実行します。
			
 
				-
			
 
				-ビルドされたwheelパッケージは`dist`ディレクトリにあります。
			
 
				-
			
 
				-## 貢献
			
 
				-
			
 
				-GPUStackへの貢献に興味がある場合は、[貢献ガイド](./docs/contributing.md)をお読みください。
			
 
				-
			
 
				-## コミュニティに参加
			
 
				-
			
 
				-問題がある場合、または提案がある場合は、お気軽に私たちの[コミュニティ](https://discord.gg/VXYJzuaqwD)に参加してサポートを受けてください。
			
 
				-
			
 
				-## ライセンス
			
 
				-
			
 
				-Copyright (c) 2024-2026 The GPUStack authors
			
 
				-
			
 
				-Apache License, Version 2.0（「ライセンス」）に基づいてライセンスされます。
			
 
				-ライセンスに準拠しない限り、このファイルを使用することはできません。
			
 
				-ライセンスのコピーは[LICENSE](./LICENSE)ファイルで入手できます。
			
 
				-
			
 
				-適用される法律で要求されない限り、または書面で合意されない限り、本ライセンスに基づいて配布されるソフトウェアは、明示黙示を問わず、いかなる保証も条件もなしに「現状のまま」配布されます。
			
 
				-ライセンスの権利と制限を規定する特定の言語については、ライセンスを参照してください。
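The curl example in the API section above sets `"stream": true`, so the reply arrives as server-sent events rather than a single JSON document. The following is a minimal Python sketch of reading such a stream. It assumes the usual OpenAI-compatible framing (`data: {...}` lines carrying `choices[].delta.content`, ended by `data: [DONE]`); the helper name `iter_stream_text` and the sample lines are illustrative and not taken from GPUStack.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style streaming response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample of what the server might send, one SSE line per item.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello, world
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body; any iterable of strings works.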
			
--- a/docs/deployment-from-source-docker.md
+++ b/docs/deployment-from-source-docker.md
@@ -1,210 +0,0 @@
 
				-# Docker Deployment Guide (Building from Source)
			
 
				-
			
 
				-Higress is bundled in the Docker image (managed by s6-overlay) and does not need to be deployed separately.
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 1. Build the Image
			
 
				-
			
 
				-### Requirements
			
 
				-
			
 
				-- Linux (x86_64 or arm64)
			
 
				-- Docker 24.0+ with BuildKit enabled
			
 
				-- Git
			
 
				-
			
 
				-### 1. Clone the Repository
			
 
				-
			
 
				-```bash
			
 
				-git clone <your-repo-url> /opt/gpustack-src
			
 
				-cd /opt/gpustack-src
			
 
				-```
			
 
				-
			
 
				-### 2. Initialize buildx (First Run Only)
			
 
				-
			
 
				-```bash
			
 
				-docker run --rm --privileged tonistiigi/binfmt:qemu-v9.2.2-52 --install all
			
 
				-docker buildx create \
			
 
				-    --name gpustack \
			
 
				-    --driver docker-container \
			
 
				-    --driver-opt "network=host,default-load=true" \
			
 
				-    --bootstrap
			
 
				-```
			
 
				-
			
 
				-### 3. Build the Image
			
 
				-
			
 
				-```bash
			
 
				-# Build with the project script (recommended)
			
 
				-PACKAGE_TAG=my-build PACKAGE_PUSH=false bash hack/package.sh
			
 
				-```
			
 
				-
			
 
				-The resulting image is tagged `gpustack/gpustack:my-build`.
			
 
				-
			
 
				-Alternatively, call docker buildx directly:
			
 
				-
			
 
				-```bash
			
 
				-docker buildx build \
			
 
				-    --builder gpustack \
			
 
				-    --platform linux/amd64 \
			
 
				-    --tag gpustack/gpustack:my-build \
			
 
				-    --file pack/Dockerfile \
			
 
				-    --ulimit nofile=65536:65536 \
			
 
				-    --shm-size 16G \
			
 
				-    --load \
			
 
				-    .
			
 
				-```
			
 
				-
			
 
				-> The build takes a long time (30 to 60 minutes) because it downloads components such as Higress, Prometheus, and Grafana.
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 2. Deploy the Server
			
 
				-
			
 
				-The Server provides the API, scheduler, database, and Gateway, and acts as the control node of the cluster.
			
 
				-
			
 
				-### 1. Change to the docker-compose Directory
			
 
				-
			
 
				-```bash
			
 
				-cd /opt/gpustack-src/docker-compose
			
 
				-```
			
 
				-
			
 
				-### 2. Create the `.env` File
			
 
				-
			
 
				-```bash
			
 
				-cat > .env <<EOF
			
 
				-POSTGRES_PASSWORD=your_strong_password
			
 
				-EOF
			
 
				-```
			
 
				-
			
 
				-### 3. Start the Server
			
 
				-
			
 
				-```bash
			
 
				-docker compose -f docker-compose.server.yaml up -d --build
			
 
				-```
			
 
				-
			
 
				-### 4. Get the Initial Admin Password
			
 
				-
			
 
				-```bash
			
 
				-docker exec gpustack-server cat /var/lib/gpustack/initial_admin_password
			
 
				-```
			
 
				-
			
 
				-### 5. Get the Worker Registration Token
			
 
				-
			
 
				-Worker nodes need this token to join the cluster:
			
 
				-
			
 
				-```bash
			
 
				-docker exec gpustack-server cat /var/lib/gpustack/token
			
 
				-```
			
 
				-
			
 
				-### 6. Access the UI
			
 
				-
			
 
				-Open `http://<Server-IP>` in a browser and log in as `admin` with the initial password.
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 3. Deploy a Worker
			
 
				-
			
 
				-Workers run the model inference instances and can be deployed on multiple GPU machines.
			
 
				-
			
 
				-> Prerequisite: the Server is running and reachable.
			
 
				-
			
 
				-### 1. Clone the Repository and Build the Image on the Worker Machine
			
 
				-
			
 
				-```bash
			
 
				-git clone <your-repo-url> /opt/gpustack-src
			
 
				-cd /opt/gpustack-src
			
 
				-
			
 
				-PACKAGE_TAG=my-build PACKAGE_PUSH=false bash hack/package.sh
			
 
				-```
			
 
				-
			
 
				-### 2. Start the Worker Container
			
 
				-
			
 
				-```bash
			
 
				-docker run -d \
			
 
				-    --name gpustack-worker \
			
 
				-    --restart unless-stopped \
			
 
				-    --ulimit nofile=65535:65535 \
			
 
				-    -v gpustack-worker-data:/var/lib/gpustack \
			
 
				-    gpustack/gpustack:my-build \
			
 
				-    --server-url http://<Server-IP> \
			
 
				-    --token <Worker-Registration-Token>
			
 
				-```
			
 
				-
			
 
				-### 3. Verify Worker Registration
			
 
				-
			
 
				-Open the Workers page in the Server's web UI and confirm that the new Worker is online.
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 4. Deploy with Monitoring (Prometheus + Grafana)
			
 
				-
			
 
				-```bash
			
 
				-cat > .env <<EOF
			
 
				-POSTGRES_PASSWORD=your_strong_password
			
 
				-GRAFANA_PASSWORD=your_grafana_password
			
 
				-GPUSTACK_GRAFANA_URL=http://<Server-IP>:3000
			
 
				-EOF
			
 
				-
			
 
				-docker compose -f docker-compose.external-observability.yaml up -d --build
			
 
				-```
			
 
				-
			
 
				-| Service | URL | Default credentials |
			
 
				-|------|------|----------|
			
 
				-| GPUStack | `http://<IP>:80` | admin / see `initial_admin_password` |
			
 
				-| Grafana | `http://<IP>:3000` | admin / see `GRAFANA_PASSWORD` in `.env` |
			
 
				-| Prometheus | `http://<IP>:9090` | - |
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 5. Common Operations
			
 
				-
			
 
				-```bash
			
 
				-# Follow the Server logs
			
 
				-docker logs -f gpustack-server
			
 
				-
			
 
				-# Follow the Worker logs
			
 
				-docker logs -f gpustack-worker
			
 
				-
			
 
				-# Restart the Server
			
 
				-docker compose -f docker-compose.server.yaml restart gpustack-server
			
 
				-
			
 
				-# Restart the Worker
			
 
				-docker restart gpustack-worker
			
 
				-
			
 
				-# Rebuild and update the Server
			
 
				-PACKAGE_TAG=new-build bash hack/package.sh
			
 
				-docker compose -f docker-compose.server.yaml up -d --build
			
 
				-
			
 
				-# Rebuild and update the Worker
			
 
				-PACKAGE_TAG=new-build bash hack/package.sh
			
 
				-docker rm -f gpustack-worker
			
 
				-docker run -d --name gpustack-worker ... # same start command as above
			
 
				-
			
 
				-# Stop all services
			
 
				-docker compose -f docker-compose.server.yaml down
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 6. Notes
			
 
				-
			
 
				-1. **The build needs access to GitHub**: components such as Higress and s6-overlay are downloaded from GitHub; configure a proxy if GitHub is unreachable:
			
 
				-   ```bash
			
 
				-   export HTTPS_PROXY=http://your-proxy:port
			
 
				-   ```
			
 
				-
			
 
				-2. **Disk space**: the build needs about 20 GB, including the build cache.
			
 
				-
			
 
				-3. **NVIDIA GPU support**: install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) beforehand, then add `--gpus all` when starting the Worker:
			
 
				-   ```bash
			
 
				-   docker run -d \
			
 
				-       --name gpustack-worker \
			
 
				-       --restart unless-stopped \
			
 
				-       --gpus all \
			
 
				-       --ulimit nofile=65535:65535 \
			
 
				-       -v gpustack-worker-data:/var/lib/gpustack \
			
 
				-       gpustack/gpustack:my-build \
			
 
				-       --server-url http://<Server-IP> \
			
 
				-       --token <Token>
			
 
				-   ```
			
 
				-
			
 
				-4. **Port usage**: make sure port 80 is free on the Server machine and that ports 40000-40063 (inference service ports) are free on the Worker machine.
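The port requirement in the last note can be checked before the containers are started. The following is a minimal Python sketch that tries to bind each port and reports the ones that cannot be bound. The function name `unavailable_ports` is hypothetical, and the check only sees ports on the machine where it runs.

```python
import errno
import socket
from typing import Dict, Iterable


def unavailable_ports(ports: Iterable[int], host: str = "0.0.0.0") -> Dict[int, str]:
    """Map each port that cannot be bound on `host` to a short reason."""
    problems: Dict[int, str] = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError as exc:
                if exc.errno == errno.EADDRINUSE:
                    problems[port] = "in use"
                else:
                    # e.g. permission denied for ports below 1024 without root
                    problems[port] = f"cannot bind: {exc.strerror}"
    return problems


if __name__ == "__main__":
    # Port 80 for the Server, 40000-40063 for the Worker's inference services.
    to_check = [80, *range(40000, 40064)]
    for port, reason in sorted(unavailable_ports(to_check).items()):
        print(f"port {port}: {reason}")
```

Run it as the same user that will start the service; an unprivileged user gets a "cannot bind" result for port 80 even when the port is free.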
			
--- a/docs/deployment-linux-docker.md
+++ b/docs/deployment-linux-docker.md
@@ -1,187 +0,0 @@
 
				-# Linux Docker Deployment Guide
			
 
				-
			
 
				-## Prerequisites
			
 
				-
			
 
				-- Linux (Ubuntu 22.04+ or CentOS 8+)
			
 
				-- Docker 24.0+
			
 
				-- Docker Compose v2.20+
			
 
				-
			
 
				-Install Docker:
			
 
				-```bash
			
 
				-curl -fsSL https://get.docker.com | sh
			
 
				-systemctl enable --now docker
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 1. Basic Deployment (with Bundled PostgreSQL)
			
 
				-
			
 
				-Suited to getting started quickly; all components run on a single machine.
			
 
				-
			
 
				-### 1. Change to the docker-compose Directory
			
 
				-
			
 
				-```bash
			
 
				-cd /path/to/maas-base/docker-compose
			
 
				-```
			
 
				-
			
 
				-### 2. Start the Services
			
 
				-
			
 
				-```bash
			
 
				-docker compose -f docker-compose.server.yaml up -d
			
 
				-```
			
 
				-
			
 
				-### 3. Get the Initial Admin Password
			
 
				-
			
 
				-```bash
			
 
				-docker exec gpustack-server cat /var/lib/gpustack/initial_admin_password
			
 
				-```
			
 
				-
			
 
				-### 4. Access the UI
			
 
				-
			
 
				-Open `http://<Server-IP>` in a browser and log in as `admin` with the password from the previous step.
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 2. Use an External PostgreSQL
			
 
				-
			
 
				-If you already have a PostgreSQL instance, point GPUStack at it with an environment variable.
			
 
				-
			
 
				-### 1. Create the `.env` File
			
 
				-
			
 
				-```bash
			
 
				-cat > .env <<EOF
			
 
				-POSTGRES_PASSWORD=your_strong_password
			
 
				-EOF
			
 
				-```
			
 
				-
			
 
				-### 2. Edit the Database Settings in `docker-compose.server.yaml`
			
 
				-
			
 
				-Replace the `postgres` service with a connection to the external database:
			
 
				-
			
 
				-```yaml
			
 
				-environment:
			
 
				-  GPUSTACK_DATABASE_URL: postgresql://gpustack:your_password@your_db_host:5432/gpustack
			
 
				-```
			
 
				-
			
 
				-Then remove the `postgres` service and the `postgres-data` volume.
			
 
				-
			
 
				-Make sure the required privileges have been granted on the external database:
			
 
				-```sql
			
 
				-GRANT CREATE ON SCHEMA public TO gpustack;
			
 
				-GRANT ALL PRIVILEGES ON DATABASE gpustack TO gpustack;
			
 
				-```
			
 
				-
			
 
				-### 3. Start
			
 
				-
			
 
				-```bash
			
 
				-docker compose -f docker-compose.server.yaml up -d
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 3. Deploy with External Monitoring (Prometheus + Grafana)
			
 
				-
			
 
				-Suited to setups that need a standalone monitoring dashboard.
			
 
				-
			
 
				-### 1. Create the `.env` File
			
 
				-
			
 
				-```bash
			
 
				-cat > .env <<EOF
			
 
				-POSTGRES_PASSWORD=your_strong_password
			
 
				-GRAFANA_PASSWORD=your_grafana_password
			
 
				-GPUSTACK_GRAFANA_URL=http://<Server-IP>:3000
			
 
				-EOF
			
 
				-```
			
 
				-
			
 
				-> `GPUSTACK_GRAFANA_URL` must be an address the browser can reach (not a container-internal address).
			
 
				-
			
 
				-### 2. Start
			
 
				-
			
 
				-```bash
			
 
				-docker compose -f docker-compose.external-observability.yaml up -d
			
 
				-```
			
 
				-
			
 
				-### 3. Access
			
 
				-
			
 
				-| Service | URL | Default credentials |
			
 
				-|------|------|----------|
			
 
				-| GPUStack | `http://<IP>:80` | admin / see `initial_admin_password` |
			
 
				-| Grafana | `http://<IP>:3000` | admin / see `GRAFANA_PASSWORD` in `.env` |
			
 
				-| Prometheus | `http://<IP>:9090` | - |
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 4. Common Operations
			
 
				-
			
 
				-```bash
			
 
				-# Show service status
			
 
				-docker compose -f docker-compose.server.yaml ps
			
 
				-
			
 
				-# Follow the logs
			
 
				-docker logs -f gpustack-server
			
 
				-
			
 
				-# Stop the services
			
 
				-docker compose -f docker-compose.server.yaml down
			
 
				-
			
 
				-# Stop and delete all data (destructive)
			
 
				-docker compose -f docker-compose.server.yaml down -v
			
 
				-
			
 
				-# Update the images
			
 
				-docker compose -f docker-compose.server.yaml pull
			
 
				-docker compose -f docker-compose.server.yaml up -d
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 5. GPU Support
			
 
				-
			
 
				-### NVIDIA GPU
			
 
				-
			
 
				-Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html):
			
 
				-
			
 
				-```bash
			
 
				-curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
			
 
				-curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
			
 
				-  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
			
 
				-  tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
			
 
				-apt-get update && apt-get install -y nvidia-container-toolkit
			
 
				-nvidia-ctk runtime configure --runtime=docker
			
 
				-systemctl restart docker
			
 
				-```
			
 
				-
			
 
				-Add the following to the `gpustack-server` service in `docker-compose.server.yaml`:
			
 
				-
			
 
				-```yaml
			
 
				-deploy:
			
 
				-  resources:
			
 
				-    reservations:
			
 
				-      devices:
			
 
				-        - driver: nvidia
			
 
				-          count: all
			
 
				-          capabilities: [gpu]
			
 
				-```
			
 
				-
			
 
				-### AMD GPU
			
 
				-
			
 
				-Add the following to the `gpustack-server` service:
			
 
				-
			
 
				-```yaml
			
 
				-devices:
			
 
				-  - /dev/kfd:/dev/kfd
			
 
				-  - /dev/dri:/dev/dri
			
 
				-group_add:
			
 
				-  - video
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 6. Notes
			
 
				-
			
 
				-1. **Port conflicts**: make sure ports 80, 5432, 9090, and 3000 are free.
			
 
				-2. **Firewall**: open the required ports:
			
 
				-   ```bash
			
 
				-   ufw allow 80/tcp
			
 
				-   ufw allow 3000/tcp   # Grafana (only if external access is needed)
			
 
				-   ```
			
 
				-3. **Data persistence**: data is stored in Docker volumes, so removing the containers does not lose data, but `down -v` deletes all of it.
			
 
				-4. **Production**: replace every default password in `.env`.
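For the production note, the placeholder passwords in `.env` can be replaced with generated ones instead of hand-typed values. The following is a minimal Python sketch that renders the three variables used in this guide with random secrets. The function names `render_env` and `write_env` and the sample address are hypothetical; the variable names come from the `.env` examples above.

```python
import os
import secrets


def render_env(grafana_url: str) -> str:
    """Return .env content with freshly generated passwords."""
    if not grafana_url.startswith(("http://", "https://")):
        raise ValueError("GPUSTACK_GRAFANA_URL must be an http(s) URL the browser can reach")
    entries = {
        # token_urlsafe(24) gives 32 characters from [A-Za-z0-9_-], safe unquoted in .env
        "POSTGRES_PASSWORD": secrets.token_urlsafe(24),
        "GRAFANA_PASSWORD": secrets.token_urlsafe(24),
        "GPUSTACK_GRAFANA_URL": grafana_url,
    }
    return "".join(f"{key}={value}\n" for key, value in entries.items())


def write_env(path: str, content: str) -> None:
    """Write the file; mode 0600 applies only when the file is newly created."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; substitute the real server IP.
    print(render_env("http://192.0.2.10:3000"), end="")
```

Calling `write_env(".env", render_env(...))` in the docker-compose directory would produce the file that the `docker compose` commands read.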
			
--- a/docs/deployment-linux.md
+++ b/docs/deployment-linux.md
@@ -1,212 +0,0 @@
 
				-# Linux Deployment Guide
			
 
				-
			
 
				-## Requirements
			
 
				-
			
 
				-- OS: Ubuntu 22.04 / 24.04 (recommended) or CentOS 8+
			
 
				-- Python: 3.11
			
 
				-- PostgreSQL: 16+ (external or self-hosted instance)
			
 
				-- uv: Python package manager
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 1. Install Dependencies
			
 
				-
			
 
				-```bash
			
 
				-# System dependencies
			
 
				-apt-get update && apt-get install -y \
			
 
				-    python3.11 python3.11-venv python3.11-dev \
			
 
				-    build-essential libssl-dev libffi-dev \
			
 
				-    libpq-dev git curl
			
 
				-
			
 
				-# Install uv
			
 
				-curl -LsSf https://astral.sh/uv/install.sh | sh
			
 
				-source $HOME/.local/bin/env
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 2. Get the Code
			
 
				-
			
 
				-```bash
			
 
				-git clone <your-repo-url> /opt/gpustack
			
 
				-cd /opt/gpustack
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 3. Create a Virtual Environment and Install Dependencies
			
 
				-
			
 
				-```bash
			
 
				-cd /opt/gpustack
			
 
				-uv venv .venv --python 3.11
			
 
				-source .venv/bin/activate
			
 
				-uv pip install -e .
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 4. Prepare the Database
			
 
				-
			
 
				-Use an existing PostgreSQL instance and make sure the database and user exist with the required privileges:
			
 
				-
			
 
				-```sql
			
 
				-CREATE USER gpustack WITH PASSWORD 'your_password';
			
 
				-CREATE DATABASE gpustack OWNER gpustack;
			
 
				-GRANT ALL ON SCHEMA public TO gpustack;
			
 
				-GRANT ALL PRIVILEGES ON DATABASE gpustack TO gpustack;
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 5. Prepare the Frontend UI
			
 
				-
			
 
				-Place the frontend build output in `/opt/gpustack/gpustack/ui/`:
			
 
				-
			
 
				-```bash
			
 
				-# Build in the frontend project directory
			
 
				-cd /path/to/maas-base-ui
			
 
				-npm install && npm run build
			
 
				-
			
 
				-# Copy the dist directory to the ui directory
			
 
				-cp -r dist /opt/gpustack/gpustack/ui
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 6. Start the Service
			
 
				-
			
 
				-### Development/Test Mode (Gateway Disabled, API Exposed Directly)
			
 
				-
			
 
				-```bash
			
 
				-cd /opt/gpustack
			
 
				-source .venv/bin/activate
			
 
				-
			
 
				-gpustack start \
			
 
				-  --database-url "postgresql://gpustack:your_password@db_host:5432/gpustack" \
			
 
				-  --gateway-mode disabled \
			
 
				-  --api-port 80 \
			
 
				-  --debug
			
 
				-```
			
 
				-
			
 
				-### Production Mode (with Higress Gateway; Docker Recommended)
			
 
				-
			
 
				-See the Docker deployment section below.
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 7. Docker Deployment (Recommended for Production)
			
 
				-
			
 
				-The project provides a complete Docker image that bundles PostgreSQL, Higress, Prometheus, and Grafana.
			
 
				-
			
 
				-### With an External PostgreSQL
			
 
				-
			
 
				-```bash
			
 
				-docker run -d \
			
 
				-  --name gpustack \
			
 
				-  --restart unless-stopped \
			
 
				-  -p 80:80 \
			
 
				-  -v gpustack-data:/var/lib/gpustack \
			
 
				-  -e GPUSTACK_DATABASE_URL="postgresql://gpustack:your_password@db_host:5432/gpustack" \
			
 
				-  gpustack/gpustack:latest
			
 
				-```
			
 
				-
			
 
				-### With the Bundled PostgreSQL
			
 
				-
			
 
				-```bash
			
 
				-docker run -d \
			
 
				-  --name gpustack \
			
 
				-  --restart unless-stopped \
			
 
				-  -p 80:80 \
			
 
				-  -v gpustack-data:/var/lib/gpustack \
			
 
				-  gpustack/gpustack:latest
			
 
				-```
			
 
				-
			
 
				-### Get the Initial Admin Password
			
 
				-
			
 
				-```bash
			
 
				-docker exec gpustack cat /var/lib/gpustack/initial_admin_password
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 8. Systemd Service (Bare-Metal Deployment)
			
 
				-
			
 
				-Create `/etc/systemd/system/gpustack.service`:
			
 
				-
			
 
				-```ini
			
 
				-[Unit]
			
 
				-Description=GPUStack Server
			
 
				-After=network.target postgresql.service
			
 
				-
			
 
				-[Service]
			
 
				-Type=simple
			
 
				-User=root
			
 
				-WorkingDirectory=/opt/gpustack
			
 
				-Environment="PATH=/opt/gpustack/.venv/bin:/usr/local/bin:/usr/bin:/bin"
			
 
				-ExecStart=/opt/gpustack/.venv/bin/gpustack start \
			
 
				-    --database-url postgresql://gpustack:your_password@db_host:5432/gpustack \
			
 
				-    --gateway-mode disabled \
			
 
				-    --api-port 80
			
 
				-Restart=on-failure
			
 
				-RestartSec=5
			
 
				-
			
 
				-[Install]
			
 
				-WantedBy=multi-user.target
			
 
				-```
			
 
				-
			
 
				-```bash
			
 
				-systemctl daemon-reload
			
 
				-systemctl enable gpustack
			
 
				-systemctl start gpustack
			
 
				-systemctl status gpustack
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 9. Common Options
			
 
				-
			
 
				-| Option | Description | Default |
			
 
				-|------|------|--------|
			
 
				-| `--database-url` | PostgreSQL connection URL | Built-in SQLite |
			
 
				-| `--gateway-mode` | Gateway mode: `auto`/`embedded`/`disabled` | `auto` |
			
 
				-| `--api-port` | API server port | `30080` |
			
 
				-| `--port` | Externally exposed Gateway port (embedded mode) | `80` |
			
 
				-| `--debug` | Enable debug logging | `false` |
			
 
				-| `--data-dir` | Data directory | `~/.local/share/gpustack` |
			
 
				-| `--bootstrap-password` | Initial admin password | Randomly generated |
			
 
				-
			
 
				----
			
 
				-
			
 
				-## 10. Verify the Deployment
			
 
				-
			
 
				-```bash
			
 
				-# Check that the API responds
			
 
				-curl http://localhost:80/v2/users/me
			
 
				-
			
 
				-# Open the API docs
			
 
				-xdg-open http://localhost:80/docs
			
 
				-```
			
 
				-
			
 
				----
			
 
				-
			
 
				-## Notes
			
 
				-
			
 
				-1. **Port 80 privileges**: on Linux, listening on a port below 1024 requires root privileges; alternatively, grant the capability with `setcap`:
			
 
				-   ```bash
			
 
				-   setcap 'cap_net_bind_service=+ep' "$(readlink -f /opt/gpustack/.venv/bin/python3.11)"
			
 
				-   ```
			
 
				-
			
 
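				-2. **Gateway mode**: `embedded` mode needs the Higress components, which ship only in the Docker image. For bare-metal deployments, use `--gateway-mode disabled` and put Nginx in front as a reverse proxy to the `api-port`.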
			
 
				-
			
 
				-3. **Nginx reverse proxy example**:
			
 
				-   ```nginx
			
 
				-   server {
			
 
				-       listen 80;
			
 
				-       location / {
			
 
				-           proxy_pass http://127.0.0.1:30080;
			
 
				-           proxy_set_header Host $host;
			
 
				-           proxy_set_header X-Real-IP $remote_addr;
			
 
				-           proxy_read_timeout 300s;
			
 
				-       }
			
 
				-   }
			
 
				-   ```
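The `--database-url` values in this guide embed the password directly in the URL, which breaks as soon as the password contains characters such as `@`, `/`, or `:`. The following is a minimal Python sketch of building the URL with percent-encoding. The helper name `build_database_url` is hypothetical, and it assumes the consumer decodes the URL in the standard way (as SQLAlchemy-style URL parsers do).

```python
from urllib.parse import quote, unquote, urlsplit


def build_database_url(user: str, password: str, host: str,
                       port: int = 5432, database: str = "gpustack") -> str:
    """Return a postgresql:// URL with user, password and database percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


url = build_database_url("gpustack", "p@ss/w:rd", "db_host")
print(url)  # postgresql://gpustack:p%40ss%2Fw%3Ard@db_host:5432/gpustack

# Round trip: a standard parser recovers the original password after decoding.
parts = urlsplit(url)
print(unquote(parts.password), parts.hostname, parts.port)  # p@ss/w:rd db_host 5432
```

When the resulting URL goes into the systemd unit shown earlier, each `%` has to be written as `%%`, because systemd treats a single `%` as the start of a specifier.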