CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

GPUStack is an open-source GPU cluster manager for AI model deployment. It orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, etc.) across GPU clusters, providing multi-cluster management, load balancing, monitoring, and access control.

Tech stack: Python 3.10–3.12, FastAPI, SQLModel, Pydantic, uv (package manager), hatchling (build), Alembic (migrations), pytest, Higress (API gateway).

Code Architecture

gpustack/
├── api/            # REST API layer (auth, middlewares, tenant, OpenAI extensions)
├── client/         # Generated + custom HTTP clients for server/worker communication
├── cloud_providers/ # Cloud provider integrations (DigitalOcean, etc.)
├── cmd/            # CLI subcommands (version, db migration, admin reset, etc.)
├── codegen/        # OpenAPI client code generation
├── config/         # Configuration and registration logic
├── detectors/      # GPU/device detection (fastfetch, runtime, custom)
├── envs/           # Environment variable management
├── exporter/       # Prometheus metrics exporting
├── gateway/        # Higress AI gateway integration (routing, plugins, k8s CRDs)
├── http_proxy/     # Load balancing and proxy strategies
├── k8s/            # Kubernetes manifest templates
├── migrations/     # Alembic database migrations
├── mixins/         # SQLAlchemy mixins (active record, timestamps)
├── policies/       # Scheduling policies (resource fit selectors for various backends)
├── routes/         # HTTP route handlers
├── schemas/        # Database models / SQLModel schemas
├── server/         # Server components (scheduler, controllers, API server)
├── worker/         # Worker components (runtime, serving manager, metric exporter)
├── websocket_proxy/ # WebSocket proxying
├── main.py         # Entry point (`gpustack` CLI command)
└── security.py     # Security utilities

Key components:

Server: API Server (FastAPI) + Scheduler + Controllers. Handles model instance assignment and resource state management.
Worker: GPUStack Runtime + Serving Manager + Metric Exporter. Manages model instance lifecycle on GPU nodes.
AI Gateway: Uses Higress for API routing and load balancing.
Database: Embedded PostgreSQL by default; external PostgreSQL and MySQL are also supported. Alembic manages schema migrations under gpustack/migrations/.
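
To make the Server's model instance assignment role more concrete, here is a minimal sketch of a resource-fit selection step. It is not GPUStack's actual scheduler: the `Worker` class, the `pick_worker` function, the single free-VRAM metric, and the best-fit rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    """Hypothetical view of a GPU node; not a GPUStack schema."""
    name: str
    free_vram_gib: float


def pick_worker(workers: list[Worker], required_vram_gib: float) -> Optional[Worker]:
    """Return the worker with the least free VRAM that still fits the request.

    Best-fit is an assumed policy for this sketch; returns None if nothing fits.
    """
    candidates = [w for w in workers if w.free_vram_gib >= required_vram_gib]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.free_vram_gib)


workers = [Worker("node-a", 80.0), Worker("node-b", 24.0), Worker("node-c", 48.0)]
chosen = pick_worker(workers, required_vram_gib=40.0)
print(chosen.name if chosen else "no worker fits")
```

Best-fit keeps larger nodes free for larger models; the real selectors under gpustack/policies/ may weigh other factors.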

Commands

Prerequisites

Python 3.10–3.12
uv package manager (auto-installed via make install)
A database (PostgreSQL or MySQL) for development
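
The Python version requirement can be checked before installing. The following is a small sketch that assumes the 3.10–3.12 range stated above; the `is_supported` helper is a hypothetical name and is not part of the repository.

```python
import sys

# Supported range taken from the prerequisites: Python 3.10 through 3.12.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls within the supported range."""
    return MIN_VERSION <= version <= MAX_VERSION


current = (sys.version_info.major, sys.version_info.minor)
if not is_supported(current):
    print(f"Python {current[0]}.{current[1]} is outside the 3.10-3.12 range")
```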

Development Commands

Command	Description
`make install`	Install uv, sync dependencies, set up pre-commit hooks
`make deps`	Sync and lock dependencies with uv
`make generate`	Generate code (OpenAPI client, etc.)
`make lint`	Run pre-commit checks (flake8, black, etc.)
`make test`	Run pytest
`make build`	Build wheel package (outputs to `dist/`)
`make build-docs`	Build documentation (Linux/macOS only)
`make serve-docs`	Serve documentation locally (Linux/macOS only)
`make package`	Build container images (Linux/macOS only)
`make ci`	Full CI pipeline: install → deps → lint → test → build

Running Locally

# Start with the gateway disabled (development setup)
uv run gpustack start --database-url postgresql://postgres:mysecretpassword@localhost:5432/postgres --gateway-mode disabled --api-port 80
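
The value passed to --database-url packs several settings into one string. As an illustration only, the sketch below splits the example URL into its parts with Python's standard library; how GPUStack itself parses the URL is not shown here, and the `db_settings` name is hypothetical.

```python
from urllib.parse import urlsplit

# Same example URL as in the start command above.
DATABASE_URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"

parts = urlsplit(DATABASE_URL)
db_settings = {
    "scheme": parts.scheme,          # database type
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,              # parsed as an int
    "database": parts.path.lstrip("/"),
}
```

Adjust the user, password, host, port, and database name to match the local development database.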

Adding Dependencies

uv add <package>          # runtime dependency
uv add --dev <package>    # dev/test dependency

Running a Single Test

uv run pytest tests/path/to/test_file.py -k test_name
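
For reference, this is roughly what `-k test_name` selects. The file below is a hypothetical test module; the `add` function and both tests are placeholders, not code from this repository.

```python
# tests/path/to/test_file.py (placeholder path from the command above)


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


def test_name():
    # Selected by `-k test_name`: pytest matches -k expressions against test names.
    assert add(2, 3) == 5


def test_other():
    # Not selected by `-k test_name`: its name does not contain that substring.
    assert add(-1, 1) == 0
```

Because `-k` matches substrings, it would also select a test called `test_name_extra`. To target exactly one test, pytest also accepts a node ID such as `tests/path/to/test_file.py::test_name`.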

Important Notes

The project uses uv for dependency management (not pip directly). pyproject.toml is the source of truth.
Database migrations live in gpustack/migrations/versions/. Use Alembic for schema changes.
The UI is downloaded from a CDN at install time; it is not committed to the repo.
Windows support exists via hack/windows/*.ps1 scripts, but worker nodes require Linux.
Community inference backends are pulled from the gpustack/community-inference-backends repo during make install.
