# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

MaaS-Base is an open-source GPU cluster manager for AI model deployment. It orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, etc.) across GPU clusters, providing multi-cluster management, load balancing, monitoring, and access control.

**Tech stack:** Python 3.10–3.12, FastAPI, SQLModel, Pydantic, uv (package manager), hatchling (build), Alembic (migrations), pytest, Higress (API gateway).

## Code Architecture

```
gpustack/
├── api/                # REST API layer (auth, middlewares, tenant, OpenAI extensions)
├── client/             # Generated + custom HTTP clients for server/worker communication
├── cloud_providers/    # Cloud provider integrations (DigitalOcean, etc.)
├── cmd/                # CLI subcommands (version, db migration, admin reset, etc.)
├── codegen/            # OpenAPI client code generation
├── config/             # Configuration and registration logic
├── detectors/          # GPU/device detection (fastfetch, runtime, custom)
├── envs/               # Environment variable management
├── exporter/           # Prometheus metrics exporting
├── gateway/            # Higress AI gateway integration (routing, plugins, k8s CRDs)
├── http_proxy/         # Load balancing and proxy strategies
├── k8s/                # Kubernetes manifest templates
├── migrations/         # Alembic database migrations
├── mixins/             # SQLAlchemy mixins (active record, timestamps)
├── policies/           # Scheduling policies (resource fit selectors for various backends)
├── routes/             # HTTP route handlers
├── schemas/            # Database models / SQLModel schemas
├── server/             # Server components (scheduler, controllers, API server)
├── worker/             # Worker components (runtime, serving manager, metric exporter)
├── websocket_proxy/    # WebSocket proxying
├── main.py             # Entry point (`gpustack` CLI command)
└── security.py         # Security utilities
```

**Key components:**

- **Server:** API Server (FastAPI) + Scheduler + Controllers. Handles model instance assignment and resource state management.
- **Worker:** MaaS-Base Runtime + Serving Manager + Metric Exporter. Manages model instance lifecycle on GPU nodes.
- **AI Gateway:** Uses Higress for API routing and load balancing.
- **Database:** Embedded PostgreSQL by default; external PostgreSQL/MySQL supported. Alembic for migrations under `gpustack/migrations/`.

## Commands

### Prerequisites

- Python 3.10–3.12
- `uv` package manager (auto-installed via `make install`)
- A database (PostgreSQL or MySQL) for development

### Development Commands

| Command | Description |
|---------|-------------|
| `make install` | Install uv, sync dependencies, set up pre-commit hooks |
| `make deps` | Sync and lock dependencies with uv |
| `make generate` | Generate code (OpenAPI client, etc.) |
| `make lint` | Run pre-commit checks (flake8, black, etc.) |
| `make test` | Run pytest |
| `make build` | Build wheel package (outputs to `dist/`) |
| `make build-docs` | Build documentation (Linux/macOS only) |
| `make serve-docs` | Serve documentation locally (Linux/macOS only) |
| `make package` | Build container images (Linux/macOS only) |
| `make ci` | Full CI pipeline: install → deps → lint → test → build |

### Running Locally

```bash
# Start with the gateway disabled for development
uv run gpustack start \
  --database-url postgresql://postgres:mysecretpassword@localhost:5432/postgres \
  --gateway-mode disabled \
  --api-port 80
```

### Adding Dependencies

```bash
uv add <package>        # runtime dependency
uv add --dev <package>  # dev/test dependency
```

### Running a Single Test

```bash
uv run pytest tests/path/to/test_file.py -k test_name
```

## Important Notes

- The project uses `uv` for dependency management (not pip directly). `pyproject.toml` is the source of truth.
- Database migrations live in `gpustack/migrations/versions/`. Use Alembic for schema changes.
- The UI is downloaded at install time from a CDN; it is not committed to the repo.
- Windows support exists via `hack/windows/*.ps1` scripts, but worker nodes require Linux.
- Community inference backends are pulled from the `gpustack/community-inference-backends` repo during `make install`.
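
To illustrate the command under "Running a Single Test", here is a minimal sketch of a test module. The file path `tests/test_example.py`, the helper `parse_gpu_count`, and both test names are hypothetical and not part of this repository; the sketch only shows how `-k` narrows a run to tests whose names match.

```python
# tests/test_example.py (hypothetical file, not part of the repository)


def parse_gpu_count(value: str) -> int:
    """Hypothetical helper: parse a non-negative GPU count from a string."""
    count = int(value.strip())
    if count < 0:
        raise ValueError("GPU count must be non-negative")
    return count


def test_parse_gpu_count_valid():
    # Selected by: uv run pytest tests/test_example.py -k test_parse_gpu_count_valid
    assert parse_gpu_count(" 4 ") == 4


def test_parse_gpu_count_negative():
    # Selected by: uv run pytest tests/test_example.py -k negative
    try:
        parse_gpu_count("-1")
    except ValueError:
        pass  # expected: negative counts are rejected
    else:
        raise AssertionError("expected ValueError for a negative count")
```

pytest's `-k` option takes an expression that is matched as a substring against test names, so `-k negative` would run only the second test in this sketch, while `-k parse_gpu_count` would run both.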