# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

MaaS-Base is an open-source GPU cluster manager for AI model deployment. It orchestrates inference engines (vLLM, SGLang, TensorRT-LLM, etc.) across GPU clusters, providing multi-cluster management, load balancing, monitoring, and access control. The Python package and the CLI command are both named `gpustack`.

**Tech stack:** Python 3.10–3.12, FastAPI, SQLModel, Pydantic, uv (package manager), hatchling (build), Alembic (migrations), pytest, Higress (API gateway).

## Code Architecture

```
gpustack/
├── api/            # REST API layer (auth, middlewares, tenant, OpenAI extensions)
├── client/         # Generated + custom HTTP clients for server/worker communication
├── cloud_providers/ # Cloud provider integrations (DigitalOcean, etc.)
├── cmd/            # CLI subcommands (version, db migration, admin reset, etc.)
├── codegen/        # OpenAPI client code generation
├── config/         # Configuration and registration logic
├── detectors/      # GPU/device detection (fastfetch, runtime, custom)
├── envs/           # Environment variable management
├── exporter/       # Prometheus metrics exporting
├── gateway/        # Higress AI gateway integration (routing, plugins, k8s CRDs)
├── http_proxy/     # Load balancing and proxy strategies
├── k8s/            # Kubernetes manifest templates
├── migrations/     # Alembic database migrations
├── mixins/         # SQLAlchemy mixins (active record, timestamps)
├── policies/       # Scheduling policies (resource fit selectors for various backends)
├── routes/         # HTTP route handlers
├── schemas/        # Database models / SQLModel schemas
├── server/         # Server components (scheduler, controllers, API server)
├── worker/         # Worker components (runtime, serving manager, metric exporter)
├── websocket_proxy/ # WebSocket proxying
├── main.py         # Entry point (`gpustack` CLI command)
└── security.py     # Security utilities
```

**Key components:**
- **Server:** API Server (FastAPI) + Scheduler + Controllers. Handles model instance assignment and resource state management.
- **Worker:** MaaS-Base Runtime + Serving Manager + Metric Exporter. Manages model instance lifecycle on GPU nodes.
- **AI Gateway:** Uses Higress for API routing and load balancing.
- **Database:** Embedded PostgreSQL by default; external PostgreSQL/MySQL supported. Alembic for migrations under `gpustack/migrations/`.
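
To make the scheduler's role concrete, here is a minimal sketch of a best-fit GPU selector. It is illustrative only and is not the code in `gpustack/policies/`; the names `WorkerGPU`, `select_gpu`, and the VRAM-only fit criterion are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerGPU:
    """Hypothetical view of one GPU on a worker node."""
    worker: str
    gpu_index: int
    free_vram_gib: float


def select_gpu(
    candidates: list[WorkerGPU], required_vram_gib: float
) -> Optional[WorkerGPU]:
    """Best-fit selection: the smallest GPU that still fits the request."""
    fitting = [g for g in candidates if g.free_vram_gib >= required_vram_gib]
    if not fitting:
        # Nothing fits; a real scheduler would leave the instance pending.
        return None
    return min(fitting, key=lambda g: g.free_vram_gib)


gpus = [
    WorkerGPU("worker-a", 0, 80.0),
    WorkerGPU("worker-a", 1, 24.0),
    WorkerGPU("worker-b", 0, 40.0),
]
chosen = select_gpu(gpus, required_vram_gib=30.0)
```

Best-fit keeps larger GPUs free for larger models. The real selectors likely weigh more than free VRAM (backend type, multi-GPU splits, and so on), so read the modules under `gpustack/policies/` before changing scheduling behavior.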

## Commands

### Prerequisites

- Python 3.10–3.12
- `uv` package manager (auto-installed via `make install`)
- A database (PostgreSQL or MySQL) for development

### Development Commands

| Command | Description |
|---------|-------------|
| `make install` | Install uv, sync dependencies, set up pre-commit hooks |
| `make deps` | Sync and lock dependencies with uv |
| `make generate` | Generate code (OpenAPI client, etc.) |
| `make lint` | Run pre-commit checks (flake8, black, etc.) |
| `make test` | Run pytest |
| `make build` | Build wheel package (outputs to `dist/`) |
| `make build-docs` | Build documentation (Linux/macOS only) |
| `make serve-docs` | Serve documentation locally (Linux/macOS only) |
| `make package` | Build container images (Linux/macOS only) |
| `make ci` | Full CI pipeline: install → deps → lint → test → build |

### Running Locally

```bash
# Start with the gateway disabled (for development)
uv run gpustack start --database-url postgresql://postgres:mysecretpassword@localhost:5432/postgres --gateway-mode disabled --api-port 80
```
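
The command above assumes a PostgreSQL instance is already reachable with the credentials embedded in the URL. The sketch below is a hypothetical pre-flight helper, not part of the repository: it breaks the URL into its components and flags API ports that many Unix-like systems treat as privileged. The function names are made up for the example.

```python
from urllib.parse import urlsplit


def summarize_database_url(url: str) -> dict:
    """Split a database URL into the parts the server will connect with."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def may_need_elevated_privileges(api_port: int) -> bool:
    """Ports below 1024 are privileged by default on Linux and some other systems."""
    return 0 < api_port < 1024


summary = summarize_database_url(
    "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
)
privileged = may_need_elevated_privileges(80)
```

If binding to port 80 fails with a permission error, passing a higher value to `--api-port` is a reasonable thing to try.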

### Adding Dependencies

```bash
uv add <package>          # runtime dependency
uv add --dev <package>    # dev/test dependency
```

### Running a Single Test

```bash
uv run pytest tests/path/to/test_file.py -k test_name
```
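
Note that pytest's `-k` option matches a name expression by substring, so `-k test_name` also selects any test whose name contains `test_name`. The sketch below shows a hypothetical test file where that matters. The helper `parse_gpu_count` and both tests are invented for illustration.

```python
# Hypothetical contents of tests/path/to/test_file.py


def parse_gpu_count(value: str) -> int:
    """Toy helper under test: parse a GPU count from user input."""
    return int(value.strip())


def test_name():
    # Selected by `-k test_name`.
    assert parse_gpu_count(" 4 ") == 4


def test_name_rejects_text():
    # Also selected by `-k test_name`, because the name contains it.
    try:
        parse_gpu_count("four")
    except ValueError:
        return
    raise AssertionError("expected ValueError for non-numeric input")
```

To run exactly one test, pass its node ID instead: `uv run pytest tests/path/to/test_file.py::test_name`.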

## Important Notes

- The project uses `uv` for dependency management (not pip directly). `pyproject.toml` is the source of truth.
- Database migrations live in `gpustack/migrations/versions/`. Use Alembic for schema changes.
- The UI is downloaded from a CDN at install time and is not committed to the repo.
- Windows support exists via `hack/windows/*.ps1` scripts, but worker nodes require Linux.
- Community inference backends are pulled from the `gpustack/community-inference-backends` repo during `make install`.