GPUStack performs a compatibility check prior to model deployment. This check provides detailed information about the model’s compatibility with the current GPUStack environment. The following compatibility checks are performed:
Checks whether the selected inference backend is compatible with the current environment, including operating system, GPU, and architecture.
Determines whether the model is supported by the selected inference backend. This includes checking for supported model formats and architectures (e.g., LlamaForCausalLM or Qwen3ForCausalLM). This check relies on the built-in inference backends and their lists of supported models. If a custom backend or a custom backend version is used, this check is skipped.
Evaluates whether the model can be scheduled in the current environment. This includes verifying available resources such as RAM and VRAM, as well as configured scheduling rules.
Scheduling rules (including worker selectors, GPU selectors, and scheduling policies) are used to determine whether a model can be scheduled in the current environment.
The resource check ensures that sufficient system resources are available to deploy the model. GPUStack estimates the required resources and compares them with available resources in the environment. Estimations are performed using the following methods:
$$ \text{VRAM} = \text{WEIGHT_SIZE} \times 1.2 + \text{FRAMEWORK_FOOTPRINT} $$
WEIGHT_SIZE refers to the size of the model weights in bytes. FRAMEWORK_FOOTPRINT is a constant representing the framework’s memory overhead. For example, vLLM may use several gigabytes of VRAM for CUDA graphs.
This formula provides a rough estimate and may not be accurate for all models. Typically, it reflects a lower-bound estimate of the required VRAM.
If the estimate is insufficient, users can perform fine-grained scheduling by manually selecting workers and GPUs, or by adjusting advanced backend parameters. For instance, with vLLM, users can specify --tensor-parallel-size and --pipeline-parallel-size to control GPU allocation for the model.
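The VRAM formula above can be sketched in a few lines of Python. This is an illustrative sketch only, not GPUStack's actual implementation: the function names `estimate_vram_bytes` and `fits_in_vram` are hypothetical, and the 2 GiB framework footprint in the usage example is an assumed value chosen for the arithmetic, not a figure taken from GPUStack or vLLM.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_vram_bytes(weight_size: int, framework_footprint: int) -> int:
    """Apply VRAM = WEIGHT_SIZE * 1.2 + FRAMEWORK_FOOTPRINT (all values in bytes).

    Hypothetical helper: integer arithmetic is used so the result is
    deterministic, and the 1.2 factor is rounded up to the next byte.
    """
    if weight_size < 0 or framework_footprint < 0:
        raise ValueError("sizes must be non-negative")
    scaled_weights = (weight_size * 12 + 9) // 10  # ceil(weight_size * 1.2)
    return scaled_weights + framework_footprint


def fits_in_vram(required: int, available: int) -> bool:
    """Return True when the estimated requirement does not exceed available VRAM."""
    return required <= available


if __name__ == "__main__":
    # Assumed inputs: 16 GiB of weights and a 2 GiB framework footprint.
    required = estimate_vram_bytes(16 * GIB, 2 * GIB)
    print(f"estimated VRAM: {required / GIB:.1f} GiB")
    print("fits on a 24 GiB GPU:", fits_in_vram(required, 24 * GIB))
    print("fits on a 16 GiB GPU:", fits_in_vram(required, 16 * GIB))
```

Because the formula typically yields a lower bound, a result that only just fits should be treated with caution; the real requirement may be higher once the KV cache and activations are taken into account.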