GPUStack allows admins to configure inference backends and backend versions.
This article serves as an operational guide for the Inference Backend page. For supported built-in backends and their capabilities, see Built-in Inference Backends.
For guidelines on configuring custom backends, and for examples that have been verified to work, see Custom Inference Backends.
GPUStack supports three types of inference backends:
| Parameter Name | Description | Required |
|---|---|---|
| Name | Inference backend name | Yes |
| Health Check Path | Health check path used to verify the backend is up and responding. Default: /v1/models (OpenAI-compatible). | No |
| Default Execution Command | Container startup command and arguments. For example (vLLM): `vllm serve {{model_path}} --port {{port}} --served-model-name {{model_name}} --host {{worker_ip}}`. The placeholders `{{model_path}}`, `{{model_name}}`, `{{port}}`, `{{worker_ip}}`, and `{{VAR_NAME}}` (for environment variables) are substituted automatically when the deployment is scheduled to a worker. After substitution, the command is split into arguments using POSIX-style shell splitting, so quote values that contain spaces and avoid shell operators. | No |
| Default Entrypoint | Container entrypoint override. If set, it replaces the image entrypoint for this backend. Arguments are split using POSIX-style shell splitting. | No |
| Default Environment Variables | Environment variables to set for all versions of this backend. Can be referenced in commands using `{{VAR_NAME}}` syntax. Version-specific environment variables take precedence. | No |
| Default Backend Parameters | Pre-populate the Advanced Backend Parameters section during deployment; you can adjust them before launching | No |
| Description | Description | No |
| Version Configs | Configure available versions of this backend | Yes |
| Default Version | Preselected during deployment. If you don't choose a version, its image is used | No |
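The substitute-then-split behavior described for Default Execution Command can be illustrated with a minimal Python sketch. It is not GPUStack's implementation: the `render_command` helper, the placeholder regex, and the sample values are assumptions made for illustration, and Python's `shlex.split` stands in for the POSIX-style splitting the table mentions.

```python
import re
import shlex

# Matches {{name}} placeholders. Allowing whitespace inside the braces is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_command(template: str, values: dict[str, str]) -> list[str]:
    """Substitute {{name}} placeholders, then split the result POSIX-style."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched in this sketch; real behavior may differ.
        return values.get(name, match.group(0))

    rendered = PLACEHOLDER.sub(replace, template)
    return shlex.split(rendered)  # shlex.split uses POSIX mode by default


template = (
    "vllm serve {{model_path}} --port {{port}} "
    "--served-model-name {{model_name}} --host {{worker_ip}}"
)
values = {  # hypothetical values a worker might supply
    "model_path": "/models/qwen",
    "port": "40000",
    "model_name": "qwen",
    "worker_ip": "10.0.0.5",
}
argv = render_command(template, values)
print(argv)

# Quoting in the template keeps a value with spaces as a single argument.
print(render_command('serve "{{model_path}}"', {"model_path": "/my models/x"}))

# Shell operators are not interpreted; they end up as plain arguments.
print(shlex.split("serve --port 8000 && echo done"))
```

Because splitting happens after substitution, an unquoted value containing spaces would become several arguments, which is why the table recommends quoting such values.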
You can define environment variables at two levels:

- Backend level (Default Environment Variables): Applied to all versions of the backend
- Version level (Environment Variables): Specific to a version, overrides backend-level variables

Environment variables can be referenced in commands using `{{VAR_NAME}}` syntax.
Example:

    backend_name: my-backend-custom
    default_env:
      MODEL_CACHE_DIR: /cache
      LOG_LEVEL: info
    version_configs:
      v1:
        image_name: my-image:v1
        custom_framework: cuda
        run_command: "serve {{model_path}} --cache {{MODEL_CACHE_DIR}} --log-level {{LOG_LEVEL}} --port {{port}}"
        env:
          MODEL_CACHE_DIR: /custom-cache  # Overrides backend-level value

In this example:

- `MODEL_CACHE_DIR` is set to `/cache` at the backend level
- `v1` overrides it to `/custom-cache`
- `LOG_LEVEL` remains `info` for all versions
- Both variables are referenced in `run_command` using `{{VAR_NAME}}` syntax

Version Configs parameter description:
| Parameter Name | Description | Required |
|---|---|---|
| Version | Version name shown in the Backend Version dropdown during deployment | Yes |
| Image Name | Container image name for the backend (e.g., `ghcr.io/org/image:tag`) | Yes |
| Framework (`custom_framework`) | Backend framework (internal identifier: `custom_framework`). Deployment and scheduling are filtered by supported frameworks | Yes |
| Environment Variables | Environment variables specific to this version. Overrides backend-level Default Environment Variables. Can be referenced in Execution Command using `{{VAR_NAME}}` syntax. | No |
| Entrypoint | Version-specific container entrypoint override. If omitted, Default Entrypoint is used. Arguments are split using POSIX-style shell splitting. | No |
| Execution Command | Version-specific startup command. If omitted, the Default Execution Command is used. Parsing and splitting rules are identical to Default Execution Command. | No |
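The fallback and override rules in the two tables above can be summarized in a short Python sketch. It only models the documented precedence and is not GPUStack's code: the dataclasses and the `resolve` function are illustrative, and the field names `entrypoint`, `default_entrypoint`, and `default_run_command` are hypothetical (only `backend_name`, `default_env`, `version_configs`, `image_name`, `custom_framework`, `run_command`, and `env` appear in the example configuration). The `v2` entry is likewise invented to show the fallback path.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionConfig:
    image_name: str
    custom_framework: str
    entrypoint: Optional[str] = None  # hypothetical field name
    run_command: Optional[str] = None
    env: dict[str, str] = field(default_factory=dict)


@dataclass
class Backend:
    backend_name: str
    version_configs: dict[str, VersionConfig]
    default_entrypoint: Optional[str] = None   # hypothetical field name
    default_run_command: Optional[str] = None  # hypothetical field name
    default_env: dict[str, str] = field(default_factory=dict)


def resolve(backend: Backend, version: str) -> dict:
    """Return the effective settings for one version of a backend."""
    cfg = backend.version_configs[version]
    return {
        "image": cfg.image_name,
        # Version-level variables override backend-level defaults.
        "env": {**backend.default_env, **cfg.env},
        # Fall back to the backend default when the version omits a value.
        "entrypoint": cfg.entrypoint or backend.default_entrypoint,
        "command": cfg.run_command or backend.default_run_command,
    }


backend = Backend(
    backend_name="my-backend-custom",
    default_run_command="serve {{model_path}} --port {{port}}",
    default_env={"MODEL_CACHE_DIR": "/cache", "LOG_LEVEL": "info"},
    version_configs={
        "v1": VersionConfig(
            image_name="my-image:v1",
            custom_framework="cuda",
            run_command="serve {{model_path}} --cache {{MODEL_CACHE_DIR}} --port {{port}}",
            env={"MODEL_CACHE_DIR": "/custom-cache"},
        ),
        # Hypothetical second version with no overrides of its own.
        "v2": VersionConfig(image_name="my-image:v2", custom_framework="cuda"),
    },
)

v1 = resolve(backend, "v1")
v2 = resolve(backend, "v2")
print(v1["env"])      # version-level MODEL_CACHE_DIR wins, LOG_LEVEL is inherited
print(v2["command"])  # falls back to the backend-level command
```

Keeping the merge in one place makes the precedence easy to check: a version only needs to list the values that differ from the backend defaults.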
There are two ways to add a custom inference backend:
These are essentially custom backends with a "community" source label, allowing you to quickly create custom backends without manual configuration.
- Version: `0.19.0`
- Image Name: `vllm/vllm-openai:v0.19.0`
- Framework: `cuda`
- Override Image Entrypoint: `vllm serve`
- Execution Command: `{{model_path}} --host {{worker_ip}} --port {{port}} --served-model-name {{model_name}}`

!!! note
    The entrypoint of the vLLM Docker image changed in v0.11.1. Therefore, when adding a custom version for vLLM v0.11.1 or later, you must specify the `Override Image Entrypoint` and `Execution Command` fields; otherwise, the model will fail to start. If you use newer versions of `gpustack/runner` images, you don't need to set the `Execution Command` field.

- Version: `0.5.10`
- Image Name: `lmsysorg/sglang:v0.5.10`
- Framework: `cuda`
- Override Image Entrypoint: `sglang serve`
- Execution Command: `--model-path {{model_path}} --host {{worker_ip}} --port {{port}}`

On the Inference Backend page, click anywhere on the backend card (except the action buttons) to open a modal where you can browse all built-in and custom-added versions.
Use this mode to quickly verify or tweak the image and startup command without editing the backend definition.
`image_name` and `run_command`. These override the backend configuration for this deployment only.

Custom inference backends do not support distributed inference across multiple workers.