This guide introduces how to use model routes, covering several common use cases and their configuration methods.
When deploying models, the Enable Model Route checkbox is automatically enabled. It will create a model route for this deployment with the same name. This allows users to access the model directly using the same name after deployment.
When a new version of a model is released, the administrator may want to upgrade the model while keeping the same model name. In this case, the administrator can deploy a new version of the model with the model route disabled, and switch traffic by editing the existing model route.
Enable Model Route option in the model deploy drawer.Routes page.When the request volume for a self-hosted model increases, latency may occur. If there are no resources available for scaling up, introducing Public MaaS is an effective solution. By configuring both the deployment model target and the provider’s model target in the model route and assigning weights, you can use Public MaaS services to help handle the current model’s access load.
Providers page.Routes page and locate the model to edit.Deployments and models from Providers can be selected as targets in the same route.Although assigning a Public MaaS model target to a model route is a convenient approach, it can also incur significant costs. The traffic distribution rules are always in effect, so even when the self-hosted model is not under heavy load, traffic will still be forwarded to Public MaaS according to the configuration. In such cases, using the Model Route Fallback feature can be very effective.
Routes page and locate the model route you want to set a fallback for.Fallback Route Target. Like other route targets, it can be a model from GPUStack Deployments or from Providers.If a running inference service (such as ollama or lm-studio) wants to use GPUStack for proxying, access control, and token usage statistics, you can create a custom-path OpenAI Model Provider for hosting.
Providers page and click the Add Provider button.OpenAI as the type and set the Custom Base URL in the form of http://<ip>:<port>/v1 for your OpenAI-compatible inference server. Set the name, API key, and description as needed./v1/models API.Save button.Add Route in the Operations column for this provider.Save button to apply the model route. Your model is now proxied by GPUStack.Access Setting in the Operations column.