Vision Language Models can process both visual (image) and language (text) data simultaneously, making them versatile tools for various applications, such as image captioning, visual question answering, and more. In this guide, you will learn how to deploy and interact with Vision Language Models (VLMs) in GPUStack.
The procedure for deploying and interacting with these models in GPUStack is similar. The main difference is the parameters you need to set when deploying the models. For more information on the parameters you can set, please refer to Backend Parameters .
In this guide, we will cover the deployment of the following models:
Before you begin, ensure that you have the following:
!!! note
An Ubuntu node equipped with one H100 (80GB) GPU is used throughout this guide.
Please follow the Installation Documentation to install GPUStack.
After the server starts, run the following command to get the default admin password:
docker exec gpustack cat /var/lib/gpustack/initial_admin_password
Open your browser and navigate to http://your_host_ip to access the GPUStack UI. Use the default username admin and the password you retrieved above to log in.
Deployments page in the GPUStack UI.Deploy Model button, then select Hugging Face in the dropdown.Qwen/Qwen3-VL-4B-Instruct in the search bar.Save button. The default configurations should work as long as you have enough GPU resources.Deployments page in the GPUStack UI.Deploy Model button, then select Hugging Face in the dropdown.meta-llama/Llama-3.2-11B-Vision-Instruct in the search bar.Advanced section in configurations and scroll down to the Backend Parameters section.Add Parameter button multiple times and add the following parameters:--enforce-eager--max-num-seqs=16--max-model-len=8192Save button.Deployments page in the GPUStack UI.Deploy Model button, then select Hugging Face in the dropdown.mistralai/Pixtral-12B-2409 in the search bar.Advanced section in configurations and scroll down to the Backend Parameters section.Add Parameter button multiple times and add the following parameters:--tokenizer-mode=mistral--limit-mm-per-prompt=image=4Save button.Deployments page in the GPUStack UI.Deploy Model button, then select Hugging Face in the dropdown.microsoft/Phi-3.5-vision-instruct in the search bar.Advanced section in configurations and scroll down to the Backend Parameters section.Add Parameter button and add the following parameter:--trust-remote-codeSave button.Chat page in the GPUStack UI.Upload Image button above the input text area and upload an image.Submit button to generate the output.