Large Language Models (LLMs) are powerful AI models capable of understanding and generating human-like text, making them essential for applications such as chatbots, content generation, code completion, and more.
In this guide, you will learn how to deploy and interact with LLMs in GPUStack.
Before you begin, ensure that you have the following:
Large language models in the catalog are marked with the LLM category. When you select a large language model from the catalog, the default configurations should work as long as you have enough GPU resources and the backend is compatible with your setup (e.g., vLLM backend requires an amd64 Linux worker).
Here, we take the deployment of Qwen3 0.6B as an example.
Follow these steps to deploy the model from Catalog:
Catalog page in the GPUStack UI.LLM.Qwen3 0.6B from the catalog.Save button to deploy the model.After deployment, you can monitor the model deployment's status on the Deployments page and wait for it to start running.
Playground > Chat page in the GPUStack UI.Model dropdown.Parameters on the right based on your needs.Submit button to generate the text.The generated chain of thought and result will be displayed in the UI.
By following these steps, you can leverage LLMs for AI-powered text generation and natural language tasks in GPUStack. Experiment with different prompts and settings to explore the full capabilities of LLMs!