# Running Inference on CPUs

MaaS-Base supports inference on CPUs, offering flexibility when GPU resources are limited or when model sizes exceed allocatable GPU memory. The following CPU inference modes are available:

- **Hybrid CPU+GPU Inference**: Enables partial acceleration by offloading portions of large models to the CPU when VRAM capacity is insufficient.
- **Full CPU Inference**: Runs entirely on CPU when no GPU resources are available.

!!! note

    Available for custom backends only.

    When CPU offloading is enabled, MaaS-Base will allocate CPU memory if GPU resources are insufficient. You must correctly configure the inference backend to use hybrid CPU+GPU or full CPU inference.

    It is strongly recommended to use CPU inference only on CPU workers.

For example, to deploy a model with CPU inference using [Text Embeddings Inference](https://huggingface.co/docs/text-embeddings-inference/index), use the following configuration:

Source: `HuggingFace`

Repo ID: `BAAI/bge-large-en-v1.5`

Backend: `Custom`

Image Name: `ghcr.io/huggingface/text-embeddings-inference:cpu-1.8`

Execution Command: `--model-id BAAI/bge-large-en-v1.5 --huggingface-hub-cache /var/lib/gpustack/cache/huggingface --port {{port}}`

!!! note

    - `TEI (Text Embeddings Inference)` only supports deploying models from `HuggingFace`.
    - `ghcr.io/huggingface/text-embeddings-inference:cpu-1.8` is the CPU inference image for TEI. See: [TEI Supported Hardware](https://huggingface.co/docs/text-embeddings-inference/supported_models#supported-hardware).
    - `--huggingface-hub-cache /var/lib/gpustack/cache/huggingface` sets the location of the HuggingFace Hub cache for TEI to the path where MaaS-Base stores downloaded HuggingFace models. The default path is `/var/lib/gpustack/cache/huggingface`. See: [TEI CLI Arguments](https://huggingface.co/docs/text-embeddings-inference/cli_arguments).
    - `{{port}}` is a placeholder that represents the port automatically assigned by MaaS-Base.
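
To make the role of the `{{port}}` placeholder concrete, the following Python sketch fills it in and splits the Execution Command into arguments. It is only an illustration of the idea: how MaaS-Base performs the substitution internally is not described here, and the helper `render_command` and the port `40000` are hypothetical.

```python
import shlex

# Execution Command from the configuration above, with {{port}} left intact.
EXECUTION_COMMAND = (
    "--model-id BAAI/bge-large-en-v1.5 "
    "--huggingface-hub-cache /var/lib/gpustack/cache/huggingface "
    "--port {{port}}"
)


def render_command(template: str, port: int) -> list[str]:
    """Hypothetical helper: replace {{port}} and split into argv-style tokens."""
    if "{{port}}" not in template:
        raise ValueError("template has no {{port}} placeholder")
    return shlex.split(template.replace("{{port}}", str(port)))


# 40000 is an arbitrary example value, not a MaaS-Base default.
args = render_command(EXECUTION_COMMAND, 40000)
print(args)
```

Checking for the placeholder before substituting catches a command that hard-codes a port, in which case the container would likely not listen on the port MaaS-Base assigned.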

![TEI CPU Inference](../assets/tutorials/inference-on-cpus/tei-cpu-inference.png)

And in Advanced Settings, check `Allow CPU Offloading`:

![Allow CPU Offloading](../assets/tutorials/inference-on-cpus/allow-cpu-offloading.png)

If you need to access non-OpenAI-compatible APIs, you can also check `Enable Generic Proxy`. For more details, see [Enable Generic Proxy](../user-guide/model-deployment-management.md#enable-generic-proxy).
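
Once the model is running, one way to verify the deployment is to send it an embeddings request. The sketch below builds such a request in the OpenAI embeddings format (`POST <base>/embeddings` with `model` and `input` fields). It assumes the deployment is exposed through an OpenAI-compatible endpoint; the base URL, API key, and model name are placeholders to be replaced with the values from your own installation.

```python
import json
import urllib.request


def build_embeddings_request(
    base_url: str, api_key: str, model: str, texts: list[str]
) -> urllib.request.Request:
    """Build an OpenAI-style embeddings request without sending it."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# All four values below are assumptions for illustration only.
req = build_embeddings_request(
    "http://maas-base.example/v1",
    "YOUR_API_KEY",
    "bge-large-en-v1.5",
    ["hello world"],
)
print(req.full_url)

# To actually send the request against a live server:
# with urllib.request.urlopen(req) as resp:
#     embeddings = json.load(resp)["data"]
```

Building the request separately from sending it makes it easy to inspect the URL and payload before pointing the script at a real server.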