The diagram below provides a high-level view of the MASS-Base architecture.
The diagram below details the internal components and their interactions.
The MASS-Base server consists of the following components:
The MASS-Base worker consists of the following components:
The AI Gateway handles incoming API requests from clients and routes each request to an appropriate model instance based on the model the request asks for. MASS-Base uses Higress for API routing and load balancing.
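As a rough illustration of model-based routing, the sketch below maps each model name to its instances and spreads requests across them round-robin. It assumes the request body names the model in a `model` field, as OpenAI-style APIs do. `ModelRouter`, `register`, `route`, and the instance URLs are hypothetical. This is not Higress configuration or its actual load-balancing algorithm.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List


@dataclass
class ModelRouter:
    """Hypothetical gateway-style router: model name -> round-robin over instance URLs."""

    _routes: Dict[str, Iterator[str]] = field(default_factory=dict)

    def register(self, model: str, instances: List[str]) -> None:
        # Replacing the iterator resets the rotation for this model.
        if not instances:
            raise ValueError(f"model {model!r} has no instances")
        self._routes[model] = cycle(instances)

    def route(self, request: dict) -> str:
        # Assumption: the requested model is named in the request's "model" field.
        model = request.get("model")
        if model not in self._routes:
            raise LookupError(f"no instances serve model {model!r}")
        # Round-robin: each call returns the next instance for that model.
        return next(self._routes[model])


router = ModelRouter()
router.register("qwen", ["http://worker-1:8000", "http://worker-2:8000"])
first = router.route({"model": "qwen"})
second = router.route({"model": "qwen"})
```

Round-robin is used here only because it is the simplest balancing policy to show; a production gateway typically also weighs instance health and load.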
The MASS-Base server uses a SQL database as its datastore. By default it uses an embedded PostgreSQL database, but you can configure it to use an external PostgreSQL or MySQL database instead.
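The choice between the embedded default and an external database can be pictured as a small resolution step. The sketch below is hypothetical: `resolve_database_url`, the accepted URL schemes, and the embedded default URL are illustrative assumptions, not MASS-Base's real configuration keys or values.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical URL for the embedded PostgreSQL instance; not a real MASS-Base default.
EMBEDDED_DEFAULT = "postgresql://root@127.0.0.1:5432/massbase"

# External databases the text says are supported (plus the common "postgres" alias).
SUPPORTED_SCHEMES = {"postgresql", "postgres", "mysql"}


def resolve_database_url(configured: Optional[str]) -> str:
    """Return the datastore URL: the embedded default, or a validated external URL."""
    if not configured:
        # Nothing configured: fall back to the embedded PostgreSQL database.
        return EMBEDDED_DEFAULT
    # Strip a driver suffix such as "postgresql+psycopg" before checking the scheme.
    scheme = urlparse(configured).scheme.split("+")[0]
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported database scheme {scheme!r}")
    return configured
```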
Inference servers are the backends that perform inference. MASS-Base supports vLLM, SGLang, Ascend MindIE, and VoxBox as built-in inference servers, and you can also add custom inference backends.
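One way to picture built-in and custom backends side by side is a registry that maps a backend name to a function building its launch command. The sketch below is a hypothetical pattern, not MASS-Base's plugin interface. The vLLM and SGLang commands use those projects' standard `vllm serve` and `sglang.launch_server` entry points. MindIE and VoxBox are omitted because their launch commands are not given here, and the custom backend is an invented placeholder.

```python
from typing import Callable, Dict, List

# Hypothetical: a backend turns (model, port) into a command line for the inference server.
CommandBuilder = Callable[[str, int], List[str]]

_BACKENDS: Dict[str, CommandBuilder] = {}


def register_backend(name: str, builder: CommandBuilder) -> None:
    """Add a backend to the registry; names must be unique."""
    if name in _BACKENDS:
        raise ValueError(f"backend {name!r} already registered")
    _BACKENDS[name] = builder


def build_command(backend: str, model: str, port: int) -> List[str]:
    """Look up a backend by name and build its launch command."""
    try:
        builder = _BACKENDS[backend]
    except KeyError:
        raise LookupError(f"unknown backend {backend!r}") from None
    return builder(model, port)


# Built-in style entries for two of the supported servers.
register_backend("vllm", lambda m, p: ["vllm", "serve", m, "--port", str(p)])
register_backend(
    "sglang",
    lambda m, p: ["python", "-m", "sglang.launch_server", "--model-path", m, "--port", str(p)],
)

# A custom backend; "my-server" and its flags are hypothetical placeholders.
register_backend("my-backend", lambda m, p: ["my-server", "--model", m, "--port", str(p)])
```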
Ray is a distributed computing framework that MASS-Base uses to run distributed vLLM. MASS-Base bootstraps a Ray cluster on demand to run vLLM across multiple workers.
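Bootstrapping a Ray cluster on demand implies first choosing which workers take part and which one acts as the Ray head node. The sketch below shows one hypothetical way to make that choice: a greedy pick of workers with the most free GPUs until the model's GPU requirement is met, with the first pick as head. `plan_ray_cluster` and `RayPlan` are invented names, and this is not MASS-Base's actual scheduling logic. A real scheduler may also need constraints this sketch ignores, such as equal GPU counts per node for tensor parallelism.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RayPlan:
    head: str                    # worker that would start the Ray head node
    workers: List[str]           # workers that would join as Ray worker nodes
    gpus_per_node: Dict[str, int]


def plan_ray_cluster(required_gpus: int, free_gpus: Dict[str, int]) -> RayPlan:
    """Greedily pick workers (most free GPUs first) until required_gpus are covered."""
    if required_gpus <= 0:
        raise ValueError("required_gpus must be positive")
    chosen: Dict[str, int] = {}
    remaining = required_gpus
    # Sort by free GPUs descending, then by name for a deterministic result.
    for name, free in sorted(free_gpus.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining <= 0:
            break
        if free <= 0:
            continue
        take = min(free, remaining)
        chosen[name] = take
        remaining -= take
    if remaining > 0:
        raise RuntimeError(f"not enough free GPUs: short by {remaining}")
    names = list(chosen)
    return RayPlan(head=names[0], workers=names[1:], gpus_per_node=chosen)


plan = plan_ray_cluster(8, {"worker-1": 4, "worker-2": 4, "worker-3": 2})
```

For the example above, `worker-1` would host the head node and `worker-2` would join it, each contributing four GPUs.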