Models-as-a-Service (MaaS) provides tier-based governance for large language model serving in Open Data Hub. MaaS enables administrators to define service tiers with different rate limits and quotas, control access through API key authentication, and track resource consumption for cost allocation. Users can discover available models, generate authentication credentials, and integrate models into their applications using OpenAI-compatible APIs.
Deploy and manage Models-as-a-Service
You can deploy Models-as-a-Service (MaaS) to provide tier-based governance for large language model serving. MaaS enables you to define service tiers with different rate limits and quotas, control access through API key authentication, and track resource consumption for cost allocation.
Models-as-a-Service overview
You can deploy Models-as-a-Service (MaaS) in Open Data Hub to provide tier-based governance for large language model (LLM) serving across your organization. This platform helps you manage resource consumption and governance challenges when you serve models to a large user base.
With MaaS, you can perform the following tasks:
-
Define service tiers with different rate limits and quotas.
-
Control access through API key authentication.
-
Track resource consumption for cost allocation.
As an administrator, you can use this tier-based system to expose models through managed API endpoints. This structure allows you to enforce different consumption policies for different user groups and deliver AI models as shared resources with appropriate access levels.
The Models-as-a-Service platform acts as a governance layer between users and model serving infrastructure. When users request access to a model, MaaS authenticates the user and determines their assigned tier based on group membership. Policy enforcement checks rate limits and quotas for that tier. Authorized requests are forwarded to the appropriate model endpoint, and usage metrics are recorded for tracking and cost allocation purposes. This architecture enables centralized policy enforcement without modifying the underlying model serving infrastructure.
MaaS includes the following capabilities:
- Tier-based access control
-
Define multiple service tiers, such as free, premium, or enterprise, with different rate limits, quotas, and model access permissions for different user groups.
- Policy and quota management
-
Enforce rate limiting and quota policies to prevent resource exhaustion.
- API key authentication
-
Control access to models through tier-based resource allocation and API key-based authentication.
- Usage tracking
-
Track consumption metrics for cost allocation and billing.
- Observability
-
Monitor model access patterns, policy enforcement, and resource utilization.
The following table compares use cases for MaaS and standalone model serving to help you choose the right approach.
| MaaS | Standalone model serving |
|---|---|
| Centralized governance across multiple teams or projects is required. | You are deploying models for single-team or single-user use cases. |
| You need rate limiting, quota enforcement, and usage tracking for cost control. | You are prototyping or developing models in a single-user environment where governance overhead is unnecessary. |
| You prefer declarative configuration management via GitOps. | Simplified deployment is preferred over centralized control. |
MaaS administration is divided into initial configuration and ongoing management, with distinct responsibilities for cluster administrators and Open Data Hub administrators.
| Phase | Cluster administrators | Open Data Hub administrators |
|---|---|---|
| Initial configuration | | |
| Ongoing operations | | |
Prerequisites for installing MaaS
Before installing Models-as-a-Service, verify that your cluster has the required platform components, operators, and infrastructure resources.
MaaS dependencies
MaaS requires several platform components to function:
-
KServe provides the underlying model serving infrastructure that hosts the large language models.
-
Kuadrant, which includes Authorino and Limitador, enforces authentication, authorization, and rate limiting policies.
-
Gateway API routes traffic to models and applies policy enforcement at the network level.
Note: Deploying large language models may require additional dependencies based on the model size and serving runtime. For comprehensive model serving infrastructure requirements, see Component requirements.
Note: To access MaaS through the dashboard interface:
Requirements
Before you install MaaS, ensure that your environment meets the following requirements:
Platform and access requirements:
-
You have an OpenShift Container Platform cluster version 4.19.9 or later.
-
You have cluster administrator access to install operators and create cluster-scoped resources.
-
You have installed the OpenShift CLI, oc.
Operator requirements:
-
You have installed Kuadrant Operator v1.3 or later:
  -
  You have created the kuadrant-system namespace and OperatorGroup.
  -
  You have configured the ISTIO_GATEWAY_CONTROLLER_NAME environment variable to "openshift.io/gateway-controller/v1".
  -
  You have created a Kuadrant custom resource in the kuadrant-system namespace with ready status.
Open Data Hub requirements:
-
You have installed the opendatahub-operator.
-
You have initialized the platform with a DSCInitialization resource.
-
You have created a DataScienceCluster resource with the kserve component set to Managed.
-
You have created a GatewayClass resource configured for the OpenShift Gateway Controller (openshift.io/gateway-controller) and a Gateway named maas-default-gateway in the openshift-ingress namespace. For information about creating Gateway API resources, see Enabling the Gateway API.
Prerequisite verification
To verify that your environment meets these requirements:
-
Verify that you have cluster administrator access:
    $ oc auth can-i create clusterrole
Example output
    yes
-
Verify that KServe is installed and managed:
    $ oc get datasciencecluster -o jsonpath='{.items[0].spec.components.kserve.managementState}'
Example output
    Managed
-
Verify that Kuadrant is running:
    $ oc get pods -n kuadrant-system
Example output
    NAME                                        READY   STATUS    RESTARTS   AGE
    authorino-operator-controller-manager-xxx   2/2     Running   0          5m
    kuadrant-operator-controller-manager-xxx    1/1     Running   0          5m
    limitador-limitador-xxx                     1/1     Running   0          5m
-
Verify that the Gateway exists:
    $ oc get gateway maas-default-gateway -n openshift-ingress
Example output
    NAME                   CLASS                          ADDRESS         PROGRAMMED   AGE
    maas-default-gateway   openshift-gateway-controller   192.168.1.100   True         10m
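If you script these checks, the pod listing can be inspected programmatically instead of by eye. The following Python sketch is illustrative only: the function name `pods_not_running` is hypothetical, and it assumes the default column layout that `oc get pods` prints, as in the example output above.

```python
def pods_not_running(pods_output: str) -> list[str]:
    """Return the names of pods whose STATUS column is not 'Running'.

    Expects the plain-text table printed by `oc get pods` (header row first).
    """
    rows = [line.split() for line in pods_output.strip().splitlines()]
    if not rows or "STATUS" not in rows[0] or "NAME" not in rows[0]:
        # For example "No resources found": treat as a failed check.
        raise ValueError("no pod table found in output")
    name_col = rows[0].index("NAME")
    status_col = rows[0].index("STATUS")
    return [row[name_col] for row in rows[1:] if row[status_col] != "Running"]


# Sample text copied from the example output for the kuadrant-system namespace.
sample = """\
NAME                                        READY   STATUS    RESTARTS   AGE
authorino-operator-controller-manager-xxx   2/2     Running   0          5m
kuadrant-operator-controller-manager-xxx    1/1     Running   0          5m
limitador-limitador-xxx                     1/1     Running   0          5m
"""

print(pods_not_running(sample))  # → []
```

An empty list means every listed pod reports Running. The sketch does not inspect the READY column, so a pod that is running but not ready would still pass.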
Enable the Models-as-a-Service component
You can enable Models-as-a-Service (MaaS) in your Open Data Hub deployment by updating the modelsAsService component in the DataScienceCluster custom resource (CR).
Enabling the Models-as-a-Service component deploys the following resources in your cluster:
-
The maas-api service that handles authentication, authorization, and model discovery
-
Default tier configuration with free, premium, and enterprise tiers
-
ServiceAccounts for token issuance and policy enforcement
-
RBAC roles for model access control
-
You have cluster administrator privileges for your OpenShift Container Platform cluster.
-
Log in to the OpenShift Container Platform console as a cluster administrator.
-
Navigate to Operators → Installed Operators, and then click the Open Data Hub Operator.
-
Click the Data Science Cluster tab.
-
On the DataScienceClusters page, click your DataScienceCluster object, commonly named default-dsc.
-
Click the YAML tab.
-
In the spec.components section, locate the kserve component and add or update the modelsAsService configuration:
    # ... existing configuration ...
    spec:
      components:
        kserve:
          managementState: Managed
          modelsAsService:
            managementState: Managed
    # ... remaining configuration ...
where:
- modelsAsService
-
Specifies the MaaS component nested under the kserve component.
- managementState: Managed
-
Specifies that MaaS functionality is enabled.
-
Click Save.
Verify that the MaaS component is running:
-
Verify that the maas-api pod is running:
    $ oc get pods -n opendatahub -l app.kubernetes.io/name=maas-api
Example output
    NAME                       READY   STATUS    RESTARTS   AGE
    maas-api-7f8c9d4b5-xyz12   1/1     Running   0          3m
-
Verify that the tier configuration ConfigMap exists:
    $ oc get configmap tier-to-group-mapping -n opendatahub
Example output
    NAME                    DATA   AGE
    tier-to-group-mapping   1      3m
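As an alternative to editing the YAML in the console, the same change can be expressed as a JSON merge patch. The sketch below only builds the patch document. The function name `maas_patch` is hypothetical, and the field path mirrors the YAML shown in the procedure (modelsAsService nested under kserve).

```python
import json


def maas_patch(state: str = "Managed") -> str:
    """Build a JSON merge patch that sets the modelsAsService management state.

    Only the MaaS field is included; a merge patch leaves the other
    DataScienceCluster fields unchanged.
    """
    patch = {
        "spec": {
            "components": {
                "kserve": {
                    "modelsAsService": {"managementState": state},
                },
            },
        },
    }
    return json.dumps(patch)


print(maas_patch())
# → {"spec": {"components": {"kserve": {"modelsAsService": {"managementState": "Managed"}}}}}
```

One possible way to apply the output, assuming your DataScienceCluster is named default-dsc, is to pass it to `oc patch datasciencecluster default-dsc --type merge -p '<patch>'` and then rerun the verification commands above.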
Enable the Models-as-a-Service dashboard interface
To view the Models-as-a-Service (MaaS) dashboard interface in Open Data Hub, you must enable the Gen AI studio menu item and the MaaS feature in the OdhDashboardConfig custom resource (CR). The MaaS dashboard interface appears in the AI asset endpoints page within Gen AI studio.
The MaaS dashboard interface provides a user interface for:
-
Viewing available models and their status
-
Generating authentication tokens for model access
-
Creating and managing named API keys
-
Viewing your assigned tier and rate limits
-
Accessing API documentation and examples
-
You have cluster administrator privileges for your OpenShift Container Platform cluster.
-
You have Open Data Hub administrator privileges.
-
You have enabled the Llama Stack Operator on the OpenShift cluster by setting its managementState field to Managed in the DataScienceCluster custom resource (CR) of the Open Data Hub Operator. For more information, see Activating the Llama Stack Operator.
-
You have enabled the MaaS component.
-
Log in to the OpenShift Container Platform console as a cluster administrator.
-
In the OpenShift console, navigate to Home → API Explorer.
-
In the search bar, enter OdhDashboardConfig to filter by kind.
-
Click the OdhDashboardConfig custom resource to open the resource details page.
-
From the Project list, select the Open Data Hub applications namespace (typically opendatahub).
-
Click the Instances tab.
-
Click the odh-dashboard-config instance to open the details page.
-
Click the YAML tab.
-
In the spec.dashboardConfig section, add or update the modelAsService and genAiStudio configuration:
    # ... existing configuration ...
    spec:
      dashboardConfig:
        genAiStudio: true
        modelAsService: true
    # ... remaining configuration ...
where:
- genAiStudio: true
-
Specifies that Open Data Hub displays the Gen AI studio menu item in the dashboard.
- modelAsService: true
-
Specifies that Open Data Hub displays the MaaS dashboard interface in the Gen AI studio menu item.
-
Click Save.
The configuration is automatically applied to the dashboard pods.
Verify that the MaaS dashboard interface is enabled and accessible:
-
Verify that both features are enabled in the OdhDashboardConfig:
    $ oc get odhdashboardconfig odh-dashboard-config -n opendatahub -o jsonpath='{.spec.dashboardConfig.genAiStudio}{"\n"}{.spec.dashboardConfig.modelAsService}'
Example output
    true
    true
-
Verify that the dashboard pods have reloaded with the new configuration:
    $ oc get pods -n opendatahub | grep dashboard
Check that the pods are running and have been recently restarted.
-
Log in to the Open Data Hub dashboard.
-
In the left navigation menu, click Gen AI studio → AI asset endpoints.
-
Verify that the Models as a service tab is displayed.
The MaaS tab provides access to model discovery, token generation, and API key management.
Models-as-a-Service tiers
The Models-as-a-Service (MaaS) platform uses a tier-based model to manage access control and rate limiting for AI model serving. Tiers allow platform administrators to define different levels of service and control model access based on user group membership.
Tier-based access control
When multiple teams share large language models, tiers enable you to:
-
Prevent resource exhaustion from unlimited usage
-
Provide different service levels for different user groups
-
Track and allocate costs based on team consumption
-
Control which teams can access expensive or sensitive models
MaaS assigns users to tiers based on their OpenShift group membership. When a user belongs to multiple groups, the system assigns the tier with the highest level number.
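The "highest level wins" rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the maas-api implementation. The `Tier` class and `resolve_tier` function are hypothetical names, and the group name `free-users` is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    level: int
    groups: frozenset


def resolve_tier(user_groups: Iterable[str], tiers: Iterable[Tier]) -> Optional[Tier]:
    """Return the matching tier with the highest level, or None if no tier matches."""
    membership = set(user_groups)
    matches = [tier for tier in tiers if tier.groups & membership]
    return max(matches, key=lambda tier: tier.level, default=None)


TIERS = [
    Tier("free", 0, frozenset({"free-users"})),  # group name assumed for illustration
    Tier("premium", 1, frozenset({"premium-users"})),
    Tier("enterprise", 2, frozenset({"enterprise-users"})),
]

# A user in both the premium and enterprise groups resolves to the enterprise tier.
print(resolve_tier(["premium-users", "enterprise-users"], TIERS).name)  # → enterprise
```

Returning None for a user with no matching group is a choice made for this sketch; how the platform handles that case is not covered here.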
Default tier configurations
The MaaS platform includes three default configurable tiers. Each tier has a dedicated namespace following the pattern {instance-name}-tier-{tier-name}.
| Tier | Description | Level | Default group | Namespace |
|---|---|---|---|---|
| Free | Entry-level tier for basic users | 0 (Lowest) | | |
| Premium | Intermediate tier for higher throughput | 1 | | |
| Enterprise | Top tier for critical workloads | 2 (Highest) | | |
All tiers are fully configurable. You can modify the default tiers or create additional custom tiers to meet your organization’s needs.
Note: Tier name values are case-sensitive and must match exactly with rate limit policy predicates.
Tier configuration
MaaS tiers are defined in the tier-to-group-mapping ConfigMap in the opendatahub namespace. A tier configuration includes the following properties:
- Name
-
A unique identifier for the tier, such as free, premium, or enterprise.
- Display name
-
A human-readable label shown in the dashboard.
- Level
-
A numeric hierarchy that determines tier precedence. Higher numbers indicate higher priority.
- Groups
-
A list of OpenShift groups whose members are assigned to this tier.
Access control
Administrators control which tiers can access specific models by configuring RBAC permissions for the tier’s namespace.
When a user belongs to multiple groups across different tiers, MaaS assigns them to the tier with the highest level number. Users can only access models made available to their assigned tier and cannot access models in lower-level tiers, even if they belong to groups associated with those lower tiers.
For example, if a user belongs to both the premium-users group (tier level 1) and the enterprise-users group (tier level 2), they are assigned to the enterprise tier and can only access models configured for the enterprise tier. They cannot access models restricted to the premium or free tiers.
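The access rule in this example can be expressed as a small check. The following Python sketch is illustrative only: `can_access` is a hypothetical function, and using None to stand for a model published to all tiers is a modeling choice of the sketch, not a platform convention.

```python
from typing import Iterable, Optional


def can_access(assigned_tier: str, model_tiers: Optional[Iterable[str]]) -> bool:
    """Check whether a user's single assigned tier may call a model.

    model_tiers is None when the model is available to all tiers; otherwise
    it lists the tier names allowed to use the model. The comparison is
    exact and case-sensitive.
    """
    if model_tiers is None:
        return True
    return assigned_tier in set(model_tiers)


# A user in both premium-users and enterprise-users is assigned the enterprise tier.
print(can_access("enterprise", ["enterprise"]))       # → True
print(can_access("enterprise", ["premium", "free"]))  # → False
print(can_access("enterprise", None))                 # → True
```

The second call shows the point of the example: membership in the premium-users group does not help, because only the assigned tier is evaluated.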
Rate limiting and quotas
Configure rate limiting and quota policies when you create or edit a tier through the dashboard. These policies include:
- Request rate limits
-
Maximum number of API requests allowed per time period, such as 1000 requests per minute.
- Token consumption limits
-
Maximum number of tokens that can be consumed per time period, such as 100,000 tokens per minute.
- Quota policies
-
Optional daily or monthly limits on total usage.
Kuadrant’s Limitador component enforces these limits based on the user’s assigned tier.
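To make the limit semantics concrete, the following Python sketch counts usage per key in fixed time windows. It is an illustration under stated assumptions, not a description of how Limitador works internally: the `FixedWindowLimiter` class is hypothetical, and the fixed-window algorithm is chosen only because it is the simplest to read.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` units per key in each window of `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the example can control time
        self.counters = {}  # key -> (window index, units used in that window)

    def allow(self, key, cost=1):
        """Record `cost` units for `key` and return True if the limit permits it."""
        current = int(self.clock() // self.window)
        index, used = self.counters.get(key, (current, 0))
        if index != current:
            used = 0  # a new window has started, so the counter resets
        if used + cost > self.limit:
            self.counters[key] = (current, used)
            return False
        self.counters[key] = (current, used + cost)
        return True


# Example: 3 requests per 60 seconds, driven by a fake clock.
now = [0.0]
limiter = FixedWindowLimiter(limit=3, window_seconds=60, clock=lambda: now[0])
print([limiter.allow("user-a") for _ in range(4)])  # → [True, True, True, False]
now[0] = 61.0
print(limiter.allow("user-a"))  # → True
```

The same class can model a token consumption limit by passing the number of tokens used as `cost`, for example `FixedWindowLimiter(limit=100000, window_seconds=60)`.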
Manage Models-as-a-Service tiers
You can create and manage Models-as-a-Service (MaaS) tiers to control access levels, rate limits, and quotas for different user groups.
-
You have cluster administrator privileges for your OpenShift Container Platform cluster.
-
You have enabled the Models-as-a-Service component.
-
You have enabled the Models-as-a-Service dashboard interface.
-
You have access to the Open Data Hub dashboard with administrator privileges.
View tier configurations
Review all configured tiers and their settings to understand the current tier hierarchy and assignments.
-
You have logged in to Open Data Hub.
-
You have administrator access to the Open Data Hub dashboard.
-
In the left navigation menu, click Settings.
-
Click Tiers.
The tiers list displays all configured tiers with the following information:
  -
  Tier name
  -
  Assigned groups
  -
  Priority level
  -
  Rate limit status
-
To view detailed information about a specific tier, click the tier name.
The tier details page shows:
  -
  Complete rate limit configuration (token and request limits)
  -
  All assigned groups
  -
  Description and metadata
  -
  Resource name
-
Verify that the tiers list displays all configured tiers in your system.
-
Click a tier name and verify that you can view its complete configuration including groups, rate limits, and level.
Create a tier
Create a new tier through the Open Data Hub dashboard to define service levels, resource limits, and group assignments.
-
You have logged in to Open Data Hub.
-
You have administrator access to the Open Data Hub dashboard.
-
In the left navigation menu, click Settings.
-
Click Tiers to view existing tiers.
-
Click Create tier.
-
In the Create tier form, configure the tier:
  -
  Name: Enter a descriptive name for the tier that will be displayed to users, such as Production Tier or Data Science Team.
  -
  Optional: Click Edit resource name to set a custom internal identifier for the tier. If not specified, the resource name is generated automatically from the display name. The resource name must use lowercase letters, numbers, and hyphens only.
  -
  Optional: Description: Provide a brief description of the tier’s purpose and intended users.
  -
  Level: Set the tier priority level using a numeric value. Higher numbers indicate higher priority. When a user belongs to multiple groups, they are assigned the tier with the highest level value. For example, use 0 for the lowest tier, 1 for medium, and 2 for the highest.
  -
  Groups: Select or create the OpenShift groups that should be assigned to this tier. Users who are members of these groups will be automatically assigned to this tier.
-
Configure rate limits:
  -
  To limit token consumption, select Enforce token rate limit and click Add token rate limit. Enter the number of tokens allowed, enter the time period, and select the time unit. For example, set 100000 tokens per 1 minute for production workloads.
  -
  To limit API requests, select Enforce request rate limit and click Add request rate limit. Enter the number of requests allowed, enter the time period, and select the time unit. For example, set 1000 requests per 1 minute for production workloads.
  Tip: You can configure multiple rate limits with different time windows by clicking Add token rate limit or Add request rate limit again.
-
Click Create tier.
-
Verify that the new tier appears in the tiers list with the correct configuration.
-
Click the tier name to view its details and confirm the groups, rate limits, and level are set correctly.
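If you plan to set custom resource names, you can check candidates against the stated character rule before entering them in the form. The following Python sketch is an assumption-laden illustration: both function names are hypothetical, and `suggest_resource_name` is only one plausible way to derive a name from a display name, not the dashboard's actual algorithm.

```python
import re

# Only lowercase letters, numbers, and hyphens, as required by the form.
RESOURCE_NAME = re.compile(r"[a-z0-9-]+")


def is_valid_resource_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, numbers, and hyphens."""
    return RESOURCE_NAME.fullmatch(name) is not None


def suggest_resource_name(display_name: str) -> str:
    """Hypothetical derivation: lowercase, then collapse other characters to hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower())
    return slug.strip("-")


print(suggest_resource_name("Data Science Team"))  # → data-science-team
print(is_valid_resource_name("production-tier"))   # → True
print(is_valid_resource_name("Production_Tier"))   # → False
```

The check covers only the character set named in the form. The platform may apply further naming rules, such as length limits, that this sketch does not model.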
Edit a tier
Modify tier properties, rate limits, and group assignments through the dashboard.
-
You have logged in to Open Data Hub.
-
You have administrator access to the Open Data Hub dashboard.
-
In the left navigation menu, click Settings.
-
Click Tiers.
-
Locate the tier you want to modify and click its name.
-
Click Edit tier.
-
Modify the tier properties:
  -
  Update the name or description if needed.
  -
  Adjust the tier level.
  -
  Add or remove groups.
  -
  Modify rate limits by editing existing limits or adding new ones.
  Important: You cannot change the resource name after the tier is created. If you need a different resource name, delete the tier and create a new one.
-
Click Update tier.
Warning: Changing rate limits or group assignments takes effect immediately. Users may experience access changes or throttling based on the new configuration.
-
Verify that the tier shows the updated configuration in the tiers list.
-
If you changed group assignments, confirm that users in the new groups can access models available to this tier.
Delete a tier
Delete a tier through the dashboard when it is no longer needed.
-
You have logged in to Open Data Hub.
-
You have administrator access to the Open Data Hub dashboard.
-
In the left navigation menu, click Settings.
-
Click Tiers.
-
Locate the tier you want to delete and click its name.
-
Click Actions → Delete tier.
-
In the confirmation dialog, review the impact of deletion:
  -
  Users assigned to this tier will lose their current tier assignment.
  -
  Active API keys and tokens scoped to this tier may stop working.
-
Type the tier name to confirm deletion.
-
Click Delete.
Do not delete all tiers from your system. If you delete the last remaining tier, the dashboard will not display the "Add tier" button, preventing you from creating new tiers through the UI. Ensure at least one tier remains in the system at all times.
Important: Deleting a tier immediately affects all users assigned to it. Ensure that affected users are reassigned to another tier before deletion to avoid service disruption.
-
Verify that the tier no longer appears in the tiers list.
-
Confirm that users who were in the deleted tier’s groups are now assigned to a different tier based on their remaining group memberships.
Publish models with Models-as-a-Service
You can deploy generative AI models and publish them to Models-as-a-Service (MaaS) to enable tier-based access control and centralized authentication.
-
You have logged in to Open Data Hub.
-
You have administrator access to a project in Open Data Hub.
-
You have enabled the Models-as-a-Service component and configured at least one service tier.
-
You have access to a storage connection containing your model files, such as S3-compatible object storage.
-
In the left navigation menu, click Projects.
-
Click the name of the project where you want to deploy the model.
-
Click the Deployments tab.
-
Click Deploy model to open the wizard.
-
In the Model details section:
  -
  Specify your storage connection and model path.
  -
  Select Generative AI model as the model type.
  -
  Click Next.
-
In the Model deployment section:
  -
  Enter a unique model deployment name using lowercase letters, numbers, and hyphens.
  -
  Select Distributed inference with llm-d as the serving runtime.
  Important: Currently, only the Distributed inference with llm-d runtime supports MaaS integration. If you select a different runtime, the Publish as MaaS endpoint option will not be available in Advanced settings.
  -
  Select an appropriate hardware profile for your model.
  -
  Click Next.
-
In the Advanced settings section, configure MaaS access:
  -
  Select Publish as MaaS endpoint to make the model available through Models-as-a-Service.
  -
  Configure tier access:
    -
    All tiers: Make the model available to all service tiers.
    -
    Specific tiers: Limit access to selected tiers. If you choose this option, select the specific tiers from the Tier names field.
  -
  Optional: Select Add custom runtime environment variables to customize model behavior.
-
Click Deploy.
-
Verify that the model appears on the Deployments tab with a checkmark in the Status column.
-
In the left navigation menu, click Gen AI studio → AI asset endpoints.
-
Click the Models as a service tab.
-
Verify that your model is listed with a Ready status.
-
Verify that tier access control was configured correctly:
    $ oc get rolebindings -n <your-project-namespace> | grep <your_model_name>
You should see RoleBindings for each tier that has access to the model.
-
Test model access:
  -
  Generate an authentication token as described in Access models through Models-as-a-Service.
  -
  Make a test API call to the model endpoint to verify that it responds.
  -
  If you configured tier-based access restrictions, confirm that only authorized tiers can access the model.
Update tier access for deployed models
After deploying a model to Models-as-a-Service (MaaS), you can modify which service tiers have access without redeploying the model.
-
You have logged in to Open Data Hub.
-
You have administrator access to a project in Open Data Hub.
-
You have deployed a model and published it to Models-as-a-Service.
-
Navigate to Projects and click your project name.
-
Click the Deployments tab.
-
Click the model deployment name.
-
Click Edit from the actions menu.
-
In the Advanced settings section, modify the Tier access settings:
  -
  Change between All tiers and Specific tiers.
  -
  Update the tier selections if using specific tiers.
-
Click Update.
Important: Changing tier access takes effect immediately. Users who lose access will receive 403 Forbidden errors on their next API request.
-
Verify that the tier access changes were applied:
    $ oc get rolebindings -n <your-project-namespace> | grep <your_model_name>
You should see RoleBindings for each tier that has access.
-
Test functional access:
  -
  Verify that users in authorized tiers can access the model through the MaaS interface.
  -
  If you restricted access to specific tiers, verify that users in unauthorized tiers receive 403 errors.
-
For troubleshooting tier access issues, see Models-as-a-Service administration troubleshooting.
Models-as-a-Service administration troubleshooting
Use this reference to diagnose and resolve common administrative issues with Models-as-a-Service (MaaS) deployment, configuration, and management.
Component enablement issues
If the maas-api pod does not start or shows errors after enabling the MaaS component:
-
Check the pod logs for error messages:
    $ oc logs -n opendatahub -l app.kubernetes.io/name=maas-api
-
Verify that all prerequisites are met, especially:
  -
  Kuadrant is running in the kuadrant-system namespace
  -
  The maas-default-gateway Gateway exists in the openshift-ingress namespace
  -
  KServe component is set to Managed in the DataScienceCluster
-
Check for events related to the MaaS deployment:
    $ oc get events -n opendatahub --sort-by='.lastTimestamp' | grep maas
-
Verify that the required RBAC resources were created:
    $ oc get clusterrole | grep maas
    $ oc get clusterrolebinding | grep maas
Dashboard tab visibility issues
If the Models-as-a-Service tab does not appear in the dashboard:
-
Verify that the MaaS API component is running:
    $ oc get pods -n opendatahub -l app.kubernetes.io/name=maas-api
-
Check that the OdhDashboardConfig was updated correctly:
    $ oc get odhdashboardconfig odh-dashboard-config -n opendatahub -o yaml | grep modelAsService
-
Clear your browser cache and hard refresh the dashboard (Ctrl+Shift+R or Cmd+Shift+R).
-
Check the dashboard pod logs for errors:
    $ oc logs -n opendatahub $(oc get pods -n opendatahub -o name | grep dashboard | head -1) --tail=50
-
Verify that you have the required permissions to view the tab. Some features may be restricted based on user roles.
Model visibility issues on Models-as-a-Service page
-
Verify that you selected Publish as MaaS endpoint in the Advanced settings during deployment.
-
Check that the MaaS API is running:
    $ oc get pods -n opendatahub -l app.kubernetes.io/name=maas-api
-
Verify that the model is in a Ready state:
    $ oc get llminferenceservice -n <your-project-namespace>
User access errors: 403 Forbidden
-
Verify that the user’s tier is included in the model’s tier access configuration:
  -
  Check the model’s deployment settings in the dashboard.
  -
  Ensure the user’s assigned tier matches one of the selected tiers.
-
Verify that RoleBindings were created for the selected tiers:
    $ oc get rolebindings -n <your-project-namespace> | grep <your-model-name>
-
Check the user’s tier assignment:
    $ oc exec -n opendatahub deployment/maas-api -- curl -s http://localhost:8080/v1/tiers/lookup \
        -H "Content-Type: application/json" \
        -d '{"groups": ["<user-group-name>"]}'
Tier-based access control issues
If users receive unexpected access denials:
-
Verify that the user’s tier is correctly resolved:
    $ oc exec -n opendatahub deployment/maas-api -- curl -s http://localhost:8080/v1/tiers/lookup \
        -H "Content-Type: application/json" \
        -d '{"groups": ["<user-group-name>"]}'
-
Check that the RoleBinding exists for that tier:
    $ oc get rolebindings -n llm | grep <tier-name>
-
Verify the RoleBinding subject matches the tier namespace exactly:
    $ oc get rolebinding <rolebinding-name> -n llm -o yaml | grep "system:serviceaccounts"
-
Ensure the model is ready and the LLMInferenceService exists:
    $ oc get llminferenceservice -n llm
-
Check the Gateway logs for authorization errors:
    $ oc logs -n kuadrant-system -l app=authorino --tail=50
If all tiers are denied access:
-
Verify that the llminferenceservice-access Role includes the post verb (required for POST requests).
-
Check that the Role’s resourceNames field includes your model name, or omit it to allow all models.
Tier management issues
If users cannot access models after tier creation:
-
Verify that the tier name in the ConfigMap exactly matches the tier name created in the dashboard (case-sensitive).
-
Verify that the user’s OpenShift groups are listed in the tier’s groups array in the ConfigMap.
-
Check that the MaaS API deployment restarted successfully after ConfigMap changes.
-
Verify that the tier’s level is higher than any other tiers the user’s groups might belong to.
-
Check the MaaS API logs for tier resolution errors:
    $ oc logs -n opendatahub -l app.kubernetes.io/name=maas-api --tail=50 | grep tier
Use Models-as-a-Service
Use Models-as-a-Service (MaaS) to access large language models with built-in authentication, rate limiting, and governance. You can discover available models, generate authentication credentials, and integrate models into your applications using OpenAI-compatible APIs.
Models-as-a-Service dashboard interface
The Models-as-a-Service dashboard interface displays available models and provides tools for generating authentication tokens to access those models.
To access the Models-as-a-Service interface, log in to the Open Data Hub dashboard, click Gen AI studio in the left navigation menu, click AI asset endpoints, and click the Models as a service tab.
The Models as a service page displays a table with all model deployments published to MaaS. Each row shows:
- Name
-
The model deployment name. Models published to MaaS display a MaaS badge next to the name.
- Status
-
The current operational state of the model (Ready, Not ready, or Unknown).
- Inference endpoint
-
A View link that opens a dialog showing the MaaS route URL and token generation options.
- Actions
-
Options such as Add to playground for testing the model interactively.
To view your current service tier assignment and limits, click the Tier information link near the top of the Models as a service page. Review your tier details in the popup dialog: tier name, groups you belong to, priority level, and rate limits. Understanding your tier limits helps you plan your usage and avoid rate limit errors.
Access models through Models-as-a-Service
As a data scientist, developer, or application builder, you can use Models-as-a-Service (MaaS) to access large language models with built-in authentication, rate limiting, and quota management.
Your tier and access level
Before you begin working with models, it’s helpful to understand how MaaS determines your access level.
-
Tier assignment: You are automatically assigned to a service tier based on your group membership in OpenShift Container Platform. If you belong to multiple groups, you are assigned to the tier with the highest level number.
-
Rate limits: Your tier determines how many requests you can make per minute and how many tokens you can consume.
-
Model access: Your tier determines which models you can access. You can only access models available to your assigned tier and cannot access models restricted to lower-priority tiers.
-
Authentication: You must use a MaaS-issued token to access models.
-
You have access to the Open Data Hub dashboard.
-
Your administrator has enabled Models-as-a-Service and assigned you to a tier.
View available models
Discover which models are available to your tier and check their status.
-
Log in to the Open Data Hub dashboard.
-
In the left navigation menu, click Gen AI studio.
-
Click AI asset endpoints.
-
Click the Models as a service tab.
The Models as a service page displays a table of all models published to MaaS. Each model shows:
-
Model deployment name (used in API requests)
-
Status (Ready, Not ready, or Unknown)
-
Inference endpoint with a View link
-
Actions menu with options like Add to playground
Models published to MaaS display a MaaS badge next to the name.
-
-
Verify that at least one model appears in the list with a Ready status and MaaS badge.
-
If no models appear, contact your administrator to verify that models are deployed and your tier has access.
View your tier and limits
Check your current tier assignment and understand your resource limits.
-
On the Models as a service page, click the Tier information link near the top of the page.
-
Review your tier details in the popup dialog:
-
Tier name: Free, Premium, or Enterprise
-
Groups you belong to
-
Priority level
-
Rate limits: requests per time period and token quotas
-
Tip: If you frequently hit rate limits or need higher quotas, contact your administrator to request access to a higher tier.
-
Verify that the Tier information dialog displays your tier name and associated limits.
Generate an authentication token
Generate a token to authenticate your API requests to models.
-
On the Models as a service page, locate the model you want to access.
-
Click View in the Inference endpoint column.
-
In the MaaS route dialog, copy the route URL and store it securely.
You will need this URL to make API calls to the model.
-
Click Generate API Key.
-
Copy the generated token immediately and store it securely.
Important: The token is displayed only once. If you lose the token, you must generate a new one.
Test the token by making an API call to list available models:
$ export MAAS_TOKEN="<your_generated_token>"
$ export CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
$ curl -X GET "https://maas.${CLUSTER_DOMAIN}/v1/models" \
-H "Authorization: Bearer ${MAAS_TOKEN}"
where:
<your_generated_token>: Specifies the authentication token you copied in the previous step.
If the token is valid, the command returns a JSON list of available models.
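The same check can be made from application code. The following is a minimal sketch that uses only the Python standard library. It assumes the response follows the OpenAI-compatible list shape, with a `data` array of objects that each carry an `id`. The function names `build_models_request` and `list_model_ids` are hypothetical.

```python
import json
import urllib.request


def build_models_request(maas_url, token):
    """Build a GET request for the /v1/models endpoint with a Bearer token."""
    return urllib.request.Request(
        f"{maas_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_model_ids(maas_url, token, timeout=30):
    """Send the request and return the model IDs from the JSON response."""
    request = build_models_request(maas_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-compatible body: {"object": "list", "data": [...]}.
    return [model["id"] for model in body.get("data", [])]
```

For example, `list_model_ids(f"https://maas.{cluster_domain}", token)` sends the same request as the `curl` command, where `cluster_domain` and `token` hold the values exported in the shell example.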
Make API calls to models
Use your token to make requests to models through the MaaS API.
-
If you did not copy the MaaS route URL during token generation, get it using one of the following methods:
Copy the MaaS route URL from the MaaS route dialog in the dashboard.
Alternatively, construct the URL from your cluster domain:
$ export CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
$ export MAAS_URL="https://maas.${CLUSTER_DOMAIN}"
$ echo $MAAS_URL
-
List available models using the /v1/models endpoint:
$ curl -X GET "${MAAS_URL}/v1/models" \
-H "Authorization: Bearer ${MAAS_TOKEN}"
Example response in OpenAI-compatible format:
{
  "object": "list",
  "data": [
    {
      "id": "facebook-opt-125m",
      "object": "model",
      "created": 1234567890,
      "owned_by": "llm",
      "ready": true
    },
    {
      "id": "llama-2-7b-chat",
      "object": "model",
      "created": 1234567890,
      "owned_by": "llm",
      "ready": true
    }
  ]
}
-
Call a model using the chat completions endpoint:
$ curl -X POST "${MAAS_URL}/llm/<model_name>/v1/chat/completions" \
-H "Authorization: Bearer ${MAAS_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
  "model": "<model_name>",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms."
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7
}'
where:
<model_name>: Specifies the model deployment name from the Models as a service page, such as facebook-opt-125m or llama-2-7b-chat.
Example response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is a type of computing that uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
-
Test the /v1/models endpoint and confirm you receive a list of available models.
-
Make a chat completion request to a model and verify that you receive a response with the expected JSON structure.
-
Check the response headers for rate limit information (X-RateLimit-Limit, X-RateLimit-Remaining).
If you receive authentication errors, verify that your token is valid and has not expired.
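The chat completion call and the verification steps above can also be scripted. The following is a minimal sketch that uses only the Python standard library. It mirrors the URL pattern, payload, and response fields from the `curl` example. The function names are hypothetical, and the sketch assumes that the rate limit headers are present on successful responses.

```python
import json
import urllib.request


def build_chat_request(maas_url, model_name, token, prompt,
                       max_tokens=150, temperature=0.7):
    """Build a POST request for a model's chat completions endpoint."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{maas_url.rstrip('/')}/llm/{model_name}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize_completion(body):
    """Extract the assistant reply and total token usage from a response body."""
    reply = body["choices"][0]["message"]["content"]
    return reply, body["usage"]["total_tokens"]


def chat(maas_url, model_name, token, prompt, timeout=60):
    """Send the request; return the reply, token usage, and rate limit headers."""
    request = build_chat_request(maas_url, model_name, token, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply, total_tokens = summarize_completion(json.load(resp))
        # Header values are None if the server does not send them.
        limits = {
            name: resp.headers.get(name)
            for name in ("X-RateLimit-Limit", "X-RateLimit-Remaining")
        }
    return reply, total_tokens, limits
```

Keeping request construction separate from sending makes it possible to check the URL and payload without contacting the cluster.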
Rate limit responses
When you exceed your tier’s rate limits, MaaS returns specific error responses to help you manage your usage.
If you exceed the maximum number of requests per minute:
{
"error": {
"message": "Rate limit exceeded. You have exceeded the maximum number of requests per minute for your tier.",
"type": "rate_limit_error",
"code": 429
}
}
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 42
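To handle this error, wait for the interval given in the Retry-After header before retrying:
import time
import requests

def make_request_with_retry(url, headers, data, max_retries=3):
    """POST to the model endpoint, waiting out 429 responses before retrying."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code == 429:
            # Honor the Retry-After header; fall back to 60 seconds if it is absent.
            retry_after = int(response.headers.get('Retry-After', 60))
            print(f"Rate limited. Retrying after {retry_after} seconds...")
            time.sleep(retry_after)
            continue
        return response
    raise Exception("Max retries exceeded")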
If you exceed the maximum number of tokens per minute:
{
"error": {
"message": "Token quota exceeded. You have consumed the maximum number of tokens allowed per minute for your tier.",
"type": "quota_error",
"code": 429
}
}
-
Reduce max_tokens in your requests
-
Implement exponential backoff retry logic
-
Batch requests with longer delays between them
-
Request a higher tier from your administrator if you consistently hit limits
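The recommendation to implement exponential backoff could be sketched as follows. This is one possible approach, not a prescribed MaaS client: `backoff_delay`, `call_with_backoff`, and the zero-argument `send` callable are hypothetical names. The sketch assumes that `Retry-After` is a number of seconds, as in the example headers, and that the 10% jitter and 60-second cap are reasonable defaults for your workload.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay in seconds before retry `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    `send` is a hypothetical zero-argument callable that returns an object with
    `status_code` and `headers` attributes, such as a requests.Response.
    """
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        delay = backoff_delay(attempt, base, cap)
        # Never wait less than the server asked for (assumes a value in seconds).
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = max(delay, float(retry_after))
        # Add up to 10% random jitter so that clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("Max retries exceeded")
```

With the defaults, the base delays grow as 1, 2, 4, 8, and 16 seconds. A call might look like `call_with_backoff(lambda: requests.post(url, headers=headers, json=data))`.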
Test models in the playground
Use the built-in playground to test prompts and observe model responses without writing code.
-
On the Models as a service page, locate the model you want to test.
-
In the Actions column, click the menu and select Add to playground.
-
In the playground interface, enter your prompt in the message field.
-
Adjust parameters as needed:
Temperature: Controls randomness. Use 0.0 for deterministic responses or 1.0 for more creative responses.
Max tokens: Maximum length of the response.
Top P: Nucleus sampling threshold.
-
Click Send.
-
Review the model’s response.
-
Experiment with different prompts and parameters to understand the model’s behavior.
The playground is useful for:
-
Testing prompt engineering strategies
-
Comparing model responses
-
Validating that a model is suitable for your use case before integrating it into your application
-
Demonstrating model capabilities to stakeholders
-
Verify that the model responds to your prompts in the playground.
-
Check that adjusting parameters like temperature and max tokens affects the model’s responses as expected.
Best practices
Follow these recommendations to effectively use MaaS.
Security
-
Never commit tokens to version control. Use environment variables or secret management systems.
-
Regenerate tokens periodically. Generate new tokens regularly and discard old ones.
-
Use named tokens for tracking. When generating tokens, note which application or purpose they are for.
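As an illustration of keeping tokens out of source code, the following minimal sketch reads the token from an environment variable and fails with a clear message if it is missing. The variable name `MAAS_TOKEN` matches the earlier shell examples. The function names `load_maas_token` and `auth_headers` are hypothetical.

```python
import os


def load_maas_token(env=os.environ, name="MAAS_TOKEN"):
    """Return the token from the environment, failing loudly if it is missing."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it or load it from a secret store")
    return token


def auth_headers(token):
    """Build the Authorization header used in the API examples."""
    return {"Authorization": f"Bearer {token}"}
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary during testing.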
Performance
-
Implement retry logic with exponential backoff. Handle rate limit errors gracefully.
-
Cache responses when appropriate. If you are making identical requests, consider caching results.
-
Set reasonable max_tokens values. Do not request more tokens than you need, as token quotas are enforced.
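The caching recommendation above could be sketched as a small in-memory cache keyed by the request payload. This is an illustration under stated assumptions: `CompletionCache` and the `call` argument are hypothetical, the cache never expires entries, and reusing a stored response is most appropriate when identical requests are expected to give equivalent answers (for example, with a temperature of 0.0).

```python
import hashlib
import json


class CompletionCache:
    """In-memory cache keyed by a hash of the request payload."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(payload):
        # sort_keys makes the key independent of dictionary ordering.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, payload, call):
        """Return a cached result for payload, calling call(payload) only on a miss."""
        k = self.key(payload)
        if k not in self._store:
            self._store[k] = call(payload)
        return self._store[k]
```

Each cache hit avoids one request and the tokens it would have consumed against your tier's limits.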
Cost management
-
Check rate limit headers. Review X-RateLimit-Remaining in API responses to track your usage against tier limits.
-
Use the playground for testing. Test prompts in the playground before implementing them in code.
-
Optimize prompts. Shorter, more focused prompts consume fewer tokens.
Models-as-a-Service user access troubleshooting
Use this reference to diagnose and resolve common issues when accessing models through Models-as-a-Service (MaaS).
Authentication errors: 401 Unauthorized
{
"error": {
"message": "Invalid or expired token",
"type": "authentication_error",
"code": 401
}
}
-
Token expired: Generate a new token from the dashboard.
-
Incorrect token format: Ensure you’re using the Authorization: Bearer <token> header format.
-
Token not issued by MaaS: Do not use your OpenShift Container Platform token directly. Generate a MaaS-specific token.
Authorization errors: 403 Forbidden
{
"error": {
"message": "Access denied. Your tier does not have permission to access this model.",
"type": "authorization_error",
"code": 403
}
}
-
Model not available to your tier: Contact your administrator to request access or upgrade your tier.
-
Model exists but RBAC not configured: Ask your administrator to verify that RoleBindings are created for your tier.
Model not found: 404
{
"error": {
"message": "Model not found",
"type": "not_found_error",
"code": 404
}
}
-
Incorrect model name: Verify the model name using the /v1/models endpoint.
-
Model not deployed: Ask your administrator to check if the model is deployed and ready.
-
Typo in URL: Ensure the URL format is correct:
https://maas.<domain>/llm/<model-name>/v1/chat/completions
Exceeded request rate limits
If you exceed the maximum number of requests per minute:
{
"error": {
"message": "Rate limit exceeded. You have exceeded the maximum number of requests per minute for your tier.",
"type": "rate_limit_error",
"code": 429
}
}
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 42
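To handle this error, wait for the interval given in the Retry-After header before retrying:
import time
import requests

def make_request_with_retry(url, headers, data, max_retries=3):
    """POST to the model endpoint, waiting out 429 responses before retrying."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code == 429:
            # Honor the Retry-After header; fall back to 60 seconds if it is absent.
            retry_after = int(response.headers.get('Retry-After', 60))
            print(f"Rate limited. Retrying after {retry_after} seconds...")
            time.sleep(retry_after)
            continue
        return response
    raise Exception("Max retries exceeded")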
Exceeded token quotas
If you exceed the maximum number of tokens per minute:
{
"error": {
"message": "Token quota exceeded. You have consumed the maximum number of tokens allowed per minute for your tier.",
"type": "quota_error",
"code": 429
}
}
-
Reduce max_tokens in your requests
-
Implement exponential backoff retry logic
-
Batch requests with longer delays between them
-
Request a higher tier from your administrator if you consistently hit limits
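Both errors above return code 429 and differ only in the `type` field of the error body. A client that wants to react differently to each case could inspect that field, as in the following minimal sketch. The function name `classify_429` and the labels it returns are hypothetical. The `type` values are taken from the example responses.

```python
import json

# Maps the error "type" values shown in the example responses to short labels.
KINDS = {"rate_limit_error": "requests", "quota_error": "tokens"}


def classify_429(body_text):
    """Return 'requests', 'tokens', or 'unknown' for a 429 response body."""
    try:
        error_type = json.loads(body_text)["error"]["type"]
        return KINDS.get(error_type, "unknown")
    except (ValueError, KeyError, TypeError):
        # The body is not JSON or does not have the expected structure.
        return "unknown"
```

For a `tokens` result, reducing `max_tokens` is likely to help more than simply retrying. For a `requests` result, waiting for the `Retry-After` interval is usually sufficient.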
Persistent rate limit errors
You continue to receive 429 errors even after waiting.
-
Shared tier quota: Rate limits are per-user, but if many users in your tier are making requests simultaneously, the aggregate tier limit may be reached. Contact your administrator.
-
Incorrect retry logic: Ensure you’re respecting the Retry-After header value.
-
Multiple applications using the same token: Each application should have its own API key to avoid unexpected rate limiting.