Governing LLM access with Models-as-a-Service

Models-as-a-Service (MaaS) provides tier-based governance for large language model serving in Open Data Hub. MaaS enables administrators to define service tiers with different rate limits and quotas, control access through API key authentication, and track resource consumption for cost allocation. Users can discover available models, generate authentication credentials, and integrate models into their applications using OpenAI-compatible APIs.

Deploy and manage Models-as-a-Service

You can deploy Models-as-a-Service (MaaS) to provide tier-based governance for large language model serving. MaaS enables you to define service tiers with different rate limits and quotas, control access through API key authentication, and track resource consumption for cost allocation.

Models-as-a-Service overview

You can deploy Models-as-a-Service (MaaS) in Open Data Hub to provide tier-based governance for large language model (LLM) serving across your organization. This platform helps you manage resource consumption and governance challenges when you serve models to a large user base.

With MaaS, you can perform the following tasks:

Define service tiers with different rate limits and quotas.
Control access through API key authentication.
Track resource consumption for cost allocation.

As an administrator, you can use this tier-based system to expose models through managed API endpoints. This structure allows you to enforce different consumption policies for different user groups and deliver AI models as shared resources with appropriate access levels.

The Models-as-a-Service platform acts as a governance layer between users and model serving infrastructure. When users request access to a model, MaaS authenticates the user and determines their assigned tier based on group membership. Policy enforcement checks rate limits and quotas for that tier. Authorized requests are forwarded to the appropriate model endpoint, and usage metrics are recorded for tracking and cost allocation purposes. This architecture enables centralized policy enforcement without modifying the underlying model serving infrastructure.

MaaS includes the following capabilities:

Tier-based access control: Define multiple service tiers, such as free, premium, or enterprise, with different rate limits, quotas, and model access permissions for different user groups.
Policy and quota management: Enforce rate limiting and quota policies to prevent resource exhaustion.
API key authentication: Control access to models through tier-based resource allocation and API key-based authentication.
Usage tracking: Track consumption metrics for cost allocation and billing.
Observability: Monitor model access patterns, policy enforcement, and resource utilization.

The following table compares use cases for MaaS and standard model serving to help you choose the right approach.

Table 1. MaaS vs. standard model serving comparison
MaaS	Standard model serving
Centralized governance across multiple teams or projects is required.	You are deploying models for single-team or single-user use cases.
You need rate limiting, quota enforcement, and usage tracking for cost control.	You are prototyping or developing models in a single-user environment where governance overhead is unnecessary.
You prefer declarative configuration management via GitOps.	Simplified deployment is preferred over centralized control.

MaaS administration is divided into initial configuration and ongoing management, with distinct responsibilities for cluster administrators and Open Data Hub administrators.

Table 2. MaaS administrator responsibilities
Phase	Cluster administrators	Open Data Hub administrators
Initial configuration	Enable MaaS in the Open Data Hub operator; configure the underlying cluster infrastructure to support model serving	Define the initial governance structure by creating tiers; assign users to groups; generate API keys; configure access to existing models; validate that users can successfully access models through MaaS
Ongoing operations	Scale MaaS components to handle increased load; apply software updates; troubleshoot infrastructure performance issues	Monitor usage metrics to track costs; adjust tier configurations and user group assignments; modify rate limits based on demand patterns; rotate API keys according to security schedules; troubleshoot authentication and authorization issues

Prerequisites for installing MaaS

Before installing Models-as-a-Service, verify that your cluster has the required platform components, operators, and infrastructure resources.

MaaS dependencies

MaaS requires several platform components to function:

KServe provides the underlying model serving infrastructure that hosts the large language models.
Kuadrant, which includes Authorino and Limitador, enforces authentication, authorization, and rate limiting policies.
Gateway API routes traffic to models and applies policy enforcement at the network level.

Note	Deploying large language models may require additional dependencies based on the model size and serving runtime. For comprehensive model serving infrastructure requirements, see Component requirements.

Note

To access MaaS through the dashboard interface:

You must enable the Llama Stack Operator on your OpenShift Container Platform cluster. For more information, see Activating the Llama Stack Operator.
You must enable the Gen AI studio menu item in the dashboard by setting spec.dashboardConfig.genAiStudio: true in the OdhDashboardConfig custom resource.

Requirements

Before you install MaaS, ensure that your environment meets the following requirements:

Platform and access requirements:

You have an OpenShift Container Platform cluster version 4.19.9 or later.
You have cluster administrator access to install operators and create cluster-scoped resources.
You have installed the OpenShift CLI, oc.

Operator requirements:

You have installed Kuadrant Operator v1.3 or later:
- You have created the kuadrant-system namespace and OperatorGroup.
- You have configured the ISTIO_GATEWAY_CONTROLLER_NAME environment variable to "openshift.io/gateway-controller/v1".
- You have created a Kuadrant custom resource in the kuadrant-system namespace with ready status.
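
The Kuadrant items above can be expressed as manifests. The following is a minimal sketch, not a supported installation procedure: the OperatorGroup name `kuadrant-operator-group` and both helper function names are hypothetical, and the empty OperatorGroup `spec` assumes that the operator is installed for all namespaces. The functions only print YAML so that you can review it before applying anything.

```shell
# Print the Namespace and OperatorGroup that the Kuadrant Operator is installed into.
kuadrant_namespace_manifests() {
  cat <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: kuadrant-system
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kuadrant-operator-group   # hypothetical name
  namespace: kuadrant-system
spec: {}                          # assumption: no target namespaces, so all namespaces are selected
EOF
}

# Print the Kuadrant custom resource. Apply it only after the operator is installed.
kuadrant_cr_manifest() {
  cat <<'EOF'
apiVersion: kuadrant.io/v1beta1
kind: Kuadrant
metadata:
  name: kuadrant
  namespace: kuadrant-system
EOF
}
```

For example, you might run `kuadrant_namespace_manifests | oc apply -f -`, install the operator, and then run `kuadrant_cr_manifest | oc apply -f -`. The `ISTIO_GATEWAY_CONTROLLER_NAME` variable is set on the operator itself, for example through the `spec.config.env` field of its OLM Subscription.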

Open Data Hub requirements:

You have installed the opendatahub-operator.
You have initialized the platform with a DSCInitialization resource.
You have created a DataScienceCluster resource with the kserve component set to Managed.
You have created a GatewayClass resource configured for the OpenShift Gateway Controller (openshift.io/gateway-controller/v1) and a Gateway named maas-default-gateway in the openshift-ingress namespace. For information about creating Gateway API resources, see Enabling the Gateway API.
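
As a rough illustration of the Gateway API requirement, the following sketch prints a GatewayClass and a Gateway for review. Only the Gateway name, its namespace, and the controller name come from the requirements above. The GatewayClass name is borrowed from the example verification output, the single HTTP listener is an assumption made to keep the example short, and the function name is hypothetical. Follow Enabling the Gateway API for the listener, hostname, and TLS settings that your cluster needs.

```shell
# Print a GatewayClass and the maas-default-gateway Gateway; nothing is applied here.
maas_gateway_manifests() {
  cat <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-gateway-controller   # assumption: name taken from the example output
spec:
  controllerName: openshift.io/gateway-controller/v1
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: maas-default-gateway
  namespace: openshift-ingress
spec:
  gatewayClassName: openshift-gateway-controller
  listeners:
  - name: http          # assumption: one plain HTTP listener, for illustration only
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
EOF
}
```

After reviewing and adjusting the output, you could apply it with `maas_gateway_manifests | oc apply -f -`.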

Prerequisite verification

To verify that your environment meets these requirements:

Verify that you have cluster administrator access:
```
$ oc auth can-i create clusterrole
```
Example output
```
yes
```

Verify that KServe is installed and managed:

$ oc get datasciencecluster -o jsonpath='{.items[0].spec.components.kserve.managementState}'

Example output

Managed

Verify that Kuadrant is running:

$ oc get pods -n kuadrant-system

Example output

NAME                                                   READY   STATUS    RESTARTS   AGE
authorino-operator-controller-manager-xxx              2/2     Running   0          5m
kuadrant-operator-controller-manager-xxx               1/1     Running   0          5m
limitador-limitador-xxx                                1/1     Running   0          5m

Verify that the Gateway exists:

$ oc get gateway maas-default-gateway -n openshift-ingress

Example output

NAME                      CLASS                                 ADDRESS         PROGRAMMED   AGE
maas-default-gateway      openshift-gateway-controller          192.168.1.100   True         10m
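
The individual checks above can also be combined into one script. This is a sketch under a few assumptions: the function names are hypothetical, the Gateway check reads the `Programmed` status condition instead of the table output, and the Kuadrant pod check is left out because pod names vary.

```shell
# check LABEL EXPECTED ACTUAL: print a PASS or FAIL line; return nonzero on a mismatch.
check() {
  if [ "$2" = "$3" ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 (expected '$2', got '$3')"
    return 1
  fi
}

# Run the checks against the cluster that oc is currently logged in to.
verify_maas_prereqs() {
  rc=0
  check "cluster administrator access" "yes" \
    "$(oc auth can-i create clusterrole 2>/dev/null)" || rc=1
  check "KServe management state" "Managed" \
    "$(oc get datasciencecluster \
        -o jsonpath='{.items[0].spec.components.kserve.managementState}' 2>/dev/null)" || rc=1
  check "Gateway programmed" "True" \
    "$(oc get gateway maas-default-gateway -n openshift-ingress \
        -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' 2>/dev/null)" || rc=1
  return "$rc"
}
```

Run `verify_maas_prereqs` after logging in with `oc`. The function returns a nonzero status if any check fails, so it can gate a later installation step.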

Enable the Models-as-a-Service component

You can enable Models-as-a-Service (MaaS) in your Open Data Hub deployment by updating the modelsAsService component in the DataScienceCluster custom resource (CR).

Enabling the Models-as-a-Service component deploys the following resources in your cluster:

The maas-api service that handles authentication, authorization, and model discovery
Default tier configuration with free, premium, and enterprise tiers
ServiceAccounts for token issuance and policy enforcement
RBAC roles for model access control

Prerequisites

You have cluster administrator privileges for your OpenShift Container Platform cluster.

Procedure

Log in to the OpenShift Container Platform console as a cluster administrator.
Navigate to Operators → Installed Operators, and then click the Open Data Hub Operator.
Click the Data Science Cluster tab.
On the DataScienceClusters page, click your DataScienceCluster object, commonly named default-dsc.
Click the YAML tab.
In the spec.components section, locate the kserve component and add or update the modelsAsService configuration:
```
# ... existing configuration ...
spec:
  components:
    kserve:
      managementState: Managed
      modelsAsService:
        managementState: Managed
# ... remaining configuration ...
```
where:

modelsAsService

Specifies the MaaS component nested under the kserve component.

managementState: Managed

Specifies that MaaS functionality is enabled.
Click Save.
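
If you prefer the CLI to the console, the same change can be sketched as a merge patch. This is an illustration rather than a documented procedure: the helper names are hypothetical, and `default-dsc` is only the common object name mentioned in the procedure.

```shell
# Print the merge patch that sets modelsAsService to Managed under the kserve component.
maas_dsc_patch() {
  printf '%s' '{"spec":{"components":{"kserve":{"modelsAsService":{"managementState":"Managed"}}}}}'
}

# enable_maas [DSC_NAME]: apply the patch to the DataScienceCluster object.
enable_maas() {
  oc patch datasciencecluster "${1:-default-dsc}" --type=merge -p "$(maas_dsc_patch)"
}
```

A merge patch changes only the keys that it names, so the existing `kserve` settings are left as they are.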

Verification

Verify that the MaaS component is running:

Verify that the maas-api pod is running:

$ oc get pods -n opendatahub -l app.kubernetes.io/name=maas-api

Example output

NAME                        READY   STATUS    RESTARTS   AGE
maas-api-7f8c9d4b5-xyz12    1/1     Running   0          3m

Verify that the tier configuration ConfigMap exists:

$ oc get configmap tier-to-group-mapping -n opendatahub

Example output

NAME                      DATA   AGE
tier-to-group-mapping     1      3m

Additional resources

Models-as-a-Service administration troubleshooting

Enable the Models-as-a-Service dashboard interface

To view the Models-as-a-Service (MaaS) dashboard interface in Open Data Hub, you must enable the Gen AI studio menu item and the MaaS feature in the OdhDashboardConfig custom resource (CR). The MaaS dashboard interface appears in the AI asset endpoints page within Gen AI studio.

The MaaS dashboard interface provides a user interface for:

Viewing available models and their status
Generating authentication tokens for model access
Creating and managing named API keys
Viewing your assigned tier and rate limits
Accessing API documentation and examples

Prerequisites

You have cluster administrator privileges for your OpenShift Container Platform cluster.
You have Open Data Hub administrator privileges.
You have enabled the Llama Stack Operator on the OpenShift cluster by setting its managementState field to Managed in the DataScienceCluster custom resource (CR) of the Open Data Hub Operator. For more information, see Activating the Llama Stack Operator.
You have enabled the MaaS component.

Procedure

Log in to the OpenShift Container Platform console as a cluster administrator.
In the OpenShift console, navigate to Home → API Explorer.
In the search bar, enter OdhDashboardConfig to filter by kind.
Click the OdhDashboardConfig custom resource to open the resource details page.
From the Project list, select the Open Data Hub applications namespace (typically opendatahub).
Click the Instances tab.
Click the odh-dashboard-config instance to open the details page.
Click the YAML tab.
In the spec.dashboardConfig section, add or update the modelAsService and genAiStudio configuration:
```
# ... existing configuration ...
spec:
  dashboardConfig:
    genAiStudio: true
    modelAsService: true
# ... remaining configuration ...
```
where:

genAiStudio: true

Specifies that Open Data Hub displays the Gen AI studio menu item in the dashboard.

modelAsService: true

Specifies that Open Data Hub displays the MaaS dashboard interface in the Gen AI studio menu item.
Click Save.

The configuration is automatically applied to the dashboard pods.
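
The same two flags can also be set from the CLI. The following sketch assumes the `odh-dashboard-config` instance and the `opendatahub` namespace used in this procedure. The helper names are hypothetical.

```shell
# Print the merge patch that turns on both dashboard flags.
maas_dashboard_patch() {
  printf '%s' '{"spec":{"dashboardConfig":{"genAiStudio":true,"modelAsService":true}}}'
}

# enable_maas_dashboard [NAMESPACE]: patch the OdhDashboardConfig instance.
enable_maas_dashboard() {
  oc patch odhdashboardconfig odh-dashboard-config -n "${1:-opendatahub}" \
    --type=merge -p "$(maas_dashboard_patch)"
}
```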

Verification

Verify that the MaaS dashboard interface is enabled and accessible:

Verify that both features are enabled in the OdhDashboardConfig:

$ oc get odhdashboardconfig odh-dashboard-config -n opendatahub -o jsonpath='{.spec.dashboardConfig.genAiStudio}{"\n"}{.spec.dashboardConfig.modelAsService}'

Example output

true
true

Verify that the dashboard pods have reloaded with the new configuration:
```
$ oc get pods -n opendatahub | grep dashboard
```
Check that the pods are running and have been recently restarted.
Log in to the Open Data Hub dashboard.
In the left navigation menu, click Gen AI studio → AI asset endpoints.
Verify that the Models as a service tab is displayed.

The MaaS tab provides access to model discovery, token generation, and API key management.

Additional resources

Models-as-a-Service administration troubleshooting

Models-as-a-Service tiers

The Models-as-a-Service (MaaS) platform uses a tier-based model to manage access control and rate limiting for AI model serving. Tiers allow platform administrators to define different levels of service and control model access based on user group membership.

Tier-based access control

When multiple teams share large language models, tiers enable you to:

Prevent resource exhaustion from unlimited usage
Provide different service levels for different user groups
Track and allocate costs based on team consumption
Control which teams can access expensive or sensitive models

MaaS assigns users to tiers based on their OpenShift group membership. When a user belongs to multiple groups, the system assigns the tier with the highest level number.

Default tier configurations

The MaaS platform includes three default configurable tiers. Each tier has a dedicated namespace following the pattern {instance-name}-tier-{tier-name}.

Table 3. Default tier configurations
Tier	Description	Level	Default group	Namespace
Free	Entry-level tier for basic users	0 (Lowest)	`system:authenticated`	`maas-default-gateway-tier-free`
Premium	Intermediate tier for higher throughput	1	`premium-users`	`maas-default-gateway-tier-premium`
Enterprise	Top tier for critical workloads	2 (Highest)	`enterprise-users`	`maas-default-gateway-tier-enterprise`

All tiers are fully configurable. You can modify the default tiers or create additional custom tiers to meet your organization’s needs.

Note	Tier name values are case-sensitive and must match exactly with rate limit policy predicates.

Tier configuration

MaaS tiers are defined in the tier-to-group-mapping ConfigMap in the opendatahub namespace. A tier configuration includes the following properties:

Name: A unique identifier for the tier, such as free, premium, or enterprise.
Display name: A human-readable label shown in the dashboard.
Level: A numeric hierarchy that determines tier precedence. Higher numbers indicate higher priority.
Groups: A list of OpenShift groups whose members are assigned to this tier.

Access control

Administrators control which tiers can access specific models by configuring RBAC permissions for the tier’s namespace.

When a user belongs to multiple groups across different tiers, MaaS assigns them to the tier with the highest level number. Users can only access models made available to their assigned tier and cannot access models in lower-level tiers, even if they belong to groups associated with those lower tiers.

For example, if a user belongs to both the premium-users group (tier level 1) and the enterprise-users group (tier level 2), they are assigned to the enterprise tier and can only access models configured for the enterprise tier. They cannot access models restricted to the premium or free tiers.
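
The highest-level-wins rule can be sketched in a few lines of shell. This is not how MaaS implements tier resolution. It only mirrors the default tiers from Table 3 in a hard-coded mapping, and the function name is hypothetical.

```shell
# resolve_tier GROUP...: print the tier with the highest level among the given groups.
# Each mapping line is "tier level group", copied from the default tier table.
resolve_tier() {
  mapping='free 0 system:authenticated
premium 1 premium-users
enterprise 2 enterprise-users'
  best_name=""
  best_level=-1
  while read -r name level group; do
    for g in "$@"; do
      if [ "$g" = "$group" ] && [ "$level" -gt "$best_level" ]; then
        best_name=$name
        best_level=$level
      fi
    done
  done <<EOF
$mapping
EOF
  # Fail when none of the groups maps to a tier.
  [ -n "$best_name" ] && echo "$best_name"
}
```

For a user in `system:authenticated`, `premium-users`, and `enterprise-users`, `resolve_tier` prints `enterprise`, which matches the example above.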

Rate limiting and quotas

Configure rate limiting and quota policies when you create or edit a tier through the dashboard. These policies include:

Request rate limits: Maximum number of API requests allowed per time period, such as 1000 requests per minute.
Token consumption limits: Maximum number of tokens that can be consumed per time period, such as 100,000 tokens per minute.
Quota policies: Optional daily or monthly limits on total usage.

Kuadrant’s Limitador component enforces these limits based on the user’s assigned tier.
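
For orientation, a per-tier request limit in Kuadrant is typically expressed as a RateLimitPolicy attached to the Gateway. The sketch below is not the policy that the dashboard generates. The policy name, the limit values for the free tier, and the `auth.identity.tier` and `auth.identity.userid` identity fields are assumptions, and the function name is hypothetical. It is included only to show where a tier name appears in a predicate.

```shell
# Print an illustrative RateLimitPolicy with one limit per tier; nothing is applied here.
tier_ratelimit_manifest() {
  cat <<'EOF'
apiVersion: kuadrant.io/v1
kind: RateLimitPolicy
metadata:
  name: tier-request-limits            # hypothetical name
  namespace: openshift-ingress
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: maas-default-gateway
  limits:
    free:
      rates:
      - limit: 100                      # illustrative value
        window: 1m
      when:
      - predicate: 'auth.identity.tier == "free"'      # assumption: identity field name
      counters:
      - expression: auth.identity.userid               # assumption: per-user counter
    premium:
      rates:
      - limit: 1000
        window: 1m
      when:
      - predicate: 'auth.identity.tier == "premium"'
      counters:
      - expression: auth.identity.userid
EOF
}
```

The quoted tier names in the predicates are compared as exact strings, which is why tier names must match character for character.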

Manage Models-as-a-Service tiers

You can create and manage Models-as-a-Service (MaaS) tiers to control access levels, rate limits, and quotas for different user groups.

Prerequisites

You have cluster administrator privileges for your OpenShift Container Platform cluster.
You have enabled the Models-as-a-Service component.
You have enabled the Models-as-a-Service dashboard interface.
You have access to the Open Data Hub dashboard with administrator privileges.

View tier configurations

Review all configured tiers and their settings to understand the current tier hierarchy and assignments.

Prerequisites

You have logged in to Open Data Hub.
You have administrator access to the Open Data Hub dashboard.

Procedure

In the left navigation menu, click Settings.
Click Tiers.

The tiers list displays all configured tiers with the following information:
- Tier name
- Assigned groups
- Priority level
- Rate limit status
To view detailed information about a specific tier, click the tier name.

The tier details page shows:
- Complete rate limit configuration (token and request limits)
- All assigned groups
- Description and metadata
- Resource name

Verification

Verify that the tiers list displays all configured tiers in your system.
Click a tier name and verify that you can view its complete configuration including groups, rate limits, and level.

Create a tier

Create a new tier through the Open Data Hub dashboard to define service levels, resource limits, and group assignments.

Prerequisites

You have logged in to Open Data Hub.
You have administrator access to the Open Data Hub dashboard.

Procedure

In the left navigation menu, click Settings.
Click Tiers to view existing tiers.
Click Create tier.
In the Create tier form, configure the tier:
1. Name: Enter a descriptive name for the tier that will be displayed to users, such as Production Tier or Data Science Team.
2. Optional: Click Edit resource name to set a custom internal identifier for the tier. If not specified, the resource name is generated automatically from the display name. The resource name must use lowercase letters, numbers, and hyphens only.
3. Optional: Description: Provide a brief description of the tier’s purpose and intended users.
4. Level: Set the tier priority level using a numeric value. Higher numbers indicate higher priority. When a user belongs to multiple groups, they are assigned the tier with the highest level value. For example, use 0 for the lowest tier, 1 for medium, and 2 for the highest.
5. Groups: Select or create the OpenShift groups that should be assigned to this tier. Users who are members of these groups will be automatically assigned to this tier.
Configure rate limits:
1. To limit token consumption, select Enforce token rate limit and click Add token rate limit. Enter the number of tokens allowed, enter the time period, and select the time unit. For example, set 100000 tokens per 1 minute for production workloads.
2. To limit API requests, select Enforce request rate limit and click Add request rate limit. Enter the number of requests allowed, enter the time period, and select the time unit. For example, set 1000 requests per 1 minute for production workloads.
  
  Tip
  
  You can configure multiple rate limits with different time windows by clicking Add token rate limit or Add request rate limit again.
Click Create tier.

Verification

Verify that the new tier appears in the tiers list with the correct configuration.
Click the tier name to view its details and confirm the groups, rate limits, and level are set correctly.

Edit a tier

Modify tier properties, rate limits, and group assignments through the dashboard.

Prerequisites

You have logged in to Open Data Hub.
You have administrator access to the Open Data Hub dashboard.

Procedure

In the left navigation menu, click Settings.
Click Tiers.
Locate the tier you want to modify and click its name.
Click Edit tier.
Modify the tier properties:
1. Update the name or description if needed.
2. Adjust the tier level.
3. Add or remove groups.
4. Modify rate limits by editing existing limits or adding new ones.
  
  Important
  
  You cannot change the resource name after the tier is created. If you need a different resource name, delete the tier and create a new one.
Click Update tier.

Warning

Changing rate limits or group assignments takes effect immediately. Users may experience access changes or throttling based on the new configuration.

Verification

Verify that the tier shows the updated configuration in the tiers list.
If you changed group assignments, confirm that users in the new groups can access models available to this tier.

Delete a tier

Delete a tier through the dashboard when it is no longer needed.

Prerequisites

You have logged in to Open Data Hub.
You have administrator access to the Open Data Hub dashboard.

Procedure

In the left navigation menu, click Settings.
Click Tiers.
Locate the tier you want to delete and click its name.
Click Actions → Delete tier.
In the confirmation dialog, review the impact of deletion:
1. Users assigned to this tier will lose their current tier assignment.
2. Active API keys and tokens scoped to this tier may stop working.
Type the tier name to confirm deletion.
Click Delete.

Do not delete all tiers from your system. If you delete the last remaining tier, the dashboard will not display the "Add tier" button, preventing you from creating new tiers through the UI. Ensure at least one tier remains in the system at all times.

Important

Deleting a tier immediately affects all users assigned to it. Ensure that affected users are reassigned to another tier before deletion to avoid service disruption.

Verification

Verify that the tier no longer appears in the tiers list.
Confirm that users who were in the deleted tier’s groups are now assigned to a different tier based on their remaining group memberships.

Additional resources

Models-as-a-Service administration troubleshooting

Publish models with Models-as-a-Service

You can deploy generative AI models and publish them to Models-as-a-Service (MaaS) to enable tier-based access control and centralized authentication.

Prerequisites

You have logged in to Open Data Hub.
You have administrator access to a project in Open Data Hub.
You have enabled the Models-as-a-Service component and configured at least one service tier.
You have access to a storage connection containing your model files, such as S3-compatible object storage.

Procedure

In the left navigation menu, click Projects.
Click the name of the project where you want to deploy the model.
Click the Deployments tab.
Click Deploy model to open the wizard.
In the Model details section:
1. Specify your storage connection and model path.
2. Select Generative AI model as the model type.
3. Click Next.

In the Model deployment section:
1. Enter a unique model deployment name using lowercase letters, numbers, and hyphens.
2. Select Distributed inference with llm-d as the serving runtime.
  
  Important
  
  Currently, only the Distributed inference with llm-d runtime supports MaaS integration. If you select a different runtime, the Publish as MaaS endpoint option will not be available in Advanced settings.
3. Select an appropriate hardware profile for your model.
4. Click Next.

In the Advanced settings section, configure MaaS access:
1. Select Publish as MaaS endpoint to make the model available through Models-as-a-Service.
2. Configure tier access:
  - All tiers: Make the model available to all service tiers.
  - Specific tiers: Limit access to selected tiers. If you choose this option, select the specific tiers from the Tier names field.
3. Optional: Select Add custom runtime environment variables to customize model behavior.
Click Deploy.

Verification

Verify that the model appears on the Deployments tab with a checkmark in the Status column.
In the left navigation menu, click Gen AI studio → AI asset endpoints.
Click the Models as a service tab.
Verify that your model is listed with a Ready status.
Verify that tier access control was configured correctly:
```
$ oc get rolebindings -n <your-project-namespace> | grep <your-model-name>
```
You should see RoleBindings for each tier that has access to the model.
Test model access:
1. Generate an authentication token as described in Access models through Models-as-a-Service.
2. Make a test API call to the model endpoint to verify that it responds.
3. If you configured tier-based access restrictions, confirm that only authorized tiers can access the model.
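
A test call might look like the following sketch. It assumes that the model accepts OpenAI-style chat completion requests at `/v1/chat/completions` under the MaaS route URL shown in the dashboard. The `MAAS_ENDPOINT` and `MAAS_TOKEN` variables and both function names are hypothetical.

```shell
# chat_payload MODEL PROMPT: print a minimal chat completion request body.
# The values are inserted as-is, so avoid double quotes and backslashes in them.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":32}' "$1" "$2"
}

# call_model MODEL PROMPT: send the request with the MaaS-issued token.
call_model() {
  curl -sS -X POST "${MAAS_ENDPOINT}/v1/chat/completions" \
    -H "Authorization: Bearer ${MAAS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}
```

Set `MAAS_ENDPOINT` to the route URL from the Inference endpoint dialog and `MAAS_TOKEN` to your generated token, then run `call_model <your-model-name> "Say hello"`. Repeating the call with a token from a tier without access should fail with 403 Forbidden.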

Additional resources

Update tier access for deployed models

After deploying a model to Models-as-a-Service (MaaS), you can modify which service tiers have access without redeploying the model.

Prerequisites

You have logged in to Open Data Hub.
You have administrator access to a project in Open Data Hub.
You have deployed a model and published it to Models-as-a-Service.

Procedure

Navigate to Projects and click your project name.
Click the Deployments tab.
Click the model deployment name.
Click Edit from the actions menu.
In the Advanced settings section, modify the Tier access settings:
1. Change between All tiers and Specific tiers.
2. Update the tier selections if using specific tiers.
Click Update.

Important

Changing tier access takes effect immediately. Users who lose access will receive 403 Forbidden errors on their next API request.

Verification

Verify that the tier access changes were applied:
```
$ oc get rolebindings -n <your-project-namespace> | grep <your-model-name>
```
You should see RoleBindings for each tier that has access.
Test functional access:
1. Verify that users in authorized tiers can access the model through the MaaS interface.
2. If you restricted access to specific tiers, verify that users in unauthorized tiers receive 403 errors.

Additional resources

For troubleshooting tier access issues, see Models-as-a-Service administration troubleshooting.

Models-as-a-Service administration troubleshooting

Use this reference to diagnose and resolve common administrative issues with Models-as-a-Service (MaaS) deployment, configuration, and management.

Component enablement issues

If the maas-api pod does not start or shows errors after enabling the MaaS component:

Check the pod logs for error messages:

$ oc logs -n opendatahub -l app.kubernetes.io/name=maas-api

Verify that all prerequisites are met, especially:
- Kuadrant is running in the kuadrant-system namespace
- The maas-default-gateway Gateway exists in the openshift-ingress namespace
- KServe component is set to Managed in the DataScienceCluster

Check for events related to the MaaS deployment:

$ oc get events -n opendatahub --sort-by='.lastTimestamp' | grep maas

Verify that the required RBAC resources were created:

$ oc get clusterrole | grep maas
$ oc get clusterrolebinding | grep maas

Dashboard tab visibility issues

If the Models-as-a-Service tab does not appear in the dashboard:

Verify that the MaaS API component is running:

$ oc get pods -n opendatahub -l app.kubernetes.io/name=maas-api

Check that the OdhDashboardConfig was updated correctly:

$ oc get odhdashboardconfig odh-dashboard-config -n opendatahub -o yaml | grep modelAsService

Clear your browser cache and hard refresh the dashboard (Ctrl+Shift+R or Cmd+Shift+R).

Check the dashboard pod logs for errors:

$ oc logs -n opendatahub $(oc get pods -n opendatahub -o name | grep dashboard | head -1) --tail=50

Verify that you have the required permissions to view the tab. Some features may be restricted based on user roles.

Model visibility issues on Models-as-a-Service page

If a deployed model does not appear on the Models as a service page:

Verify that you selected Publish as MaaS endpoint in the Advanced settings during deployment.

Check that the MaaS API is running:

$ oc get pods -n opendatahub -l app.kubernetes.io/name=maas-api

Verify that the model is in a Ready state:

$ oc get llminferenceservice -n <your-project-namespace>

User access errors: 403 Forbidden

If a user receives a 403 Forbidden error when calling a model:

Verify that the user’s tier is included in the model’s tier access configuration:
- Check the model’s deployment settings in the dashboard.
- Ensure the user’s assigned tier matches one of the selected tiers.

Verify that RoleBindings were created for the selected tiers:

$ oc get rolebindings -n <your-project-namespace> | grep <your-model-name>

Check the user’s tier assignment:

$ oc exec -n opendatahub deployment/maas-api -- curl -s http://localhost:8080/v1/tiers/lookup \
  -H "Content-Type: application/json" \
  -d '{"groups": ["<user-group-name>"]}'

Tier-based access control issues

If users receive unexpected access denials:

Verify that the user’s tier is correctly resolved:

$ oc exec -n opendatahub deployment/maas-api -- curl -s http://localhost:8080/v1/tiers/lookup \
  -H "Content-Type: application/json" \
  -d '{"groups": ["<user-group-name>"]}'

Check that the RoleBinding exists for that tier:
```
$ oc get rolebindings -n <your-project-namespace> | grep <tier-name>
```

Verify the RoleBinding subject matches the tier namespace exactly:

$ oc get rolebinding <rolebinding-name> -n <your-project-namespace> -o yaml | grep "system:serviceaccounts"

Ensure the model is ready and the LLMInferenceService exists:
```
$ oc get llminferenceservice -n <your-project-namespace>
```

Check the Authorino logs for authorization errors:

$ oc logs -n kuadrant-system -l app=authorino --tail=50

If all tiers are denied access:

Verify that the llminferenceservice-access Role includes the post verb (required for POST requests).
Check that the Role’s resourceNames field includes your model name. Alternatively, omit the resourceNames field to allow access to all models.
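
One way to check the Role’s verbs from the CLI is sketched below. It assumes that the llminferenceservice-access Role is in the model’s project namespace. The function names are hypothetical.

```shell
# has_verb VERB VERBS...: succeed when VERB is one of the listed verbs.
has_verb() {
  want=$1
  shift
  for v in "$@"; do
    [ "$v" = "$want" ] && return 0
  done
  return 1
}

# role_allows_post NAMESPACE: check the verbs of the llminferenceservice-access Role.
role_allows_post() {
  # The jsonpath output is a space-separated list, so word splitting is intentional.
  has_verb post $(oc get role llminferenceservice-access -n "$1" \
    -o jsonpath='{.rules[*].verbs[*]}')
}
```

For example, `role_allows_post <your-project-namespace> && echo "post verb present"`.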

Tier management issues

If users cannot access models after tier creation:

Verify that the tier name in the ConfigMap exactly matches the tier name created in the dashboard (case-sensitive).
Verify that the user’s OpenShift groups are listed in the tier’s groups array in the ConfigMap.
Check that the MaaS API deployment restarted successfully after ConfigMap changes.
Verify that the tier’s level is higher than the level of every other tier that the user’s groups are assigned to. Otherwise, the user is resolved to the other tier.

Check the MaaS API logs for tier resolution errors:

$ oc logs -n opendatahub -l app.kubernetes.io/name=maas-api --tail=50 | grep tier

Use Models-as-a-Service

Use Models-as-a-Service (MaaS) to access large language models with built-in authentication, rate limiting, and governance. You can discover available models, generate authentication credentials, and integrate models into your applications using OpenAI-compatible APIs.

Models-as-a-Service dashboard interface

The Models-as-a-Service dashboard interface displays available models and provides tools for generating authentication tokens to access those models.

To access the Models-as-a-Service interface, log in to the Open Data Hub dashboard, click Gen AI studio in the left navigation menu, click AI asset endpoints, and click the Models as a service tab.

The Models as a service page displays a table with all model deployments published to MaaS. Each row shows:

Name: The model deployment name. Models published to MaaS display a MaaS badge next to the name.
Status: The current operational state of the model (Ready, Not ready, or Unknown).
Inference endpoint: A View link that opens a dialog showing the MaaS route URL and token generation options.
Actions: Options such as Add to playground for testing the model interactively.

To view your current service tier assignment and limits, click the Tier information link near the top of the Models as a service page. Review your tier details in the popup dialog: tier name, groups you belong to, priority level, and rate limits. Understanding your tier limits helps you plan your usage and avoid rate limit errors.

Next steps

Access models through Models-as-a-Service

Access models through Models-as-a-Service

As a data scientist, developer, or application builder, you can use Models-as-a-Service (MaaS) to access large language models with built-in authentication, rate limiting, and quota management.

Your tier and access level

Before you begin working with models, it’s helpful to understand how MaaS determines your access level.

Tier assignment: You are automatically assigned to a service tier based on your group membership in OpenShift Container Platform. If you belong to multiple groups, you are assigned to the tier with the highest level number.
Rate limits: Your tier determines how many requests you can make per minute and how many tokens you can consume.
Model access: Your tier determines which models you can access. You can access only the models that are available to your assigned tier. Models restricted to other tiers are not accessible to you.
Authentication: You must use a MaaS-issued token to access models.
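
The tier assignment rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the MaaS implementation: the tier names, level numbers, group names, and the `resolve_tier` helper are all hypothetical.

```python
# Hypothetical tier definitions for illustration only. Real tiers, levels,
# and groups are defined by your administrator.
TIERS = [
    {"name": "free", "level": 0, "groups": ["free-users"]},
    {"name": "premium", "level": 1, "groups": ["premium-users"]},
    {"name": "enterprise", "level": 2, "groups": ["enterprise-users"]},
]


def resolve_tier(user_groups, tiers=TIERS):
    """Return the matching tier with the highest level, or None if no tier matches."""
    # A tier matches when it lists at least one of the user's groups.
    matching = [t for t in tiers if set(t["groups"]) & set(user_groups)]
    return max(matching, key=lambda t: t["level"], default=None)
```

With these sample values, a user in both `free-users` and `premium-users` resolves to the `premium` tier because its level is higher.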

Prerequisites

You have access to the Open Data Hub dashboard.
Your administrator has enabled Models-as-a-Service and assigned you to a tier.

View available models

Discover which models are available to your tier and check their status.

Procedure

Log in to the Open Data Hub dashboard.
In the left navigation menu, click Gen AI studio.
Click AI asset endpoints.
Click the Models as a service tab.

The Models as a service page displays a table of all models published to MaaS. Each model shows:
- Model deployment name (used in API requests)
- Status (Ready, Not ready, or Unknown)
- Inference endpoint with a View link
- Actions menu with options like Add to playground

Models published to MaaS display a MaaS badge next to the name.

Verification

Verify that at least one model appears in the list with a Ready status and MaaS badge.
If no models appear, contact your administrator to verify that models are deployed and your tier has access.

View your tier and limits

Check your current tier assignment and understand your resource limits.

Procedure

On the Models as a service page, click the Tier information link near the top of the page.
Review your tier details in the popup dialog:
- Tier name: Free, Premium, or Enterprise
- Groups you belong to
- Priority level
- Rate limits: requests per time period and token quotas

Tip

If you frequently hit rate limits or need higher quotas, contact your administrator to request access to a higher tier.

Verification

Verify that the Tier information dialog displays your tier name and associated limits.

Generate an authentication token

Generate a token to authenticate your API requests to models.

Procedure

On the Models as a service page, locate the model you want to access.
Click View in the Inference endpoint column.
In the MaaS route dialog, copy the route URL and store it securely.

You will need this URL to make API calls to the model.
Click Generate API Key.
Copy the generated token immediately and store it securely.

Important

The token is displayed only once. If you lose the token, you must generate a new one.

Verification

Test the token by making an API call to list available models:

$ export MAAS_TOKEN="<your_generated_token>"
$ export CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')

$ curl -X GET "https://maas.${CLUSTER_DOMAIN}/v1/models" \
  -H "Authorization: Bearer ${MAAS_TOKEN}"

where:

<your_generated_token>: Specifies the authentication token you copied in the previous step.

If the token is valid, the command returns a JSON list of available models.

Make API calls to models

Use your token to make requests to models through the MaaS API.

Procedure

If you did not copy the MaaS route URL during token generation, get it using one of the following methods:

Copy the MaaS route URL from the MaaS route dialog in the dashboard.

Alternatively, construct the URL from your cluster domain:
```
$ export CLUSTER_DOMAIN=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
$ export MAAS_URL="https://maas.${CLUSTER_DOMAIN}"
$ echo $MAAS_URL
```

List available models using the /v1/models endpoint:

$ curl -X GET "${MAAS_URL}/v1/models" \
  -H "Authorization: Bearer ${MAAS_TOKEN}"

Example response in OpenAI-compatible format:

{
  "object": "list",
  "data": [
    {
      "id": "facebook-opt-125m",
      "object": "model",
      "created": 1234567890,
      "owned_by": "llm",
      "ready": true
    },
    {
      "id": "llama-2-7b-chat",
      "object": "model",
      "created": 1234567890,
      "owned_by": "llm",
      "ready": true
    }
  ]
}
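
A client might filter this response to keep only the models that are ready to serve requests. The following is a minimal sketch that assumes the response has the shape shown in the example, including the `ready` field. The `ready_model_ids` helper is a hypothetical name, and the sample marks one model as not ready to show the filtering.

```python
import json


def ready_model_ids(response_body):
    """Return the IDs of models whose "ready" field is true."""
    payload = json.loads(response_body)
    return [m["id"] for m in payload.get("data", []) if m.get("ready")]


# Sample body based on the example response, with the second model marked
# as not ready for demonstration purposes.
sample = """
{
  "object": "list",
  "data": [
    {"id": "facebook-opt-125m", "object": "model", "created": 1234567890, "owned_by": "llm", "ready": true},
    {"id": "llama-2-7b-chat", "object": "model", "created": 1234567890, "owned_by": "llm", "ready": false}
  ]
}
"""
print(ready_model_ids(sample))
```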

Call a model using the chat completions endpoint:

$ curl -X POST "${MAAS_URL}/llm/<model_name>/v1/chat/completions" \
  -H "Authorization: Bearer ${MAAS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model_name>",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'

where:

<model_name>

Specifies the model deployment name from the Models as a service page, such as facebook-opt-125m or llama-2-7b-chat.

Example response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is a type of computing that uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
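
The same request can be prepared from Python. The sketch below uses only the standard library and assumes the URL pattern and response shape shown in the examples above. `build_chat_request` and `extract_reply` are hypothetical helper names, not part of any MaaS client library.

```python
import json
import urllib.request


def build_chat_request(maas_url, token, model_name, prompt, max_tokens=150):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{maas_url}/llm/{model_name}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response):
    """Return the assistant text and total token count from a decoded response."""
    content = response["choices"][0]["message"]["content"]
    total_tokens = response.get("usage", {}).get("total_tokens")
    return content, total_tokens
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the response body with `json.load` before calling `extract_reply`.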

Verification

Test the /v1/models endpoint and confirm you receive a list of available models.
Make a chat completion request to a model and verify that you receive a response with the expected JSON structure.
Check the response headers for rate limit information (X-RateLimit-Limit, X-RateLimit-Remaining).

If you receive authentication errors, verify that your token is valid and has not expired.

Rate limit responses

When you exceed your tier’s rate limits, MaaS returns specific error responses to help you manage your usage.

Request rate limit exceeded

If you exceed the maximum number of requests per minute:

Example error response

{
  "error": {
    "message": "Rate limit exceeded. You have exceeded the maximum number of requests per minute for your tier.",
    "type": "rate_limit_error",
    "code": 429
  }
}

Response headers (useful for retry logic)

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 42
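
One way to turn these headers into a wait time is sketched below. It assumes that `Retry-After` carries a number of seconds and that `X-RateLimit-Reset` is a Unix timestamp, as the sample values suggest. Verify both assumptions against your deployment. `seconds_until_retry` is a hypothetical helper name.

```python
import time


def seconds_until_retry(headers, now=None, default=60.0):
    """Estimate how long to wait before retrying a rate-limited request.

    headers is a mapping of header names to string values. A plain dict is
    case-sensitive, so the names must match the casing used here.
    """
    now = time.time() if now is None else now

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall through.

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            # Assumption: the reset value is seconds since the Unix epoch.
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    return default
```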

Handling rate limits in Python

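import time
import requests

def make_request_with_retry(url, headers, data, max_retries=3):
    """POST data as JSON, waiting and retrying when the API returns HTTP 429."""
    for attempt in range(max_retries):
        # The timeout keeps a stalled connection from blocking indefinitely.
        response = requests.post(url, headers=headers, json=data, timeout=60)

        if response.status_code == 429:
            # Honor the Retry-After hint (in seconds); wait 60 seconds if it is absent.
            retry_after = int(response.headers.get('Retry-After', 60))
            print(f"Rate limited. Retrying after {retry_after} seconds...")
            time.sleep(retry_after)
            continue

        return response

    raise RuntimeError("Max retries exceeded")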

Token quota exceeded

If you exceed the maximum number of tokens per minute:

Example error response

{
  "error": {
    "message": "Token quota exceeded. You have consumed the maximum number of tokens allowed per minute for your tier.",
    "type": "quota_error",
    "code": 429
  }
}

Mitigation strategies

Reduce max_tokens in your requests
Implement exponential backoff retry logic
Batch requests with longer delays between them
Request a higher tier from your administrator if you consistently hit limits
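
The exponential backoff strategy from the list above might look like the following sketch. The `send` callable, the attempt count, and the delay values are assumptions chosen for illustration. Tune them to your tier limits.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Return the maximum delay for a zero-based attempt: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    send is any zero-argument callable that returns an object with a
    status_code attribute, such as a requests.Response.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Full jitter: sleep a random time up to the current delay so
            # that many clients do not retry in lockstep.
            sleep(random.uniform(0, backoff_delay(attempt, base, cap)))
    raise RuntimeError("Still rate limited after all attempts")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.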

Test models in the playground

Use the built-in playground to test prompts and observe model responses without writing code.

Procedure

On the Models as a service page, locate the model you want to test.
In the Actions column, click the menu and select Add to playground.
In the playground interface, enter your prompt in the message field.
Adjust parameters as needed:

Temperature: Controls randomness. Use 0.0 for deterministic responses or 1.0 for more creative responses.

Max tokens: Maximum length of the response.

Top P: Nucleus sampling threshold.
Click Send.
Review the model’s response.
Experiment with different prompts and parameters to understand the model’s behavior.

The playground is useful for:

Testing prompt engineering strategies
Comparing model responses
Validating that a model is suitable for your use case before integrating it into your application
Demonstrating model capabilities to stakeholders

Verification

Verify that the model responds to your prompts in the playground.
Check that adjusting parameters like temperature and max tokens affects the model’s responses as expected.

Best practices

Follow these recommendations to effectively use MaaS.

Security

Never commit tokens to version control. Use environment variables or secret management systems.
Regenerate tokens periodically. Generate new tokens regularly and discard old ones.
Use named tokens for tracking. When generating tokens, note which application or purpose they are for.
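
As an example of keeping tokens out of source code, the following sketch reads the token from an environment variable at run time. It reuses the `MAAS_TOKEN` variable name from the earlier shell examples. The `auth_headers` helper is a hypothetical name.

```python
import os


def auth_headers(env_var="MAAS_TOKEN"):
    """Build the Authorization header from a token stored in the environment."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable to your MaaS token")
    return {"Authorization": f"Bearer {token}"}
```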

Performance

Implement retry logic with exponential backoff. Handle rate limit errors gracefully.
Cache responses when appropriate. If you are making identical requests, consider caching results.
Set reasonable max_tokens values. Do not request more tokens than you need, as token quotas are enforced.
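
The caching recommendation could be implemented as a small in-memory cache keyed by the exact request payload. This is a sketch under assumptions: `ResponseCache` is a hypothetical class, entries never expire, and reusing a response is only appropriate when you do not need a fresh completion for an identical request.

```python
import hashlib
import json


class ResponseCache:
    """A small in-memory cache keyed by the exact request payload."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(payload):
        # sort_keys makes the key independent of dictionary ordering.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, payload, call):
        """Return the cached result for payload, calling call(payload) on a miss."""
        k = self.key(payload)
        if k not in self._store:
            self._store[k] = call(payload)
        return self._store[k]
```

Here `call` stands for whatever function sends the request, so each distinct payload consumes tokens only once.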

Cost management

Check rate limit headers. Review X-RateLimit-Remaining in API responses to track your usage against tier limits.
Use the playground for testing. Test prompts in the playground before implementing them in code.
Optimize prompts. Shorter, more focused prompts consume fewer tokens.
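
To track usage against your tier limits, a client could compute the share of the request budget that remains from the rate limit headers. The sketch below assumes that both headers contain integer values, as in the sample headers shown earlier. `remaining_fraction` is a hypothetical helper name.

```python
def remaining_fraction(headers):
    """Return the share of the request budget still available, or None if unknown."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
    except (KeyError, ValueError):
        # Headers are missing or not numeric, so usage cannot be determined.
        return None
    if limit <= 0:
        return None
    return remaining / limit
```

An application might slow down or defer non-urgent requests when the returned value drops below a threshold of its choosing.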

Additional resources

Models-as-a-Service user access troubleshooting

Models-as-a-Service user access troubleshooting

Use this reference to diagnose and resolve common issues when accessing models through Models-as-a-Service (MaaS).

Authentication errors: 401 Unauthorized

Symptom

{
  "error": {
    "message": "Invalid or expired token",
    "type": "authentication_error",
    "code": 401
  }
}

Possible causes and solutions

Token expired: Generate a new token from the dashboard.
Incorrect token format: Ensure you’re using the Authorization: Bearer <token> header format.
Token not issued by MaaS: Do not use your OpenShift Container Platform token directly. Generate a MaaS-specific token.

Authorization errors: 403 Forbidden

Symptom

{
  "error": {
    "message": "Access denied. Your tier does not have permission to access this model.",
    "type": "authorization_error",
    "code": 403
  }
}

Possible causes and solutions

Model not available to your tier: Contact your administrator to request access or upgrade your tier.
Model exists but RBAC not configured: Ask your administrator to verify that RoleBindings are created for your tier.

Model not found: 404

Symptom

{
  "error": {
    "message": "Model not found",
    "type": "not_found_error",
    "code": 404
  }
}

Possible causes and solutions

Incorrect model name: Verify the model name using the /v1/models endpoint.
Model not deployed: Ask your administrator to check if the model is deployed and ready.
Typo in URL: Ensure the URL format is correct: https://maas.<cluster_domain>/llm/<model_name>/v1/chat/completions
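
An application can surface these troubleshooting hints next to the server's own error message. The following sketch assumes the error body has the shape shown in the symptoms above. The hint text paraphrases the entries in this reference, and `describe_error` is a hypothetical helper name.

```python
import json

# Hints paraphrased from the troubleshooting entries in this reference.
HINTS = {
    401: "Generate a new MaaS token and send it as 'Authorization: Bearer <token>'.",
    403: "Ask your administrator to grant your tier access to this model.",
    404: "Check the model name against the /v1/models endpoint and verify the URL.",
}


def describe_error(status_code, body):
    """Combine the server's error message with a troubleshooting hint."""
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        # The body is not JSON or does not have the expected structure.
        message = "Unrecognized error body"
    hint = HINTS.get(status_code, "Contact your administrator.")
    return f"{status_code}: {message} {hint}"
```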

Exceeded request rate limits

If you exceed the maximum number of requests per minute:

Example error response

{
  "error": {
    "message": "Rate limit exceeded. You have exceeded the maximum number of requests per minute for your tier.",
    "type": "rate_limit_error",
    "code": 429
  }
}

Response headers (useful for retry logic)

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1234567890
Retry-After: 42

Handling rate limits in Python

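import time
import requests

def make_request_with_retry(url, headers, data, max_retries=3):
    """POST data as JSON, waiting and retrying when the API returns HTTP 429."""
    for attempt in range(max_retries):
        # The timeout keeps a stalled connection from blocking indefinitely.
        response = requests.post(url, headers=headers, json=data, timeout=60)

        if response.status_code == 429:
            # Honor the Retry-After hint (in seconds); wait 60 seconds if it is absent.
            retry_after = int(response.headers.get('Retry-After', 60))
            print(f"Rate limited. Retrying after {retry_after} seconds...")
            time.sleep(retry_after)
            continue

        return response

    raise RuntimeError("Max retries exceeded")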

Exceeded token quotas

If you exceed the maximum number of tokens per minute:

Example error response

{
  "error": {
    "message": "Token quota exceeded. You have consumed the maximum number of tokens allowed per minute for your tier.",
    "type": "quota_error",
    "code": 429
  }
}

Mitigation strategies

Reduce max_tokens in your requests
Implement exponential backoff retry logic
Batch requests with longer delays between them
Request a higher tier from your administrator if you consistently hit limits

Persistent rate limit errors

Symptom

You continue to receive 429 errors even after waiting.

Possible causes and solutions

Shared tier quota: Rate limits are per-user, but if many users in your tier are making requests simultaneously, the aggregate tier limit may be reached. Contact your administrator.
Incorrect retry logic: Ensure you’re respecting the Retry-After header value.
Multiple applications using the same token: Each application should have its own API key to avoid unexpected rate limiting.