Lab Guide: Advanced GPU Quota Management and Preemption with Kueue
This lab guide will walk you through setting up a sophisticated resource management scenario on OpenShift using Kueue. You will configure quotas for different teams sharing a common pool of resources and demonstrate how a high-priority team can preempt a lower-priority team’s workload to guarantee access to critical GPUs.
This lab uses a realistic setup with two teams: team-a (high-priority, requires GPUs) and team-b (low-priority, CPU-only), who are part of the same resource-sharing cohort.
1. Problem Statement
In a multi-tenant cluster, managing shared resources presents two major challenges:
- Resource Contention: When the cluster is under heavy load, critical, high-priority jobs (e.g., production training for Team A) might get stuck waiting for resources consumed by lower-priority jobs (e.g., development experiments for Team B).
- Inefficient Resource Sharing: Different teams have varying needs. A mechanism is required to allow teams to borrow idle resources without disrupting another team’s ability to reclaim them when needed.
2. Solution Overview
This lab demonstrates a solution using Kueue to implement a robust system for quota management and preemption.
- Cohort-Based Sharing: Both team-a and team-b will be placed into a single Cohort, allowing them to draw from a common pool of resources defined in a shared-cq (ClusterQueue).
- Dedicated and Borrowable Quotas: team-a-cq will have a guaranteed quota (nominalQuota) for nvidia.com/gpu resources, while team-b-cq will not. Both will borrow CPU/memory from the shared-cq.
- Priority and Preemption: team-a-cq will be configured with a preemption policy. If Team A submits a job and the cohort lacks sufficient CPU resources, Kueue will find and preempt a lower-priority workload from Team B to free up capacity (see the excerpt after this overview).
By the end of this lab, you will have deployed a lower-priority RayCluster, watched it run, and then deployed a higher-priority RayCluster that successfully preempts it.
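The key to this behavior is the preemption stanza on team-a-cq, excerpted here from the full manifest you will apply in step 4.1:

preemption:
  reclaimWithinCohort: Any
  borrowWithinCohort:
    policy: LowerPriority # Preempt lower-priority workloads in the cohort
    maxPriorityThreshold: 100
  withinClusterQueue: LowerPriority

reclaimWithinCohort: Any lets team-a-cq take back capacity that other queues in the cohort are currently borrowing, which is exactly what happens when Team B is holding the shared CPUs.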
3. Prerequisites
- OpenShift AI Operator: Ensure the OpenShift AI Operator is installed.
- GPU Worker Node: You need at least one GPU-enabled worker node in your cluster.
- GPU Node Taint: The GPU node must be tainted to reserve it for GPU workloads. This was done during the bootstrap process. If you need to reapply the taint, use a command like the sketch below.
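The exact command depends on your cluster; this is a minimal sketch assuming the taint key nvidia.com/gpu with effect NoSchedule (matching the tolerations used later in this lab) and GPU nodes carrying the nvidia.com/gpu.present=true label:

oc adm taint nodes -l nvidia.com/gpu.present=true nvidia.com/gpu=:NoSchedule --overwrite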
4. Lab Steps
4.1. 1. Configure the Multi-Team Environment
First, apply all the necessary configuration objects. This includes namespaces, resource flavors, and the Kueue queues with the correct quotas and preemption policies.
- Run the following on the terminal to create the namespaces and resources needed for the lab.
cat <<EOF | oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: team-a
    opendatahub.io/dashboard: "true"
    kueue.openshift.io/managed: "true"
  name: team-a
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: team-b
    opendatahub.io/dashboard: "true"
    kueue.openshift.io/managed: "true"
  name: team-b
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: edit
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: default
    namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: edit
  namespace: team-b
subjects:
  - kind: ServiceAccount
    name: default
    namespace: team-b
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.present: "true"
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "shared-cq"
spec:
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority
      maxPriorityThreshold: 100
    withinClusterQueue: Never
  namespaceSelector: {} # match all
  cohort: "team-ab"
  resourceGroups:
    - coveredResources:
        - cpu
        - memory
      flavors:
        - name: "default-flavor"
          resources:
            - name: "cpu"
              nominalQuota: 6 # This is the shared pool for the cohort
            - name: "memory"
              nominalQuota: 16Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority # Preempt lower-priority workloads in the cohort
      maxPriorityThreshold: 100
    withinClusterQueue: LowerPriority # Preempt lower-priority workloads within this ClusterQueue
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-a
  queueingStrategy: BestEffortFIFO
  cohort: team-ab
  resourceGroups:
    - coveredResources:
        - cpu
        - memory
      flavors:
        - name: default-flavor
          resources:
            - name: cpu
              nominalQuota: 0 # Must borrow CPU from the cohort
            - name: memory
              nominalQuota: 0
    - coveredResources:
        - nvidia.com/gpu
      flavors:
        - name: gpu-flavor
          resources:
            - name: nvidia.com/gpu
              nominalQuota: "1" # Guaranteed GPU quota for Team A
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-cq
spec:
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-b
  queueingStrategy: BestEffortFIFO
  cohort: team-ab
  resourceGroups:
    - coveredResources:
        - nvidia.com/gpu
      flavors:
        - name: gpu-flavor
          resources:
            - name: nvidia.com/gpu
              nominalQuota: "0" # No GPU quota for Team B
              borrowingLimit: "0"
    - coveredResources:
        - cpu
        - memory
      flavors:
        - name: default-flavor
          resources:
            - name: cpu
              nominalQuota: 0 # Must borrow CPU from the cohort
            - name: memory
              nominalQuota: 0
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: team-a
spec:
  clusterQueue: team-a-cq
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: team-b
spec:
  clusterQueue: team-b-cq
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: prod-priority
value: 1000
description: "Priority class for prod jobs"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: dev-priority
value: 100
description: "Priority class for development jobs"
EOF
- Verify the setup by checking the ClusterQueue objects.

oc get cq

You should see team-a-cq, team-b-cq, and shared-cq listed with a status of Active.
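The Active status is reported as a condition on each ClusterQueue. If your oc get cq output does not show it directly, you can query the condition explicitly; this is a sketch using a standard jsonpath expression against the Active condition type that Kueue sets:

oc get clusterqueue team-a-cq -o jsonpath='{.status.conditions[?(@.type=="Active")].status}'

It should print True. An inactive queue usually points to a typo in a flavor or resource name.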
4.2. 2. Deploy the Low-Priority Workload (Team B)
Now, acting as Team B, submit a RayCluster job. This job requests 4 CPU cores (2 for the head node and 2 for the worker), claiming 4 of the 6 CPUs in the shared pool.
- Let’s create the Team B RayCluster using the following command on the terminal.

cat <<EOF | oc create -f -
# Team B is using dev-priority
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  labels:
    kueue.x-k8s.io/queue-name: local-queue
    kueue.x-k8s.io/priority-class: dev-priority # Lower priority
  name: raycluster-dev
  namespace: team-b
spec:
  rayVersion: 2.7.0
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
            resources:
              limits: { cpu: "2", memory: 3G }
              requests: { cpu: "2", memory: 3G }
    rayStartParams: {}
  workerGroupSpecs:
    - groupName: worker-group
      replicas: 1
      minReplicas: 1
      maxReplicas: 1
      template:
        spec:
          containers:
            - name: machine-learning
              image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
              resources:
                limits: { cpu: "2", memory: 3G }
                requests: { cpu: "2", memory: 3G }
      rayStartParams: {}
EOF
- Verify that the job is admitted and running.

Check the Kueue workload status; ADMITTED should be True.

oc get workload -n team-b

Check that the pods are Running.

oc get pods -n team-b -w
At this point, Team B’s job has successfully claimed 4 of the 6 CPUs in the shared cohort, leaving only 2 unused.
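If you want to see this usage reflected in the queues, you can inspect the ClusterQueue status (a sketch; the exact field names in the status output can vary slightly between Kueue versions):

oc describe clusterqueue team-b-cq
oc describe clusterqueue shared-cq

Look for the flavors usage information under Status, which should show roughly 4 CPUs and 6G of memory in use against the cohort’s 6-CPU, 16Gi pool.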
4.3. 3. Deploy the High-Priority Workload (Team A)
Next, as Team A, submit a RayCluster that requires a GPU and 4 CPU cores. With only 2 CPUs left unused in the shared pool, Kueue must preempt Team B’s job to admit it.
- Let’s create the Team A RayCluster using the following command on the terminal.

cat <<EOF | oc create -f -
# Team A is using prod-priority and will preempt Team B because the shared-cq CPU quota is exhausted
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  labels:
    kueue.x-k8s.io/queue-name: local-queue
    kueue.x-k8s.io/priority-class: prod-priority # Higher priority
  name: raycluster-prod
  namespace: team-a
spec:
  rayVersion: 2.7.0
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
            resources:
              limits: { cpu: "2", memory: 3G }
              requests: { cpu: "2", memory: 3G }
    rayStartParams: {}
  workerGroupSpecs:
    - groupName: worker-group
      replicas: 1
      minReplicas: 1
      maxReplicas: 1
      template:
        spec:
          containers:
            - name: machine-learning
              image: quay.io/project-codeflare/ray:2.20.0-py39-cu118
              resources:
                limits: { cpu: "2", memory: 3G, "nvidia.com/gpu": "1" }
                requests: { cpu: "2", memory: 3G, "nvidia.com/gpu": "1" }
          tolerations:
            - key: nvidia.com/gpu
              operator: Exists
              effect: NoSchedule
      rayStartParams: {}
EOF
4.4. 4. Observe and Verify Preemption
This is the key part of the lab. We will watch as Kueue automatically evicts Team B’s workload.
- Watch the status of the workloads in both namespaces. The change should happen within a minute.

oc get workload -A -w

You will see the raycluster-dev workload in team-b switch its ADMITTED status from True to False. Shortly after, the raycluster-prod workload in team-a will switch its ADMITTED status to True.
- Check the pods in both namespaces.

Team B’s pods should now be in the Terminating state.

oc get pods -n team-b -w

Team A’s pods should be in the ContainerCreating or Running state.

oc get pods -n team-a
- To see the explicit preemption message, describe Team B’s workload (replace the workload name below with the actual name shown by oc get workload -n team-b).

oc describe workload -n team-b raycluster-raycluster-dev
Look for the Events section at the bottom. You will see a clear message stating that the workload was Evicted because it was preempted by the higher-priority workload.

Example Event Output:

Events:
  Type     Reason     Age    From             Message
  ----     ------     ----   ----             -------
  Normal   Preempted  2m16s  kueue-admission  Preempted to accommodate a workload (UID: 8b76853e-b03f-4dee-a57e-0a9157b5c8a3, JobUID: 4a7827c1-20c9-461e-b369-5e5d029630ff) due to reclamation within the cohort while borrowing
  Warning  Pending    103s   kueue-admission  Workload no longer fits after processing another workload
  Warning  Pending    103s   kueue-admission  couldn't assign flavors to pod set worker-group: insufficient unused quota for cpu in flavor default-flavor, 2 more needed
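Optionally, you can confirm from Team A’s side that its workload was admitted with the GPU flavor. This is a sketch; substitute the workload name reported by the first command:

oc get workload -n team-a
oc describe workload -n team-a raycluster-raycluster-prod

The admission information in the workload’s status lists each pod set and the flavors it was assigned (default-flavor for CPU/memory, gpu-flavor for nvidia.com/gpu).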
4.5. Cleanup
To clean up all the resources created during this lab, delete the namespaces and then remove the cluster-scoped Kueue objects.
- Delete the namespaces, which will also remove the RayClusters and other namespaced objects.

oc delete ns team-a team-b
- Delete the remaining Kueue objects with the following script.

#!/bin/sh
echo "Deleting all rayclusters"
oc delete raycluster --all --all-namespaces > /dev/null
echo "Deleting all localqueues"
oc delete localqueue --all --all-namespaces > /dev/null
echo "Deleting all clusterqueues"
oc delete clusterqueue --all > /dev/null
echo "Deleting all resourceflavors"
oc delete resourceflavor --all > /dev/null
# Also remove the WorkloadPriorityClass objects created in step 4.1
echo "Deleting all workloadpriorityclasses"
oc delete workloadpriorityclass --all > /dev/null
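To confirm nothing was left behind, list the Kueue objects that should now be gone (assuming the namespaces were deleted in the previous step):

oc get clusterqueue,resourceflavor,workloadpriorityclass
oc get localqueue --all-namespaces

Each command should report that no resources were found.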
5. Conclusion
You have successfully demonstrated a sophisticated resource management scenario using Kueue. You configured a shared resource cohort for two teams with different priorities, and verified that Kueue’s preemption mechanism works as expected, allowing a high-priority workload to claim resources from a running, lower-priority workload.
This powerful capability is crucial for managing expensive resources like GPUs efficiently and fairly in a multi-tenant AI/ML platform.