# Deploying RH Inference Server on OpenShift
In this module, we will use Helm to deploy RH Inference Server on OpenShift. The chart we will use deploys [vLLM](https://docs.vllm.ai/en/latest/) and defaults to the official [Red Hat AI Inference Server](https://docs.redhat.com/en/documentation/red_hat_ai_inference_server) image.
## Prerequisites
- An OpenShift cluster with NVIDIA GPUs available (required by default, though configurable) and configured properly (e.g. with the NVIDIA GPU Operator). This is provided in the lab environment.
- This cluster should be your current context, i.e. you should see the correct cluster when you run:

  ```shell
  oc whoami --show-server
  ```

- You need permission to create namespaces and consume GPUs, but you do not need any higher privileges to deploy these charts (see the optional check after this list).
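If you would like to confirm those permissions before deploying, `oc auth can-i` can check them. This is an optional sketch; the exact resources queried below are illustrative, and your cluster's project-creation flow may differ.

```shell
# Optional pre-flight check (illustrative): verify you can create namespaces
# without needing cluster-admin.
oc auth can-i create namespaces

# On OpenShift, self-service namespace creation usually goes through project requests.
oc auth can-i create projectrequests
```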
## Deploy the RHAIIS Chart
The chart is not currently published to a [Helm repository](https://helm.sh/docs/topics/chart_repository/) because it is evolving quickly, so we will deploy it manually from the workshop repository.
To deploy the `rhaiis_ocp` chart, you first need to update its dependencies (since the charts are not published):

```shell
helm dependency update workshop_code/deploy_vllm/rhaiis_ocp
```
Then, you can quickly deploy with the defaults:

```shell
helm upgrade --install -n rhaiis --create-namespace rhaiis workshop_code/deploy_vllm/rhaiis_ocp
```
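Once the release is installed, you can watch the rollout. The release and namespace names below match the command above; the pod names themselves will vary, and the first start may take a while if the model needs to be downloaded.

```shell
# Check the Helm release status
helm status rhaiis -n rhaiis

# Watch the vLLM pod(s) come up
oc get pods -n rhaiis -w
```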
## Customization
The chart's defaults are sufficient for a normal, default installation of OpenShift with GPU nodes configured in the most common ways.
If you would like to change the model that is served, change the storage method for the model, operate on a pre-downloaded model, or make other customizations, use [Helm values](https://helm.sh/docs/chart_template_guide/values_files/) to override the defaults.
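As a sketch of what that looks like, you can pass a values file with `-f` when upgrading the release. The key and model name below are hypothetical placeholders, not the chart's actual schema; check the chart's `values.yaml` for the real option names before using them.

```shell
# Hypothetical values override: the key below is a placeholder;
# consult the chart's values.yaml for the actual option names.
cat > my-values.yaml <<'EOF'
model: ibm-granite/granite-3.1-8b-instruct   # example model id (illustrative)
EOF

# Re-run the install/upgrade with the overrides applied
helm upgrade --install -n rhaiis --create-namespace rhaiis \
  workshop_code/deploy_vllm/rhaiis_ocp -f my-values.yaml
```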