Lab Exercise: Exposing a new model in 3scale

Now that you have seen a bit of 3scale, it is time to put a new model in there, but this time running on a GPU, which will give us the performance we have been missing.

Deploy a new model onto RHOAI

For this we’ll use another example from the ai-accelerator-examples repo, named vllm-modelcar-serverless.

Follow these steps:
  1. Go to your terminal, in the directory of the ai-accelerator-examples repo you previously cloned.

  2. Run ./bootstrap.sh

  3. This time, select the vllm-modelcar-serverless example, number 5 in this output:

    [Screenshot: bootstrap.sh overlay selection output]

    At this point you might be thinking, "wait, is it OK to run bootstrap again?" Yes, it is: the aim of ai-accelerator-examples is to maintain flexibility. Each example is independent of the others, and all we are doing is enhancing our cluster with the new bits that enable rapid experimentation!

  4. This will create a new project, vllm-modelcar-serverless, where a new InferenceService will be created for the granite-3.3-2b-instruct model. It might take 10 minutes or more before the model is ready, since this will trigger a GPU node auto-scale; you can watch its progress as shown below. Meanwhile, we can continue configuring 3scale.
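
While you wait, you can watch the InferenceService from the terminal. A minimal check, assuming the model lands in the vllm-modelcar-serverless project as described above:

    # Watch the InferenceService until the READY column turns True
    oc get inferenceservice -n vllm-modelcar-serverless -w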

Add a new backend & product to 3scale

Now, setting up a new application in the 3scale gateway requires a set of resources and configuration. This time around we won’t use GitOps but a straight manual setup, which works fine for our purpose of testing this new model exposed in 3scale.

  1. Download this oc template file to a local directory: APICast CRDs

  2. Let’s get the URL of the model being served:

    GRANITE_URL=$(oc get -n vllm-modelcar-serverless inferenceservice -o json | jq -r '.items[] | select(.status.url) | .status.url')
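    # Confirm the URL was captured; this should print an https:// address
    echo "$GRANITE_URL"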
  3. Now let’s apply the resources in the downloaded template file:

    oc process -f apicast_crds.yaml -p GRANITE_URL=$GRANITE_URL | oc apply -n 3scale -f -
  4. This has created the backend and product, and promoted them to production in 3scale; you can verify the created resources with the sketch after these steps. Now we need dev1 to subscribe to this application.

    1. Navigate to the 3scale Admin Portal:

      oc get routes -n 3scale -o json | jq -r '.items[] | select(.spec.host | contains("maas-admin")) | "https://"+.spec.host'
    2. Go to Audience → Accounts → Listing → dev1 → Service Subscriptions

[Screenshot: subscribe]
  1. Click Subscribe on the granite-3.3-2b-instruct available service subscription.

  2. Select the Default plan and click Create subscription.

    [Screenshot: createsub]
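
Before moving on, you can sanity-check what the template created. A minimal sketch, assuming the template defines Backend and Product custom resources managed by the 3scale operator:

    # List the Backend and Product resources created from the template
    oc get backends.capabilities.3scale.net,products.capabilities.3scale.net -n 3scale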

Developer creates API Keys for this new application

Now the developer can create an application and obtain an API key from the developer portal. You can follow the same steps you did in the previous section, when we accessed the model in the developer portal. This time, create a new application using the Create New Application button and fill in this form:

[Screenshot: createapp]

You should be able to see a new application with its key that can be used in AnythingLLM.

You can achieve that by following the steps in the previous section: Developer portal Model url, key and name

The model name when configuring AnythingLLM should be just granite, not the full model name granite-3.3-2b-instruct.
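
Before switching back to AnythingLLM, you can verify the key and model name from the terminal. This is a sketch, assuming MODEL_URL and API_KEY hold the gateway URL and key you collected from the developer portal, and that the gateway accepts the key as a Bearer token as in the previous section:

    # List the models exposed through the gateway; the returned id is the name AnythingLLM expects
    curl -s "$MODEL_URL/v1/models" -H "Authorization: Bearer $API_KEY" | jq -r '.data[].id'

    # Send a minimal chat completion using the short model name
    curl -s "$MODEL_URL/v1/chat/completions" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "granite", "messages": [{"role": "user", "content": "Say hello"}]}' \
      | jq -r '.choices[0].message.content'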

And lastly, re-configure AnythingLLM: click the wrench icon at the bottom of the screen and navigate to API Providers → LLM to set the new URL, key, and model name.

[Screenshot: llmsettings]