Custom Notebook in RHOAI

Need for custom notebooks

RHOAI provides many pre-built notebooks, such as the Standard Data Science, TensorFlow, and PyTorch notebooks. Data scientists can spin up these notebooks and start running their experiments and creating their machine learning models without much setup.

These built-in notebooks come with most of the packages needed for typical Data Science projects. But the world of Data Science and Machine Learning is vast, and there is often a need to install additional Python packages.

In this case, the user can run the pip install command inside the notebook, but this can lead to issues such as kernel mismatches or limited visibility and control.
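For example, a package can be installed from the workbench terminal or a notebook cell, but it lands only in the running container and is gone after a restart (a quick sketch, using the art package that this lab later bakes into a custom image):

# Installs into the running container only; the package disappears
# when the workbench pod is restarted.
pip install art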

The recommended solution in this case is to create a custom notebook image. RHOAI makes it easy to import a notebook image from any image registry.

In the AI267 course, the content teaches you how to create a custom notebook image by taking an existing notebook image and adding additional Python packages to it. This is a great way to create a custom notebook image, but what if you or your customer needs to create a custom notebook image from scratch, or needs to customize the image even further? Your customer may need to know exactly what is in the notebook image itself, or they may need to add additional software (such as database drivers) to it. We can use the notebooks repository at https://github.com/opendatahub-io/notebooks/tree/main to create a custom notebook image from the ground up. This is the repository from which ODH and RHOAI workbench images are built. Let’s use it to create a custom notebook image.

Notebooks Repository

The notebooks repository uses a Makefile to create the notebook images. A Makefile contains a set of directives used by the make build automation tool to build a set of target images. The repository uses a build chain: the base image is built first, and the other images are built on top of it.
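If you want to see the chain for a given target without building anything, make can do a dry run (a quick sketch; the target name is the one used later in this lab):

# Dry run: print the commands make would execute, without running them.
# The output shows the base image being built before the dependent image.
make -n jupyter-datascience-ubi9-python-3.11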

Take a look at the developer guide to understand the notebooks build chain.

The notebooks repository also has a wiki describing the different image variations: https://github.com/opendatahub-io/notebooks/wiki/Workbenches

Steps to create a custom notebook

You can either use your local machine or a container to build the custom notebook image. The steps are the same for both except for the setup of the environment.

Using your dev workstation:

The following steps make sure you have the necessary tools to run the notebooks repository on your local machine; a version-check sketch follows the list.

  1. Make sure you have Python 3.11 installed on your machine. You can use the pyenv tool to manage your Python versions. Also install pip for installing Python packages. Tested with pip 25.0.1.

  2. Install make (or gmake) on your local machine. Make is a build automation tool that automatically builds executable programs and libraries from source code by reading files called Makefiles which specify how to derive the target program. We need make version 4+ to build the notebooks repository.

  3. Install pipenv on your local machine. Pipenv is a tool that aims to bring the best of all packaging worlds (bundled dependencies, virtual environments, and package management) to the Python world. It automatically creates and manages a virtual environment for your projects, as well as adds/removes packages from your Pipfile as you install/uninstall packages. Tested with version 2023.12.1

  4. Install go on your local machine. The Go programming language is an open source project to make programmers more productive. Tested with version 1.24.0

  5. Install podman on your local machine. Podman is a tool to manage OCI containers and pods. It is a daemonless container engine for developing, managing, and running OCI containers on your workstation. Podman is a drop-in replacement for Docker. Tested with version 5.3.0
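Once everything is installed, a quick check such as the following confirms the toolchain is ready (the versions in the comments are the ones this lab was tested with; newer versions generally work too):

# Verify the build toolchain is in place.
python3.11 --version   # Python 3.11.x
pip --version          # pip 25.0.1
make --version         # GNU Make 4.x or later
pipenv --version       # pipenv, version 2023.12.1
go version             # go1.24.0
podman --version       # podman version 5.3.0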

Using Podman to run the builder container:

We have created a container image that contains all the necessary tools to build the notebooks repository. It is based on the https://github.com/containers/image_build repository, and you can find its Containerfile here: https://github.com/redhat-ai-services/rhoai-platform-foundation-bootcamp-instructions/tree/main/custom-notebook-builder-container/podman/Containerfile. The image is based on the latest Fedora image and contains podman, git, make, which, go, pip, python3.11, and pipenv.

The image has already been built for you and is located here: quay.io/asheet/custom-nb-builder-container:latest.

We can run this image using Podman and build/push our custom notebooks from within the container.

  1. Run the following command to start the container with Podman:

podman run -it --privileged --name custom-notebook-builder quay.io/asheet/custom-nb-builder-container:latest

If running on an Apple Silicon (M-series) Mac, you will need to add the --platform linux/amd64 flag to run the container.

podman run -it --privileged --platform linux/amd64 --name custom-notebook-builder quay.io/asheet/custom-nb-builder-container:latest

This starts the container and drops you into a shell inside it.
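If you exit the shell, the container stops. It can be restarted and re-attached later instead of being recreated:

# Restart the stopped builder container and re-attach to its shell.
podman start -ai custom-notebook-builder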

Using the notebooks repository to create a custom notebook image

  1. Log into quay.io from your web browser.

    If you get an error saying your e-mail address is already associated with an existing Quay account:

    The e-mail address your-username@redhat.com is already associated with an existing Quay account. Please log in with your username and password and associate your Red Hat account to use it in the future.

    This error can be remedied by the following steps:

    • Please go to https://recovery.quay.io/signin/ and sign in with your existing Quay.io email address and password.

    • Once in, open the account settings, click "External logins" on the left side, and then click the "Attach" link.

    • Go to quay.io/signin and log in with your Red Hat credentials; the RHSSO button should work now. If needed, try to detach and reattach while in recovery mode. All accounts in Quay must have unique e-mail addresses and usernames. The issue here is that a user with your e-mail address already exists in Quay’s database. Users are not linked across login methods in Quay’s database, so logging in via the RHSSO button is treated as a brand-new user coming in; Quay then checks whether the e-mail address is unique, finds that it is not, and errors out.

  2. Clone the notebooks git repository with the following command:

    git clone https://github.com/opendatahub-io/notebooks.git
  3. Navigate into the folder you just cloned.

    cd notebooks
  4. Log into the quay.io registry from your CLI.

    Generate a quay.io CLI password from the settings:

    Log into quay.io and go to Account Settings in the top right. In the Account Settings, click Generate Encrypted Password.

    quay cli password

    Enter your password and then choose the Podman Login or Docker Login tab. Use this when logging into quay.io from Podman or Docker.

    podman login -u='username' -p='password' quay.io
  5. Open and read the Makefile and try to understand the build chain and how the Makefile works.

  6. Let’s build an image from the notebooks project, specifically the Jupyter Data Science UBI9 Python 3.11 image.

  7. Before we build, let’s modify the image by adding a Python package to the jupyter-datascience-ubi9-python-3.11 image.

  8. Change directory to the jupyter/datascience/ubi9-python-3.11 directory.

    cd jupyter/datascience/ubi9-python-3.11
  9. Open and edit the Pipfile in the jupyter/datascience/ubi9-python-3.11 directory. This file contains the list of Python packages that are installed in the image.

  10. Add a new Python package to the Pipfile. For example, let’s add the art package to the Pipfile so we can make fancy ASCII art in our notebook.

    After line 22, add the following line and save the file:

    art = "~=6.4.0"
  11. Use pipenv lock to add the new package to the Pipfile.lock. This regenerates the lock file to include the new package. Run the following command:

    pipenv lock
  12. We now need to update the requirements.txt file. The notebooks repository has a handy script that will do this for us. Run the following command (still in the jupyter/datascience/ubi9-python-3.11 directory):

    ../../../scripts/sync-requirements-txt.sh

    You can now see that the art package has been added to the requirements.txt file, along with updates to some of the existing packages. (A sanity-check sketch follows this list.)

  13. Now that we have added the new package to the image definition, we need to build the image. Change directory to the root of the repository. After running this command you should be in the notebooks directory.

    cd ../../../
  14. Let’s build and push the jupyter-datascience-ubi9-python-3.11 image. Running make will build the image and push it to your quay.io repository. Run the following command:

    make jupyter-datascience-ubi9-python-3.11 -e IMAGE_REGISTRY=quay.io/{quay_id}/workbench-images -e RELEASE=2024b

    If using gmake:

    gmake jupyter-datascience-ubi9-python-3.11 -e IMAGE_REGISTRY=quay.io/{quay_id}/workbench-images -e RELEASE=2024b

    Note: If you’re on an Apple Silicon (M-series) Mac, you need to build with --platform linux/amd64. In the Makefile, add the --platform linux/amd64 build argument to the container build command by editing line 69 of the Makefile to read: $(eval BUILD_ARGS := --platform linux/amd64)

    This takes some time. The base image is built first, and the target image is then built on top of it. The image is then pushed to the quay.io registry under your account, in the workbench-images repository.

  15. Check your quay registry to see the image you just built: https://quay.io/repository/{quay_id}/workbench-images?tab=tags

  16. A new repository named workbench-images is created in your quay.io account. By default it is created as a private repository; convert it into a public repository in the settings, as described below.
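Before moving on, a couple of quick checks confirm that the package edits and the build landed (a sketch; run from the root of the notebooks repository):

# Confirm the new package made it through the dependency pipeline.
grep -n '"art"' jupyter/datascience/ubi9-python-3.11/Pipfile.lock
grep -in '^art' jupyter/datascience/ubi9-python-3.11/requirements.txt

# Confirm the image exists locally; it should also appear in quay.io.
podman images | grep workbench-images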

How to make a quay.io repository public
  1. Go to your quay.io repository and click on the Settings tab.

  2. Scroll down to the Repository Visibility section.

  3. Click on the Make Public button.

  4. Confirm the action by clicking on the Make Public button again.

  5. The repository is now public and can be accessed by anyone; a quick way to confirm this is shown below.
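An unauthenticated pull is a quick way to confirm the repository is really public (a sketch; substitute your own {quay_id} and image tag):

# Log out first so the pull is unauthenticated, then pull the image.
podman logout quay.io
podman pull quay.io/{quay_id}/workbench-images:jupyter-datascience-ubi9-python-3.11-2024b_{update_this}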

Create a custom-workbench in RHOAI

  1. Let’s now add our newly built image (the one in your quay.io repository) to RHOAI. In the parasol-insurance tenant (ai-accelerator/tenants/parasol-insurance), create a directory named custom-workbench.

  2. Create the base and overlays directories inside the custom-workbench directory

  3. Create a file named kustomization.yaml inside the custom-workbench/base directory with the following content:

    kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    resources:
      - custom-workbench-is.yaml
      - custom-workbench-pvc.yaml
      - custom-workbench-notebook.yaml
  4. Create a file named custom-workbench-is.yaml inside the custom-workbench/base directory with the following content:

    custom-workbench-is.yaml
    kind: ImageStream
    apiVersion: image.openshift.io/v1
    metadata:
      annotations:
        opendatahub.io/notebook-image-creator: admin
        opendatahub.io/notebook-image-desc: This is a custom notebook for running the parasol insurance code
        opendatahub.io/notebook-image-name: Custom Notebook
        opendatahub.io/notebook-image-url: 'quay.io/{quay_id}/workbench-images:jupyter-datascience-ubi9-python-3.11-2024b_{update_this}'
        opendatahub.io/recommended-accelerators: '[]'
      name: custom-notebook
      namespace: redhat-ods-applications
      labels:
        app.kubernetes.io/created-by: byon
        opendatahub.io/dashboard: 'true'
        opendatahub.io/notebook-image: 'true'
    spec:
      lookupPolicy:
        local: true
      tags:
        - name: latest
          annotations:
            opendatahub.io/notebook-python-dependencies: '[]'
            opendatahub.io/notebook-software: '[]'
            openshift.io/imported-from: 'quay.io/{quay_id}/workbench-images:jupyter-datascience-ubi9-python-3.11-2024b_{update_this}'
          from:
            kind: DockerImage
            name: 'quay.io/{quay_id}/workbench-images:jupyter-datascience-ubi9-python-3.11-2024b_{update_this}'
          importPolicy:
            importMode: Legacy
          referencePolicy:
            type: Source

    Replace {quay_id} with your quay.io ID, and {update_this} with the tag suffix shown in your quay.io repository. This ensures that the image stream you are creating references the image you pushed to quay.io.

  5. Create a file named custom-workbench-pvc.yaml inside the custom-workbench/base directory with the following content:

    custom-workbench-pvc.yaml
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: custom-workbench
      namespace: parasol-insurance
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 40Gi
      volumeMode: Filesystem
  6. Create a file named custom-workbench-notebook.yaml inside the custom-workbench/base directory with the following content:

    custom-workbench-notebook.yaml
    apiVersion: kubeflow.org/v1
    kind: Notebook
    metadata:
      annotations:
        notebooks.opendatahub.io/inject-oauth: 'true'
        opendatahub.io/image-display-name: Datascience notebook
        notebooks.opendatahub.io/oauth-logout-url: ''
        opendatahub.io/accelerator-name: ''
        openshift.io/description: ''
        openshift.io/display-name: custom-workbench
        notebooks.opendatahub.io/last-image-selection: 'custom-notebook:latest'
        argocd.argoproj.io/sync-options: ServerSideApply=true
      name: custom-workbench
      namespace: parasol-insurance
    spec:
      template:
        spec:
          affinity: {}
          containers:
            - name: custom-workbench
              image: 'image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/custom-notebook:latest'
              resources:
                limits:
                  cpu: '2'
                  memory: 8Gi
                requests:
                  cpu: '1'
                  memory: 8Gi
              readinessProbe:
                failureThreshold: 3
                httpGet:
                  path: /notebook/parasol-insurance/custom-workbench/api
                  port: notebook-port
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 5
                successThreshold: 1
                timeoutSeconds: 1
              livenessProbe:
                failureThreshold: 3
                httpGet:
                  path: /notebook/parasol-insurance/custom-workbench/api
                  port: notebook-port
                  scheme: HTTP
                initialDelaySeconds: 10
                periodSeconds: 5
                successThreshold: 1
                timeoutSeconds: 1
              env:
                - name: NOTEBOOK_ARGS
                  value: |-
                    --ServerApp.port=8888
                    --ServerApp.token=''
                    --ServerApp.password=''
                    --ServerApp.base_url=/notebook/parasol-insurance/custom-workbench
                    --ServerApp.quit_button=False
                    --ServerApp.tornado_settings={"user":"user1","hub_host":"","hub_prefix":"/projects/parasol-insurance"}
                - name: JUPYTER_IMAGE
                  value: 'image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/custom-notebook:latest'
                - name: PIP_CERT
                  value: /etc/pki/tls/custom-certs/ca-bundle.crt
                - name: REQUESTS_CA_BUNDLE
                  value: /etc/pki/tls/custom-certs/ca-bundle.crt
                - name: SSL_CERT_FILE
                  value: /etc/pki/tls/custom-certs/ca-bundle.crt
                - name: PIPELINES_SSL_SA_CERTS
                  value: /etc/pki/tls/custom-certs/ca-bundle.crt
                - name: GIT_SSL_CAINFO
                  value: /etc/pki/tls/custom-certs/ca-bundle.crt
              ports:
                - containerPort: 8888
                  name: notebook-port
                  protocol: TCP
              imagePullPolicy: Always
              volumeMounts:
                - mountPath: /opt/app-root/src
                  name: custom-workbench
                - mountPath: /dev/shm
                  name: shm
                - mountPath: /etc/pki/tls/custom-certs/ca-bundle.crt
                  name: trusted-ca
                  readOnly: true
                  subPath: ca-bundle.crt
              workingDir: /opt/app-root/src
          enableServiceLinks: false
          serviceAccountName: custom-workbench
          volumes:
            - name: custom-workbench
              persistentVolumeClaim:
                claimName: custom-workbench
            - emptyDir:
                medium: Memory
              name: shm
            - configMap:
                items:
                  - key: ca-bundle.crt
                    path: ca-bundle.crt
                name: workbench-trusted-ca-bundle
                optional: true
              name: trusted-ca
  7. Create a directory named parasol-insurance-dev under the custom-workbench/overlays directory

  8. Create a file named kustomization.yaml inside the custom-workbench/overlays/parasol-insurance-dev directory with the following content:

    kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    resources:
      - ../../base
  9. Push the changes to the Git repository. (A quick way to validate the manifests before pushing is shown after this list.)

  10. Navigate to the parasol-insurance data science project in RHOAI, and notice the custom-workbench notebook available in the Workbenches tab:

    Custom workbench
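As mentioned in step 9, it can be worth rendering the kustomization locally before pushing, to catch YAML mistakes early (a sketch; run from the tenants/parasol-insurance directory, and oc kustomize works as well):

# Render the overlay locally to catch indentation or reference errors
# before ArgoCD tries to sync the manifests.
kustomize build custom-workbench/overlays/parasol-insurance-dev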

Verify the new custom workbench spins up successfully and the art package is available in the notebook.

Use pip list in the notebook terminal to verify the art package is installed.
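For example, from a terminal tab inside the workbench:

# Filter the installed package list for the art package.
pip list | grep -i art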

Create a new notebook and run the following code to see the ASCII art:

# Import the text2art helper from the art package and render ASCII art.
from art import text2art

hello_world = text2art("Hello World")
print(hello_world)

Please add an overlay for the parasol-insurance-prod environment as well. This is not covered in this lab but is a good exercise to do.

To check your work, please refer to This Branch

Questions for Further Consideration

Additional questions that could be discussed for this topic:

  1. How many Python packages are included in your typical data scientist development environment? Are there any packages that are unique to your team?

  2. How do you handle continuous updates in your development environment, given that AI/ML is an evolving landscape where new packages are released all the time and existing packages are updated very frequently?

  3. Can data scientists ask for new packages in a securely controlled development environment?

  4. Where do you store source code for model experimentation and training?

  5. Do you think that cluster storage (such as an OpenShift PVC) is a good permanent location for source code, so that in the event of failure the source is not lost?

  6. How do your teams of data scientists collaborate on notebooks when training models or performing other experiments?