LLM Compressor, Model Quantization and Sparsification techniques and recipes
Existing lab resources
- Summit lab for LLM compression: https://rhpds.github.io/showroom-summit2025-lb2959-neural-magic/modules/index.html
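
The Summit lab above centers on one-shot quantization recipes with LLM Compressor. A minimal sketch of that flow, assuming the llmcompressor package and an illustrative model, dataset, and calibration budget (import paths vary slightly across llmcompressor releases):

    # One-shot GPTQ quantization to 4-bit weights / 16-bit activations (W4A16).
    # Model, dataset, and calibration settings are illustrative assumptions;
    # tune them to your hardware and accuracy targets.
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    recipe = GPTQModifier(
        targets="Linear",    # quantize the Linear layers...
        scheme="W4A16",      # ...to 4-bit weights with 16-bit activations
        ignore=["lm_head"],  # keep the output head at full precision
    )

    oneshot(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model
        dataset="open_platypus",                   # assumed calibration set
        recipe=recipe,
        output_dir="Llama-3.1-8B-Instruct-W4A16",
        max_seq_length=2048,
        num_calibration_samples=512,
    )

The saved output_dir can then be served directly by vLLM, which picks up the quantization scheme from the model's config files.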
LLM evaluation with GuideLLM
Existing lab resources
- GuideLLM notebook and lab: https://redhatquickcourses.github.io/genai-vllm/genai-vllm/1/rhoai_deploy/rhoai_query.html
- Link to Git repo with code: https://github.com/RedHatQuickCourses/genai-apps.git
- Performance benchmarking with GuideLLM: https://redhatquickcourses.github.io/genai-vllm/genai-vllm/1/rhoai_deploy/guide_llm.html
- evals_workshop (guide with resources): https://github.com/taylorjordanNC/evals_workshop
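
The labs above first deploy a model behind a vLLM OpenAI-compatible endpoint, smoke-test it with a query, and then point GuideLLM at the same endpoint to measure throughput and latency. A minimal sketch of that smoke-test query, assuming a local endpoint URL and served model name:

    # Quick sanity check of a vLLM OpenAI-compatible endpoint before
    # benchmarking it with GuideLLM. The base_url and model name are
    # assumptions; substitute the values from your own deployment.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed vLLM endpoint
        api_key="EMPTY",  # vLLM ignores the key unless one was configured
    )

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed served model name
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)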
Inference Server on Multiple Platforms
Existing lab resources
- RH Inference Server on multiple platforms: https://github.com/redhat-ai-services/inference-service-on-multiple-platforms
- RH Inference Server tutorial: https://docs.google.com/document/d/11-Oiomiih78dBjIfClISSQBKqb0Ij4UJg31g0dO5XIc/edit?usp=sharing
Existing Slides
- PSAP LLM Performance Benchmarking (July 11, 2025): https://docs.google.com/presentation/d/1IXReNsWRUcy1C9nGsnnhkG_H-OG5UQ2nYS2KmrXr340/edit?usp=sharing
Existing lab resources
- Training: vLLM Master Class: https://redhat-ai-services.github.io/vllm-showroom/modules/index.html
- Training: Optimizing vLLM for RHEL AI and OpenShift AI: https://rhpds.github.io/showroom-summit2025-lb2959-neural-magic/modules/index.html
- RH Inference Server docs (key vLLM serving arguments): https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/3.1/html-single/vllm_server_arguments/index#key-server-arguments-server-arguments
- vLLM: Optimizing and Serving Models on OpenShift AI: https://redhatquickcourses.github.io/genai-vllm/genai-vllm/1/index.html
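
The serving-arguments doc above covers the knobs that matter most for throughput and memory: maximum model length, GPU memory utilization, and tensor parallelism. vLLM exposes the same parameters on its offline LLM class, so they can be tried without standing up a server. A minimal sketch, assuming a single-GPU host and an illustrative model:

    # Offline sketch of the same tuning knobs the vLLM server flags expose
    # (--max-model-len, --gpu-memory-utilization, --tensor-parallel-size).
    # The model name and values are illustrative assumptions, not recommendations.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model
        max_model_len=4096,           # cap context length to bound KV-cache size
        gpu_memory_utilization=0.90,  # fraction of GPU memory vLLM may claim
        tensor_parallel_size=1,       # raise to shard the model across GPUs
    )

    params = SamplingParams(temperature=0.0, max_tokens=64)
    outputs = llm.generate(["What does tensor parallelism change?"], params)
    print(outputs[0].outputs[0].text)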