LLM Compressor: Model Quantization and Sparsification Techniques and Recipes

Existing Lab Resources

Potential Topics to Cover in the Lab

LLM Compressor

  • Understanding why you SHOULD NOT quantize your own model (and the small number of use cases where you should)

  • Compressing a model using an existing recipe (see the sketch below)
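
For the recipe topic, a minimal sketch of llm-compressor's one-shot quantization flow, adapted from the upstream quickstart; the model, dataset, and W4A16 scheme are illustrative, and import paths can differ between llm-compressor versions, so check the docs for your install.

    # Sketch: one-shot weight-only quantization with an existing recipe.
    # Based on the vllm-project/llm-compressor quickstart; names may
    # differ across versions of the library.
    from llmcompressor.modifiers.quantization import GPTQModifier
    from llmcompressor.transformers import oneshot

    # Recipe: GPTQ W4A16 quantization of all Linear layers, keeping the
    # output head (lm_head) in full precision.
    recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model
        dataset="open_platypus",                     # calibration dataset
        recipe=recipe,
        output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
        max_seq_length=2048,
        num_calibration_samples=512,
    )

The saved directory loads directly into vLLM, which is one reason to prefer a published recipe over hand-rolled quantization.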

LLM Evaluation with GuideLLM

Potential Topics to Cover in the Lab

Running Load Tests with GuideLLM

  • Understanding common LLM metrics such as TTFT, ITL, and throughput (see the first sketch after this list)

  • Tuning vLLM's --max-num-seqs based on performance test results (see the second sketch after this list)
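
On the metrics topic, the key serving metrics compose in a simple way: end-to-end latency is time-to-first-token (TTFT) plus one inter-token latency (ITL) per additional output token. A minimal sketch with made-up numbers:

    # Sketch: how common LLM serving metrics relate. All values are
    # illustrative, not measurements.
    ttft_s = 0.25        # time to first token (queueing + prefill)
    itl_s = 0.03         # inter-token latency (per decode step)
    output_tokens = 200

    # End-to-end latency: first token, then one ITL per extra token.
    e2e_latency_s = ttft_s + (output_tokens - 1) * itl_s

    # Per-request decode throughput in tokens/second.
    tokens_per_s = output_tokens / e2e_latency_s

    print(f"e2e: {e2e_latency_s:.2f}s, throughput: {tokens_per_s:.1f} tok/s")

For the --max-num-seqs topic, a hedged sketch of sweeping that vLLM flag and load-testing each setting with GuideLLM; GuideLLM is normally run from the shell, the CLI flags shown follow its README but vary by version (check guidellm --help), and the model and endpoint are placeholders.

    # Sketch: sweep vLLM's --max-num-seqs and benchmark each setting
    # with GuideLLM. Verify flags against `vllm serve --help` and
    # `guidellm --help` for your installed versions.
    import subprocess
    import time
    import urllib.request

    def wait_ready(url: str, timeout_s: int = 600) -> None:
        # Crude readiness wait: poll vLLM's /health endpoint.
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            try:
                urllib.request.urlopen(url)
                return
            except OSError:
                time.sleep(5)
        raise TimeoutError(f"server at {url} never became ready")

    for max_seqs in (32, 64, 128):
        server = subprocess.Popen([
            "vllm", "serve", "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            "--max-num-seqs", str(max_seqs),  # cap on concurrent sequences
        ])
        try:
            wait_ready("http://localhost:8000/health")
            subprocess.run([
                "guidellm", "benchmark",
                "--target", "http://localhost:8000",  # placeholder endpoint
                "--rate-type", "sweep",
                "--max-seconds", "120",
                "--data", "prompt_tokens=256,output_tokens=128",
            ], check=True)
        finally:
            server.terminate()
            server.wait()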

Guide with Resources

Inference Server on Multiple Platforms

Existing Slides

Potential Topics to Cover in the Lab

Securing vLLM Endpoints

  • Managing service accounts for other apps (see the sketch below)
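
A minimal sketch of a client app calling a token-protected, OpenAI-compatible vLLM endpoint with its service-account token; the route URL, served model name, and token location are placeholders for the lab environment.

    # Sketch: authenticate to a secured vLLM endpoint with a bearer token.
    # Route URL, model name, and token location are placeholders.
    from openai import OpenAI

    # In-cluster, a pod's service-account token is mounted at this path;
    # other apps might instead receive a token via a Secret or env var.
    with open("/var/run/secrets/kubernetes.io/serviceaccount/token") as f:
        token = f.read().strip()

    client = OpenAI(
        base_url="https://my-vllm-route.example.com/v1",  # placeholder route
        api_key=token,  # sent as the Authorization: Bearer header
    )

    resp = client.chat.completions.create(
        model="tinyllama",  # must match the served model's name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)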

Troubleshooting vLLM Instances

  • Where to find events and logs (see the sketch below)
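
As a starting point for that topic, a sketch of pulling vLLM pod logs and namespace events with oc; the namespace and label selector are placeholders for the lab environment.

    # Sketch: collect vLLM pod logs and recent events on OpenShift.
    # Namespace and label selector are placeholders.
    import subprocess

    ns = "my-llm-namespace"

    # Pod logs: startup config, CUDA/OOM errors, and per-request failures.
    subprocess.run(
        ["oc", "logs", "-n", ns, "-l", "app=vllm", "--tail=200"],
        check=True,
    )

    # Namespace events: scheduling failures, image pulls, failed probes.
    subprocess.run(
        ["oc", "get", "events", "-n", ns, "--sort-by=.lastTimestamp"],
        check=True,
    )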