Intro to Model Quantization & Compression for Gen AI Application Engineers

Master the essential techniques for optimizing generative AI models for speed, efficiency, and a smaller memory footprint, all of which are crucial for deploying Gen AI applications in the real world.

Fundamentals of Model Efficiency and Quantization

Unit 1: The Need for Speed and Size

Unit 2: Introduction to Quantization

Practical Compression and Evaluation for Gen AI Models

Unit 1: Beyond Quantization: Other Compression Techniques

Unit 2: Hands-on with Quantization Tools

Unit 3: Evaluating Compressed Gen AI Models