Intro to Model Quantization & Compression for Gen AI Application Engineers
Master the essential techniques for optimizing Generative AI models for efficiency, speed, and a smaller memory footprint, all of which are crucial for deploying Gen AI applications in the real world.
...
Fundamentals of Model Efficiency and Quantization
Unit 1: The Need for Speed and Size
Unit 2: Introduction to Quantization
Practical Compression and Evaluation for Gen AI Models
Unit 1: Beyond Quantization: Other Compression Techniques