Foundations of Machine Learning Projects
ML Project Lifecycle Intro
Problem Framing & Goals
Data Collection & Prep
Model Selection & Training
Evaluation & Refinement
Supervised Learning Intro
Regression Problems
Classification Problems
Unsupervised Learning Intro
Clustering Problems
Model Deployment
Monitoring & Maintenance
Ethical ML Considerations
Project Success Metrics
ML Project Pitfalls
Setting Up Your ML Environment
Why Python for ML?
Anaconda vs. Miniconda
Installing Miniconda
Your First Conda Environment
Managing Environments
Exporting & Importing Environments
Installing Core Libraries
NumPy: The Array Powerhouse
Pandas: Data's Best Friend
Matplotlib & Seaborn Basics
Scikit-learn: ML Toolkit
Jupyter Notebooks Intro
Cells & Kernels
JupyterLab: The IDE
Tips for Productive ML
Data Acquisition and Initial Exploration
Where to Find Data?
Data Source Considerations
Your First Dataset
Meet Pandas DataFrames
Loading CSV Files
Loading Excel Files
Loading JSON Data
Other Data Formats
Peeking at Your Data
Data's Blueprint: Info & Types
Descriptive Statistics
Counting Categories
Visualizing Distributions
Visualizing Relationships
Correlation Heatmaps
Handling Missing Values and Outliers
Why Data Goes Missing
Spotting Missing Values
Deletion Strategies
Simple Imputation: Mean/Median
Simple Imputation: Mode
Forward/Backward Fill
Advanced Imputation Intro
What's an Outlier?
Visualizing Outliers
Statistical Outlier Detection
Deletion of Outliers
Capping Outliers
Transformation for Outliers
Binning for Outliers
Choosing Your Strategy
Feature Engineering: Encoding Categorical Variables
What's Categorical Data?
Nominal vs. Ordinal
Why Encode Categories?
Label Encoding Explained
Label Encoding with Pandas
Label Encoding with Scikit-learn
One-Hot Encoding Explained
One-Hot Encoding with Pandas
One-Hot Encoding with Scikit-learn
Handling High Cardinality
Encoding New Data
Encoding in Pipelines
Encoding's Model Impact
When to Use Which?
Encoding Best Practices
Feature Engineering: Scaling and Transformation
The Need for Scaling
Algorithms That Care
Scaling vs. Normalization
Standardization Explained
StandardScaler in Action
StandardScaler: Code Walk
Min-Max Explained
MinMaxScaler in Action
MinMaxScaler: Code Walk
When to Standardize
When to Min-Max Scale
Impact on Model Performance
Robust Scaling
Other Transformations
Scaling Pipelines
Scaling Best Practices
Introduction to Supervised Learning: Regression
What is Supervised ML?
Regression: The Basics
Linear Regression Intuition
Simple Linear Regression
Cost Function: MSE
Gradient Descent Intro
Multiple Linear Regression
Assumptions of Linear Regression
Scikit-learn for Regression
Interpreting Coefficients
MAE: Mean Absolute Error
RMSE: Root Mean Squared Error
R-squared: Explained Variance
Regression Workflow: Part 1
Regression Workflow: Part 2
Supervised Learning: Classification Fundamentals
What is Classification?
Binary vs. Multi-Class
Meet Logistic Regression
Binary Logistic Regression
Multi-Class Logistic Regression
Probability Scores
Decision Boundaries
The Classification Threshold
Accuracy: The Basics
Precision & Recall
F1-Score: The Balance
Classification Report
Beyond Basic Metrics
Imbalanced Datasets
Decision Trees and Ensemble Methods
What's a Decision Tree?
Splitting Decisions
Building a Decision Tree
Regression Trees in Action
Visualizing Your Tree
Overfitting: A Tree's Foe
Pruning Your Tree
Hyperparameter Tuning Trees
Why Ensemble?
Bagging: Bootstrap & Aggregate
Random Forests Unpacked
Building a Forest (Classification)
Building a Forest (Regression)
Forest Hyperparameters
Feature Importance with RF
Advanced Ensemble Methods: Gradient Boosting
Boosting: The Core Idea
Gradient Boosting Intuition
GBM: The Algorithm
GBM for Regression
GBM for Classification
Key GBM Hyperparameters
XGBoost: The Champion
XGBoost in Action
LightGBM: The Speedster
LightGBM in Action
Tuning GBM Hyperparameters
Tuning XGBoost Hyperparameters
Tuning LightGBM Hyperparameters
Early Stopping & Cross-Val
Comparing Boosters
Unsupervised Learning: Clustering with K-Means
What is Unsupervised ML?
Clustering: Grouping Data
K-Means: The Basics
K-Means: Step-by-Step
Implementing K-Means
Choosing the Right K
Inertia: Sum of Squares
Silhouette Score Explained
Calculating Silhouette Score
Visualizing Clusters
Customer Segmentation
Image Compression
K-Means++ Initialization
Challenges of K-Means
Beyond K-Means
Dimensionality Reduction with PCA
Why Less is More
Types of Reduction
Meet Principal Components
Variance & Covariance
Eigen-What?
PCA's Core Algorithm
PCA with Scikit-learn
Choosing Components
Visualizing with PCA
PCA as a Preprocessor
Interpreting Components
When to Use PCA
Incremental PCA
Kernel PCA
Beyond PCA
Model Evaluation and Cross-Validation
Why Evaluate ML Models?
Train, Validate, Test
Splitting Your Data
What is Cross-Validation?
K-Fold Cross-Validation
Stratified K-Fold
Accuracy: Friend or Foe?
The Confusion Matrix
Precision & Recall
F1-Score: The Harmonic Mean
ROC Curves & AUC
Plotting ROC & AUC
Precision-Recall Curves
MAE: Mean Absolute Error
RMSE: Root Mean Squared Error
R-squared: Explained Variance
Hyperparameter Tuning and Model Selection
Parameters vs. Hyperparameters
Why Tune Hyperparameters?
Manual Tuning: The Basics
Grid Search Explained
Grid Search in Scikit-learn
Random Search Explained
Random Search in Scikit-learn
Bias and Variance Defined
The Trade-off Explained
Diagnosing Bias & Variance
Mitigating Bias & Variance
Beyond Accuracy
Comparing Models
The Optimal Model
Automated ML (AutoML)
Model Deployment Concepts and Practices
From Notebook to Reality
The ML Deployment Cycle
Model Persistence: Why?
Pickle It!
Joblib for ML Models
Saving Preprocessing Steps
What's an API?
Flask for ML APIs
FastAPI for ML APIs
Containerization Basics
Building a Docker Image
Running Your Container
Deployment Platforms
Monitoring Deployed Models
Retraining & Updates
Ethical Considerations and Bias in ML
What is ML Bias?
Sources of Bias: Data
Sources of Bias: Algorithms
Sources of Bias: Human Factors
Bias vs. Variance Revisited
Fairness in ML
Accountability in ML
Transparency & Explainability
Privacy and Security
Ethical AI Principles
Detecting Bias: Data
Detecting Bias: Models
Mitigating Bias: Pre-processing
Mitigating Bias: In-processing
Mitigating Bias: Post-processing