SQL for AI/ML Engineers in Snowflake: From Beginner to Feature Engineering

Master SQL in Snowflake for AI/ML: data wrangling, feature engineering, and model-ready dataset creation.

Introduction to SQL and Snowflake for AI/ML

Unit 1: SQL and Snowflake: The Big Picture

Unit 2: Setting Up Your Snowflake Environment

Unit 3: Exploring Snowflake's Architecture

SQL Basics: Data Types and Table Creation

Unit 1: SQL Fundamentals

Unit 2: Data Types in Snowflake

Unit 3: Table Creation and Constraints

Data Manipulation: INSERT, UPDATE, and DELETE

Unit 1: Inserting Data into Tables

Unit 2: Updating Data in Tables

Unit 3: Deleting Data from Tables

Data Retrieval: SELECT Statements and Basic Filtering

Unit 1: Selecting Data from Tables

Unit 2: Filtering Data with the WHERE Clause

Unit 3: Sorting and Limiting Results

Advanced Filtering: Operators and Logical Expressions

Unit 1: Comparison Operators

Unit 2: Logical Operators

Unit 3: IN and BETWEEN Operators

Unit 4: LIKE Operator and Pattern Matching

SQL Functions: Data Type Conversion

Unit 1: Introduction to Data Type Conversion

Unit 2: CAST Function

Unit 3: TRY_CAST and TO_* Conversion Functions

Unit 4: Error Handling and Best Practices

SQL Functions: String Manipulation

Unit 1: String Basics in SQL

Unit 2: Trimming and Case Conversion

Unit 3: Finding String Positions

SQL Functions: Date and Time Manipulation

Unit 1: Date Part Extraction

Unit 2: Date Arithmetic

Unit 3: Date and Time Formatting

Unit 4: String to Date Conversion

Handling Missing Values: NULLIF and COALESCE

Unit 1: Understanding and Identifying NULL Values

Unit 2: Replacing Values with NULLIF

Unit 3: Replacing NULL Values with COALESCE

Aggregate Functions: COUNT, AVG, SUM, MIN, MAX

Unit 1: Introduction to Aggregate Functions

Unit 2: Counting with COUNT

Unit 3: Calculating Averages, Sums, and Extremes

Unit 4: Aggregate Functions and GROUP BY

Grouping Data: GROUP BY Clause

Unit 1: Fundamentals of GROUP BY

Unit 2: Aggregate Functions and GROUP BY

Unit 3: Filtering Grouped Data with HAVING

Joining Tables: INNER JOIN

Unit 1: Introduction to INNER JOIN

Unit 2: Practical INNER JOIN Examples

Unit 3: Advanced INNER JOIN Concepts

Joining Tables: LEFT, RIGHT, and FULL OUTER JOIN

Unit 1: Understanding Outer Joins

Unit 2: Practical Applications of Outer Joins

Unit 3: Advanced Outer Join Techniques

Joining Tables: Self-Joins

Unit 1: Self-Joins: The Basics

Unit 2: Advanced Self-Join Techniques

Unit 3: Real-World Self-Join Applications

Subqueries: Introduction and Basic Usage

Unit 1: Subquery Fundamentals

Unit 2: Subqueries in the WHERE Clause

Unit 3: Subqueries in the SELECT Clause

Subqueries: Correlated Subqueries

Unit 1: Correlated Subquery Fundamentals

Unit 2: Advanced Correlated Subquery Techniques

Unit 3: Performance and Alternatives

Views: Creating and Using Views

Unit 1: Introduction to Views

Unit 2: Creating Views

Unit 3: Querying and Managing Views

Unit 4: Advanced View Concepts

Common Table Expressions (CTEs): Introduction and Basic Usage

Unit 1: CTEs: The Basics

Unit 2: Intermediate CTE Usage

Unit 3: CTEs: Scope and Limitations

CTEs: Recursive CTEs

Unit 1: Recursive CTE Fundamentals

Unit 2: Advanced Recursive CTE Techniques

Unit 3: Real-World Applications and Considerations

Window Functions: Introduction and Basic Usage

Unit 1: Introduction to Window Functions

Unit 2: Ranking Window Functions

Unit 3: Value Window Functions

Window Functions: Aggregate Window Functions

Unit 1: Aggregate Window Functions: The Basics

Unit 2: Partitioning and Ordering in Aggregate Windows

Unit 3: Advanced Aggregate Window Function Applications

Data Cleaning: Removing Duplicates

Unit 1: Identifying Duplicates

Unit 2: Removing Duplicates

Unit 3: Impact and Prevention

Data Transformation: Pivoting and Unpivoting Data

Unit 1: Pivoting Fundamentals

Unit 2: Advanced Pivoting Techniques

Unit 3: Unpivoting Techniques

Feature Engineering: Creating New Features from Existing Data

Unit 1: Introduction to Feature Engineering

Unit 2: Creating New Features from Existing Columns

Unit 3: SQL Functions and Operators for Feature Engineering

Unit 4: Applying Domain Knowledge

Feature Engineering: Binning and Discretization

Unit 1: Introduction to Binning and Discretization

Unit 2: Implementing Binning with CASE Statements

Unit 3: Advanced Binning Techniques

Feature Engineering: One-Hot Encoding

Unit 1: One-Hot Encoding Fundamentals

Unit 2: One-Hot Encoding with CASE Statements

Unit 3: Alternative Techniques and Considerations

Feature Engineering: Text Feature Extraction

Unit 1: Text Length and Basic Counts

Unit 2: Substring Extraction

Unit 3: Pattern Matching and Advanced Extraction

Creating Training, Validation, and Test Datasets: Random Sampling

Unit 1: Understanding Train/Validation/Test Splits

Unit 2: Random Sampling in Snowflake

Unit 3: Creating Datasets with Random Sampling

Creating Training, Validation, and Test Datasets: Stratified Sampling

Unit 1: Stratified Sampling Fundamentals

Unit 2: Implementing Stratified Sampling in Snowflake

Unit 3: Creating Training, Validation, and Test Sets

Data Governance: Role-Based Access Control

Unit 1: RBAC Fundamentals

Unit 2: Creating and Managing Roles

Unit 3: User Management and Best Practices

Data Governance: Data Masking

Unit 1: Introduction to Data Masking

Unit 2: Data Masking Techniques

Unit 3: Implementing Data Masking in Snowflake

Unit 4: Advanced Data Masking Concepts

Query Optimization: Understanding Query Execution Plans

Unit 1: Introduction to Query Execution Plans

Unit 2: Analyzing Query Execution Plans

Unit 3: Stages of Query Execution

Unit 4: Advanced Analysis and Optimization

Query Optimization: Indexing Strategies

Unit 1: Introduction to Indexing in Snowflake

Unit 2: Manual Indexing Techniques

Unit 3: Advanced Indexing Considerations

Snowflake's Data Sharing Capabilities for ML

Unit 1: Introduction to Snowflake Data Sharing

Unit 2: Sharing Data in Snowflake

Unit 3: Data Sharing for ML

External Functions: Calling ML Models from SQL

Unit 1: Introduction to External Functions

Unit 2: Setting Up External Functions

Unit 3: Creating and Using External Functions

Unit 4: Advanced External Function Techniques

Snowflake Machine Learning: Introduction to Snowpark

Unit 1: Snowpark Fundamentals

Unit 2: Setting Up Your Snowpark Environment

Unit 3: Snowpark DataFrame API Basics

Snowpark: DataFrames and Basic Operations

Unit 1: DataFrame Creation and Basic Inspection

Unit 2: Filtering and Sorting DataFrames

Unit 3: Aggregation and Grouping

Snowpark: User-Defined Functions (UDFs)

Unit 1: UDF Fundamentals

Unit 2: Advanced UDF Techniques

Unit 3: UDF Use Cases and Optimization

Snowpark: Feature Engineering with UDFs

Unit 1: UDFs for Feature Engineering

Unit 2: Advanced UDF Feature Engineering

Unit 3: Integrating and Optimizing UDFs

Snowpark: Integration with Machine Learning Libraries

Unit 1: Scikit-learn Integration

Unit 2: TensorFlow/Keras Integration

Unit 3: PyTorch Integration

Snowflake Marketplace: Accessing and Using External Data

Unit 1: Introduction to Snowflake Marketplace

Unit 2: Subscribing to Data Products

Unit 3: Using Marketplace Data in SQL

Unit 4: Using Marketplace Data in Snowpark

Unit 5: ML Use Cases

Data Pipelines: Creating Automated Data Preparation Workflows

Unit 1: Data Pipeline Fundamentals

Unit 2: Building Pipelines with SQL and Snowpark

Unit 3: Automating and Monitoring Pipelines

Data Visualization: Connecting Snowflake to BI Tools

Unit 1: Connecting Snowflake to BI Tools

Unit 2: Visualizing Data and Building Dashboards

Unit 3: Advanced Visualization and Sharing

Advanced SQL: Working with JSON Data

Unit 1: Introduction to JSON in Snowflake

Unit 2: Querying JSON Data

Unit 3: Advanced JSON Techniques

Unit 4: Real-World JSON Examples

Advanced SQL: Working with Semi-Structured Data

Unit 1: Introduction to Semi-Structured Data in Snowflake

Unit 2: Querying JSON Data in Snowflake

Unit 3: Working with XML Data in Snowflake

Unit 4: Views, CTEs, and Feature Engineering

Advanced SQL: Geospatial Data Analysis

Unit 1: Introduction to Geospatial Data in Snowflake

Unit 2: Basic Geospatial Operations

Unit 3: Advanced Geospatial Operations

Unit 4: Geospatial Feature Engineering and ML

Advanced SQL: Time Series Analysis

Unit 1: Introduction to Time Series Analysis in Snowflake

Unit 2: Aggregating and Analyzing Time Series Data

Unit 3: Feature Engineering and Advanced Techniques

Performance Tuning: Optimizing Data Storage

Unit 1: Understanding Snowflake Storage

Unit 2: Data Compression Techniques

Unit 3: Choosing the Right Storage Format

Performance Tuning: Optimizing Data Loading

Unit 1: Data Loading Methods in Snowflake

Unit 2: Choosing the Right Loading Method

Unit 3: Optimizing Data Loading Performance

Unit 4: Ensuring Data Quality During Loading

Security: Data Encryption

Unit 1: Introduction to Data Encryption in Snowflake

Unit 2: Snowflake's Encryption Features

Unit 3: Configuring and Managing Encryption

Security: Network Security

Unit 1: Network Security Fundamentals in Snowflake

Unit 2: Configuring Network Policies

Unit 3: Advanced Network Security

Cost Management: Monitoring and Optimizing Snowflake Costs

Unit 1: Understanding Snowflake Cost Components

Unit 2: Monitoring Snowflake Costs

Unit 3: Optimizing Snowflake Costs

Best Practices: SQL Style Guide

Unit 1: Introduction to SQL Style Guides

Unit 2: Core Formatting Principles

Unit 3: Naming Conventions

Unit 4: Query Structure and Best Practices

Unit 5: Enforcement and Tools

Best Practices: Code Review

Unit 1: Introduction to SQL Code Review

Unit 2: Performing Effective SQL Code Reviews

Unit 3: Providing and Receiving Feedback

Best Practices: Version Control

Unit 1: Introduction to Version Control with Git

Unit 2: Branching and Merging

Unit 3: Collaboration and Advanced Git

Real-World Project: Building a Customer Churn Prediction Model

Unit 1: Project Setup and Data Exploration

Unit 2: Data Preparation and Feature Engineering

Unit 3: Model Training and Deployment

Unit 4: Model Evaluation and Monitoring

Real-World Project: Building a Fraud Detection Model

Unit 1: Project Setup and Data Exploration

Unit 2: Feature Engineering and Data Preparation

Unit 3: Model Training and Deployment

Real-World Project: Building a Product Recommendation System

Unit 1: Project Setup and Data Exploration

Unit 2: Data Preparation and Feature Engineering

Unit 3: Model Training and Deployment

Advanced Topics: Data Lineage and Auditing

Unit 1: Introduction to Data Lineage and Auditing

Unit 2: Data Lineage in Snowflake

Unit 3: Auditing in Snowflake

Unit 4: Lineage, Auditing, and Data Governance

Advanced Topics: Data Quality Monitoring

Unit 1: Introduction to Data Quality Monitoring

Unit 2: Implementing Data Quality Checks

Unit 3: Advanced Data Quality Monitoring Techniques

Advanced Topics: Automated Testing

Unit 1: Introduction to Automated SQL Testing

Unit 2: SQL Unit Testing

Unit 3: SQL Integration and End-to-End Testing

Advanced Topics: Continuous Integration and Continuous Deployment (CI/CD)

Unit 1: CI/CD Fundamentals

Unit 2: Setting up a CI/CD Pipeline for SQL

Unit 3: Automating Deployment and Rollbacks