Subqueries
Using CTEs for Data Cleaning Pipelines
Senior Data Analyst
February 25, 2026
6 min read
The Messy Data Problem
Raw data from a vendor had issues: typos in names, missing dates, duplicate rows. I needed to clean it in stages.
The Pipeline CTE
WITH
-- Stage 1: Standardize
cleaned AS (
SELECT UPPER(TRIM(name)) AS name, price, order_date
FROM raw_orders
),
-- Stage 2: Deduplicate
deduped AS (
SELECT DISTINCT * FROM cleaned
),
-- Stage 3: Fill defaults
final AS (
SELECT name, COALESCE(price, 0) AS price, order_date
FROM deduped
WHERE order_date IS NOT NULL
)
SELECT * FROM final;
Pro Tip
This is the foundation of ETL (Extract, Transform, Load). CTEs let you build transparent data pipelines.
*Day 57: Subquery Performance Traps.*