Correlated Subqueries
The Per-Customer Challenge
Finance asked: *"For each customer, show their largest order amount."*
I could do this with `GROUP BY`, but what if I also needed to show the Order ID of that largest order? A simple aggregate wouldn't work. I needed a **Correlated Subquery**.
The Quest: The Row-by-Row Lookup
A Correlated Subquery refers back to a column in the outer query. It's executed **once per row** of the outer query.
The Implementation: The Syntax
SELECT o.*
FROM orders o
WHERE o.order_amount = (
SELECT MAX(order_amount)
FROM orders
WHERE customer_id = o.customer_id -- This is the correlation!
);
How It Works
1. The outer query reads a row from `orders` (e.g., Customer A's order).
2. The inner query runs, filtered by `customer_id = o.customer_id`.
3. It returns the MAX amount for Customer A.
4. The outer query checks if the current row's amount matches.
5. Repeat for every row.
The "Oops" Moment
Correlated subqueries are slow! For a table with 1 million rows, the inner query runs 1 million times. My first correlated query took 20 minutes.
**Pro Tip**: If performance is an issue, try rewriting the logic using a JOIN or, better yet, a Window Function (Phase 5!).
The Victory
The finance team got the detailed list of top orders per customer. I learned when to use this powerful technique and when to find a faster alternative.
Your Task for Today
Write a correlated subquery that finds the most recent order for each customer.
*Day 48: Subqueries in the FROM Clause (Derived Tables).*