EXISTS vs IN in Real Systems
Welcome to **Day 48**. We touched on this in Phase 1, but now that we know subqueries, we can understand the real performance difference between `IN` and `EXISTS`.
The Theory
The Performance Secret
In modern databases, `EXISTS` is often faster when the list of values is very large. Why?
Because `EXISTS` stops searching the moment it finds **one** match (Short-circuiting). `IN` sometimes tries to build the entire list of IDs in memory before it starts filtering.
Real-World Example: Customer Audit
"Find all customers who have ever placed an order."
-- Using IN
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders);
-- Using EXISTS (Usually faster for millions of orders)
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
Pro Tip: NOT EXISTS
`NOT EXISTS` is the gold standard for finding "missing" records. It's much safer than `NOT IN`, which can return 0 rows if there is a single `NULL` in the subquery result!
Your Task for Today
Rewrite an `IN` subquery as an `EXISTS` subquery and see if you notice any speed difference.
*Day 49: Why Subqueries Can Be Slow.*