Deduplication with DISTINCT ON
Welcome to **Day 80**. Today we tackle a recurring nightmare: **Duplicate Data**.
The Scenario
A bug in the app saved the same user signup three times. You only want the **first** one.
Normally, `DISTINCT` looks at the *whole row*. But in PostgreSQL, we have a unique superpower called `DISTINCT ON`.
The Syntax
SELECT DISTINCT ON (email)
id, email, created_at
FROM users
ORDER BY email, created_at ASC;
How it works
1. **(email)**: It looks for unique emails.
2. **ORDER BY ... ASC**: For every email, it takes the row with the *earliest* `created_at`.
Why it's better than GROUP BY
With `GROUP BY`, you can only see columns that you group by or aggregate. With `DISTINCT ON`, you get the **entire row** (IDs, metadata, settings) for that specific record.
Your Task for Today
Identify a table with duplicates. Use `DISTINCT ON` to pull only the most recent record for every group.
*Day 81: COALESCE—The Data Cleaner's Best Friend.*