Data Engineering

Deduplication with DISTINCT ON

SQL Mastery Team
April 11, 2026
5 min read

Welcome to **Day 80**. Today we tackle a recurring nightmare: **Duplicate Data**.

The Scenario

A bug in the app saved the same user signup three times. You only want the **first** one.

Normally, `DISTINCT` looks at the *whole row*. But in PostgreSQL, we have a unique superpower called `DISTINCT ON`.

The Syntax

SELECT DISTINCT ON (email)

id, email, created_at

FROM users

ORDER BY email, created_at ASC;

How it works

1. **(email)**: It looks for unique emails.

2. **ORDER BY ... ASC**: For every email, it takes the row with the *earliest* `created_at`.

Why it's better than GROUP BY

With `GROUP BY`, you can only see columns that you group by or aggregate. With `DISTINCT ON`, you get the **entire row** (IDs, metadata, settings) for that specific record.

Your Task for Today

Identify a table with duplicates. Use `DISTINCT ON` to pull only the most recent record for every group.

*Day 81: COALESCE—The Data Cleaner's Best Friend.*

Ready to put your knowledge into practice?

Join SQL Mastery and learn through interactive exercises.