The Five Most Common GROUP BY Mistakes
The Code Review
I was asked to review a junior analyst's queries. In a single afternoon, I found the same five mistakes I had made years earlier. This is the checklist I wish I had on Day 1.
Mistake 1: Missing Column in GROUP BY
If you `SELECT country, city, SUM(amount)`, you MUST `GROUP BY country, city`. Every selected column that is not inside an aggregate function must appear in `GROUP BY`.
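A minimal sketch of the rule, run against an in-memory SQLite database through Python's `sqlite3` module. The `sales` table and its rows are hypothetical, made up only for illustration. SQLite itself is lenient here and would not raise an error if `city` were left out of `GROUP BY`, so the sketch shows only the correct form.

```python
import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("US", "Boston", 10.0),
        ("US", "Boston", 5.0),
        ("US", "Austin", 7.0),
        ("FR", "Paris", 3.0),
    ],
)

# Both non-aggregated columns (country, city) appear in GROUP BY.
rows = conn.execute(
    """
    SELECT country, city, SUM(amount) AS total
    FROM sales
    GROUP BY country, city
    ORDER BY country, city
    """
).fetchall()

for row in rows:
    print(row)
```

Each distinct (country, city) pair gets its own row, so the two Boston sales collapse into a single total.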
Mistake 2: Using WHERE Instead of HAVING for Aggregates
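`WHERE SUM(amount) > 100` will fail, because `WHERE` filters individual rows before any grouping happens. You need `HAVING SUM(amount) > 100`, which filters the groups afterward.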
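Here is a small sketch of both versions, again using an in-memory SQLite database. The `orders` table and its values are assumptions for the example, and the exact error message will differ between database engines.

```python
import sqlite3

# Hypothetical orders table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 80.0), ("ann", 40.0), ("bob", 30.0)],
)

# WHERE runs before grouping, so an aggregate is not allowed there.
try:
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "WHERE SUM(amount) > 100 GROUP BY customer"
    ).fetchall()
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("WHERE version rejected:", exc)

# HAVING runs after grouping, so it can filter on the aggregate.
big_customers = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    """
).fetchall()

print(big_customers)
```

Only `ann` survives the `HAVING` filter, because her two orders add up to more than 100 while no single row does.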
Mistake 3: Forgetting ORDER BY
`GROUP BY` does not guarantee any order. If you need January before February, add `ORDER BY month`.
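A short sketch of an explicitly ordered result, assuming a hypothetical `monthly_sales` table where `month` is stored as a number (1 for January). The rows are inserted out of order on purpose.

```python
import sqlite3

# Hypothetical table; month is stored as a number (1 = January).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [(3, 20.0), (1, 10.0), (2, 5.0), (1, 15.0)],
)

# ORDER BY makes the output order part of the query, not an accident.
ordered = conn.execute(
    """
    SELECT month, SUM(amount) AS total
    FROM monthly_sales
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(ordered)
```

If `month` were stored as a name such as 'January', `ORDER BY month` would sort alphabetically, so sorting on a date or a month number is the safer choice.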
Mistake 4: Grouping by the Wrong Granularity
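If you group by the raw `timestamp`, you'll get one bucket per distinct timestamp value, often one per second! You probably wanted `DATE_TRUNC('day', timestamp)` (PostgreSQL syntax; other engines have their own equivalent).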
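The sketch below contrasts the two granularities. It runs on SQLite, which has no `DATE_TRUNC`, so it swaps in SQLite's `date()` function to truncate to the day; the `events` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical events table with timestamps stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2024-01-01 09:00:00", 1.0),
        ("2024-01-01 09:00:01", 2.0),
        ("2024-01-02 12:30:00", 4.0),
    ],
)

# Too fine: every distinct timestamp becomes its own group.
per_timestamp = conn.execute(
    "SELECT ts, SUM(amount) FROM events GROUP BY ts ORDER BY ts"
).fetchall()

# Daily buckets: date(ts) is SQLite's stand-in for DATE_TRUNC('day', ts).
per_day = conn.execute(
    """
    SELECT date(ts) AS day, SUM(amount) AS total
    FROM events
    GROUP BY date(ts)
    ORDER BY day
    """
).fetchall()

print(len(per_timestamp), "groups by raw timestamp")
print(per_day)
```

Three events produce three groups when grouped by the raw value, but only two once they are truncated to the day.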
Mistake 5: Joining Tables with Duplicate Keys
If your `products` table has two rows for the same product, every matching sale is joined twice and your sum will be doubled. Deduplicate before you join (for example, join against a `SELECT DISTINCT` subquery) or fix the source table.
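A minimal sketch of the inflation and one possible workaround, assuming hypothetical `products` and `sales` tables in which product 1 was accidentally loaded twice. Joining against a `SELECT DISTINCT` subquery is only one option, and it helps only when the duplicate rows are identical in every selected column.

```python
import sqlite3

# Hypothetical tables; product 1 appears twice in products by mistake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "widget"), (1, "widget"), (2, "gadget")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.0)],
)

# Naive join: each widget sale matches two product rows, doubling the sum.
inflated = conn.execute(
    """
    SELECT p.name, SUM(s.amount) AS total
    FROM sales s
    JOIN products p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()

# Deduplicate the lookup table before joining.
deduped = conn.execute(
    """
    SELECT p.name, SUM(s.amount) AS total
    FROM sales s
    JOIN (SELECT DISTINCT product_id, name FROM products) p
      ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()

print("inflated:", inflated)
print("deduped: ", deduped)
```

The widget total drops from 30.0 to the correct 15.0 once the duplicate product row is removed, while the gadget total is unaffected.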
Your Task for Today
Go through the last 3 queries you wrote that used `GROUP BY`. Can you spot any of these issues?
*Day 29: How NULLs Affect Aggregations.*