Grouping and Aggregating Data
Welcome to **Day 109**. Today we tackle the workhorse of analysis: **Aggregation**.
The Syntax: Split-Apply-Combine
Pandas follows a specific pattern:
1. **Split**: Break the data into groups (e.g., by Country).
2. **Apply**: Calculate something (e.g., Sum of Revenue).
3. **Combine**: Put it back into a nice table.
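To make the three steps concrete, here is a minimal sketch that performs them by hand and compares the result with a one-line `groupby`. The DataFrame and its `country` and `revenue` columns are made-up sample data, not part of the course dataset.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "country": ["US", "US", "DE", "DE", "FR"],
    "revenue": [100, 150, 80, 20, 60],
})

# 1. Split: iterating over a groupby yields (key, sub-DataFrame) pairs.
groups = {country: group for country, group in df.groupby("country")}

# 2. Apply: compute one number per group.
totals = {country: group["revenue"].sum() for country, group in groups.items()}

# 3. Combine: assemble the per-group results into a single Series.
manual = pd.Series(totals, name="revenue").rename_axis("country")

# The one-line groupby performs the same three steps for you.
one_liner = df.groupby("country")["revenue"].sum()

print(manual)
print(one_liner)
```

Both printed Series should show the same totals per country. In practice you would only write the one-liner; the manual version is there to show what it does.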
```python
# The simple way
monthly_revenue = df.groupby('month')['revenue'].sum()

# The professional way (Multiple calculations)
summary = df.groupby('category').agg({
    'revenue': 'sum',
    'customer_id': 'count',
    'price': 'mean'
})
```
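If you want to try the multi-calculation version yourself, the sketch below runs it on a tiny invented DataFrame (the values are assumptions for illustration). It also shows pandas' named-aggregation form of `.agg()`, which gives the output columns explicit names; treat it as an optional alternative, not a replacement for the dictionary style above.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "category": ["books", "books", "toys", "toys", "toys"],
    "revenue": [20.0, 30.0, 15.0, 25.0, 10.0],
    "customer_id": [1, 2, 1, 3, 3],
    "price": [10.0, 15.0, 5.0, 25.0, 10.0],
})

# Dictionary style: one aggregation per column.
summary = df.groupby("category").agg({
    "revenue": "sum",
    "customer_id": "count",  # counts non-null rows, not distinct customers
    "price": "mean",
})

# Named aggregation: the same calculations with explicit output column names.
named = df.groupby("category").agg(
    total_revenue=("revenue", "sum"),
    n_orders=("customer_id", "count"),
    avg_price=("price", "mean"),
)

print(summary)
print(named)
```

Note that `'count'` counts rows with a non-null `customer_id`. If you need the number of distinct customers instead, `'nunique'` is the aggregation to reach for.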
Why this is better than Excel
Pandas can group millions of rows across dozens of categories quickly, typically in well under a second. You can also group by computed logic, such as "Group by the first letter of the customer's name."
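As a rough sketch of grouping by computed logic, the example below groups revenue by the first letter of each customer's name. The `customer_name` column and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "customer_name": ["Alice", "Adam", "Bob", "Bianca", "Carl"],
    "revenue": [100, 50, 70, 30, 20],
})

# Build the grouping key on the fly: the first letter of each name.
first_letter = df["customer_name"].str[0].rename("first_letter")

# groupby accepts a Series of keys aligned with the DataFrame's index,
# so no extra column needs to be added to df.
by_letter = df.groupby(first_letter)["revenue"].sum()

print(by_letter)
```

Passing a Series as the key keeps the original DataFrame unchanged. If you plan to reuse the key, assigning it as a new column first works just as well.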
Your Task for Today
Calculate the total and average spend for every customer using `.groupby()`.
*Day 110: Phase 1 Project: The Data Discovery Dashboard.*