Ranking and Binning Data
It's **Day 120**, and we're "Bucketing." In SQL, we used `NTILE` to split rows into equal-sized groups. In Pandas, `pd.qcut()` does that job, and `pd.cut()` handles bins whose boundaries you choose yourself.
Manual Binning: pd.cut()
You define the boundaries yourself.
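import pandas as pd

# right=False makes each bin include its left edge: [0, 18), [18, 65), [65, 100).
# With the default right=True, an 18-year-old would be labeled 'Minor' and age 0 would become NaN.
bins = [0, 18, 65, 100]
labels = ['Minor', 'Adult', 'Senior']
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)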
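With these bins, the `right` parameter decides which bucket an age of exactly 18 or 65 falls into, and which end of the range drops out as `NaN`. Here is a minimal sketch of both settings side by side. The `ages` values are made up for illustration.

```python
import pandas as pd

# Made-up ages that sit exactly on the bin edges
ages = pd.Series([0, 17, 18, 64, 65, 100])
bins = [0, 18, 65, 100]
labels = ['Minor', 'Adult', 'Senior']

# Default (right=True): intervals are (0, 18], (18, 65], (65, 100]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [0, 18), [18, 65), [65, 100)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({
    'age': ages,
    'right=True': right_closed,
    'right=False': left_closed,
})
print(comparison)
```

Values outside every interval come back as `NaN`, so it is worth checking the edges of your data before trusting the counts.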
Percentile Binning: pd.qcut()
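You tell Pandas how many buckets you want, and it calculates the boundaries from the data's quantiles so that every bucket holds roughly the **same number of rows**. If many rows share the same value, two boundaries can coincide, and `pd.qcut()` then raises a "Bin edges must be unique" error unless you pass `duplicates='drop'`.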
# Split customers into 4 equal-sized groups (quartiles)
df['spending_tier'] = pd.qcut(df['spend'], q=4, labels=['Low', 'Med', 'High', 'Top'])
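To see the boundaries that `pd.qcut()` picked, you can ask for them with `retbins=True`. This is a small sketch on made-up `spend` values, not real customer data.

```python
import pandas as pd

# Made-up spend values, one row per customer
spend = pd.Series([5, 12, 18, 25, 40, 55, 80, 120])

# retbins=True returns the computed bin edges alongside the labels
tiers, edges = pd.qcut(
    spend,
    q=4,
    labels=['Low', 'Med', 'High', 'Top'],
    retbins=True,
)

print(edges)                            # the 5 quantile-based boundaries
print(tiers.value_counts(sort=False))   # rows per tier
```

With eight distinct values and four buckets, each tier ends up with two rows. The edges come from the data, so they will shift whenever the data changes.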
Why this is analytical
Binning allows you to simplify complex numerical data for stakeholders. Instead of saying "Our users have an average age of 34.2," you can say "60% of our users are in the 'Adult' bucket."
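A statement like "60% of our users are in the 'Adult' bucket" can be computed with `value_counts(normalize=True)` on the binned column. The sketch below uses a tiny made-up `age` column chosen so the numbers are easy to check by hand.

```python
import pandas as pd

# Made-up ages: one minor, three adults, one senior
df = pd.DataFrame({'age': [12, 25, 34, 41, 70]})

df['age_group'] = pd.cut(
    df['age'],
    bins=[0, 18, 65, 100],
    labels=['Minor', 'Adult', 'Senior'],
)

# normalize=True returns fractions instead of raw counts
share = df['age_group'].value_counts(normalize=True)
print(share)
print(f"Adults: {share['Adult']:.0%}")
```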
Your Task for Today
Split a price column into 3 buckets ('Cheap', 'Mid', 'Expensive') using both `cut` and `qcut` and see how the results differ.
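If you want a starting point, here is one possible sketch. It assumes a made-up, skewed `prices` series and passes `bins=3` to `pd.cut()`, which splits the price range into three equal-width intervals. Your own column and bin edges will differ.

```python
import pandas as pd

# Made-up prices: mostly cheap items plus one expensive outlier
prices = pd.Series([5, 6, 7, 8, 9, 10, 12, 15, 100])
labels = ['Cheap', 'Mid', 'Expensive']

# cut with an integer: 3 equal-width intervals across the price range
by_width = pd.cut(prices, bins=3, labels=labels)

# qcut: 3 buckets with roughly equal row counts
by_quantile = pd.qcut(prices, q=3, labels=labels)

print(by_width.value_counts(sort=False))
print(by_quantile.value_counts(sort=False))
```

On skewed data like this, the equal-width buckets put almost every row in 'Cheap', while the quantile buckets hold three rows each.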
*Day 121: Combining Data with .combine_first().*