Ranking and Binning Data

It's **Day 120**, and we're "Bucketing." In SQL, we used `NTILE`. In Pandas, we have `pd.cut()` and `pd.qcut()`.

Manual Binning: pd.cut()

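You define the boundaries yourself. By default each bin includes its right edge and excludes its left, so `[0, 18, 65, 100]` produces the intervals (0, 18], (18, 65], and (65, 100]. Values outside every interval become `NaN`.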

bins = [0, 18, 65, 100]

labels = ['Minor', 'Adult', 'Senior']

df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)
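To see how the edges behave, here is a minimal sketch on made-up ages (the sample values and the `age_group_incl` column are assumptions for illustration, not part of the lesson's dataset). It runs the same `pd.cut()` call, then repeats it with `include_lowest=True`, which closes the first interval on the left.

```python
import pandas as pd

# Hypothetical sample data, chosen to land exactly on the bin edges.
df = pd.DataFrame({'age': [0, 5, 18, 19, 65, 66, 100]})

bins = [0, 18, 65, 100]
labels = ['Minor', 'Adult', 'Senior']

# Default behaviour: right-inclusive bins (0, 18], (18, 65], (65, 100].
# Age 0 falls outside the first interval and becomes NaN.
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)

# include_lowest=True also captures the lowest edge, so age 0 becomes 'Minor'.
df['age_group_incl'] = pd.cut(
    df['age'], bins=bins, labels=labels, include_lowest=True
)

print(df)
```

Note that an 18-year-old is labelled 'Minor' here because 18 is the right edge of the first bin. If that is not what you want, `right=False` flips the intervals to [0, 18), [18, 65), [65, 100).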

Percentile Binning: pd.qcut()

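You tell Pandas how many buckets you want, and it calculates the boundaries from the data's quantiles so that every bucket holds **roughly the same number of rows**. Heavily tied values can make the bucket sizes uneven, and if ties make two boundaries identical, `qcut` raises a `ValueError` unless you pass `duplicates='drop'`.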

# Split customers into 4 equal quarters (Quartiles)

df['spending_tier'] = pd.qcut(df['spend'], q=4, labels=['Low', 'Med', 'High', 'Top'])
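A small sketch, assuming made-up spend values, makes the "equal-sized buckets" idea concrete. It uses `retbins=True` so you can also inspect the boundaries that `qcut` computed.

```python
import pandas as pd

# Hypothetical spend values: 8 distinct numbers, one of them an outlier.
df = pd.DataFrame({'spend': [10, 20, 30, 40, 50, 60, 70, 1000]})

# retbins=True returns the computed bin edges alongside the labels.
tiers, edges = pd.qcut(
    df['spend'],
    q=4,
    labels=['Low', 'Med', 'High', 'Top'],
    retbins=True,
)
df['spending_tier'] = tiers

# Each tier holds the same number of rows, despite the outlier.
print(df['spending_tier'].value_counts(sort=False))
# The edges are data-driven quantiles, not evenly spaced numbers.
print(edges)
```

The outlier of 1000 stretches the last bucket's range but does not change how many rows it holds, which is the main difference from `pd.cut()`.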

Why this is analytical

Binning allows you to simplify complex numerical data for stakeholders. Instead of saying "Our users have an average age of 34.2," you can say "60% of our users are in the 'Adult' bucket."
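To get a statement like that out of a binned column, one option is `value_counts(normalize=True)`. The sketch below uses invented ages, so the percentages are illustrative only.

```python
import pandas as pd

# Hypothetical ages for ten users.
df = pd.DataFrame({'age': [12, 25, 34, 41, 52, 70, 16, 29, 38, 67]})
df['age_group'] = pd.cut(
    df['age'], bins=[0, 18, 65, 100], labels=['Minor', 'Adult', 'Senior']
)

# normalize=True returns fractions; scale to percentages for reporting.
share = df['age_group'].value_counts(normalize=True).mul(100).round(1)
print(share)
```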

Your Task for Today

Split a price column into 3 buckets ('Cheap', 'Mid', 'Expensive') using both `cut` and `qcut` and see how the results differ.
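If you need a starting point, here is one possible sketch on invented, right-skewed prices. It assumes equal-width bins for `cut` (`bins=3`); you could just as well supply your own boundaries. The column names `cut_bucket` and `qcut_bucket` are placeholders.

```python
import pandas as pd

# Hypothetical prices: most items are cheap, one is very expensive.
df = pd.DataFrame({'price': [5, 8, 12, 15, 20, 25, 30, 40, 500]})
labels = ['Cheap', 'Mid', 'Expensive']

# bins=3 asks cut for three equal-width ranges between min and max.
df['cut_bucket'] = pd.cut(df['price'], bins=3, labels=labels)

# q=3 asks qcut for three ranges holding about the same number of rows.
df['qcut_bucket'] = pd.qcut(df['price'], q=3, labels=labels)

print(df['cut_bucket'].value_counts(sort=False))
print(df['qcut_bucket'].value_counts(sort=False))
```

With skewed data, equal-width bins pile almost everything into 'Cheap', while quantile bins spread the rows evenly.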

*Day 121: Combining Data with .combine_first().*
