Data Science
Handling Outliers
Senior Data Analyst
April 30, 2026
Detection
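```python
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with a numeric 'price' column
# Box plot: points drawn beyond the whiskers are candidate outliers
sns.boxplot(x=df['price'])
plt.show()

# IQR method: flag values more than 1.5 * IQR below Q1 or above Q3
Q1, Q3 = df['price'].quantile([0.25, 0.75])
IQR = Q3 - Q1
outliers = df[(df['price'] < Q1 - 1.5*IQR) | (df['price'] > Q3 + 1.5*IQR)]
```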
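
To see the IQR rule on concrete numbers, here is a minimal, self-contained sketch. The helper name `iqr_bounds`, the parameter `k`, and the price values are all made up for illustration; only the 1.5 * IQR rule itself comes from the snippet above.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up prices for illustration only: seven typical values and one extreme
df = pd.DataFrame({"price": [10, 12, 11, 13, 12, 14, 11, 500]})

lower, upper = iqr_bounds(df["price"])

# Rows whose price falls strictly outside the fences
outliers = df[(df["price"] < lower) | (df["price"] > upper)]
print(f"fences: [{lower}, {upper}]")
print(outliers)
```

With this toy data, only the row priced at 500 is flagged. Wrapping the fences in a small function makes it easy to reuse the same bounds for both detection and treatment.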
Treatment Options
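1. **Remove them**: `df = df[~df.index.isin(outliers.index)]` drops every row flagged as an outlier.
2. **Cap them**: `df['price'] = df['price'].clip(lower=10, upper=1000)` keeps every row but pulls extreme values in to the bounds. The values 10 and 1000 are only examples; choose bounds from domain knowledge or from the IQR fences.
3. **Investigate**: Sometimes outliers are the real story! Check whether a value is an error or a genuine extreme before changing it.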
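
The three options can be compared side by side in a short sketch. It assumes made-up prices and, as one possible choice, uses the IQR fences as the capping bounds instead of the fixed 10 and 1000; the names `in_range`, `removed`, and `capped` are illustrative.

```python
import pandas as pd

# Made-up prices for illustration only
df = pd.DataFrame({"price": [10, 12, 11, 13, 12, 14, 11, 500]})

# IQR fences, computed the same way as in the Detection section
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# True for rows inside the fences (between is inclusive on both ends)
in_range = df["price"].between(lower, upper)

# Option 1: remove rows outside the fences
removed = df[in_range]

# Option 2: keep every row but cap values at the fences
capped = df.assign(price=df["price"].clip(lower=lower, upper=upper))

# Option 3: look at the flagged rows before deciding anything
print(df[~in_range])

print(len(df), len(removed), len(capped))
```

Removal shrinks the dataset, while capping preserves the row count and only changes the extreme values. Both write to new variables here so the original `df` stays available for inspection.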
*Days 121-150: Continue with Feature Engineering, Machine Learning, and the Final Project...*