Handling Missing Data: NULLs in Python
Welcome to **Day 107**. In SQL, we have `NULL`. In pandas, missing values show up as `NaN` (Not a Number) or `None`, and pandas treats both as missing.
Detecting the Gaps
How many values are missing in each column?
print(df.isnull().sum())
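Note that `df.isnull().sum()` counts missing values per column, not per row. The sketch below shows both counts side by side. The DataFrame and its column names (`name`, `age`, `city`) are made up for illustration and are not from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
    "city": ["Lima", "Oslo", "Pune", None],
})

# Missing values in each column
missing_per_column = df.isnull().sum()

# Rows that have at least one missing value
rows_with_missing = df.isnull().any(axis=1).sum()

# Fraction of each column that is missing (0.0 to 1.0)
missing_share = df.isnull().mean()

print(missing_per_column)
print("Rows with at least one missing value:", rows_with_missing)
print(missing_share)
```

The per-column share is often the more useful number: a column that is 50% empty needs a different decision than one that is 0.01% empty.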
Strategy 1: The "Lazy" Way (Drop them)
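If you have millions of rows and only 100 of them contain missing values, dropping those rows is usually fine. By default, `dropna()` removes every row that has at least one missing value in any column.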
df_clean = df.dropna()
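Dropping every row with any gap can remove more data than you expect. As a rough sketch, `dropna` also accepts `subset` and `thresh` to narrow what gets dropped. The sample DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
    "city": ["Lima", "Oslo", "Pune", None],
})

# Drop rows that have a missing value in ANY column
drop_any = df.dropna()

# Drop rows only when 'age' is missing; gaps in other columns are kept
drop_age = df.dropna(subset=["age"])

# Keep rows that have at least 2 non-missing values
keep_two = df.dropna(thresh=2)

# Always check how many rows you lost
rows_lost = len(df) - len(drop_any)
print("Rows before:", len(df), "after:", len(drop_any), "lost:", rows_lost)
```

Comparing the row count before and after is a cheap sanity check. In this tiny example the default call throws away three of the four rows.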
Strategy 2: The "Safe" Way (Fill them)
Similar to `COALESCE` in SQL, we can replace missing values with a default, such as the column average for numbers or 'Unknown' for text.
# Replace missing ages with the average age
df['age'] = df['age'].fillna(df['age'].mean())
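Different columns usually need different defaults. One way to handle that, shown here as a sketch on made-up data, is to pass a dictionary to `fillna` that maps each column to its own fill value. The median is used instead of the mean here as an assumption, because it is less sensitive to outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
    "city": ["Lima", "Oslo", "Pune", None],
})

# One default per column: a number for 'age', a label for 'city'.
# Columns not listed in the dict ('name') are left untouched.
defaults = {
    "age": df["age"].median(),
    "city": "Unknown",
}

# fillna returns a new DataFrame; the original df is not modified
df_filled = df.fillna(defaults)

print(df_filled)
```

Because `fillna` returns a copy, the original `df` still has its gaps, which makes it easy to compare the two.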
Why this is critical for Machine Learning
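Many ML algorithms, including most scikit-learn estimators, raise an error if the input contains even a single `NaN`. A few tree-based models can handle missing values natively, but you should not count on it. Learning how to "impute" (fill) data is a core Data Science skill.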
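When imputing for a model, a common practice is to compute the fill values from the training data only and then reuse them on the test data, so no information from the test set leaks into training. The sketch below does this with plain pandas. The `train` and `test` DataFrames are hypothetical, and the choice of mean for numbers and most frequent value for text is an assumption, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test split; the column names are illustrative only.
train = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "city": ["Lima", "Lima", None],
})
test = pd.DataFrame({
    "age": [np.nan, 50.0],
    "city": [None, "Oslo"],
})

# Learn the fill values from the training data only
fill_values = {
    "age": train["age"].mean(),            # mean ignores NaN by default
    "city": train["city"].mode().iloc[0],  # most frequent non-missing value
}

# Apply the SAME fill values to both sets
train_filled = train.fillna(fill_values)
test_filled = test.fillna(fill_values)

print(fill_values)
print(test_filled)
```

The missing test age is filled with the training mean, not the test mean, which is the point of the exercise.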
Your Task for Today
Count the NULLs in a dataset and fill them with a sensible default value for that column.
*Day 108: Sorting and Ranking in Pandas.*