Handling Missing Data: NULLs in Python

Welcome to **Day 107**. In SQL, a missing value is `NULL`. In Pandas, it shows up as `NaN` (Not a Number) or `None`.

Detecting the Gaps

How many values are missing in each column?

print(df.isnull().sum())
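As a minimal sketch of what this looks like on real data, the example below builds a small DataFrame and counts the gaps in three ways. The column names (`name`, `age`, `city`) and the values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the columns and values are assumptions.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
    "city": ["Lima", "Oslo", "Pune", "Rome"],
})

# Missing values per column (what df.isnull().sum() reports).
missing_per_column = df.isnull().sum()

# Rows that contain at least one missing value.
rows_with_missing = df.isnull().any(axis=1).sum()

# Fraction of each column that is missing (the mean of True/False flags).
missing_share = df.isnull().mean()

print(missing_per_column)
print("Rows with at least one gap:", rows_with_missing)
print(missing_share)
```

Note that `isnull().sum()` counts per column, so the per-row count needs `any(axis=1)` first.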

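If you have millions of rows and only about 100 of them contain missing values, it is usually simplest to drop those rows. Keep in mind that `dropna()` removes every row that has a missing value in any column.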

df_clean = df.dropna()
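Dropping every incomplete row can be more aggressive than intended. The sketch below, using the same kind of made-up sample data (hypothetical columns `name`, `age`, `city`), compares a plain `dropna()` with the `subset` parameter, which only checks the columns you name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the columns and values are assumptions.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
    "city": ["Lima", "Oslo", "Pune", "Rome"],
})

# Drop rows with a missing value in ANY column.
df_clean = df.dropna()

# Drop rows only when 'age' is missing; gaps in other columns are kept.
df_age_known = df.dropna(subset=["age"])

# Check how much data each choice costs before committing to it.
rows_dropped = len(df) - len(df_clean)
print("Dropped by dropna():", rows_dropped)
print("Kept with subset=['age']:", len(df_age_known))
```

Comparing row counts before and after is a quick sanity check that you are not throwing away more data than expected.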

Similar to `COALESCE` in SQL, `fillna()` replaces missing values with a default (such as the column average or 'Unknown').

# Replace missing ages with the average age

df['age'] = df['age'].fillna(df['age'].mean())
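When several columns need different defaults, `fillna()` also accepts a dictionary that maps column names to fill values. This is a small sketch on made-up data; the columns `name` and `age` and the choice of defaults are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the columns and values are assumptions.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, np.nan],
})

# One default per column: the mean for numbers, a label for text.
defaults = {
    "age": df["age"].mean(),   # mean() skips NaN by default
    "name": "Unknown",
}
filled = df.fillna(defaults)

print(filled)
print("Missing values left:", filled.isnull().sum().sum())
```

The mean is sensitive to outliers, so for skewed columns `median()` may be a safer default.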

Many ML algorithms raise an error if there is even a single `NaN` in the input. Learning how to "impute" (fill in) missing data is a core Data Science skill.
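One common way to impute before modelling, sketched here with plain Pandas, is to learn the fill value from the training data only and reuse it on new data, while keeping a flag that records which values were originally missing. The `train` and `test` frames, the `age` column, and the `age_was_missing` flag name are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical training and test data; names and values are assumptions.
train = pd.DataFrame({"age": [20.0, np.nan, 40.0, 60.0]})
test = pd.DataFrame({"age": [np.nan, 35.0]})

# Learn the fill value from the training data only.
fill_value = train["age"].median()

for frame in (train, test):
    # Record where the gaps were before filling them.
    frame["age_was_missing"] = frame["age"].isnull()
    # Apply the same training-derived value to both frames.
    frame["age"] = frame["age"].fillna(fill_value)

print(train)
print(test)
```

Reusing the training median on the test set avoids leaking information from the test data into the model.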

Count the NULLs in each column of a dataset, then fill each column with a sensible default value for that column.

*Day 108: Sorting and Ranking in Pandas.*