Data Science

Project: The Multi-Source Cleanup Pipeline

SQL Mastery Team
May 26, 2026
8 min read

Congratulations! You've completed **Phase 2: Data Wrangling**. You are now officially a "Pandas Power User."

Today, we build a **Data Consolidation Engine**.

The Challenge

You have:

1. `users.csv` (Basic info).

2. `transactions.csv` (Price info).

3. `web_logs.csv` (Dates and categories).
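Before merging, each file needs to be loaded into its own DataFrame. Here's a minimal sketch using in-memory CSV text as stand-in sample data; the column names (`uid`, `name`, `price`, `date`, `category`, `shipping`) are assumptions chosen to match the skeleton later in this lesson, not the real datasets:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the three files;
# the columns are assumptions, not taken from the real datasets.
users_csv = io.StringIO("uid,name\n1,Ana\n2,Ben\n3,Cy\n")
tx_csv = io.StringIO("uid,price\n1,120.0\n2,\n3,45.5\n")
logs_csv = io.StringIO(
    "uid,date,category,shipping\n"
    "1,2026-05-01,books,2\n"
    "2,2026-05-03,toys,5\n"
    "3,2026-05-04,books,3\n"
)

users = pd.read_csv(users_csv)   # basic info
tx = pd.read_csv(tx_csv)         # price info (note the missing price for uid 2)
logs = pd.read_csv(logs_csv)     # dates and categories

print(users.shape, tx.shape, logs.shape)
```

In a real project you would point `pd.read_csv` at the file paths instead of `io.StringIO` buffers; everything downstream stays the same.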

The Roadmap

1. **Join**: Merge these into one DataFrame.

2. **Clean**: Fix DateTime formats and handle missing prices.

3. **Transform**: Create a spending "segment" column based on transaction amounts.

4. **Aggregate**: Show the average shipping time per category for the Top 10% of spenders.

The Solution (Skeleton)

```python
import pandas as pd

# 1. Load & merge on the shared user ID
users = pd.read_csv('users.csv')
tx = pd.read_csv('transactions.csv')
logs = pd.read_csv('web_logs.csv')
df = users.merge(tx, on='uid').merge(logs, on='uid')

# 2. DateTime & nulls
df['date'] = pd.to_datetime(df['date'])
df['price'] = df['price'].fillna(0)

# 3. Logic: bucket transactions into spending deciles (0–9)
df['segment'] = pd.qcut(df['price'], q=10, labels=False)

# 4. Filter to the top decile, then aggregate
high_value = df[df['segment'] == 9]
result = high_value.groupby('category')['shipping'].mean()
print(result)
```
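One subtlety worth knowing: `pd.qcut` on the `price` column ranks individual *transactions*, not *spenders*. If a user splits their spending across many small purchases, no single row may land in the top decile. To rank spenders, aggregate per user first. A sketch with hypothetical data (the column names mirror the skeleton above):

```python
import pandas as pd

# Hypothetical merged frame; columns mirror the skeleton above.
df = pd.DataFrame({
    'uid':      [1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'price':    [500, 300, 20, 15, 10, 8, 30, 25, 12, 40, 900],
    'category': ['books', 'toys', 'books', 'toys', 'books',
                 'toys', 'books', 'toys', 'books', 'toys', 'books'],
    'shipping': [2, 3, 5, 4, 6, 7, 3, 4, 5, 2, 1],
})

# Rank users by their TOTAL spend, not by individual transaction price.
user_spend = df.groupby('uid')['price'].sum()
cutoff = user_spend.quantile(0.9)            # 90th-percentile total spend
top_spenders = user_spend[user_spend >= cutoff].index

high_value = df[df['uid'].isin(top_spenders)]
result = high_value.groupby('category')['shipping'].mean()
print(result)
```

Either definition of "Top 10%" can be defensible; just make sure the one you compute matches the one the business question asks for.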

Why this is professional grade

You've taken raw, disconnected files and turned them into actionable business intelligence. This merge, clean, transform, aggregate workflow mirrors the consolidation pipelines data teams run at companies like Amazon and Netflix.

**Phase 3 (Days 126–140): Visualization & EDA**. We're going to turn these numbers into beautiful, interactive charts.

See you in Phase 3!

Ready to put your knowledge into practice?

Join SQL Mastery and learn through interactive exercises.