Project: The Data Discovery Dashboard
Congratulations! You've completed your first 10 days of Python for Data Science.
Today, we're building a **Discovery Script**. This is the first thing a professional Data Scientist does when they get a new file.
The Challenge
Analyze a raw CSV of "App Store Reviews."
1. Load the data using Pandas.
2. Check for missing values in the "rating" and "review" columns.
3. Fill missing ratings with the average.
4. Filter for reviews with more than 50 characters.
5. Calculate the average rating per App Category.
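If you don't have the file on hand, a small stand-in table is enough to practice on. The sketch below is only an illustration: the rows are invented, and the lowercase column names `category`, `rating`, and `review` are assumptions about how the real file is laid out. It also shows one way to do the missing-value check from step 2.

```python
import pandas as pd

# Hypothetical stand-in for apps.csv: the values are made up for practice only,
# and the column names are assumed, not taken from a real file.
sample = pd.DataFrame({
    "category": ["Games", "Games", "Productivity", "Productivity", "Health"],
    "rating": [4.0, None, 5.0, 3.0, None],
    "review": [
        "Fun.",
        "Crashes every time I open the second level, please fix this soon!",
        None,
        "Does what it says. Sync between my phone and laptop works well.",
        "Okay.",
    ],
})

# Step 2: count missing values in each column we care about
missing = sample[["rating", "review"]].isna().sum()
print(missing)
```

To run a script that reads from disk against this table, you could save it first with `sample.to_csv("apps.csv", index=False)`. Be careful not to overwrite a real file of the same name.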
The Solution
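import pandas as pd
# 1. Load the raw reviews file
df = pd.read_csv('apps.csv')
# 2. Check: count missing values in the two columns we care about
print(df[['rating', 'review']].isna().sum())
# 3. Clean: fill missing ratings with the column average
df['rating'] = df['rating'].fillna(df['rating'].mean())
# 4. Analysis: keep reviews longer than 50 characters
# We'll cover string operations properly soon; for now, .str.len() counts the characters.
# Missing reviews get a length of NaN, so they never pass the filter.
df['review_length'] = df['review'].str.len()
long_reviews = df[df['review_length'] > 50]
print(f"Long reviews: {len(long_reviews)}")
# 5. The Insight: average rating per category, highest first
report = df.groupby('category')['rating'].mean().sort_values(ascending=False)
print("--- Top Rated Categories ---")
print(report.head(5))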
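The steps above can also be wrapped in a function so the same checks run on any reviews table. This is a minimal sketch, not the course's official solution: the function name `discover`, the `min_chars` parameter, and the demo rows are all hypothetical, and it assumes the columns are called `category`, `rating`, and `review`.

```python
import pandas as pd

def discover(df, min_chars=50):
    """Run the discovery steps on a reviews table and return the results."""
    df = df.copy()  # leave the caller's data untouched
    # Count missing values before any cleaning
    missing_before = df[["rating", "review"]].isna().sum()
    # Fill missing ratings with the column average
    df["rating"] = df["rating"].fillna(df["rating"].mean())
    # Treat missing reviews as empty text so they never pass the length filter
    long_reviews = df[df["review"].fillna("").str.len() > min_chars]
    # Average rating per category, highest first
    report = df.groupby("category")["rating"].mean().sort_values(ascending=False)
    return missing_before, long_reviews, report

# Made-up rows for illustration only
demo = pd.DataFrame({
    "category": ["Games", "Games", "Tools"],
    "rating": [4.0, None, 2.0],
    "review": ["Great!", "x" * 60, None],
})
missing, long_reviews, report = discover(demo)
print(missing)
print(report)
```

Working on a copy means the original table stays as it was loaded, which makes it easier to rerun the discovery with a different `min_chars` value.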
What you've achieved
You've moved from "Pulling data" to "Processing data." You've handled dirty inputs, applied business logic, and generated insights using code.
**Phase 2 (Days 11–25): Data Wrangling**. We're going even deeper into Pandas: merging tables, splitting strings, and handling time-series data.
See you in Phase 2!