Data Science

Project: The Data Discovery Dashboard

SQL Mastery Team
May 11, 2026

Congratulations! You've completed your first 10 days of Python for Data Science.

Today, we're building a **Discovery Script**: a quick first pass that a Data Scientist typically runs on a new file to see what it contains and how clean it is.

The Challenge

Analyze a raw CSV of "App Store Reviews."

1. Load the data using Pandas.

2. Check for missing values in the "rating" and "review" columns.

3. Fill missing ratings with the average.

4. Filter for reviews with more than 50 characters.

5. Calculate the average rating per App Category.
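
If you don't have a real export to hand, a tiny stand-in file is enough to practice on. The sketch below writes one; the rows are made up for illustration (not real App Store data), and the column names `category`, `rating`, `review`, and `review_length` are assumptions chosen to match the lowercase names the solution script uses.

```python
import pandas as pd

# Made-up practice rows; not real App Store data.
sample = pd.DataFrame({
    "category": ["Games", "Games", "Productivity", "Productivity", "Health"],
    "rating": [4.0, float("nan"), 5.0, 3.0, float("nan")],
    "review": [
        "Fun but too many ads.",
        "I play this every day on my commute and it never gets old at all.",
        None,
        "Syncs across all my devices without any trouble, which is exactly what I needed.",
        "Okay.",
    ],
})

# Character count per review; a missing review counts as 0 characters.
sample["review_length"] = sample["review"].fillna("").str.len()

# Write the file name that the solution script reads.
sample.to_csv("apps.csv", index=False)

# Step 2 of the challenge: count missing values per column.
missing = sample[["rating", "review"]].isna().sum()
print(missing)
```

Two ratings and one review are left blank on purpose, so the missing-value check and the fill step both have something to do.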

The Solution

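```python
import pandas as pd

# 1. Load
df = pd.read_csv('apps.csv')

# 2. Check: count missing values per column
# (lowercase column names are assumed, matching the rest of the script)
print(df[['rating', 'review']].isna().sum())

# 3. Cleaning: fill missing ratings with the average rating
df['rating'] = df['rating'].fillna(df['rating'].mean())

# 4. Analysis
# We'll learn string operations soon; for now we use a simple filter
# on a review_length column that is assumed to exist in the file
long_reviews = df[df['review_length'] > 50]
print(f"Long reviews: {len(long_reviews)}")

# 5. The Insight: average rating per category, highest first
report = df.groupby('category')['rating'].mean().sort_values(ascending=False)
print("--- Top Rated Categories ---")
print(report.head(5))
```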
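
The step 4 filter relies on a `review_length` column already being in the file. If your CSV only carries the raw text, one possible way to derive the length is the pandas `.str.len()` accessor. This is a minimal sketch, assuming a text column named `review`; the rows are invented for illustration.

```python
import pandas as pd

# Toy rows, invented for illustration; 'review' is an assumed column name.
df = pd.DataFrame({
    "category": ["Games", "Games", "Health"],
    "rating": [4.0, 2.0, 5.0],
    "review": [
        "Great!",
        "Crashes every time I open the settings screen on my older phone.",
        None,
    ],
})

# .str.len() gives NaN for missing text, so treat a missing review as empty first.
df["review_length"] = df["review"].fillna("").str.len()

# Same filter as step 4, now backed by a computed column.
long_reviews = df[df["review_length"] > 50]

# Average rating per category, restricted to the long reviews.
long_report = long_reviews.groupby("category")["rating"].mean()
print(long_report)
```

Grouping `long_reviews` instead of `df` answers a slightly different question: how do categories rate among people who wrote at length? Whether that or the full-table average is the right "insight" depends on what you want the report to say.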

What you've achieved

You've moved from "Pulling data" to "Processing data." You've handled dirty inputs, applied business logic, and generated insights using code.

**Phase 2 (Days 11–25): Data Wrangling**. We're going even deeper into Pandas: merging tables, splitting strings, and handling time-series data.

See you in Phase 2!
