Pandas vs SQL: A Rosetta Stone
It's **Day 102**. Today we bridge the gap between SQL and one of the most widely used Python libraries in data science: **Pandas**.
The Translation Table
| SQL Command | Pandas Equivalent |
|-------------|-------------------|
| `SELECT *` | `df` |
| `SELECT col1, col2` | `df[['col1', 'col2']]` |
| `WHERE col1 = 'val'` | `df[df['col1'] == 'val']` |
| `GROUP BY col1` | `df.groupby('col1')...` |
| `ORDER BY col1` | `df.sort_values('col1')` |
| `LIMIT 5` | `df.head(5)` |
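To make the table concrete, here is a minimal sketch that runs each row against a small DataFrame. The sample data is made up for illustration, and the `SUM` aggregation and descending sort are assumptions chosen to complete the `GROUP BY` and `ORDER BY` rows, which the table leaves open.

```python
import pandas as pd

# Hypothetical sample data; col1 and col2 mirror the column names in the table.
df = pd.DataFrame({
    "col1": ["val", "other", "val", "other", "val", "zeta"],
    "col2": [10, 20, 30, 40, 50, 60],
})

# SELECT col1, col2
selected = df[["col1", "col2"]]

# WHERE col1 = 'val'
filtered = df[df["col1"] == "val"]

# SELECT col1, SUM(col2) ... GROUP BY col1
# (SUM is an assumed aggregation; groupby needs one to produce a result)
grouped = df.groupby("col1", as_index=False)["col2"].sum()

# ORDER BY col2 DESC
ordered = df.sort_values("col2", ascending=False)

# LIMIT 5
limited = df.head(5)

print(filtered)
print(grouped)
```

Each line returns a new DataFrame and leaves `df` unchanged, so the steps can be chained in the same order a SQL query would apply them.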
A Real Example
In SQL:
SELECT name FROM users WHERE age > 25 LIMIT 5;
In Python (Pandas):
import pandas as pd
df = pd.read_csv('users.csv')
# The translation:
result = df[df['age'] > 25][['name']].head(5)
print(result)
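One way to check a translation like this is to run both versions on the same rows and compare the results. The sketch below assumes a small made-up `users` table in place of `users.csv` and uses Python's built-in `sqlite3` module as the SQL engine. It adds `ORDER BY rowid` to the query because SQL does not guarantee row order without an `ORDER BY`, while Pandas keeps the original row order.

```python
import sqlite3

import pandas as pd

# Hypothetical rows standing in for users.csv.
users = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay", "Gus"],
    "age": [31, 22, 45, 26, 25, 38, 52],
})

# Load the same rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
users.to_sql("users", conn, index=False)

# SQL version; ORDER BY rowid pins the row order to insertion order.
sql_result = pd.read_sql_query(
    "SELECT name FROM users WHERE age > 25 ORDER BY rowid LIMIT 5", conn
)
conn.close()

# Pandas version, written the same way as the translation above.
pandas_result = users[users["age"] > 25][["name"]].head(5)

# Compare the selected names from both approaches.
same = sql_result["name"].tolist() == pandas_result["name"].tolist()
print(same)
```

Comparing plain lists sidesteps differences in index labels between the two results: the Pandas filter keeps the original row labels, while the SQL result is numbered from zero.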
Why Pandas?
Pandas lets you work with data as an in-memory table called a DataFrame. Its column-wise (vectorized) operations run in optimized compiled code, so they are usually much faster and more concise than writing raw Python loops.
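To illustrate the comparison with raw Python loops, here is a small sketch that computes the same values both ways. The `orders` data and its column names are hypothetical, and the example makes no timing claims; it only shows how the two styles differ.

```python
import pandas as pd

# Hypothetical order data for illustration.
orders = pd.DataFrame({
    "price": [9.99, 24.50, 3.25],
    "quantity": [3, 1, 10],
})

# Raw Python loop: handle one row at a time.
loop_totals = []
for price, quantity in zip(orders["price"], orders["quantity"]):
    loop_totals.append(price * quantity)

# Pandas: one expression applied to whole columns at once.
orders["total"] = orders["price"] * orders["quantity"]

print(orders)
```

The column expression reads much like the `price * quantity` you would write in a SQL `SELECT`, which is part of why the translation between the two feels natural.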
Your Task for Today
Translate a basic SQL query that uses `SELECT` and `WHERE` into a single line of Pandas code.
*Day 103: The Pandas DataFrame Explained.*