Data Science

Train/Test Split: The Golden Rule

Senior Data Analyst
May 6, 2026

The Split

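from sklearn.model_selection import train_test_split

# df is assumed to be an existing pandas DataFrame with these columns
X = df[['feature1', 'feature2']]  # features
y = df['target']                  # target

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)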
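
The snippet above relies on a df that already exists. For a version that runs on its own, here is a minimal sketch using a hypothetical 10-row DataFrame; the column values are made up purely for illustration and are not from any real dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for df; the values are invented.
df = pd.DataFrame({
    "feature1": range(10),
    "feature2": range(10, 20),
    "target": [0, 1] * 5,
})

X = df[["feature1", "feature2"]]
y = df["target"]

# test_size=0.2 sends 2 of the 10 rows to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Each row ends up in exactly one of the two sets, and X and y are split along the same rows, so X_train and y_train stay aligned.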

Why random_state?

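It makes the split reproducible. By default train_test_split shuffles the rows before splitting, and random_state fixes the seed for that shuffle. Same seed on the same data = same split every time. Leave it out and each run draws a different split, so scores are hard to compare between runs.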

*Day 127: Linear Regression.*
