Data Science
Train/Test Split: The Golden Rule
Senior Data Analyst
May 6, 2026
The Split
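from sklearn.model_selection import train_test_split
# df is assumed to be a pandas DataFrame that is already loaded
X = df[['feature1', 'feature2']]  # feature columns
y = df['target']                  # label column
# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)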
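To see what the split returns, here is a minimal self-contained sketch. The 10-row DataFrame is made-up toy data (an assumption, not from any real dataset); only the column names and the `train_test_split` call mirror the snippet above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, column names borrowed from the snippet above.
df = pd.DataFrame({
    "feature1": range(10),
    "feature2": range(10, 20),
    "target": [0, 1] * 5,
})

X = df[["feature1", "feature2"]]
y = df["target"]

# test_size=0.2 holds out 20% of the rows (2 of 10 here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 8 2

# No row ends up in both sets.
print(set(X_train.index).isdisjoint(X_test.index))  # True
```

Features and target are split with the same shuffle, so `X_train` and `y_train` keep matching row indices.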
Why random_state?
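It makes the split reproducible. train_test_split shuffles the rows before splitting, and random_state seeds that shuffle: same data and same seed give the same train and test rows on every run. Leave it unset and each run can produce a different split, so scores are not comparable between runs.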
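A quick way to check this yourself is to split the same data twice with the same seed and compare the results. This is a small sketch on made-up NumPy arrays; the helper name `held_out_rows` is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def held_out_rows(seed):
    """Return the test rows produced by a split with the given seed."""
    _, X_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=seed)
    return X_test

first = held_out_rows(42)
second = held_out_rows(42)

# Same seed, same data: the held-out rows are identical.
print(np.array_equal(first, second))  # True
```

The value 42 has no special meaning; any fixed integer works, as long as you use the same one each time you want the same split.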
*Day 127: Linear Regression.*