
One-Hot Encoding for Categorical Variables

Senior Data Analyst
May 2, 2026

The Problem

Most ML models require numeric input. A categorical column such as `color` with the values `['Red', 'Blue', 'Green']` must be converted to numbers before training.

One-Hot Encoding

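import pandas as pd

# dtype=int gives 0/1 columns; pandas 2.0+ otherwise returns True/False
df_encoded = pd.get_dummies(df, columns=['color'], dtype=int)

# Creates: color_Blue, color_Green, color_Red (1 or 0)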
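To see the result end to end, here is a minimal sketch on a toy DataFrame. The data and the `price` column are made up for illustration; only the `color` column comes from the example above.

```python
import pandas as pd

# Hypothetical toy data: `price` is an invented numeric column for illustration.
df = pd.DataFrame({
    "color": ["Red", "Blue", "Green", "Blue"],
    "price": [10, 12, 9, 11],
})

# Replace `color` with one indicator column per distinct value.
# dtype=int requests 0/1 integers rather than booleans.
df_encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(df_encoded.columns.tolist())
print(df_encoded)
```

Untouched columns such as `price` are kept as they are, and the new indicator columns are named by joining the original column name and each category with an underscore.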

When to Use

  • Categories with no inherent order (color, region).
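  • Be careful with high-cardinality columns: every distinct value becomes its own column, so a column with thousands of categories produces thousands of mostly-zero dummies.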
  • *Day 123: Label Encoding.*
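One way to act on the high-cardinality caution above is to count distinct values before encoding. The sketch below is an assumption-laden illustration: the helper name `one_hot_if_small` and the threshold of 10 are hypothetical choices, not a pandas feature or a recommended cutoff.

```python
import pandas as pd


def one_hot_if_small(df, column, max_levels=10):
    """One-hot encode `column` only if it has at most `max_levels` distinct values.

    Hypothetical helper; `max_levels=10` is an arbitrary illustrative threshold.
    """
    n_levels = df[column].nunique()  # distinct non-missing values
    if n_levels > max_levels:
        raise ValueError(
            f"{column!r} has {n_levels} distinct values; "
            f"one-hot encoding would add {n_levels} columns"
        )
    return pd.get_dummies(df, columns=[column], dtype=int)


# Low-cardinality column: encoded as usual.
regions = pd.DataFrame({"region": ["North", "South", "North"]})
print(one_hot_if_small(regions, "region"))

# High-cardinality column: rejected before any dummies are created.
users = pd.DataFrame({"user_id": [f"u{i}" for i in range(50)]})
try:
    one_hot_if_small(users, "user_id")
except ValueError as err:
    print(err)
```

Raising an error rather than silently skipping the column is a design choice for the sketch; it forces a deliberate decision about how to handle the column.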
