Intermediate-level SQL interview questions for Data Engineer positions.
1. How would you handle slowly changing dimensions (SCD)?
Answer: SCD Type 1: Overwrite (no history). Type 2: Add new row with version/date columns (full history). Type 3: Add columns for previous values (limited history). Type 2 is most common for complete historical tracking.
-- SCD Type 2: Track all historical changes CREATE TABLE dim_customer ( customer_key SERIAL PRIMARY KEY, customer_id INT, -- Business key name VARCHAR(100), address VARCHAR(255), valid_from DATE NOT NULL, valid_to DATE, -- NULL = current is_current BOOLEAN DEFAULT true ); -- Insert new version when customer changes address UPDATE dim_customer SET valid_to = CURRENT_DATE, is_current = false WHERE customer_id = 123 AND is_current = true; INSERT INTO dim_customer (customer_id, name, address, valid_from) VALUES (123, 'John Doe', 'New Address', CURRENT_DATE);
2. Explain the concept of data partitioning and when to use it.
Answer: Partitioning divides large tables into smaller, manageable pieces. Types: Range (by date), List (by category), Hash (distributed). Benefits: Faster queries (partition pruning), easier maintenance (drop old partitions), parallel processing.
-- Range partitioning by date CREATE TABLE events ( id BIGSERIAL, event_date DATE NOT NULL, event_type VARCHAR(50), data JSONB ) PARTITION BY RANGE (event_date); CREATE TABLE events_2024_01 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'); CREATE TABLE events_2024_02 PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01'); -- Queries automatically use partition pruning SELECT * FROM events WHERE event_date = '2024-01-15'; -- Only scans events_2024_01 partition