
SQL Interview Questions for Data Engineer (2026)

20 Questions with Answers (5 Basic, 10 Intermediate, 5 Advanced)

Basic SQL Questions

Q1. What is the difference between OLTP and OLAP databases?

basic

Answer

OLTP (Online Transaction Processing): Optimized for writes, normalized data, row-oriented, used for daily operations. OLAP (Online Analytical Processing): Optimized for reads, denormalized/star schema, column-oriented, used for analytics and reporting.

Example Code

SQL
-- OLTP: Normalized tables, frequent small transactions
CREATE TABLE orders (id SERIAL PRIMARY KEY, customer_id INT, total DECIMAL);
CREATE TABLE order_items (order_id INT, product_id INT, quantity INT);

-- OLAP: Denormalized fact table with dimensions
CREATE TABLE fact_sales (
  date_key INT,
  product_key INT,
  customer_key INT,
  quantity INT,
  revenue DECIMAL
);
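To make the contrast concrete, here is the kind of wide aggregation an OLAP schema is built for — a sketch that assumes a hypothetical `dim_date` dimension table keyed by `date_key`:

SQL
-- Typical OLAP query: scan and aggregate the fact table by dimension attributes
SELECT d.year, d.month, SUM(f.revenue) AS monthly_revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key  -- dim_date is assumed, not defined above
GROUP BY d.year, d.month
ORDER BY d.year, d.month;

A column-oriented OLAP store can answer this by reading only the `date_key` and `revenue` columns, which is why the same query is far cheaper there than on a row-oriented OLTP schema.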

Intermediate SQL Questions

Q2. How would you handle slowly changing dimensions (SCD)?

intermediate

Answer

SCD Type 1: Overwrite (no history). Type 2: Add new row with version/date columns (full history). Type 3: Add columns for previous values (limited history). Type 2 is most common for complete historical tracking.

Example Code

SQL
-- SCD Type 2: Track all historical changes
CREATE TABLE dim_customer (
  customer_key SERIAL PRIMARY KEY,
  customer_id INT,               -- Business key
  name VARCHAR(100),
  address VARCHAR(255),
  valid_from DATE NOT NULL,
  valid_to DATE,                 -- NULL = current
  is_current BOOLEAN DEFAULT true
);

-- When a customer's address changes: close out the current row...
UPDATE dim_customer SET valid_to = CURRENT_DATE, is_current = false
WHERE customer_id = 123 AND is_current = true;

-- ...then insert the new version
INSERT INTO dim_customer (customer_id, name, address, valid_from)
VALUES (123, 'John Doe', 'New Address', CURRENT_DATE);
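The payoff of Type 2 is the queries it enables. Two common patterns against the table above — treating `valid_to` as exclusive, since the closing `UPDATE` and the new row's `valid_from` share the same date:

SQL
-- Current view of every customer
SELECT * FROM dim_customer WHERE is_current = true;

-- Point-in-time ("as of") lookup: the customer's attributes on a given date
SELECT * FROM dim_customer
WHERE customer_id = 123
  AND valid_from <= DATE '2024-06-30'
  AND (valid_to > DATE '2024-06-30' OR valid_to IS NULL);

In an interview, calling out the boundary convention (inclusive `valid_from`, exclusive `valid_to`) signals you have actually debugged overlapping-interval bugs.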

Q3. Explain the concept of data partitioning and when to use it.

intermediate

Answer

Partitioning divides large tables into smaller, manageable pieces. Types: Range (by date), List (by category), Hash (distributed). Benefits: Faster queries (partition pruning), easier maintenance (drop old partitions), parallel processing.

Example Code

SQL
-- Range partitioning by date
CREATE TABLE events (
  id BIGSERIAL,
  event_date DATE NOT NULL,
  event_type VARCHAR(50),
  data JSONB
) PARTITION BY RANGE (event_date);

CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically use partition pruning
SELECT * FROM events WHERE event_date = '2024-01-15';
-- Only scans events_2024_01 partition
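Two follow-ups worth knowing for the table above: how to confirm pruning is actually happening, and the maintenance win of partition-level retention (a PostgreSQL sketch):

SQL
-- Verify pruning: the plan should reference only events_2024_01
EXPLAIN SELECT * FROM events WHERE event_date = '2024-01-15';

-- Retention: detaching and dropping a partition is a metadata operation,
-- far cheaper than DELETE + VACUUM on a monolithic table
ALTER TABLE events DETACH PARTITION events_2024_01;
DROP TABLE events_2024_01;

Note that pruning only works when the partition key appears in the WHERE clause; a filter on `event_type` alone still scans every partition.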

Advanced SQL Questions

Q4. How do you handle incremental data loads in an ETL pipeline?

advanced

Answer

Use watermarks (last_updated timestamp), change data capture (CDC), or delta tables. Track high-water mark, process only new/changed records. Handle late-arriving data with lookback windows. Use upsert (MERGE) for idempotent loads.

Example Code

SQL
-- Incremental load using watermark
WITH watermark AS (
  SELECT COALESCE(MAX(last_value), '1900-01-01') AS hwm
  FROM etl_metadata WHERE table_name = 'orders'
)
INSERT INTO staging_orders
SELECT * FROM source_orders
WHERE last_updated > (SELECT hwm FROM watermark);

-- Upsert to target (PostgreSQL)
INSERT INTO dim_orders (order_id, status, amount)
SELECT order_id, status, amount FROM staging_orders
ON CONFLICT (order_id) DO UPDATE SET
  status = EXCLUDED.status,
  amount = EXCLUDED.amount;

-- Update watermark
UPDATE etl_metadata
SET last_value = (SELECT MAX(last_updated) FROM staging_orders)
WHERE table_name = 'orders';
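The same upsert step can be expressed with standard MERGE, which many interviewers expect since it is what warehouses (and PostgreSQL 15+) use. A sketch against the same tables as above:

SQL
-- Equivalent idempotent upsert with MERGE (PostgreSQL 15+)
MERGE INTO dim_orders t
USING staging_orders s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET status = s.status, amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, status, amount)
  VALUES (s.order_id, s.status, s.amount);

Either form is safe to re-run: replaying the same staging batch produces the same target state, which is the idempotency property the answer calls for.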

Interview Tips

  • Practice writing queries without an IDE to simulate whiteboard interviews
  • Explain your thought process as you solve problems
  • Ask clarifying questions about edge cases
  • Consider query performance and scalability

Ready for your interview?

Practice with our interactive SQL sandbox and get instant feedback.