
SQL Interview Questions for Data Engineer (2026)

20 Questions with Answers (5 Basic, 10 Intermediate, 5 Advanced)

Basic SQL Questions

Q1. What is the difference between OLTP and OLAP databases?

basic

Answer

OLTP (Online Transaction Processing): Optimized for writes, normalized data, row-oriented, used for daily operations. OLAP (Online Analytical Processing): Optimized for reads, denormalized/star schema, column-oriented, used for analytics and reporting.

Example Code

SQL
-- OLTP: Normalized tables, frequent small transactions
CREATE TABLE orders (id SERIAL PRIMARY KEY, customer_id INT, total DECIMAL);
CREATE TABLE order_items (order_id INT, product_id INT, quantity INT);

-- OLAP: Denormalized fact table with dimensions
CREATE TABLE fact_sales (
  date_key INT,
  product_key INT,
  customer_key INT,
  quantity INT,
  revenue DECIMAL
);
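To make the contrast concrete, here is the kind of wide aggregation an OLAP schema is built for — a sketch that assumes a hypothetical `dim_date` dimension table keyed by `date_key`:

SQL
-- Typical OLAP query: scan and aggregate the fact table by dimension attributes
SELECT d.year, d.month, SUM(f.revenue) AS monthly_revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key  -- dim_date is assumed, not defined above
GROUP BY d.year, d.month
ORDER BY d.year, d.month;

A column-oriented OLAP store can answer this by reading only the `date_key` and `revenue` columns, which is why the same query is far cheaper there than on a row-oriented OLTP schema.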

Intermediate SQL Questions

Q2. How would you handle slowly changing dimensions (SCD)?

intermediate

Answer

SCD Type 1: Overwrite (no history). Type 2: Add new row with version/date columns (full history). Type 3: Add columns for previous values (limited history). Type 2 is most common for complete historical tracking.

Example Code

SQL
-- SCD Type 2: Track all historical changes
CREATE TABLE dim_customer (
  customer_key SERIAL PRIMARY KEY,
  customer_id INT,               -- Business key
  name VARCHAR(100),
  address VARCHAR(255),
  valid_from DATE NOT NULL,
  valid_to DATE,                 -- NULL = current
  is_current BOOLEAN DEFAULT true
);

-- When a customer's address changes: close out the current row...
UPDATE dim_customer SET valid_to = CURRENT_DATE, is_current = false
WHERE customer_id = 123 AND is_current = true;

-- ...then insert the new version
INSERT INTO dim_customer (customer_id, name, address, valid_from)
VALUES (123, 'John Doe', 'New Address', CURRENT_DATE);
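The payoff of Type 2 is the queries it enables. Two common patterns against the table above — treating `valid_to` as exclusive, since the closing `UPDATE` and the new row's `valid_from` share the same date:

SQL
-- Current view of every customer
SELECT * FROM dim_customer WHERE is_current = true;

-- Point-in-time ("as of") lookup: the customer's attributes on a given date
SELECT * FROM dim_customer
WHERE customer_id = 123
  AND valid_from <= DATE '2024-06-30'
  AND (valid_to > DATE '2024-06-30' OR valid_to IS NULL);

In an interview, calling out the boundary convention (inclusive `valid_from`, exclusive `valid_to`) signals you have actually debugged overlapping-interval bugs.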

Q3. Explain the concept of data partitioning and when to use it.

intermediate

Answer

Partitioning divides large tables into smaller, manageable pieces. Types: Range (by date), List (by category), Hash (distributed). Benefits: Faster queries (partition pruning), easier maintenance (drop old partitions), parallel processing.

Example Code

SQL
-- Range partitioning by date
CREATE TABLE events (
  id BIGSERIAL,
  event_date DATE NOT NULL,
  event_type VARCHAR(50),
  data JSONB
) PARTITION BY RANGE (event_date);

CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries automatically use partition pruning
SELECT * FROM events WHERE event_date = '2024-01-15';
-- Only scans events_2024_01 partition
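Two follow-ups worth knowing for the table above: how to confirm pruning is actually happening, and the maintenance win of partition-level retention (a PostgreSQL sketch):

SQL
-- Verify pruning: the plan should reference only events_2024_01
EXPLAIN SELECT * FROM events WHERE event_date = '2024-01-15';

-- Retention: detaching and dropping a partition is a metadata operation,
-- far cheaper than DELETE + VACUUM on a monolithic table
ALTER TABLE events DETACH PARTITION events_2024_01;
DROP TABLE events_2024_01;

Note that pruning only works when the partition key appears in the WHERE clause; a filter on `event_type` alone still scans every partition.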

Advanced SQL Questions

Q4. How do you handle incremental data loads in an ETL pipeline?

advanced

Answer

Use watermarks (last_updated timestamp), change data capture (CDC), or delta tables. Track high-water mark, process only new/changed records. Handle late-arriving data with lookback windows. Use upsert (MERGE) for idempotent loads.

Example Code

SQL
-- Incremental load using watermark
WITH watermark AS (
  SELECT COALESCE(MAX(last_value), '1900-01-01') AS hwm
  FROM etl_metadata WHERE table_name = 'orders'
)
INSERT INTO staging_orders
SELECT * FROM source_orders
WHERE last_updated > (SELECT hwm FROM watermark);

-- Upsert to target (PostgreSQL)
INSERT INTO dim_orders (order_id, status, amount)
SELECT order_id, status, amount FROM staging_orders
ON CONFLICT (order_id) DO UPDATE SET
  status = EXCLUDED.status,
  amount = EXCLUDED.amount;

-- Update watermark
UPDATE etl_metadata
SET last_value = (SELECT MAX(last_updated) FROM staging_orders)
WHERE table_name = 'orders';
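The same upsert step can be expressed with standard MERGE, which many interviewers expect since it is what warehouses (and PostgreSQL 15+) use. A sketch against the same tables as above:

SQL
-- Equivalent idempotent upsert with MERGE (PostgreSQL 15+)
MERGE INTO dim_orders t
USING staging_orders s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET status = s.status, amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, status, amount)
  VALUES (s.order_id, s.status, s.amount);

Either form is safe to re-run: replaying the same staging batch produces the same target state, which is the idempotency property the answer calls for.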

Interview Tips

  • Practice writing queries without an IDE to simulate whiteboard interviews
  • Explain your thought process as you solve problems
  • Ask clarifying questions about edge cases
  • Consider query performance and scalability

Ready for your interview?

Practice with our interactive SQL sandbox and get instant feedback.