Analytics engineering · SQL · Warehouse-ready patterns

SQL for
Analytics Engineering

Twenty practical questions with diagrams and SQL you can adapt to Snowflake, BigQuery, Postgres, or Redshift. Scenarios use factory / OEE data as a concrete story—the same patterns apply to product funnels, finance, marketing, and ops metrics.

20Questions

8Sections

∞Practice

◎

Analytics engineering & this guide

Role fit

What analytics engineers are tested on

Analytics engineers sit between data platform and stakeholders: they ship trustworthy metrics and models (often in dbt or the warehouse) and answer ad hoc questions without breaking grain or double-counting. Interviews usually stress SQL depth, dimensional thinking (facts vs dimensions, grain), window functions for cohorts and rolling KPIs, and knowing when a query is wrong before it is slow.

Core SQL: execution order, joins, aggregates, NULL semantics, portable patterns across warehouses.
Advanced: CTEs for readability, window frames, data-quality checks, EXPLAIN / profiles for regressions.
Context here: downtime, machines, shifts, and OEE mirror real-world operational analytics; reuse the same SQL ideas for any event stream and dimension tables.

ℹClick any question below to expand the answer, diagrams, and copy-paste SQL. Use the search box to filter by keyword (e.g. lag, having, nullif).

📐

Fundamentals

Q1–Q5

Q1 Easy What is the difference between WHERE and HAVING? ▼

FilteringAggregation

WHERE filters rows before aggregation. HAVING filters groups after aggregation. You cannot use aggregate functions inside WHERE.

Execution order visualization

①

FROM + JOIN

All source rows assembled

②

WHERE

Row-level filter — no aggregates allowed here

③

GROUP BY + Aggregate

Groups formed, COUNT/SUM/AVG computed

④

HAVING

Group-level filter — aggregates are available here

sql

-- ❌ WRONG — can't use COUNT() in WHERE
SELECT machine_id, COUNT(*) FROM downtime_events
WHERE COUNT(*) > 5 GROUP BY machine_id;

-- ✅ CORRECT
SELECT machine_id, COUNT(*) AS event_count
FROM downtime_events
WHERE shift_date >= '2024-01-01'   -- row-level filter ✓
GROUP BY machine_id
HAVING COUNT(*) > 5;              -- group-level filter ✓

💡Memory trick: WHERE filters rows before they're grouped. HAVING filters groups after they're counted.

Q2 Easy What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN? ▼

JoinsFundamentals

Visual — shaded region = rows returned

Matches only

All left + matches

All rows both sides

sql — factory context

-- INNER: only machines that have had downtime events
SELECT m.machine_name, COUNT(d.event_id)
FROM machines m INNER JOIN downtime_events d ON m.id = d.machine_id
GROUP BY m.machine_name;

-- LEFT: ALL machines, NULL count if no downtime events
SELECT m.machine_name, COUNT(d.event_id) AS event_count
FROM machines m LEFT JOIN downtime_events d ON m.id = d.machine_id
GROUP BY m.machine_name;
-- Zero-downtime machines show count = 0, not missing

ℹIn factory analytics, LEFT JOIN is your default. You almost always want all machines / all shifts in your report, even ones with no events — otherwise you hide zero-defect successes.

Q3 Easy What is the difference between UNION and UNION ALL? ▼

Set Operations

UNION removes duplicates — requires a sort pass, slower. UNION ALL keeps all rows including duplicates — no sort, faster. Use UNION ALL by default unless you specifically need deduplication.

sql

-- Combine downtime events from two shifts
-- UNION ALL: keep all (may include duplicates if data is duped)
SELECT machine_id, event_time, reason FROM shift_1_events
UNION ALL
SELECT machine_id, event_time, reason FROM shift_2_events;

-- UNION: deduplicate (slower — only use when needed)
SELECT machine_id, event_time, reason FROM shift_1_events
UNION
SELECT machine_id, event_time, reason FROM shift_2_events;

💡Both queries must have the same number of columns in the same order with compatible data types.

Q4 Easy What does COALESCE do? Give a practical factory example. ▼

NULL Handling

Returns the first non-NULL value from its argument list. Essential for handling sensor dropouts, missing readings, or LEFT JOIN NULLs.

sql

-- Sensor dropout: fall back to previous reading, then to 0
SELECT
    sensor_id,
    recorded_at,
    COALESCE(current_reading, prev_reading, 0) AS display_value,
    CASE WHEN current_reading IS NULL THEN 'IMPUTED' ELSE 'ACTUAL' END AS data_source
FROM sensor_readings;

-- NULL-safe division (avoid division-by-zero)
COALESCE(units_good * 1.0 / NULLIF(units_total, 0), 0) AS quality_rate

⚠NULLIF(a, b) returns NULL if a = b, otherwise returns a. Pair it with COALESCE to prevent division-by-zero: COALESCE(value / NULLIF(denominator, 0), 0)

Q5 Easy What is the difference between DELETE, TRUNCATE, and DROP? ▼

DDL / DML

Command	What it removes	Rollback?	Speed	Factory rule
DELETE	Specific rows (WHERE)	Yes	Slow (logged)	Use for archiving specific records
TRUNCATE	All rows, keeps structure	No (usually)	Fast	Almost never — destroys raw event history
DROP	Entire table + structure	No	Instant	Never on production tables

⚠In a factory analytics context, never TRUNCATE or DROP raw event tables. Always archive with a flag or move to a cold storage partition — sensor drift history is valuable for quality correlation analysis.

∑

Aggregation & Grouping

Q6–Q9

Q6 Medium Write a query to find the top 3 machines by total downtime per shift. ▼

Window FunctionsFactory KPI

sql

SELECT shift_id, machine_id, total_downtime, rnk
FROM (
    SELECT
        shift_id,
        machine_id,
        SUM(down_mins) AS total_downtime,
        RANK() OVER (
            PARTITION BY shift_id
            ORDER BY SUM(down_mins) DESC
        ) AS rnk
    FROM production_events
    GROUP BY shift_id, machine_id
) ranked
WHERE rnk <= 3
ORDER BY shift_id, rnk;

-- Snowflake shorthand: QUALIFY instead of subquery
SELECT shift_id, machine_id, SUM(down_mins) AS total_downtime,
       RANK() OVER (PARTITION BY shift_id ORDER BY SUM(down_mins) DESC) AS rnk
FROM production_events
GROUP BY shift_id, machine_id
QUALIFY rnk <= 3;

ℹQUALIFY is Snowflake/BigQuery syntax that filters on window function results — cleaner than a subquery. Mention it in a Snowflake interview.

Q7 Medium What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()? ▼

Window FunctionsRanking

Given values: 100, 100, 90 — how each function ranks them

Score	ROW_NUMBER()	RANK()	DENSE_RANK()
100	1	1	1
100	2	1	1
90	3	3 (skipped 2)	2

ROW_NUMBER — always unique. Tie-breaking is arbitrary (non-deterministic). RANK — ties get same rank, then skips. DENSE_RANK — ties get same rank, no skip. For factory use: use DENSE_RANK when ranking machines by defect count — rank 2 should mean second worst, not third worst because two tied for first.

sql — when to use each

-- ROW_NUMBER: deduplicate — get exactly one row per machine
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY machine_id ORDER BY ts DESC) AS rn
    FROM sensor_readings
) WHERE rn = 1;   -- latest reading per machine

-- DENSE_RANK: rank machines by defect count
DENSE_RANK() OVER (ORDER BY defect_count DESC) AS defect_rank

Q8 Medium Write a query to calculate a 7-day rolling average of daily defect counts. ▼

Window FunctionsTime Series

ROWS frame — what's "in the window" for row 5 (today)

6 back

Day 1

Day 2

Day 3

Day 4

Day 5

Day 6

Today

← AVG these 7

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

sql

SELECT
    event_date,
    daily_defects,
    AVG(daily_defects) OVER (
        ORDER BY event_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7day_avg,
    -- Calendar-based (handles missing dates differently):
    AVG(daily_defects) OVER (
        ORDER BY event_date
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    ) AS rolling_7day_calendar_avg
FROM daily_quality_summary
ORDER BY event_date;

ℹROWS vs RANGE: ROWS counts physical rows (always exactly 7). RANGE uses the ORDER BY value — all rows within 6 days are included, which handles missing dates more naturally for time-series data.

Q9 Medium How do you find and remove duplicate rows in a table? ▼

Data QualityDeduplication

sql

-- Step 1: Find duplicates
SELECT machine_id, event_timestamp, COUNT(*) AS occurrences
FROM sensor_readings
GROUP BY machine_id, event_timestamp
HAVING COUNT(*) > 1;

-- Step 2: Keep only the latest row per (machine, timestamp)
WITH deduped AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY machine_id, event_timestamp
               ORDER BY ingested_at DESC
           ) AS rn
    FROM sensor_readings
)
SELECT * FROM deduped WHERE rn = 1;

⧉

Window Functions

Q10–Q12

Q10 Medium Write a query to get each machine's most recent sensor reading. ▼

Window FunctionsLatest Record

sql

-- ROW_NUMBER approach (preferred — deterministic)
SELECT machine_id, sensor_type, reading_value, recorded_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY machine_id, sensor_type
               ORDER BY recorded_at DESC
           ) AS rn
    FROM sensor_readings
    WHERE dq_flag = 'GOOD'
) t
WHERE rn = 1;

-- Snowflake shorthand with QUALIFY
SELECT machine_id, sensor_type, reading_value, recorded_at
FROM sensor_readings
WHERE dq_flag = 'GOOD'
QUALIFY ROW_NUMBER() OVER (PARTITION BY machine_id, sensor_type ORDER BY recorded_at DESC) = 1;

Q11 Medium What is LAG() and LEAD()? Calculate time between consecutive downtime events. ▼

Window FunctionsTime Between Events

LAG() accesses the previous row's value. LEAD() accesses the next row's value. Both accept an offset and a default for when no prior/next row exists.

sql

SELECT
    machine_id,
    event_start,
    LAG(event_start) OVER (
        PARTITION BY machine_id
        ORDER BY event_start
    ) AS prev_event_start,
    DATEDIFF('minute',
        LAG(event_start) OVER (PARTITION BY machine_id ORDER BY event_start),
        event_start
    ) AS mins_since_last_downtime,
    LEAD(event_start) OVER (
        PARTITION BY machine_id
        ORDER BY event_start
    ) AS next_event_start
FROM downtime_events
ORDER BY machine_id, event_start;

💡MTBF (Mean Time Between Failures) is just AVG(mins_since_last_downtime) on top of this query — a key factory reliability metric.

Q12 Hard What is the difference between ROWS and RANGE in a window frame? ▼

Window FramesAdvanced

ROWS is physical — counts actual row positions. RANGE is logical — groups rows with the same ORDER BY value, or uses an interval for date/numeric ranges.

ROWS vs RANGE — 3-day window with a missing date

Date	Value	ROWS (3 PREC)	RANGE (2 days)
Jan 1	10	AVG(10)	AVG(10)
Jan 2	20	AVG(10,20)	AVG(10,20)
Jan 4 (gap!)	30	AVG(10,20,30)	AVG(20,30) only
Jan 5	40	AVG(20,30,40)	AVG(30,40)

sql

-- ROWS: exactly 3 preceding rows regardless of date gaps
AVG(reading) OVER (ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)

-- RANGE: all rows within a 3-hour interval (handles gaps correctly)
AVG(reading) OVER (
    ORDER BY ts
    RANGE BETWEEN INTERVAL '3 hours' PRECEDING AND CURRENT ROW
)

ℹFor sensor time-series with potential dropout gaps, RANGE with an interval usually gives more meaningful averages — it respects calendar time, not row count.

⟳

CTEs & Subqueries

Q13–Q14

Q13 Medium What is a CTE and when would you use one over a subquery? ▼

CTEsReadability

A CTE (WITH clause) names an intermediate result you can reference multiple times. Prefer CTEs when: the same intermediate result is used more than once, the logic is complex enough that naming it improves readability, or you need recursion (CTEs only).

sql — OEE with CTE

WITH shift_oee AS (
    SELECT
        pe.shift_id,
        pe.machine_id,
        SUM(pe.run_mins) * 1.0 / MAX(s.planned_mins)    AS availability,
        SUM(pe.units_good) * 1.0
            / NULLIF(SUM(pe.units_total), 0)              AS quality,
        SUM(pe.units_total * dm.ideal_cycle_secs)
            / NULLIF(SUM(pe.run_mins) * 60, 0)             AS performance
    FROM production_events pe
    JOIN dim_shift s  ON pe.shift_id  = s.shift_id
    JOIN dim_machine dm ON pe.machine_id = dm.machine_id
    GROUP BY pe.shift_id, pe.machine_id
)
SELECT
    shift_id,
    machine_id,
    ROUND(availability * 100, 1)               AS avail_pct,
    ROUND(performance * 100, 1)               AS perf_pct,
    ROUND(quality * 100, 1)                  AS qual_pct,
    ROUND(availability * performance * quality * 100, 1) AS oee_pct
FROM shift_oee
WHERE availability < 0.75   -- flag underperforming shifts
ORDER BY oee_pct;

Q14 Hard Write a recursive CTE to traverse an employee org chart. ▼

Recursive CTEAdvanced

How the recursion works

①

Anchor member

SELECT root rows WHERE manager_id IS NULL — the CEO / top node

②

Recursive member

UNION ALL — join each employee to their parent row from the previous iteration

③

Termination

Recursion stops when no new rows match — leaf nodes have no children

sql

WITH RECURSIVE org AS (
    -- Anchor: root nodes (no manager)
    SELECT
        employee_id,
        employee_name,
        manager_id,
        0                                    AS depth,
        employee_name                         AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: each employee joins to their manager's row
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        o.depth + 1,
        o.path || ' > ' || e.employee_name
    FROM employees e
    JOIN org o ON e.manager_id = o.employee_id
)
SELECT employee_id, employee_name, depth, path
FROM org
ORDER BY path;

-- Sample output:
-- Alice        | depth 0 | Alice
-- Bob          | depth 1 | Alice > Bob
-- Dana         | depth 2 | Alice > Bob > Dana
-- Eve          | depth 2 | Alice > Bob > Eve
-- Carol        | depth 1 | Alice > Carol

⚑

Data Quality & Real-World Problems

Q15–Q17

Q15 Hard How do you detect a "stuck sensor" that has returned the same value for 10+ consecutive readings? ▼

Data QualityIslands & Gaps

Classic islands and gaps technique. Subtracting two row numbers creates a constant group ID for consecutive identical values.

How the group ID works

Value	RN overall	RN per value	RN - RN_val = group_id
21.5	1	1	0
21.5	2	2	0
21.5	3	3	0
22.1	4	1	3 (new group)
21.5	5	4	1 (new group)

sql

WITH grouped AS (
    SELECT
        sensor_id, reading_value, recorded_at,
        ROW_NUMBER() OVER (PARTITION BY sensor_id ORDER BY recorded_at)
        - ROW_NUMBER() OVER (PARTITION BY sensor_id, reading_value ORDER BY recorded_at)
        AS grp_id
    FROM sensor_readings
),
consecutive_counts AS (
    SELECT
        sensor_id, reading_value, grp_id,
        COUNT(*) AS consecutive_count,
        MIN(recorded_at) AS streak_start,
        MAX(recorded_at) AS streak_end
    FROM grouped
    GROUP BY sensor_id, reading_value, grp_id
)
SELECT sensor_id, reading_value, consecutive_count, streak_start, streak_end
FROM consecutive_counts
WHERE consecutive_count >= 10
ORDER BY consecutive_count DESC;

💡This is the most sophisticated SQL pattern in factory analytics. Knowing the islands/gaps technique by name and being able to explain the subtraction trick will impress any interviewer.

Q16 Hard Flag sensor readings that are more than 3 standard deviations from the mean (Z-score outlier detection). ▼

Statistical QCSPC

sql

WITH sensor_stats AS (
    -- Compute baseline stats on GOOD readings only
    SELECT
        sensor_id,
        AVG(reading_value)    AS mean_val,
        STDDEV(reading_value) AS stddev_val
    FROM sensor_readings
    WHERE dq_flag = 'GOOD'
    GROUP BY sensor_id
)
SELECT
    r.sensor_id,
    r.reading_value,
    r.recorded_at,
    ROUND(s.mean_val, 3)                                          AS mean_val,
    ROUND(ABS(r.reading_value - s.mean_val)
        / NULLIF(s.stddev_val, 0), 2)                            AS z_score,
    CASE
        WHEN ABS(r.reading_value - s.mean_val)
            / NULLIF(s.stddev_val, 0) > 3  THEN 'SUSPECT'
        ELSE 'GOOD'
    END                                                               AS dq_flag
FROM sensor_readings r
JOIN sensor_stats s ON r.sensor_id = s.sensor_id;

⚠Always compute baseline stats on verified GOOD readings only, not all readings. If you include outliers in the baseline, the mean and stddev shift — making the threshold less sensitive to the anomalies you're trying to catch.

Q17 Hard Find overlapping downtime events for the same machine. ▼

Interval LogicSelf Join

Two intervals [A_start, A_end] and [B_start, B_end] overlap if and only if: A_start < B_end AND A_end > B_start

Overlap condition — visualized

Event A: |——————|
Event B:     |——————|  ← overlaps (B starts before A ends)
Event C:             |——|  ← no overlap (C starts after A ends)

sql

SELECT
    a.machine_id,
    a.event_id      AS event_a,
    b.event_id      AS event_b,
    a.start_time    AS a_start,
    a.end_time      AS a_end,
    b.start_time    AS b_start,
    b.end_time      AS b_end
FROM downtime_events a
JOIN downtime_events b
    ON  a.machine_id = b.machine_id
    AND a.event_id   < b.event_id     -- avoid self-join & duplicate pairs
    AND a.start_time < b.end_time     -- overlap condition part 1
    AND a.end_time   > b.start_time;  -- overlap condition part 2

⚡

Performance & Design

Q18–Q20

Q18 Medium What is the difference between a clustered and non-clustered index? What's the Snowflake equivalent? ▼

IndexesSnowflake

Clustered index — physically sorts table rows by the index key. One per table. Non-clustered index — separate structure pointing back to rows. Multiple allowed.

In Snowflake, data is stored in micro-partitions automatically. The equivalent is a cluster key — defining one causes Snowflake to sort data within micro-partitions, enabling more aggressive partition pruning on your most common filter columns.

sql — Snowflake cluster key

-- Define cluster key on most common filter columns
ALTER TABLE production_events
CLUSTER BY (shift_date, machine_id);

-- Check clustering information
SELECT SYSTEM$CLUSTERING_INFORMATION('production_events');

💡In a factory analytics context: cluster on shift_date and machine_id — these are the most common filter columns in OEE, downtime, and quality queries. Snowflake will prune micro-partitions that don't match, making queries dramatically faster at scale.

Q19 Medium A dashboard query that ran in 2 seconds now takes 45 seconds. How do you diagnose it? ▼

Query OptimizationTroubleshooting

①

Run EXPLAIN / Query Profile

Snowflake Query Profile, PostgreSQL EXPLAIN ANALYZE — look for full table scans, cartesian products, spill to disk

②

Check data volume growth

Did the table 10x in size? Did a new partition get added? Row count explosion from a bad JOIN?

③

Check for schema changes

Did a column type change break a partition filter? Did a JOIN key become nullable?

④

Fix in order

Filter pushdown first → index/cluster key → query rewrite → pre-aggregation table

⚠The most common cause of sudden slowdowns in factory analytics is a cartesian product from an ambiguous JOIN — e.g., joining on machine_id without also joining on shift_date, causing every machine row to match every shift row.

Q20 Medium What is the difference between EXPLAIN and EXPLAIN ANALYZE? ▼

Query Planning

Command	Runs query?	Shows	When to use
EXPLAIN	No	Estimated plan, estimated row counts	Query too expensive to run speculatively
EXPLAIN ANALYZE	Yes	Actual plan, actual row counts, actual time per node, spills	Diagnosing a slow query already in production
Snowflake Query Profile	Yes (post-run)	Node-level stats, spill to disk, partition pruning ratio	Always use after any Snowflake query optimization

💡The partition pruning ratio in Snowflake's Query Profile is the most useful single metric — if you're scanning 800 of 800 micro-partitions on a table filtered by date, your cluster key is wrong or missing.

🏭

Factory Analytics Patterns

OEE + Star Schema

OEE = Availability × Performance × Quality

Availability

run_mins ÷ planned_mins

Performance

actual_rate ÷ ideal_rate

Quality

units_good ÷ units_total

OEE

A×P×Q

world-class ≥ 85%

Star schema — fact_production_event

dim_shift
id · date · planned_mins

dim_machine
id · name · ideal_rate

fact_production_event
run_mins · down_mins
units_good · actual_rate

dim_product
sku · spec_rate

dim_downtime_reason
code · category · planned?

sql — full OEE query from star schema

WITH oee_calc AS (
    SELECT
        s.shift_date,
        pe.machine_id,
        m.machine_name,
        SUM(pe.run_mins) * 1.0 / MAX(s.planned_mins)        AS availability,
        SUM(pe.units_total * m.ideal_cycle_secs / 60.0)
          / NULLIF(SUM(pe.run_mins), 0)                      AS performance,
        SUM(pe.units_good) * 1.0
          / NULLIF(SUM(pe.units_total), 0)                  AS quality,
        COUNT(CASE WHEN pe.dq_flag = 'GOOD' THEN 1 END) * 1.0
          / NULLIF(COUNT(*), 0)                              AS data_quality_score
    FROM fact_production_event pe
    JOIN dim_shift   s ON pe.shift_id   = s.shift_id
    JOIN dim_machine m ON pe.machine_id = m.machine_id
    WHERE s.shift_date >= DATEADD('day', -30, CURRENT_DATE())
    GROUP BY s.shift_date, pe.machine_id, m.machine_name
)
SELECT
    shift_date,
    machine_id,
    machine_name,
    ROUND(availability  * 100, 1) AS avail_pct,
    ROUND(performance   * 100, 1) AS perf_pct,
    ROUND(quality       * 100, 1) AS qual_pct,
    ROUND(availability * performance * quality * 100, 1) AS oee_pct,
    ROUND(data_quality_score * 100, 1)                 AS dq_score_pct,
    -- Rolling 7-day OEE trend
    AVG(availability * performance * quality) OVER (
        PARTITION BY machine_id
        ORDER BY shift_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) * 100                                           AS oee_7day_rolling_avg
FROM oee_calc
ORDER BY shift_date, oee_pct;

≡

Cheat Sheet

Function	Purpose	Key gotcha
WHERE	Filter rows before GROUP BY	Cannot use aggregate functions
HAVING	Filter groups after GROUP BY	Requires GROUP BY (or window context)
COALESCE(a,b,c)	First non-NULL value	Evaluates left to right, stops at first non-NULL
NULLIF(a,b)	Return NULL if a=b, else a	Pair with COALESCE for safe division
ROW_NUMBER()	Unique row rank (no ties)	Tie-breaking is non-deterministic
RANK()	Rank with ties, skips next	Gaps in sequence when ties occur
DENSE_RANK()	Rank with ties, no skip	Use for factory machine rankings
LAG(col, n, default)	Previous row value	Returns NULL if no prior row (set default)
LEAD(col, n, default)	Next row value	Returns NULL if no next row (set default)
ROWS BETWEEN	Physical row count window	Always exact N rows regardless of date gaps
RANGE BETWEEN	Logical value-based window	Use with INTERVAL for time-series
WITH RECURSIVE	Hierarchical traversal	Must use UNION ALL, not UNION
QUALIFY	Filter on window function result	Snowflake / BigQuery only
UNION ALL	Combine results, keep dupes	Faster than UNION — use by default
EXPLAIN ANALYZE	Actual execution stats	Actually runs the query — use on slow prod queries
SUM(CASE WHEN …)	Conditional aggregation (all warehouses)	Prefer COUNT_IF / BOOL_AND in Snowflake when it reads cleaner
FILTER (WHERE …)	Postgres-style conditional agg	Not everywhere; CASE/SUM is the portable default
Surrogate key	Stable id for a dimension row	Often ROW_NUMBER() or hash — document grain and SCD rules
SCD Type 2	History when attributes change	valid_from / valid_to or is_current; join on “as-of” date
Incremental model	dbt-style merge by key + watermark	Align keys with business grain; test uniqueness

Live-coding tip

Many analytics engineering loops end with one end-to-end metric query—here, OEE from a star schema with a rolling trend. Practice explaining grain, joins, and why each term is multiplied so you can adapt the same structure to revenue, retention, or funnel conversion.

SQL forAnalytics Engineering