Analytics engineering · SQL · Warehouse-ready patterns

SQL for
Analytics Engineering

Twenty practical questions with diagrams and SQL you can adapt to Snowflake, BigQuery, Postgres, or Redshift. Scenarios use factory / OEE data as a concrete story—the same patterns apply to product funnels, finance, marketing, and ops metrics.

20Questions
8Sections
Practice

Analytics engineering & this guide

Role fit

What analytics engineers are tested on

Analytics engineers sit between data platform and stakeholders: they ship trustworthy metrics and models (often in dbt or the warehouse) and answer ad hoc questions without breaking grain or double-counting. Interviews usually stress SQL depth, dimensional thinking (facts vs dimensions, grain), window functions for cohorts and rolling KPIs, and knowing when a query is wrong before it is slow.

  • Core SQL: execution order, joins, aggregates, NULL semantics, portable patterns across warehouses.
  • Advanced: CTEs for readability, window frames, data-quality checks, EXPLAIN / profiles for regressions.
  • Context here: downtime, machines, shifts, and OEE mirror real-world operational analytics; reuse the same SQL ideas for any event stream and dimension tables.
Click any question below to expand the answer, diagrams, and copy-paste SQL. Use the search box to filter by keyword (e.g. lag, having, nullif).
📐

Fundamentals

Q1–Q5
Q1 Easy What is the difference between WHERE and HAVING?
FilteringAggregation

WHERE filters rows before aggregation. HAVING filters groups after aggregation. You cannot use aggregate functions inside WHERE.

Execution order visualization

FROM + JOIN
All source rows assembled
WHERE
Row-level filter — no aggregates allowed here
GROUP BY + Aggregate
Groups formed, COUNT/SUM/AVG computed
HAVING
Group-level filter — aggregates are available here
sql
-- ❌ WRONG — can't use COUNT() in WHERE
SELECT machine_id, COUNT(*) FROM downtime_events
WHERE COUNT(*) > 5 GROUP BY machine_id;

-- ✅ CORRECT
SELECT machine_id, COUNT(*) AS event_count
FROM downtime_events
WHERE shift_date >= '2024-01-01'   -- row-level filter ✓
GROUP BY machine_id
HAVING COUNT(*) > 5;              -- group-level filter ✓
💡Memory trick: WHERE filters rows before they're grouped. HAVING filters groups after they're counted.
Q2 Easy What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
JoinsFundamentals

Visual — shaded region = rows returned

INNER JOIN A B Matches only
LEFT JOIN A B All left + matches
FULL OUTER A B All rows both sides
sql — factory context
-- INNER: only machines that have had downtime events
SELECT m.machine_name, COUNT(d.event_id)
FROM machines m INNER JOIN downtime_events d ON m.id = d.machine_id
GROUP BY m.machine_name;

-- LEFT: ALL machines, NULL count if no downtime events
SELECT m.machine_name, COUNT(d.event_id) AS event_count
FROM machines m LEFT JOIN downtime_events d ON m.id = d.machine_id
GROUP BY m.machine_name;
-- Zero-downtime machines show count = 0, not missing
In factory analytics, LEFT JOIN is your default. You almost always want all machines / all shifts in your report, even ones with no events — otherwise you hide zero-defect successes.
Q3 Easy What is the difference between UNION and UNION ALL?
Set Operations

UNION removes duplicates — requires a sort pass, slower. UNION ALL keeps all rows including duplicates — no sort, faster. Use UNION ALL by default unless you specifically need deduplication.

sql
-- Combine downtime events from two shifts
-- UNION ALL: keep all (may include duplicates if data is duped)
SELECT machine_id, event_time, reason FROM shift_1_events
UNION ALL
SELECT machine_id, event_time, reason FROM shift_2_events;

-- UNION: deduplicate (slower — only use when needed)
SELECT machine_id, event_time, reason FROM shift_1_events
UNION
SELECT machine_id, event_time, reason FROM shift_2_events;
💡Both queries must have the same number of columns in the same order with compatible data types.
Q4 Easy What does COALESCE do? Give a practical factory example.
NULL Handling

Returns the first non-NULL value from its argument list. Essential for handling sensor dropouts, missing readings, or LEFT JOIN NULLs.

sql
-- Sensor dropout: fall back to previous reading, then to 0
SELECT
    sensor_id,
    recorded_at,
    COALESCE(current_reading, prev_reading, 0) AS display_value,
    CASE WHEN current_reading IS NULL THEN 'IMPUTED' ELSE 'ACTUAL' END AS data_source
FROM sensor_readings;

-- NULL-safe division (avoid division-by-zero)
COALESCE(units_good * 1.0 / NULLIF(units_total, 0), 0) AS quality_rate
NULLIF(a, b) returns NULL if a = b, otherwise returns a. Pair it with COALESCE to prevent division-by-zero: COALESCE(value / NULLIF(denominator, 0), 0)
Q5 Easy What is the difference between DELETE, TRUNCATE, and DROP?
DDL / DML
CommandWhat it removesRollback?SpeedFactory rule
DELETESpecific rows (WHERE)YesSlow (logged)Use for archiving specific records
TRUNCATEAll rows, keeps structureNo (usually)FastAlmost never — destroys raw event history
DROPEntire table + structureNoInstantNever on production tables
In a factory analytics context, never TRUNCATE or DROP raw event tables. Always archive with a flag or move to a cold storage partition — sensor drift history is valuable for quality correlation analysis.

Aggregation & Grouping

Q6–Q9
Q6 Medium Write a query to find the top 3 machines by total downtime per shift.
Window FunctionsFactory KPI
sql
SELECT shift_id, machine_id, total_downtime, rnk
FROM (
    SELECT
        shift_id,
        machine_id,
        SUM(down_mins) AS total_downtime,
        RANK() OVER (
            PARTITION BY shift_id
            ORDER BY SUM(down_mins) DESC
        ) AS rnk
    FROM production_events
    GROUP BY shift_id, machine_id
) ranked
WHERE rnk <= 3
ORDER BY shift_id, rnk;

-- Snowflake shorthand: QUALIFY instead of subquery
SELECT shift_id, machine_id, SUM(down_mins) AS total_downtime,
       RANK() OVER (PARTITION BY shift_id ORDER BY SUM(down_mins) DESC) AS rnk
FROM production_events
GROUP BY shift_id, machine_id
QUALIFY rnk <= 3;
QUALIFY is Snowflake/BigQuery syntax that filters on window function results — cleaner than a subquery. Mention it in a Snowflake interview.
Q7 Medium What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
Window FunctionsRanking

Given values: 100, 100, 90 — how each function ranks them

ScoreROW_NUMBER()RANK()DENSE_RANK()
100111
100211
9033 (skipped 2)2

ROW_NUMBER — always unique. Tie-breaking is arbitrary (non-deterministic). RANK — ties get same rank, then skips. DENSE_RANK — ties get same rank, no skip. For factory use: use DENSE_RANK when ranking machines by defect count — rank 2 should mean second worst, not third worst because two tied for first.

sql — when to use each
-- ROW_NUMBER: deduplicate — get exactly one row per machine
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY machine_id ORDER BY ts DESC) AS rn
    FROM sensor_readings
) WHERE rn = 1;   -- latest reading per machine

-- DENSE_RANK: rank machines by defect count
DENSE_RANK() OVER (ORDER BY defect_count DESC) AS defect_rank
Q8 Medium Write a query to calculate a 7-day rolling average of daily defect counts.
Window FunctionsTime Series

ROWS frame — what's "in the window" for row 5 (today)

6 back
Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Today
← AVG these 7

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

sql
SELECT
    event_date,
    daily_defects,
    AVG(daily_defects) OVER (
        ORDER BY event_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7day_avg,
    -- Calendar-based (handles missing dates differently):
    AVG(daily_defects) OVER (
        ORDER BY event_date
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    ) AS rolling_7day_calendar_avg
FROM daily_quality_summary
ORDER BY event_date;
ROWS vs RANGE: ROWS counts physical rows (always exactly 7). RANGE uses the ORDER BY value — all rows within 6 days are included, which handles missing dates more naturally for time-series data.
Q9 Medium How do you find and remove duplicate rows in a table?
Data QualityDeduplication
sql
-- Step 1: Find duplicates
SELECT machine_id, event_timestamp, COUNT(*) AS occurrences
FROM sensor_readings
GROUP BY machine_id, event_timestamp
HAVING COUNT(*) > 1;

-- Step 2: Keep only the latest row per (machine, timestamp)
WITH deduped AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY machine_id, event_timestamp
               ORDER BY ingested_at DESC
           ) AS rn
    FROM sensor_readings
)
SELECT * FROM deduped WHERE rn = 1;

Window Functions

Q10–Q12
Q10 Medium Write a query to get each machine's most recent sensor reading.
Window FunctionsLatest Record
sql
-- ROW_NUMBER approach (preferred — deterministic)
SELECT machine_id, sensor_type, reading_value, recorded_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY machine_id, sensor_type
               ORDER BY recorded_at DESC
           ) AS rn
    FROM sensor_readings
    WHERE dq_flag = 'GOOD'
) t
WHERE rn = 1;

-- Snowflake shorthand with QUALIFY
SELECT machine_id, sensor_type, reading_value, recorded_at
FROM sensor_readings
WHERE dq_flag = 'GOOD'
QUALIFY ROW_NUMBER() OVER (PARTITION BY machine_id, sensor_type ORDER BY recorded_at DESC) = 1;
Q11 Medium What is LAG() and LEAD()? Calculate time between consecutive downtime events.
Window FunctionsTime Between Events

LAG() accesses the previous row's value. LEAD() accesses the next row's value. Both accept an offset and a default for when no prior/next row exists.

sql
SELECT
    machine_id,
    event_start,
    LAG(event_start) OVER (
        PARTITION BY machine_id
        ORDER BY event_start
    ) AS prev_event_start,
    DATEDIFF('minute',
        LAG(event_start) OVER (PARTITION BY machine_id ORDER BY event_start),
        event_start
    ) AS mins_since_last_downtime,
    LEAD(event_start) OVER (
        PARTITION BY machine_id
        ORDER BY event_start
    ) AS next_event_start
FROM downtime_events
ORDER BY machine_id, event_start;
💡MTBF (Mean Time Between Failures) is just AVG(mins_since_last_downtime) on top of this query — a key factory reliability metric.
Q12 Hard What is the difference between ROWS and RANGE in a window frame?
Window FramesAdvanced

ROWS is physical — counts actual row positions. RANGE is logical — groups rows with the same ORDER BY value, or uses an interval for date/numeric ranges.

ROWS vs RANGE — 3-day window with a missing date

DateValueROWS (3 PREC)RANGE (2 days)
Jan 110AVG(10)AVG(10)
Jan 220AVG(10,20)AVG(10,20)
Jan 4 (gap!)30AVG(10,20,30)AVG(20,30) only
Jan 540AVG(20,30,40)AVG(30,40)
sql
-- ROWS: exactly 3 preceding rows regardless of date gaps
AVG(reading) OVER (ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)

-- RANGE: all rows within a 3-hour interval (handles gaps correctly)
AVG(reading) OVER (
    ORDER BY ts
    RANGE BETWEEN INTERVAL '3 hours' PRECEDING AND CURRENT ROW
)
For sensor time-series with potential dropout gaps, RANGE with an interval usually gives more meaningful averages — it respects calendar time, not row count.

CTEs & Subqueries

Q13–Q14
Q13 Medium What is a CTE and when would you use one over a subquery?
CTEsReadability

A CTE (WITH clause) names an intermediate result you can reference multiple times. Prefer CTEs when: the same intermediate result is used more than once, the logic is complex enough that naming it improves readability, or you need recursion (CTEs only).

sql — OEE with CTE
WITH shift_oee AS (
    SELECT
        pe.shift_id,
        pe.machine_id,
        SUM(pe.run_mins) * 1.0 / MAX(s.planned_mins)    AS availability,
        SUM(pe.units_good) * 1.0
            / NULLIF(SUM(pe.units_total), 0)              AS quality,
        SUM(pe.units_total * dm.ideal_cycle_secs)
            / NULLIF(SUM(pe.run_mins) * 60, 0)             AS performance
    FROM production_events pe
    JOIN dim_shift s  ON pe.shift_id  = s.shift_id
    JOIN dim_machine dm ON pe.machine_id = dm.machine_id
    GROUP BY pe.shift_id, pe.machine_id
)
SELECT
    shift_id,
    machine_id,
    ROUND(availability * 100, 1)               AS avail_pct,
    ROUND(performance * 100, 1)               AS perf_pct,
    ROUND(quality * 100, 1)                  AS qual_pct,
    ROUND(availability * performance * quality * 100, 1) AS oee_pct
FROM shift_oee
WHERE availability < 0.75   -- flag underperforming shifts
ORDER BY oee_pct;
Q14 Hard Write a recursive CTE to traverse an employee org chart.
Recursive CTEAdvanced

How the recursion works

Anchor member
SELECT root rows WHERE manager_id IS NULL — the CEO / top node
Recursive member
UNION ALL — join each employee to their parent row from the previous iteration
Termination
Recursion stops when no new rows match — leaf nodes have no children
sql
WITH RECURSIVE org AS (
    -- Anchor: root nodes (no manager)
    SELECT
        employee_id,
        employee_name,
        manager_id,
        0                                    AS depth,
        employee_name                         AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: each employee joins to their manager's row
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        o.depth + 1,
        o.path || ' > ' || e.employee_name
    FROM employees e
    JOIN org o ON e.manager_id = o.employee_id
)
SELECT employee_id, employee_name, depth, path
FROM org
ORDER BY path;

-- Sample output:
-- Alice        | depth 0 | Alice
-- Bob          | depth 1 | Alice > Bob
-- Dana         | depth 2 | Alice > Bob > Dana
-- Eve          | depth 2 | Alice > Bob > Eve
-- Carol        | depth 1 | Alice > Carol

Data Quality & Real-World Problems

Q15–Q17
Q15 Hard How do you detect a "stuck sensor" that has returned the same value for 10+ consecutive readings?
Data QualityIslands & Gaps

Classic islands and gaps technique. Subtracting two row numbers creates a constant group ID for consecutive identical values.

How the group ID works

ValueRN overallRN per valueRN - RN_val = group_id
21.5110
21.5220
21.5330
22.1413 (new group)
21.5541 (new group)
sql
WITH grouped AS (
    SELECT
        sensor_id, reading_value, recorded_at,
        ROW_NUMBER() OVER (PARTITION BY sensor_id ORDER BY recorded_at)
        - ROW_NUMBER() OVER (PARTITION BY sensor_id, reading_value ORDER BY recorded_at)
        AS grp_id
    FROM sensor_readings
),
consecutive_counts AS (
    SELECT
        sensor_id, reading_value, grp_id,
        COUNT(*) AS consecutive_count,
        MIN(recorded_at) AS streak_start,
        MAX(recorded_at) AS streak_end
    FROM grouped
    GROUP BY sensor_id, reading_value, grp_id
)
SELECT sensor_id, reading_value, consecutive_count, streak_start, streak_end
FROM consecutive_counts
WHERE consecutive_count >= 10
ORDER BY consecutive_count DESC;
💡This is the most sophisticated SQL pattern in factory analytics. Knowing the islands/gaps technique by name and being able to explain the subtraction trick will impress any interviewer.
Q16 Hard Flag sensor readings that are more than 3 standard deviations from the mean (Z-score outlier detection).
Statistical QCSPC
sql
WITH sensor_stats AS (
    -- Compute baseline stats on GOOD readings only
    SELECT
        sensor_id,
        AVG(reading_value)    AS mean_val,
        STDDEV(reading_value) AS stddev_val
    FROM sensor_readings
    WHERE dq_flag = 'GOOD'
    GROUP BY sensor_id
)
SELECT
    r.sensor_id,
    r.reading_value,
    r.recorded_at,
    ROUND(s.mean_val, 3)                                          AS mean_val,
    ROUND(ABS(r.reading_value - s.mean_val)
        / NULLIF(s.stddev_val, 0), 2)                            AS z_score,
    CASE
        WHEN ABS(r.reading_value - s.mean_val)
            / NULLIF(s.stddev_val, 0) > 3  THEN 'SUSPECT'
        ELSE 'GOOD'
    END                                                               AS dq_flag
FROM sensor_readings r
JOIN sensor_stats s ON r.sensor_id = s.sensor_id;
Always compute baseline stats on verified GOOD readings only, not all readings. If you include outliers in the baseline, the mean and stddev shift — making the threshold less sensitive to the anomalies you're trying to catch.
Q17 Hard Find overlapping downtime events for the same machine.
Interval LogicSelf Join

Two intervals [A_start, A_end] and [B_start, B_end] overlap if and only if: A_start < B_end AND A_end > B_start

Overlap condition — visualized

Event A: |——————|
Event B:     |——————| ← overlaps (B starts before A ends)
Event C:             |——| ← no overlap (C starts after A ends)
sql
SELECT
    a.machine_id,
    a.event_id      AS event_a,
    b.event_id      AS event_b,
    a.start_time    AS a_start,
    a.end_time      AS a_end,
    b.start_time    AS b_start,
    b.end_time      AS b_end
FROM downtime_events a
JOIN downtime_events b
    ON  a.machine_id = b.machine_id
    AND a.event_id   < b.event_id     -- avoid self-join & duplicate pairs
    AND a.start_time < b.end_time     -- overlap condition part 1
    AND a.end_time   > b.start_time;  -- overlap condition part 2

Performance & Design

Q18–Q20
Q18 Medium What is the difference between a clustered and non-clustered index? What's the Snowflake equivalent?
IndexesSnowflake

Clustered index — physically sorts table rows by the index key. One per table. Non-clustered index — separate structure pointing back to rows. Multiple allowed.

In Snowflake, data is stored in micro-partitions automatically. The equivalent is a cluster key — defining one causes Snowflake to sort data within micro-partitions, enabling more aggressive partition pruning on your most common filter columns.

sql — Snowflake cluster key
-- Define cluster key on most common filter columns
ALTER TABLE production_events
CLUSTER BY (shift_date, machine_id);

-- Check clustering information
SELECT SYSTEM$CLUSTERING_INFORMATION('production_events');
💡In a factory analytics context: cluster on shift_date and machine_id — these are the most common filter columns in OEE, downtime, and quality queries. Snowflake will prune micro-partitions that don't match, making queries dramatically faster at scale.
Q19 Medium A dashboard query that ran in 2 seconds now takes 45 seconds. How do you diagnose it?
Query OptimizationTroubleshooting
Run EXPLAIN / Query Profile
Snowflake Query Profile, PostgreSQL EXPLAIN ANALYZE — look for full table scans, cartesian products, spill to disk
Check data volume growth
Did the table 10x in size? Did a new partition get added? Row count explosion from a bad JOIN?
Check for schema changes
Did a column type change break a partition filter? Did a JOIN key become nullable?
Fix in order
Filter pushdown first → index/cluster key → query rewrite → pre-aggregation table
The most common cause of sudden slowdowns in factory analytics is a cartesian product from an ambiguous JOIN — e.g., joining on machine_id without also joining on shift_date, causing every machine row to match every shift row.
Q20 Medium What is the difference between EXPLAIN and EXPLAIN ANALYZE?
Query Planning
CommandRuns query?ShowsWhen to use
EXPLAINNoEstimated plan, estimated row countsQuery too expensive to run speculatively
EXPLAIN ANALYZEYesActual plan, actual row counts, actual time per node, spillsDiagnosing a slow query already in production
Snowflake Query ProfileYes (post-run)Node-level stats, spill to disk, partition pruning ratioAlways use after any Snowflake query optimization
💡The partition pruning ratio in Snowflake's Query Profile is the most useful single metric — if you're scanning 800 of 800 micro-partitions on a table filtered by date, your cluster key is wrong or missing.
🏭

Factory Analytics Patterns

OEE + Star Schema

OEE = Availability × Performance × Quality

Availability
A
run_mins ÷ planned_mins
×
Performance
P
actual_rate ÷ ideal_rate
×
Quality
Q
units_good ÷ units_total
=
OEE
A×P×Q
world-class ≥ 85%

Star schema — fact_production_event

dim_shift
id · date · planned_mins
dim_machine
id · name · ideal_rate
fact_production_event
run_mins · down_mins
units_good · actual_rate
dim_product
sku · spec_rate
dim_downtime_reason
code · category · planned?
sql — full OEE query from star schema
WITH oee_calc AS (
    SELECT
        s.shift_date,
        pe.machine_id,
        m.machine_name,
        SUM(pe.run_mins) * 1.0 / MAX(s.planned_mins)        AS availability,
        SUM(pe.units_total * m.ideal_cycle_secs / 60.0)
          / NULLIF(SUM(pe.run_mins), 0)                      AS performance,
        SUM(pe.units_good) * 1.0
          / NULLIF(SUM(pe.units_total), 0)                  AS quality,
        COUNT(CASE WHEN pe.dq_flag = 'GOOD' THEN 1 END) * 1.0
          / NULLIF(COUNT(*), 0)                              AS data_quality_score
    FROM fact_production_event pe
    JOIN dim_shift   s ON pe.shift_id   = s.shift_id
    JOIN dim_machine m ON pe.machine_id = m.machine_id
    WHERE s.shift_date >= DATEADD('day', -30, CURRENT_DATE())
    GROUP BY s.shift_date, pe.machine_id, m.machine_name
)
SELECT
    shift_date,
    machine_id,
    machine_name,
    ROUND(availability  * 100, 1) AS avail_pct,
    ROUND(performance   * 100, 1) AS perf_pct,
    ROUND(quality       * 100, 1) AS qual_pct,
    ROUND(availability * performance * quality * 100, 1) AS oee_pct,
    ROUND(data_quality_score * 100, 1)                 AS dq_score_pct,
    -- Rolling 7-day OEE trend
    AVG(availability * performance * quality) OVER (
        PARTITION BY machine_id
        ORDER BY shift_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) * 100                                           AS oee_7day_rolling_avg
FROM oee_calc
ORDER BY shift_date, oee_pct;

Cheat Sheet

FunctionPurposeKey gotcha
WHEREFilter rows before GROUP BYCannot use aggregate functions
HAVINGFilter groups after GROUP BYRequires GROUP BY (or window context)
COALESCE(a,b,c)First non-NULL valueEvaluates left to right, stops at first non-NULL
NULLIF(a,b)Return NULL if a=b, else aPair with COALESCE for safe division
ROW_NUMBER()Unique row rank (no ties)Tie-breaking is non-deterministic
RANK()Rank with ties, skips nextGaps in sequence when ties occur
DENSE_RANK()Rank with ties, no skipUse for factory machine rankings
LAG(col, n, default)Previous row valueReturns NULL if no prior row (set default)
LEAD(col, n, default)Next row valueReturns NULL if no next row (set default)
ROWS BETWEENPhysical row count windowAlways exact N rows regardless of date gaps
RANGE BETWEENLogical value-based windowUse with INTERVAL for time-series
WITH RECURSIVEHierarchical traversalMust use UNION ALL, not UNION
QUALIFYFilter on window function resultSnowflake / BigQuery only
UNION ALLCombine results, keep dupesFaster than UNION — use by default
EXPLAIN ANALYZEActual execution statsActually runs the query — use on slow prod queries
SUM(CASE WHEN …)Conditional aggregation (all warehouses)Prefer COUNT_IF / BOOL_AND in Snowflake when it reads cleaner
FILTER (WHERE …)Postgres-style conditional aggNot everywhere; CASE/SUM is the portable default
Surrogate keyStable id for a dimension rowOften ROW_NUMBER() or hash — document grain and SCD rules
SCD Type 2History when attributes changevalid_from / valid_to or is_current; join on “as-of” date
Incremental modeldbt-style merge by key + watermarkAlign keys with business grain; test uniqueness

Live-coding tip

Many analytics engineering loops end with one end-to-end metric query—here, OEE from a star schema with a rolling trend. Practice explaining grain, joins, and why each term is multiplied so you can adapt the same structure to revenue, retention, or funnel conversion.