By Role

The Data Analyst SQL Interview Playbook

Every category of SQL question you'll see in a data analyst loop, with frameworks for each and the mistakes that cost offers.

12 min read·By Role

Every data analyst SQL interview is a variation on the same six question types. Master those six and you'll handle 90% of what any analyst loop — FAANG, AI startup, fintech, or otherwise — throws at you.

The problem is most prep guides don't categorize them. They give you a pile of practice problems with no structure, and you spend three weeks grinding through them without ever building the mental model of "this is a Type 3 question — I know the shape."

This playbook fixes that. Six categories, the framework for each, the mistakes that cost offers, and how to walk into any analyst loop with a query plan in your head before you write a line of SQL.

Why Categorization Matters

A data analyst SQL interview question is rarely novel. It's almost always a variation on something you've seen before — wearing different schema clothes. If you can recognize the category in 30 seconds, you reach for the right SQL skeleton instinctively and spend your time on the parts that actually score (edge cases, business framing, narration).

Without categorization, every question feels like a cold start. With it, you're operating from pattern matches.

The Six Categories

Category 1: Aggregation with Grouping

The bread and butter. Given a transactional table, compute a metric grouped by one or more dimensions.

Example: "Find the top 5 product categories by total revenue in the last 30 days."

The SQL is almost always:

SELECT
  p.category,
  SUM(o.order_total) AS revenue
FROM orders o
JOIN products p ON p.product_id = o.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
  AND o.status = 'completed'
GROUP BY p.category
ORDER BY revenue DESC
LIMIT 5;

What's being tested: can you join cleanly, group correctly, filter to the right window, and rank.

The senior moves:

Exclude cancelled/refunded orders without being asked
Use >= not > on date predicates (so the first day is inclusive)
Order deterministically (add a secondary ORDER BY p.category ASC if there could be ties)
Mention what happens to NULL order_total or NULL category

Category 2: Top-N Per Group

The most-asked window function pattern. "For each group, find the top N records by some metric."

Example: "For each customer, find their two most recent orders."

WITH ranked AS (
  SELECT
    customer_id,
    order_id,
    order_date,
    order_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
  FROM orders
)
SELECT customer_id, order_id, order_date, order_total
FROM ranked
WHERE rn <= 2;

What's being tested: can you use window functions correctly, and do you know when to reach for them.

The senior moves:

Choose ROW_NUMBER (unique rank) vs. RANK (ties share rank) vs. DENSE_RANK (no gaps) explicitly, and explain why
Handle the tiebreaker on ORDER BY (order_date DESC, order_id DESC is safer than order_date DESC alone)
Acknowledge that customers with only one order will only show one row

Category 3: Funnel / Conversion

Multi-step user journey from one event to another. Compute conversion rates between steps.

Example: "Given an events table tracking viewed_product, added_to_cart, purchased, compute the conversion rate at each step."

WITH user_events AS (
  SELECT
    user_id,
    MAX(CASE WHEN event = 'viewed_product' THEN 1 ELSE 0 END) AS viewed,
    MAX(CASE WHEN event = 'added_to_cart'  THEN 1 ELSE 0 END) AS added,
    MAX(CASE WHEN event = 'purchased'      THEN 1 ELSE 0 END) AS purchased
  FROM events
  WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT
  SUM(viewed)    AS users_viewed,
  SUM(added)     AS users_added,
  SUM(purchased) AS users_purchased,
  ROUND(SUM(added)::numeric    / NULLIF(SUM(viewed), 0)  * 100, 1) AS view_to_add_rate,
  ROUND(SUM(purchased)::numeric / NULLIF(SUM(added), 0)  * 100, 1) AS add_to_purchase_rate
FROM user_events;

What's being tested: can you pivot user events into a per-user funnel, then aggregate.

The senior moves:

NULLIF to prevent division by zero
Distinguish sequential funnels (must hit step 1 before step 2) from any-time funnels (any user who ever did both)
Cast to numeric before dividing — interviewers love catching integer-division bugs

Category 4: Cohort Retention

Group users by a starting period, then track their behavior in subsequent periods.

Example: "For each monthly signup cohort, what percentage of users were still active in month 3?"

WITH cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC('month', MIN(created_at)) AS cohort_month
  FROM users
  GROUP BY user_id
),
activity AS (
  SELECT
    c.user_id,
    c.cohort_month,
    DATE_TRUNC('month', e.event_date) AS active_month
  FROM cohorts c
  JOIN events e ON e.user_id = c.user_id
)
SELECT
  cohort_month,
  COUNT(DISTINCT CASE
    WHEN active_month = cohort_month + INTERVAL '3 months'
    THEN user_id
  END)::numeric / NULLIF(COUNT(DISTINCT user_id), 0) AS month_3_retention
FROM activity
GROUP BY cohort_month
ORDER BY cohort_month;

What's being tested: can you reason about time-based cohorts, use conditional aggregation, and handle the date math.

The senior moves:

DATE_TRUNC at the right grain
DISTINCT user_id (not just COUNT(*)) — a user can have multiple events in month 3
NULLIF again for division safety
Explain that immature cohorts (recent months) won't have month-3 data yet

Category 5: Running Totals / Cumulative

Sum or count something incrementally over time.

Example: "Compute a 7-day rolling average of daily active users."

SELECT
  date_day,
  dau,
  AVG(dau) OVER (
    ORDER BY date_day
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d_avg
FROM daily_active_users
ORDER BY date_day;

What's being tested: can you handle frame specifications in window functions.

The senior moves:

Specify the frame explicitly (ROWS BETWEEN N PRECEDING AND CURRENT ROW)
Mention that you'd want at least 7 days of preceding data before trusting the first values
Distinguish ROWS (positional) from RANGE (value-based) frames

Category 6: Period Comparison

Compare a metric across two time periods — week-over-week, month-over-month, year-over-year.

Example: "Compare this month's revenue to the same month last year."

WITH monthly AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(order_total) AS revenue
  FROM orders
  WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '13 months')
    AND status = 'completed'
  GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
  current.month            AS current_month,
  current.revenue          AS current_revenue,
  prior.revenue            AS prior_year_revenue,
  ROUND((current.revenue - prior.revenue) / NULLIF(prior.revenue, 0) * 100, 1) AS yoy_pct_change
FROM monthly current
LEFT JOIN monthly prior ON prior.month = current.month - INTERVAL '12 months'
WHERE current.month = DATE_TRUNC('month', CURRENT_DATE);

What's being tested: can you self-join (or use LAG over a longer window) to compare periods.

The senior moves:

LEFT JOIN (not INNER JOIN) so months without prior-year data still show up
Cast to numeric before dividing
Mention seasonality if relevant — YoY filters it out; WoW doesn't

The Five Mistakes That Cost Offers

After watching dozens of mock loops, the same five failure modes show up:

1. Diving in without restating the question. Every senior interviewer scores this. Spend 60 seconds restating what you understand the question to mean and what you'll return. That's the senior signal.

2. Forgetting NULLIF on division. Integer or numeric division on a possibly-zero denominator will crash or return NULLs. Always guard it. Mention it out loud — "adding NULLIF here so a cohort with zero users doesn't crash".

3. Implicit ordering. A SELECT without ORDER BY returns rows in undefined order. In an interview, this is treated as a bug. Always add a deterministic ORDER BY.

4. Confusing WHERE and HAVING. WHERE filters rows before aggregation. HAVING filters groups after. Mixing these up is the most common interview SQL error — and it's exactly the one that gets called out in code review.

5. Going silent. When you hit an error or a tricky part, narrate. "This is failing because the join is ambiguous on id — let me alias both tables." Composure under stress is half the score in senior interviews.

How To Practice

The honest answer: forget grinding 200 LeetCode problems. Pick five problems from each category above. Solve each one twice — first with a 25-minute timer, then again with no timer but narrating every decision out loud as if interviewing.

Two passes through 30 problems builds more interview-ready fluency than 200 silent reps. The narration muscle is the one you'll actually use.

When you can identify the category of any question in 30 seconds and reach for the right skeleton, you're ready.

Practice next

Explore SQL challenges

100+ challenges across Growth, SaaS, Marketing, Product, and Finance — graded by AI, ranked by difficulty.

Explore SQL challenges

Back to all resources