SQL Skills

Cohort Retention Queries in 6 Frameworks

The query patterns behind every retention chart you've seen. From classic monthly cohorts to weighted retention and resurrection.

13 min read·SQL Skills

Cohort retention is the single most-asked analytical SQL pattern in growth and product interviews. It's also the one most candidates write almost correctly — close enough to look right in a demo, subtly wrong in ways that bite the moment the interviewer asks a follow-up.

The reason candidates struggle isn't that retention is mathematically hard. It's that there are actually six different kinds of retention questions hiding under the same name. Get the framing right and the SQL is straightforward. Pick the wrong framing and you'll compute a number that's correct for some question — just not the one being asked.

This guide walks through the six retention frameworks you'll encounter in interviews, what each measures, and the clean SQL pattern for each.

The Mental Model

Every retention query has three pieces:

The cohort. A group of users defined by something — most often signup month, but it can be signup channel, first purchase amount, or any other dimension.
The retention event. What does "retained" mean? Logged in? Made a purchase? Opened the app? Each definition gives a different number.
The window. How far out are you measuring? Month 1? Month 3? Rolling weekly?

Confusing any of these three is the most common interview mistake. Always articulate them out loud before writing SQL.

Framework 1 — Classic Monthly Cohort Retention

The question: "For each monthly signup cohort, what percentage of users were still active in months 1, 2, 3...?"

This is the textbook retention chart — the triangle that shows monthly cohorts as rows and aging months as columns.

WITH cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC('month', MIN(created_at)) AS cohort_month
  FROM users
  GROUP BY user_id
),
activity AS (
  SELECT
    c.user_id,
    c.cohort_month,
    DATE_TRUNC('month', e.event_time) AS active_month
  FROM cohorts c
  JOIN events e ON e.user_id = c.user_id
  WHERE e.event_name = 'session_start'
)
SELECT
  cohort_month,
  COUNT(DISTINCT user_id) AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN active_month = cohort_month + INTERVAL '1 month'
    THEN user_id
  END)::numeric / NULLIF(COUNT(DISTINCT user_id), 0) AS m1_retention,
  COUNT(DISTINCT CASE
    WHEN active_month = cohort_month + INTERVAL '3 months'
    THEN user_id
  END)::numeric / NULLIF(COUNT(DISTINCT user_id), 0) AS m3_retention
FROM activity
GROUP BY cohort_month
ORDER BY cohort_month;

The senior signal: acknowledging the "triangle problem" out loud — recent cohorts haven't aged enough to have month-3 retention yet, so those cells will be zero or null. Real retention charts often clip the immature cohorts off the right side.

Framework 2 — Activation-Based Cohort

The question: "For users who completed activation, what percentage were still active 30/60/90 days later?"

The difference from Framework 1 is what defines the cohort start. Here it's the activation event (first meaningful action), not the signup date.

WITH activated AS (
  SELECT
    user_id,
    MIN(event_time) AS activation_time
  FROM events
  WHERE event_name = 'first_purchase'
  GROUP BY user_id
)
SELECT
  DATE_TRUNC('month', a.activation_time) AS activation_cohort,
  COUNT(DISTINCT a.user_id) AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN e.event_time BETWEEN a.activation_time + INTERVAL '30 days'
                          AND a.activation_time + INTERVAL '37 days'
    THEN a.user_id
  END)::numeric / NULLIF(COUNT(DISTINCT a.user_id), 0) AS d30_retention
FROM activated a
LEFT JOIN events e
  ON e.user_id = a.user_id
  AND e.event_name = 'session_start'
GROUP BY DATE_TRUNC('month', a.activation_time)
ORDER BY activation_cohort;

The senior signal: using a window around the retention date (day 30-37, not exactly day 30) so users who happened to be active on day 30 specifically aren't undercounted just because they were active a day later instead.

Framework 3 — Behavioral / Acquisition Cohort

The question: "How does month-3 retention differ by acquisition channel?"

Cohort isn't time-based — it's defined by a user attribute (channel, plan, region).

WITH cohorts AS (
  SELECT
    user_id,
    channel,
    DATE_TRUNC('month', created_at) AS signup_month
  FROM users
  WHERE created_at >= CURRENT_DATE - INTERVAL '12 months'
),
activity AS (
  SELECT
    c.user_id,
    c.channel,
    c.signup_month,
    DATE_TRUNC('month', e.event_time) AS active_month
  FROM cohorts c
  JOIN events e ON e.user_id = c.user_id
  WHERE e.event_name = 'session_start'
)
SELECT
  channel,
  COUNT(DISTINCT user_id) AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN active_month = signup_month + INTERVAL '3 months'
    THEN user_id
  END)::numeric / NULLIF(COUNT(DISTINCT user_id), 0) AS m3_retention
FROM activity
GROUP BY channel
ORDER BY m3_retention DESC;

The senior signal: noting that channel mix can shift over time — a channel showing high retention might just be a channel that disproportionately acquired older cohorts. Real comparisons should hold the time window constant.

Framework 4 — Rolling / Weekly Cohort

The question: "What's our 4-week rolling retention by weekly cohort?"

Same as monthly but at weekly granularity. Used when you want faster feedback loops — you'll know within weeks if a product change moved retention, instead of waiting months.

WITH cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC('week', MIN(created_at)) AS cohort_week
  FROM users
  GROUP BY user_id
),
activity AS (
  SELECT
    c.user_id,
    c.cohort_week,
    DATE_TRUNC('week', e.event_time) AS active_week
  FROM cohorts c
  JOIN events e ON e.user_id = c.user_id
  WHERE e.event_name = 'session_start'
)
SELECT
  cohort_week,
  COUNT(DISTINCT user_id) AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN active_week = cohort_week + INTERVAL '4 weeks'
    THEN user_id
  END)::numeric / NULLIF(COUNT(DISTINCT user_id), 0) AS w4_retention
FROM activity
WHERE cohort_week >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY cohort_week
ORDER BY cohort_week;

The senior signal: weekly cohorts are noisier than monthly. A single week might have very few signups, so the retention rate becomes statistically meaningless. Real dashboards usually show both — weekly for trend-spotting, monthly for stability.

Framework 5 — Resurrection / Returning User Cohort

The question: "How many users came back after being inactive for 30+ days?"

This isn't retention exactly — it's the opposite story. Users who lapsed and then returned.

WITH user_sessions AS (
  SELECT
    user_id,
    event_time,
    LAG(event_time) OVER (
      PARTITION BY user_id
      ORDER BY event_time
    ) AS prev_session
  FROM events
  WHERE event_name = 'session_start'
),
resurrections AS (
  SELECT
    user_id,
    event_time AS resurrection_time
  FROM user_sessions
  WHERE prev_session IS NOT NULL
    AND event_time - prev_session > INTERVAL '30 days'
)
SELECT
  DATE_TRUNC('month', resurrection_time) AS month,
  COUNT(DISTINCT user_id) AS resurrected_users
FROM resurrections
GROUP BY DATE_TRUNC('month', resurrection_time)
ORDER BY month;

The senior signal: pointing out that "30 days of inactivity" is a definition you should validate with the product team. Some products consider 7-day gaps churn (consumer mobile), some consider 90-day gaps churn (enterprise SaaS). Asking the question matters.

Framework 6 — Weighted / Value-Based Retention

The question: "What's our revenue-weighted month-3 retention?"

Counting users equally treats a $5 user and a $5000 user the same. For revenue analysis, weight by spend.

WITH cohort_revenue AS (
  SELECT
    user_id,
    DATE_TRUNC('month', MIN(created_at)) AS cohort_month,
    SUM(amount) AS lifetime_revenue
  FROM users u
  JOIN orders o USING (user_id)
  GROUP BY user_id, DATE_TRUNC('month', MIN(created_at))
),
month_3_active AS (
  SELECT DISTINCT
    cr.user_id,
    cr.cohort_month,
    cr.lifetime_revenue
  FROM cohort_revenue cr
  JOIN events e ON e.user_id = cr.user_id
  WHERE DATE_TRUNC('month', e.event_time) = cr.cohort_month + INTERVAL '3 months'
)
SELECT
  cr.cohort_month,
  SUM(cr.lifetime_revenue) AS cohort_total_value,
  COALESCE(SUM(m3.lifetime_revenue), 0) AS retained_value,
  COALESCE(SUM(m3.lifetime_revenue), 0) / NULLIF(SUM(cr.lifetime_revenue), 0) AS revenue_weighted_m3
FROM cohort_revenue cr
LEFT JOIN month_3_active m3 ON m3.user_id = cr.user_id
GROUP BY cr.cohort_month
ORDER BY cr.cohort_month;

The senior signal: revenue-weighted retention is often higher than user-weighted retention because high-value users tend to stick around. Calling that out shows analytical depth — and surfaces the right kind of retention question for the product (consumer vs. SaaS).

The Five Questions Interviewers Always Follow Up With

After you've written any retention query, expect at least two of these:

What's the denominator? Be precise. "Total users in the cohort" or "users who completed activation" — these give different numbers.
What counts as 'active'? Pin it down. Session start, purchase, login event, app open — interviewers want to see you make this explicit.
How do you handle immature cohorts? The triangle problem. Recent cohorts don't have month-6 data yet. Acknowledge it.
What if a user signed up twice? Deduplication question. Most retention queries assume one user_id = one cohort. If users can re-signup, you need to decide.
How does this compare to industry benchmarks? Be honest if you don't know. "Consumer mobile apps target 30%+ M3, B2B SaaS targets 90%+. I'd want to know the product context before declaring this number good or bad."

The Practice Plan

Pick one of the 6 frameworks. Write the SQL on a real-feeling schema. Then immediately ask yourself: what would I change to handle weekly instead of monthly? What if the cohort is by channel instead of by date? What if I needed revenue-weighted instead of user-weighted?

Six frameworks × three variations each = 18 queries. Write them all. By the end you'll see any retention question and instantly know which framework it maps to. That's the muscle interviewers reward.

Practice next

Explore SQL challenges

100+ challenges across Growth, SaaS, Marketing, Product, and Finance — graded by AI, ranked by difficulty.

Explore SQL challenges

Back to all resources