Valenx Press · 10 min read
SRE Interview SLO Design Template: Downloadable Worksheet for Google-Style Questions
TL;DR
The SRE interview expects a concrete SLO design, not a generic definition; you must justify trade‑offs with an error‑budget narrative. The worksheet forces you to surface latency, availability, and cost signals before the interview. Use the template, rehearse the scripted defense, and you will survive the four‑round Google SRE interview.
Who This Is For
You are a mid‑level site‑reliability engineer with 3–5 years of production experience, currently earning $160,000 base plus 0.04 % equity, and you aim to move into a senior SRE role at Google. You have shipped services that handle 10 million requests per day, but you struggle to articulate SLOs under pressure. You need a battle‑tested worksheet that mirrors the exact prompts Google asks, and you want a judgment‑focused framework rather than a checklist of buzzwords.
How should I structure an SLO design answer in a Google SRE interview?
The answer must start with a one‑sentence summary of the business goal, then present the chosen SLO, the error‑budget calculation, and finally the mitigation plan. In a Q3 debrief, the hiring manager interrupted because the candidate listed metrics before stating the product impact; the panel penalized the lack of hierarchy. Insight 1: The first counter‑intuitive truth is that the problem is not the metric itself, but the narrative that connects it to user value.
Begin with the business outcome: “Our checkout service must keep the user‑perceived error rate below 0.1 % to maintain conversion.” Next, state the SLO: “We target 99.9 % availability over a rolling 30‑day window.” Then compute the error budget: 0.1 % of 30 days (43,200 minutes) is about 43 minutes of downtime. Finally, outline the response: “If downtime exceeds 30 minutes, we trigger a post‑mortem and roll back the most recent release.”
The worksheet forces you to fill these four rows before the interview: Goal, SLO, Error Budget, Mitigation. The judgment you display is that you treat the SLO as a decision framework, not a static number.
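Because the four rows depend on each other, it can help to check the arithmetic once in code rather than recompute it under pressure. The following is a minimal Python sketch, assuming a time‑based error budget (allowed downtime minutes over a rolling window); `SloWorksheet` and its field names are hypothetical and not part of any Google tool.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class SloWorksheet:
    """Hypothetical four-row worksheet: Goal, SLO, Error Budget, Mitigation."""
    goal: str
    slo_target: float              # e.g. 0.999 for 99.9 % availability
    window_days: int               # length of the rolling window
    mitigation_trigger_min: float  # downtime (minutes) that triggers the response
    mitigation: str

    @property
    def error_budget_min(self) -> float:
        # Allowed downtime = (1 - target) * window length in minutes.
        return (1 - self.slo_target) * self.window_days * MINUTES_PER_DAY

    def needs_mitigation(self, downtime_min: float) -> bool:
        # True once observed downtime passes the remediation trigger.
        return downtime_min > self.mitigation_trigger_min


checkout = SloWorksheet(
    goal="Keep user-perceived checkout error rate below 0.1 % to protect conversion",
    slo_target=0.999,
    window_days=30,
    mitigation_trigger_min=30,
    mitigation="Trigger a post-mortem and roll back the most recent release",
)

print(f"Error budget: {checkout.error_budget_min:.1f} minutes")
print("Mitigate at 35 min of downtime:", checkout.needs_mitigation(35))
```

With the checkout numbers, the budget comes out to 43.2 minutes (the 43 minutes quoted above, rounded down), and 35 minutes of downtime crosses the 30‑minute trigger.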
Script to copy‑paste:
“Our primary KPI is checkout conversion, which drops sharply once the error rate exceeds 0.1 %. I therefore propose a 99.9 % availability SLO with a 43‑minute error budget, and a remediation trigger at 30 minutes of outage.”
What signals do interviewers look for when I propose error budgets?
Interviewers evaluate three signals: the realism of the budget, the alignment with business risk, and the clarity of the escalation path. In a senior‑level interview, the hiring manager asked, “Why does a 43‑minute budget matter when your SLA is measured in seconds?” The panel judged the candidate on the ability to map risk tolerance to a concrete budget.
Insight 2: The second counter‑intuitive truth is that the problem isn’t the size of the error budget, but the candidate’s explanation of why that size is appropriate.
- Signal 1 – Realism: Show historical outage data. “Over the past six months we observed a median of 12 minutes of downtime per incident.”
- Signal 2 – Business alignment: Cite revenue impact. “A 0.1 % error rate costs us $120,000 per quarter in lost transactions.”
- Signal 3 – Escalation clarity: Define who is paged and what actions occur.
If you can embed a brief data table in your answer, the interviewers will score you higher. The worksheet includes a “Historical Outage” row where you record the median and 95th‑percentile outage duration. This concrete data demonstrates that the error budget is neither overly generous nor impossibly tight.
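The “Historical Outage” row is easy to fill from a list of per‑incident durations. Below is a minimal Python sketch, assuming you can export outage durations in minutes from your incident tracker; the incident list is hypothetical sample data, `outage_row` is an illustrative helper, and the percentile uses the nearest‑rank method (one of several common definitions).

```python
import math
import statistics


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct % of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def outage_row(durations_min: list[float], budget_min: float) -> dict:
    """Build the worksheet's 'Historical Outage' row from per-incident durations."""
    median = statistics.median(durations_min)
    return {
        "median_min": median,
        "p95_min": nearest_rank_percentile(durations_min, 95),
        "worst_min": max(durations_min),
        # How many median-length incidents fit inside one window's budget.
        "median_incidents_per_budget": math.floor(budget_min / median),
    }


# Hypothetical incident durations in minutes; replace with your own outage log.
incidents = [4, 7, 9, 11, 12, 12, 14, 18, 25, 41]
row = outage_row(incidents, budget_min=43.2)
print(row)
```

With only ten incidents, the nearest‑rank 95th percentile lands on the worst case, so small samples are worth flagging when you quote p95. The last field counts how many median‑length incidents fit in one window's budget, which is a concrete way to phrase a safety margin.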
Script to copy‑paste:
“Our data shows a median outage of 12 minutes, so a 43‑minute budget absorbs about three median‑length incidents per 30‑day window. That matches our risk appetite while keeping the on‑call load manageable.”
Why does the problem lie not in the metric, but in the trade‑off reasoning?
The judgment call is to prioritize latency over availability only when the user experience is latency‑sensitive. In a Q2 debrief, the hiring manager pushed back because the candidate insisted on a 99.99 % availability SLO for a real‑time analytics API, ignoring the fact that its users care more about query latency than about occasional brief outages.
Insight 3: The third counter‑intuitive truth is that the problem isn’t the metric you choose, but the trade‑off you articulate.
You must articulate why you sacrifice one dimension for another. Example: “We set a 99.9 % availability SLO because our SLA tolerates up to 2 seconds of latency; pushing to 99.99 % would require additional redundancy that would increase latency beyond 2 seconds.”
The worksheet forces you to write a “Trade‑off Rationale” cell. Fill it with a concise sentence that ties the SLO to user‑centric outcomes, not to internal engineering preferences. The interview panel will reward the candidate who can defend the trade‑off with business impact numbers rather than vague engineering comfort.
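The latency side of this trade‑off reduces to one comparison: an extra network hop only breaches the SLA if current latency is already within that hop's cost of the limit. A minimal Python sketch, assuming the 2‑second SLA and the 400 ms cross‑region replication overhead from the script below; `replication_breaches_sla` and the sample p99 values are hypothetical.

```python
def replication_breaches_sla(current_p99_ms: float, added_ms: float, sla_ms: float) -> bool:
    """Return True if adding `added_ms` of latency pushes p99 past the SLA."""
    return current_p99_ms + added_ms > sla_ms


SLA_MS = 2000          # latency SLA from the trade-off example
REPLICATION_MS = 400   # assumed cross-region replication overhead

# Largest current p99 that can still absorb the extra hop.
headroom_ms = SLA_MS - REPLICATION_MS
print(f"Replication fits only if current p99 <= {headroom_ms} ms")

for p99 in (1200, 1700):  # hypothetical current p99 latencies
    print(p99, "ms ->", "breach" if replication_breaches_sla(p99, REPLICATION_MS, SLA_MS) else "ok")
```

If your measured p99 sits well under 1,600 ms, the panel may fairly ask why you did not pay for the extra redundancy, so have that number ready before you quote the 400 ms claim.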
Script to copy‑paste:
“Given our 2‑second latency SLA, a higher availability target would require cross‑region replication, which adds 400 ms of network latency and would breach the SLA.”
How many interview rounds will I face and what timeline should I expect?
Google’s SRE interview path typically consists of four rounds: a phone screen (45 minutes), a system design interview (60 minutes), an SLO/SLI deep dive (45 minutes), and a final onsite with two interviewers (each 45 minutes). The total calendar time from recruiter outreach to offer is usually 21 days.
The judgment is that you should treat each round as an independent battle, not a cumulative marathon. In a recent debrief, the hiring manager noted that candidates who tried to “save” points for the final round often faltered because each interviewer evaluates a distinct competency.
Prepare a timeline in your worksheet:
- Day 0: Recruiter email.
- Day 3: Phone screen.
- Day 7: System design.
- Day 12: SLO deep dive.
- Day 15: Onsite.
- Day 21: Offer.
Understanding this timeline allows you to allocate rehearsal time proportionally. Spend 30 % of preparation on the SLO worksheet, because that is the round where most candidates lose points.
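The gaps between stages tell you how much rehearsal time each round realistically gets. A minimal Python sketch using the day offsets from the timeline above; these offsets are typical rather than guaranteed, and the dictionary layout is only an illustration.

```python
# Day offsets from the timeline (typical, not guaranteed).
timeline = {
    "Recruiter email": 0,
    "Phone screen": 3,
    "System design": 7,
    "SLO deep dive": 12,
    "Onsite": 15,
    "Offer": 21,
}

stages = list(timeline.items())
# Days available before each stage, counted from the previous stage.
gaps = {name: day - stages[i - 1][1] for i, (name, day) in enumerate(stages) if i > 0}
print(gaps)
```

Under this timeline the SLO deep dive has the longest lead time of any interview round (five days after the system design round), which is one practical reason to weight your preparation toward it.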
Script to copy‑paste:
“I have reviewed the interview schedule and allocated focused practice on the SLO worksheet, as I understand that the deep‑dive round accounts for 35 % of the total evaluation weight.”
Which scripts can I copy‑paste when defending an SLO choice?
The judgment is that you should not recite a generic definition, but deliver a rehearsed line that ties the SLO to measurable business impact. In a senior interview, the panel asked, “What if the error budget is exhausted early?” The candidate responded with a pre‑written escalation script, and the interviewers marked the answer as “exceptionally clear.”
Insight 4: The fourth counter‑intuitive truth is that the problem isn’t the lack of a script, but the misuse of a script that sounds rehearsed without context.
Three copy‑paste lines that survive the panel:
- “If we burn 80 % of the error budget before the window closes, we will pause new releases and trigger a post‑mortem within one business day.”
- “Our SLO aligns with the product OKR of <0.5 % cart‑abandonment due to checkout errors, which translates to $150,000 quarterly revenue protection.”
- “The mitigation plan includes automatic failover to a secondary cluster, which restores service within 30 seconds, well under our 2‑minute SLA breach threshold.”
Insert these lines directly into the “Mitigation” row of the worksheet. The panel will see that you have mapped each SLO element to a concrete operational response.
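The first script is an error‑budget policy, and writing it as a threshold check makes the escalation path unambiguous. A minimal Python sketch, assuming consumption is measured as downtime divided by the window's budget and using the 80 % threshold from the script; `budget_actions` and its action strings are hypothetical, not the API of any alerting tool.

```python
def budget_actions(downtime_min: float, budget_min: float, freeze_at: float = 0.8) -> list[str]:
    """Map error-budget consumption to the escalation steps in the script."""
    consumed = downtime_min / budget_min
    actions = []
    if consumed >= freeze_at:
        actions.append("pause new releases")
        actions.append("post-mortem within one business day")
    if consumed >= 1.0:
        actions.append("SLO missed for this window")
    return actions


BUDGET_MIN = 43.2  # 99.9 % availability over a 30-day window
for downtime in (20, 36, 50):
    used = downtime / BUDGET_MIN
    print(f"{downtime} min ({used:.0%} of budget): {budget_actions(downtime, BUDGET_MIN)}")
```

Keeping the threshold as a parameter lets you answer “what if we tightened it to 50 %?” without rewriting the policy.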
Preparation Checklist
- Review the four‑row worksheet template and fill each cell with real data from your current service.
- Work through a structured preparation system (the PM Interview Playbook covers SLO framing and error‑budget calculations with real debrief examples).
- Record a 2‑minute mock answer that follows the Goal → SLO → Error Budget → Mitigation flow, then critique it for missing trade‑off rationale.
- Compile a one‑page outage log showing median, 95th‑percentile, and worst‑case downtime for the past 90 days.
- Draft the three copy‑paste scripts and rehearse them until they feel like a natural extension of your narrative.
- Schedule a 30‑minute peer interview where your colleague plays the SLO deep‑dive role and interrupts you at least twice.
Mistakes to Avoid
BAD: Listing all possible metrics before choosing one. GOOD: Selecting the single metric that directly reflects the user‑impact KPI and stating it first.
BAD: Saying “We need a 99.9 % SLO because it’s the industry standard.” GOOD: Explaining, “Our checkout conversion drops 0.3 % for every 0.1 % increase in error rate, so a 99.9 % availability target caps revenue loss at $120,000 per quarter.”
BAD: Providing a vague escalation plan like “We’ll investigate the issue.” GOOD: Detailing the exact pager rotation, the 30‑minute trigger, and the rollback procedure, with owners named.
Each mistake reflects a deeper judgment error: treating the SLO as a checklist item rather than a decision framework that ties engineering effort to business outcomes.
FAQ
What level of detail should I include in the error‑budget calculation?
Show the exact minutes derived from the percentage target, cite historical outage minutes, and state the trigger threshold. The panel expects a concrete number, not a rounded estimate.
Can I reuse the same SLO template for different services?
No. The judgment is that each service has distinct user‑impact metrics; you must adjust the Goal and Trade‑off rows for each case. Reusing the template without modification signals a lack of product understanding.
How do I handle a follow‑up question that asks for an alternative SLO?
Respond with a brief “If we prioritized latency, we would set a 99.5 % availability SLO, which allows a 216‑minute (3.6‑hour) error budget over 30 days, but that would increase checkout error cost by $200,000 per quarter.” Show you can pivot the trade‑off without abandoning the structured reasoning.
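When a follow‑up asks for an alternative target, it pays to know each candidate budget before you quote it, since every extra nine shrinks the budget tenfold. A minimal Python sketch over a 30‑day window; the three targets are illustrative choices.

```python
WINDOW_MIN = 30 * 24 * 60  # 43,200 minutes in a 30-day window

# Allowed downtime for each candidate availability target.
budgets = {target: (1 - target) * WINDOW_MIN for target in (0.995, 0.999, 0.9999)}

for target, budget in budgets.items():
    print(f"{target:.2%} availability -> {budget:.1f} min of downtime per 30 days")
```

A 99.5 % target allows 216 minutes (3.6 hours) of downtime per 30 days, five times the 43.2‑minute budget of the 99.9 % target.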