"Why Would an SRE Build a Product Tool?"
I get asked this a lot.
By day, I'm an SRE at a fintech company. Terraform, AWS, Azure, Kubernetes — my job is keeping systems reliable. I think in dashboards, alerts, and incident response.
But when I started building side projects, something felt deeply wrong.
Infrastructure has observability. Product decisions don't.
We use Datadog and Grafana to visualize system state as a matter of course. But "why did we build this feature?" and "was that decision correct?" — there's no dashboard for that. No alerts. No traces.
That gap is what led me to build a hypothesis validation tool. And it turns out, SRE thinking translates surprisingly well to product development.
The Observability Gap in Product Development
The Three Pillars — Reframed
In SRE, we think about observability through three pillars:
| Pillar | In Infrastructure | In Product Development |
|---|---|---|
| Metrics | CPU, memory, response time | KPIs, usage rates, conversion |
| Logs | Access logs, error logs | Decision logs, validation results |
| Traces | Request processing paths | Hypothesis → Experiment → Learning → Next Action |
In infrastructure, we never accept "we don't know what's happening" as a state. We set up alerts, build dashboards, write runbooks for incident response.
But in product development? "Why we built this feature" is lost within six months. Code preserves what was built, but never why it was built.
ADRs for Architecture, But What About Product Decisions?
If you're an engineer, you might use ADRs (Architecture Decision Records) to document technical choices:
```markdown
# ADR-001: Use Supabase for Database

## Status: Accepted

## Context
Minimize backend costs for a side project

## Decision
Adopt Supabase (PostgreSQL + Auth + RLS)

## Rationale
- More SQL flexibility than Firebase
- RLS handles security at the database layer
- Free tier is sufficient for indie projects
```
ADRs capture technical decisions. But they don't capture "the evidence that convinced us this feature was worth building in the first place."
That's the gap. And it's exactly the kind of gap that makes an SRE uncomfortable.
3 SRE Concepts That Changed How I Build Products
1. SLOs → Validation Success Criteria
In SRE, you define SLOs (Service Level Objectives) before you set up monitoring. "99th percentile response time < 200ms" — the quantitative bar comes first.
Applied to product development, this means defining success criteria before running any experiment.
```
Hypothesis: "Users struggle with tracking hypothesis validation"
Success Criteria: 3 out of 5 interviewees recognize this as a problem
Method: Semi-structured interviews
```
This sounds obvious, but most indie hackers (myself included, until recently) skip it. We run experiments and then decide after the fact whether the results were "good enough." That's like deploying a service without defining SLOs and then arguing about whether the error rate is acceptable.
Define the bar first. Then measure against it.
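As a minimal sketch, here is what "bar first, measurement second" looks like in Python. The `SuccessCriterion` class and its fields are illustrative, not from any library:

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    """A bar pre-registered before the experiment runs."""
    description: str
    threshold: int    # minimum positive signals required
    sample_size: int  # planned number of interviews

    def evaluate(self, positives: int) -> bool:
        """Measure the observed result against the bar set up front."""
        return positives >= self.threshold


# Define the bar first...
criterion = SuccessCriterion(
    description="Interviewees recognize hypothesis-tracking as a problem",
    threshold=3,
    sample_size=5,
)

# ...then measure against it after the interviews.
print(criterion.evaluate(positives=4))  # True: 4 of 5 clears the 3/5 bar
```

The point is that `threshold` is frozen before any data exists, so the post-experiment step is a comparison, not a negotiation.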
2. Incident Response → Pivot Decisions
SRE incident response has clear escalation rules:
- Sev 1: Assemble the response team immediately
- Sev 2: Handle during business hours
- Sev 3: Address in the next sprint
I applied the same structure to product validation results:
| Validation Result | Response |
|---|---|
| Validated (high confidence) | Continue — move to implementation |
| Validated (low confidence) | Investigate — plan additional experiments |
| Invalidated | Pivot or kill — change direction or stop |
The key insight: don't make pivot decisions emotionally. "I spent weeks on this hypothesis, so it must be right" is the product equivalent of ignoring alerts because you don't want to get paged. SREs respond to alerts based on rules, not feelings. Product decisions should work the same way.
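The escalation table above is really just a rule table, and writing it down as one makes the "no feelings" property explicit. A hypothetical sketch:

```python
def decide(validated: bool, high_confidence: bool) -> str:
    """Map a validation result to a response by rule, not by mood."""
    if validated and high_confidence:
        return "continue"     # move to implementation
    if validated:
        return "investigate"  # plan additional experiments
    return "pivot-or-kill"    # change direction or stop


print(decide(validated=True, high_confidence=False))  # investigate
```

Sunk cost never appears as an input to the function, which is exactly the point.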
I wrote in my last post about spending 3 months building a SaaS that AI made obsolete. If I'd had these rules, I would have killed it in week 3 when the early signals were already there.
3. Runbooks → Validation Playbooks
SREs document incident response procedures as runbooks. When something breaks at 3 AM, you don't want to figure out the steps from scratch.
Same principle for hypothesis validation:
```markdown
## Problem Validation Playbook

### Prep
1. Review hypothesis canvas — identify core assumptions
2. Define target persona
3. Set success criteria (e.g., 3/5 recognize the problem)

### Execute
1. Pre-test interview questions with AI simulation
2. Run 5 semi-structured interviews
3. Record key findings and direct quotes

### Decide
1. Compare results against success criteria
2. Record learnings
3. Make decision: Continue / Pivot / Kill
```
With a runbook, you don't panic during an incident. With a validation playbook, you don't freeze when it's time to decide whether your product idea is worth pursuing.
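If it helps to make "playbook" concrete: the steps above can be sketched as plain data you walk through in order, so the procedure exists before you need it. The structure here is mine, a toy, not a tool's API:

```python
# The validation playbook above, represented as ordered phases and steps.
PLAYBOOK = [
    ("prep", ["Review hypothesis canvas",
              "Define target persona",
              "Set success criteria (e.g., 3/5)"]),
    ("execute", ["Pre-test questions with AI simulation",
                 "Run 5 semi-structured interviews",
                 "Record findings and quotes"]),
    ("decide", ["Compare results against success criteria",
                "Record learnings",
                "Decide: continue / pivot / kill"]),
]


def run(playbook) -> list[str]:
    """Flatten the phases into the checklist you actually execute."""
    return [f"[{phase}] {step}" for phase, steps in playbook for step in steps]


for line in run(PLAYBOOK):
    print(line)
```

Even this trivial version removes one failure mode: skipping the "decide" phase because the results were disappointing.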
The Career Angle: Why This Combination Is Rare
SRE engineers who think about product validation are uncommon. Product managers who think in terms of observability are also uncommon. The intersection is almost empty.
If you're an engineer considering side projects or a career shift toward product:
- Your reliability thinking is an asset — you already know how to define measurable targets and respond to data
- Your operational discipline transfers — runbooks, escalation rules, and blameless post-mortems all have product equivalents
- Your bias toward measurement is exactly what product development needs — too many product decisions are made on vibes
The gap isn't your skills. The gap is recognizing that the mental models you already use at work apply directly to building products.
What I Do Now
I built these SRE-inspired workflows into my own validation process, and eventually into a tool called KaizenLab to keep myself honest. But the tool matters less than the mindset.
If infrastructure deserves observability, so do your product decisions.
Next time you're about to start a side project, try this: before writing any code, write a validation runbook. Define your SLOs — I mean, success criteria. Set up your "alerts" — the signals that tell you to pivot or kill.
You already know how to do this. You just haven't applied it to products yet.
Are you an engineer who's applied technical thinking to product development? Or a PM who's borrowed concepts from SRE? I'd love to hear how these worlds collide in the comments.