<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ParthibanRajasekaran</title>
    <description>The latest articles on DEV Community by ParthibanRajasekaran (@parthibanrajasekaran).</description>
    <link>https://dev.to/parthibanrajasekaran</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F411888%2Ff664651b-19e1-4182-b1f8-fdba51a75d9b.png</url>
      <title>DEV Community: ParthibanRajasekaran</title>
      <link>https://dev.to/parthibanrajasekaran</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/parthibanrajasekaran"/>
    <language>en</language>
    <item>
      <title>The Lie of the Global Average: Why Taming Complex SLIs Requires Bucketing</title>
      <dc:creator>ParthibanRajasekaran</dc:creator>
      <pubDate>Wed, 03 Dec 2025 23:01:54 +0000</pubDate>
      <link>https://dev.to/parthibanrajasekaran/the-lie-of-the-global-average-why-taming-complex-slis-requires-bucketing-4mfg</link>
      <guid>https://dev.to/parthibanrajasekaran/the-lie-of-the-global-average-why-taming-complex-slis-requires-bucketing-4mfg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygjubviof6bvyrn71qbi.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygjubviof6bvyrn71qbi.gif" alt="OBservability" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As an engineering manager responsible for keeping serverless, event-driven systems alive, I’ve learned to fear green dashboards.&lt;/p&gt;

&lt;p&gt;There is no worse feeling than seeing your main status page happily report “99.9% up” while Slack fills with screenshots from customers who can’t check out, can’t pay, or can’t log in, and a senior leader asks, “Why does everything look fine if the business is clearly on fire?”&lt;/p&gt;

&lt;p&gt;That’s the core problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global availability is a vanity metric.&lt;/li&gt;
&lt;li&gt;It smooths out the spikes.&lt;/li&gt;
&lt;li&gt;It tells you the system is fine, but it doesn’t tell you if the business is fine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In site reliability engineering (SRE), we like tidy global SLIs because they compress a lot of behaviour into one number. But that compression is exactly how pain gets hidden.&lt;/p&gt;

&lt;p&gt;If you have 1,000,000 requests and 1,000 fail, the math says you’re 99.9% successful. Everyone relaxes.&lt;/p&gt;

&lt;p&gt;But if those 1,000 failures are all POST /submit-payment or POST /confirm-order, you’re not reliable. You’re losing money and trust.&lt;/p&gt;

&lt;p&gt;Reliability is not a percentage. It is a relationship with your users. And global averages destroy that relationship.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Antidote: Bucketing (Adding Dimensionality)
&lt;/h2&gt;

&lt;p&gt;The only honest way to deal with this is to stop worshipping global averages and start segmenting your SLIs along the dimensions that actually threaten your business.&lt;/p&gt;

&lt;p&gt;Whether you call it bucketing, slicing, or adding dimensionality, the idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop asking, “Is the site OK?”&lt;/li&gt;
&lt;li&gt;Start asking, “Who is it broken for, and where?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A global SLI is like a city-wide traffic report:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Traffic is moving at 40 mph on average.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a nice number, but it’s useless if I’m sitting in standstill traffic on the only bridge into the city.&lt;/p&gt;

&lt;p&gt;A bucketed SLI is more honest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The highway is clear, but the bridge to downtown is blocked.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same city. Same “average speed.” Completely different lived reality.&lt;/p&gt;

&lt;p&gt;In real systems, three buckets almost always matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bucket A: Reads vs Writes: The Hidden Fire
&lt;/h2&gt;

&lt;p&gt;Most systems are heavily skewed toward reads. In a typical API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;90–95% of traffic is GET: browsing, listing, fetching.&lt;/li&gt;
&lt;li&gt;5–10% is POST/PUT: creating orders, payments, sign-ups, profile changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;95% of your traffic is product browsing (GET /products, GET /search).&lt;/li&gt;
&lt;li&gt;5% is checkout and payment (POST /checkout, POST /payment).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now a bad deploy or downstream issue makes every payment call fail.&lt;/p&gt;

&lt;p&gt;From the user’s perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browsing looks fine.&lt;/li&gt;
&lt;li&gt;Every attempt to give you money fails.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the global SLI’s perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5% of all requests are failing.&lt;/li&gt;
&lt;li&gt;You’re still at 95% “availability.”&lt;/li&gt;
&lt;li&gt;Depending on your alert thresholds, you might not even get paged.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how you end up in the classic “everything is green” screenshot while Support is drowning and Finance is asking why conversion just fell off a cliff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You split the SLI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;availability_read: success rate for read-only requests.&lt;/li&gt;
&lt;li&gt;availability_write: success rate for state-changing requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suddenly a total outage on the write path shows up as 0% availability for writes, not “a small blip.” You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alert specifically on write failures.&lt;/li&gt;
&lt;li&gt;Tie that bucket to more conservative error budgets.&lt;/li&gt;
&lt;li&gt;Treat it as a higher-severity incident even if the homepage still loads.&lt;/li&gt;
&lt;/ul&gt;
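&lt;p&gt;Here’s the arithmetic of that trap in a few lines (illustrative numbers, not from any real system):&lt;/p&gt;

```python
# Illustrative sketch: global availability vs read/write buckets.
# Request records are (method, succeeded) pairs; the numbers are made up.
requests = [("GET", True)] * 950 + [("POST", False)] * 50

def availability(records):
    # Fraction of records that succeeded.
    return sum(ok for _, ok in records) / len(records)

reads = [r for r in requests if r[0] == "GET"]
writes = [r for r in requests if r[0] != "GET"]

print(f"global: {availability(requests):.1%}")   # looks healthy
print(f"reads:  {availability(reads):.1%}")
print(f"writes: {availability(writes):.1%}")     # total write outage
```

Same traffic, three very different numbers: the global figure smooths a complete write-path outage into a tolerable-looking blip.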

&lt;p&gt;&lt;strong&gt;The business impact&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reads going down is annoying. Writes going down is existential.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If people can’t browse, they might come back later.&lt;/li&gt;
&lt;li&gt;If they can browse but can’t pay, they leave angry and they tell others.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Buckets make that difference painfully visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bucket B: Mobile vs Web: The Client Reality
&lt;/h2&gt;

&lt;p&gt;Web clients and mobile clients live in different universes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web tends to run on stable connections, with up-to-date JS and easy rollbacks.&lt;/li&gt;
&lt;li&gt;Mobile runs on flaky 4G, old app versions, and aggressive retry logic that can turn a subtle bug into a DDoS-shaped traffic pattern.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You ship a change that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works fine for web checkout.&lt;/li&gt;
&lt;li&gt;Breaks a specific flow on the iOS app hitting POST /checkout with an older payload shape.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Global metrics barely blink:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web traffic dominates volume.&lt;/li&gt;
&lt;li&gt;Retries hide some of the errors.&lt;/li&gt;
&lt;li&gt;The average success rate looks “acceptably noisy.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meanwhile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your App Store rating is sliding.&lt;/li&gt;
&lt;li&gt;Support is logging “mobile checkout broken” tickets.&lt;/li&gt;
&lt;li&gt;Product is asking, “Why didn’t we catch this before it hit customers?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You bucket by client type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;client_type = web&lt;/li&gt;
&lt;li&gt;client_type = ios&lt;/li&gt;
&lt;li&gt;client_type = android&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t need per-device madness. You need just enough segmentation to see when one channel is quietly dying while the others hide it.&lt;/p&gt;

&lt;p&gt;Once you do this, you can ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What is write availability for iOS for the checkout journey?”&lt;/li&gt;
&lt;li&gt;“How does Android latency for search compare to web?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The business impact&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now when you get paged, it’s not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“High error rate on /checkout.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Write availability for iOS clients on /checkout has dropped below SLO.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That alert:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tells on-call who is broken.&lt;/li&gt;
&lt;li&gt;Points directly to where to start looking.&lt;/li&gt;
&lt;li&gt;Stops the “blame the network” dance and focuses everyone on the right API, payload, or versioning issue.&lt;/li&gt;
&lt;/ul&gt;
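&lt;p&gt;A minimal sketch of what that bucketed alert rule looks like; the thresholds and traffic numbers below are hypothetical:&lt;/p&gt;

```python
# Sketch: evaluate availability per (client_type, method_class) bucket
# against its own SLO. Thresholds and counts are hypothetical.
SLO = {"write": 0.999, "read": 0.995}

buckets = {
    ("web", "write"): (10_000, 9_995),     # (total, succeeded)
    ("ios", "write"): (1_200, 900),        # iOS checkout quietly failing
    ("android", "write"): (1_100, 1_099),
}

for (client, kind), (total, ok) in buckets.items():
    availability = ok / total
    if SLO[kind] > availability:
        print(f"ALERT: {kind} availability for {client} clients "
              f"is {availability:.2%}, below SLO {SLO[kind]:.2%}")
```

Only the iOS bucket fires, even though the blended number across all three buckets would still look acceptable.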

&lt;h2&gt;
  
  
  Bucket C: Premium vs Standard: The Revenue Bucket
&lt;/h2&gt;

&lt;p&gt;This is where engineering stops talking about “traffic” and starts talking about revenue.&lt;/p&gt;

&lt;p&gt;Not all users are equal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A single enterprise customer might be worth more than 10,000 free-tier users.&lt;/li&gt;
&lt;li&gt;A VIP credit-card holder being unable to transact has a different blast radius than a trial user who can’t update a profile picture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The trap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without buckets, a retry storm from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50,000 free users hitting a low-value feature&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;can drown out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50 failures from your top 10 enterprise customers hitting your highest-margin features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a global SLI chart, it all collapses into one line. If the total error rate is “within budget,” you might be technically winning while strategically losing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You tag traffic with user tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user_tier = premium / enterprise&lt;/li&gt;
&lt;li&gt;user_tier = standard / free&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you define separate SLOs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Premium checkout success: 99.99%&lt;/li&gt;
&lt;li&gt;Standard/free checkout success: 99.5%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same system. Different promises.&lt;/p&gt;

&lt;p&gt;And more importantly: different reactions when the budget burns. If premium write availability wobbles, you slow down changes or roll back quickly. If free tier browsing is a bit flaky but within tolerance, you don’t knee-jerk into a full change freeze.&lt;/p&gt;
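&lt;p&gt;The per-tier budget math is simple enough to sketch (SLO values here are the hypothetical ones above):&lt;/p&gt;

```python
# Sketch: separate error budgets per user tier (hypothetical SLOs).
slos = {"premium": 0.9999, "standard": 0.995}

def error_budget_remaining(tier, total, failed):
    """Fraction of the tier's error budget still unspent."""
    allowed = (1 - slos[tier]) * total   # failures the SLO tolerates
    return 1 - failed / allowed

# At 99.99%, one million premium requests tolerate only 100 failures,
# so 50 failures already burn half the premium budget.
print(error_budget_remaining("premium", 1_000_000, 50))
```

The same 50 failures against the standard tier's 99.5% SLO would barely dent its budget, which is exactly the asymmetry you want your reactions to follow.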

&lt;p&gt;&lt;strong&gt;The business impact&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is how you align engineering anxiety with where the company actually makes money.&lt;/p&gt;

&lt;p&gt;It also changes roadmap debates. Once you show Product and Sales a graph labeled “Enterprise checkout success”, nobody argues that a free-tier bug and an enterprise bug are equivalent priorities anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Senior Caveat: Bucketing, SLOs, and the Cost of Cardinality
&lt;/h2&gt;

&lt;p&gt;At this point, every experienced engineer is thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If I bucket by method, client, region, and tier… won’t this explode my Prometheus / Datadog bill?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes. It can.&lt;/p&gt;

&lt;p&gt;Bucketing adds cardinality. If you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put user_id or request_id into labels, or&lt;/li&gt;
&lt;li&gt;Try to slice by every endpoint and every dimension&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Melt your metrics backend, or&lt;/li&gt;
&lt;li&gt;Hand Finance a monitoring bill that looks like a production incident.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The art of bucketing is restraint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don’t need a bucket for every variable.&lt;/li&gt;
&lt;li&gt;You need a bucket for every distinct failure domain you care about.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Be ruthless about which labels are allowed on “SLO-grade” metrics.&lt;/li&gt;
&lt;li&gt;Keep high-cardinality detail (like raw logs) in cheaper systems, and only promote a small set of aggregated counters/gauges as SLIs.&lt;/li&gt;
&lt;li&gt;Review your metric schema regularly with both SREs and Finance in the room.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If bucketing is free, you’re doing it wrong: either technically (too coarse) or financially (too expensive). A senior team treats cardinality as part of the reliability design, not an afterthought.&lt;/p&gt;
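&lt;p&gt;One way to enforce that restraint is a schema guard on SLO-grade metrics; the allowlist below is an example policy, not a standard:&lt;/p&gt;

```python
# Sketch: guard SLO-grade metrics against high-cardinality labels.
# The allowed and forbidden sets are an example policy, not a standard.
ALLOWED_LABELS = {"method_class", "client_type", "user_tier", "region"}
FORBIDDEN = {"user_id", "request_id", "session_id", "url_path"}

def validate_slo_labels(labels):
    names = set(labels)
    bad = names.intersection(FORBIDDEN)
    if bad:
        raise ValueError(f"high-cardinality labels on SLO metric: {sorted(bad)}")
    extra = names.difference(ALLOWED_LABELS)
    if extra:
        raise ValueError(f"labels not in the SLO allowlist: {sorted(extra)}")

validate_slo_labels({"client_type": "ios", "user_tier": "premium"})  # accepted
```

Run a check like this in CI over your metric definitions and the “can I add a label?” conversation happens before the bill arrives, not after.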

&lt;h2&gt;
  
  
  Conclusion: Silence Is Golden
&lt;/h2&gt;

&lt;p&gt;When you bucket SLIs along the fault lines that actually matter (reads vs writes, mobile vs web, premium vs standard), something interesting happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The noise stops.&lt;/li&gt;
&lt;li&gt;“High error rate” pages that send you spelunking through logs get replaced with alerts like: “Write availability for iOS premium users in checkout has dropped below SLO.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s an honest alert. It tells you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is affected.&lt;/li&gt;
&lt;li&gt;Where to look.&lt;/li&gt;
&lt;li&gt;How worried the business should be.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s also an alert your engineers will respect. When every page comes with a clear, specific blast radius, the pager stops feeling like a random punishment generator and starts feeling like what it should have been all along: a surgical tool.&lt;/p&gt;

&lt;p&gt;As a manager, that’s the difference between a team that dreads the pager and one that trusts it. Honest alerts are a retention tool as much as a reliability tool.&lt;/p&gt;

&lt;p&gt;Most of these opinions come from incidents I’d rather not repeat, not from slides. Revisiting Google’s “SRE: Measuring and Managing Reliability” course recently mainly gave me sharper language for what the on-call rotation already knew.&lt;/p&gt;

&lt;p&gt;Reliability isn’t about the nines you show the board; it’s about the promises you keep to your users and about whether your teams can keep those promises without burning out.&lt;/p&gt;

&lt;p&gt;So the next time your dashboard says 99.9% green, don’t congratulate yourself.&lt;/p&gt;

&lt;p&gt;Ask a harder question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Who is stuck on the bridge?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because if you don’t know, your global average is lying to you.&lt;/p&gt;

</description>
      <category>sre</category>
      <category>engineeringmanager</category>
      <category>softwareengineering</category>
      <category>observability</category>
    </item>
    <item>
      <title>From Zero to Local AI in 10 Minutes With Ollama + Python</title>
      <dc:creator>ParthibanRajasekaran</dc:creator>
      <pubDate>Mon, 10 Nov 2025 11:52:54 +0000</pubDate>
      <link>https://dev.to/parthibanrajasekaran/from-zero-to-local-ai-in-10-minutes-with-ollama-python-4fd8</link>
      <guid>https://dev.to/parthibanrajasekaran/from-zero-to-local-ai-in-10-minutes-with-ollama-python-4fd8</guid>
      <description>&lt;h2&gt;
  
  
  Why Ollama (and why now)?
&lt;/h2&gt;

&lt;p&gt;If you want production‑like experiments without cloud keys or per‑call fees, Ollama gives you a local‑first developer path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero friction&lt;/strong&gt;: install once; pull models on demand; everything runs on localhost by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One API, two runtimes&lt;/strong&gt;: the same API works for local and (optional) cloud models, so you can start on your laptop and scale later with minimal code changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batteries included&lt;/strong&gt;: simple CLI (ollama run, ollama pull), a clean REST API, an official Python client, embeddings, and vision support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeatability&lt;/strong&gt;: a Modelfile (think: Dockerfile for models) captures system prompts and parameters so teams get the same behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What’s new in late 2025 (at a glance)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud models (preview)&lt;/strong&gt;: run larger models on managed GPUs with the same API surface; develop locally, scale in the cloud without code changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI‑compatible endpoints&lt;/strong&gt;: point OpenAI SDKs at Ollama (/v1) for easy migration and local testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows desktop app&lt;/strong&gt;: official GUI for Windows users; drag‑and‑drop, multimodal inputs, and background service management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety/quality updates&lt;/strong&gt;: recent safety‑classification models and runtime optimizations (e.g., flash‑attention toggles in select backends) to improve performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Ollama works (architecture in 90 seconds)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt;: a lightweight server listens on localhost:11434 and exposes REST endpoints for chat, generate, and embeddings. Responses stream token‑by‑token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model format (GGUF)&lt;/strong&gt;: models are packaged in quantized .gguf binaries for efficient CPU/GPU inference and fast memory‑mapped loading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference engine&lt;/strong&gt;: built on the llama.cpp family of kernels with GPU offload via Metal (Apple Silicon), CUDA (NVIDIA), and others; choose quantization (Q4/Q5/…) for your hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration&lt;/strong&gt;: Modelfile pins base model, system prompt, parameters, adapters (LoRA), and optional templates—so your team’s runs are reproducible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install in 60 seconds
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;macOS / Windows / Linux&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download and install Ollama from the official site (choose your OS).&lt;/li&gt;
&lt;li&gt;Open a terminal and verify the service is running on port 11434:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama --version
curl http://localhost:11434/api/version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Apple Silicon uses Metal by default. On Windows/Linux with NVIDIA, make sure your GPU drivers/CUDA are set up to accelerate larger models. CPU‑only also works for smaller models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  First run (no Python yet)
&lt;/h2&gt;

&lt;p&gt;Pull a model and chat in the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull llama3.1:8b

ollama run llama3.1:8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Three ways to call Ollama from your app
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) REST (works from any language)
&lt;/h3&gt;

&lt;p&gt;Base URL (local): &lt;a href="http://localhost:11434/api" rel="noopener noreferrer"&gt;http://localhost:11434/api&lt;/a&gt;&lt;br&gt;
Example (chat):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "user", "content": "Give me 3 tips for writing clean Python"}
    ],
    "stream": false
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common endpoints you’ll use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/api/chat – chat format (messages with roles)&lt;/li&gt;
&lt;li&gt;/api/generate – simple prompt in/out (one‑shot)&lt;/li&gt;
&lt;li&gt;/api/embeddings – generate vectors for search/RAG&lt;/li&gt;
&lt;li&gt;/api/pull, /api/list, /api/show, /api/delete – model management&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For streaming, send "stream": true and read chunks until the server closes the connection.&lt;/p&gt;
&lt;/blockquote&gt;
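&lt;p&gt;Each streamed line is a small JSON object; here is a minimal sketch of stitching the fragments back together (the field shape matches /api/chat chunks, but verify it against your Ollama version):&lt;/p&gt;

```python
import json

# Sketch: reassemble a streamed /api/chat response. Each line the server
# sends is one JSON object; text arrives in message.content fragments,
# and a final chunk with "done": true closes the stream.
def join_stream(lines):
    parts = []
    for line in lines:
        chunk = json.loads(line)
        if not chunk.get("done"):
            parts.append(chunk["message"]["content"])
    return "".join(parts)

# Example chunks as they might arrive over the wire:
sample = [
    '{"message": {"content": "Hel"}, "done": false}',
    '{"message": {"content": "lo"}, "done": false}',
    '{"done": true}',
]
print(join_stream(sample))  # Hello
```

In a real client you would print each fragment as it arrives for perceived latency, rather than buffering the whole reply.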

&lt;h3&gt;
  
  
  2) Python SDK (official)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Chat:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import chat

resp = chat(model='llama3.1:8b', messages=[
    {'role': 'user', 'content': 'Give me 3 beginner Python tips.'}
])
print(resp['message']['content'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vision (image → text):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import chat

resp = chat(
  model='llama3.2-vision:11b',
  messages=[{
    'role': 'user',
    'content': 'What does this receipt say?',
    'images': ['receipt.jpg']  # file path or URL
  }]
)
print(resp['message']['content'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Embeddings:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import embeddings

text = "Ollama lets you run LLMs locally."
vec = embeddings(model='embeddinggemma', prompt=text)
print(len(vec['embedding']))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3) Ship repeatable configs with a Modelfile
&lt;/h3&gt;

&lt;p&gt;A Modelfile captures the base model, system message, and default parameters so teammates (and CI) get identical behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modelfile:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM llama3.1:8b
PARAMETER temperature 0.6
SYSTEM """
You are a concise AI tutor for Python beginners. Prefer runnable examples.
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Build &amp;amp; run:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama create py-tutor -f Modelfile
ollama run py-tutor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Your first tiny local RAG (no frameworks required)
&lt;/h2&gt;

&lt;p&gt;This script indexes a handful of .txt files and answers questions using nearest‑neighbor search on embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import glob, faiss, numpy as np
from ollama import embeddings, chat

EMB = 'embeddinggemma'
LLM = 'llama3.1:8b'

# 1) Chunk a few local docs
chunks, files = [], []
for path in glob.glob('docs/*.txt'):
    text = open(path, 'r', encoding='utf-8').read()
    for i in range(0, len(text), 800):
        chunks.append(text[i:i+800])
        files.append(path)

# 2) Embed and index with FAISS (cosine)
X = np.array([embeddings(model=EMB, prompt=t)['embedding'] for t in chunks], dtype='float32')
faiss.normalize_L2(X)
index = faiss.IndexFlatIP(X.shape[1])
index.add(X)

# 3) Query → top‑k context → answer
q = "What does the onboarding checklist say about Python version?"
qv = np.array([embeddings(model=EMB, prompt=q)['embedding']], dtype='float32')
faiss.normalize_L2(qv)
D, I = index.search(qv, 5)
context = "\n\n".join(chunks[i] for i in I[0])

msg = [
  {'role': 'system', 'content': 'Answer strictly from the provided context. If unknown, say so.'},
  {'role': 'user', 'content': f'Context:\n{context}\n\nQuestion: {q}'}
]
ans = chat(model=LLM, messages=msg)['message']['content']
print(ans)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this pattern is useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works offline; no hosted vector DB needed to begin with.&lt;/li&gt;
&lt;li&gt;Clear upgrade path to LangChain/LlamaIndex + a proper vector store when your corpus grows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Performance &amp;amp; correctness tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Model size vs hardware: start with 7–8B models for fast iteration; scale upward once your UX is dialed in.&lt;/li&gt;
&lt;li&gt;Quantization matters: smaller GGUFs load faster and reduce memory but can slightly degrade quality; pick the best trade‑off for your use case.&lt;/li&gt;
&lt;li&gt;Stream responses in UI code for perceived latency; switch to non‑streaming for simple back‑office jobs.&lt;/li&gt;
&lt;li&gt;Keepalive sessions to avoid repeated load/unload overhead in short‑lived CLIs or serverless functions.&lt;/li&gt;
&lt;li&gt;Prompt discipline: lock a SYSTEM prompt in your Modelfile so teammates don’t accidentally regress output style in reviews.&lt;/li&gt;
&lt;li&gt;Security: don’t expose your local API on the internet by default; if you must, add authentication and network controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security hardening checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bind to 127.0.0.1 or a private interface; avoid public exposure by default.&lt;/li&gt;
&lt;li&gt;If remote access is required, front with a reverse proxy (auth + TLS), restrict by IP, and rate‑limit.&lt;/li&gt;
&lt;li&gt;Run the service under a dedicated OS user with least privilege; separate model storage from app logs.&lt;/li&gt;
&lt;li&gt;Watch model pulls and updates in CI; pin checksums for reproducibility.&lt;/li&gt;
&lt;li&gt;Add basic request logging and redact prompts that may contain secrets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Local vs Cloud: choosing the right runtime
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local&lt;/strong&gt;: best for privacy, prototyping, and offline work; your laptop/GPU sets the ceiling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama Cloud&lt;/strong&gt;: same API surface, larger models, and no local hardware management; useful for workloads that outgrow your machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can develop locally and deploy to the cloud without rewriting client code; just point your client at a different base URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common pitfalls (and quick fixes)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;11434 is taken&lt;/strong&gt;: change the port via the OLLAMA_HOST environment variable or the client’s host parameter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CORS in browser apps&lt;/strong&gt;: frontends that call Ollama directly from the browser will hit CORS; proxy through your backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Model not found"&lt;/strong&gt;: did you run ollama pull first? Use ollama list to confirm the model is present.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Out‑of‑memory&lt;/strong&gt;: try a smaller quantization (e.g., Q4 instead of Q6) or a smaller parameter count.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templates surprise you&lt;/strong&gt;: inspect with ollama show; override with your own Modelfile.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>AI Workflow Integration: From Models to Methods, How Engineering Teams Will Change</title>
      <dc:creator>ParthibanRajasekaran</dc:creator>
      <pubDate>Tue, 04 Nov 2025 14:33:24 +0000</pubDate>
      <link>https://dev.to/parthibanrajasekaran/ai-workflow-integration-from-models-to-methods-how-engineering-teams-will-change-3aff</link>
      <guid>https://dev.to/parthibanrajasekaran/ai-workflow-integration-from-models-to-methods-how-engineering-teams-will-change-3aff</guid>
      <description>&lt;p&gt;It’s feedback season. Calendars fill with “quick chats,” dashboards glow, and we try to stitch a quarter’s worth of activity into a clean story about impact. But 2025 brings a deeper shift: AI workflow integration. AI is moving from “use a chatbot on the side” to living inside the handoffs &lt;/p&gt;

&lt;p&gt;plan → code, code → ship, ship → learn&lt;/p&gt;

&lt;p&gt;where traceability, decisions, and outcomes actually live.&lt;/p&gt;

&lt;p&gt;Thesis: the next 12 months aren’t about picking the “best” model; they’re about embedding AI inside the workflow so the path from intent to impact is faster and more visible. When AI helps at each hop, you don’t need to perform productivity; you can prove it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feedback Loops Are the New Feature Flags
&lt;/h2&gt;

&lt;p&gt;We already have the artifacts: OKRs, roadmaps, PRs, tickets, incidents. What we lack is low-friction stitching. The pattern that works is AI in the handoff, producing useful, structured outputs as work happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Draft AI-generated PR descriptions that map changes to OKRs (not just file diffs).&lt;/li&gt;
&lt;li&gt;Summarize test failures into actionable hypotheses (owners, suspected root causes, next steps).&lt;/li&gt;
&lt;li&gt;Generate release notes linked to customer outcomes, not just commit logs.&lt;/li&gt;
&lt;li&gt;Prep retro packets by querying "planned vs. shipped vs. impact," with links to issues and incidents.&lt;/li&gt;
&lt;/ul&gt;
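&lt;p&gt;A tiny sketch of the first bullet, assembling a structured prompt for an AI-drafted PR description (the function name and OKR strings here are hypothetical):&lt;/p&gt;

```python
# Sketch: build a structured prompt for an AI-drafted PR description
# that maps changes to OKRs. Names and fields are hypothetical.
def pr_description_prompt(diff_summary, okrs):
    okr_lines = "\n".join(f"- {o}" for o in okrs)
    return (
        "Draft a PR description.\n"
        "Map each change to one of these OKRs where possible:\n"
        f"{okr_lines}\n\n"
        f"Change summary:\n{diff_summary}"
    )

prompt = pr_description_prompt(
    "Added retry with backoff to the payment client.",
    ["KR2: checkout success rate 99.9%"],
)
print(prompt)
```

The point is the structure, not the model: because the OKRs ride along in the prompt, the generated description links work to outcomes instead of restating the diff.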

&lt;blockquote&gt;
&lt;p&gt;The competitive edge isn't headcount, it's how well your workflow speaks AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Google AI Studio: Ship the Workflow, Not a Demo
&lt;/h2&gt;

&lt;p&gt;Google AI Studio lowers the activation energy to put Gemini API integration where work lives. You can experiment with prompts, then click Get code to export ready-to-run snippets and drop them into backends, CLIs, or UIs, no bespoke glue just to get started.&lt;/p&gt;

&lt;h3&gt;
  
  
  What makes this useful (not just shiny):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Drop-in Gemini setup&lt;/strong&gt;. Create and manage API keys in AI Studio; wire them into your app with official SDKs. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment → export code&lt;/strong&gt;. Tune prompts and parameters in the browser, then export the working snippet to your stack. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;APIs for application embedding&lt;/strong&gt;. The same endpoints you prototyped against become production ready integrations in services and tools. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaborative logs &amp;amp; datasets&lt;/strong&gt;. New Logs and Datasets let teams enable request logging without code changes, inspect interactions, and export datasets for regression-style prompt reviews, so AI changes are reviewable, not mysterious. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine tuning reality check&lt;/strong&gt;. The public Gemini API currently does not offer fine tuning; Google says it plans to bring it back.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Case Study Pattern: EM-AI (Your Engineering Manager, in the Loop)
&lt;/h2&gt;

&lt;p&gt;If you prefer to see ideas running, explore &lt;a href="https://github.com/ParthibanRajasekaran/EM-AI" rel="noopener noreferrer"&gt;EM-AI&lt;/a&gt; on GitHub, a small but opinionated app that puts Gemini behind a thin service layer to assist everyday engineering work (React + Vite + TypeScript). It's a pattern you can lift into your stack.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2f8l4ot2w44s7rgi2r2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2f8l4ot2w44s7rgi2r2.gif" alt="AI-EM" width="8" height="14"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Chat Bot &amp;amp; Live Conversation for coaching and "first-draft" tasks.&lt;/li&gt;
&lt;li&gt;Daily Planner &amp;amp; Weekly Summary that encourage realistic planning and reflective learning.&lt;/li&gt;
&lt;li&gt;OKR Manager to draft and refine objectives, linking tasks to outcomes.&lt;/li&gt;
&lt;li&gt;Voice Assistant for hands-free updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How it's put together (architecture decisions)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service boundary for AI calls&lt;/strong&gt; via services/geminiService.ts (centralized prompts, easy model swaps, guardrails).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable UX with modular components&lt;/strong&gt;: DailyPlanner.tsx, OkrManager.tsx, WeeklySummary.tsx, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speech &amp;amp; audio utilities&lt;/strong&gt; (useSpeechRecognition.ts, utils/audio.ts, utils/speech.ts) isolate I/O from business logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security hygiene&lt;/strong&gt;: .env.example, no committed keys, CI typechecks, CodeQL/Dependabot. &lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Try the pattern: fork a single component and wire it to your most painful handoff (PR → release notes, flaky test → actionable summary). &lt;a href="https://github.com/ParthibanRajasekaran/EM-AI" rel="noopener noreferrer"&gt;Check out the repo&lt;/a&gt; and borrow what fits.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Guardrails: How You Go Fast Safely
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key management &amp;amp; permissions&lt;/strong&gt;. Keep secrets in envs and rotate; AI Studio explains API key creation and setup. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observable assistants&lt;/strong&gt;. Turn on Logs and Datasets to compare outputs before/after prompt edits; treat prompt diffs like code diffs. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuning strategy&lt;/strong&gt;. If base prompts fall short, evaluate Vertex AI supervised fine-tuning for governed custom behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; How do I integrate Gemini into a service?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Prototype in Google AI Studio, click Get code, add your API key, then wrap calls behind a service module (like geminiService.ts) to centralize prompts, error handling, and telemetry. Start with one assistant at a painful handoff (e.g., PR summaries). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q:&lt;/strong&gt; Can AI generate PR summaries that map changes to OKRs?&lt;br&gt;
&lt;strong&gt;A:&lt;/strong&gt; Yes. Use repo context (diffs, issue/OKR links) and a structured prompt that outputs objective, change summary, risk, owners. Embed it in PR creation so traceability is automatic, not after-the-fact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vision: Lean Scaling with Smarter Tools
&lt;/h2&gt;

&lt;p&gt;While headlines debate model leaderboards, the real story is inside teams that scale lean, by augmenting the people they have with AI assistants that improve the quality of the trace. The winners won't just choose the right model; they'll &lt;strong&gt;standardize AI-first ways of working&lt;/strong&gt; across delivery loops.&lt;/p&gt;

&lt;p&gt;Start small. &lt;strong&gt;Embed AI workflow integration&lt;/strong&gt; where work happens. Let the signal rise. Next feedback season, you'll point to a traceable audit trail, not a story.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>google</category>
      <category>gemini</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Coding by Vibe, by Tests, or by Spec: Which Hat Are You Wearing?</title>
      <dc:creator>ParthibanRajasekaran</dc:creator>
      <pubDate>Sun, 26 Oct 2025 20:22:41 +0000</pubDate>
      <link>https://dev.to/parthibanrajasekaran/coding-by-vibe-by-tests-or-by-spec-which-hat-are-you-wearing-1h7b</link>
      <guid>https://dev.to/parthibanrajasekaran/coding-by-vibe-by-tests-or-by-spec-which-hat-are-you-wearing-1h7b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;We build the same FastAPI endpoint three ways and compare trade-offs with code&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyehj0hgqzaisi6750tj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyehj0hgqzaisi6750tj.gif" alt="PairProgramming" width="240" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have you ever shipped something that “felt right”… and a week later you’re untangling spaghetti?&lt;br&gt;
Have you ever written tests after the code and realised they’re just confirming happy paths?&lt;br&gt;
Have you ever had a perfectly fine implementation… that didn’t match what Product actually meant?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If any of that sounds familiar, this post is for you. We’ll compare three very real ways engineers work in 2025:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vibe coding (code-first, intuition-driven)&lt;/li&gt;
&lt;li&gt;Test-Driven Development (TDD) (red → green → refactor)&lt;/li&gt;
&lt;li&gt;Spec-Driven Development (start with an executable spec; code follows)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To keep it concrete, we’ll build the same tiny feature three ways: a POST /price endpoint that adds VAT and applies a discount code. You’ll see what each mode feels like, where it shines, where it bites, and you can open the matching folders in the repo (vibe/, tdd/, spec_driven/) to run the examples and go deeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  A 60-second analogy you won’t forget
&lt;/h2&gt;

&lt;p&gt;Vibe coding is like cooking by feel. You throw in garlic “until it smells right.” It’s brilliant for exploring flavours fast but without a recipe, repeating success is hard.&lt;/p&gt;

&lt;p&gt;TDD is cooking with a scale. You weigh, taste, adjust, then refactor the plating. Fewer surprises; you can serve 20 plates consistently.&lt;/p&gt;

&lt;p&gt;Spec-driven is a recipe card everyone agrees on before you start: ingredients, steps, expected taste. The sous-chef, the server, and you are aligned.&lt;/p&gt;

&lt;p&gt;We’ll use all three to make the same dish so the differences pop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: a tiny price API
&lt;/h2&gt;

&lt;p&gt;We want a single endpoint:&lt;/p&gt;

&lt;p&gt;POST /price&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{ "amount": 100.0, "vat_pct": 20, "code": "WELCOME10" } -&amp;gt; { "final": 108.0 }&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Rule: VAT is applied first, then a 10% discount for WELCOME10. Round to 2 decimals.&lt;/p&gt;
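
&lt;p&gt;As a quick arithmetic check of the rule, worked in the stated order (VAT first, then the discount):&lt;/p&gt;

```python
amount, vat_pct = 100.0, 20.0
with_vat = round(amount * (1 + vat_pct / 100), 2)  # 100.0 -> 120.0
final = round(with_vat * 0.9, 2)                   # WELCOME10 takes 10% off -> 108.0
```

&lt;p&gt;For this input both orders happen to agree; once you round to cents between steps, discount-then-VAT can diverge from VAT-then-discount, which is exactly why the rule pins the order.&lt;/p&gt;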

&lt;h3&gt;
  
  
  1) Vibe Coding: “I’ll just build it”
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll1p14orr46etue1xp58.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fll1p14orr46etue1xp58.gif" alt="VibeCoding" width="498" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever been in flow, built the endpoint in one sitting, and thought: tests can come later? That’s vibe coding. It’s excellent for prototypes and spikes. Risk: edge cases get discovered by users or future-you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it feels like, step by step&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spin up a FastAPI app.&lt;/li&gt;
&lt;li&gt;Implement the “obvious” math.&lt;/li&gt;
&lt;li&gt;Add basic tests after it works.&lt;/li&gt;
&lt;li&gt;Learn from behaviour, refactor in place.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Minimal snippet (from vibe/app.py)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# POST /price  -&amp;gt; {"final": number}&lt;br&gt;
from fastapi import FastAPI&lt;br&gt;
from pydantic import BaseModel&lt;br&gt;
from .calculator import final_price&lt;br&gt;
&lt;br&gt;
app = FastAPI(title="vibe-pricing")&lt;br&gt;
&lt;br&gt;
class PriceIn(BaseModel):&lt;br&gt;
    amount: float&lt;br&gt;
    vat_pct: float&lt;br&gt;
    code: str | None = None&lt;br&gt;
&lt;br&gt;
@app.post("/price")&lt;br&gt;
def price(body: PriceIn):&lt;br&gt;
    return {"final": final_price(body.amount, body.vat_pct, body.code)}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reality check (in vibe/tests/)&lt;/strong&gt;&lt;br&gt;
You’ll likely add tests that match what you already wrote:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;def test_vat_then_discount():&lt;br&gt;
    assert final_price(100, 20, "WELCOME10") == 108.0&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it shines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spikes, demos, “feel the API” explorations&lt;/li&gt;
&lt;li&gt;Rapid iteration with minimal ceremony&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Watch-outs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hidden edge cases (rounding, nulls, unknown codes)&lt;/li&gt;
&lt;li&gt;Harder to reproduce success; tests risk becoming rubber stamps&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) TDD: “Red → Green → Refactor”
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxftcfqk6cl523hn71so2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxftcfqk6cl523hn71so2.gif" alt="TDD" width="265" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever wished your code told you when you broke something? TDD is that feedback loop: you write a failing test (red), make it pass (green), then clean it up (refactor).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it feels like, step by step&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write a failing test that defines behaviour.&lt;/li&gt;
&lt;li&gt;Write the tiniest code to pass it.&lt;/li&gt;
&lt;li&gt;Refactor confidently (tests stay green).&lt;/li&gt;
&lt;li&gt;Repeat, growing behaviour by behaviour.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Start with a test (from tdd/tests/test_calculator.py)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from dev_modes.tdd.calculator import price_with_vat, final_price&lt;br&gt;
&lt;br&gt;
def test_vat_rounds_to_cents():&lt;br&gt;
    assert price_with_vat(19.99, 8.5) == 21.69&lt;br&gt;
&lt;br&gt;
def test_final_price_vat_then_discount():&lt;br&gt;
    assert final_price(100, 20, "WELCOME10") == 108.0&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then the code (from tdd/calculator.py)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;def price_with_vat(amount: float, vat_pct: float) -&amp;gt; float:&lt;br&gt;
    return round(amount * (1 + vat_pct / 100), 2)&lt;br&gt;
&lt;br&gt;
def apply_discount(total: float, code: str | None) -&amp;gt; float:&lt;br&gt;
    if not code: return total&lt;br&gt;
    if code.lower() == "welcome10": return round(total * 0.9, 2)&lt;br&gt;
    return total&lt;br&gt;
&lt;br&gt;
def final_price(amount: float, vat_pct: float, code: str | None) -&amp;gt; float:&lt;br&gt;
    with_vat = price_with_vat(amount, vat_pct)&lt;br&gt;
    return apply_discount(with_vat, code)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it shines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core business logic and refactor-heavy domains&lt;/li&gt;
&lt;li&gt;Teams that value predictability and regression safety&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Watch-outs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Over-mocking and “testing implementation not behaviour”&lt;/li&gt;
&lt;li&gt;You need discipline to resist jumping straight to code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) Spec-Driven: “Agree on the recipe, then cook”
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6h393cd98cnswfvit3p.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6h393cd98cnswfvit3p.webp" alt="Spec-Driven" width="702" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever built the right thing the wrong way or the wrong thing perfectly? Spec-driven starts with a human-readable, executable spec so everyone agrees on “what good looks like” before code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it feels like, step by step&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the spec in plain language (e.g., Gherkin).&lt;/li&gt;
&lt;li&gt;Derive tests from the spec.&lt;/li&gt;
&lt;li&gt;Implement the code to satisfy the spec.&lt;/li&gt;
&lt;li&gt;Treat the spec as living documentation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Spec first (from spec_driven/specs/price.feature)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Feature: Checkout price&lt;br&gt;
  Scenario: VAT then discount&lt;br&gt;
    Given a base amount of 100&lt;br&gt;
    And VAT is 20 percent&lt;br&gt;
    When I apply the code "WELCOME10"&lt;br&gt;
    Then the final price should be 108.0&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test shaped by the spec (from spec_driven/tests/test_from_spec.py)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;from fastapi.testclient import TestClient&lt;br&gt;
from dev_modes.spec_driven.app import app&lt;br&gt;
&lt;br&gt;
client = TestClient(app)&lt;br&gt;
&lt;br&gt;
def test_vat_then_discount_from_spec():&lt;br&gt;
    r = client.post("/price", json={"amount": 100, "vat_pct": 20, "code": "WELCOME10"})&lt;br&gt;
    assert r.status_code == 200&lt;br&gt;
    assert r.json() == {"final": 108.0}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it shines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-team features (Product, QA, Eng) and compliance-heavy work&lt;/li&gt;
&lt;li&gt;API contracts where misunderstandings are expensive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Watch-outs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A little slower to start (but faster to align)&lt;/li&gt;
&lt;li&gt;Specs need ownership or they rot&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to choose (today)
&lt;/h3&gt;

&lt;p&gt;Ask yourself three questions before you start:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this exploration or execution?&lt;/strong&gt;&lt;br&gt;
If you’re exploring unknowns, start with vibe for speed, then tighten with TDD or a spec once the shape stabilises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this change cross-team or customer-visible?&lt;/strong&gt;&lt;br&gt;
If multiple stakeholders must agree, spec-driven pays for itself in clarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will we refactor this a lot?&lt;/strong&gt;&lt;br&gt;
If yes, TDD buys you courage and speed with safety.&lt;/p&gt;

&lt;h3&gt;
  
  
  Show me the code (and how to run it)
&lt;/h3&gt;

&lt;p&gt;All three implementations build the same POST /price:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vibe/&lt;/code&gt; : code first, tests after&lt;br&gt;
&lt;code&gt;tdd/&lt;/code&gt; : tests first, then code&lt;br&gt;
&lt;code&gt;spec_driven/&lt;/code&gt; : feature spec → tests → code&lt;/p&gt;

&lt;p&gt;Run any folder with your FastAPI dev server and tests. For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# Vibe&lt;br&gt;
uvicorn dev_modes.vibe.app:app --reload&lt;br&gt;
pytest vibe/tests -q&lt;/code&gt;&lt;br&gt;
&lt;code&gt;# TDD&lt;br&gt;
uvicorn dev_modes.tdd.app:app --reload&lt;br&gt;
pytest tdd/tests -q&lt;/code&gt;&lt;br&gt;
&lt;code&gt;# Spec-driven&lt;br&gt;
uvicorn dev_modes.spec_driven.app:app --reload&lt;br&gt;
pytest spec_driven/tests -q&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick “save yourself later” checklist
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vibe?&lt;/strong&gt; Jot down two edge cases you didn’t code yet. Add a micro-test for each before merging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TDD?&lt;/strong&gt; Re-read your tests: do they describe behaviour, or mirror implementation? Trim mocks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spec-driven?&lt;/strong&gt; Keep the spec crisp and owned. If Product changes their mind, change the spec first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;All three modes are tools, not tribes. Great engineers switch hats on purpose. Explore with vibe, harden with TDD, align with specs, and your future self (and your directors) will thank you.&lt;/p&gt;

</description>
      <category>softwareengineering</category>
      <category>python</category>
      <category>fastapi</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
