I've shipped three AI-powered developer tools to production this year.
Two of them shouldn't have used AI at all.
Let me explain why that matters and how to avoid making the same mistake.
The Question Nobody Asks
Everyone's asking: "How can I use AI for this?"
The better question is: "Should I?"
Because here's what I learned the hard way:
AI solves a very specific class of problems.
And most of your problems aren't in that class.
What Happened When I Built for SRE
Last month, I started building an AI system for SRE.
The idea wasn’t to generate text.
It was to simulate real incident response.
So I built an environment where:
- systems break
- signals appear (logs, metrics)
- actions change the state
- wrong decisions are penalized
Not "What would you do?"
But:
"What happens when you actually act?"
What I Realized Quickly
AI looks good when it explains problems.
It struggles when it has to:
- decide under uncertainty
- take the correct sequence of actions
- handle multi-step failures
In SRE, being almost right is still wrong.
Where Systems Break
The hardest part wasn’t generation.
It was:
- choosing the right action
- in the right order
- based on incomplete signals
That’s where most AI systems fail.
Not in demos.
In decisions.
The Lesson
SRE made one thing clear:
AI is useful when it supports decisions.
Not when it replaces them.
New Rule
If your system requires:
consistent, correct decisions under pressure
Then AI alone is not enough.
You need:
- structure
- constraints
- validation
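To make that concrete, here's a minimal sketch of the shape I mean. Everything here is hypothetical (`model_propose` stands in for whatever model you call): the model only proposes, and a deterministic layer constrains and validates before anything executes.

```python
# Sketch: the AI proposes, deterministic code disposes.
# `model_propose` is a placeholder for a real model call.

ALLOWED_ACTIONS = {"restart_service", "scale_up", "page_oncall"}  # the constraint

def model_propose(incident: str) -> str:
    # Placeholder: a real system would call an LLM here.
    return "restart_service"

def validate(action: str, incident: str) -> bool:
    # Deterministic checks the model cannot override.
    return action in ALLOWED_ACTIONS

def decide(incident: str) -> str:
    action = model_propose(incident)
    if not validate(action, incident):
        return "escalate_to_human"  # fallback, never a silent failure
    return action
```

The point of the structure: even if the model hallucinates an action, the validation layer refuses it and a human gets paged instead.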
The Pattern I Started Seeing
After that failure, I looked at every AI tool I'd built or evaluated.
I found a pattern in what actually worked:
AI works when the problem has high-variance inputs and acceptable variance in outputs.
Let me break that down.
High Variance Inputs
This means: the problem receives unpredictable, unstructured, or creative inputs.
Examples that fit:
- User queries in natural language
- Bug reports written by non-technical users
- Code snippets in any language/framework
- API documentation across different vendors
Examples that don't:
- Structured database queries
- Configuration files with known schemas
- Metrics from monitoring tools
- Git commit hashes
If your input is already structured and predictable, you don't need AI. You need a parser.
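For instance, a config file with a known schema (the keys below are made up for illustration) is a parsing problem, not a modeling problem:

```python
import json

REQUIRED_KEYS = {"service", "replicas", "region"}  # hypothetical schema

def parse_config(raw: str) -> dict:
    """Deterministic: same input, same output, every time."""
    cfg = json.loads(raw)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return cfg
```

Ten lines, no model, no surprises. That's the bar AI has to beat.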
Acceptable Variance in Outputs
This means: the user can tolerate (and even expects) some variation in the response.
Examples that fit:
- Code suggestions (developer reviews before accepting)
- Draft responses to support tickets (human edits before sending)
- Initial test case generation (QA refines coverage)
- Summarizing long error logs (engineer investigates further)
Examples that don't:
- Deploying to production
- Merging pull requests
- Granting permissions
- Processing payments
If the output must be deterministic and correct 100% of the time, AI is the wrong tool.
You need rules, not models.
The Real Litmus Test
Here's the framework I use now before writing any AI code:
Can I solve this with:
- A regex?
- A state machine?
- A database query?
- A rules engine?
If yes → don't use AI.
Only use AI when:
- The problem is genuinely ambiguous
- Deterministic code would require thousands of edge cases
- Human judgment is currently the only solution
- Imperfect answers are acceptable
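The whole checklist fits in one small function. This is a sketch of how I'd encode it (the field names are mine, not a standard):

```python
def should_use_ai(problem: dict) -> bool:
    """Encodes the litmus test above. Field names are illustrative."""
    # If a simple deterministic tool fits, stop here.
    if problem["solvable_with"] & {"regex", "state_machine", "sql", "rules"}:
        return False
    # Otherwise AI is a candidate only if every condition holds.
    return (problem["ambiguous"]
            and problem["edge_cases"] > 1000
            and problem["imperfect_ok"])
```

Notice the order: the deterministic check runs first and short-circuits everything else.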
Where AI Actually Belongs in Developer Tooling
After building systems that worked and systems that didn't, here's what I've seen succeed:
Code Search & Navigation
Why it works:
- Developers search using imprecise natural language
- Codebase context is massive and varied
- "Close enough" results are useful
Example:
"Find where we handle rate limiting for the API"
Traditional search fails because:
- We might call it "throttling" in some files
- Implementation is split across middleware and handlers
- No single keyword matches everything
AI search understands intent.
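A toy illustration of why intent beats keywords. Real systems use embeddings; here a hand-written synonym map stands in, just to show the matching behavior:

```python
# Toy stand-in for embedding search: "rate limiting" should also
# surface files that say "throttle". Not a real semantic index.
SYNONYMS = {"rate limiting": {"rate limit", "throttle", "throttling"}}

def semantic_hits(query: str, files: dict[str, str]) -> list[str]:
    terms = {query.lower()} | SYNONYMS.get(query.lower(), set())
    return [name for name, text in files.items()
            if any(t in text.lower() for t in terms)]
```

With embeddings you get this behavior without maintaining the synonym map by hand, and that's exactly the high-variance-input case AI is good at.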
Error Explanation & Debugging Hints
Why it works:
- Error messages are inconsistent across languages/frameworks
- Developers need context, not just stack traces
- Suggested fixes don't auto-execute
Example:
NullPointerException at line 47
AI can correlate:
- Recent code changes
- Similar past issues
- Common patterns in that file
It doesn't fix it. It points you in the right direction.
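In practice the valuable engineering work is assembling that context, not the model call itself. A minimal sketch (function and field names are mine):

```python
def build_debug_context(error: str, recent_commits: list[str],
                        similar_issues: list[str]) -> str:
    """Assemble the context an AI (or a human) needs to explain an error.
    The model only suggests; nothing here executes a fix."""
    return "\n".join([
        f"Error: {error}",
        "Recent changes: " + "; ".join(recent_commits[:3]),
        "Similar past issues: " + "; ".join(similar_issues[:3]),
    ])
```

Whatever consumes this string, the output is a hint for a developer, never an action.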
Test Case Generation (First Draft)
Why it works:
- Writing tests is high-effort, low-creativity work
- Generated tests are always reviewed
- Edge cases emerge through iteration
Example:
Given a function, generate initial unit tests covering:
- Happy path
- Null inputs
- Boundary conditions
Developer refines from there.
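Here's what that first draft tends to look like, using a made-up `clamp` function as the target. The tests are deliberately unimaginative; that's the point of a draft:

```python
def clamp(value, low, high):
    """Example function under test (hypothetical)."""
    return max(low, min(high, value))

# The kind of first-draft tests a model might produce. A developer
# reviews and extends these; they never ship unreviewed.
def test_happy_path():
    assert clamp(5, 0, 10) == 5

def test_boundaries():
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10
```

The model saves you the typing; the developer still owns the coverage.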
And here's where I've seen it consistently fail:
Automated Code Review
Why it fails:
- Context requires understanding team conventions
- False positives erode trust
- Deterministic linters already catch syntax issues
Automatic Refactoring
Why it fails:
- Breaking changes require 100% accuracy
- Semantic meaning must be preserved exactly
- One mistake ships to production
Auto-Generated API Clients
Why it fails:
- OpenAPI specs already exist (structured input)
- Code generation tools are deterministic
- No ambiguity to resolve
The Mistake I See Most Often
Developers use AI because it's impressive.
Not because it's the right tool.
I've done this. We all have.
You see a cool demo and think: "I could use that for..."
But here's what actually happens:
- You bolt AI onto a problem that doesn't need it
- It works 90% of the time
- The 10% failure rate is unpredictable
- You spend more time handling edge cases than you saved
- You rebuild it without AI
Save yourself the cycle.
Start with the simplest solution that could work.
How I Decide Now
When someone asks me to build an AI feature, I ask:
"What happens if this gives the wrong answer?"
If the answer is:
- The user reviews and corrects it → Maybe AI
- We waste some time → Maybe AI
- We lose customer trust → Not AI
- We break production → Definitely not AI
- Nothing, it's just slower → Definitely not AI
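That question is answerable in code, which is how I keep myself honest. A sketch of the gate (labels are my shorthand for the outcomes above):

```python
def ai_appropriate(consequence: str) -> str:
    """Maps 'what happens if it's wrong?' to a verdict."""
    verdicts = {
        "user_reviews_it": "maybe",
        "wasted_time": "maybe",
        "lost_trust": "no",
        "broken_production": "definitely not",
        "no_benefit": "definitely not",
    }
    return verdicts.get(consequence, "no")  # unknown consequence: default to caution
```

The default matters: if you can't name the failure mode, you haven't earned the model call.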
The Problems Actually Worth Solving
After shipping AI to production, here's what I've learned:
Good AI problems share these traits:
- Ambiguity is inherent – The problem can't be reduced to rules
- Human-in-the-loop is natural – Someone reviews the output anyway
- Value comes from speed, not perfection – 80% solution in 5 seconds beats 100% solution in 5 hours
- The alternative is hiring more people – You're augmenting human judgment, not replacing deterministic code
For developer tooling specifically:
The sweet spot is: Tasks developers already do manually that require understanding context but not making critical decisions.
Examples:
- Writing boilerplate tests
- Searching codebases semantically
- Explaining unfamiliar error messages
- Generating first-draft documentation
- Suggesting variable names
Not:
- Deploying code
- Approving changes
- Granting access
- Modifying production configs
What I'm Building Differently Now
Instead of starting with "What can AI do?", I start with:
What are developers doing repeatedly that's:
- Mentally tedious (not challenging, just annoying)
- Context-heavy (requires reading lots of code)
- Non-critical (mistakes are cheap)
Then I ask:
Could a junior developer do this after reading the context?
If yes → AI might help.
If no → I'm trying to automate judgment, and that won't work.
The Hard Truth
Most problems don't need AI.
They need:
- Better documentation
- Clearer error messages
- Simpler abstractions
- Fewer edge cases
AI feels like progress because it's new.
But progress is solving the problem correctly, not impressively.
A Practical Exercise
If you're reading this and thinking about an AI feature, try this:
1. Write down the problem
2. Describe the input (is it structured or chaotic?)
3. Describe the acceptable output (is variance okay?)
4. Write the deterministic solution (if you can)
If step 4 takes less than 100 lines of code → you don't need AI.
If step 4 is impossible → AI might be the right tool.
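A worked run of the exercise, on a made-up problem: "flag commit messages that lack a ticket ID" (assuming a JIRA-style `ABC-123` format). Step 4 comes out to a handful of lines, so AI is the wrong tool:

```python
import re

# Deterministic solution to a hypothetical problem: flag commit
# messages missing a JIRA-style ticket ID like "ABC-123".
TICKET = re.compile(r"\b[A-Z]{2,}-\d+\b")

def missing_ticket(message: str) -> bool:
    return TICKET.search(message) is None
```

Well under 100 lines. Exercise over; no model required.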
What I'm Doing Tomorrow
I'm going to break down something most engineers skip:
How to actually structure an AI system once you've confirmed the problem is worth solving.
Because the architecture decisions you make early will determine whether your system is:
- Reliable or brittle
- Maintainable or a black box
- Scalable or a one-off hack
We'll cover:
- Input validation (most failures happen here)
- Prompt orchestration (not just a single call)
- Output schemas (structured responses are non-negotiable)
- Fallback strategies (when AI doesn't know)
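As a small taste of the last two items, here's the shape I mean by a structured output with a fallback. The schema keys are invented for the example:

```python
import json

SCHEMA_KEYS = {"answer", "confidence"}  # hypothetical response schema

def parse_model_output(raw: str) -> dict:
    """Validate structured model output; fall back rather than trust free text."""
    try:
        out = json.loads(raw)
        if SCHEMA_KEYS <= out.keys() and 0.0 <= out["confidence"] <= 1.0:
            return out
    except (json.JSONDecodeError, TypeError):
        pass
    return {"answer": None, "confidence": 0.0, "fallback": True}
```

If the model returns prose instead of JSON, the system degrades explicitly instead of acting on a guess.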
Final Thought
We already have the recipes for AI systems.
RAG. Agents. Workflows. Fine-tuning.
But having a recipe doesn’t mean you should cook that dish.
The real skill isn’t using AI.
It’s knowing when not to.
This is Day 1 of documenting how I think about building AI systems in production—
what works, what breaks, and where most approaches fail under real-world pressure.
If you’re working on similar systems, I’m interested in how you’re approaching it—
especially where things didn’t go as expected.
For context, I’ve been exploring related ideas around AI vs AGI and system design here: