I still remember the exact moment.
It was 11pm. My team had been sprinting for 3 weeks building
a new payments feature. Deadline tomorrow. I handed the final
implementation task to our AI coding agent and said:
"Finish this up. Make it production ready."
I went to make coffee.
I came back to a deployed codebase with:
- No input validation on the payment amount field
- The API key hardcoded directly in the source file
- Zero tests
- No rollback procedure
- User card numbers being logged to the console
In 4 minutes, the AI had built something that would have
destroyed us if it had reached real users.
The worst part? The code looked fine. Clean functions.
Good variable names. It even had comments.
It just had none of the things that actually matter in production.
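For contrast, here is a minimal sketch of the guardrails that were missing. This is illustrative Python, not the actual codebase; names like PAYMENTS_API_KEY and the amount cap are hypothetical:

```python
import os
import re
from decimal import Decimal, InvalidOperation

def validate_amount(raw: str) -> Decimal:
    """Reject non-numeric, negative, or implausibly large payment amounts."""
    try:
        amount = Decimal(raw)
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}")
    if amount <= 0 or amount > Decimal("10000"):  # hypothetical cap
        raise ValueError(f"amount out of range: {amount}")
    return amount

def get_api_key() -> str:
    """Read the key from the environment instead of hardcoding it in source."""
    key = os.environ.get("PAYMENTS_API_KEY")
    if not key:
        raise RuntimeError("PAYMENTS_API_KEY is not set")
    return key

def mask_card(card_number: str) -> str:
    """Never log a full card number; keep only the last four digits."""
    digits = re.sub(r"\D", "", card_number)
    return "*" * (len(digits) - 4) + digits[-4:]
```

None of this is clever. That's the point: it's the boring, obvious work the agent skipped.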
The thing nobody talks about
We spend so much time talking about what AI agents can do.
We don't talk enough about what they skip.
Here's what I've watched AI coding agents skip — over and over —
across my own projects and teams I've talked to:
They skip writing specs.
"The task is obvious."
They skip writing tests.
"I'll add them later."
They skip security checks.
"It's an internal API."
They deploy without a rollback plan.
"It's a small change."
They ship ML models without safety evaluation.
"The eval numbers look good."
They build data pipelines without quality gates.
"The source data is reliable."
And the worst part is — they sound confident while doing it.
No hesitation. No "are you sure?" Just fast, clean, wrong.
What I realized
The problem isn't that AI agents are bad at writing code.
They're genuinely incredible at writing code.
The problem is they have no discipline.
A senior engineer doesn't just know how to write code.
They know when to stop and write a spec first.
They know which shortcuts will cost you three weeks of debugging.
They know that "I'll add tests later" never happens.
That discipline — built from years of being burned —
is exactly what AI agents are missing.
You can't just tell an agent "be a senior engineer."
You have to show it the exact steps a senior engineer takes.
The exact things they verify before calling something done.
The exact excuses they refuse to accept from themselves.
So I built something
After that 11pm incident, I spent the next month building
AI Agent Skills — an open source framework of 40+ structured
workflow files that AI coding agents load before doing work.
Not prompts. Not vague instructions.
Structured skills — each one encoding exactly how a senior
engineer approaches a specific task:
- What to do first
- What to verify at each step
- What evidence to produce before calling it done
- What excuses to refuse
And — this is the part I'm most proud of — each skill has an
anti-rationalization section.
These are the exact shortcuts agents try to take, written out
explicitly, with a direct rebuttal for each one.
For example, the spec-driven-development skill includes:
"This feature is obvious — I don't need to write it down"
If it's obvious, the spec takes 10 minutes. If it's not obvious,
the spec saves days. Either way: write it.
"We'll iterate quickly — the spec will be wrong anyway"
Iterating on code without a spec means iterating in circles.
The agent sees these. It can't pretend the shortcut is reasonable.
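To make the shape concrete, a skill file in this style might look something like the sketch below. This is a hypothetical illustration of the structure described above, not an actual file from the repo:

```markdown
# Skill: spec-driven-development (hypothetical sketch)

## Do first
1. Write a one-page spec: goal, functional requirements, out of scope.
2. List open questions and get them answered before writing code.

## Verify at each step
- Every requirement maps to at least one test.
- Every open question is resolved or explicitly deferred.

## Evidence required before "done"
- The spec file committed alongside the code.
- Test run output showing the requirements pass.

## Anti-rationalization
> "This feature is obvious — I don't need to write it down."

If it's obvious, the spec takes 10 minutes. If it's not obvious,
the spec saves days. Either way: write it.
```

The structure matters more than the wording: steps, checks, evidence, and the excuses pre-rebutted.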
What's covered
I wanted this to be the most complete skill framework ever built
for AI coding agents. So I went far beyond what others cover:
Standard engineering:
Spec writing · Planning · TDD · API design · Code review ·
Security · Performance · Git workflow · CI/CD · Documentation
AI/ML Engineering (nobody else covers this):
Training pipelines · Evaluation harnesses · Safety evaluation ·
Prompt injection testing · RAG system design · LLM evaluation ·
Agent orchestration · Distribution shift monitoring
Data Engineering (nobody else covers this):
Data contracts · Pipeline quality gates · Lineage tracking ·
Late-arriving data · Dead letter queues · GDPR compliance
Mobile Development (nobody else covers this):
Offline-first design · Main thread discipline · Battery efficiency ·
Crash rate targets · App store compliance
Incident Response (nobody else covers this):
5-phase structured response · Blameless post-mortems ·
Status page communication · Rollback-first philosophy
Plus 8 specialist agent personas — dedicated agents for:
Code Review · Security Auditing · Test Engineering ·
Performance · ML Engineering · Data Engineering ·
DevOps · Technical Writing
The moment it clicked for me
The first time I ran the framework on a real project, I gave
the agent a task and said /spec first.
It stopped.
It wrote a spec.
It listed functional requirements. It identified open questions.
It asked me to confirm before writing a single line of code.
I sat there staring at the screen thinking:
*This is what I actually wanted the whole time.*
Not faster code. Smarter code.
Code written by an agent that behaves like it's been burned before.
Like it has something to lose.
It's free. It's open source. It's yours.
I built this because I needed it. I'm sharing it because
I know I'm not the only one who's had the 11pm moment.
GitHub:
Works with Claude Code, Cursor, Gemini CLI, GitHub Copilot,
Windsurf, Kiro, OpenCode — and any agent that reads instructions.
MIT license. No strings.
One ask
If this resonates with you — if you've had your own version
of my 11pm moment — share this article.
There are millions of developers right now trusting AI agents
with production code. Most of them haven't been burned yet.
This framework is for the ones who want to learn from my mistake
instead of making their own.
⭐ Star it if it helps you.
🤝 Contribute if you have a skill to add.
💬 Comment if you've had your own AI disaster story.
I read every comment.
— Built at 11pm, after coffee, with lessons learned the hard way.