Julián Deangelis published a piece this week that hit 194k+ views in days. The argument: AI coding agents don't fail because the model is weak. The...
Sounds useful!
Try it on your next feature and let me know what assumptions it surfaces.
Thanks man - bookmarked this and archived it in my "AI - All The Stuff I Should Really Check Out" folder - will let you know when I get around to it!
That folder name is doing a lot of work. The assumptions list is the part worth seeing in practice. It's not obvious until you watch the skill surface a decision you would have left implicit. When you get around to it, run it on something you're actually building, not a toy example. 👊👊
Gotcha - toy examples tend to be artificial or simplistic...
Exactly.
If AI agents can even perform spec-driven development as well, I wonder how the role of developers will shift...
The shift is already happening - away from writing code toward writing the contract that defines what correct looks like. The spec is the thing agents can't generate without you because it requires knowing what your system is supposed to do, what your business constraints are, what failed last time. The agent can implement from a good spec. It can't produce the spec without the understanding that comes from having built the system and watched it fail.
The role isn't disappearing. It's moving upstream.
Just used this as my first skill to add a new 'birthday' feature to my Telegram bot. Very useful. I'm coding on my phone in a Termux-Ubuntu environment with Claude Code and Codex, so it's very easy to miss unknown assumptions and end up in a long conversation to fix the problem. This fixes the problem up front :)
Even with a small feature like this it helps. For example, it showed the commands it wanted me to type, but I wanted more of a free format - fixed. Thanks for this!
Termux on a phone is exactly the environment where the assumption problem bites hardest. The iteration cost is high enough that one wrong assumption can cost you an hour. Glad it caught the commands-vs-free-format issue before you were mid-conversation.
That free-format flag is good feedback on the skill itself. Defaulting to explicit commands made sense for the use case I built it around, but it's an assumption the skill makes without asking, which is ironic given what the skill is supposed to do. Worth adding a prompt at the start that asks about output format preference before generating the spec.
What's the birthday feature doing?
It gives updates in the morning with a birthday reminder. It can create an overview, add or remove birthdays, and sync with the calendar. I plan to add a personalized card feature.
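The core of a morning reminder like that can be tiny. A minimal sketch, purely hypothetical since the bot's actual code isn't shown here (the `birthdays` data and `todays_reminders` function are illustrative names, not the commenter's implementation):

```python
from datetime import date

# Hypothetical storage: name -> (month, day). A real bot would
# load this from its own persistence layer.
birthdays = {"Alice": (3, 14), "Bob": (11, 2)}

def todays_reminders(today: date, entries: dict) -> list[str]:
    # Match on month/day only so the reminder fires every year.
    return [name for name, (m, d) in entries.items()
            if (m, d) == (today.month, today.day)]

print(todays_reminders(date(2024, 3, 14), birthdays))  # ['Alice']
```

A scheduled job (cron in Termux, or the bot framework's own job queue) would run this each morning and send a message for any matches.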
Calendar sync on a phone in Termux is genuinely impressive and a personalized card feature on top of that is the kind of thing that makes a utility actually feel like it was built for someone.
This is a really interesting shift from “prompting” to actually encoding thinking into reusable skills. Instead of repeating instructions every time, you're building a system that compounds knowledge, which is exactly what modern AI workflows need. It also reinforces how critical clear specs are, since AI performs best when the intent is precise and structured.
"Encoding thinking" is the right frame and it's why the output format matters as much as the prompt. A spec that marks its assumptions explicitly with [ASSUMPTION: ...] forces the thinking to be visible, not just the conclusion. The AI didn't decide those were assumptions. You did. That's the difference between a reusable artifact and a one-time answer.
The key insight here is brilliant: correcting a draft is faster than answering questions cold. This is exactly why Q&A-first tools fail: they assume you know what you don't know. By generating the spec first and flagging assumptions inline, you're making the unknown unknowns visible. That's a fundamentally better UX for spec writing. The architecture assumption surfacing in your example proves the point perfectly.
"Unknown unknowns" is exactly the framing. Q&A assumes you can enumerate the gaps before you've seen the spec which is precisely when you can't. The generate-first approach inverts that: produce the draft, let the assumptions surface, then correct only what's wrong.
The architecture assumption in the Foundation example was the one that would have hit hardest mid-implementation. Cloudflare Workers can't read local filesystems - obvious in retrospect, invisible before the spec named it. That's the category that costs the most: not bugs you can see in the diff, but constraints you didn't know applied until the agent was already 3 hours into the wrong approach...
The Q&A tools that fail are optimizing for first-draft cleanliness. spec-writer optimizes for assumption visibility. Those are different goals and only one of them scales with complexity...
Great project, congrats!