Everyone's sharing their skill libraries right now. "Here are my 20 custom slash commands." "Check out my prompt template collection." "This skill saves me 2 hours a day."
I use skills too. I have about a dozen. They handle cover letters, content pipelines, code review, commit messages. Repeatable workflows where the input and output are predictable.
They cover maybe 10% of what my AI system actually does.
The other 90% is the part nobody shares on social media because it's ugly. It's API integrations that break when headers change. It's state management between sessions. It's error handling for when the third-party service returns garbage. It's monitoring that pages you at 6 AM because a cron failed. It's human-in-the-loop workflows where the AI proposes and you approve before anything touches production.
Skills can't solve this. Every client, every codebase, every problem has different infrastructure underneath. A skill is a template. The work is everything the template doesn't cover.
What Skills Actually Are
A skill is a saved prompt with some structure. Input goes in, the agent follows instructions, output comes out. It works when the task is the same shape every time.
"Generate a cover letter from this job posting." Same structure, different content. Perfect skill.
"Debug why the webhook stopped firing after the API provider changed their auth flow." No skill for that. Every instance is different. The agent needs to read logs, trace requests, understand the specific integration, and propose a fix that accounts for your deployment setup. That's infrastructure knowledge, not a prompt template.
The 90% Nobody Demos
Here's what actually keeps my system running day to day.
A server process that syncs data from multiple APIs, caches it locally, and exposes it to agents through a unified interface. When an API changes its response format, I fix the parser. No skill for that.
Scheduled jobs that run without any agent session. They pull data, generate reports, send notifications, and alert me when something fails. The agent isn't even involved. It's just cron, a script, and an alert channel.
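The shape of those jobs is deliberately boring: a script, an exit code, and an alert when it fails. A minimal sketch, assuming a chat-webhook alert channel — the URL, data source, and record count are placeholders, not the real setup:

```python
#!/usr/bin/env python3
"""Nightly sync job: runs from cron, no agent session involved."""
import json
import sys
import urllib.request

ALERT_WEBHOOK = "https://hooks.example.com/alerts"  # placeholder URL

def alert(message: str) -> None:
    # Post a failure notice to a chat channel so a human finds out.
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        ALERT_WEBHOOK, data=body,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)

def sync() -> int:
    # Pull data, generate reports; return how many records were handled.
    ...  # the actual work is specific to each integration
    return 0

if __name__ == "__main__":
    try:
        count = sync()
        print(f"synced {count} records")
    except Exception as exc:
        alert(f"nightly sync failed: {exc}")
        sys.exit(1)
```

The cron entry is one line; the value is the `except` branch, because that's the part that pages you.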
Approval workflows where the AI researches options, presents them with rationale, and waits for a human decision before executing. The approval mechanism is buttons in a chat app. The execution layer calls APIs to star repositories, follow users, post comments. The plumbing between "AI suggested it" and "it actually happened" is custom for every use case.
State that persists between sessions. Not agent memory. Infrastructure state. Cache files with TTLs. Vector indexes that get rebuilt nightly. Configuration that lives in flat files because a database would be overkill.
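The flat-file cache with a TTL is about this much code — a simplified sketch, with the JSON layout as an assumption:

```python
import json
import time
from pathlib import Path

def cache_get(path: Path, ttl_seconds: float):
    """Return the cached value if the file exists and is fresh, else None."""
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["written_at"] > ttl_seconds:
        return None  # stale: the caller refetches and rewrites
    return entry["value"]

def cache_put(path: Path, value) -> None:
    # Store the write timestamp alongside the value so reads can expire it.
    path.write_text(json.dumps({"written_at": time.time(), "value": value}))
```

No database, no cache server: a file per key and a timestamp is enough until the problem proves otherwise.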
None of this fits in a skill. It's bespoke infrastructure that exists because the specific problem required it.
Why This Matters
The skills hype creates a misleading impression of what production AI work looks like. Someone sees a collection of 30 slash commands and thinks: that's the system. It's not. It's the tip.
The system is the integration layer. The error handling. The monitoring. The state management. The human-in-the-loop controls. The deployment. The part where you wake up and the thing is still running, handling edge cases the skill never anticipated.
If you're evaluating someone's AI engineering capability, don't ask how many skills they have. Ask what happens when the skill fails. Ask what runs when nobody's in a session. Ask how state persists between interactions. That's where the actual engineering lives.
The Honest Ratio
I spend maybe 5% of my time writing new skills. I spend the rest building and maintaining the infrastructure that makes skills useful in the first place.
A skill that generates a cover letter is worthless without the task management system that tracks proposals, the message log that maintains conversation history, and the pipeline that routes everything to the right place.
A skill that creates a content draft is worthless without the publishing pipeline, the banner generation, the cross-platform distribution, and the editorial calendar that decides what to write next.
The skill is the last mile. The infrastructure is the entire road.
The Question
Next time you see someone demo their skill collection, ask yourself: what's underneath? What happens between sessions? What runs at 4 AM? What breaks, and who gets paged?
That's the 90%. That's the actual work.
I build production AI infrastructure, not prompt collections. If your team needs the 90% that skills don't cover, let's talk: cal.eu/reneza
Top comments (12)
This nails something I've been thinking about a lot. I run a programmatic SEO site with 89K+ pages across 12 languages, and the skills I use (content publishing, site auditing, outreach prospecting) cover maybe 15% of the actual system.
The other 85% is exactly what you describe — cron jobs that sync stock data from yfinance every morning, a Cronicle scheduler orchestrating 12+ ETL pipelines, deploy safety scripts that refuse to push if the build drops below 1000 HTML pages (learned that one the hard way), and metric validation logic that catches when an LLM hallucinates a 41% dividend yield for Apple.
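(Roughly what that deploy guard looks like — simplified sketch, with the build directory and threshold as examples:)

```python
import sys
from pathlib import Path

MIN_PAGES = 1000  # a build below this means generation silently broke

def check_build(build_dir: str, min_pages: int = MIN_PAGES) -> int:
    """Count generated HTML pages; refuse to deploy if the build shrank."""
    count = sum(1 for _ in Path(build_dir).rglob("*.html"))
    if count < min_pages:
        raise SystemExit(
            f"refusing to deploy: only {count} pages built "
            f"(expected at least {min_pages})")
    return count

if __name__ == "__main__":
    print(f"{check_build(sys.argv[1])} pages, OK to deploy")
```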
Your point about state persisting between sessions really resonates. I have flat-file state tracking agent outputs across sessions — what ran, what it produced, what needs reconciliation. None of that is a "skill." It's infrastructure that took weeks to get right and breaks in new ways regularly.
I'd add one nuance though: skills become more valuable when they encode institutional memory rather than generic templates. A skill that says "when auditing this specific site, check these 6 things that broke last time" is closer to a runbook than a prompt. The line between skill and infrastructure blurs when the skill carries enough context about the system it operates on.
But your core thesis stands — the demo-able part is the smallest part.
Same here - skills that started as generic prompts gradually accumulated so many "except when X, check Y" conditions that they're basically runbooks now. The context they carry about the system is what makes them useful, not the prompt engineering.
Your dividend yield hallucination check is a perfect example. That's a guardrail you only build after getting burned, and it ends up being the most load-bearing code in the whole system.
Exactly right — the context about the system is the actual value, not the prompt scaffolding. The dividend yield validator is a perfect example of what I'd call "scar tissue code." Nobody writes a range check for financial metrics until they see an LLM confidently state that Apple has a 41% dividend yield. Once you've been burned, that 5-line validation function becomes the most important code in the pipeline. I've found the best skills end up being 20% prompt and 80% accumulated edge cases, domain constraints, and "never do X because Y happened last time" guardrails.
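That kind of validator is roughly this shape — illustrative bounds, not a standard:

```python
# Plausibility bounds per metric. The numbers encode what got us burned
# before, not any official range.
BOUNDS = {
    "dividend_yield_pct": (0.0, 15.0),   # a hallucinated 41% fails here
    "pe_ratio": (0.0, 1000.0),
}

def validate_metric(name: str, value: float) -> float:
    """Reject implausible values before they reach a published page."""
    low, high = BOUNDS[name]
    if not (low <= value <= high):
        raise ValueError(
            f"{name}={value} outside plausible range [{low}, {high}]")
    return value
```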
@mrlinuncut Good distinction. The ones that encode real domain logic are different - but they're also the ones that break the fastest when the domain shifts. A skill that encodes "how we do deploys at Company X" works until the deploy pipeline changes, and then you're debugging a prompt instead of debugging code. The gap is maintenance. Generic skills rot slowly because they're simple. Domain-specific skills rot fast because they're coupled to things that move. The more logic you pack into a skill, the more it behaves like any other piece of software that needs tests, versioning, and someone who understands why it was written that way. Which brings you right back to the 90% the article is about.
@larsfaye Exactly right. The debugging and systems thinking part is where the actual engineering lives. LLMs can generate the first pass, but the moment something breaks in a way the model hasn't seen before, you're back to reading logs, forming hypotheses, and tracing through state. That muscle doesn't come from prompting - it comes from years of being stuck and finding your way out. Sounds like your course is targeting the right gap. 🙂
@reneza Thanks so much! I'm getting great feedback thus far. Just curious, I don't see my comment, yet you replied to something... I'm new to dev.to and wondering what I'm missing! 😅
Interesting take — but I'd push back on the premise. We have 150+ skills on a production codebase and they're not "saved prompts with structured inputs." They're domain-specific procedures built over months of real work: what files to read, what conventions to follow, what broke last time, what the client's edge cases are.
The difference is whether skills are generic templates or institutional memory. A generic "write unit test" skill is boilerplate — agreed. A skill that says "this module uses
SiControllerTestCase, the delegate pattern requires mocking the proxy callback, and the last time someone forgot to clean up test entities it polluted the database for 3 days" — that's not boilerplate. That's the stuff a senior dev would tell you on day one.

You're right that the infrastructure matters more than the skill count. But the skills are infrastructure — they're how the system remembers what it learned. Without them, every session starts from zero.
curious what you think the actual gap is bc the boilerplate comparison makes sense for generic skills, but the ones that encode real domain logic or company specific workflows feel different from copy paste scaffold code
I hear you on the "ugly 90%" – that's where the real trenches are. But for things like API integrations, could skills at least manage the pattern of setting up webhooks or parsing common error responses, even if the specifics need filling in? It's not a full fix, but saves some initial grunt work.
Sure, skills can scaffold the initial webhook setup or error parsing pattern. But that saves maybe 20 minutes. The remaining hours go into handling edge cases that only show up in production. Rate limits at 2am, malformed payloads from a specific API version, retry logic that doesn't cascade into duplicate processing.
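A sketch of retry logic that can't cascade into duplicates — the dedup store is simplified to an in-memory set; in production it would be a database table or cache keyed by the webhook's delivery ID:

```python
import time

def handle_with_retry(key: str, handler, payload, processed: set,
                      attempts: int = 3, backoff: float = 0.01):
    """Retry a flaky handler without reprocessing an already-handled event."""
    if key in processed:
        return "duplicate"          # a redelivered event must be a no-op
    last_exc = None
    for attempt in range(attempts):
        try:
            handler(payload)
            processed.add(key)      # mark done only after success
            return "processed"
        except Exception as exc:
            last_exc = exc
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_exc
```

The scaffold never includes the `key in processed` check, and that check is the difference between a retry and a double charge.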
The scaffold is nice. The production hardening is where the work actually lives.
100% agree with you.