Everyone's sharing their skill libraries right now. "Here are my 20 custom slash commands." "Check out my prompt template collection." "This skill ...
This nails something I've been thinking about a lot. I run a programmatic SEO site with 89K+ pages across 12 languages, and the skills I use (content publishing, site auditing, outreach prospecting) cover maybe 15% of the actual system.
The other 85% is exactly what you describe — cron jobs that sync stock data from yfinance every morning, a Cronicle scheduler orchestrating 12+ ETL pipelines, deploy safety scripts that refuse to push if the build drops below 1000 HTML pages (learned that one the hard way), and metric validation logic that catches when an LLM hallucinates a 41% dividend yield for Apple.
Your point about state persisting between sessions really resonates. I have flat-file state tracking agent outputs across sessions — what ran, what it produced, what needs reconciliation. None of that is a "skill." It's infrastructure that took weeks to get right and breaks in new ways regularly.
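For anyone curious what "flat-file state tracking" means in practice, a stripped-down sketch (the file name and record fields are illustrative; mine tracks more than this):

```python
import json
import time
from pathlib import Path

STATE_FILE = Path("agent_state.json")  # hypothetical flat-file store

def record_run(task: str, outputs: list[str],
               needs_reconciliation: bool = False) -> dict:
    """Append one run record so the next session knows what already happened."""
    state = (json.loads(STATE_FILE.read_text())
             if STATE_FILE.exists() else {"runs": []})
    entry = {
        "task": task,
        "outputs": outputs,
        "needs_reconciliation": needs_reconciliation,
        "ts": time.time(),
    }
    state["runs"].append(entry)
    STATE_FILE.write_text(json.dumps(state, indent=2))
    return entry

def pending_reconciliation() -> list[dict]:
    """What the next session needs to reconcile before doing new work."""
    if not STATE_FILE.exists():
        return []
    return [r for r in json.loads(STATE_FILE.read_text())["runs"]
            if r["needs_reconciliation"]]
```

Nothing clever, but it's the difference between an agent that resumes and one that starts from zero every session.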
I'd add one nuance though: skills become more valuable when they encode institutional memory rather than generic templates. A skill that says "when auditing this specific site, check these 6 things that broke last time" is closer to a runbook than a prompt. The line between skill and infrastructure blurs when the skill carries enough context about the system it operates on.
But your core thesis stands — the demo-able part is the smallest part.
Same here - skills that started as generic prompts gradually accumulated so many "except when X, check Y" conditions that they're basically runbooks now. The context they carry about the system is what makes them useful, not the prompt engineering.
Your dividend yield hallucination check is a perfect example. That's a guardrail you only build after getting burned, and it ends up being the most load-bearing code in the whole system.
Exactly right — the context about the system is the actual value, not the prompt scaffolding. The dividend yield validator is a perfect example of what I'd call "scar tissue code." Nobody writes a range check for financial metrics until they see an LLM confidently state that Apple has a 41% dividend yield. Once you've been burned, that 5-line validation function becomes the most important code in the pipeline. I've found the best skills end up being 20% prompt and 80% accumulated edge cases, domain constraints, and "never do X because Y happened last time" guardrails.
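That "5-line validation function" is roughly this shape — a plausibility-range check, nothing more (the ranges here are illustrative guesses, not authoritative financial bounds):

```python
# Scar-tissue code: reject metric values an LLM (or a bad scrape)
# could plausibly hallucinate. Ranges below are illustrative.
PLAUSIBLE_RANGES = {
    "dividend_yield_pct": (0.0, 15.0),   # 41% for Apple fails here
    "pe_ratio": (0.0, 1000.0),
}

def validate_metric(name: str, value: float) -> float:
    """Raise instead of silently publishing an implausible number."""
    lo, hi = PLAUSIBLE_RANGES[name]
    if not (lo <= value <= hi):
        raise ValueError(
            f"{name}={value} outside plausible range [{lo}, {hi}]")
    return value
```

The value isn't the code, it's knowing which metrics need the check and what "plausible" means for each one.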
@mrlinuncut Good distinction. The ones that encode real domain logic are different - but they're also the ones that break the fastest when the domain shifts. A skill that encodes "how we do deploys at Company X" works until the deploy pipeline changes, and then you're debugging a prompt instead of debugging code. The gap is maintenance. Generic skills rot slowly because they're simple. Domain-specific skills rot fast because they're coupled to things that move. The more logic you pack into a skill, the more it behaves like any other piece of software that needs tests, versioning, and someone who understands why it was written that way. Which brings you right back to the 90% the article is about.
@larsfaye Exactly right. The debugging and systems thinking part is where the actual engineering lives. LLMs can generate the first pass, but the moment something breaks in a way the model hasn't seen before, you're back to reading logs, forming hypotheses, and tracing through state. That muscle doesn't come from prompting - it comes from years of being stuck and finding your way out. Sounds like your course is targeting the right gap.🙂
@reneza Thanks so much! I'm getting great feedback thus far. Just curious, I don't see my comment, yet you replied to something...I'm new to dev.to and wondering what I'm missing! 😅
Interesting take — but I'd push back on the premise. We have 150+ skills on a production codebase and they're not "saved prompts with structured inputs." They're domain-specific procedures built over months of real work: what files to read, what conventions to follow, what broke last time, what the client's edge cases are.
The difference is whether skills are generic templates or institutional memory. A generic "write unit test" skill is boilerplate — agreed. A skill that says "this module uses SiControllerTestCase, the delegate pattern requires mocking the proxy callback, and the last time someone forgot to clean up test entities it polluted the database for 3 days" — that's not boilerplate. That's the stuff a senior dev would tell you on day one.
You're right that the infrastructure matters more than the skill count. But the skills are infrastructure — they're how the system remembers what it learned. Without them, every session starts from zero.
Curious what you think the actual gap is, because the boilerplate comparison makes sense for generic skills, but the ones that encode real domain logic or company-specific workflows feel different from copy-paste scaffold code.
I hear you on the "ugly 90%" – that's where the real trenches are. But for things like API integrations, could skills at least manage the pattern of setting up webhooks or parsing common error responses, even if the specifics need filling in? It's not a full fix, but saves some initial grunt work.
Sure, skills can scaffold the initial webhook setup or error parsing pattern. But that saves maybe 20 minutes. The remaining hours go into handling edge cases that only show up in production. Rate limits at 2am, malformed payloads from a specific API version, retry logic that doesn't cascade into duplicate processing.
The scaffold is nice. The production hardening is where the work actually lives.
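The "retry logic that doesn't cascade into duplicate processing" bit is the part no scaffold gives you. One common shape is deduplicating on an event ID before retrying — a sketch, with an in-memory set standing in for what would be a durable store in production (all names here are hypothetical):

```python
import time

_processed: set[str] = set()  # stand-in for a durable store (DB, Redis, ...)

def process_once(event_id: str, handler, payload,
                 retries: int = 3, delay: float = 0.0):
    """Retry a handler with backoff, but never process the same event
    twice — even if the upstream webhook redelivers after a lost ack."""
    if event_id in _processed:
        return None  # duplicate delivery; skip quietly
    last_err = None
    for attempt in range(retries):
        try:
            result = handler(payload)
            _processed.add(event_id)  # mark done before acking upstream
            return result
        except Exception as e:
            last_err = e
            time.sleep(delay * (2 ** attempt))  # exponential backoff
    raise last_err
```

The scaffold generates the webhook endpoint; figuring out that you need the `_processed` check at all is the production-hardening hour, not the setup minute.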
100% agree with you.