Hélder Vasconcelos
Built a Visual Workbench Because Managing Claude Code Skills Was Driving Me Crazy

It started with a folder full of markdown files.

I'd been using Claude Code daily for months. It became my go-to coding partner pretty quickly. Early on, I discovered Skills: markdown files with YAML frontmatter that you drop into ~/.claude/skills/ to teach Claude how you want things done. Write a SKILL.md, describe when it should trigger, add your instructions, and Claude suddenly knows your deployment pipeline, your coding standards, your project's weird quirks.
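For anyone who hasn't written one, a minimal SKILL.md looks roughly like this (the `deploy-checklist` skill and its fields beyond `name` and `description` are invented for illustration, not a complete spec):

```markdown
---
name: deploy-checklist
description: Use when the user asks to deploy or release this project
---

Before any deploy:
1. Run the full test suite and confirm it passes.
2. Check that database migrations are up to date.
3. Tag the release and update the changelog.
```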

I went deep. I wrote skills for everything. Code review guidelines. Database migration patterns. Component scaffolding. API endpoint boilerplate. Test generation strategies. Each one made Claude Code sharper, more tuned to how I actually work.

Then the problems started. 😅

The Skill Management Problem Nobody Talks About 🤯

Five skills in a folder? Totally fine. Thirty skills spread across multiple projects, each with slightly different versions? Absolute nightmare.

I kept hitting the same walls. I'd edit a skill's YAML frontmatter, deploy it, then discover a typo broke the trigger pattern. No validation anywhere. I'd copy a skill between projects, tweak it, then completely forget which version was current. No version history. I wanted to test whether a skill actually produced the output I expected before shipping it. No testing sandbox.
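The validation I wanted is simple to sketch. Here's a minimal TypeScript check, purely a hypothetical illustration rather than the validator uberSKILLS actually ships, that confirms a frontmatter block exists and carries the required fields:

```typescript
// Minimal frontmatter sanity check: confirms the file starts with a
// `---` ... `---` block and that required keys are present.
// Illustrative sketch only, not uberSKILLS's real validator.
const REQUIRED_KEYS = ["name", "description"];

function validateSkill(source: string): string[] {
  const match = source.match(/^---\n([\s\S]*?)\n---/);
  if (!match) {
    return ["missing YAML frontmatter block"];
  }
  const frontmatter = match[1];
  const errors: string[] = [];
  for (const key of REQUIRED_KEYS) {
    // Look for `key:` at the start of a line inside the frontmatter.
    if (!new RegExp(`^${key}:`, "m").test(frontmatter)) {
      errors.push(`missing required field: ${key}`);
    }
  }
  return errors;
}
```

Even a check this naive would have caught the typo that silently broke my trigger pattern.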

Sharing was the worst part. A teammate would ask for my code review skill. I'd send the file over. They'd ask which model it was tuned for. I couldn't remember. They'd deploy it, get so-so results, and write off skills entirely.

I was spending more time managing skills than writing code. I was using an AI agent to be more productive, but the tooling around that agent kept dragging me back. 🙃

Building What I Needed 🔨

So I did what any developer does when the tooling falls short: built my own.

The idea was straightforward. A visual editor where I can see YAML frontmatter and markdown instructions side by side. Real-time validation so I catch errors before deployment. A way to test a skill against actual models with streaming responses, so I can tweak the instructions until the output matches what I want. And when it's ready, one-click deploy instead of manually copying files around.

That project became uberSKILLS ⚡, an open-source visual workbench for designing, testing, and deploying agent skills.

The first version was rough. A Next.js app with a basic editor and a deploy button that wrote files to ~/.claude/skills/. But even that bare-bones version saved me hours. No more YAML syntax errors. No more blind deployments. No more wondering if a skill would actually work.

From Side Project to Multi-Agent Workbench 🚀

This is where things got interesting.

While I was building uberSKILLS for Claude Code, the agent ecosystem blew up. Cursor shipped their rules system. GitHub Copilot added custom instructions. Windsurf launched with its own skill format. Gemini CLI showed up with agent configuration. Codex, OpenCode, Antigravity... suddenly there were eight major code agents, all supporting some form of persistent instructions.

The problem I'd solved for Claude Code? It existed everywhere. Every agent had its own directory structure, its own conventions, its own deployment path. Developers using multiple agents were maintaining duplicate sets of instructions with zero shared tooling. 😩

So uberSKILLS grew. Today it deploys to eight agents 🎯:

  • Claude Code
  • Cursor
  • GitHub Copilot
  • Windsurf
  • Gemini CLI
  • Codex
  • OpenCode
  • Antigravity

Write your skill once, pick your targets, deploy everywhere. The skill format is standardized (YAML frontmatter for metadata and triggers, markdown body for instructions) and the engine handles translation to each agent's expected structure.
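A drastically simplified sketch of that translation step: map each target agent to a skills directory and resolve one output path per target. Only the `~/.claude/skills/` path comes from this post; the other directories here are placeholders, not the real locations uberSKILLS writes to, and a real engine would also rewrite the frontmatter into each agent's expected format:

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Illustrative agent → directory table. Only the Claude Code entry is
// taken from this post; the rest are placeholder paths for the sketch.
const AGENT_DIRS: Record<string, string> = {
  "claude-code": path.join(os.homedir(), ".claude", "skills"),
  cursor: path.join(os.homedir(), ".cursor-skills-example"),
  copilot: path.join(os.homedir(), ".copilot-skills-example"),
};

function deployTargets(skillName: string, agents: string[]): string[] {
  // Resolve one output path per selected agent.
  return agents.map((agent) => {
    const dir = AGENT_DIRS[agent];
    if (!dir) throw new Error(`unknown agent: ${agent}`);
    return path.join(dir, skillName, "SKILL.md");
  });
}
```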

This matters more than it might sound. If you've spent time crafting a detailed code review prompt that works great with Claude, you should be able to use that same work with Copilot or Cursor without rewriting anything. Your prompt engineering expertise should be portable. 🔄

What It Actually Does ⚙️

Three steps: create, test, deploy.

✏️ Create

You can go manual with the structured editor and fill in metadata fields (name, description, trigger patterns, tags, model preferences). Or open the AI chat, describe what you want in plain language, and let it generate a complete skill for you. The AI creation flow has a live preview panel so you can watch the SKILL.md update as you refine your description through conversation.

🧪 Test

This is where uberSKILLS really pays for itself. The multi-model sandbox lets you pick any model available through OpenRouter (Claude, GPT, Gemini, Llama, dozens more) and run your skill against it with streaming responses. You see output in real time, plus metrics: token counts, latency, time to first token. Tweak the instructions, test again, compare outputs across models, and actually feel confident a skill works before it touches a real project. Every test run gets saved too, so you can track how instruction changes affect output quality over time. 📊
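The metrics side is easy to reason about: consume the streamed chunks, note the timestamp of the first one, and total everything up at the end. A hypothetical sketch (not the actual uberSKILLS code, and counting chunks as a stand-in for real token counts):

```typescript
interface RunMetrics {
  chunks: number;
  timeToFirstChunkMs: number;
  totalLatencyMs: number;
  output: string;
}

// Consume an async stream of text chunks (e.g. from a streaming chat
// completion) and record timing metrics along the way.
async function measureRun(stream: AsyncIterable<string>): Promise<RunMetrics> {
  const start = Date.now();
  let firstChunkAt = -1;
  let chunks = 0;
  let output = "";
  for await (const chunk of stream) {
    if (firstChunkAt < 0) firstChunkAt = Date.now() - start;
    chunks += 1;
    output += chunk;
  }
  return {
    chunks,
    timeToFirstChunkMs: firstChunkAt,
    totalLatencyMs: Date.now() - start,
    output,
  };
}
```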

🚀 Deploy

One click. Pick your target agents from a dropdown, hit deploy, and uberSKILLS writes the files to the correct directory for each agent. Status updates to "deployed" so you can see at a glance what's live and what's still in draft.

Beyond those three steps: there's a skills library with search, status filtering, and sorting. Version history tracks every edit so you can roll back any revision. Import and export lets you pull skills from zip files or directories, and share them with your team. Settings panel covers API key management, theme preferences, and data backup. 📦

The Technical Choices 🛠️

For the curious, here's the stack: Turborepo monorepo with pnpm. Next.js 15 on the App Router with React 19, shadcn/ui, and Tailwind CSS v4. SQLite through Drizzle ORM for the database, so no external database server needed. Everything runs locally. AI integration uses the Vercel AI SDK with the OpenRouter provider for multi-model support.

SQLite was a deliberate choice. uberSKILLS is local-first. Your skills, test history, API keys... all of it stays on your machine. The API key gets encrypted with AES-256-GCM before storage. No cloud dependency, no account to create, no data leaving your laptop. 🔒
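For reference, an AES-256-GCM round trip in Node looks roughly like this. This is a generic sketch using Node's built-in crypto module, not uberSKILLS's actual code; in the real app the key would be derived from the generated encryption secret:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Generic AES-256-GCM encrypt/decrypt pair using Node's crypto module.
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // standard 96-bit GCM nonce
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // integrity tag, checked on decrypt
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decrypt(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The GCM auth tag is the nice part: if the stored payload is tampered with, decryption throws instead of silently returning garbage.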

Getting started is one command:

```shell
npx @uberskillsdev/uberskills
```

It creates a ~/.uberskills/data/ directory, sets up the database, runs migrations, generates an encryption secret, and launches at localhost:3000. No Docker, no cloning, no configuration ceremony. ✨

Why Skills Are the Multiplier Most Developers Ignore 💡

I talk to developers every week who use Claude Code or Copilot and have never written a single skill. They're leaving a ton of productivity on the table.

A well-written skill turns a general-purpose agent into a specialist. Without skills, you repeat the same context in every conversation. With skills, that context loads automatically based on trigger patterns. Your agent already knows your database conventions, your error handling patterns, your test philosophy, your deployment checklist... before you type a word.

The developers getting the most out of code agents are the ones who invest time teaching them. Skills are how you do the teaching. uberSKILLS is how you manage all that teaching without going crazy. 🧠

What's Next 🗺️

uberSKILLS is open source under MIT and free forever. The roadmap includes a community skill marketplace where developers can share and discover skills, collaborative editing for teams, and deeper integrations as new agents keep showing up.

The agent ecosystem moves fast. New agents ship every month, existing ones pick up new capabilities every week. But one thing stays consistent: developers who customize their agents outperform those who don't. A proper workbench for that customization isn't optional anymore. It's infrastructure.

If you're still managing agent skills by hand-editing markdown files and copying them between directories, try uberSKILLS. Your future self, the one who isn't debugging YAML indentation at midnight, will appreciate it. 😄


GitHub: github.com/uberskillsdev/uberskills

npm: npx @uberskillsdev/uberskills

Top comments (1)

Henry Godnick

Skills management is one of those things that gets messy fast once you have more than a handful. Visual approach makes sense. The other piece that helped me was getting visibility into how much each skill costs in terms of tokens when it gets loaded into context. Some of my skills were way too verbose and inflating the context window every session. A real time token tracker made it obvious which ones needed trimming.