I’m experimenting with a semantic search workflow for discovering agent skills from natural-language task descriptions.
Many skill lists are still keyword-based, which makes it hard to compare similar skills before trying them. I indexed ~2.5k skills and use semantic retrieval to surface candidates for a given scenario.
1. Website mode (baseline semantic search)
You can type a scenario like:
“I’d like to conduct a market analysis”
…and get a ranked list of candidate skills.
You can click a skill card to view details and inspect its SKILL.md / manifest.
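
Under the hood, the baseline is plain dense retrieval over the skill index. Here’s a minimal sketch (sentence-transformers is used purely for illustration, and the skill records below are made up):

```python
# Minimal baseline: one vector per skill, brute-force cosine similarity.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model would do

# Hypothetical skill records distilled from each SKILL.md
skills = [
    {"name": "market-research", "summary": "Collect and summarize market data for a product segment."},
    {"name": "notes-organizer", "summary": "Cluster and file scattered meeting notes into a structured archive."},
]

skill_vecs = model.encode([s["summary"] for s in skills], normalize_embeddings=True)

def search(scenario: str, top_k: int = 5):
    q = model.encode([scenario], normalize_embeddings=True)[0]
    scores = skill_vecs @ q                      # cosine similarity (vectors are normalized)
    order = np.argsort(-scores)[:top_k]
    return [(skills[i]["name"], float(scores[i])) for i in order]

print(search("I'd like to conduct a market analysis"))
```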
2. Agent-native mode: let an agent turn vague prompts into structured search queries
This is the part I personally use the most.
Instead of going to a website and trying to craft the “right keywords”, I use an agent-side helper (a small “discover” prompt) to convert a vague request into a search goal + keywords, then query the index. This fits CLI-style agent workflows.
After installation, the agent can:
- Ask a couple of simple questions (e.g., install scope/path)
- Then you just describe your scenario in plain English — even if it’s abstract, vague, or messy
- `discover-skills` will translate that into a structured search (task goal + keywords), query the index, and return candidates with short match reasons
Here’s an example with a very “vague” need:
“I have a bunch of meeting notes scattered everywhere and I want to organize them better. Is there a skill for that?”
The agent turns it into a query + keywords, retrieves candidates, and suggests what to install next.
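
The structured query itself is small. Roughly this shape (the field names here are illustrative, not the exact discover-skills output):

```python
# Hypothetical shape of what the "discover" step hands to the index.
from dataclasses import dataclass

@dataclass
class SkillQuery:
    goal: str                # one-sentence restatement of the task
    keywords: list[str]      # expansion terms the agent extracted

def to_search_text(q: SkillQuery) -> str:
    # Concatenate goal and keywords into a single string for the embedding model;
    # an alternative is embedding goal and keywords separately and merging scores.
    return q.goal + " | " + ", ".join(q.keywords)

q = SkillQuery(
    goal="Organize scattered meeting notes into a consistent structure",
    keywords=["meeting notes", "note taking", "knowledge base", "summarization", "filing"],
)
print(to_search_text(q))  # feed this to the same search() used for website mode
```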
Question (embeddings for skill retrieval)
I’d love advice on how you’d embed and index a SKILL.md-style skill definition for semantic retrieval.
Right now I’m thinking about embedding each skill from multiple “views” (e.g., what it does, use cases, inputs/outputs, examples, constraints), but I’m not fully sure what structure works best.
- How would you chunk/structure SKILL.md (by section, by template fields, or by examples)?
- Single vector per skill vs multi-vector per section/view — and how do you aggregate scores at query time?
- Which fields usually move retrieval quality most (examples, tool/actions, constraints, tags, or “when not to use”)?
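
For reference, here’s the kind of multi-view setup I’m considering: one vector per (skill, view) pair, with per-skill max aggregation at query time. The view names and the toy skill below are illustrative:

```python
# Multi-view sketch: embed several "views" per skill, aggregate with max per skill.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical views distilled from one SKILL.md
skill_views = {
    "notes-organizer": {
        "what_it_does": "Clusters and files scattered meeting notes into a structured archive.",
        "use_cases": "Tidying up meeting notes, building a personal knowledge base.",
        "when_not_to_use": "Real-time transcription or calendar scheduling.",
    },
    # ... more skills
}

# One vector per (skill, view); remember which skill each row belongs to
rows, owners = [], []
for name, views in skill_views.items():
    for view_text in views.values():
        rows.append(view_text)
        owners.append(name)
view_vecs = model.encode(rows, normalize_embeddings=True)

def search(scenario: str, top_k: int = 5):
    q = model.encode([scenario], normalize_embeddings=True)[0]
    sims = view_vecs @ q
    best = {}                      # max over a skill's views = that skill's score
    for owner, s in zip(owners, sims):
        best[owner] = max(best.get(owner, -1.0), float(s))
    return sorted(best.items(), key=lambda kv: -kv[1])[:top_k]

print(search("I have meeting notes everywhere and want to organize them"))
```

Max keeps a skill competitive if any single view matches well; mean or weighted sums are the obvious alternatives, which is part of what I’m asking about.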