Eli

Posted on Jun 29

Stop writing giant prompts. Package agent behavior as Skills.

#agents #llm #ai #architecture

Most agent prototypes start the same way.

You write a long system prompt. You add a few tools. You paste in some examples. Then you hope the model remembers the workflow next time.

That works for a demo.

It gets messy when the agent has to do the same job every week, for different users, with slightly different context each time.

The problem is not that the prompt is too short. The problem is that the prompt is doing too many jobs.

It contains product rules, domain knowledge, examples, user preferences, safety notes, tool instructions, output formats, and random fixes from the last time the agent failed.

After a while, nobody wants to touch it.

A prompt is not a product boundary

A prompt is a good place to describe one interaction.

It is a bad place to hide an entire operating manual.

Say you are building an agent for job search coaching. The agent needs to know how to review a resume, track applications, prepare for interviews, rewrite cover letters, and run a weekly review.

You can put all of that in one large prompt.

But then what happens when you want to reuse just the resume review part? What happens when you want the same workflow in another agent? What happens when a user changes their target role? What happens when you need to update your interview prep process?

You either copy-paste chunks of prompt around, or you keep making the master prompt larger.

Neither scales.

The useful unit is the Skill

I prefer to package agent behavior as a Skill.

A Skill is a small, reusable set of instructions for a specific job.

Not a vague personality. Not a full agent. Not user data.

A Skill can contain:

- what the agent helps with
- what it should ask before answering
- workflow steps
- examples
- templates
- memory rules
- tool usage rules
- safety boundaries

The agent can load the Skill when it needs to do that job.

That separation matters.

The Skill says how the job should be done.

The memory says what is true for this user.

The runtime decides when and where the agent works.

Those should not all live in one giant prompt.
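A minimal Python sketch of that split. The names `Skill`, `UserMemory`, and `build_context` are hypothetical and do not come from any real library or product. The point is only that the method and the user facts live in separate objects and are joined at runtime, once per interaction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """Reusable method for one job. Holds no user data."""
    name: str
    ask_first: tuple[str, ...]
    workflow: tuple[str, ...]
    boundaries: tuple[str, ...]


@dataclass
class UserMemory:
    """Facts that are true for one user. Holds no method."""
    facts: dict[str, str] = field(default_factory=dict)


def build_context(skill: Skill, memory: UserMemory) -> str:
    """Runtime step: assemble the context for a single interaction."""
    lines = [f"Skill: {skill.name}", "Ask first:"]
    lines += [f"- {item}" for item in skill.ask_first]
    lines.append("Workflow:")
    lines += [f"{i}. {step}" for i, step in enumerate(skill.workflow, start=1)]
    lines.append("Boundaries:")
    lines += [f"- {rule}" for rule in skill.boundaries]
    lines.append("Known about this user:")
    lines += [f"- {key}: {value}" for key, value in sorted(memory.facts.items())]
    return "\n".join(lines)


# Illustrative values only.
resume_review = Skill(
    name="resume review",
    ask_first=("actual responsibility", "measurable result", "tools used"),
    workflow=("Collect missing details", "Rewrite the bullet"),
    boundaries=("Do not invent metrics",),
)
memory = UserMemory(facts={"target role": "product engineering"})
context = build_context(resume_review, memory)
```

Swapping in a different `UserMemory` or a revised `Skill` changes the assembled context without editing the other side.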

Example: an aquarium coach

This is the example that made the idea click for me.

A generic chatbot can answer aquarium questions. Sometimes it does a decent job. But aquarium advice depends heavily on context.

"Why are my shrimp dying?" is not enough information.

The answer depends on:

- tank size
- tank age
- tank type: freshwater, planted, reef, or shrimp
- livestock
- filter and equipment
- ammonia, nitrite, nitrate, pH, temperature
- recent water changes
- dechlorinator
- new livestock or plants
- dosing history
- previous incidents

If the user has to repeat all of that every time, the agent is just a chatbot with extra steps.

For an aquarium coach, the Skill might say:

Before giving strong advice, ask for:
- tank age
- tank size
- livestock
- recent changes
- current parameter readings

When troubleshooting:
1. Check emergency basics first: oxygen, temperature, ammonia, nitrite.
2. Do not suggest changing five variables at once.
3. Prefer conservative next steps.
4. Ask what changed recently.
5. Log what was tried and what happened.

Boundaries:
- Do not recommend random medication.
- Do not act like a substitute for a vet or an experienced local specialist.
- For severe symptoms, suggest testing water and getting local help.

The user's memory stores the actual tank profile.

The Skill stores the method.

That is much easier to maintain than a long prompt that mixes "this is a 20-gallon planted tank" with "never recommend random medication."
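As a rough illustration, the "ask for" list from the Skill can be checked against the stored tank profile, so the agent only asks for what memory does not already hold. This is a sketch under assumptions: the profile is a plain dict, and `missing_fields` and the sample values are made up for the example.

```python
# The "ask for" list from the aquarium Skill above.
ASK_FIRST = [
    "tank age",
    "tank size",
    "livestock",
    "recent changes",
    "current parameter readings",
]


def missing_fields(profile: dict[str, str], required: list[str]) -> list[str]:
    """Return the required fields that memory cannot answer yet."""
    return [name for name in required if not profile.get(name, "").strip()]


# Hypothetical stored profile: two fields known, one blank, two absent.
profile = {
    "tank size": "20 gallon",
    "livestock": "shrimp",
    "tank age": "",
}

to_ask = missing_fields(profile, ASK_FIRST)
print(to_ask)
# ['tank age', 'recent changes', 'current parameter readings']
```

The Skill owns `ASK_FIRST`; the memory owns `profile`. Neither needs to know how the other is stored.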

Example: a job search coach

Same pattern.

The user memory might store:

- target roles
- location constraints
- resume versions
- companies applied to
- interview feedback
- salary range
- work authorization constraints

The Skill might store:

- resume review checklist
- cover letter template
- weekly application review workflow
- interview prep flow
- rejection follow-up template
- rules for not inventing experience

Again, user facts and reusable behavior are separate.

If the user changes from frontend roles to product engineering roles, the memory changes. The Skill does not need to be rewritten from scratch.

If the resume review workflow improves, the Skill changes. The user's history does not need to be touched.

That is the point.

What should not go in a Skill

A Skill should not become a junk drawer.

I try to keep these out:

- private user data
- API keys
- temporary task state
- one-off instructions
- full chat history
- vague motivational language

"Be helpful and encouraging" is not a Skill.

"Before rewriting a resume bullet, ask for the user's actual responsibility, measurable result, and tools used. Do not invent metrics." That is closer.

Good Skills are boring. They are specific enough to be tested.
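One way to read "specific enough to be tested": a rule like "do not invent metrics" can be checked mechanically, at least roughly. The sketch below is a simple heuristic, not a complete check. The function name `invented_numbers` and the sample sentences are made up for illustration. It flags any number in a rewritten bullet that never appeared in what the user actually said.

```python
import re

# Matches integers, decimals, and percentages such as "4", "2.5", "70%".
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def invented_numbers(rewritten: str, user_facts: str) -> set[str]:
    """Return numbers in the rewritten bullet that the user never supplied."""
    return set(NUMBER.findall(rewritten)) - set(NUMBER.findall(user_facts))


user_facts = "I cut page load time from 4 seconds to 2 seconds using React."

grounded = "Cut page load time from 4 to 2 seconds with React."
embellished = "Cut page load time by 70%, saving 3 hours per week."

print(invented_numbers(grounded, user_facts))     # set()
print(sorted(invented_numbers(embellished, user_facts)))  # ['3', '70%']
```

"Be helpful and encouraging" has no equivalent check, which is a decent signal that it is not a Skill rule.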

Memory should be shaped by the job

A lot of agent memory systems start with a broad question: "What should we remember about the user?"

That usually leads to generic memory.

User likes concise answers. User lives in Berlin. User prefers Python.

Useful, sometimes. But not enough.

For vertical agents, memory should come from the job.

An aquarium coach should remember tank parameters and maintenance history.

A job search coach should remember applications and interview feedback.

An IELTS coach should remember recurring grammar errors, target band score, practice essays, and weak speaking topics.

A fitness coach should remember injuries, equipment, schedule, and training history.

The memory schema should not be universal. It should be designed around the Skill.
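A small sketch of what "designed around the Skill" could look like in practice, assuming each Skill declares which memory keys it cares about. `MEMORY_SCHEMAS` and `remember` are hypothetical names, and the stored values are sample data. The field names come from the examples above.

```python
# Hypothetical per-Skill memory schemas, keyed by Skill name.
MEMORY_SCHEMAS: dict[str, set[str]] = {
    "aquarium coach": {"tank parameters", "maintenance history"},
    "job search coach": {"applications", "interview feedback"},
    "fitness coach": {"injuries", "equipment", "schedule", "training history"},
}


def remember(skill_name: str, memory: dict[str, list[str]], key: str, value: str) -> bool:
    """Store a fact only if the active Skill's schema declares that key."""
    if key not in MEMORY_SCHEMAS.get(skill_name, set()):
        return False
    memory.setdefault(key, []).append(value)
    return True


memory: dict[str, list[str]] = {}
stored = remember("aquarium coach", memory, "tank parameters", "nitrate 20 ppm")
ignored = remember("aquarium coach", memory, "favorite language", "Python")
```

Here `stored` is `True` and `ignored` is `False`: the aquarium coach keeps the nitrate reading and drops the generic preference, because its schema never asked for it.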

Tool use is not the hard part

People often start agent design by asking what tools the agent should have.

That is backwards.

First ask:

What job is this agent doing?
What does it need to know?
What should it remember?
What workflow should it follow?
Where can it cause harm?
When should it ask instead of act?

Then add tools.

A weak agent with many tools is still weak. It just fails in more interesting ways.

For many useful agents, the first version does not need much autonomy. It needs context, a workflow, and a clear boundary.

A simple Skill format

This is the structure I keep coming back to:

# Skill: [job name]

## Use when

Describe when the agent should load this Skill.

## Job

Describe the job in plain language.

## Ask first

List information the agent should collect before giving advice or acting.

## Remember

List user-specific facts that should be stored over time.

## Workflow

Write the repeatable process.

## Templates

Add reusable formats, checklists, or response structures.

## Examples

Include realistic examples.

## Boundaries

Say what the agent should not do.

## Verification

Say how the agent should check its work.

It does not need to be complicated. In fact, complicated Skills are often worse.

If a Skill cannot be understood by a person, it probably will not be reliable for an agent either.
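Because the format is just headings and text, a draft Skill is easy to lint. The sketch below assumes the Skill is stored as Markdown-style text with `## ` section headings, as in the structure above. `parse_skill` and `missing_sections` are illustrative helpers, not part of any existing tool.

```python
REQUIRED_SECTIONS = [
    "Use when", "Job", "Ask first", "Remember", "Workflow",
    "Templates", "Examples", "Boundaries", "Verification",
]


def parse_skill(text: str) -> dict[str, str]:
    """Split a Skill file into {section heading: body text}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_sections(text: str) -> list[str]:
    """Return required sections that are absent or have an empty body."""
    parsed = parse_skill(text)
    return [name for name in REQUIRED_SECTIONS if not parsed.get(name)]


# A deliberately incomplete draft: "Boundaries" is present but empty.
draft = """# Skill: resume review

## Use when
The user asks for feedback on a resume.

## Job
Review a resume and suggest concrete edits.

## Boundaries
"""

gaps = missing_sections(draft)
print(gaps)
# ['Ask first', 'Remember', 'Workflow', 'Templates', 'Examples', 'Boundaries', 'Verification']
```

A check like this will not tell you whether a Skill is good, only whether it is complete enough to review.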

Why this matters

The first wave of agent demos made everything look like a tool problem.

Give the model a browser. Give it a shell. Give it APIs. Let it run.

That is useful for some tasks.

But a lot of real agents are useful not because they are powerful, but because they are prepared.

They know the domain. They remember the right context. They follow a sane workflow. They know when not to guess.

That is the part I want to make easier to build.

I am working on this pattern in ClawMama: hosted agents that combine Skills, memory, and chat channels. The current experiment is simple: take narrow agents, like an aquarium coach or job search coach, and make the reusable Skill explicit instead of burying everything inside a prompt.

If you are building agents, I would start with two questions:

What should this agent remember?
What belongs in the Skill instead of the prompt?

Those two questions catch a surprising amount of bad design early.

DEV Community