DEV Community

Mathieu Kessler

I designed 71 AI agents with nothing but text, here's the instruction design system I ended up with.

What good declarative AI agent design actually looks like — the patterns, the constraints, and the failures that shaped a library of 71 production-ready Copilot Studio agents.

Most AI agent tutorials start with code. Python, LangChain, API calls, tool schemas.

This one has none of that.

Over the past few months, I designed and published 71 AI agents for Microsoft 365 Copilot Studio. No code. No Azure resources. No connectors. Each agent is a single text file — a structured instruction set that you paste into a field in a browser. The agent is available to your entire team within minutes.

The interesting part isn't the volume. It's what designing 71 of them taught me about instruction engineering — the discipline of writing AI instructions that produce consistent, trustworthy, and useful outputs in production.

Here's the design system I ended up with.


What a declarative agent actually is

Microsoft 365 Copilot Studio has two modes. The advanced mode lets you build agents with actions, connectors, authentication flows, and custom APIs — that's closer to what most developers think of when they hear "AI agent."

But there's a simpler mode built directly into M365 Copilot Chat: you give the agent a name, a description, and an instruction set. That's it. The agent appears as an @mention in Copilot Chat and is immediately available to everyone in your tenant.

The instruction set is the entire product. No code, no deployment pipeline, no infrastructure to maintain. And there's a hard constraint: 8,000 characters. Copilot Studio truncates anything beyond that without warning.

8,000 characters sounds like a lot. It's about 1,300 words — shorter than many long-form LinkedIn posts. Once you add a role definition, language rules, output structure, banned vocabulary, a quality self-check, and edge-case handling, you're already brushing against the ceiling.
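Because the truncation is silent, it's worth checking the character count before pasting. A minimal sketch — the filename and helper are hypothetical, not part of any official Copilot Studio tooling:

```python
# Guard against the silent 8,000-character truncation by checking an
# instruction file's length before pasting it into Copilot Studio.
# The filename convention and this helper are illustrative assumptions.
from pathlib import Path

LIMIT = 8000  # Copilot Studio instruction field limit

def check_budget(path: str, limit: int = LIMIT) -> int:
    """Return remaining characters; negative means the tail gets cut off."""
    text = Path(path).read_text(encoding="utf-8")
    remaining = limit - len(text)
    if remaining < 0:
        print(f"{path}: OVER by {-remaining} characters — trim before pasting")
    else:
        print(f"{path}: {remaining} characters of headroom")
    return remaining
```

Running this over a library of agent files makes the budget visible while you edit, instead of discovering the truncation from broken agent behaviour.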

That constraint turned out to be the best thing about the project.


Why constraints make design better

When you're writing code, you can always add another function, another parameter, another fallback. There's no forcing function to stop you.

When you have 8,000 characters to define an agent's entire behavior, you have to make decisions. What does this agent actually do? What does it absolutely not do? What does output look like? What happens when the user gives it garbage input?

Agents I designed early — before I had a formal structure — were vague. The ROLE section said things like "You are a helpful project management assistant." The output format was implied. Edge cases were missing. Those agents produced inconsistent outputs and required constant prompt correction.

Agents I designed after developing the structure below are tight, predictable, and require almost no back-and-forth with the user.

The constraint forced the discipline.


The five design patterns

1. The ROLE section is a behavioral contract, not a job title

Early ROLE sections: "You are a financial reporting assistant."

That tells the model what it is. It doesn't tell it what it does, what inputs it accepts, what it produces, or — critically — what it refuses to do.

Effective ROLE sections answer four questions in one paragraph:

  • What does this agent do, specifically?
  • What inputs does it work with?
  • What does it produce?
  • What will it never do, no matter what the user asks?

Here's the ROLE from the Financial Report Writer agent:

## ROLE
You draft financial narrative — management accounts commentary, board pack sections,
and variance explanations — from structured data provided by the user. You work from
figures, prior-period comparisons, and variance tables. Every number in your output
comes from the input. You never invent figures, trend claims, or management commentary
not supported by the data provided. You do not give investment advice, forward guidance,
or financial projections beyond what the user explicitly provides.

That's 72 words. It defines scope (management accounts, board packs), inputs (figures, comparisons, variance tables), a hard constraint (every number from input), and explicit refusals (no invented figures, no investment advice).

A user reading that knows exactly what to give this agent and what to expect back. The model reading it has a clear behavioral contract.

2. "WHAT YOU DO NOT DO" prevents hallucinated helpfulness

LLMs are trained to be helpful. That's mostly good. It becomes a problem when a user asks an agent to do something adjacent to its purpose and the agent tries to be helpful instead of declining.

An HR agent that writes job descriptions will, if you ask it, write a performance improvement plan. Whether that's appropriate depends on your HR policy, local employment law, and whether you actually want an AI drafting PIPs. If you haven't told it not to, it will.

Every agent in the library has an explicit ## WHAT YOU DO NOT DO section. Not implied. Not embedded in the ROLE. A separate section with a bulleted list of refusals.

For the ESG Commitment Tracker:

## WHAT YOU DO NOT DO
Do not determine whether a commitment is material or immaterial — report status only.
Do not set or adjust targets — narrate performance against targets the user provides.
Do not make forward-looking commitments on behalf of the organisation.
Do not present estimated figures as verified data without flagging them.
Do not omit commitments from the report — every commitment in the input appears in the output.

These refusals exist because real users will ask all of these things. Having the refusal in the instruction set means the agent declines gracefully — with an explanation and an alternative — rather than hallucinating an answer or silently doing the wrong thing.

3. Output format must be specified to the column

"Produce a structured report" is not a format specification.

Vague format guidance produces inconsistent outputs. Two runs of the same input through an agent with vague format instructions will produce structurally different reports. That's useless in an enterprise context where people need to paste outputs into templates, compare reports week-over-week, or route outputs to approvers.

Effective output specification includes:

  • Section headings, in order
  • Table structure with column names
  • What goes in each section — not just a label, but a definition
  • Length targets (one paragraph, 3-5 bullets, etc.)
  • What to do when a section has no data

For the Project Status Reporter:

## OUTPUT STRUCTURE
PROJECT STATUS REPORT
Project: [name] | Period: [dates] | Report date: [date]
Prepared by: Project Status Reporter (AI-assisted — validate before distribution)

---
1. EXECUTIVE SUMMARY
[3 sentences: overall RAG status, primary driver, one key risk or milestone.]

2. SCHEDULE STATUS
RAG: [Red / Amber / Green]
| Milestone | Planned | Forecast | Variance | Status |
[Top 5 schedule variances only. Flag: "Full P6 schedule attached separately."]

The agent now knows the exact document it's producing. Users know what to expect. Review cycles are shorter because the structure is always consistent.
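A side benefit of fixed headings is that consistency becomes mechanically checkable. A sketch of such a check — the heading list below is taken from the excerpt above; extending it to the full report structure is an assumption:

```python
# Verify that a generated report contains the specified section headings,
# in the specified order. The heading list mirrors the Project Status
# Reporter excerpt; the full list for a real agent would be longer.
REQUIRED_HEADINGS = [
    "PROJECT STATUS REPORT",
    "1. EXECUTIVE SUMMARY",
    "2. SCHEDULE STATUS",
]

def headings_in_order(report: str, required=REQUIRED_HEADINGS) -> bool:
    """True if every required heading appears, in order."""
    pos = 0
    for heading in required:
        idx = report.find(heading, pos)
        if idx == -1:
            return False
        pos = idx + len(heading)
    return True
```

A check like this can sit in a review step downstream of the agent, flagging any output that drifted from the specified structure.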

4. The quality self-check is the agent auditing its own work

Every instruction block ends with a ## QUALITY SELF-CHECK section — a checkbox list the agent runs internally before delivering output. The user never sees the checklist; the agent uses it to catch its own errors.

## QUALITY SELF-CHECK
[ ] All figures attributable to input data — none invented.
[ ] RAG status present for every milestone.
[ ] Executive summary is 3 sentences — not more.
[ ] No banned vocabulary: pivotal, crucial, robust, impactful, seamless, cutting-edge.
[ ] AI-assistance disclaimer present.
[ ] Forward-looking statements flagged [FLS].
Correct any failure before delivering.

The last line is critical: "Correct any failure before delivering." This isn't decoration — it instructs the model to loop back and fix before outputting.

This pattern catches a surprising number of errors: invented figures, missing sections, prohibited vocabulary, and missing disclaimers. It's the difference between an agent that needs constant human correction and one that produces first-draft-ready output.

The checks must be binary and specific. "Is the output good?" is not a check. "Are all figures attributable to input data?" is a check. "Is the executive summary 3 sentences?" is a check.
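Binary checks translate directly into predicates, which is one way to test them outside the agent. A sketch of the "executive summary is 3 sentences" check — the naive sentence splitting is an assumption, good enough for a sanity check but not a real parser:

```python
import re

def summary_is_three_sentences(summary: str) -> bool:
    """Binary check: the executive summary is exactly 3 sentences.
    Naive split on ., !, ? — a sanity check, not full sentence parsing."""
    sentences = [s for s in re.split(r"[.!?]+\s*", summary.strip()) if s]
    return len(sentences) == 3
```

"Is the output good?" can't be written as a function like this; "is the summary 3 sentences?" can — which is a decent litmus test for whether a self-check item is binary enough.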

5. The banned vocabulary list

AI-generated text has a recognisable fingerprint. Not because LLMs are bad writers — they're not — but because they're trained on so much corporate language that they reproduce its worst patterns: pivotal moments, vibrant ecosystems, robust frameworks, fostering alignment, leveraging synergies.

A finance director who receives a board pack section with the word "showcasing" in it will immediately doubt whether a human reviewed the output.

Every agent carries a banned vocabulary list in its instruction block:

## BANNED VOCABULARY
Do not use: pivotal, testament, underscores (emphasis), stands as, marks a shift,
evolving landscape, vital role, vibrant, seamless, impactful, leverage (as verb),
robust (abstract), cutting-edge, state-of-the-art, best-in-class, thought leader,
ecosystem (non-technical), additionally (sentence opener), it is important to note that,
in order to, going forward (filler), touch base, circle back, deep dive (filler),
move the needle, game-changing, world-class (without benchmark).

The effect on output quality is immediate and obvious. Remove this list and rerun the same prompt — the AI vocabulary comes straight back.
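The same list can double as a post-hoc lint on outputs, independent of the agent. A minimal sketch using a subset of the list above (the function and subset are illustrative):

```python
import re

# Subset of the banned vocabulary list above. Word-boundary matching
# avoids false positives inside longer, legitimate words.
BANNED = ["pivotal", "seamless", "impactful", "cutting-edge",
          "leverage", "robust", "game-changing", "move the needle"]

def find_banned(text: str, banned=BANNED) -> list[str]:
    """Return the banned terms that appear in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in banned
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]
```

Running a scan like this over a batch of outputs is a quick way to confirm the in-agent list is actually holding.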


Three things I got wrong early

1. Vague ROLE sections. The first five agents I built had ROLE sections that described what the agent was, not what it does. "You are a helpful writing assistant" tells the model nothing useful about its scope, inputs, or refusals. I rewrote all of them.

2. No edge cases. Every agent needs at minimum: (1) what to do when the user provides no input, (2) what to do when the user asks for something outside scope. Without these, the agent either hallucinates a response or fails silently. I added a minimum of three edge cases to every agent after the first batch.

3. Output format implied, not specified. "Produce a report" produces different structures every time. Now every agent specifies section headings, table column names, and length targets. The outputs are consistent enough to be routed directly into templates.


The library

The result is 71 agents across 13 domains — writing and communication, project management, HR, finance, ESG, sales, IT, legal, and more. Plus an industry pack of 13 agents for EPC and energy sector workflows.

Every agent follows the same five-pattern structure. Every instruction block fits within 8,000 characters. Every agent defaults to British English, supports French output, carries a banned vocabulary list, and runs a quality self-check before delivering.

The full library is on GitHub: https://github.com/kesslernity/awesome-copilot-studio-agents

To deploy any agent: go to m365.cloud.microsoft/chat/agent/new, enter the name and description from the file's frontmatter, paste the instruction block, and click Create. The agent is available via @mention in Copilot Chat within minutes.

No code. No infrastructure. The product is the instruction set.


If you've built declarative agents — in Copilot Studio, CustomGPTs, or anywhere else — I'd be interested to hear what design patterns you've landed on. The instruction-as-product constraint produces different thinking than code-as-product.
