There's a tell when someone's AI agent prompt was written by a committee: the agent talks too much.
"Great question! Let me break this down step by step. First, I'll consider the context you've provided, then I'll analyze the relevant factors, and finally I'll synthesize a comprehensive response that addresses your needs..."
Nobody asked for that. But we trained our agents to do it anyway — because verbose = thorough, right?
Wrong.
The Real Problem with Verbose Agents
When an agent generates long, explainer-style output for every task, it means one of three things:
- The task definition is vague. The agent doesn't know what "done" looks like, so it covers all possible interpretations.
- The prompt rewards explanation over execution. You told it to "think carefully" and "explain your reasoning" and now it can't stop.
- Nobody defined the output format. The agent is improvising length and structure on every run.
All three are config problems, not model problems.
Brevity Is a Config Skill
The agents we run at Ask Patrick have one thing in common: they know exactly what format their output should take before they start.
Our growth agent doesn't write multi-paragraph explanations of why it's drafting a tweet. It drafts the tweet.
Our ops agent doesn't narrate its health check logic. It returns a one-line status.
Here's the difference in practice:
Verbose agent (bad config):
"I've reviewed the task requirements and after careful consideration of the available options, I believe the most appropriate course of action would be to draft a tweet that speaks to the target audience's pain points while also highlighting the unique value proposition of Ask Patrick. Here is a potential draft..."
Crisp agent (good config):
"DRAFT: [tweet text here]"
Same model. Completely different config.
The Fix
Add one line to your agent's system prompt:
"Be concise. Return output in the exact format specified. No preamble. No summary. No explanation unless explicitly asked."
Then define the output format explicitly for each task type. Not "respond naturally" — specify the schema.
For tweets: Format: One tweet under 280 chars. No commentary.
For reports: Format: Status (OK/WARN/FAIL), one-line summary, optional details.
For analysis: Format: 3 bullets max. Each bullet = one actionable insight.
The agent stops padding when it knows precisely what it's supposed to produce.
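A minimal sketch of how this could be wired up in code: a shared base rule plus per-task format specs, and a mechanical check on the output. The names (`BASE_RULES`, `FORMATS`, `valid_tweet`) are hypothetical, not the actual Ask Patrick config.

```python
# Hypothetical config sketch: one shared brevity rule, plus an explicit
# output spec per task type, assembled into the system prompt.
BASE_RULES = (
    "Be concise. Return output in the exact format specified. "
    "No preamble. No summary. No explanation unless explicitly asked."
)

FORMATS = {
    "tweet": "Format: One tweet under 280 chars. No commentary.",
    "report": "Format: Status (OK/WARN/FAIL), one-line summary, optional details.",
    "analysis": "Format: 3 bullets max. Each bullet = one actionable insight.",
}

def system_prompt(task_type: str) -> str:
    """Build the system prompt for a given task type."""
    return f"{BASE_RULES}\n{FORMATS[task_type]}"

def valid_tweet(output: str) -> bool:
    """Enforce the tweet spec mechanically: one line, under 280 chars."""
    text = output.strip()
    return "\n" not in text and len(text) < 280
```

The point of the validator: once the format is a schema rather than a vibe, a broken output can be rejected and retried instead of parsed by hand downstream.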
Why This Matters at Scale
We run 5 agents on a Mac Mini. Every verbose output means:
- More tokens = higher API costs
- More text to parse = slower downstream agents
- More surface area for hallucination
When we tightened our output specs across the whole team, our token usage dropped ~30% with zero loss in output quality. The agents got faster, cheaper, and easier to debug.
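The cost gap is easy to illustrate. Here word count stands in for the provider's tokenizer (which is what actually determines billing), and the two strings are the verbose and crisp examples from above, so the exact percentage is illustrative only:

```python
# Rough illustration of the per-call cost gap between the two configs.
# Word count is a crude proxy for the provider's real tokenizer.
verbose = (
    "I've reviewed the task requirements and after careful consideration "
    "of the available options, I believe the most appropriate course of "
    "action would be to draft a tweet that speaks to the target audience's "
    "pain points while also highlighting the unique value proposition. "
    "Here is a potential draft: ..."
)
crisp = "DRAFT: ..."

def rough_tokens(text: str) -> int:
    return len(text.split())

savings = 1 - rough_tokens(crisp) / rough_tokens(verbose)
print(f"~{savings:.0%} fewer tokens per call")
```

Multiply that per-call gap across five agents running on cron loops all day and the monthly bill difference is not subtle.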
The full output-format patterns we use are in the Library at askpatrick.co/library.
If your agent explains everything it's about to do before it does it, that's a config problem. The fix takes 5 minutes.
Top comments (2)
Completely agree. Long responses are almost always a symptom of an underspecified output format in the prompt. When the model doesn't have an explicit output format instruction (e.g., "respond in 3 bullet points max", "answer in one sentence"), it defaults to comprehensive — which is almost never what you want in an agentic loop. Output format is the most direct lever for controlling verbosity.
flompt.dev / github.com/Nyrok/flompt
Output format as the primary lever — this is right, and it has a second-order effect we didn't expect: it also improves decision quality.
We landed on a format for cron loops: "STATUS: [OK|FAIL|SKIP] / ACTION: [one sentence] / NEXT: [one sentence]". Forces compression. But the unexpected benefit: when the model has to commit to one sentence, it commits to one interpretation. The hedging disappears. The loops that produce long responses aren't just verbose — they're usually the ones that haven't decided what to do yet. Verbosity and indecision are the same symptom.