DEV Community

gentic news

Originally published at gentic.news

How to Cut Agent Token Waste: CLI Over GraphQL + Server-Pushed Hints

Replace raw GraphQL with typed CLI commands to eliminate JSON assembly errors, then add server-pushed hints via MCP to prevent judgment failures. Otherwise, your agent can burn 1,500+ tokens on a single operation.

The Problem: Your Agent Is Bleeding Tokens on JSON Assembly

You designed the perfect architecture — direct API calls, no MCP overhead, a clean SKILL.md behavior spec. The agent calls your GraphQL endpoint with curl, reads your docs, and executes. Elegant.

Then you watch the token counter. A single upload operation that should cost ~200 tokens burns 1,500+. Why? The agent is guessing JSON field formats wrong, getting GraphQL errors, fetching docs across multiple pages to figure out the correct format, and retrying. Every. Single. Time.

This isn't a documentation problem. It's a structural one: LLMs are unreliable at assembling nested JSON payloads from scratch, and a GraphQL mutation embedded as an escaped string inside a JSON request body is about as nested as it gets. You can fix your docs a hundred times and the agent will find a new field to misformat.

The Fix: Typed CLI Arguments

Instead of making the agent assemble raw JSON in curl commands, wrap your API in a CLI with typed arguments:

```bash
# Before: agent assembles raw JSON in curl
curl -X POST /graphql -d '{"query":"mutation { uploadAsset(input: { shotId: \"...\", type: \"start_frame\", provenance: { method: \"ai_generated\", model: \"gpt-image-2\", prompt: \"...\" } }) { id } }"}'

# After: typed CLI arguments, zero JSON assembly
python3 nl.py upload <shotId> start_frame frame.png --method ai_generated --model "gpt-image-2" --prompt "Winter city street"
```

This removes the JSON-assembly step, and with it the error→docs→retry loop. The agent passes flags, not JSON; the CLI dispatcher, not the model, handles type conversion.
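The post doesn't show `nl.py` itself. As a minimal sketch of the idea, assuming an `uploadAsset` mutation that takes an `UploadAssetInput` variable (that input type name, the extra asset-type choices, and the `manual` method are hypothetical), an argparse front end can turn flags into a GraphQL request body. Passing the input as GraphQL variables and serializing with `json.dumps` means no quoting or escaping ever happens by hand:

```python
import argparse
import json

# Hypothetical mutation text; the real schema behind nl.py is not shown in the post.
UPLOAD_MUTATION = """
mutation UploadAsset($input: UploadAssetInput!) {
  uploadAsset(input: $input) { id }
}
"""

ASSET_TYPES = ["start_frame", "end_frame", "reference"]  # only start_frame comes from the post


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nl.py")
    sub = parser.add_subparsers(dest="command", required=True)

    upload = sub.add_parser("upload", help="Upload an asset to a shot")
    upload.add_argument("shot_id")
    upload.add_argument("asset_type", choices=ASSET_TYPES)
    upload.add_argument("file")
    # "manual" is an assumed second provenance method.
    upload.add_argument("--method", choices=["ai_generated", "manual"], default="manual")
    upload.add_argument("--model")
    upload.add_argument("--prompt")
    return parser


def build_upload_payload(args: argparse.Namespace) -> dict:
    """Turn parsed flags into a GraphQL request body; the agent never writes JSON."""
    provenance = {"method": args.method}
    if args.model:
        provenance["model"] = args.model
    if args.prompt:
        provenance["prompt"] = args.prompt
    return {
        "query": UPLOAD_MUTATION,
        "variables": {
            "input": {
                "shotId": args.shot_id,
                "type": args.asset_type,
                "provenance": provenance,
            }
        },
    }


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    if args.command == "upload":
        # File transfer and the HTTP POST are omitted; print the body that would be sent.
        print(json.dumps(build_upload_payload(args), indent=2))
```

A bad asset type or method is rejected by argparse with a usage message and exit code 2 before any request is built, which is far cheaper for the agent to read than a GraphQL error. In a real script, call `main()` from an `if __name__ == "__main__":` guard and replace the `print` with the actual upload and POST.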

Bonus: One CLI, Two Audiences

Add a --json flag so the same CLI serves both the agent (structured data) and you (human-readable output):

```bash
# For the agent: structured JSON for parsing
python3 nl.py overview <noteId> --json

# For you watching: readable progress
python3 nl.py overview <noteId>
# Episode 01: The Algorithm Hunter
#   [===done===|--review--|......not_started.......] 3/12
#   Shot   Status       Rolls    Best   PF
#   01A    done         3        48     Y
#   01B    review       2        41     Y
```
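A sketch of how one `overview` command might serve both audiences. The `Shot` fields mirror the columns in the output above; the data source, field names, and the meaning of the `PF` column are assumptions, and the progress bar is left out:

```python
import argparse
import json
from dataclasses import asdict, dataclass


@dataclass
class Shot:
    shot_id: str
    status: str   # e.g. "done", "review", "not_started"
    rolls: int
    best: int
    pf: bool      # the "PF" column; its meaning is assumed, not specified in the post


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nl.py")
    sub = parser.add_subparsers(dest="command", required=True)
    overview = sub.add_parser("overview", help="Show per-shot progress for a note")
    overview.add_argument("note_id")
    overview.add_argument("--json", action="store_true", help="Emit JSON for agents")
    return parser


def render_overview(title: str, shots: list[Shot], as_json: bool) -> str:
    if as_json:
        # Machine-readable: stable keys, no decoration, easy for the agent to parse.
        return json.dumps({"title": title, "shots": [asdict(s) for s in shots]})
    # Human-readable: a summary line plus a fixed-width table.
    done = sum(1 for s in shots if s.status == "done")
    lines = [title, f"  {done}/{len(shots)} done", "  Shot   Status       Rolls    Best   PF"]
    for s in shots:
        pf = "Y" if s.pf else "N"
        lines.append(f"  {s.shot_id:<6} {s.status:<12} {s.rolls:<8} {s.best:<6} {pf}")
    return "\n".join(lines)
```

Keeping the JSON branch free of formatting means a cosmetic change to the human table can't break the agent's parsing.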

The Next Level: Server-Pushed Hints

The CLI fixed execution errors. But your agent still makes bad decisions: re-rolling without changing the prompt, forgetting to use uploaded assets, skipping status updates. These are judgment failures, not execution failures.


The solution: let your server push hints to the agent proactively. When the server detects an impending mistake (e.g., a prompt written without referencing available assets), it injects a hint:

```javascript
// Queue a high-priority hint listing the reference assets the agent can cite
ctx.pendingHints.push({
  type: "available_refs",
  priority: "high",
  message: `Available refs for prompting: ${refs.map(r => `@${r.filename} (${r.assetType})`).join(", ")}`,
  metadata: { targetId: shot.id, refs },
});
```

This catches failures before they happen. The agent doesn't have to remember everything — the server nudges it at the critical moment.
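The server snippet is JavaScript. Here is a Python sketch of the same pattern that also covers the unchanged re-roll failure. The `Hint` and `RequestContext` names, the rule that a prompt "uses" a ref when it contains `@filename`, and delivering hints in a `hints` field of the next response are all assumptions for illustration:

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Hint:
    type: str
    priority: str
    message: str
    metadata: dict = field(default_factory=dict)


@dataclass
class RequestContext:
    # Hints queued while handling one agent request (hypothetical structure).
    pending_hints: list[Hint] = field(default_factory=list)


def check_available_refs(ctx: RequestContext, shot_id: str, prompt: str, refs: list[dict]) -> None:
    """Queue a hint when a prompt cites none of the shot's uploaded reference assets."""
    if refs and not any(f"@{r['filename']}" in prompt for r in refs):
        listing = ", ".join(f"@{r['filename']} ({r['assetType']})" for r in refs)
        ctx.pending_hints.append(Hint(
            type="available_refs",
            priority="high",
            message=f"Available refs for prompting: {listing}",
            metadata={"targetId": shot_id, "refs": refs},
        ))


def check_unchanged_reroll(ctx: RequestContext, shot_id: str, prompt: str,
                           previous_prompt: Optional[str]) -> None:
    """Queue a hint when a shot is re-rolled with the same prompt as last time."""
    if previous_prompt is not None and prompt.strip() == previous_prompt.strip():
        ctx.pending_hints.append(Hint(
            type="unchanged_reroll",
            priority="medium",
            message="Re-rolling with an unchanged prompt; revise the prompt first.",
            metadata={"targetId": shot_id},
        ))


def attach_hints(ctx: RequestContext, result: dict) -> dict:
    """Add queued hints to the response the agent reads next, then clear the queue."""
    if ctx.pending_hints:
        result["hints"] = [asdict(h) for h in ctx.pending_hints]
        ctx.pending_hints.clear()
    return result
```

Because `attach_hints` drains the queue, each hint reaches the agent once instead of repeating on every response.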

How to Apply This to Your Claude Code Workflow

  1. Audit your agent's token waste: Watch for error→doc→retry loops. If you see them, the fix isn't better docs — it's eliminating the assembly step.
  2. Build a CLI wrapper: Create a typed CLI for your API. Even a simple Python script with argparse is enough. Route every agent operation through it (the CLI in this example wraps 34 commands).
  3. Add server-pushed hints: After each operation, check for common judgment failures and inject hints before the agent's next action. Ask your agent: "Would a nudge here have prevented this?"
  4. Iterate with reflection: Pause production periodically and ask your agent what gaps in your behavior spec caused inefficient actions. Fix those gaps. Repeat.
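For step 1, a rough detector for error→docs→retry loops. It assumes you can export an agent session as a list of events tagged `call`, `error`, or `doc_fetch`, each with a token count; that event format is hypothetical, so adapt it to whatever your logs actually record:

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # "call", "error", or "doc_fetch" (hypothetical log format)
    tokens: int  # tokens spent on this step


def find_retry_loops(events: list[Event]) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for each error -> doc_fetch (one or more) -> call run."""
    loops = []
    i = 0
    while i < len(events):
        if events[i].kind == "error":
            j = i + 1
            while j < len(events) and events[j].kind == "doc_fetch":
                j += 1
            # Require at least one doc fetch, then a retried call.
            if j > i + 1 and j < len(events) and events[j].kind == "call":
                loops.append((i, j))
                i = j  # resume scanning after the retry call
        i += 1
    return loops


def wasted_tokens(events: list[Event], loops: list[tuple[int, int]]) -> int:
    """Sum tokens from each error through its retry call, inclusive."""
    return sum(e.tokens for start, end in loops for e in events[start:end + 1])
```

If the wasted total is a large share of the session, that points at the assembly step rather than at the docs.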

Why This Works

  • CLI arguments move validation into the argument parser: no JSON assembly, no field guessing, and a bad flag produces an immediate usage error instead of a GraphQL error followed by a trip through the docs.
  • Server-pushed hints are cheaper than error recovery — injecting a hint costs ~50 tokens; recovering from a wrong decision costs 1,500+.
  • Your agent is your best auditor — it knows exactly where your spec failed it. Just ask.
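The hint-versus-recovery comparison can be made concrete. Using the post's figures (~50 tokens per hint, 1,500+ per recovery) and treating both as fixed, a hint pays for itself when the chance that it prevents a bad decision exceeds 50 / 1,500, about 3.3%, or roughly one time in thirty. A small sketch of that arithmetic (the function names are illustrative):

```python
def break_even_probability(hint_cost: float = 50, recovery_cost: float = 1500) -> float:
    """Minimum chance a hint must prevent a bad decision to pay for its own tokens."""
    return hint_cost / recovery_cost


def hint_pays_off(p_prevented: float, hint_cost: float = 50, recovery_cost: float = 1500) -> bool:
    """True when the expected tokens saved exceed the tokens the hint itself costs."""
    return p_prevented * recovery_cost > hint_cost
```

Since 1,500 is a lower bound on recovery cost, the real break-even rate is, if anything, lower.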

This isn't about avoiding MCP. It's about recognizing that the real work starts after the architecture is in place. The agent needs guardrails, not just documentation.


Source: dev.to

