ppcvote

Posted on Mar 22 • Originally published at ultralab.tw

AI Agent Token Optimization in Practice: How We Cut 40% Waste Across 4 Agents

#ai #optimization #opensource #llm

"Free doesn't mean you can afford to waste it."

Ultra Lab runs four AI Agents (UltraLabTW, MindThreadBot, UltraProbeBot, UltraAdvisor) on the free tier of Google Gemini 2.5 Flash, which gives us a daily quota of 1,500 RPD (Requests Per Day). Together they handle our social media promotion fully autonomously.

Sounds great? In practice, our audit showed a large share of tokens going to tasks that produced no output at all.

This article documents our token audit process, the three major sinkholes we found, and the specific optimization techniques we applied.

Our Agent Architecture

Agent	Role	Daily Tasks
UltraLabTW	HQ CEO	4 posts + engagement + strategic reflection
MindThreadBot	Social Media Expert	4 posts + engagement
UltraProbeBot	Security Research	4 posts + vulnerability scanning
UltraAdvisor	Financial Advisor	Engagement + consultations

Tech stack: OpenClaw 2026.3.2 + Gemini 2.5 Flash (free) + systemd timers + Discord/Telegram

Audit Findings: Three Major Token Sinkholes

Sinkhole #1: The Idle Welcome Bot

discord-welcome-check: runs every 2 hours (12x/day)
Per run cost: ~49,000 input tokens
Per run result: "No new members to welcome."
Daily waste: 588,000 input tokens

Our Discord server had very few members in the early days, but the welcome bot spent 49K tokens every 2 hours checking, only to report "no new members." 100% of those tokens were wasted.

Sinkhole #2: Overfed Context

Every Agent loaded the entire workspace on startup. The problem:

UltraProbeBot (security Agent) read 5,574 chars of Threads social data every time — it has zero need to know how many Threads followers we have
UltraAdvisor (financial Agent) read the full four-product update — it only needs its own product info
AGENTS.md was a 7,869-char generic template — 80% of it was Group Chat etiquette and Heartbeat tutorials that our Agents never use

Sinkhole #3: Failed Jobs That Still Burned Tokens

autopost-main:  289,097 input tokens → Status: error
daily-reflect:  473,611 input tokens → Status: error

Both cron jobs failed, but the tokens were already spent: 762,708 input tokens between the two runs. The Agent read its context, reasoned, and even started executing, then failed at the last step. None of that work produced a usable result.
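A minimal sketch of how this kind of waste can be totalled from run records. The record layout (`job`, `input_tokens`, `status`) is a hypothetical structure for illustration, not OpenClaw's actual output format; the two rows use the figures quoted above.

```python
from collections import defaultdict

# Hypothetical run records; field names are assumptions, values are from the audit.
runs = [
    {"job": "autopost-main", "input_tokens": 289_097, "status": "error"},
    {"job": "daily-reflect", "input_tokens": 473_611, "status": "error"},
]


def tokens_by_status(records):
    """Sum input tokens per run status so failed runs stand out."""
    totals = defaultdict(int)
    for record in records:
        totals[record["status"]] += record["input_tokens"]
    return dict(totals)


wasted = tokens_by_status(runs).get("error", 0)
print(f"Input tokens spent on failed runs: {wasted:,}")
```

Grouping by status rather than by job keeps the report short; the same function works unchanged if successful runs are added to the list.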

Optimization Techniques

1. Reduce Frequency — Frequency ≠ Effectiveness

- discord-welcome-check: every 2h (12x/day)
+ discord-welcome-check: every 8h (3x/day)

Three checks per day is plenty. Going from 12 runs to 3 cuts this job's token use by about 75%.
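The arithmetic behind that saving, as a short sketch. It assumes the per-run cost stays at roughly 49,000 input tokens regardless of schedule; the function name is illustrative.

```python
TOKENS_PER_RUN = 49_000  # approximate input tokens per check, from the audit


def daily_tokens(interval_hours, tokens_per_run=TOKENS_PER_RUN):
    """Input tokens per day for a job that runs every `interval_hours` hours."""
    runs_per_day = 24 // interval_hours
    return runs_per_day * tokens_per_run


before = daily_tokens(2)  # every 2h -> 12 runs/day
after = daily_tokens(8)   # every 8h -> 3 runs/day
saving = 1 - after / before

print(f"before: {before:,}  after: {after:,}  saving: {saving:.0%}")
```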

We also enabled --light-context, a built-in OpenClaw option that makes low-priority jobs load fewer workspace files.

2. Precision Feeding — Give Each Agent Only What It Needs

Before: All Agents read the exact same files

main:       62,631 chars (~15,600 tokens)
mindthread: 33,701 chars (~8,400 tokens)
probe:      32,368 chars (~8,100 tokens)
advisor:    27,969 chars (~7,000 tokens)

After: Only relevant data

main:       55,974 chars (~14,000 tokens)  ← CEO needs the full picture
mindthread: 27,302 chars (~6,800 tokens)   -19%
probe:      20,444 chars (~5,100 tokens)   -37%
advisor:    12,707 chars (~3,200 tokens)   -55%
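The percentages above can be reproduced with a few lines of Python. The 4-characters-per-token ratio is a rough rule of thumb implied by the figures in this section, not a real tokenizer, so treat the token counts as estimates.

```python
CHARS_PER_TOKEN = 4  # rough ratio implied by the figures above; not a tokenizer

# (chars before, chars after) per agent, taken from the two listings above.
context_chars = {
    "main": (62_631, 55_974),
    "mindthread": (33_701, 27_302),
    "probe": (32_368, 20_444),
    "advisor": (27_969, 12_707),
}


def reduction(before, after):
    """Fractional reduction in context size."""
    return (before - after) / before


for agent, (before, after) in context_chars.items():
    est_before = before // CHARS_PER_TOKEN
    est_after = after // CHARS_PER_TOKEN
    print(f"{agent:<11}{est_before:>7,} -> {est_after:>7,} tokens  -{reduction(before, after):.0%}")
```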

What we did specifically:

Removed MINDTHREAD-DATA.md from Probe and Advisor (they don't need Threads data)
Trimmed Advisor's PRODUCT-UPDATES.md from the full four-product version to its own product only
Reduced AGENTS.md from a 212-line generic template to a 40-line essentials version (-81%)
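One way to picture precision feeding is a per-agent allowlist of workspace files. The sketch below is illustrative only: `ALLOWLIST` and `build_context` are hypothetical names, not OpenClaw features, and apart from the two removals described above, which agent reads which file is an assumption.

```python
# Hypothetical per-agent allowlist. File names come from the article; the
# assignments beyond "Probe and Advisor skip MINDTHREAD-DATA.md" are assumptions.
ALLOWLIST = {
    "main": {"AGENTS.md", "MINDTHREAD-DATA.md", "PRODUCT-UPDATES.md"},
    "mindthread": {"AGENTS.md", "MINDTHREAD-DATA.md", "PRODUCT-UPDATES.md"},
    "probe": {"AGENTS.md", "PRODUCT-UPDATES.md"},
    "advisor": {"AGENTS.md", "PRODUCT-UPDATES.md"},
}


def build_context(agent: str, workspace: dict[str, str]) -> str:
    """Concatenate only the workspace files this agent is allowed to load."""
    allowed = ALLOWLIST[agent]
    return "\n\n".join(workspace[name] for name in sorted(allowed) if name in workspace)


# Placeholder contents standing in for the real files.
workspace = {
    "AGENTS.md": "agent rules",
    "MINDTHREAD-DATA.md": "threads follower stats",
    "PRODUCT-UPDATES.md": "product notes",
}

for agent in ALLOWLIST:
    print(agent, len(build_context(agent, workspace)), "chars")
```

An allowlist only controls which files load; trimming the contents of a file, as we did with Advisor's PRODUCT-UPDATES.md, is a separate step.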

3. Reinvest Saved Tokens in Learning

The point of saving tokens isn't to save — it's to spend them on higher-value activities.

We used the freed-up quota to add:

Free Exploration Cron (2x/day, 10:00 + 22:00):

Gives the CEO Agent two "free exploration" sessions daily:
- Market research (using market-research skill)
- Competitor monitoring
- Content inspiration gathering (using deep-research skill)
- Self-improvement (reviewing post performance)
Findings are written to MEMORY.md, with the most interesting discoveries shared to Discord.

Proactive Claw (Active Learning Skill):
Enables agents to proactively observe their environment and identify opportunities during interactive sessions, rather than passively waiting for commands.

Final RPD Budget Allocation

Purpose	Before	After	Savings
Existing operations (posting + engagement + reflection)	~204 RPD	~170 RPD	-17%
discord-welcome-check	~24 RPD	~6 RPD	-75%
Free exploration (new)	0	~20 RPD	Investment
Interactive conversation reserve	~20 RPD	~20 RPD	—
Total usage	~248	~216	-13%
Remaining quota	~1,252	~1,284	+2.6%

Operational consumption decreased, but actual Agent capabilities increased (added free exploration + active learning).
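The totals in the table can be checked with a short script. The numbers are the approximate RPD figures from the table; the dictionary layout is just for illustration.

```python
DAILY_QUOTA = 1_500  # free-tier requests per day

# (RPD before, RPD after) per purpose, from the table above.
budget = {
    "existing operations": (204, 170),
    "discord-welcome-check": (24, 6),
    "free exploration": (0, 20),
    "interactive reserve": (20, 20),
}

total_before = sum(before for before, _ in budget.values())
total_after = sum(after for _, after in budget.values())
remaining_before = DAILY_QUOTA - total_before
remaining_after = DAILY_QUOTA - total_after

print(f"usage:     {total_before} -> {total_after} RPD")
print(f"remaining: {remaining_before} -> {remaining_after} RPD")
```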

Core Principles of Token Optimization

After this audit, we distilled several principles:

Measure before you cut: Don't guess where the waste is; let the data speak. The openclaw cron runs command shows input_tokens for every job.
Context is the biggest hidden cost: You think "it's just one extra file," but that file gets charged on every API call. 4 Agents x 20 calls/day x 5K extra tokens = 400K tokens wasted per day.
Idle runs waste more than failures: Failures at least throw errors so you can fix them. Idle runs (returning "nothing to do" every time) silently eat your quota.
Saving tokens isn't the goal — reallocation is: Use the freed quota for higher-value tasks — market research, competitor analysis, self-learning.
Each Agent only needs to know what's relevant to it: The security Agent doesn't need to know your Threads follower count. The financial Agent doesn't need to know your SaaS development priorities. Precision feeding > broadcast to all.
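The hidden-context estimate in the second principle is simple multiplication, sketched here so you can plug in your own fleet size. The function name and parameters are illustrative.

```python
def daily_extra_tokens(agents, calls_per_agent_per_day, extra_tokens_per_call):
    """Tokens spent per day on context that is loaded on every call."""
    return agents * calls_per_agent_per_day * extra_tokens_per_call


# Figures from the principle above: 4 Agents, 20 calls/day each, 5K extra tokens.
extra = daily_extra_tokens(4, 20, 5_000)
print(f"{extra:,} extra input tokens per day")
```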

Want to Optimize Tokens for Your AI Agents Too?

We're productizing this token optimization methodology. If you're also running an AI Agent Fleet (whether on OpenClaw, LangGraph, CrewAI, or a custom framework), reach out for a free token audit.

Contact Ultra Lab →
See our Agent Fleet in action →

Originally published on Ultra Lab — we build AI products that run autonomously.

Try UltraProbe free — our AI security scanner checks your website for vulnerabilities in 30 seconds: ultralab.tw/probe
