I Built a Rate Limiter for My AI Content Agent — Here's What I Learned
Part of a series on running 6 AI agents in production. Part 1 covers the overall architecture.
The first time my content agent published 18 identical posts in a row, I realized I needed rate limiting.
This was not a subtle failure. The agent had a bug that caused it to re-read the same draft queue item and republish it every 30 minutes through the night. By morning: 18 copies of the same article, live on Tumblr, with slightly different post IDs. The cleanup took 45 minutes.
The embarrassing part: this was entirely preventable. Rate limiting had been on my todo list since week one. I just assumed the agent would behave well enough not to need it urgently.
Agents don't behave well enough not to need it. Rate limiting is infrastructure, not a nice-to-have.
The problem
My content agent publishes to 5 platforms: Tumblr, Dev.to, Hashnode, Write.as, and (eventually) LinkedIn. Each platform has:
- External API limits — Tumblr allows ~250 posts/day. Dev.to has unpublished limits. Write.as will 429 you if you hit it too fast.
- Platform engagement norms — publishing 10 posts/day on LinkedIn makes people unfollow you. 2-3 posts/day on Tumblr is optimal. Dev.to rewards 2-3 quality posts/week.
- My own content strategy limits — I decided on specific cadences: 2-3 Tumblr originals/day, 1 Write.as post per 12 hours, max 2 Dev.to posts per day.
The API limits are hard constraints. The engagement norms and strategy limits are soft constraints. Both need to be enforced.
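Both kinds of constraint can live in one config table. A minimal sketch — the dict name and field names here are my illustration, not the actual structure of content_control.py, and the per-platform numbers just mirror the cadences listed above:

```python
# Hypothetical per-platform limits table. "max_per_day" and
# "min_interval_hours" encode the soft strategy limits; "api_max_per_day"
# is the hard external cap (None where the platform doesn't publish one).
PLATFORM_LIMITS = {
    "tumblr":   {"max_per_day": 3, "min_interval_hours": 4,  "api_max_per_day": 250},
    "devto":    {"max_per_day": 2, "min_interval_hours": 6,  "api_max_per_day": None},
    "hashnode": {"max_per_day": 2, "min_interval_hours": 6,  "api_max_per_day": None},
    "writeas":  {"max_per_day": 2, "min_interval_hours": 12, "api_max_per_day": None},
    "linkedin": {"max_per_day": 1, "min_interval_hours": 24, "api_max_per_day": None},
}
```

Keeping both caps in one place means the enforcement code never has to know which limit is "strategy" and which is "API" — it just takes the stricter of the two.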
The architecture
I ended up with three layers in content_control.py:
Layer 1: Rate limit check
The system publishes only if two conditions both hold: the daily post count is under the platform's limit AND the minimum interval since the last post has elapsed.
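The check itself is small. A sketch of the two-condition logic — the function name and the list-of-timestamps history representation are my assumptions, not the post's actual code:

```python
from datetime import datetime, timedelta

def rate_limit_ok(history, now, max_per_day, min_interval_hours):
    """Both conditions must hold:
    1) fewer than max_per_day posts in the last 24 hours, and
    2) at least min_interval_hours since the most recent post.
    `history` is a list of datetimes of past publishes for one platform.
    """
    recent = [t for t in history if now - t < timedelta(hours=24)]
    if len(recent) >= max_per_day:
        return False  # daily cap reached
    if recent and now - max(recent) < timedelta(hours=min_interval_hours):
        return False  # minimum spacing not yet elapsed
    return True
```

Note that the interval check alone would have slowed the 18-post incident but not stopped it — a 30-minute republish loop sails under a 4-hour interval exactly 0 times, but the daily cap catches it by post 3 or 4.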
Layer 2: Semantic deduplication
This is the layer that would have caught the 18-posts-in-a-row bug: each draft is compared against recent posts using Jaccard similarity on trigrams, with a threshold of 0.82.
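For the curious, Jaccard-on-trigrams fits in a dozen lines. A minimal sketch, assuming character trigrams (the post doesn't say whether trigrams are characters or words) and my own function names:

```python
def trigrams(text):
    """Character trigrams of the lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    """Jaccard similarity of two trigram sets: |A ∩ B| / |A ∪ B|."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def is_duplicate(draft, recent_posts, threshold=0.82):
    """Block the draft if it's too similar to anything recently published."""
    return any(jaccard(draft, past) >= threshold for past in recent_posts)
```

The 0.82 threshold is deliberately high: identical reposts score 1.0, so the bug from the intro is a trivial catch, while two genuinely different posts on the same topic land well below it.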
Layer 3: Quality gates
Brand safety and basic quality checks. Each platform has different rules.
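The shape of a gate like this is a list of checks that each return a pass/fail plus a reason. A sketch under heavy assumptions — the banned phrases, per-platform minimum lengths, and function name are all my invention for illustration, not the post's actual rules:

```python
# Hypothetical brand-safety phrases; the real list would be project-specific.
BANNED_PHRASES = {"as an ai language model", "i cannot fulfill"}

# Hypothetical per-platform minimum word counts.
MIN_WORDS = {"devto": 400, "hashnode": 400, "writeas": 100,
             "tumblr": 50, "linkedin": 30}

def quality_gate(platform, title, body):
    """Return (passed, reason). Each platform gets different length rules;
    brand-safety checks apply everywhere."""
    if not title.strip():
        return False, "missing title"
    words = len(body.split())
    if words < MIN_WORDS.get(platform, 100):
        return False, f"too short for {platform}: {words} words"
    lowered = body.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            return False, f"brand safety: contains '{phrase}'"
    return True, "ok"
```

Returning the reason string rather than a bare boolean matters later — it's what makes the "track the reasons for blocks" lesson below cheap to implement.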
The routing logic
Once all three checks pass, the system routes to one of four actions based on a confidence score:
- auto_publish: confidence >= 80
- request_approval: confidence 60-79
- regenerate: quality too low
- block: rate limited or duplicate
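The routing table above can be sketched as a single function. One assumption on my part: the post doesn't say what happens to a passing post with confidence below 60, so I route it to regenerate as the conservative default:

```python
def route(confidence, rate_limited, duplicate, quality_ok):
    """Map the three layers' results plus a confidence score to one action."""
    if rate_limited or duplicate:
        return "block"             # hard stop, never retried automatically
    if not quality_ok:
        return "regenerate"        # quality too low: try again
    if confidence >= 80:
        return "auto_publish"
    if confidence >= 60:
        return "request_approval"  # human-in-the-loop via Telegram
    return "regenerate"            # below 60: assumed too weak to publish
```

The ordering is the important design choice: rate limits and dedup are checked before quality, so a duplicate never gets "regenerated" into a near-duplicate that slips past the similarity threshold.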
What I'd do differently
- Log everything from the start.
- Keep separate logs per platform.
- Make limits configurable instead of hardcoded.
- Track the reason for every block, not just the count.
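That last point is a two-liner if the checks already return reason strings. A minimal sketch (the counter keying is my choice, not the post's code):

```python
from collections import Counter

# Per-(platform, reason) tally of blocked publish attempts.
block_reasons = Counter()

def record_block(platform, reason):
    """Count why each block happened, so 'why did Tumblr go quiet today?'
    is answerable from data instead of guesswork."""
    block_reasons[(platform, reason)] += 1
```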
Does it work?
In the 3 weeks since implementing this:
- 0 duplicate posts published
- 3 rate limit overruns caught and blocked
- ~12 posts sent to the Telegram approval loop, 2 rejected by me
- 1 false positive
The false positive rate is acceptable.
The bigger lesson
Building AI agents that publish content at scale requires treating them like production software, not like prototypes.
The 18-duplicate incident cost me 45 minutes of cleanup. The next time something goes wrong, I want the failure mode to be "published nothing" rather than "published 18 identical posts."
Publishing nothing is recoverable. Publishing 18 identical posts is a different kind of problem.
The full content_control.py code is in my agents repo. More on the agent architecture at timzinin.com.
Top comments (1)
The Jaccard trigram dedup is the real hero here — API rate limits are easy, but catching semantic duplicates before they go live is the part most agent builders skip until they have their own 18-post incident.