You don't always have time to fix everything before launch.
Sometimes you ship, watch nothing break, and then go back to fix what you know is wrong.
This is that story.
The backstory
Before launching OriginBrief, I noticed that Anthropic's Batch API has request limits. As user count grows, those limits become a real constraint.
I flagged it. Then I shipped anyway.
"I'll fix it properly after launch," I told myself.
Today, I finally sat down to do that.
The design review
OriginBrief runs a multi-phase AI pipeline. Phase 1 collects and analyzes sources. Phase 2 generates the final report — key points, market trends, citations.
When I started auditing Phase 2 for request efficiency, something immediately looked off.
For themes with trend indicators, Phase 2 was sending two separate batch requests:
One for the summary
One for trend analysis
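In code terms, the split looked something like this. This is a hypothetical sketch of the pre-fix behavior, not OriginBrief's actual code; the function and field names (build_phase2_requests, has_trend_indicators) are illustrative, and the batch entry shape is simplified.

```python
def build_phase2_requests(themes: list[dict]) -> list[dict]:
    """Queue one batch entry for the summary, plus a second,
    separate entry for trend analysis when the theme qualifies."""
    requests = []
    for theme in themes:
        requests.append({
            "custom_id": f"{theme['id']}-summary",
            "params": {"prompt": f"Summarize sources for: {theme['name']}"},
        })
        if theme.get("has_trend_indicators"):
            # The extra request that doubled the per-theme count.
            requests.append({
                "custom_id": f"{theme['id']}-trend",
                "params": {"prompt": f"Analyze market trends for: {theme['name']}"},
            })
    return requests

themes = [
    {"id": "t1", "name": "Solid-state batteries", "has_trend_indicators": True},
    {"id": "t2", "name": "RISC-V adoption", "has_trend_indicators": False},
]
print(len(build_phase2_requests(themes)))  # → 3 entries for 2 themes
```

Two themes, three requests: every theme with trend indicators costs double.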
My original design intent was simple: Phase 2 = one request per theme. Summary and trend analysis together, in one shot.
So why were they separate?
The root cause
I traced it back through the git history.
Trend analysis was added to the realtime API pipeline at 04:10 JST. Fifty-five minutes later, at 05:05 JST, the pipeline was migrated to the Batch API.
The migration was mechanical. The separation that existed in the realtime pipeline was carried over without question — even though it made no sense for batch processing.
The design intent never made it into the implementation.
The fix
I merged trend analysis into the summary request as an optional field in the JSON schema.
Now the logic is: if trend analysis conditions are met, include the instructions in the same prompt and expect the output in the same response.
One request. One response. Same data.
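A minimal sketch of the merged design, under the same caveat: names and the schema shape are illustrative assumptions, not OriginBrief's actual code. Trend analysis rides along as an optional field in the one JSON schema, and the trend instructions are appended to the one prompt only when the theme qualifies.

```python
BASE_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "key_points"],
}

def build_phase2_request(theme: dict) -> dict:
    """One batch entry per theme, trend analysis included conditionally."""
    # Copy the schema so the base template is never mutated.
    schema = {**BASE_SCHEMA, "properties": dict(BASE_SCHEMA["properties"])}
    prompt = f"Summarize sources for: {theme['name']}"
    if theme.get("has_trend_indicators"):
        # Optional field, same response; no second request.
        schema["properties"]["trend_analysis"] = {"type": "string"}
        prompt += " Also analyze market trends and return them in 'trend_analysis'."
    return {"custom_id": theme["id"], "params": {"prompt": prompt, "schema": schema}}

themes = [
    {"id": "t1", "name": "Solid-state batteries", "has_trend_indicators": True},
    {"id": "t2", "name": "RISC-V adoption", "has_trend_indicators": False},
]
requests = [build_phase2_request(t) for t in themes]
print(len(requests))  # → 2, exactly one per theme
```

Because the trend field is optional rather than required, themes without trend indicators keep the same schema they always had, and the response parser handles both cases with one code path.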
Phase 2 is now exactly what I originally intended — one request per theme, fixed.
The result
Phase 2 batch requests reduced by ~50% for themes with trend analysis enabled
More headroom in the Batch API budget as user count grows
Cleaner chunk calculation for the scaling implementation I'll build next
What comes next
The real scaling work — chunking the pipeline so it can handle thousands of themes — is now simpler to design. Phase 2 being one request per theme means the math is clean.
That's the next piece. I'll write about it when it's done.
The actual lesson
When you migrate code from one architecture to another, it's easy to carry over assumptions that no longer apply.
The separation made sense for realtime. It made no sense for batch.
I only caught it because I forced myself to sit down and ask: "What breaks when we have 10x the users?"
Ship fast. But go back and ask that question.
OriginBrief delivers weekly AI research reports from primary sources. Register a theme, get structured reports — without the manual research.
