DEV Community

B McGhee


My AI pipeline had a 1M token context window. The output still got worse.

Fixing a context window problem in an AIOps investigation pipeline

The pipeline stitches context from three repos, calls Gemini with a chain-of-thought prompt, and posts root cause analysis to Slack and Jira. At some point output quality dropped.

Diagnosis

A token-count diagnostic (estimated from character counts) showed the actual repo sizes:

frontend    ~527k tokens
backend     ~311k tokens
legacy      ~7.9M tokens
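The diagnostic can be sketched as a quick script (the ~4 chars/token ratio is a common heuristic, not an exact count; `repo_char_count` is a hypothetical helper, and exact numbers need the model's tokenizer):

```python
import os

# Rough heuristic: ~4 characters per token for typical source code/English.
CHARS_PER_TOKEN = 4

def estimate_tokens(char_count: int) -> int:
    """Convert a raw character count into an approximate token count."""
    return char_count // CHARS_PER_TOKEN

def repo_char_count(root: str) -> int:
    """Sum file sizes (bytes ~ chars for mostly-ASCII code) under a repo root."""
    total = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip broken symlinks etc.
    return total

# for repo in ("frontend", "backend", "legacy"):
#     print(repo, estimate_tokens(repo_char_count(repo)))
```

Even as an estimate, this is enough to show that legacy alone is several times the 1M-token window.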

The fixed 50/35/15 budget split was loading the same proportion of irrelevant code regardless of ticket type. A scheduling bug got the same legacy allocation as an auth bug.
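In code, the fixed split amounts to something like this (a minimal sketch; the constant names are illustrative):

```python
# Fixed budget split: every ticket gets the same proportions,
# no matter which repo is actually relevant to the bug.
WINDOW_TOKENS = 1_000_000
FIXED_SPLIT = {"frontend": 0.50, "backend": 0.35, "legacy": 0.15}

def allocate(split: dict[str, float], window: int = WINDOW_TOKENS) -> dict[str, int]:
    """Turn fractional shares into per-repo token budgets."""
    return {repo: int(window * share) for repo, share in split.items()}

# A scheduling bug and an auth bug both get the same 150k tokens of legacy code.
```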

Models don't attend uniformly across long contexts. Irrelevant content doesn't just take up space; it degrades output quality. The ceiling wasn't the constraint. Context selection was.

Constraints to consider

Model rate limits and context window. The pipeline already calls the API directly, so context caching is available, but the 1M token ceiling is hard. The fix had to work within it, not around it.

Context quality vs. quantity. A smaller focused window consistently outperforms a larger noisy one for reasoning tasks. This ruled out "just get a bigger window" as a solution.

Latency. A secondary concern alongside quality: the time from bug filed to Slack/Jira report. Runner queue time and repo checkout compound. Addressed separately via infrastructure, not the script.

Multi-language repos. Three different primary languages (TypeScript, Go, legacy Node.js) with different directory conventions. The routing table had to account for each independently.

Fix

Label-based routing. Extended the ticket fetch to include labels and components, then mapped those to repo-specific dirs and budget splits:

scheduling → frontend: 45% / backend: 45% (scheduling handlers only) / legacy: 10%
auth       → frontend: 55% (providers, hooks) / backend: 35% / legacy: 10%
default    → 50/35/15
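The routing table above can be sketched as a plain dict (label names and directory filters here are illustrative, not the exact production mapping):

```python
# Label-based routing: ticket labels select a budget split and, where it
# helps, a directory whitelist per repo. Paths are assumptions for the sketch.
ROUTES = {
    "scheduling": {
        "split": {"frontend": 0.45, "backend": 0.45, "legacy": 0.10},
        "dirs": {"backend": ["handlers/scheduling"]},  # scheduling handlers only
    },
    "auth": {
        "split": {"frontend": 0.55, "backend": 0.35, "legacy": 0.10},
        "dirs": {"frontend": ["src/providers", "src/hooks"]},
    },
}
DEFAULT_ROUTE = {
    "split": {"frontend": 0.50, "backend": 0.35, "legacy": 0.15},
    "dirs": {},  # no filtering; load from repo roots as before
}

def route(labels: list[str]) -> dict:
    """Pick the first matching route for a ticket's labels, else the default."""
    for label in labels:
        if label in ROUTES:
            return ROUTES[label]
    return DEFAULT_ROUTE
```

The key property: the selection is deterministic and happens before any tokens are spent, so an auth ticket never pays for 150k tokens of legacy code.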

Prompt restructure. Stable content (architecture context, codebase) goes at the top, the ticket at the bottom. This keeps attention focused on the ticket and enables implicit caching across back-to-back runs.
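The assembly order is the whole trick, so a sketch is short (function and argument names are hypothetical):

```python
# Stable content first: implicit prefix caching keys on a shared prompt
# prefix, so back-to-back runs reuse the cached architecture + code context.
# The ticket varies per run, so it goes last, closest to generation.
def build_prompt(architecture_notes: str, code_context: str, ticket: str) -> str:
    return "\n\n".join([
        architecture_notes,  # stable across runs
        code_context,        # stable for runs with the same label route
        ticket,              # varies per run; kept at the bottom
    ])
```

If the ticket were at the top, every run would have a unique prefix and no cache hits.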

The point

Use deterministic pre-filtering before the LLM sees any code. The model sees less. The output is better. Reach for context selection before a bigger window.
