Two-Pass LLM Processing: When Single-Pass Classification Isn't Enough
Here's a pattern I keep running into: you have a batch of items (messages, tickets, documents, transactions) and you need to classify each one. The obvious approach is one LLM call per item. It works fine until it doesn't.
The failure mode is subtle. Each item gets classified correctly in isolation. But the relationships between items -- escalation patterns, contradictions, duplicate reports of the same issue -- are invisible to a single-pass classifier because it never sees the full picture.
The problem
Say you're triaging a CEO's morning messages. Three Slack messages from the same person:
- 9:15 AM: "API migration 60% done, no blockers"
- 10:30 AM: "Found an issue with payment endpoints, investigating"
- 11:45 AM: "3% of live payments failing, need rollback/hotfix decision within an hour"
A single-pass classifier looks at message #1 and says: "FYI, low priority." It's correct -- in isolation.
But a human reading all three messages sees an escalation from "no blockers" to "production incident requiring executive decision." The classification of message #1 should change in light of messages #2 and #3, because it's the start of a thread that ended in a crisis.
Single-pass classification can't do this. It processes each item without context from the others.
The two-pass architecture
The fix is straightforward: run the LLM twice.
Pass 1 -- Independent classification. Process each item individually. Get per-item labels: category, urgency, metadata. This is your standard classification pass. It runs fast because items can be processed in parallel.
interface Pass1Result {
itemId: string;
category: "urgent" | "delegate" | "fyi" | "ignore";
urgency: "critical" | "high" | "medium" | "low";
summary: string;
suggestedAction: string;
}
async function pass1(items: Item[]): Promise<Pass1Result[]> {
// Each item classified independently
const results = await Promise.all(
items.map(item => classifyItem(item))
);
return results;
}
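Pass 1 is only as reliable as its output parsing. Here's a sketch of a validator (a hypothetical helper, not part of the original pipeline) that rejects malformed classifications before they reach Pass 2; the `Pass1Result` interface is repeated so the sketch is self-contained:

```typescript
interface Pass1Result {
  itemId: string;
  category: "urgent" | "delegate" | "fyi" | "ignore";
  urgency: "critical" | "high" | "medium" | "low";
  summary: string;
  suggestedAction: string;
}

// Allowed enum values, mirroring the interface above.
const CATEGORIES = ["urgent", "delegate", "fyi", "ignore"];
const URGENCIES = ["critical", "high", "medium", "low"];

// Validate a parsed LLM response. Returns null instead of throwing so the
// caller can retry or fall back to a default classification.
function validatePass1(obj: unknown, itemId: string): Pass1Result | null {
  if (typeof obj !== "object" || obj === null) return null;
  const o = obj as Record<string, unknown>;
  if (typeof o.category !== "string" || !CATEGORIES.includes(o.category)) return null;
  if (typeof o.urgency !== "string" || !URGENCIES.includes(o.urgency)) return null;
  if (typeof o.summary !== "string" || typeof o.suggestedAction !== "string") return null;
  return {
    itemId,
    category: o.category as Pass1Result["category"],
    urgency: o.urgency as Pass1Result["urgency"],
    summary: o.summary,
    suggestedAction: o.suggestedAction,
  };
}
```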
Pass 2 -- Cross-reference and synthesis. Feed ALL items plus ALL Pass 1 classifications back into the LLM. Ask it to find relationships, patterns, and adjust classifications based on the full picture.
interface Pass2Result {
threads: Thread[]; // Related item clusters
flags: Flag[]; // Cross-item alerts
reclassifications: Record<string, Pass1Result>; // keyed by itemId; JSON-friendly, matches the Pass 2 output schema
briefing: string; // Synthesized summary
}
async function pass2(
items: Item[],
pass1Results: Pass1Result[]
): Promise<Pass2Result> {
const prompt = buildCrossReferencePrompt(items, pass1Results);
const raw = await llm.generate(prompt);
// The model returns text; parse it into the typed result.
return JSON.parse(raw) as Pass2Result;
}
The key insight: Pass 2 sees what Pass 1 cannot. It catches:
- Escalation threads: Items from the same source that increase in severity
- Contradictions: One person says X, then reverses to Y
- Scheduling conflicts: Two items reference the same time slot
- Resolved issues: Problem reported, then resolved -- no action needed
- Deal changes: Financial figures that shift between messages
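The pattern-finding itself is the LLM's job, but the escalation case is simple enough to sanity-check deterministically. A sketch (my own illustration, assuming messages are already sorted by timestamp and carry their Pass 1 urgency) that groups messages by sender and flags threads that end more urgent than they started:

```typescript
const URGENCY_RANK: Record<string, number> = { low: 0, medium: 1, high: 2, critical: 3 };

interface ClassifiedMessage {
  from: string;
  urgency: "critical" | "high" | "medium" | "low";
}

// Return senders whose messages never decrease in urgency and end
// strictly higher than they started -- the escalation-thread pattern.
function findEscalatingSenders(messages: ClassifiedMessage[]): string[] {
  const bySender = new Map<string, ClassifiedMessage[]>();
  for (const m of messages) {
    const list = bySender.get(m.from) ?? [];
    list.push(m);
    bySender.set(m.from, list);
  }
  const escalating: string[] = [];
  for (const [sender, msgs] of bySender) {
    if (msgs.length < 2) continue;
    const ranks = msgs.map(m => URGENCY_RANK[m.urgency]);
    const monotone = ranks.every((r, i) => i === 0 || r >= ranks[i - 1]);
    if (monotone && ranks[ranks.length - 1] > ranks[0]) escalating.push(sender);
  }
  return escalating;
}
```

A check like this can run alongside Pass 2 and cross-validate the model's thread detection, catching cases where the LLM misses an obvious escalation.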
Why not just do one big pass?
You might think: "Why not feed everything into one prompt and classify + cross-reference in a single call?"
Three reasons:
1. Classification quality degrades in long prompts. When you ask an LLM to do two things at once (classify each item AND find cross-item patterns), it tends to do both worse than if you split them. The model's attention is divided.
2. Structured output is more reliable for simple tasks. Pass 1 returns a clean, typed classification per item. No ambiguity, no free-text interpretation needed. Pass 2 can then assume correct per-item labels and focus entirely on relationships.
3. You can parallelize Pass 1. If you have 50 items, you can fire 50 parallel classification calls. Single-pass would need to fit all 50 items in one prompt, hitting context limits and increasing latency.
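One refinement on point 3: firing 50 truly unbounded parallel calls can trip provider rate limits, so a concurrency cap is common. A sketch of a capped parallel mapper (the helper name and pool size are my own):

```typescript
// Run an async mapper over items with at most `limit` calls in flight,
// preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Pass 1 then becomes `mapWithConcurrency(items, 10, classifyItem)` instead of a bare `Promise.all`.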
The prompt structure matters
For Pass 1, keep it focused:
function buildPass1Prompt(item: Item): string {
return `Classify this message.
Channel: ${item.channel}
From: ${item.from}
Time: ${item.timestamp}
Body: ${item.body}
Respond with JSON:
{
"category": "urgent" | "delegate" | "fyi" | "ignore",
"urgency": "critical" | "high" | "medium" | "low",
"summary": "one sentence",
"suggestedAction": "what the recipient should do"
}`;
}
For Pass 2, structure the input so the model can scan efficiently:
function buildPass2Prompt(
items: Item[],
classifications: Pass1Result[]
): string {
const combined = items.map((item, i) => ({
...item,
classification: classifications[i],
}));
return `You have ${items.length} classified messages.
Review ALL messages together and identify:
1. THREADS: Groups of messages from the same person about the same topic.
Flag if a thread escalates in severity.
2. CONTRADICTIONS: Cases where someone says X then reverses to Y.
3. SCHEDULING CONFLICTS: Multiple items referencing overlapping times.
4. RESOLVED ITEMS: Problems that were reported then resolved
(these need no action, even if Pass 1 flagged them as urgent).
5. RECLASSIFICATIONS: Any items where the Pass 1 classification
should change based on cross-item context.
Messages and their classifications:
${JSON.stringify(combined, null, 2)}
Respond with JSON matching this schema: { threads, flags, reclassifications, briefing }`;
}
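Even with JSON mode enabled, models occasionally wrap output in markdown fences or emit something unparseable. A defensive parse step (a hypothetical helper, not from the original) before trusting Pass 2's response:

```typescript
// Strip optional ```json fences and parse. Returns null on failure so the
// caller can retry, or fall back to Pass 1 results alone.
function parseLlmJson<T>(raw: string): T | null {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "");
  try {
    return JSON.parse(stripped) as T;
  } catch {
    return null;
  }
}
```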
When to use two-pass
This pattern adds latency: Pass 2 can only start after every Pass 1 call has finished. It's worth it when:
- Items have relationships. Messages in a thread, tickets about the same system, transactions from the same account. If items are truly independent, single-pass is fine.
- Cross-item patterns have high consequences. Missing an escalation thread or a scheduling conflict costs more than the extra 2-3 seconds of latency.
- You need both per-item labels AND a synthesis. Dashboards that show individual items with classifications AND a summary view.
- The item count fits in a single context window. Pass 2 needs all items at once. If you have 10,000 items, you'll need a different approach (clustering, then two-pass within clusters).
When it's overkill:
- Items are truly independent (product reviews, standalone support tickets from different customers)
- You only need per-item labels, not cross-item analysis
- Latency is more important than accuracy
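For batches that exceed the context window, the simplest version of the cluster-then-two-pass workaround is fixed-size chunking (a sketch; a real implementation would cluster by topic or sender rather than position):

```typescript
// Split items into fixed-size chunks. Pass 2 then runs once per chunk,
// optionally followed by a final synthesis pass over the chunk briefings.
function chunkItems<T>(items: T[], chunkSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}
```

The tradeoff: cross-item patterns that span two chunks are invisible to the per-chunk Pass 2, which is why grouping related items into the same chunk matters.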
Production considerations
Caching. With a fixed prompt and low temperature, Pass 1 results are stable enough to cache per item. If the same item appears in multiple batches, reuse the cached Pass 1 result instead of re-classifying.
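A sketch of that cache (names are my own; a production version would hash the key and add TTL or eviction). Keying on content rather than item id means the same message re-fetched in a later batch still hits the cache:

```typescript
// Wrap a classifier so repeated keys skip the LLM call. Caching the
// promise (not the resolved value) also dedupes concurrent in-flight calls.
function makeCachedClassifier<R>(
  classify: (key: string) => Promise<R>
): (key: string) => Promise<R> {
  const cache = new Map<string, Promise<R>>();
  return (key: string) => {
    let hit = cache.get(key);
    if (!hit) {
      hit = classify(key);
      cache.set(key, hit);
    }
    return hit;
  };
}
```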
Error handling. If Pass 2 fails, you still have valid Pass 1 classifications. Degrade gracefully to single-pass results rather than failing entirely.
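That fallback can be a small wrapper (a sketch; the result shape here is my assumption, not the original pipeline's):

```typescript
// Run Pass 2, but degrade to Pass 1 output alone if it fails.
async function runPipeline<P1, P2>(
  pass1Results: P1[],
  pass2: (results: P1[]) => Promise<P2>
): Promise<{ pass1: P1[]; pass2: P2 | null }> {
  try {
    return { pass1: pass1Results, pass2: await pass2(pass1Results) };
  } catch {
    // Pass 2 failed (timeout, malformed JSON, rate limit). The per-item
    // classifications are still valid, so return them without the synthesis.
    return { pass1: pass1Results, pass2: null };
  }
}
```

Callers check `pass2 === null` to decide whether to render the synthesized briefing or just the per-item list.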
Cost. Pass 2 uses more tokens (all items + all classifications in one prompt). Use a fast, cheap model for Pass 1 (Gemini Flash, Haiku) and a more capable model for Pass 2 if needed. Often the same fast model works for both.
Structured output. Both Gemini (responseMimeType: "application/json") and OpenAI (response_format: { type: "json_object" }) support forcing JSON output. Use it. Parsing free-text LLM output is fragile.
const result = await model.generateContent({
contents: [{ role: "user", parts: [{ text: prompt }] }],
generationConfig: {
responseMimeType: "application/json",
temperature: 0.1, // Low temp for classification
},
});
The general principle
Two-pass processing is a specific case of a broader pattern: separate extraction from synthesis. Pass 1 extracts structured data from each item. Pass 2 synthesizes across the extracted data.
This applies beyond text classification:
- Code review: Pass 1 annotates each file, Pass 2 finds cross-file issues
- Financial analysis: Pass 1 categorizes transactions, Pass 2 finds patterns
- Research synthesis: Pass 1 summarizes each paper, Pass 2 identifies themes and contradictions
The architectural insight is that LLMs are better at focused tasks than multi-objective tasks. Splitting the work into two focused passes produces better results than one ambitious pass, even though it uses more compute.
I build production AI systems. If you're designing LLM pipelines, I'm at astraedus.dev.