RAG That Improves Over Time: The Flywheel Effect
In the previous post, I showed how similarity search transforms failure diagnoses. The agent finds past incidents with similar patterns and uses them to make better recommendations. But that was a snapshot. The real power of this setup shows up over time.
Every new saga event makes the system smarter. This post covers the flywheel: how three agents share the same vector store, how diagnoses improve as data accumulates, and the practical limits I've hit.
The Flywheel
Three agents use the same pgvector embedding store, each in a different way.
The OperationsAgent is a writer and a reader. It vectorizes every saga event (writes). When a failure occurs, it searches for similar past incidents (reads). Each failure it diagnoses today becomes context for diagnosing future failures.
The SagaComposerAgent is a reader. It searches for historical failure patterns per customer profile. If new:high-value customers have a 30% fraud block rate, the composer adds a Fraud Validation step to their saga plan. This search uses the same embeddings the OperationsAgent wrote.
The DataAnalystAgent doesn't use RAG directly (it uses MCP tools instead). But it feeds the SagaComposerAgent with current metrics. The composer combines MCP metrics with RAG patterns to decide the optimal saga order.
Here's the cycle:
Saga event → OperationsAgent vectorizes it
↓
OperationsAgent searches similar past events → better diagnoses
↓
SagaComposerAgent searches failure patterns → smarter saga plans
↓
Orchestrator uses smarter plans → fewer failures
↓
Fewer failures → different patterns in the vector store
↓
(repeat)
As the vector store grows, the quality of both diagnoses and saga plans improves.
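The shared-store part of the cycle fits in a few lines of plain Java. This is a hypothetical sketch, not code from the repo. SharedStore stands in for the pgvector table, and a word-count "embedding" stands in for nomic-embed-text so the example runs without any model. The writer and both readers use the same entries. Only the query and the threshold differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlywheelSketch {

    // Toy "embedding": lowercase word counts. Stands in for nomic-embed-text
    // only to show the data flow; real embeddings capture meaning, not words.
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two sparse word-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double na = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
        double nb = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
        return (na == 0 || nb == 0) ? 0 : dot / (na * nb);
    }

    record Row(String text, Map<String, Integer> vector) {}

    // A single store shared by every agent, like the one pgvector table.
    static class SharedStore {
        private final List<Row> rows = new ArrayList<>();

        // Writer side (OperationsAgent): every saga event is embedded and kept.
        void add(String text) {
            rows.add(new Row(text, embed(text)));
        }

        // Reader side (any agent): best matches first, above a cosine threshold.
        List<String> search(String query, int maxResults, double minCosine) {
            Map<String, Integer> q = embed(query);
            return rows.stream()
                    .filter(r -> cosine(q, r.vector()) >= minCosine)
                    .sorted(Comparator.comparingDouble((Row r) -> cosine(q, r.vector())).reversed())
                    .limit(maxResults)
                    .map(Row::text)
                    .toList();
        }
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        store.add("payment failed insufficient funds profile new:high-value");
        store.add("payment blocked by fraud profile new:high-value");
        store.add("inventory reservation failed out of stock profile vip");

        // OperationsAgent reading: a narrow query with a strict threshold.
        System.out.println(store.search("payment blocked by fraud for new:high-value", 3, 0.6));
        // SagaComposerAgent reading the same rows: broader query, looser threshold.
        System.out.println(store.search("failure patterns for profile new:high-value", 5, 0.2));
    }
}
```

Every add() grows what both readers can find. That is the flywheel in miniature.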
How Diagnoses Improve
I tracked the OperationsAgent's output over three phases.
Week 1 (< 50 events). Most failures get "No similar incidents found." The agent produces generic diagnoses based only on the current event. Still useful, but not much better than a well-structured log.
Week 2 (50-200 events). RAG starts finding matches. The agent identifies that payment failures for new customers cluster around evening hours. It recommends time-based credit limits. The SagaComposerAgent picks up on this pattern and moves Fraud Validation earlier in the plan for new:high-value profiles.
Month 2 (500+ events). The agent distinguishes between different failure subtypes within the same service. "Payment failed due to insufficient funds" gets matched with similar cases and the agent notes "73% of these orders retry successfully within 2 hours." Meanwhile, "payment blocked by fraud" gets matched with fraud cases and the agent notes "92% of blocks for this profile are false positives during daytime."
The improvements are not magic. They come from having more data points for vector search. With 10 similar incidents, the agent sees a pattern. With 100, it sees the exceptions.
How the SagaComposerAgent Uses RAG
The composer searches for failure patterns per profile:
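private String findHistoricalPatterns(String profileKey) {
    // Embed a natural-language query describing the profile's failure history
    var embedding = embeddingModel.embed(
            "saga failure patterns for profile: " + profileKey).content();
    // Search the same pgvector store the OperationsAgent writes to
    var results = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                    .queryEmbedding(embedding)
                    .maxResults(5)      // composer cap (the OperationsAgent uses 3)
                    .minScore(0.70)     // looser than the OperationsAgent's 0.75
                    .build());
    // Fallback: tell the LLM explicitly that there is no history yet
    if (results.matches().isEmpty())
        return "No historical patterns found for this profile yet.";
    // Join the matched event texts into one RAG context block
    return results.matches().stream()
            .map(m -> m.embedded().text())
            .collect(Collectors.joining("\n---\n"));
}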
Notice that the composer's minScore is lower than the OperationsAgent's (0.70 vs 0.75). The composer benefits from broader context: it doesn't need exact matches, it needs patterns. "New customers tend to fail at payment" is enough to justify reordering the saga.
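What do 0.70 and 0.75 mean in cosine terms? As far as I can tell, LangChain4j reports relevance as (cosine + 1) / 2. That is what RelevanceScore.fromCosineSimilarity computes, and the pgvector store appears to use the same mapping. Treat the following as a sketch under that assumption and check it against your version. Under it, minScore 0.75 means cosine ≥ 0.5, and 0.70 means cosine ≥ 0.4. The Candidate record and select method are hypothetical. They mimic maxResults plus minScore over precomputed similarities.

```java
import java.util.Comparator;
import java.util.List;

public class MinScoreSketch {

    // Assumption: relevance = (cosine + 1) / 2, mapping cosine [-1, 1] onto [0, 1].
    static double relevanceFromCosine(double cosine) {
        return (cosine + 1) / 2;
    }

    // Inverse mapping: the cosine similarity a given minScore corresponds to.
    static double cosineForMinScore(double minScore) {
        return 2 * minScore - 1;
    }

    record Candidate(String text, double cosine) {}

    // Mimics maxResults + minScore: keep candidates above the threshold, best first.
    static List<String> select(List<Candidate> candidates, int maxResults, double minScore) {
        return candidates.stream()
                .filter(c -> relevanceFromCosine(c.cosine()) >= minScore)
                .sorted(Comparator.comparingDouble(Candidate::cosine).reversed())
                .limit(maxResults)
                .map(Candidate::text)
                .toList();
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("fraud block, new:high-value, evening", 0.62),
                new Candidate("insufficient funds, new:high-value", 0.45),
                new Candidate("stock-out, vip", 0.10));

        // Composer threshold (0.70) keeps the broader match; 0.75 drops it.
        System.out.println(select(candidates, 5, 0.70));
        System.out.println(select(candidates, 3, 0.75));
    }
}
```

The second candidate (relevance 0.725) is exactly the kind of loose pattern the composer wants and the OperationsAgent filters out.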
The results get combined with real-time metrics from the DataAnalystAgent:
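// Current state, from the DataAnalystAgent's MCP tools
String metrics = queryMetrics(dataAnalystAgent);
String stockAlerts = queryStockAlerts(dataAnalystAgent);
// Historical context, from the shared vector store
String ragContext = findHistoricalPatterns(profile);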
String prompt = """
ORDER PROFILE: %s
CURRENT SYSTEM METRICS (via MCP):
%s
CRITICAL STOCK ALERTS (via MCP):
%s
HISTORICAL FAILURE PATTERNS FOR THIS PROFILE (RAG):
%s
Compose the optimal saga plan for this profile.
""".formatted(profile, metrics, stockAlerts, ragContext);
MCP provides the current state. RAG provides the historical patterns. The LLM weighs both to decide the saga order.
Practical Limits
After running this for a while, I hit some practical issues.
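Vector store growth. Each event adds a row with a 768-dimensional vector, which on its own is about 3KB (768 floats × 4 bytes). At 1000 events/day, that's roughly 3MB/day in vectors alone and about 10MB/day in pgvector once the event text, metadata, and row overhead are included. PostgreSQL handles this fine up to hundreds of thousands of rows. Beyond that, you'd want to add an approximate-nearest-neighbor index (pgvector supports IVFFlat and HNSW), since without one every search is an exact sequential scan, or archive old events.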
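The storage arithmetic can be checked in a few lines. pgvector documents each vector as taking 4 × dimensions + 8 bytes, so a 768-dimensional vector is 3080 bytes and 1000 of them come to about 3MB. The rest of the daily estimate comes from everything else in the row. dailyBytes and its otherBytesPerRow parameter are hypothetical helpers for plugging in your own average row size.

```java
public class StorageEstimate {

    // pgvector stores each dimension as a 4-byte float, plus an 8-byte header per vector.
    static long vectorBytes(int dimensions) {
        return 4L * dimensions + 8;
    }

    // Hypothetical helper: daily growth given an assumed average size for
    // everything else in the row (event text, metadata, tuple overhead).
    static long dailyBytes(int dimensions, long otherBytesPerRow, long eventsPerDay) {
        return (vectorBytes(dimensions) + otherBytesPerRow) * eventsPerDay;
    }

    public static void main(String[] args) {
        long perVector = vectorBytes(768);              // 3080 bytes
        long vectorsOnly = dailyBytes(768, 0, 1000);    // 3,080,000 bytes
        System.out.printf("per vector: %d bytes%n", perVector);
        System.out.printf("vectors only: %.1f MB/day%n", vectorsOnly / 1_000_000.0);
    }
}
```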
Embedding drift. The nomic-embed-text model is static, so the embeddings themselves don't drift. The failure patterns do. A fix in the payment service might eliminate a whole category of failures, but the old vectors for those failures are still in the store and can still come back as "similar incidents." I haven't implemented cleanup yet; the plan is to add a TTL (delete embeddings older than 90 days) or re-embed periodically.
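The planned 90-day TTL is easy to sketch in memory. This assumes each stored event carries its creation time, for example in the segment metadata or in a column you add. StoredEvent and purgeOlderThan are hypothetical names. Against pgvector, the equivalent would be a DELETE with the same cutoff predicate.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TtlCleanupSketch {

    // Hypothetical stored row: the embedded text plus when it was written.
    record StoredEvent(String text, Instant createdAt) {}

    // Removes every event created before (now - ttl); returns how many were removed.
    static int purgeOlderThan(List<StoredEvent> store, Duration ttl, Instant now) {
        Instant cutoff = now.minus(ttl);
        int before = store.size();
        store.removeIf(e -> e.createdAt().isBefore(cutoff));
        return before - store.size();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-06-01T00:00:00Z");
        List<StoredEvent> store = new ArrayList<>(List.of(
                new StoredEvent("payment timeout before gateway fix", now.minus(Duration.ofDays(120))),
                new StoredEvent("fraud block, new:high-value", now.minus(Duration.ofDays(10)))));

        int removed = purgeOlderThan(store, Duration.ofDays(90), now);
        System.out.println(removed + " removed, " + store.size() + " kept"); // 1 removed, 1 kept
    }
}
```

Passing `now` in explicitly keeps the purge deterministic and easy to test.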
Prompt size. Each similar incident adds 200-500 tokens to the prompt. With 5 matches, that's up to 2500 tokens of RAG context. Add the current failure, the system prompt, and the formatting instructions, and you're at 4000+ tokens of input. This works fine with Gemini's 1M token context, but it eats into the budget. I cap at 3 matches for the OperationsAgent and 5 for the SagaComposerAgent.
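The match caps can be expressed as a small helper. Capping by count (3 or 5) is what I do. The token budget here is an optional extra guard, and the 4-characters-per-token estimate is a rough heuristic assumption, not a real tokenizer. RagContextBudget and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RagContextBudget {

    // Rough heuristic (assumption): about 4 characters per token, rounded up.
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    // Keeps at most maxMatches matches (assumed sorted best-first) and stops
    // before the running token estimate would exceed tokenBudget.
    static List<String> capContext(List<String> matches, int maxMatches, int tokenBudget) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String m : matches) {
            if (kept.size() == maxMatches) break;
            int cost = estimateTokens(m);
            if (used + cost > tokenBudget) break;
            kept.add(m);
            used += cost;
        }
        return kept;
    }

    public static void main(String[] args) {
        String incident = "x".repeat(1600); // 400 tokens under the heuristic
        List<String> matches = List.of(incident, incident, incident, incident, incident);

        System.out.println(capContext(matches, 3, 2500).size()); // OperationsAgent cap: 3
        System.out.println(capContext(matches, 5, 2500).size()); // SagaComposerAgent cap: 5
        System.out.println(capContext(matches, 5, 1000).size()); // budget binds first: 2
    }
}
```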
False patterns. With small datasets (< 50 events), the vector search sometimes finds "matches" that aren't real patterns. A payment failure for a VIP customer matches a payment failure for a new customer just because both have similar error messages. The diagnosis suggests profile-specific fixes when the issue is actually global. Higher minScore thresholds mitigate this, but don't eliminate it.
The Additive Principle
Every AI component in this system has a fallback. If the vector store is empty, the OperationsAgent still produces a diagnosis based on the current event alone. If the SagaComposerAgent can't find patterns, it falls back to the default saga order. If Ollama is down, events still flow through the saga normally. They just don't get vectorized.
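The "Ollama is down" fallback comes down to treating vectorization as best-effort. This sketch uses a hypothetical Embedder interface standing in for the embedding call, and it omits storing the vector. The point is only that the exception is caught, counted, and logged instead of propagating into the saga.

```java
public class BestEffortVectorizer {

    // Hypothetical stand-in for the embedding call (Ollama in this project).
    interface Embedder {
        float[] embed(String text);
    }

    private final Embedder embedder;
    private int skipped = 0;

    BestEffortVectorizer(Embedder embedder) {
        this.embedder = embedder;
    }

    // Best-effort: a failed embedding is counted and logged, never rethrown,
    // so the saga step that produced the event continues unchanged.
    boolean vectorize(String event) {
        try {
            float[] vector = embedder.embed(event);
            // Writing the vector to the store is omitted from this sketch.
            return vector != null;
        } catch (RuntimeException e) {
            skipped++;
            System.err.println("Embedding unavailable, skipping: " + e.getMessage());
            return false;
        }
    }

    int skipped() {
        return skipped;
    }

    public static void main(String[] args) {
        BestEffortVectorizer up = new BestEffortVectorizer(text -> new float[768]);
        BestEffortVectorizer down = new BestEffortVectorizer(text -> {
            throw new IllegalStateException("connection refused");
        });

        System.out.println(up.vectorize("payment failed"));   // true
        System.out.println(down.vectorize("payment failed")); // false, the saga keeps going
        System.out.println(down.skipped());                   // 1
    }
}
```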
The AI layer is an improvement, not a dependency. The system worked before the agents existed. It works better with them.
if (results.matches().isEmpty())
return "No similar incidents found in history.";
This one check is the entire fallback strategy for RAG. No similar incidents? Tell the LLM that. It still produces useful output, just less targeted.
Measuring the Impact
How do you know if RAG is actually helping? I track three things.
RAG hit rate. Percentage of diagnoses that include similar incidents. Started at 0% (empty store), now consistently above 80%. If this drops, either the failure patterns changed or the minScore threshold is too high.
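The hit rate is a simple ratio over recent diagnoses. Here is a sketch, assuming each diagnosis records how many similar incidents went into its prompt. The Diagnosis record is hypothetical.

```java
import java.util.List;

public class RagHitRate {

    // Hypothetical record of one diagnosis: how many similar incidents it used.
    record Diagnosis(String sagaId, int similarIncidents) {}

    // Percentage of diagnoses that included at least one similar incident.
    static double hitRate(List<Diagnosis> diagnoses) {
        if (diagnoses.isEmpty()) return 0.0;
        long hits = diagnoses.stream().filter(d -> d.similarIncidents() > 0).count();
        return 100.0 * hits / diagnoses.size();
    }

    public static void main(String[] args) {
        List<Diagnosis> window = List.of(
                new Diagnosis("s-1", 3), new Diagnosis("s-2", 0),
                new Diagnosis("s-3", 2), new Diagnosis("s-4", 1),
                new Diagnosis("s-5", 3));
        System.out.printf("RAG hit rate: %.0f%%%n", hitRate(window)); // RAG hit rate: 80%
    }
}
```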
Diagnosis specificity. Are the recommendations generic ("review your payment limits") or specific ("adjust the R$500 threshold for new:high-value customers during evening hours")? This is subjective but noticeable. After 200+ events, the diagnoses are consistently specific.
Saga plan effectiveness. After the SagaComposerAgent started reordering steps, the overall failure rate per saga dropped. Fraud checks moved earlier for high-risk profiles. Inventory checks moved before payment when stock was low. Fewer wasted payment calls.
Wrapping Up
RAG in this project isn't about answering questions from documents. It's about building institutional memory into a distributed system. Every failure teaches the system something. Every diagnosis makes the next one better.
The stack is simple: Ollama for embeddings, pgvector for storage, LangChain4j for glue. No exotic infrastructure. Apart from the LLM calls, no external services: the embedding and retrieval stack runs on your laptop.
The repo: github.com/pedrop3/saga-orchestration