Akshitha

Posted on Jun 28

Building the Brain of RecallOps: FastAPI, Synthetic Data, and Connecting Everything Together

#ai #architecture #backend #python

In every software project, someone has to be the glue. The person who makes sure all the pieces actually connect, that the data flows from one system to another, that when the frontend sends a request something actually happens on the other end.

That was my job on RecallOps. I built the backend and created the synthetic data that powers our AI agent's memory.

Here's exactly how I did it.

The Architecture I Had to Connect

RecallOps has four main systems that all need to talk to each other:

React Frontend — sends incident descriptions, receives fix recommendations
Hindsight — stores and retrieves past incidents using vector memory
cascadeflow — routes queries to the right AI model based on complexity
Groq — runs the actual LLM inference to generate fix recommendations

My FastAPI backend sits in the middle of all of this. Every request from the frontend comes to my backend. My backend calls Hindsight to find similar past incidents, builds a prompt with that context, calls Groq through cascadeflow's routing, and returns everything back to the frontend in one clean response.
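The flow above can be sketched as one function that takes each system as a plain callable. This is a minimal sketch, not the actual RecallOps code: `handle_query`, `QueryResult`, and the three `fake_*` stand-ins are hypothetical names, and the `title` and `fix` fields are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueryResult:
    response: str
    similar_incidents: list[dict]
    model_used: str


def handle_query(
    incident_text: str,
    recall: Callable[[str, int], list[dict]],
    pick_model: Callable[[str], str],
    generate: Callable[[str, str], str],
) -> QueryResult:
    """Run one request through memory lookup, prompt building, routing, and inference."""
    similar = recall(incident_text, 3)
    context = "\n".join(f"- {inc['title']}: {inc['fix']}" for inc in similar)
    prompt = f"Past incidents:\n{context}\n\nNew incident:\n{incident_text}"
    model = pick_model(incident_text)
    answer = generate(model, prompt)
    return QueryResult(response=answer, similar_incidents=similar, model_used=model)


# Stand-in callables so the sketch runs without Hindsight, cascadeflow, or Groq.
def fake_recall(query: str, top_k: int) -> list[dict]:
    return [{"title": "Auth service 503", "fix": "Increase connection pool"}][:top_k]


def fake_pick_model(text: str) -> str:
    return "small-model" if len(text) < 100 else "large-model"


def fake_generate(model: str, prompt: str) -> str:
    return f"[{model}] suggested fix"


result = handle_query(
    "auth service returning 503", fake_recall, fake_pick_model, fake_generate
)
print(result.response)
```

Passing the services in as callables also makes it easy to swap a real client for a fake one while debugging a single layer.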

Setting Up FastAPI

FastAPI was a good fit for this kind of backend. It's fast, it's Python, it generates API documentation automatically, and it integrates cleanly with all the libraries we needed.

My main.py starts with the basics:

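from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from hindsight_routes import router as hindsight_router
from cascadeflow import CascadeAgent, ModelConfig
from groq import Groq
from dotenv import load_dotenv  # provided by the python-dotenv package
import json, os

load_dotenv()  # read API keys from .env before any client is created
app = FastAPI()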

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"]
)

The CORS middleware is essential. Without it, the browser's same-origin policy would block the React frontend running on localhost:5173 from calling the backend on localhost:8000, because a different port counts as a different origin.

The Startup Preloader

One important optimization was preloading all incidents into memory on server startup. Hindsight stores incidents as vector embeddings for similarity search, but to get the full structured incident data (fix applied, resolution time, severity) we need a local index.

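# In-memory lookup from incident ID to the full incident record
_incident_index: dict[str, dict] = {}

@app.on_event("startup")
def preload_incidents():
    # Read the dataset once at startup so requests never touch the disk
    with open("incidents.json") as f:
        incidents = json.load(f)
    for inc in incidents:
        _incident_index[inc["id"]] = inc
    print(f"✅ Preloaded {len(_incident_index)} incidents into memory index")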

Every time the server starts, it reads all 30 incidents from incidents.json and loads them into the _incident_index dictionary. This means recall operations can always map Hindsight results back to full structured data without hitting the disk on every request.
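To show how such an index gets used, here is a small sketch of mapping recalled incident IDs back to full records. It assumes the recall step yields incident IDs; `build_index`, `hydrate`, and the sample fields other than `id` are hypothetical, and the inline JSON stands in for incidents.json.

```python
import json

# Illustrative sample standing in for incidents.json.
# Field names other than "id" are assumptions.
SAMPLE_JSON = """
[
  {"id": "INC-001", "severity": "high", "fix_applied": "Increased connection pool size"},
  {"id": "INC-002", "severity": "low", "fix_applied": "Renewed SSL certificate"}
]
"""


def build_index(raw_json: str) -> dict[str, dict]:
    """Build an ID -> incident lookup from a JSON array of incidents."""
    return {inc["id"]: inc for inc in json.loads(raw_json)}


def hydrate(recalled_ids: list[str], index: dict[str, dict]) -> list[dict]:
    """Map recalled IDs to full records, skipping IDs the index does not know."""
    return [index[i] for i in recalled_ids if i in index]


index = build_index(SAMPLE_JSON)
print(hydrate(["INC-002", "INC-999"], index))
```

Skipping unknown IDs rather than raising keeps one stale memory entry from failing the whole request.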

The Core Query Endpoint

The /api/query endpoint is the heart of RecallOps. Here's what happens when the frontend sends an incident:

Step 1 — Search Hindsight memory:

similar = recall_similar(query=incident_text, top_k=3)

This calls Hindsight's recall API, which searches all 30 stored incidents by semantic similarity and returns the 3 most relevant ones.

Step 2 — Build context from past incidents:
The 3 similar incidents get formatted into a context block that gets added to the prompt. This is the key step that makes the agent's responses memory-powered instead of generic.
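A context block like this could be built as follows. This is a sketch under assumptions: `build_context` and `build_prompt` are hypothetical helpers, the field names (`title`, `root_cause`, `fix_applied`) are guesses at the incident structure, and the prompt wording is illustrative rather than the one RecallOps uses.

```python
def build_context(similar: list[dict]) -> str:
    """Format recalled incidents as a numbered block for the prompt."""
    lines = []
    for n, inc in enumerate(similar, start=1):
        lines.append(
            f"{n}. [{inc.get('id', 'unknown')}] {inc.get('title', 'untitled')}\n"
            f"   Root cause: {inc.get('root_cause', 'n/a')}\n"
            f"   Fix applied: {inc.get('fix_applied', 'n/a')}"
        )
    return "\n".join(lines) if lines else "No similar past incidents found."


def build_prompt(incident_text: str, similar: list[dict]) -> str:
    """Combine the memory context and the new incident into one prompt."""
    return (
        "You are an incident response assistant.\n\n"
        f"Similar past incidents:\n{build_context(similar)}\n\n"
        f"Current incident:\n{incident_text}\n\n"
        "Recommend a fix, citing the past incidents where relevant."
    )


prompt = build_prompt(
    "auth service returning 503 errors",
    [{
        "id": "INC-001",
        "title": "Auth service 503",
        "root_cause": "Connection pool exhausted",
        "fix_applied": "Increased connection pool size",
    }],
)
print(prompt)
```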

Step 3 — cascadeflow routing decision:

if len(incident_text) < 100:
    selected_model = cf_models[0]  # fast cheap model
else:
    selected_model = cf_models[1]  # smart model

Short queries go to llama-3.1-8b-instant. Long complex queries go to llama-3.3-70b-versatile.
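The same length rule can be wrapped in a function that also produces a human-readable reason, which is handy for the routing_reason field. A minimal sketch: `route` and the constant names are hypothetical, and it reproduces only the length heuristic shown above, not any of cascadeflow's own logic.

```python
FAST_MODEL = "llama-3.1-8b-instant"
SMART_MODEL = "llama-3.3-70b-versatile"
LENGTH_THRESHOLD = 100  # characters, matching the rule above


def route(incident_text: str) -> tuple[str, str]:
    """Return (model name, routing reason) based on query length."""
    n = len(incident_text)
    if n < LENGTH_THRESHOLD:
        return FAST_MODEL, f"query length {n} < {LENGTH_THRESHOLD}: fast model"
    return SMART_MODEL, f"query length {n} >= {LENGTH_THRESHOLD}: smart model"


model, reason = route("db connection refused")
print(model, "|", reason)
```

Character count is a rough proxy for complexity; a short query can still describe a hard problem, so the threshold is worth revisiting once real queries come in.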

Step 4 — Groq inference:
The prompt (with memory context included) gets sent to Groq. The response comes back in under 2 seconds.

Step 5 — Return everything:

return {
    "response": agent_response,
    "similar_incidents": similar,
    "model_used": selected_model.name,
    "cost": cost,
    "routing_reason": routing_reason,
    "audit_logs": audit_logs
}

The frontend gets the fix recommendation, the similar incidents for the context panel, the model that was used, and the cost. One request, everything needed for the full UI.
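One way the cost field could be derived is from token counts and a per-model price table. This is a sketch only: the prices below are placeholders and not real Groq pricing, and `estimate_cost`, `PRICE_PER_MILLION`, and the model keys are hypothetical names.

```python
# Placeholder prices in USD per million tokens. These are NOT real Groq prices.
PRICE_PER_MILLION = {"fast-model": 0.10, "smart-model": 1.00}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one call from its total token count."""
    total_tokens = prompt_tokens + completion_tokens
    return round(total_tokens * PRICE_PER_MILLION[model] / 1_000_000, 6)


fast_cost = estimate_cost("fast-model", prompt_tokens=1500, completion_tokens=500)
smart_cost = estimate_cost("smart-model", prompt_tokens=1500, completion_tokens=500)
print(fast_cost, smart_cost)
```

Returning the estimate alongside the model name lets the UI show what the routing decision saved on each request.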

Building the Synthetic Dataset

The hardest part of my job wasn't the code — it was creating 30 realistic synthetic incidents that would make the similarity search actually meaningful.

Real incident data from real companies is confidential. We couldn't use it. But toy data — "error occurred, fix applied" — wouldn't be realistic enough for the similarity search to work well.

I created 30 incidents across three categories:

10 API and Web Server Errors: Incidents including 503 errors from connection pool exhaustion, SSL certificate expiry, CORS misconfigurations, memory leaks in nginx workers, load balancer health check failures, rate limiting issues, cold start latency problems, and webhook delivery failures.

10 Database Crashes: Postgres connection pool exhaustion, deadlocks from tables locked in wrong order, replica lag from bulk operations, slow queries from missing indexes, disk full from WAL accumulation, migration failures from missing transaction wrappers, Redis OOM kills from infinite TTL, foreign key constraint violations, backup job failures, and index corruption from power loss.

10 CI/CD Pipeline Failures: Build timeouts from integration tests waiting on down third-party services, flaky tests from shared state between parallel runners, Docker push failures from expired ECR tokens, deployment rollbacks from missing infrastructure, missing environment variables, IAM permission failures, S3 bucket deletion, node_modules cache mismatches, staging/production config drift, and zero-downtime deploy failures from wrong grace period settings.

Each incident follows a consistent structure with real-looking error logs, specific root causes, detailed fix descriptions, and realistic resolution times. The specificity is what makes the similarity search useful — when you search for "auth service 503", Hindsight finds INC-001 because the content is rich enough to create a meaningful semantic match.
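A consistent structure is easier to keep when every record is checked against the same field list. The sketch below is illustrative: the field names, the `missing_fields` helper, and the sample values (including the log line and the resolution time) are assumptions, not the actual contents of incidents.json.

```python
# Assumed field list; the real dataset may use different names.
REQUIRED_FIELDS = (
    "id",
    "title",
    "category",
    "severity",
    "error_log",
    "root_cause",
    "fix_applied",
    "resolution_time_minutes",
)

# Illustrative record; values are made up for the example.
example_incident = {
    "id": "INC-001",
    "title": "Auth service returning 503",
    "category": "api",
    "severity": "high",
    "error_log": "503 Service Unavailable: connection pool exhausted",
    "root_cause": "Connection pool too small for peak login traffic",
    "fix_applied": "Restarted auth-service and increased the connection pool",
    "resolution_time_minutes": 25,
}


def missing_fields(incident: dict) -> list[str]:
    """Return the required fields that are absent or empty in an incident."""
    return [name for name in REQUIRED_FIELDS if not incident.get(name)]


print(missing_fields(example_incident))
print(missing_fields({"id": "INC-002", "title": "Disk full"}))
```

Running a check like this over the whole file before seeding catches half-written incidents before they end up in memory.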

The Seeding Process

Getting all 30 incidents into Hindsight required a seeding script. I built a seed.py file that reads the JSON and calls the /api/seed endpoint:

import json, requests

with open("incidents.json") as f:
    incidents = json.load(f)

response = requests.post(
    "http://localhost:8000/api/seed",
    json={"incidents": incidents}
)

print(response.json())

The first attempt failed with a 401 Unauthorized error — the Hindsight API key wasn't loading from the .env file because the server had started before the key was saved. Restarting the server fixed it.

The second attempt failed because the API key had quotes around it in the .env file — HINDSIGHT_API_KEY="hsk_xxx" — and Hindsight was receiving the quotes as part of the key. Removing the quotes and hardcoding the key directly in the integration file fixed it immediately.

Third attempt: {'total': 30, 'success': 30, 'failed': []} ✅
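An alternative to hardcoding the key would be to normalize it when reading it from the environment. This is a minimal sketch, assuming the key arrives through `os.environ`; `read_api_key` is a hypothetical helper and not part of Hindsight or python-dotenv.

```python
import os


def read_api_key(name: str) -> str:
    """Read an API key from the environment, tolerating stray quotes and spaces."""
    raw = os.environ.get(name)
    if raw is None:
        raise RuntimeError(f"{name} is not set; check the .env file and restart the server")
    key = raw.strip()
    # Drop one pair of matching surrounding quotes, e.g. "hsk_xxx" -> hsk_xxx
    if len(key) >= 2 and key[0] == key[-1] and key[0] in "\"'":
        key = key[1:-1]
    if not key:
        raise RuntimeError(f"{name} is empty")
    return key


# Simulate the quoted value from the second failed attempt.
os.environ["HINDSIGHT_API_KEY"] = '"hsk_xxx"'
print(read_api_key("HINDSIGHT_API_KEY"))  # prints hsk_xxx
```

Failing loudly at startup when the key is missing also turns the first attempt's 401 into an error message that names the actual problem.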

Debugging the Full Stack

The most satisfying moment of the hackathon was running the first end-to-end test. I created a test_query.py file:

import requests

response = requests.post(
    "http://localhost:8000/api/query",
    json={"query": "auth service returning 503 errors users cant login"}
)

data = response.json()
print(data["response"])
print("Model used:", data["model_used"])
print("Cost:", data["cost"])
print("Similar incidents found:", len(data["similar_incidents"]))

The output came back with a specific fix mentioning restarting the auth-service and increasing the connection pool — the exact fix from INC-001 in our dataset. The agent wasn't making it up. It was recalling real memory and applying it to the current problem.

That was the moment I knew RecallOps was real.

What I Learned

Integration is the hardest part. Writing individual components is straightforward. Making them all work together is where the complexity lives. Every integration point is a potential failure — wrong data format, wrong authentication, wrong async context, wrong field name. Expect to spend at least half your time on integration.

Synthetic data needs to be specific to be useful. Generic synthetic data produces generic search results. If your incidents all say "service failed, restart fixed it," the similarity search will return everything for every query. Specific, detailed incidents with real error logs and specific fixes produce dramatically better results.
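The effect is easy to reproduce with a toy scorer. The sketch below uses bag-of-words cosine similarity as a stand-in for Hindsight's vector search (which works differently); the sample incident texts and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


generic = [
    "service failed, restart fixed it",
    "service error occurred, restart fixed it",
]
specific = [
    "auth service 503 errors from connection pool exhaustion",
    "postgres replica lag after bulk delete operation",
]
query = vectorize("auth service returning 503 errors")

generic_scores = [cosine(query, vectorize(t)) for t in generic]
specific_scores = [cosine(query, vectorize(t)) for t in specific]
print("generic: ", generic_scores)
print("specific:", specific_scores)
```

The generic incidents score almost identically against the query, so ranking them is close to arbitrary, while the specific ones separate cleanly into a strong match and a non-match.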

Test every layer independently first. Before connecting everything together, I tested each layer in isolation: Hindsight alone, Groq alone, cascadeflow alone. When integration issues appeared, I knew exactly which layer was causing them.

Environment variables are the source of half of all bugs. Missing keys, quoted keys, keys set in the wrong file, keys saved after the server had already loaded its environment: I hit almost every possible environment variable issue during this project. Check your .env file first whenever something doesn't work.

The Result

RecallOps has a backend that:

Handles CORS correctly so the frontend can communicate
Preloads 30 incidents into memory on startup
Searches Hindsight for similar past incidents on every query
Routes to the right model using cascadeflow
Returns responses in under 3 seconds
Tracks costs and exposes an audit log
Saves resolved incidents back to memory

It's not perfect — the similarity scores are always 0% due to a Hindsight result mapping issue, and the cascadeflow async integration required a workaround — but it works. Everything connects. The data flows. The agent remembers.

That's what I built. And I'd build it again.

DEV Community