Sanjana Kumar

bro what if your AI actually had memory 💀

I want to tell you about the moment I realized our pricing team was running the same failed experiment for the third time in six months.

Same hypothesis. Same price delta. Same result: churn goes up, revenue goes sideways, everyone acts surprised.

The problem wasn't that people were dumb. The problem was that the knowledge of what happened last time lived nowhere. It was buried in some Slack thread or a spreadsheet nobody opened. So when someone new proposed the idea, there was nothing to stop them.

That's what ExpTrack AI is built to fix. It's a pricing experiment tracker with persistent AI memory — so the next time someone proposes a 20% price hike, the system can say: "hey, we literally tried this in Q1 and churn jumped 18%."

The stack: FastAPI + Groq (qwen/qwen3-32b) + Hindsight memory SDK + vanilla HTML/CSS/JS frontend. No frameworks. Hackathon-ready.


The Architecture — How It All Fits Together

Before diving into code, here's the mental model you need. There are three layers:

  • Memory layer (Hindsight) — stores every experiment as a searchable vector memory
  • Intelligence layer (Groq LLM) — analyzes past memories and generates insight
  • Interface layer (HTML/JS frontend) — where humans interact with the system

The key insight is the learning loop. Every experiment that gets logged enriches the memory. Every future query gets smarter because of it. It's not a chatbot — it's an institutional memory that compounds.

The Learning Loop in Plain English

Here is the exact sequence of events when a user proposes a new pricing experiment:

  1. User submits a proposal (experiment name, original price, proposed price, hypothesis)
  2. Backend calls hindsight.recall() with a semantic query built from the hypothesis and price delta
  3. Top 5 most relevant past experiments are retrieved from the vector store
  4. Past memories + new proposal are sent to Groq LLM as context
  5. LLM returns a Hindsight Insight — either a warning or a validation
  6. New experiment is stored in Hindsight as a "pending" memory
  7. When the experiment ends, PATCH /update-result enriches the memory with the real outcome — closing the loop

Why step 7 matters: Updating the memory with a real outcome transforms a hypothesis into grounded evidence. The next recall() for a similar proposal will surface this — with the actual result.
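The seven steps above can be sketched end to end with a tiny in-memory stand-in for the vector store. Everything here (`InMemoryStore`, the naive keyword-overlap scoring) is a hypothetical simplification for illustration, not the real Hindsight SDK:

```python
from dataclasses import dataclass, field

@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the Hindsight memory SDK."""
    memories: dict = field(default_factory=dict)
    _next_id: int = 0

    def store(self, content, metadata):
        self._next_id += 1
        mem_id = f"mem-{self._next_id}"
        self.memories[mem_id] = {"content": content, "metadata": metadata}
        return mem_id

    def recall(self, query, top_k=5):
        # Naive keyword overlap in place of real semantic search
        terms = set(query.lower().split())
        scored = sorted(
            self.memories.items(),
            key=lambda kv: len(terms & set(kv[1]["content"].lower().split())),
            reverse=True,
        )
        return [mem for _, mem in scored[:top_k]]

    def update(self, memory_id, content, metadata):
        self.memories[memory_id] = {"content": content, "metadata": metadata}

store = InMemoryStore()

# Steps 1-3: propose, build a recall query, retrieve similar experiments
hypothesis = "Raise Starter tier price to boost revenue"
delta = (12.0 - 10.0) / 10.0 * 100
query = f"{hypothesis}. Price change of {delta:.1f}%."
similar = store.recall(query)

# Step 6: store the new experiment as a pending memory
mem_id = store.store(f"{hypothesis} ({delta:+.1f}%)", {"status": "pending"})

# Step 7: later, enrich the memory with the real outcome
store.update(
    mem_id,
    f"{hypothesis} ({delta:+.1f}%). Outcome: churn rose in week 1.",
    {"status": "completed", "outcome": "Failure"},
)
```

The point of the sketch: `store` and `update` are two writes to the *same* memory, which is exactly what turns a pending hypothesis into recallable evidence.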


The Backend — FastAPI + Hindsight + Groq

The backend has two endpoints. That's it. Clean, minimal, exactly what a hackathon needs.

Endpoint 1: POST /check-experiment

This is the pre-flight check. Before anyone runs an experiment, they hit this endpoint. Here's what happens inside:

# 1. Calculate price delta for semantic context
price_delta_pct = (proposed - original) / original * 100

# 2. Build a recall query combining hypothesis + price direction
recall_query = f"{hypothesis}. Price change of {price_delta_pct:.1f}%."

# 3. Retrieve top 5 semantically similar past experiments
recalled = hindsight.recall(
    query=recall_query,
    collection='pricing_experiments',
    top_k=5
)

# 4. Send past memories + new proposal to Groq LLM
response = groq.chat.completions.create(
    model='qwen/qwen3-32b',
    messages=[system_prompt, user_prompt_with_context]
)

# 5. Store the new experiment as a pending memory
hindsight.store(
    content=memory_text,
    metadata={'status': 'pending'}  # plus the experiment fields
)

The recall query is intentionally rich — it combines the hypothesis intent with the price direction. This means if someone proposes a "premium tier price increase to reduce support load," the system will surface past premium tier experiments and past price increase experiments. Not just one or the other.
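Here is a quick sketch of what those combined queries look like for an increase versus a decrease. The helper name and the sample prices are illustrative; the only tweak from the endpoint code above is a `+` in the format spec so the direction is explicit in the text:

```python
def build_recall_query(hypothesis: str, original: float, proposed: float) -> str:
    """Combine the hypothesis intent with the signed price delta."""
    delta = (proposed - original) / original * 100
    return f"{hypothesis}. Price change of {delta:+.1f}%."

increase = build_recall_query(
    "Premium tier price increase to reduce support load", 49.0, 58.8)
decrease = build_recall_query(
    "Annual plan discount to lift conversions", 240.0, 192.0)
# increase ends with "Price change of +20.0%."
# decrease ends with "Price change of -20.0%."
```

Because the query carries both the stated intent and the signed delta, a semantic search can match on either dimension.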

Endpoint 2: PATCH /update-result

This is where the learning loop gets closed. When an experiment ends, the user logs the outcome:

# Retrieve the existing memory
existing = hindsight.get(memory_id=request.memory_id)

# Enrich with the real outcome — this closes the learning loop
hindsight.update(
    memory_id=request.memory_id,
    content=updated_content_with_outcome,
    metadata={
        'status': 'completed',
        'outcome': 'Failure',
        'reason': 'Churn rose 18% in week 1'
    }
)

After this call, the memory is no longer just a hypothesis. It's evidence. The next time someone proposes anything similar, the LLM will have this as context and can reason from it directly.

Error Handling

The system handles failures in three tiers:

  • Hindsight recall/store failures are non-fatal — if memory is temporarily unavailable, the system proceeds without past context and notes this in the LLM prompt
  • Groq LLM failures raise HTTP 502 with a clear retry message — the insight is the core value, so we surface this error explicitly
  • If a memory_id is local (API was offline when the experiment was created), the frontend falls back to localStorage gracefully
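The two backend tiers can be sketched as nested try/except blocks. The names here (`MemoryUnavailable`, `LLMError`, the plain `RuntimeError` standing in for FastAPI's `HTTPException(status_code=502)`) are assumptions, since the real exception types depend on the SDKs:

```python
class MemoryUnavailable(Exception):
    pass

class LLMError(Exception):
    pass

def check_experiment(recall_fn, llm_fn, proposal):
    # Tier 1: memory failures are non-fatal; degrade to "no past context"
    # and tell the LLM so via a note in the prompt.
    try:
        past = recall_fn(proposal)
        note = ""
    except MemoryUnavailable:
        past = []
        note = "NOTE: memory store unavailable; analyzing without past context."

    # Tier 2: the LLM insight is the core value, so its failure is surfaced
    # loudly instead of being swallowed.
    try:
        return llm_fn(proposal, past, note)
    except LLMError:
        # In FastAPI: raise HTTPException(status_code=502, detail="...retry")
        raise RuntimeError("502: insight generation failed, please retry")

def failing_recall(query):
    raise MemoryUnavailable()

# Degraded-but-working path: memory is down, the LLM is up
insight = check_experiment(
    failing_recall,
    lambda proposal, past, note: f"insight from {len(past)} memories",
    "20% price hike",
)
```

The asymmetry is the design choice: a missing memory lookup degrades the answer, a missing LLM call *is* a missing answer.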

The AI Flow — What the LLM Actually Does

The LLM prompt is where the product thinking lives. Here's the system prompt:

You are a Pricing Strategy AI Analyst with access to
a company's full history of pricing experiments.
Your job: analyze a proposed pricing change against
past outcomes and deliver a clear, actionable Hindsight Insight.

Rules:
- Reference actual past experiment names and outcomes.
- Be concise: 2-4 sentences maximum.
- ⚠ Warning if past experiments warn against this idea.
- ✅ Validate if past experiments support it.
- Always end with one concrete recommendation.

Temperature is set to 0.3 — low enough for consistent analytical output, high enough to avoid robotic phrasing. The model gets the full past context injected into the user message, so every insight is grounded in your actual data, not generic pricing advice.
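A sketch of how the call might be assembled. The Groq Python SDK follows the OpenAI-style chat-completions shape, and the prompt text here is abbreviated from the full system prompt above:

```python
SYSTEM_PROMPT = (
    "You are a Pricing Strategy AI Analyst with access to a company's "
    "full history of pricing experiments. Analyze a proposed pricing "
    "change against past outcomes and deliver a clear Hindsight Insight."
)

def build_request(proposal: str, past_memories: list[str]) -> dict:
    """Assemble the chat-completion payload with past context injected."""
    context = "\n".join(f"- {m}" for m in past_memories) or "(no past experiments)"
    return {
        "model": "qwen/qwen3-32b",
        "temperature": 0.3,  # consistent analysis, non-robotic phrasing
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Past experiments:\n{context}\n\nProposal: {proposal}"},
        ],
    }

# With the Groq SDK this payload would be passed straight through:
# client.chat.completions.create(**build_request(proposal, memories))
req = build_request("Raise Pro tier 20%", ["Q1 Starter bump: churn +18%"])
```

Keeping payload construction in a pure function like this also makes the prompt easy to unit-test without hitting the API.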

What a Good Insight Looks Like

Warning: You ran a similar 20% increase on your Starter tier in Q1 and churn rose 18% in 3 weeks. The hypothesis was almost identical. Consider a 5% incremental test first to measure elasticity before committing to a full rollout.

That's not generic. That's your history, applied to your current decision. That's the whole point.


The Frontend — Pure HTML/CSS/JS

No React. No Vue. No build step. The frontend is a single index.html file with ~400 lines of vanilla JavaScript. Hackathon judges love this — it runs anywhere, loads instantly, and there's nothing to break.

Key Frontend Decisions

  • API base URL is configurable via a text input — point it at localhost or a deployed URL without touching code
  • localStorage persistence — experiments survive page refresh, demo data seeds on first load
  • Graceful offline fallback — if the API is unreachable and the memory_id is local, result logging saves to localStorage without breaking
  • AI Insights computed locally from experiment history — patterns like repeated failures, average delta, discount trends
  • Cards animate in with CSS keyframes — smooth fade-slide on every new experiment

The Form to API Connection

When the user submits the form, the frontend calls POST /check-experiment and renders the Hindsight Insight inline below the form:

const res = await fetch(apiBase() + '/check-experiment', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    experiment_name: name,
    original_price: orig,
    proposed_price: prop,
    hypothesis: hypo
  })
});

const data = await res.json();
// data.memory_id saved locally for later result logging
// data.hindsight_insight rendered in the insight box

The memory_id returned from the API is stored in the local experiment object. When the user clicks "Log result" on a card, that ID is sent to PATCH /update-result to close the loop in Hindsight.


Demo Data — What Ships Out of the Box

Five sample experiments seed on first load, giving anyone an immediate picture of the product:

| Experiment | Delta | Outcome |
| --- | --- | --- |
| Q1 Starter Tier Bump | +30% | ❌ Failure — churn jumped 18% |
| Pro Annual Discount | -20% | ✅ Success — annual conversions up 34% |
| Enterprise Add-on Fee | +17% | ✅ Success — zero churn, 2 new upsells |
| Freemium Seat Limit | new $4.99 | ❌ Failure — free users churned entirely |
| Mid-Market 5% Test | +5% | ⏳ Pending |

The mix of successes, failures, and a pending experiment demonstrates the full lifecycle in one view. The success rate widget updates immediately, the insights panel generates patterns, and the pending card shows the "Log result" button ready to demo.


What I'd Do Differently

If this weren't a hackathon build, I'd make a few changes:

  • Replace Hindsight with ChromaDB for local deployment — zero signup, same semantic search, easier for teams to self-host
  • Add experiment tagging (product line, market segment) so recall queries can be scoped — right now every experiment competes in one global search space
  • Build a timeline view — the card grid is good but a chronological view of experiments per product would show the learning curve visually
  • Add a "similar experiments" panel on the form — show the top 3 recalled memories directly in the UI alongside the LLM insight

The Core Idea, Restated Simply

Most AI tools are stateless. You ask a question, you get an answer, nothing is remembered. ExpTrack AI is different because it compounds — every experiment that completes makes the next recommendation smarter.

That's not a complicated technical idea. It's just: write things down in a place the AI can actually read them.

The Hindsight integration is the whole product, really. Groq and FastAPI are the plumbing. The memory layer is the value.

Built for a hackathon. Works in production. Ship it. 🚀
