<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tanish Oberoi</title>
    <description>The latest articles on DEV Community by Tanish Oberoi (@tanish_oberoi_913ad845184).</description>
    <link>https://dev.to/tanish_oberoi_913ad845184</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886825%2F74f25568-a883-40d1-8e86-5857d3f52da8.png</url>
      <title>DEV Community: Tanish Oberoi</title>
      <link>https://dev.to/tanish_oberoi_913ad845184</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tanish_oberoi_913ad845184"/>
    <language>en</language>
    <item>
      <title>How I Used Hindsight to Make an Agent Actually Learn</title>
      <dc:creator>Tanish Oberoi</dc:creator>
      <pubDate>Sun, 19 Apr 2026 03:21:51 +0000</pubDate>
      <link>https://dev.to/tanish_oberoi_913ad845184/how-i-used-hindsight-to-make-an-agent-actually-learn-35mj</link>
      <guid>https://dev.to/tanish_oberoi_913ad845184/how-i-used-hindsight-to-make-an-agent-actually-learn-35mj</guid>
      <description>&lt;p&gt;The problem I couldn’t ignore&lt;/p&gt;

&lt;p&gt;Most "AI agents" I built had one thing in common: they didn’t improve.&lt;/p&gt;

&lt;p&gt;They could generate messages. Sometimes decent, sometimes awkward. But every request was stateless. No memory. No learning. Just a fresh guess every time.&lt;/p&gt;

&lt;p&gt;At some point, it stopped being interesting.&lt;/p&gt;

&lt;p&gt;I didn’t need better prompts. I needed a way to give my agent memory.&lt;/p&gt;

&lt;p&gt;What I built instead&lt;/p&gt;

&lt;p&gt;I built an outbound prospecting system that improves its messaging over time based on actual outcomes.&lt;/p&gt;

&lt;p&gt;The loop is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Input a prospect&lt;/li&gt;
&lt;li&gt;Retrieve similar past cases&lt;/li&gt;
&lt;li&gt;Recall learned patterns&lt;/li&gt;
&lt;li&gt;Generate a message&lt;/li&gt;
&lt;li&gt;Track the outcome&lt;/li&gt;
&lt;li&gt;Store it&lt;/li&gt;
&lt;li&gt;Learn from it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That loop is the product.&lt;/p&gt;
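&lt;p&gt;As a rough sketch, the whole loop fits in one function. Everything below is illustrative — the helpers are stubbed with toy logic, not the actual implementation:&lt;/p&gt;

```python
# Minimal, self-contained sketch of the outreach loop.
# All names and the stubbed logic are illustrative placeholders.
def run_outreach(prospect, history):
    # 1-3: gather context from past cases and learned patterns
    similar = [h for h in history if h["persona"] == prospect["persona"]]
    patterns = [h["message_angle"] for h in similar if h["outcome"] == "meeting"]

    # 4: generate a message guided by that context (stubbed here)
    angle = patterns[0] if patterns else "generic"
    message = f"{angle} pitch for a {prospect['persona']} in {prospect['industry']}"

    # 5-7: track the outcome and store it so the next run can learn from it
    outcome = "meeting"  # would come from real outcome tracking
    history.append({**prospect, "message_angle": angle, "outcome": outcome})
    return message
```

&lt;p&gt;Each run appends to the history it reads from, which is the entire point: the next call starts from accumulated experience instead of zero.&lt;/p&gt;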

&lt;p&gt;The stack is fairly standard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI backend&lt;/li&gt;
&lt;li&gt;PostgreSQL for structured data&lt;/li&gt;
&lt;li&gt;Chroma for vector similarity&lt;/li&gt;
&lt;li&gt;OpenAI for generation and embeddings&lt;/li&gt;
&lt;li&gt;Hindsight for long-term memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting part is how these pieces interact.&lt;/p&gt;

&lt;p&gt;Why prompt engineering wasn’t enough&lt;/p&gt;

&lt;p&gt;I started the usual way: bigger prompts, more instructions, more "context".&lt;/p&gt;

&lt;p&gt;prompt = f"""&lt;br&gt;
Write a personalized cold email for a {persona}&lt;br&gt;
working at a {company_type} company.&lt;br&gt;
Focus on ROI and keep it concise.&lt;br&gt;
"""&lt;/p&gt;

&lt;p&gt;It worked, until it didn’t.&lt;/p&gt;

&lt;p&gt;The system couldn’t answer basic questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do CTOs respond better to ROI or technical depth?&lt;/li&gt;
&lt;li&gt;Do founders care more about growth or vision?&lt;/li&gt;
&lt;li&gt;What actually leads to meetings?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It had no memory of outcomes. No feedback loop.&lt;/p&gt;

&lt;p&gt;So I stopped treating generation as the core problem. The real problem was learning.&lt;/p&gt;

&lt;p&gt;Adding memory with Hindsight&lt;/p&gt;

&lt;p&gt;I came across Hindsight and decided to use it as the memory layer.&lt;/p&gt;

&lt;p&gt;Instead of storing raw logs, I started treating every outreach event as a learning unit.&lt;/p&gt;

&lt;p&gt;Here’s a simplified retain call:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;response = requests.post(
    f"{self.base_url}/api/memories/retain",
    json=memory_data,
    headers=self.headers,
    timeout=10
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each memory looks something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "persona": "CTO",
  "industry": "SaaS",
  "message_angle": "ROI",
  "outcome": "meeting"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This isn’t just logging. It’s structured experience.&lt;/p&gt;
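&lt;p&gt;In practice that payload gets built from the tracked event before the retain call. A minimal sketch — the helper name and input shapes are placeholders, only the output fields mirror the example above:&lt;/p&gt;

```python
# Hypothetical helper: turn one outreach event into a structured memory
# payload. Field names mirror the example memory; everything else is assumed.
def build_memory(prospect, message, outcome):
    return {
        "persona": prospect["persona"],
        "industry": prospect["industry"],
        "message_angle": message["angle"],
        "outcome": outcome,  # e.g. "reply", "meeting", "ignore"
    }
```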

&lt;p&gt;Using similarity before generation&lt;/p&gt;

&lt;p&gt;Before generating a new message, I retrieve context in two ways.&lt;/p&gt;

&lt;p&gt;Vector similarity (Chroma):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;results = chroma_collection.query(
    query_texts=[prospect_description],
    n_results=5
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This gives me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;similar prospects&lt;/li&gt;
&lt;li&gt;past messages&lt;/li&gt;
&lt;li&gt;successful patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Memory recall (Hindsight):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;client.recall(
    bank_id="outbound",
    query="What messaging works for SaaS CTOs?"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This returns learned patterns and past outcomes.&lt;/p&gt;

&lt;p&gt;Now the system has context that actually matters.&lt;/p&gt;

&lt;p&gt;Generation becomes simpler&lt;/p&gt;

&lt;p&gt;Once I have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;similar examples (vector search)&lt;/li&gt;
&lt;li&gt;learned patterns (memory)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;message generation is no longer guesswork.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def generate_message(context):
    prompt = build_prompt(
        prospect=context.prospect,
        examples=context.similar_messages,
        insights=context.memory_patterns
    )
    return openai_client.generate(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The model is guided instead of improvising blindly.&lt;/p&gt;
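&lt;p&gt;One plausible shape for build_prompt, assuming it simply folds the retrieved examples and insights into the instructions — the internals here are illustrative, not the article's actual code:&lt;/p&gt;

```python
# Illustrative build_prompt: inject retrieved examples and learned insights
# into the generation prompt so the model is grounded in past outcomes.
def build_prompt(prospect, examples, insights):
    example_block = "\n".join(f"- {e}" for e in examples)
    insight_block = "\n".join(f"- {i}" for i in insights)
    return (
        f"Write a cold email for a {prospect['persona']} at a "
        f"{prospect['industry']} company.\n\n"
        f"Messages that worked for similar prospects:\n{example_block}\n\n"
        f"Learned patterns to apply:\n{insight_block}"
    )
```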

&lt;p&gt;Reflection changed everything&lt;/p&gt;

&lt;p&gt;The most useful feature wasn’t recall. It was reflection.&lt;/p&gt;

&lt;p&gt;Instead of manually analyzing results, I let the system reflect on past data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;client.reflect(
    bank_id="outbound",
    query="What patterns lead to successful outreach?"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This produces insights like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTOs respond better to ROI-focused messaging&lt;/li&gt;
&lt;li&gt;Founders engage more with growth narratives&lt;/li&gt;
&lt;li&gt;Technical detail increases replies but not meetings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These insights feed back into generation.&lt;/p&gt;

&lt;p&gt;That’s where the system starts to feel different.&lt;/p&gt;

&lt;p&gt;What this looks like in practice&lt;/p&gt;

&lt;p&gt;A typical flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a prospect (e.g., CTO at a SaaS company)&lt;/li&gt;
&lt;li&gt;Retrieve similar cases&lt;/li&gt;
&lt;li&gt;Recall memory insights&lt;/li&gt;
&lt;li&gt;Generate a message&lt;/li&gt;
&lt;li&gt;Track the outcome (reply, meeting, ignore)&lt;/li&gt;
&lt;li&gt;Store the memory&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example message:&lt;/p&gt;

&lt;p&gt;Hi Rahul,&lt;/p&gt;

&lt;p&gt;Noticed you're scaling a SaaS platform.&lt;br&gt;
We’ve helped similar teams improve outbound ROI significantly...&lt;/p&gt;

&lt;p&gt;If it leads to a meeting, that pattern gets reinforced.&lt;/p&gt;

&lt;p&gt;Over time, messaging shifts toward what works.&lt;/p&gt;

&lt;p&gt;Why the frontend matters&lt;/p&gt;

&lt;p&gt;I underestimated this initially.&lt;/p&gt;

&lt;p&gt;If users can’t see learning, they assume it isn’t happening.&lt;/p&gt;

&lt;p&gt;So I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reply rate trends&lt;/li&gt;
&lt;li&gt;persona performance&lt;/li&gt;
&lt;li&gt;generated insights&lt;/li&gt;
&lt;/ul&gt;
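&lt;p&gt;The aggregation behind a persona performance view can be a few lines over the tracked outcomes. A sketch under assumed field names (not the dashboard's actual code):&lt;/p&gt;

```python
from collections import defaultdict

# Illustrative aggregation: per-persona reply and meeting rates from
# tracked outcome events ("reply", "meeting", "ignore").
def persona_performance(events):
    stats = defaultdict(lambda: {"sent": 0, "reply": 0, "meeting": 0})
    for e in events:
        s = stats[e["persona"]]
        s["sent"] += 1
        if e["outcome"] in ("reply", "meeting"):
            s["reply"] += 1  # a meeting implies the prospect also replied
        if e["outcome"] == "meeting":
            s["meeting"] += 1
    return {
        persona: {
            "reply_rate": s["reply"] / s["sent"],
            "meeting_rate": s["meeting"] / s["sent"],
        }
        for persona, s in stats.items()
    }
```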

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;"CTOs respond better to ROI messaging than feature-heavy emails."&lt;/p&gt;

&lt;p&gt;Now the system shows its reasoning, not just outputs.&lt;/p&gt;

&lt;p&gt;Lessons learned&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Memory beats better prompts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Prompt engineering helps, but it doesn’t create continuity.&lt;/p&gt;

&lt;p&gt;Without memory, the system repeats mistakes.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Outcome tracking is everything&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Bad data ruins learning.&lt;/p&gt;

&lt;p&gt;A reply isn’t the same as a meeting.&lt;/p&gt;

&lt;p&gt;You need meaningful signals.&lt;/p&gt;
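&lt;p&gt;One way to encode that: score outcomes differently instead of treating any response as success. The weights below are arbitrary placeholders, not values from my system:&lt;/p&gt;

```python
# Illustrative: weight outcome signals so a meeting reinforces a pattern
# more than a reply does. The specific weights are placeholders.
OUTCOME_WEIGHTS = {
    "ignore": 0.0,
    "reply": 0.5,    # engagement, but not the goal
    "meeting": 1.0,  # the signal that should actually reinforce a pattern
}

def outcome_score(outcome):
    # Unknown outcomes contribute nothing rather than raising.
    return OUTCOME_WEIGHTS.get(outcome, 0.0)
```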

&lt;ol start="3"&gt;
&lt;li&gt;Similarity and memory work together&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Vector search gives examples. Memory gives patterns.&lt;/p&gt;

&lt;p&gt;Together, they give context.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Reflection is underrated&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Recall gives you data. Reflection gives you insight.&lt;/p&gt;

&lt;p&gt;That’s what actually improves behavior.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Most agents don’t learn&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;They generate outputs, but they don’t adapt.&lt;/p&gt;

&lt;p&gt;The difference is a feedback loop.&lt;/p&gt;

&lt;p&gt;Final thoughts&lt;/p&gt;

&lt;p&gt;I didn’t set out to build something complex.&lt;/p&gt;

&lt;p&gt;I just wanted a system that didn’t forget everything after each request.&lt;/p&gt;

&lt;p&gt;Adding memory turned a stateless generator into something that improves over time.&lt;/p&gt;

&lt;p&gt;Not because it’s impressive.&lt;/p&gt;

&lt;p&gt;Because it accumulates experience.&lt;/p&gt;

&lt;p&gt;And that’s the only way it actually gets better.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
