<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vaibhav Kushwaha</title>
    <description>The latest articles on DEV Community by Vaibhav Kushwaha (@vaibhav_kushwaha_e8eb243e).</description>
    <link>https://dev.to/vaibhav_kushwaha_e8eb243e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875272%2F8c34d3b7-cb9b-4632-b1eb-270942fd1232.png</url>
      <title>DEV Community: Vaibhav Kushwaha</title>
      <link>https://dev.to/vaibhav_kushwaha_e8eb243e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vaibhav_kushwaha_e8eb243e"/>
    <language>en</language>
    <item>
      <title>Finley Memory System</title>
      <dc:creator>Vaibhav Kushwaha</dc:creator>
      <pubDate>Sun, 12 Apr 2026 17:51:57 +0000</pubDate>
      <link>https://dev.to/vaibhav_kushwaha_e8eb243e/finley-j3h</link>
      <guid>https://dev.to/vaibhav_kushwaha_e8eb243e/finley-j3h</guid>
      <description>&lt;p&gt;The first time I showed someone Finley processing the same vendor’s invoice twice, they paused and said, “It’s like it actually worked there before.” That single line ended up describing the entire point of what I was trying to build. Most AI agents, especially those built on top of LLMs, are stateless by design. Every request is treated as a completely fresh problem. That works fine for simple validation tasks, but it breaks down quickly in real-world workflows like invoice processing, where context and history matter more than raw correctness. An accounts payable system is not just about checking numbers; it’s about recognizing patterns, remembering vendor behavior, and applying past decisions consistently. Without memory, none of that exists.&lt;/p&gt;

&lt;p&gt;In a typical stateless setup, an invoice comes in, gets parsed, validated, and either approved or flagged. The agent can tell you if totals add up or if fields are missing, but it has no idea that a specific vendor has submitted duplicate invoices before or that their billing system consistently introduces small rounding differences that the finance team has already decided to ignore. That missing layer is institutional knowledge. In most small and mid-sized businesses, that knowledge lives in someone’s head, usually a senior accountant. Over time, they build intuition about vendors, patterns, and exceptions. When that person leaves, all that context disappears, and the system goes back to treating everything as new.&lt;/p&gt;

&lt;p&gt;Finley was built to address exactly that gap by introducing persistent memory into the workflow. Instead of treating each invoice as an isolated event, the system builds a history for every vendor and uses that history when making future decisions. I used Hindsight as the memory layer to make this possible. The idea is simple but powerful: every interaction with a vendor gets stored as a structured memory entry, and future interactions retrieve and use that context.&lt;/p&gt;

&lt;p&gt;Each vendor is assigned its own namespace so that their data stays isolated. When an invoice is processed, the system stores not just the invoice details but also the agent’s decision and any human feedback. The storage logic looks like this: &lt;code&gt;const payload = { namespace: `vendor:${vendorName.toLowerCase()}`, content: JSON.stringify(entry), metadata: { vendorName, invoiceId: entry.invoiceId, verdict: entry.agentDecision, userAction: entry.userAction } }; await hindsightFetch("/v1/memories", "POST", payload);&lt;/code&gt;. This ensures that every decision becomes part of a growing, structured history tied to that vendor.&lt;/p&gt;
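
&lt;p&gt;To make the shape of that payload concrete, here is the builder pulled out as a pure function. The endpoint path and field names are the ones from my setup; if your memory backend differs, treat them as assumptions.&lt;/p&gt;

```javascript
// Sketch of the payload builder behind the snippet above, pulled out as a
// pure function. Field names match my Hindsight setup; treat them as
// assumptions if you are wiring this up against a different backend.
function buildMemoryPayload(vendorName, entry) {
  return {
    // one namespace per vendor keeps histories isolated
    namespace: `vendor:${vendorName.toLowerCase()}`,
    content: JSON.stringify(entry),
    metadata: {
      vendorName,
      invoiceId: entry.invoiceId,
      verdict: entry.agentDecision,
      userAction: entry.userAction,
    },
  };
}

const payload = buildMemoryPayload("Acme Corp", {
  invoiceId: "INV-1042",
  agentDecision: "hold",
  userAction: "approved",
});
console.log(payload.namespace); // "vendor:acme corp"
```

&lt;p&gt;With the payload built, the actual write is the single &lt;code&gt;await hindsightFetch(...)&lt;/code&gt; call shown above.&lt;/p&gt;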

&lt;p&gt;When the next invoice arrives from the same vendor, the system doesn’t start from scratch. Instead, it retrieves relevant past interactions and feeds them into the agent’s reasoning process. That context is built using logic like: &lt;code&gt;const memoryContext = memory.length === 0 ? "No prior history for this vendor. This is the first invoice." : `Vendor has ${memory.length} prior invoice(s) on record:\n${JSON.stringify(memory, null, 2)}`;&lt;/code&gt;. This memory context is passed along with the current invoice data, allowing the model to reason not just about the present input but also about historical behavior. This is where the shift happens. The agent stops behaving like a validator and starts behaving like someone who has experience.&lt;/p&gt;
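
&lt;p&gt;Extracted as a standalone function, the two branches of that context builder look like this (a minimal sketch; the entry shape is illustrative):&lt;/p&gt;

```javascript
// The context builder from the snippet above as a standalone function,
// so both branches are explicit. The entry shape is illustrative.
function buildMemoryContext(memory) {
  if (memory.length === 0) {
    return "No prior history for this vendor. This is the first invoice.";
  }
  return `Vendor has ${memory.length} prior invoice(s) on record:\n${JSON.stringify(memory, null, 2)}`;
}

// first invoice: the empty branch
console.log(buildMemoryContext([]));
// "No prior history for this vendor. This is the first invoice."

// later invoices: the history branch, serialized for the prompt
console.log(buildMemoryContext([{ invoiceId: "INV-1041", verdict: "approved" }]));
```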

&lt;p&gt;To make this difference clear, I ran a simple experiment. I seeded historical data for a vendor with known patterns, including duplicate invoices, mismatched payment terms, and a previously resolved billing dispute. When processing the first invoice without any memory, the system performed generic checks and approved it without much insight. There was no context, no reasoning beyond surface-level validation. But when processing the tenth invoice after accumulating memory, the behavior changed significantly. The agent recognized the duplicate pattern, flagged the payment terms mismatch, recalled the earlier dispute, and held the invoice for review with a much higher confidence level. The prompt did not change. The logic did not change. The only difference was memory.&lt;/p&gt;

&lt;p&gt;One design decision that turned out to be important was how the memory layer behaves when Hindsight is not available. Instead of making it a hard dependency, I added a fallback to a local in-memory store. The check is simple: &lt;code&gt;if (!HINDSIGHT_API_KEY) { console.warn("[memory] HINDSIGHT_API_KEY not set — using local fallback"); return null; }&lt;/code&gt;. This allows the system to still demonstrate learning within a session, even if persistence across restarts is not available. It made development faster and ensured that the core idea could be tested without external dependencies.&lt;/p&gt;
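
&lt;p&gt;The fallback itself can be as small as a Map keyed by namespace. This is a session-scoped sketch, not the Hindsight API; it only mirrors the two calls the agent needs:&lt;/p&gt;

```javascript
// Minimal sketch of the local fallback: a Map keyed by namespace. It only
// mirrors the two calls the agent needs (store and recall), so a missing
// HINDSIGHT_API_KEY degrades to session-scoped memory instead of failing.
const localStore = new Map();

function localStoreMemory(namespace, entry) {
  if (!localStore.has(namespace)) {
    localStore.set(namespace, []);
  }
  localStore.get(namespace).push(entry);
}

function localRecallMemory(namespace) {
  return localStore.get(namespace) ?? [];
}

localStoreMemory("vendor:acme corp", { invoiceId: "INV-1042", verdict: "hold" });
console.log(localRecallMemory("vendor:acme corp").length); // 1
console.log(localRecallMemory("vendor:unknown").length);   // 0
```

&lt;p&gt;Nothing here survives a restart, which is exactly the trade-off: persistence needs Hindsight, but within-session learning still works.&lt;/p&gt;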

&lt;p&gt;Another key insight was how memory should be structured. Scoping by vendor turned out to be critical. If memory is not properly partitioned, patterns from one vendor could influence decisions for another, which would break the system’s reliability. By assigning each vendor its own namespace, the system maintains clean boundaries. This also makes reasoning more accurate, as the retrieved context is always relevant to the current interaction.&lt;/p&gt;

&lt;p&gt;Semantic retrieval also played a big role. Instead of fetching exact matches, the system retrieves the most relevant past entries. This means that even if the invoice amount or date is slightly different, the agent can still recognize patterns and apply past knowledge. This is what allows the system to generalize instead of just matching records.&lt;/p&gt;
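
&lt;p&gt;Hindsight does the actual similarity search server-side, so the ranking below is only a stand-in: a toy token-overlap score that shows the shape of “most relevant entries, not exact matches”:&lt;/p&gt;

```javascript
// Stand-in for semantic retrieval. The real similarity search happens
// server-side over embeddings; this toy token-overlap score only
// illustrates ranking by relevance instead of exact matching.
function tokenize(text) {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function relevance(query, entry) {
  const q = tokenize(query);
  const e = tokenize(entry.content);
  let shared = 0;
  for (const t of q) {
    if (e.has(t)) shared += 1;
  }
  return shared / Math.max(q.size, 1);
}

function retrieveRelevant(query, entries, k) {
  return entries
    .map((entry) => ({ entry, score: relevance(query, entry) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.entry);
}

const history = [
  { content: "duplicate invoice flagged for rounding difference" },
  { content: "payment terms changed from net 30 to net 45" },
];
const top = retrieveRelevant("possible duplicate invoice", history, 1);
console.log(top[0].content); // "duplicate invoice flagged for rounding difference"
```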

&lt;p&gt;The most interesting part of this setup is the feedback loop. Every time a human corrects the agent — marking an invoice as a duplicate, adjusting payment terms, or approving something that was flagged — that action becomes a new memory. Over time, the system improves not because it was retrained, but because it accumulated experience. The memory itself becomes the training signal.&lt;/p&gt;
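
&lt;p&gt;A sketch of that loop, with field names matching the payload shown earlier; the &lt;code&gt;corrected&lt;/code&gt; flag and timestamp are illustrative additions of mine:&lt;/p&gt;

```javascript
// Sketch of the feedback loop: a human correction becomes a new memory
// entry instead of a retraining step. Field names match the payload shown
// earlier; the corrected flag and timestamp are illustrative additions.
function recordFeedback(history, invoiceId, agentDecision, userAction) {
  const entry = {
    invoiceId,
    agentDecision,
    userAction,
    corrected: userAction !== agentDecision,
    recordedAt: new Date().toISOString(),
  };
  history.push(entry); // in production this write goes to the memory store
  return entry;
}

const history = [];
const fb = recordFeedback(history, "INV-1043", "approve", "mark_duplicate");
console.log(fb.corrected); // true
```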

&lt;p&gt;What changed in Finley wasn’t just accuracy, but the nature of its decisions. The agent started providing reasoning that referenced past behavior, not just current inputs. It became more confident, more consistent, and more aligned with how humans actually process invoices. Instead of reacting to each invoice in isolation, it started building a narrative around each vendor.&lt;/p&gt;

&lt;p&gt;In the end, the biggest takeaway from building this was that memory is not just an add-on feature. It fundamentally changes how an agent behaves. Without memory, you have a tool that validates inputs. With memory, you have a system that learns from experience. And that difference is what makes AI agents actually useful in real-world workflows.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
