<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Baran Özdemir</title>
    <description>The latest articles on DEV Community by Baran Özdemir (@baranozdemir).</description>
    <link>https://dev.to/baranozdemir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3980261%2F350662a1-5c18-4be1-b081-045f1b3bb89b.jpeg</url>
      <title>DEV Community: Baran Özdemir</title>
      <link>https://dev.to/baranozdemir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/baranozdemir"/>
    <language>en</language>
    <item>
      <title>Why agents need memory that improves itself</title>
      <dc:creator>Baran Özdemir</dc:creator>
      <pubDate>Thu, 11 Jun 2026 23:55:59 +0000</pubDate>
      <link>https://dev.to/eidentic/why-agents-need-memory-that-improves-itself-513j</link>
      <guid>https://dev.to/eidentic/why-agents-need-memory-that-improves-itself-513j</guid>
      <description>&lt;p&gt;"Agent memory" usually means a vector database: embed everything the user said, query by similarity, paste the top matches into the prompt. It's a useful trick, but it isn't memory. It's a lookup table that never learns, never forgets correctly, and can't tell you what was true last month versus today. An agent built on it doesn't get smarter the longer you run it — it just accumulates more haystack to search.&lt;/p&gt;

&lt;p&gt;The name &lt;strong&gt;Eident&lt;/strong&gt;ic is deliberate: an agent without memory has no identity. We think real memory needs four things working together.&lt;/p&gt;

&lt;h2&gt;1. Facts with a lifetime&lt;/h2&gt;

&lt;p&gt;Plain vector recall has no concept of time. If a user was on the starter plan in March and the team plan in June, both sentences sit in the index with equal weight, and the model picks whichever embeds closer. That's how agents confidently tell you yesterday's truth.&lt;/p&gt;

&lt;p&gt;Eidentic stores facts in a &lt;strong&gt;temporal knowledge graph&lt;/strong&gt; where each fact carries a validity interval. New information &lt;em&gt;supersedes&lt;/em&gt; the old without deleting it: the agent can answer "what plan are they on now" and "what plan were they on in April" from the same store, and contradictions resolve instead of piling up. Memory that can't reason about time isn't memory — it's a cache.&lt;/p&gt;
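
&lt;p&gt;To make the validity-interval idea concrete, here is a minimal TypeScript sketch. It is not Eidentic's API: the &lt;code&gt;Fact&lt;/code&gt; shape, the &lt;code&gt;assertFact&lt;/code&gt; and &lt;code&gt;valueAsOf&lt;/code&gt; helpers, the &lt;code&gt;user-42&lt;/code&gt; id, and the 2026 dates are all assumptions made up for illustration. It only shows how superseding a fact by closing its interval, rather than deleting it, lets one store answer both "now" and "in April".&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; not Eidentic's actual API.
type Fact = {
  subject: string;
  predicate: string;
  value: string;
  validFrom: number;      // epoch ms, inclusive
  validTo: number | null; // epoch ms, exclusive; null means still current
};

// True when the fact is the current (open-ended) one for this subject/predicate.
function isCurrentFor(f: Fact, subject: string, predicate: string): boolean {
  if (f.subject !== subject) return false;
  if (f.predicate !== predicate) return false;
  return f.validTo === null;
}

// Record a new fact and close the interval of the one it supersedes.
// Nothing is deleted, so the history stays queryable.
function assertFact(
  facts: Fact[],
  subject: string,
  predicate: string,
  value: string,
  at: number,
): Fact[] {
  const closed = facts.map((f) =>
    isCurrentFor(f, subject, predicate) ? { ...f, validTo: at } : f,
  );
  return [...closed, { subject, predicate, value, validFrom: at, validTo: null }];
}

// Return the value that was valid at a given instant, if any.
function valueAsOf(
  facts: Fact[],
  subject: string,
  predicate: string,
  at: number,
): string | undefined {
  const match = facts.find((f) => {
    if (f.subject !== subject) return false;
    if (f.predicate !== predicate) return false;
    if (f.validFrom > at) return false;
    return f.validTo === null || f.validTo > at;
  });
  return match?.value;
}

// Illustrative dates (months are zero-based in Date.UTC).
const march = Date.UTC(2026, 2, 1);
const april = Date.UTC(2026, 3, 15);
const june = Date.UTC(2026, 5, 1);
const july = Date.UTC(2026, 6, 1);

let facts: Fact[] = [];
facts = assertFact(facts, "user-42", "plan", "starter", march);
facts = assertFact(facts, "user-42", "plan", "team", june);

const planInApril = valueAsOf(facts, "user-42", "plan", april);
const planNow = valueAsOf(facts, "user-42", "plan", july);
console.log(planInApril, planNow); // prints: starter team
```

&lt;p&gt;The design choice worth noticing is that a write never removes anything: it only sets &lt;code&gt;validTo&lt;/code&gt; on the fact it replaces, so the contradiction between "starter" and "team" becomes an ordered history instead of two competing rows.&lt;/p&gt;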

&lt;h2&gt;2. Memory the agent edits itself&lt;/h2&gt;

&lt;p&gt;People don't store a transcript of every conversation; they keep a running summary and revise it. Eidentic gives agents &lt;strong&gt;self-editing memory blocks&lt;/strong&gt; — compact, structured notes the agent rewrites as it learns — plus passive extraction that pulls salient facts out of every turn automatically. You don't write ingestion pipelines or decide what to remember; the agent maintains its own working memory.&lt;/p&gt;

&lt;h2&gt;3. Consolidation between sessions&lt;/h2&gt;

&lt;p&gt;If memory only ever grows, retrieval gets slower and noisier over time — the opposite of improving. Eidentic runs &lt;strong&gt;sleep-time consolidation&lt;/strong&gt;: between sessions it compresses and merges what was learned, so the next session starts knowing more without a larger prompt. This is the step that makes memory &lt;em&gt;self-improving&lt;/em&gt; rather than merely &lt;em&gt;cumulative&lt;/em&gt;.&lt;/p&gt;
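
&lt;p&gt;As a rough illustration only, one very simple form of consolidation is collapsing repeated notes into a single entry between sessions. The sketch below is not how Eidentic does it (the real step is described here only as compressing and merging); the &lt;code&gt;Note&lt;/code&gt; shape, the &lt;code&gt;consolidate&lt;/code&gt; function, and the exact-match rule on normalized text are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical note shape for illustration; not Eidentic's data model.
type Note = { text: string; seenAt: number; mentions: number };

// Merge notes whose text matches after trimming and lowercasing.
// The merged note keeps the first wording, the latest timestamp,
// and the total number of mentions.
function consolidate(notes: Note[]): Note[] {
  const merged: { [key: string]: Note } = {};
  for (const note of notes) {
    const key = note.text.trim().toLowerCase();
    const existing = merged[key];
    merged[key] = existing
      ? {
          text: existing.text,
          seenAt: Math.max(existing.seenAt, note.seenAt),
          mentions: existing.mentions + note.mentions,
        }
      : { ...note };
  }
  return Object.values(merged);
}

const sessionNotes: Note[] = [
  { text: "Prefers email", seenAt: 1, mentions: 1 },
  { text: "prefers email ", seenAt: 5, mentions: 1 },
  { text: "On the team plan", seenAt: 3, mentions: 1 },
];

const consolidated = consolidate(sessionNotes);
console.log(consolidated.length); // prints: 2
```

&lt;p&gt;Three notes go in and two come out, so the store the next session retrieves from is smaller while still recording that the preference was seen twice.&lt;/p&gt;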

&lt;h2&gt;4. Recall you can trust&lt;/h2&gt;

&lt;p&gt;Lexical and vector retrieval each miss things the other catches, so Eidentic fuses both with reciprocal-rank fusion and returns results with citations. An answer drawn from memory can point at the session and the fact it came from — which matters the moment an agent does anything consequential.&lt;/p&gt;
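
&lt;p&gt;Reciprocal-rank fusion itself is a small, well-known algorithm: each ranked list contributes 1 / (k + rank) to a document's score, and the summed scores decide the fused order. The sketch below is a generic version, not Eidentic's implementation; the function name, the fact ids, and the choice of k = 60 (a commonly used default) are assumptions.&lt;/p&gt;

```typescript
// Generic reciprocal-rank fusion. Each list adds 1 / (k + rank) to a
// document's score, with rank starting at 1. k = 60 is an assumed default.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: { [id: string]: number } = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores[id] = (scores[id] ?? 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical result lists, best match first.
const lexicalHits = ["fact-plan-team", "fact-billing-email", "fact-timezone"];
const vectorHits = ["fact-timezone", "fact-plan-team", "fact-renewal-date"];

const fused = reciprocalRankFusion([lexicalHits, vectorHits]);
console.log(fused[0]); // prints: fact-plan-team
```

&lt;p&gt;Because only ranks are used, the lexical and vector scores never need to be put on a common scale; a result that sits near the top of both lists beats one that tops a single list and is missing from the other.&lt;/p&gt;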

&lt;h2&gt;Why this is the hard part&lt;/h2&gt;

&lt;p&gt;Wiring a model to a tool loop is a weekend. Memory that stays correct as it grows — across contradictions, across time, without ballooning the prompt — is the part teams underestimate and then rebuild three times. It's also what separates a demo from an agent you'd put in front of real users for months.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An agent without memory has no identity. Eidentic gives agents theirs — and keeps it honest as the world changes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't theoretical: on long histories, retrieval-based memory is measurably more accurate and cheaper than stuffing everything into context (&lt;a href="https://eidentic.dev/blog/memory-beats-full-context" rel="noopener noreferrer"&gt;the benchmarks&lt;/a&gt;). If you're building something an agent has to remember, start with the &lt;a href="https://docs.eidentic.dev" rel="noopener noreferrer"&gt;docs&lt;/a&gt; or the &lt;a href="https://github.com/eidentic/eidentic" rel="noopener noreferrer"&gt;source&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Memory beats full context on LongMemEval — and the wins we don't get</title>
      <dc:creator>Baran Özdemir</dc:creator>
      <pubDate>Thu, 11 Jun 2026 23:48:44 +0000</pubDate>
      <link>https://dev.to/eidentic/memory-beats-full-context-on-longmemeval-and-the-wins-we-dont-get-303c</link>
      <guid>https://dev.to/eidentic/memory-beats-full-context-on-longmemeval-and-the-wins-we-dont-get-303c</guid>
      <description>&lt;p&gt;A common objection to agent memory is that you don't need it: context windows are huge now, so just put the whole history in the prompt. We wanted a real answer, not a vibe, so we ran two public long-term-memory benchmarks against a full-context baseline. Here's what we found — including the case where the baseline wins.&lt;/p&gt;

&lt;h2&gt;The setup&lt;/h2&gt;

&lt;p&gt;We compared two configurations on the same questions. The &lt;strong&gt;full-context baseline&lt;/strong&gt; stuffs the entire conversation history into the prompt. &lt;strong&gt;Eidentic memory&lt;/strong&gt; ingests the history into its four-tier engine and retrieves only what each question needs. Both use the same model and the same LLM judge. We ran the full sets — no sampling — and we're publishing wins and losses together.&lt;/p&gt;

&lt;h2&gt;LongMemEval: memory wins across the board&lt;/h2&gt;

&lt;p&gt;LongMemEval uses long histories (roughly 115k tokens across ~50 sessions) and 500 questions. This is where memory should help, and it does: &lt;strong&gt;55.2% overall vs 41.0%&lt;/strong&gt; for full context, a 14.2-point gain, with memory ahead on all six question types.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question type&lt;/th&gt;
&lt;th&gt;Full context&lt;/th&gt;
&lt;th&gt;Eidentic memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-session · user&lt;/td&gt;
&lt;td&gt;67.1%&lt;/td&gt;
&lt;td&gt;84.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-session · assistant&lt;/td&gt;
&lt;td&gt;73.2%&lt;/td&gt;
&lt;td&gt;92.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-session · preference&lt;/td&gt;
&lt;td&gt;3.3%&lt;/td&gt;
&lt;td&gt;26.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-session&lt;/td&gt;
&lt;td&gt;27.8%&lt;/td&gt;
&lt;td&gt;42.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temporal reasoning&lt;/td&gt;
&lt;td&gt;20.3%&lt;/td&gt;
&lt;td&gt;34.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge update&lt;/td&gt;
&lt;td&gt;66.7%&lt;/td&gt;
&lt;td&gt;70.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;55.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The cost difference is the other half of the story. Memory answers each question with about &lt;strong&gt;2,550 tokens&lt;/strong&gt; of retrieved context; the baseline spends about &lt;strong&gt;99,435 tokens&lt;/strong&gt; re-reading the whole history every time. That works out to &lt;strong&gt;~39× fewer tokens&lt;/strong&gt; for the better score. Retrieval isn't just more accurate here, it's dramatically cheaper.&lt;/p&gt;
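
&lt;p&gt;The ratio and the point gain follow directly from the figures quoted in this post, and a few lines of TypeScript make the arithmetic easy to re-check. The variable names are illustrative, and the token counts are the approximate per-question figures given above.&lt;/p&gt;

```typescript
// Figures quoted in this post for LongMemEval.
const memoryTokens = 2550;      // approx. tokens of retrieved context per question
const baselineTokens = 99435;   // approx. tokens for the full-context baseline
const memoryAccuracy = 55.2;    // percent, overall
const baselineAccuracy = 41.0;  // percent, overall

const tokenRatio = baselineTokens / memoryTokens;
const pointGain = memoryAccuracy - baselineAccuracy;

console.log(tokenRatio.toFixed(1)); // prints: 39.0
console.log(pointGain.toFixed(1));  // prints: 14.2
```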

&lt;h2&gt;LoCoMo: where full context still wins&lt;/h2&gt;

&lt;p&gt;LoCoMo has a much smaller haystack. When the entire history comfortably fits in the window, brute force is hard to beat: the model can see everything at once, and single- and multi-hop questions don't need retrieval. Here the full-context baseline comes out &lt;strong&gt;7.8 points ahead&lt;/strong&gt;. Memory still uses far fewer tokens (~893 vs ~19,030), but on a small history that trade-off doesn't pay for itself on accuracy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The larger the history, the more memory wins — on accuracy and on cost. On small histories, full context stays competitive. We'd rather you know both numbers than just the flattering one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What this means in practice&lt;/h2&gt;

&lt;p&gt;If your agent's conversations are short and bounded, you may not need a memory engine at all — and we'll tell you that. But the moment histories grow past what you want to pay to re-read on every turn, retrieval-based memory wins twice: better answers, far fewer tokens. That crossover arrives quickly in real products.&lt;/p&gt;

&lt;p&gt;Full methodology, the harness, and the raw per-question records are in the &lt;a href="https://docs.eidentic.dev/guides/benchmarks" rel="noopener noreferrer"&gt;benchmarks&lt;/a&gt; docs, and the runner lives in the &lt;a href="https://github.com/eidentic/eidentic" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. Reproduce it, and tell us where we're wrong.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing Eidentic</title>
      <dc:creator>Baran Özdemir</dc:creator>
      <pubDate>Thu, 11 Jun 2026 23:47:57 +0000</pubDate>
      <link>https://dev.to/eidentic/introducing-eidentic-51jl</link>
      <guid>https://dev.to/eidentic/introducing-eidentic-51jl</guid>
      <description>&lt;p&gt;Today we're releasing &lt;strong&gt;Eidentic&lt;/strong&gt;, an open-source TypeScript SDK for building AI agents with self-improving memory and the production fundamentals built in — not bolted on. It's Apache-2.0, with no enterprise tier, and it runs on Node, Bun, Deno, and the edge.&lt;/p&gt;

&lt;h2&gt;The two things you keep rebuilding&lt;/h2&gt;

&lt;p&gt;Every serious agent eventually needs the same two things, and most stacks make you assemble both yourself.&lt;/p&gt;

&lt;p&gt;The first is &lt;strong&gt;memory that actually improves&lt;/strong&gt;. Not a vector store you query and paste into a prompt, but something that remembers across sessions, resolves contradictions, and gets sharper the longer it runs. The second is the &lt;strong&gt;production layer&lt;/strong&gt;: durable runs, cost limits that are actually enforced, multi-tenant isolation, sandboxed tools, evals that gate CI. In most ecosystems that layer shows up late, as an enterprise add-on, or never.&lt;/p&gt;

&lt;p&gt;Eidentic ships both, in one composable package, fully open.&lt;/p&gt;

&lt;h2&gt;Thirty seconds to a memory-backed agent&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;eidentic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;AIModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;SqliteStore&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;eidentic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;anthropic&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@ai-sdk/anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;support&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a support agent. Remember the user.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AIModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="na"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SqliteStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./eidentic.sqlite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ev&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What did we decide last week?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;u-42&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stream.delta&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent recalls prior sessions for that &lt;code&gt;sessionId&lt;/code&gt; inside &lt;code&gt;query()&lt;/code&gt;, with citations, and consolidates what it learned while idle. Swap &lt;code&gt;SqliteStore&lt;/code&gt; for the store from &lt;code&gt;@eidentic/libsql&lt;/code&gt; or &lt;code&gt;@eidentic/postgres&lt;/code&gt; and the agent code doesn't change. That's the ports-and-adapters design running through the whole SDK.&lt;/p&gt;

&lt;h2&gt;What's in the box&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;four-tier memory engine&lt;/strong&gt;: lexical + vector recall, self-editing memory blocks, a temporal knowledge graph, and sleep-time consolidation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durable execution&lt;/strong&gt; — checkpoint and resume with exactly-once tool dispatch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforced cost ceilings&lt;/strong&gt;, rate limits, quotas, and multi-tenant isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxed tools&lt;/strong&gt;, deny-by-default permissions, and one-call GDPR erasure.&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;eval harness&lt;/strong&gt; with a CI pass-rate gate, an MCP host + server with OAuth, and A2A.&lt;/li&gt;
&lt;li&gt;First-class &lt;strong&gt;React hooks&lt;/strong&gt;, a &lt;strong&gt;Next.js&lt;/strong&gt; handler, and a CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Honest about where we are&lt;/h2&gt;

&lt;p&gt;Eidentic is pre-1.0 and stabilizing toward v1. We'd rather over-disclose gaps than oversell, so we publish our benchmarks in full — including the ones we lose. On &lt;a href="https://eidentic.dev/blog/memory-beats-full-context" rel="noopener noreferrer"&gt;LongMemEval&lt;/a&gt;, memory beats a full-context baseline by 14.2 points at up to ~39× fewer tokens; on LoCoMo's smaller haystack, full context still wins. Both runs are public.&lt;/p&gt;

&lt;h2&gt;Get started&lt;/h2&gt;

&lt;p&gt;Read the &lt;a href="https://docs.eidentic.dev" rel="noopener noreferrer"&gt;docs&lt;/a&gt;, browse the &lt;a href="https://github.com/eidentic/eidentic" rel="noopener noreferrer"&gt;source on GitHub&lt;/a&gt;, or clone an example for &lt;a href="https://github.com/eidentic/example-nextjs" rel="noopener noreferrer"&gt;Next.js&lt;/a&gt;, &lt;a href="https://github.com/eidentic/example-react" rel="noopener noreferrer"&gt;React&lt;/a&gt;, or &lt;a href="https://github.com/eidentic/example-express" rel="noopener noreferrer"&gt;Express&lt;/a&gt;. If you build something with it, we'd love to hear about it.&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
