<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ninad Pathak</title>
    <description>The latest articles on DEV Community by Ninad Pathak (@ninadwrites).</description>
    <link>https://dev.to/ninadwrites</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3733015%2F54b87894-6e02-4d92-b9bf-24e4e078ce37.png</url>
      <title>DEV Community: Ninad Pathak</title>
      <link>https://dev.to/ninadwrites</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ninadwrites"/>
    <language>en</language>
    <item>
      <title>Self-Hosting Mem0: A Complete Docker Deployment Guide</title>
      <dc:creator>Ninad Pathak</dc:creator>
      <pubDate>Mon, 23 Feb 2026 18:47:15 +0000</pubDate>
      <link>https://dev.to/mem0/self-hosting-mem0-a-complete-docker-deployment-guide-154i</link>
      <guid>https://dev.to/mem0/self-hosting-mem0-a-complete-docker-deployment-guide-154i</guid>
      <description>&lt;p&gt;You ship an AI assistant, users love it, and then legal asks where the conversation data lives.&lt;/p&gt;

&lt;p&gt;Nobody has a great answer when the memory layer runs on someone else's servers, priced at whatever the provider decides next quarter. Self-hosting removes the problem entirely.&lt;/p&gt;

&lt;p&gt;Mem0's open-source server packages the full self-hosting stack into three Docker containers: FastAPI for the REST API, PostgreSQL with &lt;a href="https://docs.mem0.ai/components/vectordbs/dbs/pgvector" rel="noopener noreferrer"&gt;pgvector for embeddings&lt;/a&gt;, and Neo4j for entity relationships. Everything stays on your network.&lt;/p&gt;

&lt;p&gt;You'll go from an empty directory to a running deployment, then work through the REST API, swap in local models for offline operation, harden things for production, and deploy to AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Mem0's self-hosted stack is three Docker containers: the API server, PostgreSQL with pgvector, and Neo4j. One &lt;code&gt;docker compose up&lt;/code&gt; gets you running.&lt;/li&gt;
&lt;li&gt;The REST API handles full CRUD on memories without needing the Python SDK. Curl works fine.&lt;/li&gt;
&lt;li&gt;Default setup is OpenAI (gpt-5-nano for extraction, text-embedding-3-small for embeddings). Swap both for Ollama models to go fully offline.&lt;/li&gt;
&lt;li&gt;No auth and wide-open CORS by default. You'll need a reverse proxy before exposing this to any network.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploying the stack step by step
&lt;/h2&gt;

&lt;p&gt;You don't need to clone the Mem0 repo. The whole deployment is three files in a fresh directory, and Docker handles the rest.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;You need Docker Desktop, which bundles Docker Compose v2, and an OpenAI API key. The default LLM and embedding model both call OpenAI, so the key is required even though everything else runs on your machine.&lt;/p&gt;
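
&lt;p&gt;If you're not sure which Compose you have, you can check before going further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A v2 version string means you're set. If the subcommand isn't found, you're on the legacy standalone &lt;code&gt;docker-compose&lt;/code&gt; binary and should upgrade to a current Docker Desktop.&lt;/p&gt;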

&lt;h3&gt;
  
  
  Setting up the files
&lt;/h3&gt;

&lt;p&gt;Create a fresh directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; mem0-deploy &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;mem0-deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll add three files here. First, the &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-your-key-here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, a &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; mem0/mem0-api-server:latest&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="s2"&gt;"psycopg[binary,pool]"&lt;/span&gt; &lt;span class="s2"&gt;"mem0ai[graph]"&lt;/span&gt; rank-bm25 langchain-neo4j neo4j
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some older versions of the self-hosted Mem0 image ship without all of their runtime dependencies, which shows up as an &lt;code&gt;ImportError&lt;/code&gt; on startup for &lt;code&gt;psycopg&lt;/code&gt;, &lt;code&gt;langchain-neo4j&lt;/code&gt;, and others. This Dockerfile pulls the official image and installs the missing packages on top.&lt;/p&gt;

&lt;p&gt;The third file is &lt;code&gt;docker-compose.yaml&lt;/code&gt;, which wires all three services together. The full file is on &lt;a href="https://gist.github.com/BexTuychiev/2597529900335a66265dff5955f1cebf" rel="noopener noreferrer"&gt;this GitHub gist&lt;/a&gt; if you want to copy-paste it in one shot. Below, it's broken into chunks so you can see what each service does and why.&lt;/p&gt;

&lt;p&gt;Start with the mem0 service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mem0-selfhost&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem0&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mem0-selfhost:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8888:8000"&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mem0_network&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./history:/app/history&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
      &lt;span class="na"&gt;neo4j&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PYTHONDONTWRITEBYTECODE=1&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PYTHONUNBUFFERED=1&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_HOST=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PORT=5432&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_DB=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_USER=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_URI=bolt://neo4j:7687&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_USERNAME=neo4j&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_PASSWORD=mem0graph&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;build: .&lt;/code&gt; tells Compose to build from your Dockerfile instead of pulling the prebuilt image. That's how the patched dependencies make it into the container. The &lt;code&gt;depends_on&lt;/code&gt; block with &lt;code&gt;condition: service_healthy&lt;/code&gt; is worth paying attention to. Compose won't start the mem0 server until both databases pass their health checks. Without it, the API tries to connect before anything is listening and crashes immediately.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;volumes&lt;/code&gt; line mounts a local &lt;code&gt;./history&lt;/code&gt; directory where Mem0 writes a SQLite audit trail of every memory operation. Port &lt;code&gt;8888:8000&lt;/code&gt; maps the container's internal port to 8888 on your host, which is where you'll hit the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ankane/pgvector:v0.5.1&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;on-failure&lt;/span&gt;
    &lt;span class="na"&gt;shm_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128mb"&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mem0_network&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_USER=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD=postgres&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pg_isready"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-q"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-d"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-U"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_db:/var/lib/postgresql/data&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8432:5432"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't standard PostgreSQL. The &lt;code&gt;ankane/pgvector&lt;/code&gt; image comes with the pgvector extension pre-installed, which adds vector similarity search to Postgres. That's what Mem0 uses to store and query embeddings. The port is mapped to 8432 instead of the default 5432 so it won't collide with any Postgres instance already running on your machine. The health check runs &lt;code&gt;pg_isready&lt;/code&gt; every 5 seconds, and that's what the mem0 service waits on before booting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;neo4j&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;neo4j:5.26.4&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mem0_network&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wget http://localhost:7687 || exit &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;90s&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8474:7474"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8687:7687"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;neo4j_data:/data&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_AUTH=neo4j/mem0graph&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_PLUGINS=["apoc"]&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_apoc_export_file_enabled=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_apoc_import_file_enabled=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_apoc_import_file_use__neo4j__config=true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Neo4j handles the graph side of Mem0's memory pipeline. When the system extracts entities and relationships, such as &lt;code&gt;user → prefers → Python&lt;/code&gt;, they land here. The &lt;code&gt;NEO4J_PLUGINS=["apoc"]&lt;/code&gt; line loads the APOC plugin, a collection of utility procedures Mem0 depends on for graph operations.&lt;/p&gt;

&lt;p&gt;Note the &lt;code&gt;start_period: 90s&lt;/code&gt; in the health check. Neo4j is the slowest container to initialize, and this gives it a 90-second grace period before Compose starts counting failed checks. Two ports are exposed: 8474 for the Neo4j browser UI and 8687 for Bolt protocol connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;neo4j_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres_db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem0_network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The named volumes persist database files on your host so memories survive container restarts. If you ever want a clean slate, &lt;code&gt;docker compose down -v&lt;/code&gt; wipes them. The bridge network keeps all three containers discoverable to each other by service name.&lt;/p&gt;
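
&lt;p&gt;Before the first build, it's worth checking that the assembled file parses. Compose can validate the configuration without starting anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose config --quiet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Silence means the YAML is valid. Any indentation mistake or unknown key gets reported here instead of surfacing as a confusing failure mid-deploy.&lt;/p&gt;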

&lt;h3&gt;
  
  
  Bringing the stack up
&lt;/h3&gt;

&lt;p&gt;Build and start everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--build&lt;/code&gt; flag tells Compose to build the image from your Dockerfile before starting. On later runs where the Dockerfile hasn't changed, you can drop it and just use &lt;code&gt;docker compose up -d&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First run pulls three base images, roughly 500 MB total, and installs the Python dependencies inside the mem0 container. Expect 2 to 5 minutes depending on your connection.&lt;/p&gt;

&lt;p&gt;Check that all three containers are running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME                         IMAGE                    STATUS                      PORTS
mem0-selfhost-mem0-1         mem0-selfhost:latest    Up 16 seconds               0.0.0.0:8888-&amp;gt;8000/tcp
mem0-selfhost-neo4j-1        neo4j:5.26.4            Up 26 seconds (healthy)     ...
mem0-selfhost-postgres-1     ankane/pgvector:v0.5.1  Up 26 seconds (healthy)     ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hit the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8888/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns a 307 redirect to &lt;code&gt;/docs&lt;/code&gt;, the auto-generated OpenAPI page. Open &lt;code&gt;http://localhost:8888/docs&lt;/code&gt; in a browser to see every available endpoint and test them interactively. That's the closest thing to a UI the self-hosted version has. The dashboard is platform-only.&lt;/p&gt;

&lt;p&gt;If the mem0 container exits instead of staying up, check the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose logs mem0 &lt;span class="nt"&gt;--tail&lt;/span&gt; 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of the time it's a missing dependency, which the custom Dockerfile should handle, or a database connection timeout. For the timeout case, run &lt;code&gt;docker compose up -d&lt;/code&gt; again. Neo4j needs the full 90-second start period on its first boot, and sometimes the first attempt times out before the health check passes.&lt;/p&gt;
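
&lt;p&gt;You can also rule out the databases directly. Both containers ship with their own client tools, so a quick exec'd command against each confirms they're actually accepting connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose exec postgres pg_isready -U postgres
docker compose exec neo4j cypher-shell -u neo4j -p mem0graph "RETURN 1;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If both succeed but the mem0 container still won't stay up, the problem is inside the API container itself rather than the databases, and the logs are the place to look.&lt;/p&gt;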

&lt;h2&gt;
  
  
  Using the self-hosted memory API
&lt;/h2&gt;

&lt;p&gt;With the stack running, here's what happens when you send it data. The LLM, gpt-5-nano by default, reads your input and extracts discrete facts. Each fact gets embedded with text-embedding-3-small and stored in pgvector. At the same time, entities and relationships sync to Neo4j. Search works in reverse. Your query gets embedded, pgvector returns the closest matches, and Neo4j optionally adds related entities through graph traversal.&lt;/p&gt;

&lt;p&gt;The REST API covers the full lifecycle of a memory. Every call below hits &lt;code&gt;http://localhost:8888&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding and searching memories
&lt;/h3&gt;

&lt;p&gt;Add a memory by sending a message with a &lt;code&gt;user_id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8888/memories &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "messages": [{"role": "user", "content": "I love hiking in the Rockies and my favorite programming language is Python."}],
    "user_id": "test_user"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mem0 doesn't store the raw message. The LLM breaks it into individual facts and the graph store picks out entities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e50ffc5f-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Loves hiking in the Rockies"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ADD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"07689942-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Favorite programming language is Python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ADD"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"relations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"added_entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"relationship"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"loves"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hiking"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"relationship"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hiking_location"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rockies"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"relationship"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"favorite_programming_language"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One input produced two memories and three graph relationships. The LLM decided the hiking preference and the favorite language were separate facts worth storing individually.&lt;/p&gt;
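
&lt;p&gt;If you want to see those relationships yourself, open the Neo4j browser at &lt;code&gt;http://localhost:8474&lt;/code&gt;, log in with &lt;code&gt;neo4j&lt;/code&gt;/&lt;code&gt;mem0graph&lt;/code&gt;, and run a Cypher query over the whole graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MATCH (n)-[r]-&amp;gt;(m) RETURN n, r, m LIMIT 25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the &lt;code&gt;test_user&lt;/code&gt; node connected to &lt;code&gt;hiking&lt;/code&gt;, &lt;code&gt;rockies&lt;/code&gt;, and &lt;code&gt;python&lt;/code&gt; by the relationships from the response above.&lt;/p&gt;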

&lt;p&gt;Search is a POST to &lt;code&gt;/search&lt;/code&gt; with a natural language query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8888/search &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"query": "outdoor activities", "user_id": "test_user"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e50ffc5f-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Loves hiking in the Rockies"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.62&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test_user"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"07689942-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Favorite programming language is Python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test_user"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;score&lt;/code&gt; is cosine distance between the query embedding and each stored memory. Lower means a closer match. The hiking memory scored 0.62 against "outdoor activities" while the Python memory was a weaker match at 0.92, which is what you'd expect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Listing and deleting
&lt;/h3&gt;

&lt;p&gt;Fetch all memories for a user with a GET:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:8888/memories?user_id=test_user"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns the same memory objects plus their graph relations. Useful for debugging or building a user profile view.&lt;/p&gt;

&lt;p&gt;Delete a specific memory by ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; DELETE &lt;span class="s2"&gt;"http://localhost:8888/memories/e50ffc5f-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Memory deleted successfully"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
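
&lt;p&gt;The server also exposes a bulk delete that clears every memory for a user in one call, which is handy for test cleanup or handling a data-deletion request. The exact parameters can vary between image versions, so confirm the endpoint shape on the &lt;code&gt;/docs&lt;/code&gt; page before relying on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s -X DELETE "http://localhost:8888/memories?user_id=test_user"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
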



&lt;h3&gt;
  
  
  Calling the API from Python
&lt;/h3&gt;

&lt;p&gt;The most straightforward approach is plain HTTP with the &lt;code&gt;requests&lt;/code&gt; library, since you already have a running REST server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8888&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Add a memory
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/memories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I work at a healthcare startup and prefer PyTorch for ML projects.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;py_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ADD] Works at a healthcare startup
[ADD] Prefers PyTorch for ML projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Search works the same way:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Search
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;machine learning preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;py_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prefers PyTorch for ML projects (score: 0.44)
Works at a healthcare startup (score: 0.83)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you'd rather skip the REST layer, Mem0's Python SDK also has a &lt;code&gt;Memory&lt;/code&gt; class that connects directly to the backing databases. You pass it a config dict with your Postgres and Neo4j connection details, and it runs the LLM calls and embedding logic in your own process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Memory&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pgvector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;graph_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo4j&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bolt://localhost:8687&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo4j&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mem0graph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prefers VS Code for Python development&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;py_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coding tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;py_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both approaches hit the same underlying data. The REST path is language-agnostic and keeps your application code decoupled from Mem0's internals. The &lt;code&gt;Memory&lt;/code&gt; class gives you tighter integration but ties you to Python and the &lt;code&gt;mem0ai&lt;/code&gt; package.&lt;/p&gt;
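
&lt;p&gt;If you settle on the REST path, a thin wrapper keeps the endpoints in one place. A minimal sketch built on the endpoints shown above; the &lt;code&gt;Mem0Client&lt;/code&gt; class is mine, not part of the SDK:&lt;/p&gt;

```python
import requests

class Mem0Client:
    """Minimal wrapper around the self-hosted Mem0 REST endpoints used above."""

    def __init__(self, base_url: str = "http://localhost:8888"):
        self.base_url = base_url.rstrip("/")

    def add(self, content: str, user_id: str) -> dict:
        resp = requests.post(f"{self.base_url}/memories", json={
            "messages": [{"role": "user", "content": content}],
            "user_id": user_id,
        })
        resp.raise_for_status()
        return resp.json()

    def search(self, query: str, user_id: str) -> list:
        resp = requests.post(f"{self.base_url}/search", json={
            "query": query,
            "user_id": user_id,
        })
        resp.raise_for_status()
        return resp.json()["results"]

    def delete(self, memory_id: str) -> dict:
        resp = requests.delete(f"{self.base_url}/memories/{memory_id}")
        resp.raise_for_status()
        return resp.json()

client = Mem0Client()
print(client.base_url)
```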

&lt;h2&gt;
  
  
  Swapping components for a fully local setup
&lt;/h2&gt;

&lt;p&gt;The default configuration calls OpenAI for every memory operation. If your whole reason for self-hosting is keeping data off external servers, that's a problem. You can replace both the LLM and the embedding model with local alternatives through Ollama.&lt;/p&gt;

&lt;h3&gt;
  
  
  Going offline with Ollama
&lt;/h3&gt;

&lt;p&gt;You'll need &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; installed and two models pulled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.1
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
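
&lt;p&gt;Before wiring Mem0 up, it's worth confirming Ollama is serving and both models are actually present. Ollama's &lt;code&gt;GET /api/tags&lt;/code&gt; endpoint lists local models; the &lt;code&gt;models_present&lt;/code&gt; helper here is an illustrative sketch, not Mem0 code:&lt;/p&gt;

```python
import json
from urllib.request import urlopen

def models_present(tags_json: str, required: list) -> list:
    """Return the required model names missing from an /api/tags response."""
    names = {m["name"].split(":")[0] for m in json.loads(tags_json)["models"]}
    return [r for r in required if r.split(":")[0] not in names]

# Against a live Ollama you'd fetch the list first:
#   tags = urlopen("http://localhost:11434/api/tags").read().decode()
sample = '{"models": [{"name": "llama3.1:latest"}]}'
print(models_present(sample, ["llama3.1", "nomic-embed-text"]))  # ['nomic-embed-text']
```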



&lt;p&gt;Then pass a config dict to &lt;code&gt;Memory.from_config()&lt;/code&gt; that points both the LLM and embedder at your local Ollama instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Memory&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama_base_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama_base_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pgvector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding_model_dims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;graph_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo4j&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bolt://localhost:8687&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neo4j&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mem0graph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m a backend engineer who primarily works with Go and PostgreSQL.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ADD] Works as a backend engineer
[ADD] Primarily works with Go and PostgreSQL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;programming languages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama_user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Primarily works with Go and PostgreSQL (score: 0.45)
Works as a backend engineer (score: 0.78)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;embedding_model_dims: 768&lt;/code&gt; in the vector store config. That line is easy to miss, and the failure mode without it is confusing. OpenAI's text-embedding-3-small produces 1536-dimensional vectors, which is the column size Mem0 creates by default. Nomic-embed-text produces 768-dimensional vectors. If the dimensions don't match, pgvector throws a &lt;code&gt;DataException: expected 1536 dimensions, not 768&lt;/code&gt; error on every insert. Setting &lt;code&gt;embedding_model_dims&lt;/code&gt; tells Mem0 to create the table with the right column size.&lt;/p&gt;
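
&lt;p&gt;If you later swap in a different embedder, you can probe its output dimension before touching the config. This sketch assumes Ollama's &lt;code&gt;/api/embeddings&lt;/code&gt; endpoint; &lt;code&gt;check_dims&lt;/code&gt; is an illustrative guard, not a Mem0 API:&lt;/p&gt;

```python
import json
from urllib.request import Request, urlopen

def embedding_dims(base_url: str, model: str) -> int:
    """Embed a probe string via Ollama and return the vector length."""
    req = Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": "dimension probe"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return len(json.load(resp)["embedding"])

def check_dims(configured: int, actual: int) -> None:
    """Fail fast instead of letting pgvector reject every insert."""
    if configured != actual:
        raise ValueError(f"config says {configured} dims, model produces {actual}")

# With Ollama running locally:
#   check_dims(768, embedding_dims("http://localhost:11434", "nomic-embed-text"))
```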

&lt;p&gt;One catch: if you already have data stored with OpenAI embeddings, you can't just swap the embedder in place. The existing vectors are 1536-dimensional and the new ones would be 768. You'd need to wipe the vector store with &lt;code&gt;docker compose down -v&lt;/code&gt; and re-add your memories with the new model.&lt;/p&gt;
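
&lt;p&gt;A rough migration sketch using the REST endpoints from earlier: export the extracted memory texts before wiping, then replay them so the new embedder re-vectorizes each one. The function names are mine, and re-adding will re-run LLM extraction on each text:&lt;/p&gt;

```python
import json
import requests

BASE_URL = "http://localhost:8888"

def save_memories(memories: list, path: str) -> int:
    """Write extracted memory texts to a JSON file and return the count."""
    with open(path, "w") as f:
        json.dump(memories, f)
    return len(memories)

def load_memories(path: str) -> list:
    with open(path) as f:
        return json.load(f)

def export_user(user_id: str, path: str) -> int:
    """Pull a user's memories off the running server before the wipe."""
    resp = requests.get(f"{BASE_URL}/memories", params={"user_id": user_id})
    resp.raise_for_status()
    return save_memories([m["memory"] for m in resp.json()["results"]], path)

def reimport_user(user_id: str, path: str) -> None:
    """Replay saved memories so the new embedder re-vectorizes each one."""
    for text in load_memories(path):
        requests.post(f"{BASE_URL}/memories", json={
            "messages": [{"role": "user", "content": text}],
            "user_id": user_id,
        })
```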

&lt;h3&gt;
  
  
  Alternative vector and graph stores
&lt;/h3&gt;

&lt;p&gt;Mem0 supports over 20 vector store backends beyond pgvector, including Qdrant, ChromaDB, Milvus, and Pinecone. On the graph side, you can swap Neo4j for Memgraph. Each swap is a config dict change. The &lt;a href="https://docs.mem0.ai/open-source/overview" rel="noopener noreferrer"&gt;Mem0 docs&lt;/a&gt; list every supported provider and its config options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardening for production
&lt;/h2&gt;

&lt;p&gt;The default stack is designed for local development. Before it touches a real network, a few things need to change.&lt;/p&gt;

&lt;p&gt;All three services bind to &lt;code&gt;0.0.0.0&lt;/code&gt; by default, meaning any device on the network can reach Postgres, Neo4j, and the API directly. In the compose file, prefix each host port with &lt;code&gt;127.0.0.1&lt;/code&gt;, for example &lt;code&gt;127.0.0.1:8888:8000&lt;/code&gt;. This locks every port to localhost. The only port that should face the outside world is the one served by a reverse proxy.&lt;/p&gt;

&lt;p&gt;Mem0 ships with no authentication and its CORS policy is &lt;code&gt;allow_origins=["*"]&lt;/code&gt;. You need a reverse proxy in front of the mem0 service. Caddy is the lowest-friction option since it handles Let's Encrypt certificates automatically with minimal config. Nginx and Traefik both work too. API key or OAuth2 authentication should live at the proxy layer because Mem0 has no auth middleware of its own. A Caddy setup with a &lt;code&gt;reverse_proxy localhost:8888&lt;/code&gt; directive and a domain name gets you TLS and a single entry point in a few lines of config.&lt;/p&gt;
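
&lt;p&gt;For reference, a Caddyfile along those lines might look like this. The domain and the &lt;code&gt;MEM0_API_KEY&lt;/code&gt; environment variable are placeholders you'd supply:&lt;/p&gt;

```
mem0.example.com {
    # Caddy obtains and renews the TLS certificate automatically

    # Reject requests without the shared key; Mem0 has no auth middleware of its own
    @unauthorized not header Authorization "Bearer {$MEM0_API_KEY}"
    respond @unauthorized 401

    reverse_proxy localhost:8888
}
```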

&lt;p&gt;The compose file also has passwords hardcoded in plain text, &lt;code&gt;postgres&lt;/code&gt; and &lt;code&gt;mem0graph&lt;/code&gt;. Move all credentials into your &lt;code&gt;.env&lt;/code&gt; file and reference them with &lt;code&gt;${VARIABLE}&lt;/code&gt; syntax in the YAML. Generate strong values with &lt;code&gt;openssl rand -base64 32&lt;/code&gt; and add &lt;code&gt;.env&lt;/code&gt; to &lt;code&gt;.gitignore&lt;/code&gt;. For Docker Swarm or Kubernetes, use Docker secrets or a vault service instead of environment variables.&lt;/p&gt;
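
&lt;p&gt;A one-time setup sketch along those lines:&lt;/p&gt;

```shell
# Generate strong credentials and keep them out of version control
cat > .env <<EOF
POSTGRES_PASSWORD=$(openssl rand -base64 32)
NEO4J_PASSWORD=$(openssl rand -base64 32)
EOF
chmod 600 .env
grep -qx '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```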

&lt;p&gt;Here's what those changes look like on the mem0 service, along with resource limits, a restart policy, and log rotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Partial — merge these into your existing service config&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mem0&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:8888:8000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD=${POSTGRES_PASSWORD}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;NEO4J_PASSWORD=${NEO4J_PASSWORD}&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512M&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
    &lt;span class="na"&gt;logging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json-file"&lt;/span&gt;
      &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;max-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10m"&lt;/span&gt;
        &lt;span class="na"&gt;max-file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the same pattern to the postgres and neo4j services. Neo4j is the heaviest container, so give it a higher memory limit, around 2 GB. Without resource limits, one container can consume all available memory and starve the others.&lt;/p&gt;

&lt;p&gt;Named volumes keep your data alive across container restarts, but volumes are not backups. For Postgres, schedule &lt;code&gt;pg_dump&lt;/code&gt; through a cron job; it works against a live database without downtime. Neo4j Community edition's &lt;code&gt;neo4j-admin database dump&lt;/code&gt; requires stopping the container first, so volume-level snapshots are more practical for zero-downtime backups. If you're running on AWS, scheduled EBS snapshots handle both databases at once.&lt;/p&gt;
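&lt;p&gt;A sketch of the Postgres side; the container name, user, database, and backup path are assumptions to adjust for your setup:&lt;/p&gt;

```shell
# Sketch -- writes a nightly cron entry for pg_dump; install it with `crontab mem0-backup.cron`.
# Container "postgres", user "postgres", and database "mem0" are assumptions.
# The % in date must be escaped in crontab command fields.
cat > mem0-backup.cron <<'EOF'
0 3 * * * docker exec postgres pg_dump -U postgres mem0 | gzip > /var/backups/mem0-$(date +\%F).sql.gz
EOF
```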

&lt;h2&gt;
  
  
  Deploying to AWS from the CLI
&lt;/h2&gt;

&lt;p&gt;The local stack translates to a cloud server with almost no changes. EC2 is the most direct path since you're running the same &lt;code&gt;docker compose up&lt;/code&gt; on a remote machine instead of your laptop.&lt;/p&gt;

&lt;p&gt;A t3.medium instance, 2 vCPU and 4 GB RAM at roughly $30 per month on demand, is the minimum for Neo4j, pgvector, and the API server running side by side. If you expect steady traffic or want room for Ollama models, a t3.large with 8 GB RAM gives more headroom.&lt;/p&gt;

&lt;p&gt;Launch an instance with Amazon Linux 2023 or Ubuntu and a security group that allows only SSH on port 22 restricted to your IP and HTTPS on port 443. Then SSH in and install Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Amazon Linux 2023&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; docker
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; docker
&lt;span class="nb"&gt;sudo &lt;/span&gt;usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; docker &lt;span class="nv"&gt;$USER&lt;/span&gt;
newgrp docker  &lt;span class="c"&gt;# activates the group without requiring a full logout&lt;/span&gt;

&lt;span class="c"&gt;# Install Compose plugin&lt;/span&gt;
&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /usr/local/lib/docker/cli-plugins
&lt;span class="nb"&gt;sudo &lt;/span&gt;curl &lt;span class="nt"&gt;-SL&lt;/span&gt; https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-o&lt;/span&gt; /usr/local/lib/docker/cli-plugins/docker-compose
&lt;span class="nb"&gt;sudo chmod&lt;/span&gt; +x /usr/local/lib/docker/cli-plugins/docker-compose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy your three files, &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;Dockerfile&lt;/code&gt;, and &lt;code&gt;docker-compose.yaml&lt;/code&gt;, to the instance with &lt;code&gt;scp&lt;/code&gt; or pull them from a private repo. Then bring the stack up the same way you did locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8888/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the security group locked to ports 22 and 443, Postgres on 8432 and Neo4j on 8474 and 8687 stay unreachable from the outside. The compose network handles service-to-service traffic internally, and only the reverse proxy on port 443 faces the internet.&lt;/p&gt;

&lt;p&gt;For persistent storage, Docker volumes live on the instance's root EBS volume by default. That's fine for getting started, but a dedicated EBS volume mounted at &lt;code&gt;/var/lib/docker/volumes&lt;/code&gt; gives you independent snapshots and the ability to resize storage without touching the OS disk. Schedule EBS snapshots through AWS Backup or a cron job calling the AWS CLI.&lt;/p&gt;

&lt;p&gt;If you'd rather have AWS manage the orchestration, Elastic Beanstalk accepts a &lt;code&gt;docker-compose.yaml&lt;/code&gt; directly as a deployment artifact. ECS with Fargate is another option, though it's designed for stateless containers. The stateful databases with their persistent volume requirements make Fargate awkward compared to a straightforward EC2 instance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy and own your memory stack
&lt;/h2&gt;

&lt;p&gt;You now have a three-container memory system with a REST API, vector search, and a knowledge graph, all running on your infrastructure. With the hardening steps applied and the stack on EC2, it's ready for real traffic behind a reverse proxy.&lt;/p&gt;

&lt;p&gt;If your use case is personal rather than server-side, &lt;a href="https://github.com/mem0ai/mem0/tree/main/openmemory" rel="noopener noreferrer"&gt;OpenMemory MCP&lt;/a&gt; is worth a look. It runs Mem0 as a local MCP server that gives memory to coding tools like Cursor and Claude Desktop without needing a cloud deployment.&lt;/p&gt;

&lt;p&gt;For the full list of supported vector stores, graph backends, LLM providers, and config options, see the &lt;a href="https://docs.mem0.ai/open-source/overview" rel="noopener noreferrer"&gt;Mem0 open-source docs&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mem0</category>
      <category>aimemory</category>
      <category>localhackday</category>
    </item>
    <item>
      <title>Anthropic Claude Pricing: Subscription Plans and API Costs</title>
      <dc:creator>Ninad Pathak</dc:creator>
      <pubDate>Mon, 23 Feb 2026 13:48:23 +0000</pubDate>
      <link>https://dev.to/mem0/anthropic-claude-pricing-subscription-plans-and-api-costs-hie</link>
      <guid>https://dev.to/mem0/anthropic-claude-pricing-subscription-plans-and-api-costs-hie</guid>
      <description>&lt;p&gt;You're building something on Claude. It's working. Then usage picks up, you check your bill, and the number is twice what you expected. Token costs compound fast - and if you don't understand exactly how Claude charges for input, output, caching, and long-context requests, you'll keep getting surprised.&lt;/p&gt;

&lt;p&gt;This guide breaks down every Claude pricing tier: subscription plans for individuals and teams, API token costs for developers, and the specific mechanics behind caching, batch processing, and tool usage. It also covers where those costs come from at the code level - and what you can do to bring them down.&lt;/p&gt;

&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Claude has two pricing models: flat-rate subscriptions for chat users, and pay-per-token API pricing for developers.&lt;/li&gt;
&lt;li&gt;Free: Basic access with rate limits (roughly 10-15 messages per session, per community reports).&lt;/li&gt;
&lt;li&gt;Pro: $20/month ($17 billed annually) - approximately 5x more usage than Free.&lt;/li&gt;
&lt;li&gt;Max: $100/month (5x Pro usage) or $200/month (20x Pro usage).&lt;/li&gt;
&lt;li&gt;Team: $25/seat/month standard ($20 annual); $125/seat/month premium ($100 annual).&lt;/li&gt;
&lt;li&gt;API: Haiku 4.5 at $1/$5 per MTok, Sonnet 4.5/4.6 at $3/$15, Opus 4.5/4.6 at $5/$25. Prompt caching can cut costs on repeated context by roughly 70-90%.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Do Claude's Subscription Plans Compare at a Glance?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;Team&lt;/td&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$17-$20/month&lt;/td&gt;
&lt;td&gt;$100-$200/month&lt;/td&gt;
&lt;td&gt;$20-$25/seat/month or $100-$125/seat/month&lt;/td&gt;
&lt;td&gt;Starts at ~$20/seat/month; usage billed at API rates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Usage&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;More than Free&lt;/td&gt;
&lt;td&gt;5x or 20x more than Pro&lt;/td&gt;
&lt;td&gt;More than Pro&lt;/td&gt;
&lt;td&gt;Pooled across org&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Key features&lt;/td&gt;
&lt;td&gt;Chat (web, mobile, desktop), data visualization, web search, Slack/Google Workspace, MCP, extended thinking&lt;/td&gt;
&lt;td&gt;Claude Code, Cowork, unlimited projects, Research access, cross-conversation memory, Claude in Excel and Chrome&lt;/td&gt;
&lt;td&gt;Higher output limits, early feature access, priority at peak traffic, Claude in PowerPoint&lt;/td&gt;
&lt;td&gt;Everything in Max, central billing, admin controls, org-wide search, no model training on your data by default&lt;/td&gt;
&lt;td&gt;500k context window, RBAC, audit logs, SCIM, HIPAA-ready option, compliance API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models&lt;/td&gt;
&lt;td&gt;Sonnet and Haiku&lt;/td&gt;
&lt;td&gt;Sonnet, Opus, and Haiku&lt;/td&gt;
&lt;td&gt;Sonnet, Opus, and Haiku&lt;/td&gt;
&lt;td&gt;Sonnet, Opus, and Haiku&lt;/td&gt;
&lt;td&gt;Sonnet, Opus, and Haiku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Simple tasks and lightweight use&lt;/td&gt;
&lt;td&gt;Individual professionals and power users&lt;/td&gt;
&lt;td&gt;Users who hit Pro limits daily&lt;/td&gt;
&lt;td&gt;Small teams of 5-75&lt;/td&gt;
&lt;td&gt;Large organizations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What Does Claude Cost for Individual Users?
&lt;/h2&gt;

&lt;p&gt;Claude's individual pricing is subscription-based, tiered across Free, Pro, and Max. Here's what each level actually gives you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Free Plan
&lt;/h3&gt;

&lt;p&gt;The Free plan covers basic usage: Sonnet and Haiku access via web, mobile, or desktop, along with image analysis, file creation, code execution, and web search. If you need a lightweight assistant for occasional tasks, this may be all you need.&lt;/p&gt;

&lt;p&gt;The plan is rate-limited. Based on community reporting, users typically hit limits after roughly 10-15 messages per session, depending on message length, file sizes, and conversation depth. Anthropic does not publish exact limits, but the in-app usage monitor gives you a live read on where you stand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrnarts4cu72b94c627f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrnarts4cu72b94c627f.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Pro Plan
&lt;/h3&gt;

&lt;p&gt;Pro runs $20/month (or $17/month billed annually at $200 upfront). It includes access to Opus, Sonnet, and Haiku, along with Claude Code, Cowork, unlimited projects, Research access, cross-conversation memory, and Claude in Excel and Chrome.&lt;/p&gt;

&lt;p&gt;The usage ceiling is meaningfully higher than Free. Community reports put it at around 45 prompts per session before throttling, with a reset window of approximately 5 hours. These figures are user-reported, not confirmed by Anthropic. The in-app usage monitor is the most reliable guide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Max Plan
&lt;/h3&gt;

&lt;p&gt;Max comes in two tiers. At $100/month, you get 5x the usage of Pro. At $200/month, you get 20x. Both tiers include higher output limits, priority access during peak traffic, early access to new features, and Claude in PowerPoint.&lt;/p&gt;

&lt;p&gt;For developers and researchers burning through Pro limits daily, Max is often cheaper than the equivalent API usage - especially at the 20x tier when you're running multiple long-context sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does Claude Cost for Teams and Enterprises?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Team Plan
&lt;/h3&gt;

&lt;p&gt;The Team plan is built for organizations of 5 to 75 users who need centralized management without the overhead of a full enterprise deployment.&lt;/p&gt;

&lt;p&gt;The Standard seat tier costs $25/seat/month ($20 billed annually) and includes everything in Max plus central billing, admin controls, org-wide enterprise search, and a default no-training-on-your-data policy. The Premium seat tier runs $125/seat/month ($100 annually) with 5x the usage of the standard tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise Plan
&lt;/h3&gt;

&lt;p&gt;Enterprise is for large-scale deployments. It adds everything in Team plus an enhanced 500k context window, SCIM provisioning, audit logs, compliance API, custom data retention, IP allowlisting, role-based and network-level access controls, and a HIPAA-ready option.&lt;/p&gt;

&lt;p&gt;Anthropic does not publish Enterprise pricing. Community reports suggest a minimum of around $60/seat with a 70-user floor - but treat those numbers as anecdotal. Contact the &lt;a href="https://claude.com/contact-sales" rel="noopener noreferrer"&gt;Anthropic sales team&lt;/a&gt; for actual figures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Education Plan
&lt;/h3&gt;

&lt;p&gt;Anthropic offers a discounted plan for universities and educational institutions, covering students, faculty, and staff. Details and access require reaching out to the &lt;a href="https://claude.com/contact-sales/education-plan" rel="noopener noreferrer"&gt;Anthropic education team&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Claude API Pricing Work?
&lt;/h2&gt;

&lt;p&gt;API billing is token-based. A token is roughly 4 characters or 0.75 words in English. For practical reference:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100 tokens ~ 75 words&lt;/li&gt;
&lt;li&gt;1-2 sentences ~ 30 tokens&lt;/li&gt;
&lt;li&gt;1 paragraph ~ 150 tokens&lt;/li&gt;
&lt;li&gt;1M tokens ~ 750,000 words&lt;/li&gt;
&lt;/ul&gt;
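&lt;p&gt;The ratio above is easy to script as a rough estimator. This is a heuristic sketch, not a real tokenizer; actual counts come from the API's usage data:&lt;/p&gt;

```shell
# Rough heuristic only: ~0.75 words per token in English.
estimate_tokens() { printf '%s\n' "$1" | awk '{ printf "%d\n", NF / 0.75 }'; }
estimate_tokens "a short sentence of six words"   # prints 8
```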

&lt;p&gt;Input tokens (your prompt) and output tokens (Claude's response) are billed separately and both count toward Claude's context window. The standard context window is 200k tokens. Opus 4.5/4.6 and Sonnet 4.5/4.6/4 support up to 1M tokens in Team and Enterprise plans.&lt;/p&gt;

&lt;p&gt;If you send 50k input tokens, Claude has up to 150k tokens available for output within the 200k standard window.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is the Model Pricing Breakdown for Opus, Sonnet, and Haiku?
&lt;/h3&gt;

&lt;p&gt;The table below shows input/output pricing for all current Claude models across standard (200k or fewer input tokens) and long-context (more than 200k input tokens) requests. MTok = million tokens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Input (≤200k)&lt;/td&gt;
&lt;td&gt;Input (&amp;gt;200k)&lt;/td&gt;
&lt;td&gt;Output (≤200k)&lt;/td&gt;
&lt;td&gt;Output (&amp;gt;200k)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.5/4.6&lt;/td&gt;
&lt;td&gt;$5/MTok&lt;/td&gt;
&lt;td&gt;$10/MTok&lt;/td&gt;
&lt;td&gt;$25/MTok&lt;/td&gt;
&lt;td&gt;$37.50/MTok&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.5/4.6&lt;/td&gt;
&lt;td&gt;$3/MTok&lt;/td&gt;
&lt;td&gt;$6/MTok&lt;/td&gt;
&lt;td&gt;$15/MTok&lt;/td&gt;
&lt;td&gt;$22.50/MTok&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Haiku 4.5&lt;/td&gt;
&lt;td&gt;$1/MTok&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;$5/MTok&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 200k threshold is based on input tokens only. If your input exceeds 200k, all tokens in that request - input and output - shift to premium rates.&lt;/p&gt;

&lt;p&gt;Example: A request using Sonnet 4.6 with 250k input tokens and 5k output tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 250k × $6/MTok = $1.50&lt;/li&gt;
&lt;li&gt;Output: 5k × $22.50/MTok = $0.11&lt;/li&gt;
&lt;li&gt;Total: $1.61&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same request capped at 200k input tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 200k × $3/MTok = $0.60&lt;/li&gt;
&lt;li&gt;Output: 5k × $15/MTok = $0.08&lt;/li&gt;
&lt;li&gt;Total: $0.68&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crossing the 200k threshold on input more than doubled the total cost despite only a 25% increase in input size. Staying under the threshold where possible is one of the more effective cost controls available.&lt;/p&gt;
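&lt;p&gt;The two rate bands are mechanical enough to script. This sketch hardcodes the Sonnet 4.5/4.6 rates from the table above; the function name is made up for illustration:&lt;/p&gt;

```shell
# Sketch -- every request bills at one of two rate bands, keyed on input size.
claude_cost() {
  awk -v in_tok="$1" -v out_tok="$2" 'BEGIN {
    if (in_tok > 200000) { in_rate = 6; out_rate = 22.5 }   # long-context band
    else                 { in_rate = 3; out_rate = 15 }     # standard band
    printf "$%.4f\n", in_tok/1e6*in_rate + out_tok/1e6*out_rate
  }'
}
claude_cost 250000 5000   # long-context request        -> $1.6125
claude_cost 200000 5000   # same work under threshold   -> $0.6750
```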

&lt;h2&gt;
  
  
  How Does Prompt Caching Reduce API Costs?
&lt;/h2&gt;

&lt;p&gt;Prompt caching lets Claude reuse a stored prefix from prior requests instead of reprocessing it from scratch. This cuts both processing time and cost on repetitive tasks - repeated system prompts, multi-turn conversations, and document analysis pipelines all benefit significantly.&lt;/p&gt;

&lt;p&gt;By default, cached content has a 5-minute TTL (time-to-live), refreshed for free each time the cached content is used. A 1-hour cache option is available for content accessed in longer intervals.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Models&lt;/td&gt;
&lt;td&gt;5m Cache Write (1.25x input)&lt;/td&gt;
&lt;td&gt;1h Cache Write (2x input)&lt;/td&gt;
&lt;td&gt;Cache Read (0.1x input)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.5/4.6&lt;/td&gt;
&lt;td&gt;$6.25/MTok&lt;/td&gt;
&lt;td&gt;$10/MTok&lt;/td&gt;
&lt;td&gt;$0.50/MTok&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.5/4.6&lt;/td&gt;
&lt;td&gt;$3.75/MTok&lt;/td&gt;
&lt;td&gt;$6/MTok&lt;/td&gt;
&lt;td&gt;$0.30/MTok&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Haiku 4.5&lt;/td&gt;
&lt;td&gt;$1.25/MTok&lt;/td&gt;
&lt;td&gt;$2/MTok&lt;/td&gt;
&lt;td&gt;$0.10/MTok&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Scenario: Legal document analysis. A law firm analyzes a 150k-token contract with 10 queries (2k tokens each) over 2 hours using Sonnet 4.6 with 1-hour caching. The first request costs $0.91 (150k × $6/MTok cache write + 2k × $3/MTok query). Each following request costs $0.05 (150k × $0.30/MTok cache read + 2k input). Total across 10 queries: $1.37, versus $4.56 without caching - a 70% reduction. The 1-hour TTL was the right choice here because a 30-minute gap between queries would have expired a 5-minute cache and forced a full rewrite.&lt;/p&gt;
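&lt;p&gt;The scenario's arithmetic can be checked in a few lines, using the Sonnet 4.5/4.6 rates from the caching table:&lt;/p&gt;

```shell
# Sketch -- Sonnet rates: $6/MTok 1h cache write, $0.30/MTok cache read, $3/MTok input.
awk 'BEGIN {
  doc = 150000; query = 2000; n = 10
  first  = doc/1e6*6    + query/1e6*3    # cache write + query input
  repeat = doc/1e6*0.30 + query/1e6*3    # cache read  + query input
  cached   = first + (n - 1) * repeat
  uncached = n * ((doc + query)/1e6 * 3)
  printf "cached: $%.3f  uncached: $%.2f  saved: %.0f%%\n", cached, uncached, (1 - cached/uncached) * 100
}'
```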

&lt;h2&gt;
  
  
  What Does Batch Processing Cost?
&lt;/h2&gt;

&lt;p&gt;Batch processing handles large volumes of requests asynchronously through the Message Batches API. Instead of submitting requests one at a time, you submit them in bulk and receive responses when the full batch is complete. This suits content processing, data extraction, and classification tasks well.&lt;/p&gt;

&lt;p&gt;Batch API pricing is 50% of standard API rates. For maximum savings, batch processing can be combined with prompt caching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t875zu2tbspbnv8yk3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t875zu2tbspbnv8yk3i.png" alt=" " width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Do Claude's Tools and Extras Cost?
&lt;/h2&gt;

&lt;p&gt;Some tools carry additional costs on top of base API rates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast mode for Claude Opus 4.6 delivers faster output at 6x standard rates.&lt;/li&gt;
&lt;li&gt;Client-side tools add tokens automatically: bash (+245 tokens), text editor (+700 tokens), computer use (+735 tokens plus 466-499 system prompt tokens). All billed at standard base rates.&lt;/li&gt;
&lt;li&gt;Web Fetch (server-side) has no additional cost. You pay standard rates for the fetched content.&lt;/li&gt;
&lt;li&gt;Web search costs $10 per 1,000 searches, plus standard token costs for search-generated content.&lt;/li&gt;
&lt;li&gt;Code execution includes 1,550 free hours per month. Beyond that, it's $0.05/hour per container with a 5-minute minimum billing window. Pre-loading files triggers billing even if the tool is never called.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Do Token Costs Compound at the Code Level?
&lt;/h2&gt;

&lt;p&gt;Even with a solid understanding of the pricing tables, costs can spiral in ways that aren't immediately obvious. The problem is usually context. Every session loads system prompts, prior conversation state, and any persistent instructions - and all of that counts as input tokens before Claude generates a single word of output.&lt;/p&gt;

&lt;p&gt;For developers building on Claude Code, this compounds further. Claude Code's auto-memory feature records learnings and patterns during task execution, and its Claude.md files accumulate instructions across conversations. Both are loaded at session startup. On large projects, these files grow large, and a significant portion of every session's token budget gets consumed before the actual work begins. As those files grow, so does your bill - silently, and on every session.&lt;/p&gt;

&lt;p&gt;This is the &lt;a href="https://mem0.ai/blog/why-stateless-agents-fail-at-personalization" rel="noopener noreferrer"&gt;core failure mode of stateless AI agents&lt;/a&gt;: without intelligent memory management, agents load everything they've ever known instead of retrieving what's actually relevant. The longer a project runs, the worse the overhead becomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Mem0 Reduce Claude Token Usage?
&lt;/h2&gt;

&lt;p&gt;Mem0 is a &lt;a href="https://mem0.ai/blog/what-is-ai-agent-memory" rel="noopener noreferrer"&gt;memory layer for AI applications&lt;/a&gt; that replaces full-context loading with targeted retrieval. Rather than storing full conversation transcripts and reloading them entirely each session, Mem0 extracts high-signal facts and stores them in a structured memory store. At query time, Mem0 retrieves only the memories relevant to that specific query - not everything, just what matters.&lt;/p&gt;

&lt;p&gt;The result is that each session starts with a much smaller, more relevant context. Per &lt;a href="https://arxiv.org/html/2504.19413v1" rel="noopener noreferrer"&gt;Mem0's research paper&lt;/a&gt;, this approach reduces token usage by up to 90% compared to full-context retrieval methods - not by discarding information, but by being precise about what gets loaded and when.&lt;/p&gt;

&lt;p&gt;For Claude Code specifically, Mem0 replaces the growing Claude.md and auto-memory files with a persistent, queryable memory store. You can set up &lt;a href="https://mem0.ai/blog/claude-code-memory" rel="noopener noreferrer"&gt;persistent memory for Claude Code&lt;/a&gt; in about five minutes.&lt;/p&gt;

&lt;p&gt;The pattern generalizes. Whether you're building a &lt;a href="https://mem0.ai/blog/context-aware-chatbots-with-ai-memory" rel="noopener noreferrer"&gt;context-aware chatbot&lt;/a&gt;, a &lt;a href="https://mem0.ai/blog/agentic-rag-chatbot-with-memory" rel="noopener noreferrer"&gt;multi-turn agentic RAG system&lt;/a&gt;, or navigating the tradeoffs between &lt;a href="https://mem0.ai/blog/short-term-vs-long-term-memory-in-ai" rel="noopener noreferrer"&gt;short- and long-term memory&lt;/a&gt; across agent sessions, the same principle applies: load less, retrieve smarter, spend less.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does Mem0 Look Like in Production?
&lt;/h2&gt;

&lt;p&gt;Three case studies show how this plays out at different scales.&lt;/p&gt;

&lt;p&gt;OpenNote &lt;a href="https://mem0.ai/blog/how-opennote-scaled-personalized-visual-learning-with-mem0-while-reducing-token-costs-by-40" rel="noopener noreferrer"&gt;reduced token costs by 40%&lt;/a&gt; by replacing full conversation context with Mem0's selective retrieval. Users got more personalized responses. The platform spent less per query.&lt;/p&gt;

&lt;p&gt;RevisionDojo saw a &lt;a href="https://mem0.ai/blog/how-revisiondojo-enhanced-personalized-learning-with-mem0" rel="noopener noreferrer"&gt;similar 40% token reduction&lt;/a&gt;, with the added benefit that the AI tutor retained user-specific learning patterns across sessions without reloading full history every time.&lt;/p&gt;

&lt;p&gt;Sunflower &lt;a href="https://mem0.ai/blog/how-sunflower-scaled-personalized-recovery-support-to-80-000-users-with-mem0" rel="noopener noreferrer"&gt;scaled to 80,000 users&lt;/a&gt; on a recovery support platform where personalization was non-negotiable. Mem0 made per-user memory practical at that volume without the cost structure blowing out.&lt;/p&gt;

&lt;p&gt;For teams evaluating memory solutions, Mem0's &lt;a href="https://mem0.ai/blog/benchmarked-openai-memory-vs-langmem-vs-memgpt-vs-mem0-for-long-term-memory-here-s-how-they-stacked-up" rel="noopener noreferrer"&gt;benchmark against OpenAI Memory, LangMem, and MemGPT&lt;/a&gt; on the LOCOMO dataset is the most rigorous head-to-head available.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Claude Pricing Compare to ChatGPT and Gemini?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Category&lt;/td&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Google Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic subscription&lt;/td&gt;
&lt;td&gt;Pro: $20/month&lt;/td&gt;
&lt;td&gt;Plus: $20/month&lt;/td&gt;
&lt;td&gt;Pro: $20/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium subscription&lt;/td&gt;
&lt;td&gt;Max 5x: $100/month; Max 20x: $200/month&lt;/td&gt;
&lt;td&gt;Pro: $200/month&lt;/td&gt;
&lt;td&gt;Ultra: ~$42/month ($125/3 months)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-tier API&lt;/td&gt;
&lt;td&gt;Sonnet 4.5/4.6: $3/$15&lt;/td&gt;
&lt;td&gt;GPT-5.2: $1.75/$14.00&lt;/td&gt;
&lt;td&gt;Gemini Flash: $0.50/$3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flagship API&lt;/td&gt;
&lt;td&gt;Opus 4.6: $5/$25&lt;/td&gt;
&lt;td&gt;GPT-5.2 Pro: $21/$168&lt;/td&gt;
&lt;td&gt;Gemini Pro: $2/$12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude's flagship Opus 4.6 is substantially cheaper than OpenAI's flagship on the API. Sonnet is competitive in the mid-tier. Haiku undercuts most budget models. The 200k context pricing premium is specific to Claude - neither OpenAI nor Gemini structures long-context pricing the same way, so factor that in when modeling costs for long-document workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Do You Choose the Right Claude Plan?
&lt;/h2&gt;

&lt;p&gt;For most individuals, the decision comes down to usage volume. The Free plan works for casual use. Pro handles most professional workloads. If you're hitting Pro's limits daily, Max 5x ($100/month) is the next step - and at heavy usage, Max 20x ($200/month) is often cheaper than the equivalent API spend.&lt;/p&gt;

&lt;p&gt;For teams, the Standard Team seat ($25/month) is the entry point. The Premium seat tier ($125/month) makes sense when your team runs workloads that would otherwise require individual Max subscriptions.&lt;/p&gt;

&lt;p&gt;For developers using the API: at moderate usage, Sonnet 4.5 or 4.6 at $3/$15 per MTok is the most cost-effective entry point for serious work. Combine it with prompt caching and batch processing, and the effective per-token cost drops substantially. For teams consistently processing millions of tokens per day, the Max 20x plan at $200/month frequently undercuts the API equivalent - run the math against your specific usage pattern before defaulting to API access.&lt;/p&gt;
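&lt;p&gt;As a sketch of that math, with placeholder daily volumes to replace with your own:&lt;/p&gt;

```shell
# Sketch -- assumed workload: 2 MTok input + 0.2 MTok output per day on Sonnet ($3/$15).
awk 'BEGIN {
  days = 30; in_mtok = 2; out_mtok = 0.2
  api = days * (in_mtok * 3 + out_mtok * 15)
  printf "API: $%.0f/month vs Max 20x: $200/month\n", api
}'
# -> API: $270/month vs Max 20x: $200/month
```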

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4dj84ru3om2v3rtsytp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq4dj84ru3om2v3rtsytp.png" alt=" " width="800" height="252"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Claude's pricing structure is straightforward in outline but has meaningful complexity in the details - particularly the 200k context threshold, caching mechanics, and the way session overhead accumulates at the code level. Subscription tiers run from Free to Max at $200/month for individuals, with Team and Enterprise plans for organizations. API pricing has dropped substantially with each model generation: Opus went from $15/$75 to $5/$25 per million tokens. Sonnet sits at $3/$15. Haiku at $1/$5.&lt;/p&gt;

&lt;p&gt;The most overlooked cost driver is context loading. Every session that loads a full conversation history or a large instructions file spends tokens before doing any real work. Managing what gets loaded - through prompt caching, batch processing, and tools like Mem0 - is where meaningful cost reduction actually happens.&lt;/p&gt;

</description>
      <category>mem0</category>
      <category>claudeai</category>
      <category>openai</category>
      <category>anthropic</category>
    </item>
    <item>
      <title>Add Persistent Memory to Claude Code with Mem0 (5-Minute Setup)</title>
      <dc:creator>Ninad Pathak</dc:creator>
      <pubDate>Fri, 06 Feb 2026 19:50:19 +0000</pubDate>
      <link>https://dev.to/mem0/we-built-a-memory-plugin-for-openclawmoltbot-7e1</link>
      <guid>https://dev.to/mem0/we-built-a-memory-plugin-for-openclawmoltbot-7e1</guid>
      <description>&lt;p&gt;Claude Code is a phenomenal piece of technology. But it is affected by the same problem every LLM is affected by which is lack of memory.&lt;/p&gt;

&lt;p&gt;I’ll walk you through the steps to add a persistent memory layer to Claude Code with Mem0, covering both CLI and desktop versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Add Memory to Claude Code?
&lt;/h2&gt;

&lt;p&gt;Every time you start a Claude Code session, you need to explain your project architecture, re-state your coding preferences, and re-describe bugs you’ve already fixed. This repetition wastes time and tokens.&lt;/p&gt;

&lt;p&gt;I read a &lt;a href="https://news.ycombinator.com/item?id=46126066" rel="noopener noreferrer"&gt;Hacker News discussion&lt;/a&gt; where a developer measured a baseline task at 10-11 minutes with 3+ exploration agents launched. With memory context injection, the same task completed in 1-2 minutes with zero exploration agents.&lt;/p&gt;

&lt;p&gt;We saw similar results with our &lt;a href="https://mem0.ai/research" rel="noopener noreferrer"&gt;internal benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1eyz4lz6eixnu5e5zpo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1eyz4lz6eixnu5e5zpo.png" alt="Performance benchmarks for Mem0, AI memory for smarter agents" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When testing agents with and without memory, &lt;a href="https://github.com/mem0ai/mem0" rel="noopener noreferrer"&gt;agents with Mem0 implementation&lt;/a&gt; showed 90% lower token usage and 91% faster responses compared to full-context approaches.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI memory systems like Mem0 extract key facts from conversations, store them in searchable vector databases, and inject relevant context into future sessions automatically.&lt;/em&gt;&lt;/p&gt;
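
&lt;p&gt;To make that loop concrete, here is a minimal, dependency-free Python sketch. It is illustrative only: it substitutes naive keyword overlap for real vector search, and none of the names (&lt;code&gt;ToyMemoryStore&lt;/code&gt;, &lt;code&gt;build_prompt&lt;/code&gt;) mirror Mem0’s actual API.&lt;/p&gt;

```python
# Toy illustration of the extract -> store -> inject loop.
# Real systems like Mem0 use LLM-based fact extraction and vector
# similarity search; this sketch substitutes keyword overlap so it
# runs with no dependencies. All names here are invented.

class ToyMemoryStore:
    def __init__(self):
        self.facts = []

    def add(self, fact: str):
        # "Extraction" step: in a real system, an LLM distills facts
        # from the conversation before they are stored.
        self.facts.append(fact)

    def search(self, query: str, k: int = 2):
        # Stand-in for semantic search: rank facts by word overlap.
        q = set(query.lower().split())
        scored = sorted(
            self.facts,
            key=lambda f: len(q & set(f.lower().split())),
            reverse=True,
        )
        return scored[:k]

def build_prompt(store: ToyMemoryStore, user_message: str) -> str:
    # Inject only the most relevant memories, not the full history.
    context = "\n".join(store.search(user_message))
    return f"Relevant memories:\n{context}\n\nUser: {user_message}"

store = ToyMemoryStore()
store.add("Project uses TypeScript with 2-space indentation")
store.add("Auth tokens expire after 24 hours")
prompt = build_prompt(store, "why does my auth token expire?")
```

&lt;p&gt;The point of the sketch: only the memories most relevant to the current message get injected into the prompt, instead of the whole conversation history.&lt;/p&gt;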

&lt;h2&gt;
  
  
  How To Implement Mem0 In Claude Code
&lt;/h2&gt;

&lt;p&gt;You can integrate persistent memory in Claude Code using the official &lt;a href="https://docs.mem0.ai/platform/mem0-mcp" rel="noopener noreferrer"&gt;Mem0 MCP server&lt;/a&gt;. Here’s a walkthrough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Install the &lt;a href="https://docs.mem0.ai/open-source/python-quickstart" rel="noopener noreferrer"&gt;Mem0 Python SDK&lt;/a&gt; (requires Python 3.9+, recommend 3.10+): &lt;strong&gt;pip3 install mem0ai&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Get your API key from&lt;/strong&gt; &lt;a href="https://app.mem0.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;app.mem0.ai&lt;/strong&gt;&lt;/a&gt;. The free tier includes 10,000 memories and 1,000 retrieval calls per month.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The MCP (Model Context Protocol) approach works across both CLI and desktop versions with identical configuration. MCP is Anthropic's open standard for AI-tool integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install the MCP server
&lt;/h3&gt;

&lt;p&gt;First, we need to install the Mem0 MCP server. It’s available as a pip package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;pip3&lt;/span&gt; &lt;span class="nx"&gt;install&lt;/span&gt; &lt;span class="nx"&gt;mem0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, check where the package was installed using the below command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;which&lt;/span&gt; &lt;span class="nx"&gt;mem0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the path shown in the output; we’ll need it in the next step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2a: Configure for Claude Code CLI
&lt;/h3&gt;

&lt;p&gt;Create or edit &lt;strong&gt;.mcp.json&lt;/strong&gt; in your project root for team-shared configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcpServers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mem0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;command&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FULL PATH TO MCP SERVER FOUND FROM which COMMAND&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;args&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;${MEM0_API_KEY}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MEM0_DEFAULT_USER_ID&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key as an environment variable or in your shell profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="nx"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;m0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2b: Configure for Claude Code Desktop
&lt;/h3&gt;

&lt;p&gt;If you want the Mem0 MCP server to work with Claude Code Desktop, you’ll need to edit the ~/.claude.json file. Edit the JSON file with vim or another editor of your choice (is there another choice?) and add the mem0 server entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcpServers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mem0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;command&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FULL PATH TO MCP SERVER FOUND FROM which COMMAND&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;args&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;${MEM0_API_KEY}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MEM0_DEFAULT_USER_ID&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once done, restart Claude Code. On the CLI app, run the &lt;code&gt;/mcp&lt;/code&gt; command to see if the MCP server is connected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5ztksi3noyjzu1rmfzr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5ztksi3noyjzu1rmfzr.png" alt="Claude code with Mem0 memory layer MCP" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the desktop app, hit the plus icon and you should see the MCP listed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Enable graph memory (optional)
&lt;/h3&gt;

&lt;p&gt;Depending on how you use Claude, you may want relationship-aware memory that tracks connections between entities. If so, add the following entry to the &lt;code&gt;env&lt;/code&gt; block of the MCP server configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"MEM0_ENABLE_GRAPH_DEFAULT": "true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://docs.mem0.ai/open-source/features/graph-memory" rel="noopener noreferrer"&gt;Mem0’s graph memory&lt;/a&gt; improves accuracy for multi-hop reasoning but requires the Pro plan ($249/month).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Test Mem0 with Claude
&lt;/h3&gt;

&lt;p&gt;Start chatting with Claude Code while your Mem0 MCP server is connected. It will automatically create new memories and insert them into the context whenever they’re needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtgm6pnhq8n95l5fm1st.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtgm6pnhq8n95l5fm1st.png" alt="Demonstration of Claude Code saving a memory to Mem0." width="800" height="678"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it. Next time you ask Claude to write code, it will fetch your memory and use that information to write code exactly as you prefer.&lt;/p&gt;

&lt;p&gt;The best part: memory turns Claude Code into an ever-evolving agent, and after a while it starts to feel so personal that it’s hard to use anything else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring CLAUDE.md For Memory Context
&lt;/h2&gt;

&lt;p&gt;Claude Code automatically reads CLAUDE.md files at session start. You can use this setup to create a structured memory file at ~/.claude/CLAUDE.md for global preferences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Developer Profile&lt;/span&gt;

&lt;span class="gu"&gt;## Coding Preferences&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; TypeScript over JavaScript
&lt;span class="p"&gt;-&lt;/span&gt; 2-space indentation
&lt;span class="p"&gt;-&lt;/span&gt; Functional React components
&lt;span class="p"&gt;-&lt;/span&gt; Zod for validation

&lt;span class="gu"&gt;## Common Patterns&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; All API routes use middleware for auth
&lt;span class="p"&gt;-&lt;/span&gt; Database calls through repository pattern
&lt;span class="p"&gt;-&lt;/span&gt; Error boundaries on all route components

&lt;span class="gu"&gt;## MCP Servers Available&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; mem0: Use this AI memory for storing and retrieving long-term context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create project-specific files at ./CLAUDE.md in each repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: E-Commerce Platform&lt;/span&gt;

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Next.js 16 with App Router
&lt;span class="p"&gt;-&lt;/span&gt; Supabase for database and auth
&lt;span class="p"&gt;-&lt;/span&gt; Stripe for payments

&lt;span class="gu"&gt;## Key Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Server components for all SEO pages
&lt;span class="p"&gt;-&lt;/span&gt; Row-level security instead of API auth
&lt;span class="p"&gt;-&lt;/span&gt; Tailwind only, no CSS modules

&lt;span class="gu"&gt;## Current Status&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Auth: complete
&lt;span class="p"&gt;-&lt;/span&gt; Product catalog: complete
&lt;span class="p"&gt;-&lt;/span&gt; Checkout: in progress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Mem0 MCP Tools Available After Setup
&lt;/h2&gt;

&lt;p&gt;Once you configure the Mem0 MCP server, the following tools become available to Claude:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;add_memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Store new memories with user/agent scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_memories&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Semantic search across stored memories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_memories&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List all memories with optional filters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;update_memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Modify an existing memory by ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delete_memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Remove a specific memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delete_all_memories&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bulk delete by scope&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Use natural language to invoke: "Remember that this project uses PostgreSQL with Prisma" or "What do you know about our authentication setup?"&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Persistent Memory Improve Claude Code Workflows?
&lt;/h2&gt;

&lt;p&gt;If you’ve just implemented memory, you won’t see the benefits immediately. So let me show you a hypothetical comparison of what your workflows look like with and without memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Without Memory: Debugging Authentication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session 1:&lt;/strong&gt; You explain the auth system uses NextAuth with Google and email providers, that tokens expire after 24 hours, and that the refresh logic lives in /lib/auth/refresh.ts. You debug an issue where tokens aren't refreshing properly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session 2:&lt;/strong&gt; You re-explain the entire auth setup. Claude suggests checking token expiration, which you already know is 24 hours. You spend the first 10 minutes re-establishing context before making progress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session 3:&lt;/strong&gt; The refresh bug resurfaces in a different form. You've forgotten the specific edge case you discovered in Session 1. You debug from scratch.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  With Memory: Debugging Authentication
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session 1:&lt;/strong&gt; Same debugging process, but Claude automatically stores: "Auth uses NextAuth with Google/email. Tokens expire 24h. Refresh logic in /lib/auth/refresh.ts. Found edge case: refresh fails when token expires during active request."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session 2:&lt;/strong&gt; When you say something like “Let’s continue on the auth logic fix,” Claude asks directly: "Is this related to the token refresh edge case we found, where refresh fails during active requests?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session 3:&lt;/strong&gt; Claude immediately recalls the edge case pattern and checks if the new issue follows the same pattern.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code Preference Retention
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without memory:&lt;/strong&gt; Every session, Claude generates code with its default style unless you’ve specified it in a static CLAUDE.md file. You repeatedly correct: "Use arrow functions" or "I prefer explicit return types” or have to edit the markdown initialization file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With memory:&lt;/strong&gt; You state preferences once. Claude stores: "Prefers arrow functions, explicit TypeScript return types, 2-space indent." Future sessions generate code matching your style from the first prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Add Cross-Session Project Context to Claude Code with Mem0
&lt;/h3&gt;

&lt;p&gt;Over time, you can instruct Claude to update CLAUDE.md with repeating patterns from conversations and from memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Next.js 16 app router with Supabase backend
&lt;span class="p"&gt;-&lt;/span&gt; Auth via NextAuth with Google and email providers

&lt;span class="gu"&gt;## Patterns &amp;amp; Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; All API routes use zod validation
&lt;span class="p"&gt;-&lt;/span&gt; Tailwind only, no CSS modules

&lt;span class="gu"&gt;## Gotchas &amp;amp; Pitfalls&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; RLS policy requires user_id OR org_id, not both
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This context injection eliminated the "exploration phase" where Claude reads multiple files to understand project structure. Tasks that required &lt;strong&gt;3+ exploration agents&lt;/strong&gt; completed with &lt;strong&gt;zero exploration agents&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Mem0 as the AI Memory Layer For Claude Code
&lt;/h2&gt;

&lt;p&gt;Mem0 is a universal AI memory layer that extracts, stores, and retrieves contextual information across sessions. It’s tried and trusted by developers, with our GitHub repository at over &lt;strong&gt;46,000 GitHub stars&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Mem0 uses a hybrid technical architecture: vector stores for semantic search, key-value stores for fast retrieval, and optional graph stores for relationship modeling.&lt;/p&gt;

&lt;p&gt;On the LOCOMO benchmark, Mem0 &lt;a href="https://mem0.ai/research" rel="noopener noreferrer"&gt;shows +26% accuracy&lt;/a&gt; over OpenAI's memory implementation.&lt;/p&gt;

&lt;p&gt;Mem0 offers both cloud-hosted and self-hosted AI memory deployment.&lt;/p&gt;

&lt;p&gt;Self-hosted installations use Qdrant for vector storage by default, with support for 24+ vector databases including PostgreSQL (pgvector), MongoDB, Pinecone, and Milvus. LLM providers supported include OpenAI, Anthropic, Ollama, Groq, and 16+ others.&lt;/p&gt;
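
&lt;p&gt;For a self-hosted setup, a configuration along these lines selects the backing stores and model provider. Treat the exact keys as indicative rather than authoritative: the field names follow Mem0’s provider-config pattern, but verify them against the current open-source docs, and note that the connection values below are placeholders.&lt;/p&gt;

```python
from mem0 import Memory

# Indicative self-hosted config: pgvector for embeddings, Ollama for
# the LLM. Key names follow Mem0's provider-config pattern but should
# be checked against the current docs; connection values are placeholders.
config = {
    "vector_store": {
        "provider": "pgvector",
        "config": {
            "dbname": "mem0",
            "user": "postgres",
            "password": "postgres",
            "host": "localhost",
            "port": 5432,
        },
    },
    "llm": {
        "provider": "ollama",
        "config": {"model": "llama3.1:8b"},
    },
}

memory = Memory.from_config(config)
```

&lt;p&gt;With Ollama serving the model locally and Postgres running pgvector, nothing in this setup needs to leave your machine.&lt;/p&gt;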

&lt;p&gt;For compliance requirements, &lt;a href="https://mem0.ai/security" rel="noopener noreferrer"&gt;Mem0 is SOC 2 Type II certified&lt;/a&gt;, GDPR compliant, and offers HIPAA compliance on Enterprise plans. Bring Your Own Key (BYOK) support addresses data sovereignty concerns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foscrl3tci8oy6z2pm2r0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foscrl3tci8oy6z2pm2r0.png" alt="Mem0 security compliance documentation" width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://trust.mem0.ai/" rel="noopener noreferrer"&gt;Source&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Mem0 Python SDK provides async operations for high-throughput applications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncMemory&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncMemory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I prefer dark mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory scoping supports multiple organizational levels: user_id for personal memories, agent_id for bot-specific context, run_id for session isolation, and app_id for application-level defaults.&lt;/p&gt;
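
&lt;p&gt;The scoping behavior can be modeled in a few lines of plain Python. This mock is not the Mem0 SDK; it only illustrates the isolation semantics of keys like &lt;code&gt;user_id&lt;/code&gt; and &lt;code&gt;agent_id&lt;/code&gt;.&lt;/p&gt;

```python
# Mock of scope-based memory isolation (user_id / agent_id / run_id).
# This is NOT the Mem0 SDK, just a model of the filtering semantics.

memories = [
    {"text": "Prefers dark mode", "user_id": "alice"},
    {"text": "Prefers light mode", "user_id": "bob"},
    {"text": "Support bot greeting style: formal", "agent_id": "support-bot"},
    {"text": "Session scratch note", "run_id": "run-42"},
]

def get_memories(**scope):
    """Return only memories whose tags match every given scope key."""
    return [
        m["text"] for m in memories
        if all(m.get(k) == v for k, v in scope.items())
    ]

alice = get_memories(user_id="alice")        # personal memories only
bot = get_memories(agent_id="support-bot")   # bot-specific context
```

&lt;p&gt;Queries scoped to one user never see another user’s memories, and session-scoped (&lt;code&gt;run_id&lt;/code&gt;) entries stay isolated from long-term state.&lt;/p&gt;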

&lt;h2&gt;
  
  
  Try Mem0, The Persistent AI Memory Layer for Agents
&lt;/h2&gt;

&lt;p&gt;Adding &lt;a href="https://mem0.ai/blog/build-persistent-memory-for-agentic-ai-applications-with-mem0-open-source-amazon-elasticache-for-valkey-and-amazon-neptune-analytics" rel="noopener noreferrer"&gt;persistent memory&lt;/a&gt; to Claude Code turns it from a stateless tool into a context-aware development partner. The implementation takes less than 5 minutes using the Mem0 MCP server approach, with free tier limits sufficient for individual developers.&lt;/p&gt;

&lt;p&gt;And you get 10x faster task completion for context-dependent work, 90% reduction in token usage, and elimination of the repetitive context-building phase that opens every session.&lt;/p&gt;

&lt;p&gt;If you’re building AI-native development workflows, memory is the foundation that makes everything else work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Why does Claude Code need persistent memory?
&lt;/h3&gt;

&lt;p&gt;Claude Code starts every session with zero context, forcing you to re-explain project architecture, coding preferences, and past debugging steps. Adding persistent memory eliminates this repetition, reducing token usage by 90% and speeding up task completion by allowing Claude to recall details from previous sessions immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. How do I add memory to Claude Code?
&lt;/h3&gt;

&lt;p&gt;You can add memory by installing the Mem0 MCP server. The process involves installing the &lt;code&gt;mem0-mcp-server&lt;/code&gt; package via pip, getting an API key from Mem0, and configuring your &lt;code&gt;.mcp.json&lt;/code&gt; (for CLI) or &lt;code&gt;~/.claude.json&lt;/code&gt; (for Desktop) file with the server details and your API key.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Does this work with both Claude Code CLI and Claude Desktop?
&lt;/h3&gt;

&lt;p&gt;Yes, the Mem0 MCP integration works for both. The configuration steps are nearly identical; you just need to update the specific JSON configuration file used by each interface (&lt;code&gt;.mcp.json&lt;/code&gt; for CLI project scope or &lt;code&gt;~/.claude.json&lt;/code&gt; for the Desktop app).&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Is Mem0 free to use?
&lt;/h3&gt;

&lt;p&gt;Mem0 offers a free tier that includes 10,000 memories and 1,000 retrieval calls per month, which is sufficient for most individual developers. For advanced features like graph memory (relationship tracking), a Pro plan is available.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Can I control what Claude remembers?
&lt;/h3&gt;

&lt;p&gt;Yes. You can manage memory using natural language commands or MCP tools. For example, you can tell Claude to "Remember that we use PostgreSQL" or use tools like &lt;code&gt;delete_memory&lt;/code&gt; to remove outdated information. You can also configure scoping (user_id, agent_id) to isolate context.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>clawdbot</category>
      <category>moltbot</category>
      <category>aimemory</category>
    </item>
    <item>
      <title>How to Build Context-Aware Chatbots with Memory using Mem0</title>
      <dc:creator>Ninad Pathak</dc:creator>
      <pubDate>Fri, 06 Feb 2026 12:26:06 +0000</pubDate>
      <link>https://dev.to/mem0/how-to-build-context-aware-chatbots-with-memory-using-mem0-io</link>
      <guid>https://dev.to/mem0/how-to-build-context-aware-chatbots-with-memory-using-mem0-io</guid>
      <description>&lt;p&gt;By default, every API call to an LLM is a fresh event. The model knows everything about the world up until its training cutoff, but it knows nothing about you, your preferences, or the conversation you had five minutes ago irrespective of how many times you repeat yourself.&lt;/p&gt;

&lt;p&gt;So, if you're planning to build an agent that feels truly intelligent, a better model is not enough. You need an agent with memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66ida49x80qnt2sp4wol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66ida49x80qnt2sp4wol.png" alt="Illustration showing a chatbot with question marks, representing stateless LLM interactions without memory" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Problem:&lt;/strong&gt; Every time you call an LLM API, it starts fresh with no memory of previous conversations. It's like talking to someone with amnesia.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why It Matters:&lt;/strong&gt; Sending entire conversation histories with every request gets expensive and slow. You need a smarter way to remember what's important.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG is for facts, AI Memory is for state&lt;/strong&gt;: RAG retrieves static knowledge. AI Memory must manage evolving user state, including updates and contradictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mem0 is the bridge&lt;/strong&gt;: Mem0 provides a managed &lt;a href="https://mem0.ai/" rel="noopener noreferrer"&gt;AI memory layer&lt;/a&gt; that handles extraction, retrieval, and preference updates so agents remain consistent over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is context retention?
&lt;/h2&gt;

&lt;p&gt;Context retention in AI engineering is the system architecture that enables a model to recall information from previous interactions and apply it to the current generation.&lt;/p&gt;

&lt;p&gt;It's often marketed as "Personalization" or "Long-term Recall," but let's strip away the buzzwords.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At its core, context retention is string manipulation and database queries. But the difficulty lies in deciding what to store and when to retrieve it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the lowest level, an LLM API call looks like this: &lt;code&gt;function(prompt) -&amp;gt; response&lt;/code&gt;. To give an LLM "memory," you're simply changing the function to: &lt;code&gt;function(retrieved_history + prompt) -&amp;gt; response&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The challenge is in engineering. You have to decide what information is worth storing, how it should be retrieved, how long it should persist, and when it needs to be updated or removed. These choices directly affect cost, latency, and model behavior.&lt;/p&gt;

&lt;p&gt;You need a retrieval policy for past messages. If you send the full history every time, you'd be overpaying for API tokens. If you send irrelevant history, you increase model hallucinations.&lt;/p&gt;

&lt;p&gt;That's why you need well-implemented context retention. Your agents should be able to store user state and only pull the most relevant memories based on the current user query.&lt;/p&gt;
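
&lt;p&gt;To see why the retrieval policy matters for cost, here is a rough back-of-the-envelope comparison. It uses the common ~4 characters-per-token approximation, which is an assumption, not a real tokenizer, and the message sizes are invented for illustration.&lt;/p&gt;

```python
# Rough cost comparison: resend the full history vs. inject top-k
# memories. Uses the ~4 chars/token rule of thumb, not a real tokenizer.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# Pretend history: 200 messages of 200 characters each.
history = ["x" * 200 for _ in range(200)]

# What a memory layer would inject instead: a handful of relevant facts.
top_k_memories = [
    "User prefers TypeScript and 2-space indent",
    "Auth tokens expire after 24 hours",
    "API routes validated with zod",
]

full_cost = sum(approx_tokens(m) for m in history)
memory_cost = sum(approx_tokens(m) for m in top_k_memories)

print(full_cost, memory_cost)  # the memory-injection path is far cheaper
```

&lt;p&gt;Under these toy numbers, resending the full history costs hundreds of times more input tokens per request than injecting a few relevant memories - and the gap widens as the conversation grows.&lt;/p&gt;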

&lt;h2&gt;
  
  
  Building memory: From Naive to Production-Ready
&lt;/h2&gt;

&lt;p&gt;We're going to build a chatbot that evolves from having no memory to having perfect recall. We'll start with the naive approach to see why it breaks, examine the architecture of stateful agents, and then implement a production-grade solution using Mem0.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 1: The naive approach (list appending)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first way every developer tries to solve memory is by keeping a Python list running in the application RAM. This is often called "Buffer Memory."&lt;/p&gt;

&lt;p&gt;Here's a simple script using Google's Gemini 3 Flash model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the Gemini client
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# This list lives in RAM
&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Append user input to local list
&lt;/span&gt;    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;}]})&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Send the WHOLE list to the LLM
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;}]})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;

&lt;span class="c1"&gt;# Simulation
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: Hi, I am Alex and I am vegan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Hi, I am Alex and I am vegan.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: What should I eat for dinner?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What should I eat for dinner?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The output:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1tsbxm0hz9lza0uvt6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc1tsbxm0hz9lza0uvt6c.png" alt="Terminal output showing chatbot responses using simple in-memory conversation history" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works, but let's look closer at the mechanics.&lt;/p&gt;

&lt;p&gt;In the first call, we sent around 15 tokens. In the second call, we sent 60 tokens. By the 10th turn of the conversation, we’re sending thousands of tokens for every single request.&lt;/p&gt;
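&lt;p&gt;Back-of-the-envelope, assuming a flat ~50 tokens added per turn (purely illustrative numbers):&lt;/p&gt;

```python
# Turn n resends the entire history from turns 1..n-1 plus the new message,
# so tokens per request grow linearly and cumulative spend grows
# quadratically with conversation length.
per_turn = 50
total_sent = 0
for turn in range(1, 11):
    prompt_tokens = per_turn * turn   # full history grows linearly
    total_sent += prompt_tokens
print(prompt_tokens, total_sent)      # 500 2750
```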

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0cre4pnxv7oexss9d18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0cre4pnxv7oexss9d18.png" alt="Graph showing exponential growth of tokens sent per request as conversation history accumulates" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This approach fails in production for three reasons:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: You pay for the entire history on every turn. Input tokens per request grow linearly with conversation length, so cumulative spend grows quadratically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: Processing long contexts takes time. Time to First Token (TTFT) degrades linearly with prompt size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt;: If the Python script crashes or the server restarts, &lt;code&gt;conversation_history&lt;/code&gt; is wiped. The user is a stranger again.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 2: The architecture of persistent memory&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To fix the issues above, we need to move state out of the application RAM and into a durable storage layer.&lt;/p&gt;

&lt;p&gt;However, we can't just dump everything into a database and retrieve it all. We need a system that mimics human memory. When you talk to a friend, they don't recall every word you've ever said to them chronologically. They recall relevant information based on the current context.&lt;/p&gt;

&lt;p&gt;A proper memory architecture requires three components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: A place to keep data (Vector Database + Relational Database).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: A mechanism to find relevant data (Semantic Search).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management&lt;/strong&gt;: A way to update, delete, and resolve conflicts in data (Memory consolidation).&lt;/li&gt;
&lt;/ol&gt;
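&lt;p&gt;One way to express those three components as a single minimal interface. The class and method names here are illustrative, not any library's API, and retrieval is faked with keyword overlap instead of embeddings:&lt;/p&gt;

```python
# Storage, retrieval, and management behind one illustrative interface.
class MemoryLayer:
    def __init__(self):
        self.facts = {}  # storage: fact_id mapped to text

    def store(self, fact_id, text):
        self.facts[fact_id] = text

    def retrieve(self, query):
        # Retrieval: a real system ranks by embedding similarity;
        # this stand-in matches on shared words.
        words = set(query.lower().split())
        return [t for t in self.facts.values()
                if not words.isdisjoint(t.lower().split())]

    def update(self, fact_id, text):
        # Management: overwrite the old fact instead of appending a
        # contradiction next to it.
        self.facts[fact_id] = text

layer = MemoryLayer()
layer.store("diet", "user is vegan")
layer.update("diet", "user is vegetarian")
print(layer.retrieve("vegetarian dinner ideas for the user"))
```

&lt;p&gt;The &lt;code&gt;update&lt;/code&gt; method is the piece most homegrown stacks skip, and it's what the next section is about.&lt;/p&gt;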

&lt;p&gt;Most developers try to build this stack themselves using LangChain and a raw vector database like Pinecone or Qdrant. They usually run into the &lt;strong&gt;"Update Problem."&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;The update problem&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monday&lt;/strong&gt;: User says "I love Python." → Vector DB stores embedding for "Loves Python".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuesday&lt;/strong&gt;: User says "I hate Python, I only use Go now." → Vector DB stores embedding for "Hates Python".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wednesday&lt;/strong&gt;: User asks "Write me a script." → Vector Search retrieves both conflicting memories. The LLM gets confused.&lt;/li&gt;
&lt;/ul&gt;
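&lt;p&gt;The failure is easy to reproduce. Here &lt;code&gt;naive_search&lt;/code&gt; is a stand-in that returns everything, mirroring the case where both facts embed close to a coding query:&lt;/p&gt;

```python
# A toy reproduction of the update problem: the naive store never
# reconciles new facts with old ones, so a later search surfaces the
# contradiction side by side.
memories = []

def naive_add(fact):
    memories.append(fact)  # append-only, no conflict resolution

def naive_search(query):
    # Stand-in for vector search; both stored facts would score as
    # relevant to a coding query, so both come back.
    return list(memories)

naive_add("Loves Python")                   # Monday
naive_add("Hates Python, only uses Go")     # Tuesday
print(naive_search("Write me a script"))    # both conflicting facts return
```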

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztnhox0uz8bp13ir3ten.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztnhox0uz8bp13ir3ten.png" alt="Diagram depicting the update problem where conflicting user preferences (loving Python on Monday vs. hating Python on Tuesday) cause retrieval confusion" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You need a management layer that understands entities and updates. Mem0 is one way to handle this.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 3: Implementing production memory with Mem0&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's build a personalized travel assistant. The goal is for the bot to remember my preferences across different sessions without me repeating them, and to handle updates gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;mem0ai google-genai python-dotenv

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Setup your environment variables:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GEMINI_API_KEY=your_gemini_api_key_here
MEM0_API_KEY=your_mem0_api_key_here

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemini API key from &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Mem0 API key from &lt;a href="https://app.mem0.ai/" rel="noopener noreferrer"&gt;app.mem0.ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Storing memory&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, let's initialize Mem0 and store some initial context. In a real app, this happens dynamically as the user chats, but we'll seed it manually here to demonstrate the storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemoryClient&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the memory client
# You can get a key from https://app.mem0.ai/
&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traveler_01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Let's simulate a user telling us something in Session 1
&lt;/span&gt;&lt;span class="n"&gt;user_input_session_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I strictly fly business class, but I hate long layovers. I am planning a trip to Japan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# We add this to memory
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input_session_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Memory stored successfully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory stored successfully.
{'results': [{'message': 'Memory processing has been queued for background execution', 'status': 'PENDING', 'event_id': 'a26936bf-c15d-401f-a3b2-bcadd75d9611'}]}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftooa8kkemdpn7ccn407x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftooa8kkemdpn7ccn407x.png" alt="Terminal output showing memory being stored successfully" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftymezo3y53n3f0byxran.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftymezo3y53n3f0byxran.png" alt="Mem0 dashboard displaying extracted facts about user preferences: business class seating, layover dislikes, and Japan trip plans" width="800" height="785"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that with the current Mem0 API, memory processing happens asynchronously in the background. Extraction and storage are queued, and you can check the Mem0 dashboard to see the extracted facts once processing is complete.&lt;/p&gt;

&lt;p&gt;When you check the dashboard (as shown in the screenshots), you'll see Mem0 extracted specific facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The user strictly flies business class, dislikes long layovers, and is planning a trip to Japan."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is distinct from document RAG, which retrieves fixed-size chunks of raw text. Extracted facts are compact and unambiguous, which makes the memory easier for the LLM to apply.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Retrieving context&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Now, imagine the server restarts. A week passes. The user comes back. In a naive system, we would have to ask "Where do you want to go?" and "What is your budget?" again.&lt;/p&gt;

&lt;p&gt;With Mem0, we retrieve only the user-specific memories that are relevant to the current request before calling the LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemoryClient&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Search Mem0 for relevant context based on the input
&lt;/span&gt;    &lt;span class="c1"&gt;# This uses semantic search to find memories related to the query
&lt;/span&gt;    &lt;span class="n"&gt;relevant_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Format the memories into a system prompt string
&lt;/span&gt;    &lt;span class="n"&gt;context_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;relevant_memories&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_memories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- DEBUG: RETRIEVED CONTEXT ---&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--------------------------------&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Construct the prompt with the retrieved context
&lt;/span&gt;    &lt;span class="n"&gt;system_instruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a travel agent. Context about user:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Generate response
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="c1"&gt;# Session 2: User asks a generic question
&lt;/span&gt;&lt;span class="n"&gt;new_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find me flight options for next Tuesday.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chat_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traveler_01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent Response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The output:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvb74lbx1plr9cykh5gv9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvb74lbx1plr9cykh5gv9.png" alt="Terminal output demonstrating travel agent providing personalized flight options based on retrieved user memories without explicit context in query" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this is powerful&lt;/strong&gt;: The user never mentioned "Japan," "Business Class," or "Tokyo" in their second query. They just said "flight options."&lt;/p&gt;

&lt;p&gt;The Mem0 &lt;code&gt;search()&lt;/code&gt; function took the query "Find me flight options," looked at the vector store associated with &lt;code&gt;traveler_01&lt;/code&gt;, and realized that previous memories about Japan and flying preferences were semantically relevant.&lt;/p&gt;

&lt;p&gt;If the user had 1,000 other memories about "liking cats" or "hating JavaScript," Mem0 would have filtered those out because they're irrelevant to a flight search. This keeps your context window lean and your costs low.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Handling updates (memory consolidation)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Let's look at the "Update Problem" we mentioned earlier. What if the user's situation changes?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mem0&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemoryClient&lt;/span&gt;

&lt;span class="c1"&gt;# Load environment variables from .env file
&lt;/span&gt;&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the memory client
&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MEM0_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traveler_01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# The user changes their mind
&lt;/span&gt;&lt;span class="n"&gt;update_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Actually, my budget got cut. I can only fly economy now.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# We add this new information
&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Let's search for flight preferences again
&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flight preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- Updated Memories ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--- Updated Memories ---
- The user, who previously stated they strictly fly business class, hates long layovers, and is planning a trip to Japan, has now experienced a budget cut and can only fly economy.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhubkn916wzz7xnqvatmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhubkn916wzz7xnqvatmu.png" alt="Mem0 dashboard showing memory consolidation where budget constraint updates previous business class preference to economy seating" width="800" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mem0 detected the conflict regarding flight class and intelligently updated the memory. Rather than simply replacing "business class" with "economy class," it preserved the context that this was a &lt;em&gt;change&lt;/em&gt; from a previous preference. This nuanced understanding makes Mem0’s memory management better suited than simple key-value storage for long-running agents.&lt;/p&gt;

&lt;p&gt;This "Dynamic Forgetting" is essential for long-running agents. Without it, your agent eventually becomes internally inconsistent, holding onto every contradictory belief the user has ever held.&lt;/p&gt;

&lt;p&gt;Looking at the Mem0 dashboard (as shown in the screenshots), you can see the changelog tracking how memories evolve:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Changelog in Mem0 Dashboard:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;v1&lt;/strong&gt;: "The user strictly flies business class, dislikes long layovers, and is planning a trip to Japan."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v2&lt;/strong&gt;: "The user, who previously stated they strictly fly business class, hates long layovers, and is planning a trip to Japan, has now experienced a budget cut and can only fly economy."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This version tracking allows you to understand how user preferences change over time while maintaining only the most current, relevant information. The system preserves context about &lt;em&gt;why&lt;/em&gt; preferences changed, which is invaluable for maintaining conversational coherence.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Level 4: Advanced patterns for robust agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once you have the read/write loop working, you need to consider how to structure the data for complex use cases.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Session vs. user memory&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Not all memory is created equal. You should categorize memory based on its lifespan.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short-term (Session)&lt;/strong&gt;: "I just asked you to debug this specific function." This is relevant for 10 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term (User)&lt;/strong&gt;: "I prefer TypeScript over Python." This is relevant forever.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Mem0, you can handle this by using metadata filters or by separating &lt;code&gt;user_id&lt;/code&gt; (for long term) and &lt;code&gt;session_id&lt;/code&gt; (for short term). A common pattern is to dump the raw chat logs into a short-term buffer (passed directly to the LLM) and asynchronously process them into Mem0 for long-term storage.&lt;/p&gt;
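&lt;p&gt;A minimal sketch of that split, assuming the Mem0 client's &lt;code&gt;add()&lt;/code&gt; accepts a &lt;code&gt;run_id&lt;/code&gt; keyword for session scoping (check your SDK version). The helper only builds the keyword arguments, so it runs without a live client:&lt;/p&gt;

```python
# Sketch: route a memory to the right scope before writing it to Mem0.
# Assumes the Mem0 client's add() accepts user_id and run_id keyword
# arguments (run_id scoping a memory to one session); verify against
# the docs for your SDK version.

def memory_scope(user_id, session_id=None, long_term=True):
    """Build the kwargs for a Mem0 add() call.

    Long-term facts attach only to the user; short-term context
    also carries the session (run) identifier so it can be
    filtered or expired when the session ends.
    """
    kwargs = {"user_id": user_id}
    if not long_term:
        if session_id is None:
            raise ValueError("short-term memories need a session_id")
        kwargs["run_id"] = session_id
    return kwargs

# Long-term preference: survives across sessions
print(memory_scope("traveler_01"))
# Short-term detail: tied to the current session only
print(memory_scope("traveler_01", session_id="sess_42", long_term=False))
```

&lt;p&gt;You would then pass these kwargs to &lt;code&gt;m.add(text, **memory_scope(...))&lt;/code&gt; so the write lands in the right scope.&lt;/p&gt;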

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Graph memory (advanced)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For most applications, the vector-based memory retrieval we've covered is sufficient. However, it's worth mentioning that Mem0 also supports graph-based memory for advanced use cases requiring complex relationship tracking between entities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graph memory is beyond the scope of this tutorial&lt;/strong&gt;. I’d recommend learning about vector-based memory first before exploring graph memory. If you're curious, you can check the &lt;a href="https://docs.mem0.ai/open-source/features/graph-memory" rel="noopener noreferrer"&gt;Mem0 graph memory documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Graph memory features are only available with Mem0's Pro plan or higher.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Separation of truth vs. memory&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;In highly reliable agents, you should distinguish between "Truth" and "Memory."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Truth&lt;/strong&gt;: Hard data in a SQL database (e.g., active reminders, account balance). This is deterministic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: Soft preferences in Mem0 (e.g., "User usually snoozes reminders by 15 mins"). This is probabilistic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your system prompt should ingest both: "Here is the exact status of your tasks (SQL). Here is how you usually like to handle them (Mem0)."&lt;/p&gt;
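&lt;p&gt;The composition step can be sketched as a plain function. The two inputs here are hypothetical stand-ins for a real SQL query result and a real &lt;code&gt;m.search()&lt;/code&gt; result:&lt;/p&gt;

```python
# Sketch of the "Truth vs. Memory" split: deterministic facts come
# from your database, soft preferences from Mem0, and the system
# prompt ingests both. The example inputs are hypothetical.

def build_system_prompt(facts, memories):
    """Compose a system prompt from hard truth and soft memory."""
    fact_lines = "\n".join(f"- {f}" for f in facts)
    memory_lines = "\n".join(f"- {m}" for m in memories)
    return (
        "Here is the exact status of the user's tasks (source of truth):\n"
        f"{fact_lines}\n\n"
        "Here is how the user usually likes to handle them (memory):\n"
        f"{memory_lines}"
    )

facts = ["Reminder 'Pay rent' is due at 09:00"]           # from SQL
memories = ["User usually snoozes reminders by 15 mins"]  # from Mem0
print(build_system_prompt(facts, memories))
```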

&lt;h2&gt;
  
  
  Start building your memory layer
&lt;/h2&gt;

&lt;p&gt;Context retention is the difference between a demo and a product. Users will forgive a hallucination or two, but they won't forgive an assistant that forgets their name or their preferences.&lt;/p&gt;

&lt;p&gt;The trap developers fall into is trying to build their own vector pipeline. You'll spend weeks optimizing chunk sizes, debating overlap strategies, and fighting with re-ranking algorithms. And after all that, you'll still have to solve the "Update Problem" manually.&lt;/p&gt;

&lt;p&gt;In many cases, your job is to build the agent, not maintain database infrastructure.&lt;/p&gt;

&lt;p&gt;Start by implementing the simple read/write loop with Mem0 shown above. Test it with conflicting information. Watch how the agent "changes its mind" based on new data without you touching the prompt manually. Once you see that happen, you won't go back to stateless bots.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does adding memory increase latency?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, there’s an extra retrieval call. But because Mem0 sends shorter, more relevant prompts to the LLM, generation is often faster, offsetting the added latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Mem0 with local LLMs like Ollama?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Mem0 is model-agnostic. You can pass the retrieved text into local models like Llama 3 just as you would with hosted models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is this different from built-in memory features?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Built-in memory is usually a black box. You can’t programmatically access, edit, or move it across models. Mem0 gives you full control and ownership of your memory data.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>aimemory</category>
      <category>agentaichallenge</category>
    </item>
    <item>
      <title>Prompt Engineering: The Complete Guide to Better AI Outputs</title>
      <dc:creator>Ninad Pathak</dc:creator>
      <pubDate>Sat, 31 Jan 2026 06:00:00 +0000</pubDate>
      <link>https://dev.to/mem0/prompt-engineering-the-complete-guide-to-better-ai-outputs-1e94</link>
      <guid>https://dev.to/mem0/prompt-engineering-the-complete-guide-to-better-ai-outputs-1e94</guid>
      <description>&lt;p&gt;If you ask ten developers what prompt engineering is, you will get ten different answers. Some call it "AI whispering." Others call it "glorified spellchecking."&lt;/p&gt;

&lt;p&gt;I prefer a more technical definition. &lt;strong&gt;Prompt engineering is the practice of constraining the probabilistic output of a Large Language Model (LLM) to achieve a deterministic result.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is not magic. It is an API call where the parameters are natural language instead of strongly typed integers or booleans. When you send a request to GPT-5.2 or Claude 4.5, you are not "talking" to a computer. You are navigating a high-dimensional vector space. Your prompt is the coordinate system that guides the model from a query vector to the nearest desirable completion vector.&lt;/p&gt;

&lt;p&gt;This guide explores the mechanics of that navigation. I will explain why prompt engineering is a necessary bridge between stochastic models and reliable software, how to implement it using proven research techniques, and how the field is shifting toward "Context Engineering."&lt;/p&gt;

&lt;h2&gt;
  
  
  What exactly is prompt engineering?
&lt;/h2&gt;

&lt;p&gt;At its core, prompt engineering is input optimization. LLMs are next-token prediction engines. They compute the probability distribution of the next token based on the sequence of previous tokens.&lt;/p&gt;

&lt;p&gt;If you input "The sky is," the model assigns probabilities to "blue" (high), "gray" (medium), and "potato" (near zero). Prompt engineering is the art of manipulating the preceding tokens (the context) to skew that probability distribution toward the specific output you need.&lt;/p&gt;
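&lt;p&gt;A toy illustration of that skew, using made-up logits for the three candidate tokens: the same scores, softmax-normalized at two temperatures. Lower temperature sharpens the distribution toward the most probable token, which is why extraction tasks usually run near temperature 0.&lt;/p&gt;

```python
# Toy softmax with temperature: hypothetical logits for the next
# token after "The sky is". Lower temperature concentrates
# probability mass on the top token.
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["blue", "gray", "potato"]
logits = [4.0, 2.5, -3.0]  # illustrative values, not real model output

for temp in (1.0, 0.2):
    probs = softmax(logits, temperature=temp)
    print(temp, {t: round(p, 3) for t, p in zip(tokens, probs)})
```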

&lt;p&gt;For developers, this matters because we rarely want "creative" answers. We want structured data. We want valid JSON. We want Python code that compiles. Prompt engineering turns a text-generation engine into a data-processing engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does prompt engineering even exist?
&lt;/h2&gt;

&lt;p&gt;You might ask why we need a special discipline for this. Why can't the model just "know" what we want?&lt;/p&gt;

&lt;p&gt;The answer lies in the architecture of the Transformer model. These models are probabilistic, not deterministic. If you run the same SQL query against a database twice, you get the same result. If you run the same prompt against an LLM twice with a non-zero temperature, you might get different results.&lt;/p&gt;

&lt;p&gt;Prompt engineering exists to force convergence. It mitigates three specific failures of raw LLMs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hallucination&lt;/strong&gt;: The model invents facts to satisfy the pattern of the prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Format Drift&lt;/strong&gt;: The model returns a paragraph of text when you asked for a JSON object.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context Amnesia&lt;/strong&gt;: The model forgets instructions buried in the middle of a long prompt.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This last point is critical. Research by Nelson F. Liu et al. in their paper &lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;"Lost in the Middle"&lt;/a&gt; demonstrates that LLMs are excellent at retrieving information at the start and end of a context window but often fail to retrieve information buried in the middle. Good prompt engineering structures the input to bypass this architectural limitation.&lt;/p&gt;
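&lt;p&gt;One practical mitigation, sketched below: place the critical instruction at both ends of the prompt and let bulk context sit in the middle, matching the U-shaped attention pattern the paper reports. The delimiter format is an illustrative choice:&lt;/p&gt;

```python
# "Sandwich" the instruction around the context so it lands in the
# high-attention regions at the start and end of the prompt.

def sandwich_prompt(instruction, context_chunks):
    """Place the instruction first and last; bulk context in between."""
    middle = "\n\n".join(context_chunks)
    return (
        f"{instruction}\n\n"
        f"--- CONTEXT ---\n{middle}\n--- END CONTEXT ---\n\n"
        f"Reminder: {instruction}"
    )

prompt = sandwich_prompt(
    "Answer ONLY from the context. Reply in JSON.",
    ["doc chunk 1 ...", "doc chunk 2 ...", "doc chunk 3 ..."],
)
print(prompt)
```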

&lt;h2&gt;
  
  
  How do we control the model?
&lt;/h2&gt;

&lt;p&gt;We use specific patterns to guide the model's reasoning. These are not random hacks. They are techniques backed by academic research that measurably improve performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero-shot and few-shot prompting
&lt;/h3&gt;

&lt;p&gt;Zero-shot prompting is asking the model to perform a task without examples. Few-shot prompting provides examples of the input and desired output.&lt;/p&gt;

&lt;p&gt;The difference is massive. In the original GPT-3 paper &lt;a href="https://arxiv.org/abs/2005.14165" rel="noopener noreferrer"&gt;"Language Models are Few-Shot Learners"&lt;/a&gt;, the authors proved that providing just one or two examples (shots) drastically increases the model's ability to follow complex formatting rules.&lt;/p&gt;
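&lt;p&gt;In practice, few-shot examples become user/assistant pairs in the chat-messages shape most LLM APIs share (the exact schema varies by provider; this is a generic sketch):&lt;/p&gt;

```python
# Build a few-shot prompt as a chat-messages list: each example
# becomes a user/assistant pair, so the model sees the exact
# input-to-output mapping before the real query arrives.

def few_shot_messages(system, examples, query):
    messages = [{"role": "system", "content": system}]
    for example_in, example_out in examples:
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": example_out})
    messages.append({"role": "user", "content": query})
    return messages

msgs = few_shot_messages(
    system="Convert the sentence to JSON with keys city and country.",
    examples=[
        ('Paris is in France.', '{"city": "Paris", "country": "France"}'),
        ('Tokyo is in Japan.', '{"city": "Tokyo", "country": "Japan"}'),
    ],
    query="Oslo is in Norway.",
)
print(len(msgs))  # system + 2 examples x 2 messages + query = 6
```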

&lt;h3&gt;
  
  
  Chain-of-Thought (CoT)
&lt;/h3&gt;

&lt;p&gt;This is the most significant breakthrough in prompt engineering. Introduced by Wei et al. (2022) in &lt;a href="https://arxiv.org/abs/2201.11903" rel="noopener noreferrer"&gt;"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"&lt;/a&gt;, this technique forces the model to articulate its reasoning steps before generating the final answer.&lt;/p&gt;

&lt;p&gt;Instead of asking for the answer directly, you instruct the model to "think step by step." This works because it allows the model to generate intermediate tokens that serve as a scratchpad. These intermediate tokens essentially increase the computation time the model spends on the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tree of Thoughts (ToT)
&lt;/h3&gt;

&lt;p&gt;Yao et al. (2023) expanded on CoT with &lt;a href="https://arxiv.org/abs/2305.10601" rel="noopener noreferrer"&gt;"Tree of Thoughts"&lt;/a&gt;. This method encourages the model to explore multiple reasoning paths, evaluate them, and backtrack if a path looks unpromising. It mimics human problem-solving more closely than a linear chain.&lt;/p&gt;

&lt;h3&gt;
  
  
  ReAct (Reason + Act)
&lt;/h3&gt;

&lt;p&gt;For developers building agents, &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;ReAct (Yao et al., 2022)&lt;/a&gt; is the standard. It combines reasoning (thinking about the problem) with acting (using external tools like APIs). The model generates a thought, decides to call a tool, observes the output, and then continues reasoning.&lt;/p&gt;
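&lt;p&gt;The cycle is easiest to see in miniature. This toy loop uses a scripted stand-in for the model and a single tool; a real agent would replace &lt;code&gt;scripted_model&lt;/code&gt; with an actual LLM call, and the Thought/Action/Observation text format is an illustrative convention:&lt;/p&gt;

```python
# Toy ReAct loop: the "model" emits a Thought and an Action, the
# runtime executes the tool, feeds back an Observation, and the
# model then emits a Final Answer.

def calculator(expression):
    # Deliberately restricted eval for the demo
    allowed = set("0123456789+-*/. ()")
    if not set(expression).issubset(allowed):
        raise ValueError("unsupported expression")
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def scripted_model(transcript):
    # Stand-in for the LLM: one tool call, then a final answer.
    if "Observation:" in transcript:
        observed = transcript.rsplit("Observation: ", 1)[1].strip()
        return f"Final Answer: {observed}"
    return "Thought: I need arithmetic.\nAction: calculator[12*7]"

def react_loop(question, max_steps=3):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = scripted_model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            tool, arg = step.split("Action: ")[1].split("[", 1)
            observation = TOOLS[tool.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "gave up"

print(react_loop("What is 12 * 7?"))  # prints "84"
```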

&lt;h2&gt;
  
  
  5 practical developer workflows
&lt;/h2&gt;

&lt;p&gt;I see too many tutorials focusing on "writing poems" or "generating marketing emails." Let's look at how we actually use prompt engineering in production software.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Generating unit tests from legacy code
&lt;/h3&gt;

&lt;p&gt;Legacy code is often undocumented and untestable. You can use an LLM to generate a test suite. The trick here is to force the model to analyze the edge cases first.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You are a Senior QA Engineer. I will provide a Python function. Your goal is to write a complete pytest suite for it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;1. Analyze the function and list 5 distinct edge cases (e.g., empty inputs, negative numbers, type errors).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2. For each edge case, write a specific test case.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;3. Output only the Python code for the tests.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Code&lt;/strong&gt;:&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def calculate_discount(price, tier):
    if tier == "gold":
        return price * 0.8
    elif tier == "silver":
        return price * 0.9
    else:
        return price
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Expected Output:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The model will first list the edge cases (invalid tier, negative price, zero price, float precision issues) and then generate the code. This intermediate step ensures the tests are robust.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Converting SQL schemas to Pydantic models
&lt;/h3&gt;

&lt;p&gt;This is a common task when building modern APIs on top of legacy databases. You want to automate the boilerplate generation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Act as a Data Engineer. I need to convert a raw SQL&lt;/em&gt; &lt;code&gt;CREATE TABLE&lt;/code&gt; &lt;em&gt;statement into a Python Pydantic v2&lt;/em&gt; &lt;code&gt;BaseModel&lt;/code&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Rules:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;1. Map&lt;/em&gt; &lt;code&gt;VARCHAR&lt;/code&gt; &lt;em&gt;to&lt;/em&gt; &lt;code&gt;str&lt;/code&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2. Map&lt;/em&gt; &lt;code&gt;INT&lt;/code&gt; &lt;em&gt;to&lt;/em&gt; &lt;code&gt;int&lt;/code&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;3. If a field is&lt;/em&gt; &lt;code&gt;NOT NULL&lt;/code&gt;&lt;em&gt;, it is required. If it is nullable, use&lt;/em&gt; &lt;code&gt;Optional[type]&lt;/code&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;4. Add&lt;/em&gt; &lt;code&gt;Field&lt;/code&gt; &lt;em&gt;descriptions based on the column names.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Input&lt;/strong&gt;:&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    last_login TIMESTAMP
);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Expected Output:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pydantic import BaseModel, Field
from typing import Optional
from datetime import datetime

class User(BaseModel):
    id: int = Field(..., description="Primary key for user")
    username: str = Field(..., max_length=50)
    last_login: Optional[datetime] = None
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Debugging stack traces with context injection
&lt;/h3&gt;

&lt;p&gt;When a CI/CD pipeline fails, digging through logs is tedious. You can prompt an LLM to find the root cause, but you must provide the &lt;em&gt;source code&lt;/em&gt; along with the error.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You are a Python debugging assistant. I have a stack trace and the relevant source file.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;1. Identify the line number in the stack trace that belongs to my code (not libraries).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2. Look at that line in the provided source code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;3. Explain exactly why the error occurred.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;4. Propose a one-line fix.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Error:&lt;/em&gt; &lt;code&gt;KeyError: 'details'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source Code:&lt;/em&gt; &lt;code&gt;return data['response']['details']&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Expected Output:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The model identifies that the dictionary key 'details' is missing and suggests using&lt;/em&gt; &lt;code&gt;.get('details', {})&lt;/code&gt; &lt;em&gt;instead of direct access.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Refactoring for performance (O(n²) to O(n))
&lt;/h3&gt;

&lt;p&gt;LLMs are surprisingly good at algorithmic optimization if you explicitly ask for Big O notation improvements.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Review the following Python function. It currently runs in O(n^2) time complexity. Refactor it to run in O(n) or O(n log n).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Explain the time complexity change before showing the code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def find_common(list_a, list_b):
    result = []
    for i in list_a:
        if i in list_b:  # This search is O(n) inside a loop
            result.append(i)
    return result
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Expected Output:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The model will explain that converting&lt;/em&gt; &lt;code&gt;list_b&lt;/code&gt; &lt;em&gt;to a set makes the lookup O(1), reducing the total complexity to O(n).&lt;/em&gt;&lt;/p&gt;
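&lt;p&gt;The refactor should come out along these lines:&lt;/p&gt;

```python
# Set-based refactor: convert list_b to a set once (O(n)), making
# each membership test O(1), so the whole function runs in O(n)
# instead of O(n^2).

def find_common(list_a, list_b):
    lookup = set(list_b)  # one-time O(n) conversion
    return [i for i in list_a if i in lookup]

print(find_common([1, 2, 3, 4], [2, 4, 6]))  # prints [2, 4]
```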

&lt;h3&gt;
  
  
  5. API documentation generation
&lt;/h3&gt;

&lt;p&gt;Writing OpenAPI (Swagger) specs is boring. LLMs can generate them from the implementation code.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Generate an OpenAPI 3.0 YAML definition for the following Flask route. Include response schema and error codes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;@app.route('/api/v1/user/&amp;lt;int:user_id&amp;gt;', methods=['GET'])
def get_user(user_id):
    user = db.get(user_id)
    if not user:
        return jsonify({"error": "User not found"}), 404
    return jsonify(user.to_dict()), 200
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Expected Output:&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A valid YAML block defining the parameters, the 200 success schema, and the 404 error response.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The future is context engineering
&lt;/h2&gt;

&lt;p&gt;There is a growing sentiment that prompt engineering is a temporary patch. People argue that as models get smarter, they will infer intent perfectly.&lt;/p&gt;

&lt;p&gt;I disagree. The field is not dying. It is shifting. We are moving from &lt;strong&gt;Prompt Engineering&lt;/strong&gt; (optimizing a single string) to &lt;strong&gt;Context Engineering&lt;/strong&gt; (optimizing the information environment).&lt;/p&gt;

&lt;p&gt;The hard problem is no longer phrasing a single instruction. It is: "how do I feed the model the right 5KB of data out of my 10GB database so it can answer the question?"&lt;/p&gt;

&lt;h2&gt;
  
  
  How does Mem0 solve context engineering?
&lt;/h2&gt;

&lt;p&gt;This is the exact problem &lt;a href="https://mem0.ai/" rel="noopener noreferrer"&gt;Mem0&lt;/a&gt; solves. We realized that simple vector search (RAG) is often not enough. Vector search finds similar &lt;em&gt;words&lt;/em&gt;, but it misses &lt;em&gt;relationships&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If you search for "Alice's projects," a vector database might return documents containing "Alice" and "projects." It might miss a document that says "Alice is the lead of the Delta Team" and another that says "The Delta Team owns the Mobile App."&lt;/p&gt;

&lt;p&gt;Mem0 adds a memory layer that combines &lt;a href="https://mem0.ai/blog/graph-memory-solutions-ai-agents" rel="noopener noreferrer"&gt;vector search with graph memory&lt;/a&gt;. We track user entities and their relationships over time. When you ask a question, we don't just look for keyword matches. We look at the graph of what the user knows and cares about.&lt;/p&gt;

&lt;p&gt;This allows developers to move beyond "stateless" prompt engineering. You don't have to remind the model "I am a Python developer" in every single prompt. The memory layer handles that context injection for you. The future is not about writing better prompts. It is about building better memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions about prompt engineering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between Zero-Shot and Few-Shot prompting?
&lt;/h3&gt;

&lt;p&gt;Zero-shot prompting relies entirely on the model's pre-trained weights without examples. &lt;a href="https://mem0.ai/blog/few-shot-prompting-guide" rel="noopener noreferrer"&gt;Few-shot prompting&lt;/a&gt; alters the model's latent state by providing specific input-output pairs (examples) within the prompt, which significantly improves reliability for structured tasks like SQL or code generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do LLMs hallucinate API endpoints?
&lt;/h3&gt;

&lt;p&gt;Hallucinations occur because models predict probable tokens based on training patterns rather than retrieving facts. If an API follows a standard naming convention, the model may predict a non-existent endpoint. This is mitigated by injecting the exact API schema into the context window.&lt;/p&gt;
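&lt;p&gt;Schema injection can be as simple as pinning the model to the real API surface in the prompt. The endpoint list below is a hypothetical example:&lt;/p&gt;

```python
# Ground the model in the real API surface: list the exact endpoints
# in the prompt and forbid anything outside that list. The schema
# here is a made-up example.

API_SCHEMA = """\
GET  /api/v1/users/{id}
POST /api/v1/users
GET  /api/v1/users/{id}/orders
"""

def grounded_prompt(task):
    return (
        "You may ONLY use the endpoints listed below. "
        "If none fits, say so instead of inventing one.\n\n"
        f"API schema:\n{API_SCHEMA}\n"
        f"Task: {task}"
    )

print(grounded_prompt("Fetch the orders for user 42."))
```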

&lt;h3&gt;
  
  
  How does the 'Lost in the Middle' phenomenon affect prompts?
&lt;/h3&gt;

&lt;p&gt;Research shows that LLM accuracy degrades for information placed in the middle of a large context window. "Context stuffing"—dumping massive documentation into a prompt—often fails because the model prioritizes data at the beginning and end of the prompt (U-shaped attention).&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is JSON mode recommended for AI agents?
&lt;/h3&gt;

&lt;p&gt;JSON mode forces the model to output valid JSON syntax, preventing conversational filler (e.g., "Here is the code"). This ensures the output is deterministic and machine-parseable, which is critical for preventing runtime errors in agentic workflows.&lt;/p&gt;
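&lt;p&gt;Even with JSON mode enabled, validating the reply before it reaches the rest of the pipeline is cheap insurance. A minimal sketch (the function name and error handling are illustrative):&lt;/p&gt;

```python
# Defensive parsing of a model's "JSON mode" reply: parse it, check
# the required keys, and raise a clear error instead of letting a
# malformed reply crash downstream code.
import json

def parse_model_json(raw, required_keys):
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned invalid JSON: {exc}")
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"model reply is missing keys: {missing}")
    return data

reply = '{"city": "Oslo", "country": "Norway"}'
print(parse_model_json(reply, ["city", "country"]))
```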

</description>
      <category>promptengineering</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Architecture of Remembrance: Architectures, Vector Stores, and GraphRAG</title>
      <dc:creator>Ninad Pathak</dc:creator>
      <pubDate>Fri, 30 Jan 2026 17:57:57 +0000</pubDate>
      <link>https://dev.to/mem0/the-architecture-of-remembrance-architectures-vector-stores-and-graphrag-54ae</link>
      <guid>https://dev.to/mem0/the-architecture-of-remembrance-architectures-vector-stores-and-graphrag-54ae</guid>
      <description>&lt;p&gt;Every time you send a request to a Large Language Model (LLM), it looks at you for the first time. It has read the entire internet, but it has no idea who you are, what you asked ten seconds ago, or why you are asking it.&lt;/p&gt;

&lt;p&gt;For the architects of the modern web, this statelessness was a feature. Developers aligned with &lt;a href="https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven" rel="noopener noreferrer"&gt;Roy Fielding’s REST principles&lt;/a&gt;, accepting that servers shouldn't remember client state to ensure scalability. But for the AI agents I build (autonomous entities designed to perform complex, multi-step tasks), this is a major failure. An agent without memory is merely a function.&lt;/p&gt;

&lt;p&gt;Memory bridges the "eternal now" of the LLM inference cycle with the continuity required for intelligence. But what exactly is it?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AI memory?
&lt;/h2&gt;

&lt;p&gt;AI memory is an AI system's ability to &lt;strong&gt;store, recall, and use past information and interactions&lt;/strong&gt; to provide context, personalize responses, and improve performance over time, moving beyond simple, stateless processing. It allows AI to remember user preferences, conversation history, and learned patterns, making interactions more coherent and effective, much as human memory supports learning and reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does AI memory mimic the mind?
&lt;/h2&gt;

&lt;p&gt;To understand how to build memory for machines, we must first categorize what we are trying to simulate. Cognitive science offers a taxonomy that maps surprisingly well to software architecture. Human memory functions as a complex system of interconnected storage mechanisms rather than a single bucket.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sensory memory vs. the context window
&lt;/h3&gt;

&lt;p&gt;In biological systems, sensory memory holds information for a split second. In AI, the closest analogue is the &lt;strong&gt;context window&lt;/strong&gt;, functioning as the immediate scratchpad of the model. Information placed here is instantly accessible, processed with high fidelity, and fully integrated into the "thought process" of the LLM.&lt;/p&gt;

&lt;p&gt;However, the context window is finite. While models like Gemini or Claude boast windows of millions of tokens, filling them comes with high latency and financial cost. More importantly, the &lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;"Lost in the Middle" phenomenon&lt;/a&gt; reveals that models often fail to retrieve information buried in the center of a massive context prompt. The context window is the working RAM rather than the hard drive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Short-term memory (STM)
&lt;/h3&gt;

&lt;p&gt;Short-term memory in agents typically refers to the conversation history of the current session. It allows the agent to recall that you asked for a Python script three turns ago so it can now iterate on that script. This is transient, ephemeral, and usually discarded when the session ends.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long-term memory (LTM)
&lt;/h3&gt;

&lt;p&gt;Long-term memory allows for persistent context across sessions, days, and distinct interactions. It enables an agent to learn user preferences ("Ninad prefers TypeScript over JavaScript"), recall project structures ("The &lt;code&gt;utils&lt;/code&gt; folder contains the date formatting logic"), and build a cumulative understanding of the world. LTM implies a database, but the structure of that database determines the intelligence of the recall.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the architectures of AI agents with memory?
&lt;/h2&gt;

&lt;p&gt;While basic memory is often equated with "storing chat logs in a vector database," 2024 and 2025 have seen the rise of cognitive architectures that mimic human processing in agentic toolchains.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generative agents and the reflection mechanism
&lt;/h3&gt;

&lt;p&gt;I read a &lt;a href="https://arxiv.org/pdf/2304.03442" rel="noopener noreferrer"&gt;recent research&lt;/a&gt; project that proposed a memory architecture that goes beyond storage. It introduced the concept of the &lt;strong&gt;Memory Stream&lt;/strong&gt;, which is a comprehensive list of an agent's experiences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l9oarjd1id1ilx4apes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7l9oarjd1id1ilx4apes.png" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What was interesting to me was the &lt;strong&gt;Reflection&lt;/strong&gt; mechanism. Reflections are periodic, high-level abstract thoughts generated by the agent. The agent does not just retrieve raw observations (e.g., "User ate lunch"); it synthesizes them into insights (e.g., "The user tends to eat lunch around 1 PM").&lt;/p&gt;

&lt;p&gt;To decide which memories to surface, each one is scored on three factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Importance&lt;/strong&gt;: How significant is this memory? (Rated by the LLM.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recency&lt;/strong&gt;: How long ago did this happen?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relevance&lt;/strong&gt;: Does this matter to the current context?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture allows agents to behave with credible social dynamics, organizing their "thoughts" rather than just regurgitating data.&lt;/p&gt;
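&lt;p&gt;The retrieval score boils down to a weighted sum of those factors. Here is a rough sketch in the spirit of the generative-agents scoring (the weights, the 0.995 decay rate, and the 1&amp;ndash;10 importance scale are illustrative defaults, not canonical values):&lt;/p&gt;

```python
import time

def retrieval_score(memory, query_relevance, now, decay=0.995,
                    w_recency=1.0, w_importance=1.0, w_relevance=1.0):
    """Weighted sum of recency, importance, and relevance."""
    hours_ago = (now - memory["created_at"]) / 3600
    recency = decay ** hours_ago            # exponential decay over time
    importance = memory["importance"] / 10  # LLM-rated 1..10, normalized
    return (w_recency * recency
            + w_importance * importance
            + w_relevance * query_relevance)

now = time.time()
memories = [
    {"text": "User ate lunch", "importance": 2, "created_at": now - 3600},
    {"text": "User is allergic to peanuts", "importance": 9,
     "created_at": now - 72 * 3600},
]
# query_relevance would normally come from embedding similarity;
# it is hard-coded here to keep the sketch self-contained.
scored = sorted(memories,
                key=lambda m: retrieval_score(m, 0.5, now),
                reverse=True)
print(scored[0]["text"])  # User is allergic to peanuts
```

Note how the high-importance allergy fact beats the fresher but trivial lunch observation: importance lets old-but-critical memories survive recency decay.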

&lt;h3&gt;
  
  
  The operating system analogy for AI memory
&lt;/h3&gt;

&lt;p&gt;Another powerful approach treats the LLM not just as a text processor, but as an Operating System. This paradigm explicitly divides memory into hierarchies akin to computer architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Main Context (RAM)&lt;/strong&gt;: The immediate prompt window. Expensive and finite.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;External Context (Disk)&lt;/strong&gt;: Massive storage in databases. Cheap and infinite.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially, this architecture enables the LLM to manage its own memory via &lt;strong&gt;function calls&lt;/strong&gt;. The model can decide to move critical facts (like a user's birthday) to persistent storage or search historical records when needed. This "self-editing" capability prevents the context window from overflowing with noise while maintaining access to vast amounts of data.&lt;/p&gt;
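&lt;p&gt;A toy sketch of that self-editing loop (the tool names and call format here are made up for illustration): the model emits a function call, and a small dispatcher moves facts between the prompt and external storage.&lt;/p&gt;

```python
# "Disk": cheap, effectively unbounded external storage.
ARCHIVE = []

def save_to_archive(fact: str) -> str:
    ARCHIVE.append(fact)
    return f"saved: {fact}"

def search_archive(keyword: str) -> list:
    return [f for f in ARCHIVE if keyword.lower() in f.lower()]

# Tools the model is allowed to call to manage its own memory.
TOOLS = {"save_to_archive": save_to_archive,
         "search_archive": search_archive}

def dispatch(call):
    """Execute a function call emitted by the model."""
    return TOOLS[call["name"]](**call["arguments"])

# Simulated model decisions: persist a critical fact now,
# then recall it in a later session instead of keeping it in-context.
dispatch({"name": "save_to_archive",
          "arguments": {"fact": "User's birthday is March 12"}})
hits = dispatch({"name": "search_archive",
                 "arguments": {"keyword": "birthday"}})
print(hits)  # ["User's birthday is March 12"]
```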

&lt;h2&gt;
  
  
  How are AI memory systems built?
&lt;/h2&gt;

&lt;p&gt;Building a memory system requires moving beyond simple list appending. It means constructing a storage and retrieval system that mimics the associative nature of the human brain.&lt;/p&gt;

&lt;h3&gt;
  
  
  The vector store: The hippocampus
&lt;/h3&gt;

&lt;p&gt;The most common implementation of agent memory today relies on &lt;strong&gt;Vector Databases&lt;/strong&gt;. When text is ingested, be it a user query, a document, or a log file, it is passed through an embedding model (like OpenAI's &lt;code&gt;text-embedding-3&lt;/code&gt;). This model converts the semantic meaning of the text into a high-dimensional vector, a list of floating-point numbers.&lt;/p&gt;

&lt;p&gt;These vectors are stored in a database like Pinecone, Weaviate, or Qdrant. When the agent needs to "remember" something, it converts the current query into a vector and performs a similarity search (often using Cosine Similarity) to find the nearest vectors in that high-dimensional space.&lt;/p&gt;

&lt;p&gt;This mimics the human hippocampus, which is essential for forming new memories and connecting related concepts. If you search for "apple," a vector store naturally surfaces concepts like "fruit," "red," and "pie," even if the word "apple" is not explicitly present, because they reside close together in the semantic vector space.&lt;/p&gt;
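&lt;p&gt;Under the hood, the similarity search is simple. A stdlib-only sketch with toy 3-dimensional embeddings (real embedding models produce hundreds or thousands of dimensions, and you would call an embedding API instead of hard-coding vectors):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; in practice these come from a model like text-embedding-3.
store = {
    "fruit pie recipe": [0.9, 0.1, 0.0],
    "stock market news": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend this is the embedding of "apple"

# Nearest neighbor in the semantic space wins.
best = max(store, key=lambda key: cosine_similarity(store[key], query))
print(best)  # fruit pie recipe
```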

&lt;h3&gt;
  
  
  GraphRAG: The association cortex
&lt;/h3&gt;

&lt;p&gt;Vector stores have a weakness: they struggle with structured relationships and multi-hop reasoning. Vectors are "fuzzy." They know that "Paris" and "France" are related, but they might not explicitly encode the directional relationship "Paris is the capital of France."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GraphRAG&lt;/strong&gt; (Graph Retrieval-Augmented Generation) solves this by combining the unstructured strength of vectors with the structural rigor of Knowledge Graphs. &lt;a href="https://docs.mem0.ai/open-source/features/graph-memory" rel="noopener noreferrer"&gt;Mem0's Graph Memory&lt;/a&gt;, for example, allows for dynamic relationship mapping that evolves as the agent learns more about its environment.&lt;/p&gt;

&lt;p&gt;Using graph databases (like Neo4j), developers can store information as nodes and edges: &lt;code&gt;(Entity: Paris) --[RELATION: CAPITAL_OF]--&amp;gt; (Entity: France)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For an agent, this is essential for complex problem-solving. If an agent is managing a supply chain, broad semantic similarity is insufficient. It needs to traverse specific paths: "Supplier A provides Part B, which is used in Product C." Graph-based memory allows the agent to "hop" across these nodes to answer questions that a simple vector similarity search would miss.&lt;/p&gt;
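&lt;p&gt;Multi-hop traversal needs nothing exotic. A minimal sketch over an in-memory triple store (a graph database like Neo4j would express the same walk declaratively in Cypher):&lt;/p&gt;

```python
# Toy triple store: (subject, relation, object) edges.
edges = [
    ("Supplier A", "PROVIDES", "Part B"),
    ("Part B", "USED_IN", "Product C"),
    ("Supplier D", "PROVIDES", "Part E"),
]

def neighbors(node):
    """Outgoing edges from a node."""
    return [(rel, dst) for src, rel, dst in edges if src == node]

def hop_paths(start, depth):
    """Expand every path outward one edge per hop, up to `depth` hops."""
    paths = [[start]]
    for _ in range(depth):
        paths = [p + [rel, dst]
                 for p in paths
                 for rel, dst in neighbors(p[-1])]
    return paths

# Two hops answer "which products ultimately depend on Supplier A?",
# a question pure vector similarity would struggle to pin down.
print(hop_paths("Supplier A", 2))
```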

&lt;h3&gt;
  
  
  Hybrid systems
&lt;/h3&gt;

&lt;p&gt;The state-of-the-art in 2026 is &lt;strong&gt;Hybrid Memory&lt;/strong&gt;. This approach uses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vector Search&lt;/strong&gt;: For unstructured retrieval (finding relevant emails, documents, or loose notes).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Graph Traversal&lt;/strong&gt;: For structured facts and rigid relationships.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Episodic Storage&lt;/strong&gt;: For temporal sequences of events.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This combination provides the "intuition" of embeddings with the "precision" of graphs, ensuring the agent is both creative and factually grounded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why should AI memory matter to developers?
&lt;/h2&gt;

&lt;p&gt;The "stateless chatbot" market is saturated and the next generation of applications requires context-aware personalization.&lt;/p&gt;

&lt;p&gt;Consider a coding assistant. A standard LLM-based tool can write a function if you paste the relevant code. A memory-enabled agent can look at your entire repository history, remember that you refactored the authentication module last week, and suggest a change that aligns with your new security patterns. Rather than just processing your request, it &lt;em&gt;understands&lt;/em&gt; the continuity of the work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Existing frameworks and integrations for building AI memory applications
&lt;/h3&gt;

&lt;p&gt;Developers do not need to build these complex Retrieval-Augmented Generation (RAG) pipelines from scratch. Usually, they lean on orchestration frameworks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt;: Offers various &lt;code&gt;Memory&lt;/code&gt; classes (like &lt;code&gt;ConversationBufferMemory&lt;/code&gt; or &lt;code&gt;VectorStoreRetrieverMemory&lt;/code&gt;) that wrap the complexity of saving and loading history.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LlamaIndex&lt;/strong&gt;: Focuses heavily on the indexing strategy, allowing for composable indices where a list index can sit on top of a vector store, which sits on top of a graph.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AutoGPT&lt;/strong&gt;: One of the earliest autonomous agent projects, which demonstrated the necessity of a purely memory-driven loop where the agent writes its thoughts to a file or database to "sleep" on them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these frameworks often treat memory as part of the application logic, tightly coupling it with the control flow of the agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is there a need for a dedicated AI memory layer?
&lt;/h2&gt;

&lt;p&gt;If you’re building an AI agent in 2026, you will almost certainly need a memory layer to deliver personalization. That’s the edge your product has over the competition.&lt;/p&gt;

&lt;p&gt;But, you don’t need to build memory from scratch. There’s already a concept of &lt;strong&gt;Memory as a Service&lt;/strong&gt; (or the Memory Layer) that decouples the memory logic from the agent's reasoning loop. Instead of manually coding "retain this context, summarize that history, store this embedding," you can use a dedicated layer that handles the cognitive overhead.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://mem0.ai/" rel="noopener noreferrer"&gt;Mem0&lt;/a&gt; for exactly this use case. It acts as an intelligent memory layer that sits between the application and the LLM. It manages the complexities we discussed: vector storage, user personalization, and session handling through a simple API.&lt;/p&gt;

&lt;p&gt;The advantage here is specificity and meaningful filtering.&lt;/p&gt;

&lt;p&gt;A raw vector store will return the top-K chunks of text, regardless of whether they are repetitive or outdated. A dedicated memory layer like Mem0 can implement "memory management" logic: updating old memories when new conflicting information arrives (e.g., the user moved from "San Francisco" to "New York"), decaying irrelevant memories over time, and prioritizing information based on utility. For enterprise use cases, features like &lt;a href="https://mem0.ai/security" rel="noopener noreferrer"&gt;privacy and security compliance&lt;/a&gt; become critical advantages over home-rolled solutions.&lt;/p&gt;
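&lt;p&gt;Here is a rough sketch of that update-on-conflict idea (illustrative only, not Mem0's actual implementation): a new fact about the same attribute supersedes the stale one instead of accumulating as a near-duplicate chunk.&lt;/p&gt;

```python
profile = {}   # (user, attribute) -> current fact
history = []   # superseded facts, retired from retrieval but kept for audit

def remember(user, attribute, value):
    """Store a fact; a conflicting value for the same attribute
    replaces the old one rather than sitting alongside it."""
    key = (user, attribute)
    if key in profile and profile[key] != value:
        history.append((key, profile[key]))
    profile[key] = value

remember("alice", "city", "San Francisco")
remember("alice", "city", "New York")   # conflicting update
print(profile[("alice", "city")])       # New York
print(history)  # [(('alice', 'city'), 'San Francisco')]
```

A raw top-K vector search over chat logs would happily return both cities; explicit memory management is what keeps the agent from contradicting itself.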

&lt;p&gt;It also integrates with the existing ecosystem. Whether you use OpenAI's API, Anthropic's Claude, or frameworks like LangChain, you can plug a memory layer in to upgrade "stateless" calls into "stateful" interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where can intelligent memory be applied?
&lt;/h2&gt;

&lt;p&gt;The application of these architectures extends far beyond simple chatbots. By enabling agents to retain context, we unlock new possibilities in personalized education, healthcare, and professional services.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mem0.ai/usecase/customer-support" rel="noopener noreferrer"&gt;Customer support&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Support agents often fail because they lack context. Using memory, an agent can instantly recall a user's previous tickets, frustration level, and purchase history. It stops asking "How can I help you?" and starts asking "Is this about the refund request from Tuesday?"&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mem0.ai/usecase/healthcare" rel="noopener noreferrer"&gt;Healthcare&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;For elderly care or chronic disease management, an AI companion must remember medication schedules, reported symptoms from a week ago, and the names of family members. Hallucinating a dosage or forgetting a severe allergy is not an option. Here, the precision of graph-based memory is vital.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mem0.ai/usecase/education" rel="noopener noreferrer"&gt;Education&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;An education agent should not treat a student like a stranger every day. It should remember that the student struggled with &lt;em&gt;Quadratic Equations&lt;/em&gt; yesterday and offer a review session today before moving to &lt;em&gt;Calculus&lt;/em&gt;. This requires a persistent user profile memory that grows with every interaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mem0.ai/usecase/sales" rel="noopener noreferrer"&gt;Sales &amp;amp; CRM&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Closing a deal often takes weeks. A memory-enabled sales agent remembers every stakeholder mentioned in passing, every feature request, and every objection raised in previous calls. It turns a fragmented sequence of chats into a cohesive, ongoing relationship.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://mem0.ai/usecase/e-commerce" rel="noopener noreferrer"&gt;E-commerce&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Shopping is personal. Instead of generic recommendations, a memory-aware agent recalls that you prefer sustainable brands and hate wool. It powers personalized shopping at scale, curating a storefront that feels uniquely yours.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does the future of recall look like?
&lt;/h2&gt;

&lt;p&gt;We are moving away from the era of "Prompt Engineering," where the user is responsible for stuffing the context window with the necessary background information, toward "Context Engineering," where the system automatically retrieves the perfect set of memories for the task at hand.&lt;/p&gt;

&lt;p&gt;The goal is an agent that functions as a capable colleague. It knows your shorthand. It anticipates your needs based on past interactions. It does not need to be told the same thing twice. This level of seamless interaction is only possible when memory is treated not as a database problem, but as a core component of the AI's cognitive architecture.&lt;/p&gt;

&lt;p&gt;To build agents that truly serve us, we must give them the capacity to remember.&lt;/p&gt;

</description>
      <category>promptengineering</category>
      <category>llm</category>
      <category>aimemory</category>
      <category>mem0ai</category>
    </item>
  </channel>
</rss>
