DEV Community: Rahil Pirani

The challenges of creating a semantic memory layer on Cloudflare Workers, D1, and Vectorize.

Rahil Pirani — Sat, 06 Jun 2026 11:23:52 +0000

The main concept is straightforward: embed text, store the vector, and query it later. The time-consuming part was everything else.

I created a memory layer that maintains context across AI tools using Cloudflare Workers, D1, Vectorize, and Workers AI. All this operates on the free tier. Here’s what I didn’t realize at first.

Two stores, kept strictly separate

D1 stores structured entry data: content, tags, timestamps, importance scores, and the exact vector IDs written to Vectorize. Vectorize holds the embeddings, linked by UUID.

export interface Env {
  DB: D1Database;
  VECTORIZE: VectorizeIndex;
  AI: Ai;
  AUTH_TOKEN: string;
}

Don’t treat Vectorize as your source of truth for content. It is a lookup index; D1 is the database. This distinction becomes crucial when updating or deleting data.

Chunk at sentence boundaries, not character counts

Long entries are split before embedding. Splitting at a fixed character count can cut a sentence in half and lose semantic context. The fix is to look back from the limit for the nearest sentence end or newline and split there.

function chunkText(text: string, maxChars = 1600, overlapChars = 200): string[] {
  if (text.length <= maxChars) return [text];
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = start + maxChars;
    if (end < text.length) {
      // Look back from the hard limit for the nearest sentence end or newline.
      const lastPeriod = text.lastIndexOf(".", end);
      const lastNewline = text.lastIndexOf("\n", end);
      const breakPoint = Math.max(lastPeriod, lastNewline);
      // Only accept a break in the second half of the window so chunks don't get too small.
      if (breakPoint > start + maxChars / 2) end = breakPoint + 1;
    }
    chunks.push(text.slice(start, Math.min(end, text.length)).trim());
    // Stop once the tail is consumed; otherwise the overlap step re-emits a fragment of the last chunk.
    if (end >= text.length) break;
    start = end - overlapChars;
  }
  return chunks.filter((c) => c.length > 0);
}

Each chunk receives its own Vectorize vector with a parentId in the metadata linking back to the D1 entry. Keep track of the exact vector IDs returned and save them in D1 since Vectorize doesn’t support a "delete where parentId = x" operation.

await env.DB.prepare(
  `UPDATE entries SET vector_ids = ? WHERE id = ?`
).bind(JSON.stringify(vectorIds), id).run();
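As a minimal sketch of where `vectorIds` might come from: the per-chunk records can be assembled before the write, so the IDs are known up front. The `ChunkVector` type, the `buildChunkVectors` helper, and the `<parentId>-<index>` ID scheme are assumptions for illustration and are not taken from the repository. Only the record shape (`id`, `values`, `metadata`) follows what Vectorize accepts.

```typescript
// Hypothetical shape for one vector record; mirrors the id/values/metadata
// fields that Vectorize accepts on insert and upsert.
interface ChunkVector {
  id: string;
  values: number[];
  metadata: { parentId: string; chunkIndex: number };
}

// Pair each chunk's embedding with an ID derived from the parent entry.
// The "<parentId>-<index>" scheme is an assumption made for this sketch.
function buildChunkVectors(parentId: string, embeddings: number[][]): ChunkVector[] {
  return embeddings.map((values, chunkIndex) => ({
    id: `${parentId}-${chunkIndex}`,
    values,
    metadata: { parentId, chunkIndex },
  }));
}

// Toy two-dimensional embeddings; real ones would come from the embedding model.
const vectors = buildChunkVectors("entry-123", [[0.1, 0.2], [0.3, 0.4]]);

// These IDs are what would be serialized into the vector_ids column in D1.
const vectorIds = vectors.map((v) => v.id);
```

Whatever ID scheme is used, the point is the same: the list saved in D1 must match the IDs actually written to Vectorize, one for one.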

One Vectorize query, three decisions

On each write, a single embedding call and a single Vectorize query drive duplicate detection, contradiction detection, and merge decisions in one round trip. Three score bands determine what happens next.

const DUPLICATE_BLOCK_THRESHOLD = 0.95;
const DUPLICATE_FLAG_THRESHOLD = 0.85;
const CANDIDATE_SCORE_THRESHOLD = 0.45;

>= 0.95: Exact or near-exact duplicate. Block it and skip the LLM entirely.

0.85 to 0.95: Similar enough to require a decision. A combined prompt goes to the LLM asking for one of four actions: contradiction, replace, merge, or keep_both.

0.45 to 0.85: These are candidates for contradiction only. A lighter prompt is used, with no merge logic.
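The three bands can be expressed as one small classifier. This is a sketch, not the repository's code: the `WriteBand` labels and the `classifyScore` name are made up here, and treating each lower bound as inclusive is an assumption, since the text does not say which band owns a score of exactly 0.85.

```typescript
const DUPLICATE_BLOCK_THRESHOLD = 0.95;
const DUPLICATE_FLAG_THRESHOLD = 0.85;
const CANDIDATE_SCORE_THRESHOLD = 0.45;

// Hypothetical labels for what the write path should do next.
type WriteBand = "block" | "flag" | "candidate" | "none";

// Map the top similarity score to a band. Lower bounds are inclusive (assumption).
function classifyScore(score: number): WriteBand {
  if (score >= DUPLICATE_BLOCK_THRESHOLD) return "block";     // skip the LLM entirely
  if (score >= DUPLICATE_FLAG_THRESHOLD) return "flag";       // combined four-action prompt
  if (score >= CANDIDATE_SCORE_THRESHOLD) return "candidate"; // contradiction-only prompt
  return "none";                                              // store without an LLM check
}
```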

The combined prompt in the flagged band looks like this:

Choose exactly one action. Prioritise in this order:
1. "contradiction" — new memory DIRECTLY CONFLICTS with an existing one
2. "replace" — new memory clearly supersedes an existing one
3. "merge" — both memories are complementary and better as one combined entry
4. "keep_both" — memories are different enough to coexist

Respond with JSON only.
{"action":"keep_both"} OR
{"action":"contradiction","conflicting_id":"<id>","reason":"<10 words max>"} OR
{"action":"replace","target_id":"<id>"} OR
{"action":"merge","target_id":"<id>","merged_content":"<text>"}

You must validate the returned ID against the candidate set before taking action. LLMs can generate incorrect IDs.
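A minimal sketch of that validation step, assuming the model's reply arrives as a raw string and the candidate IDs sent in the prompt are available as a set. The `Decision` type and `parseDecision` helper are hypothetical names. Falling back to `keep_both` on any invalid reply is a design assumption, chosen because it is the one action that deletes nothing.

```typescript
type Decision =
  | { action: "keep_both" }
  | { action: "contradiction"; conflicting_id: string; reason: string }
  | { action: "replace"; target_id: string }
  | { action: "merge"; target_id: string; merged_content: string };

// Parse the model's reply and fall back to "keep_both" whenever the JSON is
// malformed or the referenced ID is not one of the candidates that were sent.
function parseDecision(raw: string, candidateIds: Set<string>): Decision {
  const fallback: Decision = { action: "keep_both" };
  let parsed: any;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fallback; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return fallback;

  // An ID is only trusted if it is a string AND was in the candidate set.
  const known = (id: unknown): id is string =>
    typeof id === "string" && candidateIds.has(id);

  switch (parsed.action) {
    case "contradiction":
      return known(parsed.conflicting_id)
        ? { action: "contradiction", conflicting_id: parsed.conflicting_id, reason: String(parsed.reason ?? "") }
        : fallback;
    case "replace":
      return known(parsed.target_id)
        ? { action: "replace", target_id: parsed.target_id }
        : fallback;
    case "merge":
      return known(parsed.target_id) &&
        typeof parsed.merged_content === "string" &&
        parsed.merged_content.trim() !== ""
        ? { action: "merge", target_id: parsed.target_id, merged_content: parsed.merged_content }
        : fallback;
    default:
      return fallback;
  }
}
```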

Stale vector cleanup is not optional

When merging occurs, write the new canonical entry and delete both originals. Deleting from D1 is straightforward. Deleting from Vectorize requires the exact IDs you saved earlier.

if (oldVectorIds.length) await env.VECTORIZE.deleteByIds(oldVectorIds);

If you skip this, orphaned vectors will silently accumulate. They will appear in recall results, inflate scores, and create matches pointing to entries that no longer exist in D1. This issue is hard to diagnose but easy to prevent.

For updates, the safe approach is to insert new vectors first, confirm success, and then delete the old ones. Reversing that order can lead to lost data if the insert fails after the delete succeeds.
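The ordering can be sketched as a small helper. `VectorIndexLike` is a hypothetical minimal interface declared here so the example is self-contained. Its two methods are named after the `upsert` and `deleteByIds` methods on the Vectorize binding, but the helper itself is an illustration, not the repository's code.

```typescript
// Minimal structural types for this sketch (assumptions, not Cloudflare's own typings).
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string>;
}
interface VectorIndexLike {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
  deleteByIds(ids: string[]): Promise<unknown>;
}

// Write the new vectors first, then delete the old ones.
// Returns the IDs that should now be saved in D1 for this entry.
async function replaceVectors(
  index: VectorIndexLike,
  newVectors: VectorRecord[],
  oldVectorIds: string[]
): Promise<string[]> {
  // 1. Insert first. If this throws, the old vectors are still intact.
  await index.upsert(newVectors);

  // 2. Delete only IDs that the new set did not reuse.
  const newIds = new Set(newVectors.map((v) => v.id));
  const stale = oldVectorIds.filter((id) => !newIds.has(id));
  if (stale.length) await index.deleteByIds(stale);

  return newVectors.map((v) => v.id);
}
```

If the delete in step 2 fails, the worst case is a few orphaned vectors that can be cleaned up on retry, which is a much better failure mode than an entry with no vectors at all.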

Cosine similarity is not enough for reranking

Raw Vectorize scores are based on cosine similarity. Once entries vary in age and access frequency, this becomes a poor measure of relevance. The reranker applies three multipliers.

Recency: Exponential decay based on tag-aware half-lives. Tasks decay in 7 days, work entries in 3 months, and context entries in 6 months.

Frequency: Uses log1p of recall count, allowing frequently accessed entries to surface higher without overshadowing newer ones.

Importance: A score from 1 to 5 based on a separate LLM pass during write, scaled to a multiplier of 0.88 to 1.20 so high-importance entries can surpass the recency cap.

const recencyMultiplier = Math.exp(-ageMs / halfLifeMs);
const frequencyMultiplier = 1 + Math.log1p(rc);
const importanceMultiplier = imp === 0 ? 1.0 : 0.8 + (imp / 5) * 0.4;

return {
  ...match,
  score: match.score * recencyMultiplier * frequencyMultiplier * importanceMultiplier
};

Short appends and rolled-up entries will receive a penalty to keep noise out of the top results.
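Put together, the three multipliers look roughly like this as a standalone function. This is a sketch under stated assumptions: the `RerankInput` shape and the tag names are hypothetical, "3 months" and "6 months" are taken as 90 and 180 days, a 30-day default is assumed for untagged entries, and the penalty for short appends and rolled-up entries is left out because the text does not specify it.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Per-tag half-lives in days. Tag names and month-to-day conversions are assumptions.
const HALF_LIFE_DAYS: Record<string, number> = { task: 7, work: 90, context: 180 };
const DEFAULT_HALF_LIFE_DAYS = 30;

interface RerankInput {
  score: number;       // raw cosine similarity from Vectorize
  ageMs: number;       // now minus the entry's creation time
  tag?: string;
  recallCount: number; // how many times the entry has been recalled
  importance: number;  // 1 to 5, or 0 when unscored
}

function rerankScore(m: RerankInput): number {
  const halfLifeDays = HALF_LIFE_DAYS[m.tag ?? ""] ?? DEFAULT_HALF_LIFE_DAYS;
  const recency = Math.exp(-m.ageMs / (halfLifeDays * DAY_MS));
  const frequency = 1 + Math.log1p(m.recallCount);
  // Importance 1..5 maps linearly onto 0.88..1.20; unscored entries stay neutral.
  const importance = m.importance === 0 ? 1.0 : 0.8 + (m.importance / 5) * 0.4;
  return m.score * recency * frequency * importance;
}
```

One detail worth knowing: with `Math.exp(-age / halfLife)`, the recency multiplier at exactly one "half-life" is e^-1, about 0.37, rather than 0.5. A strict half-life would use `Math.pow(0.5, age / halfLife)`. Either works for ranking as long as the constants are tuned with the chosen formula in mind.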

The topK multiplier problem

Multiply your topK by at least 3 before deduplicating by parentId. If an entry has 4 chunks and you query topK: 5, those 4 chunks can take most of your result budget before you have seen enough unique parents.

const VECTORIZE_TOP_K_MULTIPLIER = 3;

Query wider and then deduplicate by parentId in your application code.
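A minimal sketch of the widen-then-deduplicate step. The `ChunkMatch` shape and both helper names are hypothetical. Keeping the best-scoring chunk per parent is one reasonable policy, not necessarily the one the repository uses.

```typescript
// Hypothetical flattened match: one chunk hit with its parent entry ID.
interface ChunkMatch {
  id: string;
  score: number;
  parentId: string;
}

const VECTORIZE_TOP_K_MULTIPLIER = 3;

// How many chunk-level matches to request for a given number of entries.
function widenedTopK(topK: number): number {
  return topK * VECTORIZE_TOP_K_MULTIPLIER;
}

// Keep the best-scoring chunk per parent, then return the top `topK` parents.
function dedupeByParent(matches: ChunkMatch[], topK: number): ChunkMatch[] {
  const best = new Map<string, ChunkMatch>();
  for (const m of matches) {
    const current = best.get(m.parentId);
    if (!current || m.score > current.score) best.set(m.parentId, m);
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

With `topK = 5`, the index is asked for `widenedTopK(5)` chunk matches, and `dedupeByParent` then trims that list back to at most five distinct entries.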

The full implementation is open source if you want to explore any of these topics: github.com/rahilp/second-brain-cloudflare

I launched an open source tool on Product Hunt with no budget, no team, and no audience. Here's what actually happened.

Rahil Pirani — Mon, 01 Jun 2026 10:29:37 +0000

Three weeks before launch, second-brain-cloudflare had 0 stars on GitHub.

Yesterday it reached 168.

This is an honest account of how that happened: what was planned, what was improvised, and what I'd advise someone doing this for the first time.

What I built and why

Every AI session starts from zero. Projects, decisions, preferences go away the moment the window closes. I grew frustrated enough to build a solution: a memory layer that persists across Claude, ChatGPT, Cursor, and any MCP-compatible tool. I self-hosted it on Cloudflare's free tier. It uses semantic search instead of keyword matching.

I built it for myself. Then I open-sourced it because this problem wasn’t mine alone.

The three weeks before launch

Most Product Hunt launch advice focuses on launch day. The three weeks before it matter more.

I posted on Reddit, participated in Discord communities, released four versions (v1.3 to v1.6), created an Obsidian plugin, wrote technical articles, and directly messaged builders. None of this was organized around a launch date. I was just trying to show it to people who might use it.

Stars grew slowly: 50 by May 13, 80 by May 23, and 90 by May 29.

That slow buildup matters. The Product Hunt launch didn’t create momentum; it boosted the momentum that already existed. I don’t believe a launch on day one would have resulted in even a third of those numbers.

Finding the right hunter

I got lucky here, and I want to be honest about that.

fmerian has 67K followers on Product Hunt. He hunted Mastra, which eventually got 20K GitHub stars. When I reached out, he asked one question: "What's your goal with this launch?" Not "what does your product do" but what's your goal.

That question forced me to be specific: I wanted awareness among developers frustrated with siloed AI memory, GitHub stars, and a contributor community. I wanted feedback from builders using it across different tools, not monetization. Community first.

He suggested launching on a Sunday: weekends bring less competition and more developers browsing Product Hunt. That advice alone was worth more than any paid promotion.

If you're planning a Product Hunt launch, find someone who has done it recently, understands your space, and will engage with your product. The relationship matters more than the number of followers.

Launch day

I was up at 3 AM.

The first few hours were mostly waiting—ranks are hidden during a randomization period. By 4 AM, we had 9 upvotes and a few notable early voters. By mid-morning, the comments began coming in.

Here’s a piece of advice that differs from most launch guides: reply to every comment as if it were a one-on-one conversation. Don’t treat it like a support ticket or just say thank you. Provide a real response.

Builders asked sharp questions about conflict resolution, data ownership, multi-client memory, and temporal recall. Those weren’t support requests; they were the real product conversation I had been trying to have for three weeks. Taking each question seriously resulted in better follow-up engagement than any post I made that day.

Final result: #3 Product of the Day. 253 upvotes. 46 comments.

What the star chart actually shows

That near-vertical line on May 31 represents roughly 80 stars in a single day. But the shape before it is what I keep examining: three weeks of slow, steady, organic growth. The launch caused a spike. The work before it laid the foundation.

Without that foundation, the spike probably wouldn’t have happened. Or it might have happened and then immediately flatlined, with no retention.

What I'd do differently

I would start community conversations earlier and lead with questions instead of announcements.

Most of my pre-launch posts were “here's what I built.” The Product Hunt comment threads showed me what people really wanted to discuss—conflict resolution, data ownership, whether this works across accounts and devices. If I had led with those questions earlier, I would have built a more engaged audience by launch day.

Another thing: don’t treat launch day as the finish line. The comments kept coming in the next morning. The GitHub stars kept rising. The launch opened a door—what happens after it depends entirely on what you do in the first 48 hours.

Where it goes from here

168 stars. 28 forks. Two roadmap features came directly from Product Hunt comments.

Building in public. Everything is open source at github.com/rahilp/second-brain-cloudflare.

AI memory has a contradiction problem nobody is talking about

Rahil Pirani — Mon, 25 May 2026 06:53:45 +0000

Most discussions about AI memory focus on a few main concerns: whether it lasts across sessions, how quickly it retrieves information, and whether it can scale. These are important questions. However, there’s a simpler issue that often gets overlooked, and it slowly worsens memory systems over time.

What happens when two stored memories conflict?

You tell your AI assistant that you prefer short, direct answers. A month later, you mention wanting more detailed explanations with examples. Both preferences get stored. Now, every recall brings up both. The system tries to accommodate both, but neither aligns with what you actually want at that moment.

This isn’t just a hypothetical situation. It happens with any memory system that only adds information over time. Your preferences change. Your situation evolves. But the earlier version of you is still there, pushing in the opposite direction.

Most people default to using a review inbox. It identifies conflicts and lets the user decide. It sounds good in theory but is frustrating in practice.

No one wants to manage their AI's memory manually. The goal is for it to work in the background. A review inbox turns memory management into a task that often gets ignored, leading to a buildup of contradictions anyway.

Another common method is timestamp-based overwriting: when new information comes in, the system looks for similar entries and replaces the older one. But similarity doesn’t equal contradiction. "I work best in the mornings" and "I do my best thinking late at night" conflict directly, yet their embeddings may score low on similarity. A vector search won’t catch this. Both get stored and recalled.

The right question isn’t "how do we find similarities?" It should be "how do we identify logical incompatibility?"

This is a semantic reasoning challenge, not just a retrieval one. Two memories might not seem similar, yet can still contradict each other. The only way to recognize this is with a language model, not through distance metrics.

When we integrated contradiction detection into second-brain, our key design choice was to use a large language model (LLM) to check whether a new memory contradicts any of the most recently recalled ones. We ask not only "is this similar?" but also "can both of these be true at the same time?"

When a conflict arises, the new memory prevails. The old one gets deleted entirely, from both storage and the vector index. It's gone. The new memory is the only version that exists.

There’s a real trade-off worth noting. Conditional preferences can be tricky.

For example, "I want short responses when I’m coding, long ones when I’m strategizing" isn’t a contradiction. Those statements can coexist. An unsophisticated LLM check might flag them as conflicting. To get this right, enough context needs to be passed through the check so the model can distinguish between real conflicts and situational variations.

This is a more complex issue, and the current implementation doesn't address it entirely. It handles clear cases well: factual contradictions, changes in preferences, updated decisions. The conditional cases represent a known gap.

However, catching the clear cases already makes a significant difference. A memory system that sometimes overlooks nuanced conditions is still better than one that continuously accumulates contradictions without end.

Storage is the easy part of AI memory; everyone can provide it. What truly matters for long-term usefulness is coherence over time, not sheer volume. To achieve coherence, contradictions must be treated as a first-class problem, not as cleanup to do later.

I built persistent AI memory for Claude on Cloudflare's free tier

Rahil Pirani — Wed, 20 May 2026 04:45:51 +0000

Every Claude session starts fresh. You copy context, explain your setup, reintroduce your project, and then do it all over again the next day. I got tired of this and created a solution.

second-brain-cloudflare is a self-hosted MCP server that provides Claude, ChatGPT, Cursor, and any MCP-compatible client with persistent memory across sessions. It operates entirely on Cloudflare's free tier. Here’s how it works.

The stack

Cloudflare Workers: MCP server, REST API, and web UI, all from one wrangler deploy
D1 (SQLite): stores entry content, tags, source, timestamps, and vector chunk IDs
Vectorize: the vector index (bge-small-en-v1.5, 384 dimensions)
Workers AI: bge-small-en-v1.5 for embeddings, @cf/meta/llama-4-scout-17b-16e-instruct for web UI synthesis

One deployment. No external databases. No API keys needed beyond your Cloudflare account token.

Tag-based time-decay reranking

Pure vector similarity has a drawback. A memory from three months ago can outrank something you saved yesterday if it’s semantically closer. The solution is to fetch three times more candidates than needed (topK=5 pulls 15), then score each using a tag-aware half-life:

Tasks: 7-day half-life
Work: 3-month half-life
Context: 6-month half-life
Default: 30-day half-life

adjusted_score = cosine_similarity × e^(-age_in_days / half_life)
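A worked example of the formula, using made-up numbers: an older note that is semantically closer versus a fresh note that is a weaker match, both on the 30-day default. The `adjustedScore` helper is a direct transcription of the formula above, and the inputs are hypothetical.

```typescript
// adjusted_score = cosine_similarity × e^(-age_in_days / half_life)
function adjustedScore(cosineSimilarity: number, ageInDays: number, halfLifeDays: number): number {
  return cosineSimilarity * Math.exp(-ageInDays / halfLifeDays);
}

// Hypothetical inputs: a 90-day-old close match and a 1-day-old weaker match.
const older = adjustedScore(0.9, 90, 30); // 0.9 × e^-3, roughly 0.045
const fresh = adjustedScore(0.7, 1, 30);  // 0.7 × e^(-1/30), roughly 0.68

console.log(older < fresh); // true: the fresh note now ranks first
```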

Duplicate detection

Before storing anything, embed the incoming content and query Vectorize for its nearest neighbor:

Score ≥ 95%: block
Score 85–94%: store with duplicate-candidate tag
Score < 85%: store normally

Without this step, Claude creates 20–30 nearly identical entries for the same decision.

Smart chunking

Long notes are split at sentence boundaries, with a 200-character overlap. Each chunk receives its own vector. Chunk IDs are stored in D1, so forget() reliably removes all related vectors.

Temporal recall (v1.2.0)

Queries now support time limits:

recall("API decisions", after="7 days ago")
recall("standup notes", after="2026-05-12")

Supports: "today", "yesterday", "last week", "this month", ISO dates, and epoch timestamps.
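One way the `after` values could be resolved into a cutoff timestamp is sketched here. The `parseAfter` name is hypothetical, and several interpretations are assumptions of this sketch, not documented behavior: day boundaries are taken in UTC, "last week" means seven days before the start of today, and all-digit inputs below 1e12 are read as epoch seconds and anything larger as milliseconds.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Resolve an `after` value to an epoch-milliseconds cutoff, or null if unrecognized.
function parseAfter(input: string, now: Date = new Date()): number | null {
  const text = input.trim().toLowerCase();
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());

  // Named keywords (UTC day boundaries are an assumption).
  if (text === "today") return startOfToday;
  if (text === "yesterday") return startOfToday - DAY_MS;
  if (text === "last week") return startOfToday - 7 * DAY_MS;
  if (text === "this month") return Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1);

  // Relative form such as "7 days ago" or "1 day ago".
  const relative = /^(\d+)\s+days?\s+ago$/.exec(text);
  if (relative) return now.getTime() - Number(relative[1]) * DAY_MS;

  // Epoch timestamps: seconds if small, milliseconds otherwise (assumption).
  if (/^\d+$/.test(text)) {
    const n = Number(text);
    return n < 1e12 ? n * 1000 : n;
  }

  // ISO dates such as "2026-05-12"; date-only strings are parsed as UTC.
  if (/^\d{4}-\d{2}-\d{2}/.test(text)) {
    const parsed = Date.parse(input.trim());
    return Number.isNaN(parsed) ? null : parsed;
  }

  return null;
}
```

Passing `now` as a parameter keeps the function deterministic, which makes the keyword cases easy to unit test.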

AI synthesis in the web UI

Queries flow through @cf/meta/llama-4-scout-17b-16e-instruct before being rendered. Answers stream in real time, with source memories that can be collapsed underneath. You’ll find Append and Forget buttons. This runs on your own Cloudflare account.

Why the free tier works

D1: 5GB storage, 5 million row reads per day
Vectorize: 5 million vectors, 30 million queried dimensions per month (tight at team scale but fine for personal use)
Workers AI: 10,000 Neurons per day

Try it

Deploy: https://thesecondbrain.dev
GitHub: https://github.com/rahilp/second-brain-cloudflare

If this was helpful, please give it a star.

I gave Claude a persistent memory for $0/month using Cloudflare

Rahil Pirani — Sun, 10 May 2026 05:35:41 +0000

Claude is great. But every time you start a new conversation, it forgets everything. Your projects, your preferences, what you decided last week — gone.

The official memory feature exists, but it's vague and you can't really control it. You can't query it, tag it, or search it semantically. It's a black box that occasionally surfaces something useful.

So I built my own.

What it is

It's a self-hosted MCP server that runs on Cloudflare Workers. Four tools: remember, recall, list_recent, forget. Claude calls them automatically. You never think about it.

The interesting part is how recall works — it's not keyword search. Every note gets embedded as a 384-dimensional vector using bge-small-en-v1.5 on Workers AI. When you ask Claude something, it searches by meaning, not exact words.

Store: "users drop off at the payment step."

Query: "onboarding problems."

It finds it. No keyword overlap needed.
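"Searching by meaning" comes down to comparing embedding vectors. Vectorize does this comparison inside the index, so the following is only an illustration of the underlying measure: a minimal cosine similarity function, meant to be tried on toy vectors, not real 384-dimensional embeddings.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; 1 means the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two notes with no words in common can still land close together in embedding space, which is why the payment-step note surfaces for an onboarding query.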

Why Cloudflare

Honestly, cost. The whole stack — Workers, D1 (SQLite), Vectorize, Workers AI embeddings — runs on Cloudflare's free tier at personal scale. You don't even need a credit card to get started.

The other reason is deployment. There's a one-click deploy button that provisions everything automatically. It takes about 3 minutes to go from zero to a running second brain connected to Claude Desktop.

How to set it up

1. Deploy — click the button in the repo, Cloudflare provisions D1 + Vectorize and deploys the Worker.

2. Run the schema — one SQL snippet in the Cloudflare dashboard.

3. Set your auth token — one command with wrangler.

4. Connect Claude Desktop — add a few lines to your config JSON:

{
  "mcpServers": {
    "second-brain": {
      "command": "npx",
      "args": ["mcp-remote", "https://<your-worker-url>/mcp"]
    }
  }
}

That's it. Claude now has persistent memory across every conversation.

What I actually use it for

I have Claude set up to call recall at the start of every conversation, before it says anything. So when I open a new chat and say "continue the onboarding work from last week," it already knows what that means.

I also capture from everywhere — there's a browser bookmarklet that saves any highlighted text or page with one click, and iOS Shortcuts for voice capture on the go. "Hey Siri, brain dump" and I can dictate a note that shows up in Claude's memory immediately.

What it doesn't do (yet)

There's no UI for browsing your memory. You can hit the /list endpoint, but it's raw JSON. I want to build a proper dashboard eventually — something that shows your memory visually, lets you edit or delete entries, maybe shows what Claude has recalled most often.

Also, the local dev experience is slightly annoying because Vectorize and Workers AI don't run locally — you end up pointing at remote resources for real testing. Not a dealbreaker, but worth knowing.

The repo

Everything is open source under MIT. One-click deploy, manual setup instructions, iOS Shortcuts templates, bookmarklet source — it's all there.

→ github.com/rahilp/second-brain-cloudflare

If you use it, I'd genuinely like to know what you end up storing in it. That's the part I'm most curious about — what people actually find worth remembering.