DEV Community

Rahil Pirani

The challenges of creating a semantic memory layer on Cloudflare Workers, D1, and Vectorize.

The main concept is straightforward: embed text, store the vector, and query it later. The time-consuming part was everything else.

I built a memory layer that maintains context across AI tools using Cloudflare Workers, D1, Vectorize, and Workers AI. All of it runs on the free tier. Here’s what I didn’t realize at first.


Two stores, kept strictly separate

D1 stores structured entry data, including content, tags, timestamps, importance scores, and the exact vector IDs put into Vectorize. Vectorize holds the embeddings, linked by UUID.

export interface Env {
  DB: D1Database;
  VECTORIZE: VectorizeIndex;
  AI: Ai;
  AUTH_TOKEN: string;
}

Don’t treat Vectorize as your source of truth for content. It functions as a lookup index; D1 is the database. This distinction becomes crucial when updating or deleting data.


Chunk at sentence boundaries, not character counts

Long entries are split before embedding. Splitting at a fixed character count can cut a sentence in half and lose semantic context. The solution is to look back from the character limit for the nearest sentence end or newline and split there.

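function chunkText(text: string, maxChars = 1600, overlapChars = 200): string[] {
  if (text.length <= maxChars) return [text];
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = start + maxChars;
    if (end < text.length) {
      // Look back from the limit for the last sentence end or newline.
      const lastPeriod = text.lastIndexOf(".", end - 1);
      const lastNewline = text.lastIndexOf("\n", end - 1);
      const breakPoint = Math.max(lastPeriod, lastNewline);
      // Only use the break if it leaves the chunk at least half full.
      if (breakPoint > start + maxChars / 2) end = breakPoint + 1;
    }
    chunks.push(text.slice(start, Math.min(end, text.length)).trim());
    // Stop once the tail is emitted; otherwise the overlap step-back
    // can produce one more chunk that only repeats the previous tail.
    if (end >= text.length) break;
    // Step back so adjacent chunks overlap, but always move forward.
    start = Math.max(end - overlapChars, start + 1);
  }
  return chunks.filter((c) => c.length > 0);
}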

Each chunk receives its own Vectorize vector with a parentId in the metadata linking back to the D1 entry. Keep track of the exact vector IDs returned and save them in D1 since Vectorize doesn’t support a "delete where parentId = x" operation.

await env.DB.prepare(
  `UPDATE entries SET vector_ids = ? WHERE id = ?`
).bind(JSON.stringify(vectorIds), id).run();
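
As a minimal sketch of the bookkeeping described above, the function below pairs each chunk embedding with an ID and a parentId, and returns the ID list that would be saved to D1. The `ChunkVector` shape, the `buildChunkVectors` name, and the `parentId:index` ID scheme are assumptions for illustration, not taken from the project.

```typescript
// Hypothetical record shape for this sketch: an ID, the embedding values,
// and metadata that links the chunk back to its D1 entry.
interface ChunkVector {
  id: string;
  values: number[];
  metadata: { parentId: string; chunkIndex: number };
}

// Build one vector record per chunk embedding. The `${parentId}:${i}` ID
// scheme is an assumption; any unique ID works as long as it is recorded.
function buildChunkVectors(
  parentId: string,
  embeddings: number[][]
): { vectors: ChunkVector[]; vectorIds: string[] } {
  const vectors: ChunkVector[] = embeddings.map((values, i) => ({
    id: `${parentId}:${i}`,
    values,
    metadata: { parentId, chunkIndex: i },
  }));
  // These are the IDs to persist alongside the entry for later deletion.
  return { vectors, vectorIds: vectors.map((v) => v.id) };
}
```

The returned `vectorIds` array is what would be passed to `JSON.stringify` in the UPDATE statement above.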

One Vectorize query, three decisions

On each write, a single embedding call and one Vectorize query drive duplicate detection, contradiction detection, and merge decisions in one round trip. Three score bands determine the next step.

const DUPLICATE_BLOCK_THRESHOLD = 0.95;
const DUPLICATE_FLAG_THRESHOLD = 0.85;
const CANDIDATE_SCORE_THRESHOLD = 0.45;

>= 0.95: Exact or near-exact duplicate. Block it and skip the LLM entirely.

0.85 to 0.95: Similar enough to require a decision. A combined prompt goes to the LLM asking for one of four actions: contradiction, replace, merge, or keep_both.

0.45 to 0.85: These are candidates for contradiction only. A lighter prompt is used, with no merge logic.
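
The three bands can be expressed as a small classifier. This is a sketch under two assumptions that the text does not state: each lower bound is inclusive, and scores below 0.45 need no LLM call at all. The `Band` type and `classifyScore` name are hypothetical.

```typescript
const DUPLICATE_BLOCK_THRESHOLD = 0.95;
const DUPLICATE_FLAG_THRESHOLD = 0.85;
const CANDIDATE_SCORE_THRESHOLD = 0.45;

// "none" is an assumed fourth outcome for scores under the lowest threshold.
type Band = "block" | "flag" | "candidate" | "none";

// Map the top similarity score of a new memory to the action band.
// Lower bounds are treated as inclusive here (an assumption).
function classifyScore(score: number): Band {
  if (score >= DUPLICATE_BLOCK_THRESHOLD) return "block"; // skip the LLM
  if (score >= DUPLICATE_FLAG_THRESHOLD) return "flag"; // combined prompt
  if (score >= CANDIDATE_SCORE_THRESHOLD) return "candidate"; // contradiction-only prompt
  return "none";
}
```

Checking the bands from highest to lowest keeps each comparison to a single threshold.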

The combined prompt in the flagged band looks like this:

Choose exactly one action. Prioritise in this order:
1. "contradiction": new memory DIRECTLY CONFLICTS with an existing one
2. "replace": new memory clearly supersedes an existing one
3. "merge": both memories are complementary and better as one combined entry
4. "keep_both": memories are different enough to coexist

Respond with JSON only.
{"action":"keep_both"} OR
{"action":"contradiction","conflicting_id":"<id>","reason":"<10 words max>"} OR
{"action":"replace","target_id":"<id>"} OR
{"action":"merge","target_id":"<id>","merged_content":"<text>"}

You must validate the returned ID against the candidate set before taking action. LLMs can generate incorrect IDs.
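
One way to do that validation is sketched below. It parses the model output, checks the returned ID against the candidate set, and falls back to keep_both on anything unexpected. The fallback choice and the `parseDecision` name are assumptions; rejecting the write instead would be equally reasonable.

```typescript
type Decision =
  | { action: "keep_both" }
  | { action: "contradiction"; conflicting_id: string; reason: string }
  | { action: "replace"; target_id: string }
  | { action: "merge"; target_id: string; merged_content: string };

// Assumed safe default when the model output cannot be trusted.
const KEEP_BOTH: Decision = { action: "keep_both" };

function parseDecision(raw: string, candidateIds: Set<string>): Decision {
  let parsed: any;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return KEEP_BOTH; // output was not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return KEEP_BOTH;

  // An ID is only acceptable if it was one of the candidates we sent.
  const known = (id: unknown): id is string =>
    typeof id === "string" && candidateIds.has(id);

  switch (parsed.action) {
    case "contradiction":
      return known(parsed.conflicting_id)
        ? {
            action: "contradiction",
            conflicting_id: parsed.conflicting_id,
            reason: String(parsed.reason ?? ""),
          }
        : KEEP_BOTH;
    case "replace":
      return known(parsed.target_id)
        ? { action: "replace", target_id: parsed.target_id }
        : KEEP_BOTH;
    case "merge":
      return known(parsed.target_id) &&
        typeof parsed.merged_content === "string" &&
        parsed.merged_content.trim() !== ""
        ? {
            action: "merge",
            target_id: parsed.target_id,
            merged_content: parsed.merged_content,
          }
        : KEEP_BOTH;
    default:
      return KEEP_BOTH;
  }
}
```

Because destructive actions (replace and merge) only pass through with a known ID, a hallucinated ID can never trigger a delete.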


Stale vector cleanup is not optional

When merging occurs, write the new canonical entry and delete both originals. Deleting from D1 is straightforward. Deleting from Vectorize requires the exact IDs you saved earlier.

if (oldVectorIds.length) await env.VECTORIZE.deleteByIds(oldVectorIds);

If you skip this, orphaned vectors will silently accumulate. They will appear in recall results, inflate scores, and create matches pointing to entries that no longer exist in D1. This issue is hard to diagnose but easy to prevent.

For updates, the safe approach is to insert new vectors first, confirm success, and then delete the old ones. Reversing that order can lead to lost data if the insert fails after the delete succeeds.
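
The ordering can be sketched as follows. `VectorStore` is a narrowed, hypothetical interface with just the two methods the sketch needs, so a fake can stand in for the Vectorize binding. It uses `upsert` rather than `insert` so that a reused ID is overwritten, which is a substitution on my part.

```typescript
// Hypothetical minimal interface for this sketch. The method names match
// ones on the Vectorize binding, but this type itself is an assumption.
interface VectorStore {
  upsert(vectors: { id: string; values: number[] }[]): Promise<unknown>;
  deleteByIds(ids: string[]): Promise<unknown>;
}

// Write new vectors first, then delete the old ones. Returns the IDs that
// were actually deleted.
async function replaceVectors(
  index: VectorStore,
  newVectors: { id: string; values: number[] }[],
  oldIds: string[]
): Promise<string[]> {
  // 1. If this throws, nothing has been deleted and the old vectors remain.
  await index.upsert(newVectors);

  // 2. Remove only the old IDs that were not just rewritten.
  const fresh = new Set(newVectors.map((v) => v.id));
  const stale = oldIds.filter((id) => !fresh.has(id));
  if (stale.length) await index.deleteByIds(stale);
  return stale;
}
```

In a setup like the one described here, the saved vector ID list in D1 would presumably be updated between the two steps, so a failed delete leaves extra vectors rather than missing ones.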


Cosine similarity is not enough for reranking

Raw Vectorize scores are based on cosine similarity. Once entries vary in age and access frequency, this becomes a poor measure of relevance. The reranker applies three multipliers.

Recency: Exponential decay with tag-aware half-lives: 7 days for tasks, 3 months for work entries, and 6 months for context entries.

Frequency: Uses log1p of recall count, allowing frequently accessed entries to surface higher without overshadowing newer ones.

Importance: A score from 1 to 5 based on a separate LLM pass during write, scaled to a multiplier of 0.88 to 1.20 so high-importance entries can surpass the recency cap.

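// rc is the entry's recall count; imp is its importance score (0 means unscored).
// Recency falls to 1/e (about 0.37) once ageMs reaches halfLifeMs; multiply the
// exponent by Math.LN2 if the multiplier should be exactly 0.5 at that age.
const recencyMultiplier = Math.exp(-ageMs / halfLifeMs);
const frequencyMultiplier = 1 + Math.log1p(rc);
// Maps importance 1..5 onto 0.88..1.20; an unscored entry stays neutral.
const importanceMultiplier = imp === 0 ? 1.0 : 0.8 + (imp / 5) * 0.4;

return {
  ...match,
  score: match.score * recencyMultiplier * frequencyMultiplier * importanceMultiplier
};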

Short appends and rolled-up entries will receive a penalty to keep noise out of the top results.
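
To make the tag-aware part concrete, here is a sketch that picks a half-life from an entry's tags and applies the same three multipliers as the snippet above, leaving out the penalties. The tag names, the reading of 3 and 6 months as 90 and 180 days, the fallback value, and the shortest-wins rule are all assumptions.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed tag names; "3 months" and "6 months" are read as 90 and 180 days.
const HALF_LIFE_BY_TAG: Record<string, number> = {
  task: 7 * DAY_MS,
  work: 90 * DAY_MS,
  context: 180 * DAY_MS,
};
// Assumed fallback for entries that carry none of the tags above.
const DEFAULT_HALF_LIFE_MS = 90 * DAY_MS;

// With several matching tags, the shortest half-life wins (an assumption).
function pickHalfLifeMs(tags: string[]): number {
  const known = tags
    .map((t) => t.toLowerCase())
    .filter((t) => Object.prototype.hasOwnProperty.call(HALF_LIFE_BY_TAG, t))
    .map((t) => HALF_LIFE_BY_TAG[t]);
  return known.length > 0 ? Math.min(...known) : DEFAULT_HALF_LIFE_MS;
}

interface ScoredEntry {
  score: number; // raw cosine similarity from the index
  createdAt: number; // epoch milliseconds
  tags: string[];
  recallCount: number;
  importance: number; // 0 = unscored, otherwise 1 to 5
}

// Apply recency, frequency, and importance multipliers to one match.
function rerankScore(m: ScoredEntry, now: number): number {
  const ageMs = Math.max(0, now - m.createdAt);
  const recency = Math.exp(-ageMs / pickHalfLifeMs(m.tags));
  const frequency = 1 + Math.log1p(m.recallCount);
  const importance = m.importance === 0 ? 1.0 : 0.8 + (m.importance / 5) * 0.4;
  return m.score * recency * frequency * importance;
}
```

A brand-new, never-recalled, unscored entry keeps its raw score, since all three multipliers evaluate to 1.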


The topK multiplier problem

Multiply your topK by at least 3 before deduplicating by parentId. If an entry has 4 chunks and you query topK: 5, those 4 chunks can take most of your result budget before you have seen enough unique parents.

const VECTORIZE_TOP_K_MULTIPLIER = 3;

Query wider and then deduplicate by parentId in your application code.
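
A sketch of that application-side step, assuming each match carries its `parentId` in metadata. The `ChunkMatch` shape and both function names are hypothetical.

```typescript
// Assumed shape of one query match for this sketch.
interface ChunkMatch {
  id: string;
  score: number;
  metadata: { parentId: string };
}

const VECTORIZE_TOP_K_MULTIPLIER = 3;

// How many matches to request from the index for `wanted` unique entries.
function widenedTopK(wanted: number): number {
  return wanted * VECTORIZE_TOP_K_MULTIPLIER;
}

// Keep the best-scoring chunk per parent entry, then trim to `wanted`.
function dedupeByParent(matches: ChunkMatch[], wanted: number): ChunkMatch[] {
  const best = new Map<string, ChunkMatch>();
  for (const m of matches) {
    const current = best.get(m.metadata.parentId);
    if (!current || m.score > current.score) best.set(m.metadata.parentId, m);
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, wanted);
}
```

With the 4-chunk example above, asking for 5 entries means querying 15 matches, so one long entry can no longer crowd out the rest.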


The full implementation is open source if you want to explore any of these topics: github.com/rahilp/second-brain-cloudflare
