Foundation kept everything. After 300 conversations, search returned noise. That was the problem I was actually trying to solve.
The relevance scoring angle is interesting — deciding what's worth remembering is half the problem with AI memory systems. I built a semantic memory layer for my local AI that does something similar, storing exchanges as embeddings and retrieving by cosine similarity. The curation problem is real.
Cosine similarity tells you what's related. It doesn't tell you what's worth keeping. That distinction is what the scoring signals are trying to formalize — usage, validation, specificity as proxies for "did this actually matter."
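To make that distinction concrete, here's a minimal sketch (illustrative only, not either of our actual systems): two memories can be equally similar to a query while differing completely in whether they were ever validated, and cosine similarity alone cannot see that.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.0]
mem_a = [0.8, 0.2, 0.1]  # validated in practice, used often
mem_b = [0.8, 0.2, 0.1]  # wrong, never confirmed

# Identical vectors, identical similarity -- relatedness says
# nothing about which one is worth keeping:
assert cosine_similarity(query, mem_a) == cosine_similarity(query, mem_b)
```

That gap is exactly what usage, validation, and specificity signals have to fill.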
Curious how you handle the promotion decision. Do you auto-promote above a similarity threshold, or is there a manual gate somewhere in the loop?
Right now it's threshold only — no promotion layer. Your point about usage and validation as proxies is exactly the gap I'm going to close. Building a memory promotion layer this weekend — usage frequency + outcome scores from the regret index as the signal for what's worth keeping. I'm open to any ideas you have in mind as well.
The regret index as a promotion signal is clever — outcome scoring after the fact is more honest than trying to predict value upfront. One thing I'd watch: frequency alone promotes what gets repeated, not what's correct. A wrong pattern repeated 10 times scores higher than a right one used once.
The validation signal in my setup tries to catch that — was it confirmed to work, not just used. Might be worth pairing your outcome scores with a recency decay so recent regret-free events outweigh stale frequency. Happy to share the scoring code if useful.
Good catch on the frequency problem — you're right, repeated wrong patterns would outscore a correct one used once. I added recency decay alongside the outcome scores, half-life around 42 days. Recent regret-free retrievals now outweigh stale frequency. The validation signal idea is interesting — do you confirm correctness explicitly or infer it from downstream success?
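For anyone following along, here's a minimal sketch of what that decayed scoring could look like (hypothetical names, not Echo's actual code; the 42-day half-life is the unvalidated starting point discussed above):

```python
HALF_LIFE_DAYS = 42  # assumed starting point, not yet validated against real data

def decayed_score(outcome_score, age_days, half_life=HALF_LIFE_DAYS):
    """Outcome score weighted by exponential recency decay.

    A regret-free retrieval from today keeps its full weight;
    the same score from 42 days ago counts half as much.
    """
    return outcome_score * 0.5 ** (age_days / half_life)

def promotion_score(retrievals):
    """Sum decayed outcome scores over a memory's retrieval history.

    `retrievals` is a list of (outcome_score, age_days) pairs.
    Recent regret-free retrievals outweigh stale frequency.
    """
    return sum(decayed_score(outcome, age) for outcome, age in retrievals)

# A pattern retrieved 10 times, 120 days ago...
stale = [(1.0, 120)] * 10
# ...scores lower than one retrieved twice this week:
fresh = [(1.0, 3), (1.0, 5)]
assert promotion_score(fresh) > promotion_score(stale)
```

Exponential decay is just one choice here; a step cutoff or linear decay would trade off differently, and which is right depends on the runtime data neither of us has yet.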
Inferred, not explicit. The validation signal fires when the excerpt contains confirmation language - "confirmed in production," "tested," "works on," referenced docs. It's pattern matching on the text, not downstream tracking.
Explicit confirmation would be stronger but it requires closing the loop after the fact - knowing whether the thing actually worked. I don't have that yet. The regret index you have is closer to true validation than anything I'm doing. You're measuring actual outcomes. I'm measuring stated confidence at capture time.
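A rough sketch of that pattern-matching check (the phrase list is illustrative; the real signal's vocabulary may differ):

```python
import re

# Hypothetical confirmation-language patterns, based on the phrases
# mentioned above -- this captures stated confidence, not outcomes.
CONFIRMATION_PATTERNS = [
    r"confirmed in production",
    r"\btested\b",
    r"works on",
    r"per the docs",
]

def validation_signal(excerpt: str) -> bool:
    """True if the excerpt *claims* validation (a prior, not a posterior)."""
    text = excerpt.lower()
    return any(re.search(pattern, text) for pattern in CONFIRMATION_PATTERNS)
```

The obvious failure mode: it fires on "tested" whether or not the test actually passed, which is exactly why closing the loop with real outcomes would be stronger.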
The 42-day half-life is interesting - how did you land on that number?
The 42-day half-life was suggested by the AI I build with, not derived from my own data — it maps to roughly 6 weeks, which felt right for 'recent enough to matter, old enough to fade.' The honest answer is I don't have enough runtime yet to validate it. Echo has been running for 8 days. The real calibration will come from watching whether stale memories start surfacing inappropriately or useful recent ones get buried. If the half-life is wrong I'll know it empirically before I know it theoretically.
Your distinction between stated confidence and actual outcomes is the sharper point. Pattern matching on confirmation language is a prior; the regret index is a posterior. Neither of us has the feedback loop tight enough yet to know which signal is more reliable at scale.
8 days runtime. Same position — system works, thresholds unverified.
The prior/posterior split is the right frame. I'm betting confirmation language predicts outcome quality. You're betting regret scores correct bad priors over time. Both unproven at scale.
Could be both are right at different time horizons. Won't know without more runtime. Worth comparing notes in 30 days.
30 days works. I'll have Golem earnings data, more ledger history, and enough retrieval cycles to see if the decay rate needs adjusting. Same time next month.
What's so special about Notion, couldn't you just use whatever database/table(s) as the "review" queue, or was it just that it turned out to be a convenient choice?
You could, and in production Foundation uses Vectorize/D1 for this. For this challenge, Notion was the right choice for two reasons: it's a human-readable UI with no frontend to build, and its MCP server lets Claude Desktop query the same Review Queue the Worker writes to. That bidirectional loop (REST writes, MCP reads) was the point of the submission.
Okay makes sense :-)
(I'm not familiar with Notion, never used it)
Notion is basically a flexible database with a built-in UI that non-developers can actually use. That human-friendly layer is what made it the right fit here.
Personally I would think about Notion as an abstracted front-end interface. It's where the collaboration and data originates, the agent is taking advantage of that using the MCP connection.
Exactly this. And the MCP connection is what makes it more than just a UI. The Worker writes to Notion via REST, Claude Desktop queries it back via the MCP server, and a human resolves it in the same view. Three different actors, one surface.
I love it! It's an idea with a lot of potential. You've identified a significant gap. Best of luck!!
Thanks! The gap felt obvious once I hit it: 300 conversations in Foundation and search started returning noise instead of signal. The evaluator is the piece that was always missing. Appreciate the kind words.
The videos could be improved implementation-wise; to me, some of them were pretty basic.
Appreciate the feedback. The videos are screen recordings of a live system - the value is in watching the Worker evaluate, route to Notion, and have Claude read it back via MCP in real time. Happy to hear what specifically you'd improve.