<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: gen99</title>
    <description>The latest articles on DEV Community by gen99 (@_d1ea2a1f71316e743f41).</description>
    <link>https://dev.to/_d1ea2a1f71316e743f41</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3921053%2F930909ed-9e29-44b8-9008-a709e57320db.png</url>
      <title>DEV Community: gen99</title>
      <link>https://dev.to/_d1ea2a1f71316e743f41</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_d1ea2a1f71316e743f41"/>
    <language>en</language>
    <item>
      <title>I spent 5 weeks building an open-source multi-agent orchestrator. The hard part wasn't the agents — it was the memory.</title>
      <dc:creator>gen99</dc:creator>
      <pubDate>Tue, 02 Jun 2026 15:36:47 +0000</pubDate>
      <link>https://dev.to/_d1ea2a1f71316e743f41/i-spent-5-weeks-building-an-open-source-multi-agent-orchestrator-the-hard-part-wasnt-the-agents--43j3</link>
      <guid>https://dev.to/_d1ea2a1f71316e743f41/i-spent-5-weeks-building-an-open-source-multi-agent-orchestrator-the-hard-part-wasnt-the-agents--43j3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is the launch post in a series on building &lt;strong&gt;Praxia&lt;/strong&gt;, an Apache-2.0 multi-agent orchestrator. Later posts go deep on the TiDB Vector memory backend and a Japanese-specialized STT integration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;This spring I built and shipped &lt;strong&gt;Praxia&lt;/strong&gt;, a multi-agent orchestrator OS, from scratch in about 5 weeks of nights and weekends (Apache-2.0).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 &lt;strong&gt;PyPI&lt;/strong&gt;: &lt;code&gt;pip install praxia&lt;/code&gt; — &lt;a href="https://pypi.org/project/praxia/" rel="noopener noreferrer"&gt;https://pypi.org/project/praxia/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/praxia-dev/praxia" rel="noopener noreferrer"&gt;https://github.com/praxia-dev/praxia&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎬 &lt;strong&gt;60-second demo&lt;/strong&gt;: &lt;a href="https://youtu.be/o_6NbjJU1AA" rel="noopener noreferrer"&gt;https://youtu.be/o_6NbjJU1AA&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Landing&lt;/strong&gt;: &lt;a href="https://praxia.tools/" rel="noopener noreferrer"&gt;https://praxia.tools/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The differentiator is &lt;strong&gt;automatic personal → organizational memory promotion&lt;/strong&gt;. The "prompts that actually work," which a senior engineer painstakingly tunes, usually stay locked in that one person's head. Praxia tries to solve that with a &lt;strong&gt;5-layer memory stack + a 3-path promotion engine&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually started — the decision
&lt;/h2&gt;

&lt;p&gt;I'd been using LangChain / CrewAI / AutoGen at work since late 2025, and one structural discomfort kept nagging at me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What really separates a good agent from a bad one isn't the library or the model — it's the &lt;strong&gt;accumulated domain-specific trial and error&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that accumulation almost always lives in one senior person's head (their Cursor / VS Code / Obsidian). Tacit knowledge that evaporates the day they leave. General-purpose frameworks are &lt;strong&gt;powerless&lt;/strong&gt; against that.&lt;/p&gt;

&lt;p&gt;In April 2026 I started writing code between day-job hours. A month later I shipped &lt;code&gt;v0.1.0&lt;/code&gt; to PyPI and GitHub.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why existing frameworks didn't cut it
&lt;/h2&gt;

&lt;p&gt;Four walls I hit in practice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Wall&lt;/th&gt;
&lt;th&gt;What was happening&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2-3 days just to get something running, which is too slow to inform a go/no-go call on production use.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tacit knowledge doesn't propagate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The prompts that work stay in one person's private space.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No evidence for evaluation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"It runs" doesn't guarantee "it works."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agents stagnate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Build it once, and there's no feedback loop.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The second one was the killer. General agent frameworks give you strong primitives, but &lt;strong&gt;the process of accumulating knowledge into the organization is left entirely to the implementer&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The core design — 5 layers + 3 paths
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The 5-layer memory stack
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L1 PersonalMemory   Per-user (6 backends: JSON / Mem0 / Letta / Zep / Hindsight / LangMem)
L2 PromotionEngine  Nightly batch. Decides L1 → L3 promotion
L3 SharedMemory     Org-wide. RBAC gating, time decay
L4 MarkdownStore    Git-managed, PR review required, immutable
L5 GraphLayer       Optional (Zep / Graphiti), relation extraction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is that &lt;strong&gt;L1 → L3 is not manual&lt;/strong&gt;. A sleep-time consolidator (the nightly L2 batch) scans personal memory and auto-promotes the right parts into shared organizational memory. Only L4, the Git-managed store, goes through PR review.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 3-path promotion engine
&lt;/h3&gt;

&lt;p&gt;Memory promotion is evaluated in parallel across &lt;strong&gt;three independent signals&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Frequency&lt;/strong&gt; — facts repeated across N+ people&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outcome correlation&lt;/strong&gt; — co-occurrence with wins / approved PRs / passing tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM self-eval&lt;/strong&gt; — a 0..1 "org-knowledge candidacy" score&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each path has its own threshold, and &lt;strong&gt;any single decisive path triggers promotion&lt;/strong&gt;. This deliberately avoids single-mechanism dependence (where one broken signal takes the whole thing down).&lt;/p&gt;




&lt;h2&gt;
  
  
  The design choices that made "I can build this myself" possible
&lt;/h2&gt;

&lt;p&gt;Shipping in 5 weeks came down to four design choices.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Seven extension points
&lt;/h3&gt;

&lt;p&gt;Every extension point is built on the same &lt;code&gt;praxia.extensions.Registry&lt;/code&gt; primitive:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Extension point&lt;/th&gt;
&lt;th&gt;~LoC&lt;/th&gt;
&lt;th&gt;Entry point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Connector (Box / Notion / Slack …)&lt;/td&gt;
&lt;td&gt;~50&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.connectors&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory backend&lt;/td&gt;
&lt;td&gt;~80&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.memory_backends&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File parser&lt;/td&gt;
&lt;td&gt;~30&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.parsers&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output exporter&lt;/td&gt;
&lt;td&gt;~30&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.exporters&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OAuth provider&lt;/td&gt;
&lt;td&gt;~20&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.oauth_providers&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill&lt;/td&gt;
&lt;td&gt;~50&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.skills&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flow&lt;/td&gt;
&lt;td&gt;~50&lt;/td&gt;
&lt;td&gt;&lt;code&gt;praxia.flows&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"&lt;strong&gt;Extend via a &lt;code&gt;pyproject.toml&lt;/code&gt; entry point, never edit core files&lt;/strong&gt;" keeps the cognitive load low when you write your own.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Apache-2.0 with everything included (no paywall)
&lt;/h3&gt;

&lt;p&gt;SSO (Google / Microsoft Entra / Okta / GitHub / Keycloak), RBAC, audit logs, per-user OAuth (13 providers), KMS-backed token encryption (AWS / Azure / GCP / Vault / local) — &lt;strong&gt;all in the OSS core&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Most of what commercial agent platforms paywall as an "Enterprise tier" ships here under Apache-2.0. That directly speeds up adoption decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. 100+ providers via LiteLLM
&lt;/h3&gt;

&lt;p&gt;Provider quirks (Anthropic not supporting &lt;code&gt;response_format&lt;/code&gt;, GPT-5.x disallowing &lt;code&gt;temperature&lt;/code&gt;, Azure's deployment-name format, …) are &lt;strong&gt;absorbed at the LiteLLM layer&lt;/strong&gt;. No provider-specific API keys leak into Praxia core. Fully offline operation is possible too (Ollama + a local model + &lt;code&gt;backend=json&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  4. A Streamlit UI that's easy to throw away
&lt;/h3&gt;

&lt;p&gt;The UI is Streamlit, but the &lt;strong&gt;backend also runs headless as &lt;code&gt;praxia serve&lt;/code&gt; (FastAPI)&lt;/strong&gt;. When I swap to Next.js or mobile later, I throw away only the UI layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What went into v0.1.0, and what didn't
&lt;/h2&gt;

&lt;p&gt;Shipped (deliberately a full set):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5-layer memory + 3-path promotion engine&lt;/li&gt;
&lt;li&gt;6 business skills (investment / sales / design / procurement / patent / legal)&lt;/li&gt;
&lt;li&gt;6 long-term memory (LTM) backends + Composite/Routed parallel fusion&lt;/li&gt;
&lt;li&gt;Per-user OAuth for 13 providers&lt;/li&gt;
&lt;li&gt;SSO + RBAC + ACL + audit logs&lt;/li&gt;
&lt;li&gt;Autonomous agent (LLM-driven tool-use loop)&lt;/li&gt;
&lt;li&gt;Document Designer (sandboxed &lt;code&gt;python-pptx&lt;/code&gt; / &lt;code&gt;docx&lt;/code&gt; → designed file output)&lt;/li&gt;
&lt;li&gt;i18n in 8 languages (en / ja / zh-CN / ko / es / fr / de / pt-BR)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deliberately deferred (v0.2+):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenant GUI&lt;/strong&gt; — OSS targets "single-org self-host"; SaaS-grade tenant isolation belongs in a future Open Core tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF output&lt;/strong&gt; — LibreOffice-based workflow recommended for now&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native Pinecone / Weaviate / Qdrant backends&lt;/strong&gt; — can be wrapped via &lt;code&gt;mem0&lt;/code&gt; through Composite/Routed, so low priority&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "what NOT to build" list was as important as the "what to build" list.&lt;/p&gt;




&lt;h2&gt;
  
  
  The top 3 things that actually ate my time
&lt;/h2&gt;

&lt;p&gt;This was a "ship something working in 5 weeks" sprint, and the three things that genuinely ate time were all in the &lt;strong&gt;memory layer and promotion engine&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Blending three signals that live on completely different scales
&lt;/h3&gt;

&lt;p&gt;The PromotionEngine evaluates L1 → L3 promotion across &lt;strong&gt;Frequency / Outcome correlation / LLM self-eval&lt;/strong&gt; in parallel. This was trickier than I expected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frequency&lt;/strong&gt; is an integer in 0..∞ (reference count for the same fact)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outcome correlation&lt;/strong&gt; is a ratio in 0..1 (success rate of tasks where the fact co-occurred)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM self-eval&lt;/strong&gt; is a continuous 0..1 — but &lt;strong&gt;non-deterministic&lt;/strong&gt;, and the score distribution differs per provider (GPT-5 strict, median ~0.4; Claude lenient ~0.7; Gemini bimodal)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My first implementation was a plain weighted average. Because frequency is unbounded, "facts I just happened to touch a lot in L1" floated to the top. I ended up here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
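&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromoteConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# fields inferred from how decide_promotion uses them&lt;/span&gt;
    &lt;span class="n"&gt;freq_mean&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;freq_std&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;freq_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;
    &lt;span class="n"&gt;outcome_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.70&lt;/span&gt;
    &lt;span class="n"&gt;self_eval_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.80&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;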
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromoteSignal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;frequency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;       &lt;span class="c1"&gt;# raw count
&lt;/span&gt;    &lt;span class="n"&gt;outcome_corr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;    &lt;span class="c1"&gt;# 0..1
&lt;/span&gt;    &lt;span class="n"&gt;self_eval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;       &lt;span class="c1"&gt;# 0..1, median of N=3 LLM calls
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decide_promotion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromoteSignal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PromoteConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Z-score normalize on a rolling 30-day population, then sigmoid
&lt;/span&gt;    &lt;span class="n"&gt;z_freq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;frequency&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freq_mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freq_std&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# OR logic with per-path threshold
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;z_freq&lt;/span&gt;              &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;freq_threshold&lt;/span&gt;       &lt;span class="c1"&gt;# 0.85
&lt;/span&gt;        &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outcome_corr&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outcome_threshold&lt;/span&gt;    &lt;span class="c1"&gt;# 0.70
&lt;/span&gt;        &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;self_eval&lt;/span&gt;    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;self_eval_threshold&lt;/span&gt;  &lt;span class="c1"&gt;# 0.80
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Z-score normalization&lt;/strong&gt; tames the runaway frequency signal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OR logic with per-path thresholds&lt;/strong&gt; means "any single decisive path promotes" — deliberately avoiding single-mechanism dependence&lt;/li&gt;
&lt;li&gt;LLM self-eval uses the &lt;strong&gt;median of N=3 calls&lt;/strong&gt; to damp non-determinism (I tried N=5, but the benefit plateaued)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;decide_promotion&lt;/code&gt; is a &lt;strong&gt;pure function&lt;/strong&gt;, so I can replay historical promotion logs to do parameter sensitivity analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It took ~5 redesigns to land on "Z-score → sigmoid → OR with thresholds." Lesson: &lt;strong&gt;don't start signal fusion with a weighted average. Per-path decisive thresholds + OR turned out to be the most robust.&lt;/strong&gt;&lt;/p&gt;
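&lt;p&gt;Because &lt;code&gt;decide_promotion&lt;/code&gt; is pure, replaying a promotion log under different thresholds is just a loop. The sketch below restates the function so the block is self-contained. The &lt;code&gt;PromoteConfig&lt;/code&gt; fields and defaults are inferred from the snippet and its comments, and &lt;code&gt;promotion_rate&lt;/code&gt; and &lt;code&gt;sweep&lt;/code&gt; are hypothetical helpers, not Praxia APIs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromoteSignal:
    frequency: float     # raw count
    outcome_corr: float  # 0..1
    self_eval: float     # 0..1


@dataclass(frozen=True)
class PromoteConfig:     # assumed shape, inferred from the snippet above
    freq_mean: float
    freq_std: float
    freq_threshold: float = 0.85
    outcome_threshold: float = 0.70
    self_eval_threshold: float = 0.80


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decide_promotion(sig, cfg):
    # Z-score the raw count, squash it to 0..1, then OR the three paths.
    z_freq = sigmoid((sig.frequency - cfg.freq_mean) / cfg.freq_std)
    return (
        z_freq > cfg.freq_threshold
        or sig.outcome_corr > cfg.outcome_threshold
        or sig.self_eval > cfg.self_eval_threshold
    )


def promotion_rate(log, cfg):
    # Fraction of logged signals that would be promoted under cfg.
    if not log:
        return 0.0
    return sum(decide_promotion(sig, cfg) for sig in log) / len(log)


def sweep(log, cfg, field_name, values):
    # Replay the same log once per candidate value of a single parameter.
    return {v: promotion_rate(log, replace(cfg, **{field_name: v})) for v in values}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;sweep(log, cfg, "self_eval_threshold", [0.7, 0.8])&lt;/code&gt; on a stored log shows how many promotions hinge on that one threshold, without touching the LLM again.&lt;/p&gt;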

&lt;h3&gt;
  
  
  2. Reconciling "fan-out search × single-destination write" in the Composite backend
&lt;/h3&gt;

&lt;p&gt;The Composite backend sends search queries in parallel to multiple backends (JSON / Mem0 / TiDB / Letta…) and fuses results with Reciprocal Rank Fusion (RRF). But writes go to a single backend specified by &lt;code&gt;write_to=&lt;/code&gt;. This asymmetry created three pitfalls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(a) Add a backend later, and past data is invisible to it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;Composite(backends=[A, B], write_to="A")&lt;/code&gt; and then adding C means C has none of the past writes. Search fan-out becomes lopsided — 2 backends hit, 1 doesn't. I discovered this during dogfooding as "one backend just has lower search quality."&lt;/p&gt;

&lt;p&gt;→ Added a &lt;code&gt;replay_writes(source=A, target=C, since=...)&lt;/code&gt; admin API after the fact. Needed for rebalancing.&lt;/p&gt;
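&lt;p&gt;A rough sketch of what such a replay can look like, using a toy in-memory backend. Only the &lt;code&gt;replay_writes(source, target, since)&lt;/code&gt; shape comes from the text above. &lt;code&gt;InMemoryBackend&lt;/code&gt;, its &lt;code&gt;write&lt;/code&gt; method, and the integer timestamps are assumptions for illustration, not Praxia's backend interface.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Hypothetical backend: record_id maps to (timestamp, text)."""
    records: dict = field(default_factory=dict)

    def write(self, record_id, timestamp, text):
        self.records[record_id] = (timestamp, text)


def replay_writes(source, target, since):
    """Copy records written at or after `since` from source into target.

    Records already present in target are skipped, so running the
    replay twice copies nothing the second time.
    """
    copied = 0
    # Replay in timestamp order so target sees writes in their original order.
    ordered = sorted(source.records.items(), key=lambda kv: kv[1][0])
    for record_id, (timestamp, text) in ordered:
        if timestamp >= since and record_id not in target.records:
            target.write(record_id, timestamp, text)
            copied += 1
    return copied
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Skipping records that already exist keeps the replay safe to re-run after a partial failure.&lt;/p&gt;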

&lt;p&gt;&lt;strong&gt;(b) Graceful degradation when &lt;code&gt;write_to&lt;/code&gt; goes down&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;write_to&lt;/code&gt; is down, stopping writes but continuing reads looks attractive. But that can cause &lt;strong&gt;"a record that was hitting in search disappears on the next search"&lt;/strong&gt; (because the write never re-runs after recovery).&lt;/p&gt;

&lt;p&gt;I ended up &lt;strong&gt;dropping graceful degradation and raising instead&lt;/strong&gt;. Half-baked availability stacks a data-truthfulness problem on top of eventual consistency. "&lt;strong&gt;When it's down, honestly say it's down&lt;/strong&gt;" turned out to be the easiest to operate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(c) Tie-breaking in RRF fusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When multiple backends return the &lt;strong&gt;same record_id at rank 1&lt;/strong&gt;, the scores tie exactly. Without a tie-breaking rule, ordering depends on backend registration order, and CI tests go flaky.&lt;/p&gt;

&lt;p&gt;→ Introduced lexicographic tie-breaking on &lt;code&gt;(rrf_score, timestamp DESC, backend_priority)&lt;/code&gt;. Now ordering is reproducible even in CI.&lt;/p&gt;
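&lt;p&gt;The tie-break can be sketched as a sort key over the fused scores. This is a minimal illustration, not Praxia's implementation: the &lt;code&gt;fuse&lt;/code&gt; function, the input shape, and the constant &lt;code&gt;RRF_K = 60&lt;/code&gt; (the value commonly used for RRF) are assumptions, and a record returned by several backends is assumed to take the best priority among them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from collections import defaultdict

RRF_K = 60  # conventional RRF constant; assumed, not taken from Praxia


def fuse(results, priority):
    """Fuse ranked hits from several backends with Reciprocal Rank Fusion.

    results:  {backend_name: [(record_id, timestamp), ...]} in rank order
    priority: {backend_name: int}, where a lower number wins a tie
    """
    score = defaultdict(float)
    newest = {}
    best_priority = {}
    for backend, hits in results.items():
        for rank, (record_id, timestamp) in enumerate(hits, start=1):
            score[record_id] += 1.0 / (RRF_K + rank)
            newest[record_id] = max(timestamp, newest.get(record_id, timestamp))
            best_priority[record_id] = min(
                priority[backend], best_priority.get(record_id, priority[backend])
            )
    # Highest score first, then newest timestamp, then backend priority.
    # record_id is a last resort so the order never depends on dict order.
    return sorted(
        score,
        key=lambda r: (-score[r], -newest[r], best_priority[r], r),
    )
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With a total order in the sort key, registering the backends in a different order yields the same ranking, which is what makes the CI assertion stable.&lt;/p&gt;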

&lt;h3&gt;
  
  
  3. Idempotency of the sleep-time consolidator
&lt;/h3&gt;

&lt;p&gt;Since the nightly batch re-runs L1 → L3 promotion, &lt;strong&gt;not promoting the same fact twice&lt;/strong&gt; is the crux of idempotency. Strict &lt;code&gt;record_id&lt;/code&gt;-based dedup wasn't enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same user records "Customer X prioritizes ROI" and "X-san values ROI" as two different phrasings → different record_ids, semantically identical&lt;/li&gt;
&lt;li&gt;LLM self-eval is &lt;strong&gt;non-deterministic&lt;/strong&gt;: the same text scores 0.78 / 0.82 / 0.76, so near a threshold you get "didn't promote yesterday, promoted today"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I added &lt;strong&gt;fuzzy dedup&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l3_recent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Record&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Already-promoted check: any L3 record within recent window whose
&lt;/span&gt;    &lt;span class="c1"&gt;# embedding cosine-sim &amp;gt;= threshold is treated as duplicate
&lt;/span&gt;    &lt;span class="n"&gt;cand_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;cosine_sim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cand_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.92&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;l3_recent&lt;/span&gt;  &lt;span class="c1"&gt;# already filtered to last 30 days
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hard part was &lt;strong&gt;tuning the two thresholds (similarity + window)&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Problem it causes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;0.95 / 7 days&lt;/strong&gt; (tight)&lt;/td&gt;
&lt;td&gt;"Same fact" with different wording ends up duplicated across L3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;0.85 / 90 days&lt;/strong&gt; (loose)&lt;/td&gt;
&lt;td&gt;"New but similar facts" (e.g. a similar trend for customer Y) get suppressed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;0.92 / 30 days&lt;/strong&gt; (adopted)&lt;/td&gt;
&lt;td&gt;On a hand-labeled set of 50, both false-merge and false-split stayed &amp;lt; 5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The deciding factor was &lt;strong&gt;instrumenting both directions at once&lt;/strong&gt;. Not just "rate of missed duplicates" but also &lt;strong&gt;"rate of treating genuinely distinct facts as duplicates."&lt;/strong&gt; Track only one side and you inevitably tune lopsided. I later applied this to the PromotionEngine's whole calibration loop.&lt;/p&gt;
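&lt;p&gt;Measuring both directions at once is cheap when the labeled set is stored as pairs. In this sketch, &lt;code&gt;dedup_error_rates&lt;/code&gt; and the &lt;code&gt;(similarity, same_fact)&lt;/code&gt; pair format are hypothetical. Each rate is computed per class, which is one reasonable reading of "false-merge" and "false-split" but is an assumption about how the post's figures were calculated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def dedup_error_rates(pairs, threshold):
    """Return (false_merge_rate, false_split_rate) for one similarity threshold.

    pairs: list of (similarity, same_fact) tuples from a hand-labeled set,
           where same_fact is True when both texts state the same fact.
    """
    same = [sim for sim, same_fact in pairs if same_fact]
    distinct = [sim for sim, same_fact in pairs if not same_fact]
    # False merge: distinct facts that the threshold would treat as duplicates.
    false_merge = sum(1 for sim in distinct if sim >= threshold)
    # False split: the same fact slipping under the threshold and staying separate.
    false_split = sum(1 for sim in same if not (sim >= threshold))
    return (
        false_merge / max(len(distinct), 1),
        false_split / max(len(same), 1),
    )
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Evaluating the same pairs at several thresholds makes the trade-off visible: loosening the threshold lowers the false-split rate and raises the false-merge rate.&lt;/p&gt;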




&lt;h2&gt;
  
  
  v0.1.0 by the numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codebase&lt;/td&gt;
&lt;td&gt;~25,000 LoC (Python + tests + i18n)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tests&lt;/td&gt;
&lt;td&gt;431 passing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported LLM providers&lt;/td&gt;
&lt;td&gt;100+ via LiteLLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bundled connectors&lt;/td&gt;
&lt;td&gt;19 (pull + push)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-user OAuth providers&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docs&lt;/td&gt;
&lt;td&gt;30,000+ chars across EN + JA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dev time&lt;/td&gt;
&lt;td&gt;~5 weeks (nights and weekends)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Next 3 months
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;v0.2: First-class TiDB Vector / pgvector backends (currently via &lt;code&gt;mem0&lt;/code&gt; wrap)&lt;/li&gt;
&lt;li&gt;v0.2: localhost loopback OAuth (drop the &lt;code&gt;praxia serve&lt;/code&gt; requirement)&lt;/li&gt;
&lt;li&gt;v0.3: Multi-tenant org features (the Open Core entry point)&lt;/li&gt;
&lt;li&gt;Docs: expanded English tutorials&lt;/li&gt;
&lt;li&gt;Community: Discord, more active Discussions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  For anyone in the same spot
&lt;/h2&gt;

&lt;p&gt;Three things I felt after sprinting for 5 weeks, for anyone debating whether to start a solo OSS project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ship frequency over polish.&lt;/strong&gt; Shipping &lt;code&gt;v0.1.0&lt;/code&gt; teaches you far more than polishing &lt;code&gt;v0.0.4&lt;/code&gt; forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State your differentiator in one line.&lt;/strong&gt; For Praxia: "personal → org memory auto-promotion." If you can't say it, you can't write the post, can't pitch it, and PRs won't come.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The real competitor for OSS is the commercial SaaS solving the same problem&lt;/strong&gt; — not LangChain or CrewAI, but the paywall on paid agent platforms. Bundling those features under Apache-2.0 is itself the differentiation.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;⭐ Stars / 🍴 Forks / Issues / PRs all welcome.&lt;br&gt;
&lt;strong&gt;&lt;a href="https://github.com/praxia-dev/praxia" rel="noopener noreferrer"&gt;github.com/praxia-dev/praxia&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you liked this, the 60-second demo is here: &lt;a href="https://youtu.be/o_6NbjJU1AA" rel="noopener noreferrer"&gt;https://youtu.be/o_6NbjJU1AA&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connecting "individual brilliance × organizational continuity" with AI — that's the mission Praxia started with this spring.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next in this series: the enterprise-platform features I put directly in the OSS core (SSO, RBAC, audit logs, KMS-envelope-encrypted OAuth tokens) and why none of them are behind an Enterprise tier.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
