<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nguyen Thien</title>
    <description>The latest articles on DEV Community by Nguyen Thien (@thien_nguyen).</description>
    <link>https://dev.to/thien_nguyen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4009551%2F6f750961-b798-4b58-a465-7dcd1c42ba2a.png</url>
      <title>DEV Community: Nguyen Thien</title>
      <link>https://dev.to/thien_nguyen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thien_nguyen"/>
    <language>en</language>
    <item>
      <title>We built Nebula: GraphRAG that runs in your browser tab, not someone else's cloud</title>
      <dc:creator>Nguyen Thien</dc:creator>
      <pubDate>Tue, 30 Jun 2026 13:19:40 +0000</pubDate>
      <link>https://dev.to/thien_nguyen/we-built-nebula-graphrag-that-runs-in-your-browser-tab-not-someone-elses-cloud-31gd</link>
      <guid>https://dev.to/thien_nguyen/we-built-nebula-graphrag-that-runs-in-your-browser-tab-not-someone-elses-cloud-31gd</guid>
      <description>&lt;p&gt;Most AI note apps ship your notes to a cloud vector database and a hosted model, then ask you to trust the privacy policy. For the work we do (regulated industries, sensitive data) that is a non-starter. So we built the opposite and open-sourced it: &lt;strong&gt;Nebula&lt;/strong&gt;, a private, local-first AI knowledge base that runs entirely inside a browser tab. No backend, no account, no server. Its tagline says it plainly: notes that think, nothing leaves your device.&lt;/p&gt;

&lt;p&gt;Repo: &lt;strong&gt;&lt;a href="https://github.com/beevr-labs/Nebula" rel="noopener noreferrer"&gt;https://github.com/beevr-labs/Nebula&lt;/a&gt;&lt;/strong&gt; (Apache-2.0). Live demo, no signup: &lt;strong&gt;&lt;a href="https://beevr-labs.github.io/Nebula/" rel="noopener noreferrer"&gt;https://beevr-labs.github.io/Nebula/&lt;/a&gt;&lt;/strong&gt;. Here is why we went fully on-device, and what it cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Privacy by architecture, not by promise
&lt;/h2&gt;

&lt;p&gt;The usual privacy pitch is a policy: "we won't look at your data." Nebula's is structural: there is nowhere for your data to go. Everything runs in the browser. Notes, embeddings, and the search index live in local browser storage. There is no sync service, no account system, and therefore no server to breach or to put under a data-processing agreement. For sensitive notes (client records, health information, anything you would not paste into a cloud chatbot) that is the whole point.&lt;/p&gt;

&lt;h2&gt;
  
  
  What runs where
&lt;/h2&gt;

&lt;p&gt;It is a SvelteKit single-page app that does real ML in the browser:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-device chat&lt;/strong&gt; via WebLLM, GPU-accelerated with &lt;strong&gt;WebGPU&lt;/strong&gt;. You pick the model, from tiny-and-fast to large-and-accurate, and Nebula shows the download size before you commit. Qwen and Llama models are supported.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search&lt;/strong&gt; powered by &lt;strong&gt;bge-m3&lt;/strong&gt; (Apache-2.0), about 570 MB on first use, then cached and fully offline. It is multilingual, including Vietnamese, so it works across mixed-language notes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebAssembly&lt;/strong&gt; handles the compute-heavy parts.&lt;/li&gt;
&lt;li&gt;After the first model download, the whole thing works &lt;strong&gt;offline&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why a graph, not just vectors
&lt;/h2&gt;

&lt;p&gt;Flat vector search finds notes that are similar. It does not understand that "the client from the Tuesday call" and "Acme Corp" are the same entity across ten different notes. Nebula builds an &lt;strong&gt;entity knowledge graph&lt;/strong&gt; automatically (people, projects, clients) and uses GraphRAG to answer questions by walking those relationships, then links every answer back to the source notes. You ask in plain language and get an answer you can trace, instead of a keyword hunt across disconnected files.&lt;/p&gt;
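
&lt;p&gt;To make the difference concrete, here is a minimal sketch of the idea, not Nebula's actual implementation. The alias table, the &lt;code&gt;EntityGraph&lt;/code&gt; class, and the &lt;code&gt;notes_for&lt;/code&gt; helper are all hypothetical names chosen for illustration. It assumes mentions have already been resolved to canonical entities, and shows how walking one hop through the graph reaches a source note that never mentions the client by name.&lt;/p&gt;

```python
from collections import defaultdict, deque

# Hypothetical alias table: different mentions resolve to one canonical entity.
ALIASES = {
    "acme corp": "acme",
    "the client from the tuesday call": "acme",
    "jane": "jane",
}

class EntityGraph:
    def __init__(self):
        self.edges = defaultdict(set)    # entity -> related entities
        self.sources = defaultdict(set)  # entity -> note ids that mention it

    def add_note(self, note_id, mentions):
        entities = sorted({ALIASES[m.lower()] for m in mentions})
        for entity in entities:
            self.sources[entity].add(note_id)
        # Entities mentioned in the same note are linked to each other.
        for a in entities:
            for b in entities:
                if a != b:
                    self.edges[a].add(b)

    def notes_for(self, mention, max_hops=1):
        """Breadth-first walk from the resolved entity, collecting source notes."""
        start = ALIASES[mention.lower()]
        seen = {start}
        queue = deque([(start, 0)])
        found = set()
        while queue:
            entity, hops = queue.popleft()
            found |= self.sources[entity]
            if hops == max_hops:
                continue
            for neighbour in self.edges[entity]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, hops + 1))
        return sorted(found)

graph = EntityGraph()
graph.add_note("note-1", ["Acme Corp"])
graph.add_note("note-2", ["the client from the Tuesday call", "Jane"])
graph.add_note("note-3", ["Jane"])
print(graph.notes_for("Acme Corp", max_hops=0))
print(graph.notes_for("Acme Corp", max_hops=1))
```

&lt;p&gt;With zero hops the lookup returns the two notes about the client under either name. With one hop it also returns the note that only mentions Jane, because she is linked to the client through a shared note. Every result is a note id, which is what makes the answer traceable.&lt;/p&gt;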

&lt;h2&gt;
  
  
  It is also just a good notes app
&lt;/h2&gt;

&lt;p&gt;The AI is useless if the notes app underneath is not real, so it is: Markdown, wikilinks and backlinks, tabs, a quick switcher, daily notes, templates, tags, and folders. You can bring your own files (PDF, CSV, text) and export the whole vault as plain &lt;code&gt;.md&lt;/code&gt; files whenever you want. No lock-in: your notes go in and out as portable Markdown. The codebase ships with 430+ automated tests, because local-first does not mean fragile.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hard parts (what we learned)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-device models are smaller, so structure has to carry more weight.&lt;/strong&gt; The knowledge graph recovers context that a small local model alone would miss, which is a big part of why we went graph-first instead of leaning on raw model size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainable retrieval matters as much as accuracy.&lt;/strong&gt; Showing the path through the graph back to source notes is what makes the answer trustworthy, and for regulated buyers that traceability is not a nice-to-have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The browser is a surprisingly capable runtime in 2026.&lt;/strong&gt; WebGPU plus WebAssembly means "install nothing, runs offline, GPU-accelerated" is actually achievable, not a science project.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why open source
&lt;/h2&gt;

&lt;p&gt;The same reason we open-source the rest of our hardest work: in AI, "verifiable" beats "trust me." A buyer evaluating us for sensitive data can read exactly how retrieval works, and confirm for themselves that nothing leaves the device, instead of taking our word for it.&lt;/p&gt;

&lt;p&gt;Nebula is Apache-2.0 at &lt;strong&gt;&lt;a href="https://github.com/beevr-labs/Nebula" rel="noopener noreferrer"&gt;https://github.com/beevr-labs/Nebula&lt;/a&gt;&lt;/strong&gt;, with a live demo at &lt;strong&gt;&lt;a href="https://beevr-labs.github.io/Nebula/" rel="noopener noreferrer"&gt;https://beevr-labs.github.io/Nebula/&lt;/a&gt;&lt;/strong&gt;. If you need AI built on sensitive or regulated data, on-device or otherwise, made to survive an audit rather than just a demo, &lt;a href="https://beevr.ai/ai-development-company" rel="noopener noreferrer"&gt;here is how we work&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Anyone else running RAG fully in the browser? What model and hardware combo is actually working for you?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://beevr.ai/blog/nebula-on-device-graphrag" rel="noopener noreferrer"&gt;beevr.ai&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>We open-sourced Kite, our agent framework. Here is what building production agents taught us.</title>
      <dc:creator>Nguyen Thien</dc:creator>
      <pubDate>Tue, 30 Jun 2026 13:17:37 +0000</pubDate>
      <link>https://dev.to/thien_nguyen/we-open-sourced-kite-our-agent-framework-here-is-what-building-production-agents-taught-us-1l29</link>
      <guid>https://dev.to/thien_nguyen/we-open-sourced-kite-our-agent-framework-here-is-what-building-production-agents-taught-us-1l29</guid>
      <description>&lt;p&gt;Everyone has an agent demo in 2026. Far fewer have agents they would put in front of a paying customer, an auditor, or a patient. The gap between "it worked in the notebook" and "it works every time, safely, and we can explain what it did" is where most agent projects quietly die, and it is the gap we built &lt;strong&gt;Kite&lt;/strong&gt; to close.&lt;/p&gt;

&lt;p&gt;We just open-sourced it: &lt;strong&gt;&lt;a href="https://github.com/beevr-labs/Kite" rel="noopener noreferrer"&gt;https://github.com/beevr-labs/Kite&lt;/a&gt;&lt;/strong&gt;. It is Python, MIT licensed, and &lt;code&gt;pip install kite-agent&lt;/code&gt; away. This is the honest writeup of why it exists and what we learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem Kite solves
&lt;/h2&gt;

&lt;p&gt;We build production software for regulated industries, so we kept hitting the same wall: the popular agent frameworks are great for a prototype and painful for production. Getting to a first working agent in LangChain or AutoGen is a configuration project, and once you are there you still have to bolt on the parts that actually matter in production: guardrails, retries, idempotency, observability, evaluation. We were rebuilding that same scaffolding for every client. Kite is the framework we wish we had started with: opinionated about safety, fast to a running agent, and small enough to read.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one design decision everything hangs on: treat the LLM as untrusted
&lt;/h2&gt;

&lt;p&gt;This is the core idea. In Kite, &lt;strong&gt;the model proposes actions; it does not execute them.&lt;/strong&gt; A controlled kernel sits between the agent and the real world and validates every proposed action against policy before anything runs. So when a request like &lt;code&gt;agent.run("rm -rf /")&lt;/code&gt; leads the model to propose a destructive command, the kernel refuses it instead of your filesystem finding out the hard way.&lt;/p&gt;

&lt;p&gt;It sounds simple. It changes everything about how comfortable you are giving an agent real tools. The model becomes a planner you can sandbox, not a process with your credentials. For anyone running agents on sensitive data or real infrastructure, that boundary is the difference between a demo and something you can actually deploy.&lt;/p&gt;
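
&lt;p&gt;The propose-then-validate boundary can be sketched in a few lines. This is an illustration of the pattern under stated assumptions, not Kite's API: &lt;code&gt;ProposedAction&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, &lt;code&gt;execute&lt;/code&gt;, and the allowlists are hypothetical names, and the tools are stubs so nothing touches the real system.&lt;/p&gt;

```python
import shlex
from dataclasses import dataclass

@dataclass(frozen=True)
class ProposedAction:
    tool: str
    argument: str

# Hypothetical policy: an allowlist of tools and, for the shell tool,
# an allowlist of programs. Anything not listed is refused.
ALLOWED_TOOLS = {"search", "shell"}
ALLOWED_PROGRAMS = {"ls", "cat", "grep"}

class PolicyViolation(Exception):
    pass

def validate(action):
    """Raise PolicyViolation unless the proposed action is explicitly allowed."""
    if action.tool not in ALLOWED_TOOLS:
        raise PolicyViolation(f"tool not allowed: {action.tool}")
    if action.tool == "shell":
        parts = shlex.split(action.argument)
        if not parts or parts[0] not in ALLOWED_PROGRAMS:
            raise PolicyViolation(f"command not allowed: {action.argument}")

def execute(action, tools):
    """The kernel is the only caller of real tools; the model only proposes."""
    validate(action)
    return tools[action.tool](action.argument)

# Stub tools so the sketch runs without side effects.
tools = {
    "search": lambda query: f"results for {query}",
    "shell": lambda command: f"would run: {command}",
}

print(execute(ProposedAction("search", "kite agent framework"), tools))
try:
    execute(ProposedAction("shell", "rm -rf /"), tools)
except PolicyViolation as err:
    print("refused:", err)
```

&lt;p&gt;The design choice that matters is default-deny: the policy lists what is allowed, so an action the authors never anticipated is refused rather than executed.&lt;/p&gt;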

&lt;h2&gt;
  
  
  What you get out of the box
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Five reasoning patterns&lt;/strong&gt;, selectable per agent: ReAct (think, act, observe), ReWOO (plan upfront and run steps in parallel, which Kite clocks at roughly 2x faster), Tree of Thoughts (explore multiple paths), Plan-Execute (decompose and replan on failure), and Reflective (generate, critique, improve).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production safety primitives:&lt;/strong&gt; a circuit breaker that stops cascading failures, a kill switch (per-agent or global) for when you need everything to stop now, and idempotency keyed on operation IDs so a retried action does not charge a customer twice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval that is not a toy:&lt;/strong&gt; HyDE, hybrid BM25 plus vector search, MMR deduplication, and reranking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt A/B testing&lt;/strong&gt; with statistical confidence intervals on real traffic, because "the new prompt feels better" is not a deployment criterion.&lt;/li&gt;
&lt;/ul&gt;
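
&lt;p&gt;As an illustration of the idempotency primitive in the list above, here is a minimal sketch, assuming an in-memory store and a hypothetical &lt;code&gt;IdempotentExecutor&lt;/code&gt; class. It is not Kite's implementation; it only shows why keying on an operation ID stops a retry from repeating a side effect.&lt;/p&gt;

```python
class IdempotentExecutor:
    """Runs each operation ID at most once and replays the stored result on retry."""

    def __init__(self):
        # operation_id -> result; in-memory for the sketch only.
        self._results = {}

    def run(self, operation_id, action, *args):
        if operation_id in self._results:
            return self._results[operation_id]
        # A failure raises before anything is stored, so a later retry runs again.
        result = action(*args)
        self._results[operation_id] = result
        return result

charges = []

def charge_customer(customer, cents):
    charges.append((customer, cents))
    return f"charged {customer} {cents}"

executor = IdempotentExecutor()
first = executor.run("op-123", charge_customer, "cust-1", 500)
retry = executor.run("op-123", charge_customer, "cust-1", 500)
print(first == retry, len(charges))
```

&lt;p&gt;The retry returns the original result and the customer is charged once. A production version would need a durable store and a way to handle two callers racing on the same operation ID, both of which this sketch leaves out.&lt;/p&gt;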

&lt;h2&gt;
  
  
  What it looks like
&lt;/h2&gt;

&lt;p&gt;The fastest path is the generator. Describe the agent, get a runnable file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;kite-agent
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key
kite generate &lt;span class="s2"&gt;"research assistant that searches and summarizes"&lt;/span&gt; &lt;span class="nt"&gt;--out&lt;/span&gt; agent.py
python agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or build one directly in Python and pick the reasoning pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kite&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Kite&lt;/span&gt;

&lt;span class="n"&gt;ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Kite&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;react&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our own benchmarks put time to first agent at under a minute (versus roughly 30 minutes for LangChain and 20 for AutoGen in the same tests) and cold startup at around 50 ms (versus ~2 s and ~1 s). Treat the comparison as our own figures, not an independent audit, but the design intent is clear: get to a safe, running agent fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we learned running agents in production
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The model is about 10% of the work.&lt;/strong&gt; The other 90% is tools, retries, guardrails, idempotency, and evaluation. A better model does not save you from a missing kill switch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most "agent failures" are IO failures in disguise.&lt;/strong&gt; A flaky tool, a duplicated side effect, a partial write. Observability and idempotency beat another round of prompt tuning almost every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The untrusted-component framing is freeing, not limiting.&lt;/strong&gt; Once the kernel is the thing that says yes or no, you stop being afraid to hand the agent real capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why we open-sourced it
&lt;/h2&gt;

&lt;p&gt;In a field full of black boxes, "you can read the code" is a differentiator, not a giveaway. We build production AI for regulated industries, and the way we earn a technical buyer's trust is by letting them inspect the hardest parts of our stack instead of taking a pitch on faith.&lt;/p&gt;

&lt;p&gt;Kite is MIT licensed and lives at &lt;strong&gt;&lt;a href="https://github.com/beevr-labs/Kite" rel="noopener noreferrer"&gt;https://github.com/beevr-labs/Kite&lt;/a&gt;&lt;/strong&gt;. Issues and PRs welcome. If you are building production-grade or compliance-bound AI and want a partner who ships the boring 90%, &lt;a href="https://beevr.ai/ai-development-company" rel="noopener noreferrer"&gt;here is how we work&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What are you using to build agents in production, and what keeps breaking? Curious where Kite would and would not help.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://beevr.ai/blog/kite-open-source-agent-framework" rel="noopener noreferrer"&gt;beevr.ai&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>python</category>
    </item>
    <item>
      <title>Your AI agent isn't HIPAA-compliant just because the model is good</title>
      <dc:creator>Nguyen Thien</dc:creator>
      <pubDate>Tue, 30 Jun 2026 12:02:06 +0000</pubDate>
      <link>https://dev.to/thien_nguyen/your-ai-agent-isnt-hipaa-compliant-just-because-the-model-is-good-1m03</link>
      <guid>https://dev.to/thien_nguyen/your-ai-agent-isnt-hipaa-compliant-just-because-the-model-is-good-1m03</guid>
      <description>&lt;p&gt;In 2026 everyone has shipped an AI agent. Far fewer have shipped one they could defend in an audit. Surveys keep finding the same gap: most security leaders are worried about AI-agent risk, and only a handful have actually put mature controls around it. Teams are deploying agents faster than they can govern them, and in healthcare, finance, or anywhere regulated, that's how a great demo becomes a reportable breach.&lt;/p&gt;

&lt;p&gt;Here's the category error underneath it: &lt;strong&gt;a capable model is not a compliant system.&lt;/strong&gt; You can point the best model in the world at protected health information (PHI) and still be wildly non-compliant. Compliance isn't a property of the model; it's a property of the architecture around it. (We've argued before that &lt;a href="https://beevr.ai/blog/is-chatgpt-hipaa-compliant" rel="noopener noreferrer"&gt;a good model doesn't make a tool HIPAA-compliant&lt;/a&gt;; with agents, the gap gets wider.)&lt;/p&gt;

&lt;h2&gt;
  
  
  An agent's compliance surface is bigger than a chatbot's
&lt;/h2&gt;

&lt;p&gt;A chatbot reads and replies. An &lt;em&gt;agent&lt;/em&gt; does things: it calls tools, queries databases, writes records, sends messages, and remembers across turns. Every one of those is a new place regulated data can leak or an unlogged action can happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool calls&lt;/strong&gt; reach into systems that hold PHI, and each tool is a new data path that needs a Business Associate Agreement (BAA) and least-privilege scoping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous actions&lt;/strong&gt; can change real state (book, cancel, message a patient). Anything affecting care can't be a black-box decision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory and logs&lt;/strong&gt; quietly persist PHI, often in places nobody put under a Business Associate Agreement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data egress&lt;/strong&gt; to a hosted model provider is a transfer of PHI to a third party. No BAA with that provider, no compliance. Full stop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model is maybe 10% of the risk. The other 90% is everything the agent is wired to &lt;em&gt;touch&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The governance checklist for agents on regulated data
&lt;/h2&gt;

&lt;p&gt;If an agent goes near PHI, these aren't nice-to-haves; they're the difference between "audit-ready" and "liability":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;BAA chain, including the model provider.&lt;/strong&gt; Every service that processes PHI on your behalf (cloud, database, &lt;em&gt;and the LLM API&lt;/em&gt;) needs a signed Business Associate Agreement before a single token flows. A consumer LLM endpoint with no BAA is an instant fail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimize and mask PHI before the model sees it.&lt;/strong&gt; Strip or tokenize identifiers at the boundary. The less PHI reaches the model, the smaller your breach blast radius.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop on anything affecting care.&lt;/strong&gt; Measured accuracy plus a human sign-off, not autonomous decisions on treatment, eligibility, or anything clinical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tamper-evident audit logging of every action and tool call.&lt;/strong&gt; Who, what, when, why, retained per HIPAA's six-year expectation. "What did the agent do at 2am?" must have an answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least-privilege tools.&lt;/strong&gt; Scope each tool to the minimum data and actions it needs. An agent that &lt;em&gt;can&lt;/em&gt; read every record will eventually read the wrong one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No training on PHI.&lt;/strong&gt; Confirm contractually that your data isn't used to train the provider's models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measured, reported accuracy.&lt;/strong&gt; Evaluation is part of the build, not a launch-day afterthought, and in regulated settings you have to be able to show it.&lt;/li&gt;
&lt;/ol&gt;
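
&lt;p&gt;Tamper-evident logging, item 4 above, is the checklist entry that maps most directly to code. One common way to get it is a hash chain, where each entry commits to the one before it. The sketch below is a minimal illustration under that assumption, with hypothetical field names and helper functions; it is not a compliance-certified design, and the sample entries deliberately contain no identifiers.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, record):
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log, actor, action, reason, timestamp):
    """Append a who/what/why/when record that commits to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"who": actor, "what": action, "why": reason, "when": timestamp}
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})

def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, "agent-7", "tool:read_schedule", "patient asked for next visit", "2026-06-30T02:00:00Z")
append_entry(log, "agent-7", "tool:send_reminder", "confirmed by nurse", "2026-06-30T02:00:05Z")
print(verify(log))

log[0]["record"]["what"] = "tool:read_all_records"  # simulate tampering
print(verify(log))
```

&lt;p&gt;Verification passes on the untouched log and fails once an entry is altered. A chain like this only detects tampering if the latest hash is also stored somewhere the writer of the log cannot rewrite, so treat that anchoring step as a separate requirement.&lt;/p&gt;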

&lt;h2&gt;
  
  
  "But it's just RAG / it's read-only"
&lt;/h2&gt;

&lt;p&gt;Doesn't matter. Read-only still means PHI egresses to wherever you embed and store it. RAG still puts patient data in a vector store and a prompt. The questions an auditor asks (&lt;em&gt;where did the data go, who could see it, what's logged, who signed a BAA&lt;/em&gt;) don't care whether your agent writes anything. They care where the data went.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The winners in regulated AI aren't the teams with the flashiest agent. They're the teams whose agent can pass the audit, because the governance was designed in, not bolted on after the demo got applause. If your agent touches PHI (or card data, under PCI), build the framework first and let the model be the easy part.&lt;/p&gt;

&lt;p&gt;That's how we build production AI for regulated industries: &lt;a href="https://beevr.ai/hipaa-mvp-development" rel="noopener noreferrer"&gt;compliance by design&lt;/a&gt;, with the agent governance auditors actually ask for. If that's the bar your product has to clear, &lt;a href="https://beevr.ai/ai-development-company" rel="noopener noreferrer"&gt;here's how we work&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you're running agents on regulated data, what's your hardest governance problem right now? Logging, BAAs, or keeping humans in the loop without killing the UX?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>healthcare</category>
      <category>programming</category>
    </item>
    <item>
      <title>We let AI coding agents into our codebase. The modular monolith won.</title>
      <dc:creator>Nguyen Thien</dc:creator>
      <pubDate>Tue, 30 Jun 2026 11:58:48 +0000</pubDate>
      <link>https://dev.to/thien_nguyen/we-let-ai-coding-agents-into-our-codebase-the-modular-monolith-won-50je</link>
      <guid>https://dev.to/thien_nguyen/we-let-ai-coding-agents-into-our-codebase-the-modular-monolith-won-50je</guid>
      <description>&lt;p&gt;For a decade, "microservices vs monolith" was an argument about &lt;em&gt;human&lt;/em&gt; teams: Conway's Law, independent deploys, blast radius. In 2026 a new participant walked into the codebase and quietly changed the math: the AI coding agent.&lt;/p&gt;

&lt;p&gt;We build software for a living. Our work is production systems for regulated industries, and we've spent the last year with agents (Claude Code, Cursor, the usual suspects) reading and writing real code alongside us. The pattern is consistent enough to say out loud: &lt;strong&gt;agents reason far better over a well-structured modular monolith than over a fleet of microservices.&lt;/strong&gt; And we're not alone in moving that direction. The CNCF has reported a wave of teams &lt;em&gt;consolidating&lt;/em&gt; services rather than splitting further.&lt;/p&gt;

&lt;p&gt;Here's why, and where it still breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Microservices were optimized for a constraint AI doesn't have
&lt;/h2&gt;

&lt;p&gt;The original case for microservices was largely organizational: let many teams ship independently without stepping on each other. That's a real benefit for humans, at scale.&lt;/p&gt;

&lt;p&gt;But an AI agent's bottleneck isn't team coordination. It's &lt;strong&gt;context&lt;/strong&gt;. An agent is only as good as what it can see and hold at once. And microservices are, by design, an architecture of &lt;em&gt;hidden&lt;/em&gt; context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The logic for one user action is smeared across N repositories. To change a behavior, the agent has to discover, clone, and correlate code it can't see from where it started.&lt;/li&gt;
&lt;li&gt;A function call became a &lt;strong&gt;network call&lt;/strong&gt;. The agent can read a function and reason about it; it cannot "read" a flaky gRPC hop, a retry storm, or a partial failure between services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eventual consistency&lt;/strong&gt; replaced transactions. The agent can't reason cleanly about state that's correct "soon."&lt;/li&gt;
&lt;li&gt;There's no single stack trace. When something breaks, the truth is spread across logs in five services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the same costs we wrote about in &lt;a href="https://beevr.ai/blog/my-trip-to-microservices-hell" rel="noopener noreferrer"&gt;our trip to "microservices hell"&lt;/a&gt;: the network tax, the observability tax, the eventual-consistency headache. For a human team they're an operational drag. For an AI agent they're a &lt;strong&gt;reasoning wall&lt;/strong&gt;, because the information it needs to be correct is precisely the information the architecture hides.&lt;/p&gt;

&lt;h2&gt;
  
  
  A modular monolith is an agent's best-case environment
&lt;/h2&gt;

&lt;p&gt;Flip every one of those and you get the modular monolith, with strong module boundaries inside a single deployable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One repository = one context.&lt;/strong&gt; The agent can load the whole picture: the call site, the function, the data model, the test, in one place. Repository-level understanding, the thing every 2026 agent is racing to do better, is trivial when there's one repository.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-process calls.&lt;/strong&gt; A call is a call: typed, traceable, refactorable. The agent can follow it and change both sides atomically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real transactions.&lt;/strong&gt; State is consistent &lt;em&gt;now&lt;/em&gt;, so the agent's mental model matches reality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One stack trace.&lt;/strong&gt; When a test fails, the agent sees the whole failure and can iterate, which is exactly how agentic coding loops work.&lt;/li&gt;
&lt;/ul&gt;
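
&lt;p&gt;The in-process calls and real transactions above are easy to show in miniature. The following is a sketch under stated assumptions: two hypothetical modules (orders and stock) written as plain functions that share one SQLite connection, with table and function names invented for the example. It is not taken from any particular codebase.&lt;/p&gt;

```python
import sqlite3

# Two "modules" in one process. Each exposes a small function API
# and both share a single database connection.
def reserve_stock(conn, sku, quantity):
    row = conn.execute("SELECT quantity FROM stock WHERE sku = ?", (sku,)).fetchone()
    if row is None or quantity > row[0]:
        raise ValueError(f"insufficient stock for {sku}")
    conn.execute("UPDATE stock SET quantity = quantity - ? WHERE sku = ?", (quantity, sku))

def place_order(conn, sku, quantity):
    # "with conn" commits on success and rolls back if anything raises,
    # so the order row and the stock update succeed or fail together.
    with conn:
        conn.execute("INSERT INTO orders (sku, quantity) VALUES (?, ?)", (sku, quantity))
        reserve_stock(conn, sku, quantity)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("CREATE TABLE orders (sku TEXT, quantity INTEGER)")
conn.execute("INSERT INTO stock VALUES ('widget', 5)")
conn.commit()

place_order(conn, "widget", 3)
try:
    place_order(conn, "widget", 3)  # only 2 left, so this order rolls back
except ValueError as err:
    print("rolled back:", err)

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
print(conn.execute("SELECT quantity FROM stock").fetchone()[0])
```

&lt;p&gt;The failed second order leaves no orphaned row behind, and the whole failure surfaces as one exception with one stack trace. Across two services, the same guarantee would need a saga or compensating action, which is exactly the kind of state an agent cannot read from the code in front of it.&lt;/p&gt;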

&lt;p&gt;The modular monolith keeps the &lt;em&gt;design&lt;/em&gt; benefit of microservices (clean separation, clear boundaries) while removing the &lt;em&gt;operational&lt;/em&gt; fog that both humans and agents trip on. You get roughly 90% of the architectural benefit at about 10% of the cost, and now there's a second reason it matters: it's the difference between an agent that can safely refactor your system and one that flails across repos it can't hold in its head.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But our agents will just handle the complexity"
&lt;/h2&gt;

&lt;p&gt;The hope is that agents get good enough to manage distributed systems for us. Maybe, eventually. But today, handing an agent a microservices estate mostly multiplies the surface area where it can be confidently wrong, and distributed-systems bugs are the most expensive kind to be wrong about. Giving the agent a smaller, coherent world isn't a limitation; it's how you get &lt;em&gt;trustworthy&lt;/em&gt; output. The teams getting the most out of agents in 2026 aren't the ones with the most services. They're the ones whose codebase an agent can actually understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  When you should still split (the rule hasn't changed)
&lt;/h2&gt;

&lt;p&gt;This isn't anti-microservices zealotry. Split a module into its own service when you have a &lt;strong&gt;clear, painful, obvious&lt;/strong&gt; reason: a component with a wildly different scaling profile, a hard security or compliance isolation boundary, or a team that genuinely must deploy on its own cadence. That's a refactoring step you &lt;em&gt;earn&lt;/em&gt;, not a starting point, and not a default you adopt because a conference talk said so. Start with the modular monolith; extract a service the day a specific module forces your hand, and not before.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Microservices solved a human-team problem and charged an operational tax to do it. AI agents don't have the problem, yet they pay the tax twice, because the architecture hides exactly the context they need to reason. If you're rebuilding your team around AI agents in 2026, the highest-leverage architectural decision you can make is to give them a codebase they can hold in one head: a modular monolith.&lt;/p&gt;

&lt;p&gt;We build this way on purpose, on production systems where an agent (and a new senior engineer, and an auditor) can understand the whole thing. If that's the kind of software you need built, &lt;a href="https://beevr.ai/ai-development-company" rel="noopener noreferrer"&gt;here's how we do it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What's your experience letting agents loose on microservices vs a monolith? I'd genuinely like to hear where this breaks.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
