DEV Community: Nguyen Thien

Software Outsourcing in 2026: Models, Costs & The Four-Question Test

Nguyen Thien — Tue, 07 Jul 2026 11:37:13 +0000

Software outsourcing is hiring an external team to design, build or maintain your software instead of employing every engineer yourself. In 2026 it spans everything from a $15K MVP built by a studio to thousand-person offshore programs. Done right, it cuts costs 40–70% and ships faster than hiring; done wrong, it produces code you rebuild within a year. This guide covers the models, the real costs, and the four questions that predict which outcome you get.

What is software outsourcing?

Software outsourcing means contracting software work to an outside company — onshore (your country), nearshore (a close timezone) or offshore (typically Asia or Eastern Europe). It's not one thing: outsourcing a scoped product build to a senior studio and renting five junior developers by the hour are both "outsourcing," with opposite risk profiles. The label matters less than the engagement model underneath it, which is where most advice — and most failure — hides.

The four outsourcing models (and who each is for)

Model	How it works	Best for	Watch out for
Project outsourcing (fixed price)	Vendor owns delivery of a defined scope for an agreed price	Founders and teams that need a known number — MVPs, defined products	Vague scope = change-order hell; fix by phasing the scope
Staff augmentation	You rent engineers into your own team, hourly/monthly	Teams with strong in-house leadership needing extra hands	You carry all delivery risk; quality depends on your management
Dedicated team	A stable external team works only on your product long-term	Funded startups scaling beyond one product surface	Costs run monthly regardless of output; needs real product direction
BOT (build-operate-transfer)	Vendor builds a team you later absorb as your own office	Enterprises committing to a country for years	Heavy setup; only pays off at scale

For a founder or a team shipping a specific product, fixed-price project outsourcing wins because it converts estimation risk into the vendor's problem — the full argument is in fixed price vs time & materials.

How much does software outsourcing cost?

Senior-engineer rates in 2026: $100–$200/hr in the US, $45–$75 in Eastern Europe, $35–$70 in Latin America, and $20–$45 in Vietnam — with India spanning $15–$40 and the widest variance. In project terms, the same mid-complexity product that costs $120K–$300K at US rates lands at $30K–$80K with a senior Vietnam team. Two rules keep the math honest: rate is not cost (a junior bench that rebuilds everything is the most expensive option at any rate), and hourly is the wrong unit — insist on a fixed price per phase. Full breakdowns: rates by country and what an MVP really costs.
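The "rate is not cost" rule is easy to check with a few lines of arithmetic. This is a rough sketch: the hourly rates sit inside the ranges quoted above, but the hours and rework fractions are invented for illustration, and delivered_cost is a hypothetical helper.

```python
def delivered_cost(hourly_rate: float, build_hours: float, rework_fraction: float) -> float:
    """Total cost once rework is included.

    rework_fraction is the share of the build that has to be redone
    (0.0 = nothing rebuilt, 1.0 = everything rebuilt once).
    """
    return hourly_rate * build_hours * (1.0 + rework_fraction)


# Assumed inputs: hours and rework fractions are illustrative, not measured.
senior = delivered_cost(hourly_rate=40, build_hours=1_000, rework_fraction=0.1)
junior = delivered_cost(hourly_rate=20, build_hours=1_800, rework_fraction=1.0)

print(f"senior team: ${senior:,.0f}")
print(f"junior team: ${junior:,.0f}")
```

Under these assumptions the team at half the hourly rate ends up costing more, which is the point of the rule.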

Outsourcing vs hiring in-house: which is right?

Pre-revenue or pre-product-market-fit, outsourcing to a senior studio is usually faster and cheaper: a US engineer runs $180K+ fully loaded per year, takes months to hire, and your first hires carry existential risk if they're wrong. Once you've raised and the product is proven, building an in-house core team makes sense — often keeping an external team for parallel workstreams. The detailed comparison is in agency vs in-house, and if you need senior technical direction without a full-time hire, see fractional CTO vs development agency.

What are the biggest software outsourcing risks?

Four, and they're all contractual before they're technical:

Junior teams behind senior rates. Ask who writes your code, by name and seniority.
Open-ended billing. Insist on a fixed price per phase with a working demo every week.
IP and code hostage-taking. Require full IP assignment, source code and repository access from day one; the test is in "Do you own the code?"
Compliance blindness. If your product touches payment or health data, a vendor without concrete HIPAA/PCI answers is a liability, not a bargain.

A vendor that passes all four is safe in any country; a vendor that fails them is dangerous in every country.

Which country is best for software outsourcing?

There's no universal winner — India for scale, Latin America for US-timezone overlap, Eastern Europe for EU proximity, and Vietnam for the strongest senior-talent-per-dollar in 2026 (senior engineers at 50–70% below US rates with genuine engineering depth). We compare all destinations honestly in best countries for offshore software development — and make the case that the engagement model predicts your outcome more than the flag does.

How do you choose a software outsourcing company?

Filter with evidence, not portfolios: ask for a named senior team, a fixed price per phase, full IP from day one, weekly working demos, and a concrete compliance answer if you're regulated. Then test communication with a small paid discovery phase before committing to a build — how a vendor handles a one-week engagement predicts the six-month one. The full checklist is in how to choose a development partner.

We're a senior, founder-led software outsourcing studio in Vietnam — fixed price per phase from $10K/month, full IP ownership from day one, weekly working demos, and compliance designed in for fintech and healthcare. See how we work, our custom software development services, or tell us what you're building for a fixed quote in one 30-minute call.

How Much Does an AI Agent Cost to Build in 2026? [Build + Run]

Nguyen Thien — Tue, 07 Jul 2026 11:33:13 +0000

In 2026, building a custom AI agent costs $15,000–$75,000 for a single-purpose agent with real integrations, and $70,000–$500,000 for multi-agent or compliance-bound systems. A thin wrapper around an LLM API can be done for $5,000–$25,000, but the gap between a wrapper and an agent you can trust in production is exactly where the money goes. This guide breaks down the numbers, including the part most estimates hide: what it costs to run.

How much does it cost to build an AI agent in 2026?

It depends on how much you let the agent do — autonomy is the cost driver, not intelligence:

Agent type	Typical build cost	What's inside
Wrapper / assistant (chat over your docs)	$5K–$25K	LLM API + RAG over a knowledge base, no real actions
Workflow agent (one job, few tools)	$15K–$75K	Scoped tools, structured outputs, evals, human review queue
Autonomous multi-tool agent	$50K–$150K	Planning, tool orchestration, guardrails, observability, cost control
Regulated-industry agent (HIPAA/PCI)	$70K–$180K+	All of the above + BAA chain, audit trails, approval gates
Enterprise multi-agent platform	$150K–$500K+	Multiple coordinated agents, SSO, tenancy, compliance

Industry surveys put agentic-AI project overruns at 35–50% above initial estimates — significantly worse than traditional software. The overruns come from the same place every time: teams budget for the demo and discover the production work later. We wrote about that gap in what production-ready AI actually means.

Why did AI agent costs become front-page news in 2026?

Because the running costs stopped being a rounding error. Gartner now predicts AI coding costs will grow to match developer salaries; Uber famously exhausted its annual AI-tools budget in about four months. Token economics compound with agents: a single autonomous task can chain dozens of model calls, so an agent that costs $0.10 per simple query can cost $5–15 per complex task. Budgeting an agent means budgeting build + run, and most quotes you'll get only cover the first half.
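To see how the per-task number compounds, here is a small sketch that sums the cost of every model call in a task. The prices and token counts are assumptions picked for illustration, and ModelCall and task_cost are hypothetical names, not part of any provider SDK.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int


def task_cost(calls: list[ModelCall], usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Sum the cost of every model call made while completing one task."""
    return sum(
        c.input_tokens / 1e6 * usd_per_m_input + c.output_tokens / 1e6 * usd_per_m_output
        for c in calls
    )


# Assumed prices in USD per million tokens; check your provider's current price list.
PRICE_IN, PRICE_OUT = 3.0, 15.0

# One call, one answer.
simple_query = [ModelCall(input_tokens=20_000, output_tokens=1_000)]

# An agent that re-sends a growing context on each of 40 planning/tool steps.
agent_task = [
    ModelCall(input_tokens=20_000 + step * 2_000, output_tokens=1_000)
    for step in range(40)
]

print(f"simple query: ${task_cost(simple_query, PRICE_IN, PRICE_OUT):.3f}")
print(f"agent task:   ${task_cost(agent_task, PRICE_IN, PRICE_OUT):.2f}")
```

The agent is not paying a higher price per token. It is paying the same price many times over, on a context that grows with every step.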

How much does it cost to run an AI agent?

Plan for three recurring lines:

Inference. From tens of dollars a month for a low-volume internal agent to thousands for customer-facing volume. Frontier models run roughly $1–15 per million input tokens depending on tier, and agent workflows burn tokens on every planning step, not just the final answer.
Monitoring and evals. Logging, tracing and regression testing when models or prompts change, typically 10–20% of the build cost per year.
Model churn. Providers deprecate and reprice models; pin versions and budget a small re-validation effort per upgrade.

A well-architected agent controls inference cost by routing easy steps to cheap models and reserving frontier models for the hard ones. That routing logic is part of what you're paying for in the build.
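The routing idea can be sketched in a few lines. Everything here is an assumption for illustration: the model names, the prices, the step kinds and the threshold are made up, and a real router would usually look at more than the step type and input size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str          # e.g. "classify", "extract", "plan"
    input_tokens: int


# Hypothetical model tiers with assumed input prices (USD per million tokens).
MODELS = {"small": 0.20, "frontier": 10.00}
HARD_KINDS = {"plan", "multi_step_reasoning"}


def pick_model(step: Step, long_context_threshold: int = 50_000) -> str:
    """Send planning-style or very long steps to the frontier tier, the rest to the small one."""
    if step.kind in HARD_KINDS or step.input_tokens > long_context_threshold:
        return "frontier"
    return "small"


def input_cost(steps: list[Step], route=pick_model) -> float:
    """Input-token cost of a run under a given routing function."""
    return sum(s.input_tokens / 1e6 * MODELS[route(s)] for s in steps)


steps = [Step("classify", 2_000), Step("extract", 8_000), Step("plan", 12_000), Step("extract", 6_000)]
routed = input_cost(steps)
all_frontier = input_cost(steps, route=lambda s: "frontier")
print(f"routed: ${routed:.4f}  all-frontier: ${all_frontier:.4f}")
```

The routing rule itself is cheap to write. What takes build time is proving with evals that the cheap tier is good enough for the steps you send to it.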

What actually drives the build cost?

Four things, in order of impact:

Actions, not answers. The moment an agent writes to a system (files a ticket, updates a record, sends an email) you need approval gates, rollback paths and an audit trail; that's the difference between $20K and $80K.
Evaluation. An agent without an eval suite is a demo; building the test harness that proves it behaves is often a third of the budget.
Integrations. Each system the agent touches (CRM, EHR, ERP) adds scoped tools, permissions and error handling.
Compliance. In healthcare or fintech, the agent inherits the full regulatory surface. We detailed the architecture in HIPAA-compliant AI agents.

Should you build on an agent framework or from scratch?

Use a framework for orchestration plumbing and spend your budget on what's unique: your tools, your evals, your guardrails. Open-source frameworks (including Kite, our open-source agent framework — built on the principle that the LLM is an untrusted component) cut weeks off the build without locking you in. Buying an off-the-shelf agent platform is right when your use case is generic (support triage, meeting notes); it's wrong when the agent is your product, because you inherit their limits and their pricing.

Is an AI agent worth it for a startup MVP?

Yes if the agent is the product; carefully if it's a feature. An agent-as-product MVP typically lands at $50K–$120K — comparable to any AI SaaS MVP — and investors will probe the eval suite and unit economics harder than the demo. If the agent is a feature inside a bigger product, start with the workflow-agent tier: one job, few tools, human review, and expand autonomy only after the eval data says you can. Full AI budgeting context in our AI development cost guide.

We build production AI agents for startups and regulated industries — senior team, fixed price per phase, full IP ownership, and honest answers about what should stay human-approved. See our AI development services, the fixed-price cost breakdown, or tell us what you want the agent to do — we'll scope it and tell you what it really costs, build and run.

Best Countries for Offshore Software Development (2026)

Nguyen Thien — Tue, 30 Jun 2026 21:09:04 +0000

"What is the best country for offshore software development?" is a slightly wrong question with a useful answer hiding inside it. There is no single best country — there is a best fit for what you are optimizing: cost, timezone overlap, working English, engineering seniority, or domain depth. This guide compares the destinations founders actually shortlist in 2026, honestly, and then makes the case that the country matters less than how the team is run.

The honest shortlist for 2026

A handful of regions dominate serious offshore shortlists. None is "best" in the abstract; each wins on something and loses on something else.

India. The deepest talent pool and the lowest entry prices, with everything from solo freelancers to 100,000-person firms. The trade-off is enormous variance: outcomes depend entirely on which slice you hire, and the cheap end is where most outsourcing horror stories come from. Best for: large staff augmentation and teams that can vet hard.
Vietnam. Senior engineers at roughly 50–70% below US rates, a fast-growing and genuinely strong engineering culture, and widespread working English. The trade-off is timezone (ahead of US hours) and a smaller pool than India. Best for: founders and lean teams who want senior delivery on a fixed budget.
Philippines. Strong English and good cultural fit with US clients, historically deepest in support and BPO, with a growing dev scene. The trade-off is that senior product-engineering depth is thinner than Vietnam or India. Best for: English-heavy products and teams that value communication.
Eastern Europe (Poland, Ukraine, Romania). Excellent senior engineering and close to EU timezones. The trade-off is price — rates are well above Asia, closer to nearshore than offshore. Best for: EU companies prioritizing seniority over cost.
Latin America. The nearshore pick for US companies: same-timezone overlap and improving talent. The trade-off is price (above Asia) and a market still maturing. Best for: US teams that need real-time overlap above all else.

Why Vietnam keeps topping the lists

The reasons are real, and worth stating without the marketing gloss. Senior engineers cost a fraction of US rates while the seniority is genuine, not a junior bench with a senior title. The engineering culture is deep and still compounding, English is good enough for async product work, and the country is politically stable with a strong tech-education pipeline. The honest caveat is timezone: Vietnam runs ahead of US hours, so the teams worth hiring offset it with disciplined async communication and a working demo every week, not a once-a-quarter reveal. Cost is the headline; senior-talent-per-dollar is the real story.

The mistake: optimizing for country instead of model

Here is the part most "best country" lists skip. The country sets a price band and a timezone. It does not decide whether your project succeeds. Four things do, and they cut across every country:

Senior vs junior. A low day-rate usually means a junior bench learning on your code. You pay twice — once to build, once to rebuild. Ask who writes your code, by name and seniority.
Fixed price vs open-ended hourly. Time-and-materials with no cap pushes all the estimation risk onto you. For a defensible budget, insist on a fixed price per phase.
Ownership. Some vendors retain IP, license the product back, or gate the source behind a maintenance contract. Require full IP, source code and the repository from day one, in writing.
Compliance. If your product touches regulated data, a vendor with no concrete answer on HIPAA, PCI or SOC 2 is a liability. Compliance is architecture, not a checkbox.

A senior, fixed-price, full-ownership team in Vietnam will beat a cheap junior shop anywhere — and a careless engagement in any country will fail regardless of the flag on the map.

Which country is best for offshore software development?

There is no universal winner. If you need a large team for a multi-year program, India's scale is hard to beat. If you need real-time US overlap above all, Latin America. If you want senior delivery on a fixed budget with full ownership, Vietnam is the strongest value in 2026. But filter any country through the four questions above first — they predict the outcome more than the location does.

Is Vietnam good for offshore software development?

Yes, for the right buyer. Vietnam offers senior engineers at roughly 50–70% below US rates, a deep and growing talent pool, and widespread working English. The main trade-off is timezone, which good teams offset with async discipline and a weekly working demo. The risk to avoid is the cheap-and-junior model, not the location.

How much does offshore software development cost?

Senior engineering in Asia typically runs around $20–45 per hour, well below US and EU rates; Eastern Europe and Latin America sit higher. But hourly rate is the wrong unit — what protects your runway is a fixed price per phase.

This article was originally published on the BeevR blog. We are a senior, founder-led software studio in Vietnam: senior engineers only, fixed price per phase, and you own 100% of the code.

We built Nebula: GraphRAG that runs in your browser tab, not someone else's cloud

Nguyen Thien — Tue, 30 Jun 2026 13:19:40 +0000

Most AI note apps ship your notes to a cloud vector database and a hosted model, then ask you to trust the privacy policy. For the work we do (regulated industries, sensitive data) that is a non-starter. So we built the opposite and open-sourced it: Nebula, a private, local-first AI knowledge base that runs entirely inside a browser tab. No backend, no account, no server. Its tagline says it plainly: notes that think, nothing leaves your device.

Repo: https://github.com/beevr-labs/Nebula (Apache-2.0). Live demo, no signup: https://beevr-labs.github.io/Nebula/. Here is why we went fully on-device, and what it cost.

Privacy by architecture, not by promise

The usual privacy pitch is a policy: "we won't look at your data." Nebula's is structural: there is nowhere for your data to go. Everything runs in the browser. Notes, embeddings, and the search index live in local browser storage. There is no sync service, no account system, and therefore no server to breach or to put under a data-processing agreement. For sensitive notes (client records, health information, anything you would not paste into a cloud chatbot) that is the whole point.

What runs where

It is a SvelteKit single-page app that does real ML in the browser:

On-device chat via WebLLM, GPU-accelerated with WebGPU. You pick the model, from tiny-and-fast to large-and-accurate, and Nebula shows the download size before you commit. Qwen and Llama models are supported.
Semantic search powered by bge-m3 (MIT-licensed), about 570 MB on first use, then cached and fully offline. It is multilingual, including Vietnamese, so it works across mixed-language notes.
WebAssembly handles the compute-heavy parts.
After the first model download, the whole thing works offline.

Why a graph, not just vectors

Flat vector search finds notes that are similar. It does not understand that "the client from the Tuesday call" and "Acme Corp" are the same entity across ten different notes. Nebula builds an entity knowledge graph automatically (people, projects, clients) and uses GraphRAG to answer questions by walking those relationships, then links every answer back to the source notes. You ask in plain language and get an answer you can trace, instead of a keyword hunt across disconnected files.
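A toy version of that relationship walk helps show the difference. It is written in Python for brevity and is not Nebula's code: the notes, the entity sets and related_notes are all hypothetical, and the entity-extraction step is assumed to have already happened.

```python
from collections import defaultdict

# Hypothetical notes with entities already extracted and resolved to one name.
notes = {
    "n1": {"text": "Tuesday call: client wants the pilot by March.", "entities": {"Acme Corp", "Pilot"}},
    "n2": {"text": "Acme Corp signed the NDA.", "entities": {"Acme Corp"}},
    "n3": {"text": "Pilot scope: intake form and reporting.", "entities": {"Pilot"}},
    "n4": {"text": "Grocery list.", "entities": set()},
}

# Index: entity -> notes that mention it.
entity_to_notes = defaultdict(set)
for note_id, note in notes.items():
    for entity in note["entities"]:
        entity_to_notes[entity].add(note_id)


def related_notes(seed_entity: str, hops: int = 2) -> set[str]:
    """Collect notes reachable from an entity by walking entity -> note -> entity links."""
    seen_entities, frontier, found = {seed_entity}, {seed_entity}, set()
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for note_id in entity_to_notes.get(entity, ()):
                found.add(note_id)
                # Entities co-mentioned in this note become the next hop.
                next_frontier |= notes[note_id]["entities"] - seen_entities
        seen_entities |= next_frontier
        frontier = next_frontier
    return found


print(sorted(related_notes("Acme Corp")))
```

Note n3 never mentions Acme Corp and shares few words with n2, so similarity search alone could miss it. The walk reaches it through the shared "Pilot" entity in n1, and the path taken doubles as the citation trail back to source notes.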

It is also just a good notes app

The AI is useless if the notes app underneath is not real, so it is: Markdown, wikilinks and backlinks, tabs, a quick switcher, daily notes, templates, tags, and folders. You can bring your own files (PDF, CSV, text) and export the whole vault as plain .md files whenever you want. No lock-in: your notes go in and out as portable Markdown. The codebase ships with 430+ automated tests, because local-first does not mean fragile.

The hard parts (what we learned)

On-device models are smaller, so structure has to carry more weight. The knowledge graph recovers context that a small local model alone would miss, which is a big part of why we went graph-first instead of leaning on raw model size.
Explainable retrieval matters as much as accuracy. Showing the path through the graph back to source notes is what makes the answer trustworthy, and for regulated buyers that traceability is not a nice-to-have.
The browser is a surprisingly capable runtime in 2026. WebGPU plus WebAssembly means "install nothing, runs offline, GPU-accelerated" is actually achievable, not a science project.

Why open source

The same reason we open-source the rest of our hardest work: in AI, "verifiable" beats "trust me." A buyer evaluating us for sensitive data can read exactly how retrieval works, and confirm for themselves that nothing leaves the device, instead of taking our word for it.

Nebula is Apache-2.0 at https://github.com/beevr-labs/Nebula, with a live demo at https://beevr-labs.github.io/Nebula/. If you need AI built on sensitive or regulated data, on-device or otherwise, made to survive an audit rather than just a demo, here is how we work.

Anyone else running RAG fully in the browser? What model and hardware combo is actually working for you?

Originally published on beevr.ai.

We open-sourced Kite, our agent framework. Here is what building production agents taught us.

Nguyen Thien — Tue, 30 Jun 2026 13:17:37 +0000

Everyone has an agent demo in 2026. Far fewer have agents they would put in front of a paying customer, an auditor, or a patient. The gap between "it worked in the notebook" and "it works every time, safely, and we can explain what it did" is where most agent projects quietly die, and it is the gap we built Kite to close.

We just open-sourced it: https://github.com/beevr-labs/Kite. It is written in Python, MIT licensed, and one pip install kite-agent away. This is the honest writeup of why it exists and what we learned.

The problem Kite solves

We build production software for regulated industries, so we kept hitting the same wall: the popular agent frameworks are great for a prototype and painful for production. Getting to a first working agent in LangChain or AutoGen is a configuration project, and once you are there you still have to bolt on the parts that actually matter in production: guardrails, retries, idempotency, observability, evaluation. We were rebuilding that same scaffolding for every client. Kite is the framework we wish we had started with: opinionated about safety, fast to a running agent, and small enough to read.

The one design decision everything hangs on: treat the LLM as untrusted

This is the core idea. In Kite, the model proposes actions; it does not execute them. A controlled kernel sits between the agent and the real world and validates every proposed action against policy before anything runs. So when an agent proposes running rm -rf /, the kernel refuses it instead of your filesystem finding out the hard way.

It sounds simple. It changes everything about how comfortable you are giving an agent real tools. The model becomes a planner you can sandbox, not a process with your credentials. For anyone running agents on sensitive data or real infrastructure, that boundary is the difference between a demo and something you can actually deploy.
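The propose-then-validate boundary can be sketched in plain Python. This is not Kite's API: ProposedAction, POLICY, TOOLS and execute are hypothetical names used to show the shape of the idea, with a deliberately tiny allowlist.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProposedAction:
    """What the model asks for. Nothing runs until the gate approves it."""
    tool: str
    args: dict = field(default_factory=dict)


class PolicyViolation(Exception):
    pass


def _safe_path(args: dict) -> bool:
    path = str(args.get("path", ""))
    return path.startswith("/workspace/") and ".." not in path


# Allowlist policy: tool name -> predicate over its arguments.
# Anything not listed here is refused by default.
POLICY = {
    "search": lambda args: isinstance(args.get("query"), str),
    "read_file": _safe_path,
}

# Stand-in tool implementations for the sketch.
TOOLS = {
    "search": lambda args: f"results for {args['query']!r}",
    "read_file": lambda args: f"(contents of {args['path']})",
}


def execute(action: ProposedAction) -> str:
    """Run a model-proposed action only if the policy explicitly allows it."""
    check = POLICY.get(action.tool)
    if check is None or not check(action.args):
        raise PolicyViolation(f"refused: {action.tool} {action.args}")
    return TOOLS[action.tool](action.args)


print(execute(ProposedAction("search", {"query": "quarterly report"})))
try:
    execute(ProposedAction("shell", {"cmd": "rm -rf /"}))
except PolicyViolation as err:
    print(err)
```

The design choice that matters is default-deny: the model can propose anything, but only tools and arguments the policy names ever reach the real world.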

What you get out of the box

Five reasoning patterns, selectable per agent: ReAct (think, act, observe), ReWOO (plan upfront and run steps in parallel, which Kite clocks at roughly 2x faster), Tree of Thoughts (explore multiple paths), Plan-Execute (decompose and replan on failure), and Reflective (generate, critique, improve).
Production safety primitives: a circuit breaker that stops cascading failures, a kill switch (per-agent or global) for when you need everything to stop now, and idempotency keyed on operation IDs so a retried action does not charge a customer twice.
Retrieval that is not a toy: HyDE, hybrid BM25 plus vector search, MMR deduplication, and reranking.
Prompt A/B testing with statistical confidence intervals on real traffic, because "the new prompt feels better" is not a deployment criterion.
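Of the primitives above, idempotency is the easiest to show in a few lines. The sketch below is an assumption about the general technique, not Kite's implementation: IdempotentExecutor and charge_customer are hypothetical, and results are kept in memory where a production system would persist them.

```python
import threading


class IdempotentExecutor:
    """Runs each operation ID at most once and replays the stored result on retries."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run(self, operation_id: str, fn, *args, **kwargs):
        # Holding the lock while fn runs keeps the sketch simple; it also
        # serializes all operations, which a real system would avoid.
        with self._lock:
            if operation_id in self._results:
                return self._results[operation_id]
            result = fn(*args, **kwargs)
            self._results[operation_id] = result
            return result


charges = []


def charge_customer(customer_id: str, cents: int) -> str:
    """Stand-in for a side effect that must not happen twice."""
    charges.append((customer_id, cents))
    return f"charge-{len(charges)}"


executor = IdempotentExecutor()
first = executor.run("op-123", charge_customer, "cust-1", 4_900)
retry = executor.run("op-123", charge_customer, "cust-1", 4_900)  # same ID: no second charge
print(first, retry, len(charges))
```

The operation ID has to come from the caller and stay stable across retries. If the agent mints a fresh ID on every attempt, the protection disappears.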

What it looks like

The fastest path is the generator. Describe the agent, get a runnable file:

pip install kite-agent
export GROQ_API_KEY=your_key
kite generate "research assistant that searches and summarizes" --out agent.py
python agent.py

Or build one directly in Python and pick the reasoning pattern:

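import asyncio
from kite import Kite

async def main():
    ai = Kite()
    agent = ai.create_agent(name="Bot", agent_type="react")
    result = await agent.run("user request")  # await is only valid inside a coroutine
    print(result)

asyncio.run(main())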

Our own benchmarks put time to first agent at under a minute (versus roughly 30 minutes for LangChain and 20 for AutoGen in the same tests) and cold startup around 50ms (versus ~2s and ~1s). Take the comparison as our own figures, not an independent audit, but the design intent is clear: get to a safe, running agent fast.

What we learned running agents in production

The model is about 10% of the work. The other 90% is tools, retries, guardrails, idempotency, and evaluation. A better model does not save you from a missing kill switch.
Most "agent failures" are IO failures in disguise. A flaky tool, a duplicated side effect, a partial write. Observability and idempotency beat another round of prompt tuning almost every time.
The untrusted-component framing is freeing, not limiting. Once the kernel is the thing that says yes or no, you stop being afraid to hand the agent real capabilities.

Why we open-sourced it

In a field full of black boxes, "you can read the code" is a differentiator, not a giveaway. We build production AI for regulated industries, and the way we earn a technical buyer's trust is by letting them inspect the hardest parts of our stack instead of taking a pitch on faith.

Kite is MIT licensed and lives at https://github.com/beevr-labs/Kite. Issues and PRs welcome. If you are building production-grade or compliance-bound AI and want a partner who ships the boring 90%, here is how we work.

What are you using to build agents in production, and what keeps breaking? Curious where Kite would and would not help.

Originally published on beevr.ai.

Your AI agent isn't HIPAA-compliant just because the model is good

Nguyen Thien — Tue, 30 Jun 2026 12:02:06 +0000

In 2026 everyone has shipped an AI agent. Far fewer have shipped one they could defend in an audit. Surveys keep finding the same gap: most security leaders are worried about AI-agent risk, and only a handful have actually put mature controls around it. Teams are deploying agents faster than they can govern them, and in healthcare, finance, or anywhere regulated, that's how a great demo becomes a reportable breach.

Here's the category error underneath it: a capable model is not a compliant system. You can point the best model in the world at protected health information (PHI) and still be wildly non-compliant. Compliance isn't a property of the model; it's a property of the architecture around it. (We've argued before that a good model doesn't make a tool HIPAA-compliant; with agents, the gap gets wider.)

An agent's compliance surface is bigger than a chatbot's

A chatbot reads and replies. An agent does things: it calls tools, queries databases, writes records, sends messages, and remembers across turns. Every one of those is a new place regulated data can leak or an unlogged action can happen:

Tool calls reach into systems that hold PHI, and each tool is a new data path that needs a BAA and least-privilege scoping.
Autonomous actions can change real state (book, cancel, message a patient). Anything affecting care can't be a black-box decision.
Memory and logs quietly persist PHI, often in places nobody put under a Business Associate Agreement.
Data egress to a hosted model provider is a transfer of PHI to a third party. No BAA with that provider, no compliance. Full stop.

The model is maybe 10% of the risk. The other 90% is everything the agent is wired to touch.

The governance checklist for agents on regulated data

If an agent goes near PHI, these aren't nice-to-haves; they're the difference between "audit-ready" and "liability":

BAA chain, including the model provider. Every service that processes PHI on your behalf (cloud, database, and the LLM API) needs a signed Business Associate Agreement before a single token flows. A consumer LLM endpoint with no BAA is an instant fail.
Minimize and mask PHI before the model sees it. Strip or tokenize identifiers at the boundary. The less PHI reaches the model, the smaller your breach blast radius.
Human-in-the-loop on anything affecting care. Measured accuracy plus a human sign-off, not autonomous decisions on treatment, eligibility, or anything clinical.
Tamper-evident audit logging of every action and tool call. Who, what, when, why, retained per HIPAA's six-year expectation. "What did the agent do at 2am?" must have an answer.
Least-privilege tools. Scope each tool to the minimum data and actions it needs. An agent that can read every record will eventually read the wrong one.
No training on PHI. Confirm contractually that your data isn't used to train the provider's models.
Measured, reported accuracy. Evaluation is part of the build, not a launch-day afterthought, and in regulated settings you have to be able to show it.
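"Tamper-evident" in the logging item usually means something like a hash chain: each entry's hash covers the previous entry's hash, so edits after the fact are detectable. Here is a minimal sketch under that assumption. The field names and the append/verify helpers are hypothetical, and the example entries are invented.

```python
import hashlib
import json

GENESIS = "0" * 64
FIELDS = ("actor", "action", "reason", "timestamp")


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], actor: str, action: str, reason: str, timestamp: str) -> None:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    entry = {"actor": actor, "action": action, "reason": reason, "timestamp": timestamp}
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({**entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; an edited, reordered, or mid-chain deleted entry breaks it."""
    prev_hash = GENESIS
    for record in log:
        entry = {k: record[k] for k in FIELDS}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(prev_hash, entry):
            return False
        prev_hash = record["hash"]
    return True


log: list[dict] = []
append(log, "agent:scheduler", "tool:cancel_appointment", "patient request", "2026-06-30T02:00:00Z")
append(log, "agent:scheduler", "tool:send_message", "confirm cancellation", "2026-06-30T02:00:05Z")
print(verify(log))
```

A chain like this does not detect someone truncating the tail of the log, so the latest hash should also be stored somewhere the agent's own credentials cannot write. Keep PHI out of the log fields themselves where you can.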

"But it's just RAG / it's read-only"

Doesn't matter. Read-only still means PHI egresses to wherever you embed and store it. RAG still puts patient data in a vector store and a prompt. The questions an auditor asks (where did the data go, who could see it, what's logged, who signed a BAA) don't care whether your agent writes anything. They care where the data went.
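One way to shrink what reaches the vector store and the prompt is to tokenize identifiers at the boundary, as the checklist suggests. The sketch below shows only the mechanism. The two regexes are illustrative assumptions and nowhere near sufficient for real PHI detection, and mask and unmask are hypothetical helpers.

```python
import re

# Illustrative patterns only; real de-identification needs far more than two regexes.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace matched identifiers with placeholder tokens; return masked text and the token map."""
    vault: dict[str, str] = {}

    def substitute(label: str):
        def _sub(match: re.Match) -> str:
            token = f"[{label}_{len(vault) + 1}]"
            vault[token] = match.group(0)
            return token
        return _sub

    for label, pattern in PATTERNS.items():
        text = pattern.sub(substitute(label), text)
    return text, vault


def unmask(text: str, vault: dict[str, str]) -> str:
    """Restore the original identifiers on the trusted side of the boundary."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text


original = "Patient SSN 123-45-6789, contact jane@example.com about results."
masked, vault = mask(original)
print(masked)
```

The token map stays inside your boundary and only the masked text is embedded or sent to the model. That reduces the blast radius, but it does not remove the need for a BAA wherever PHI can still flow.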

The takeaway

The winners in regulated AI aren't the teams with the flashiest agent. They're the teams whose agent can pass the audit, because the governance was designed in, not bolted on after the demo got applause. If your agent touches PHI (or card data, under PCI), build the framework first and let the model be the easy part.

That's how we build production AI for regulated industries: compliance by design, with the agent governance auditors actually ask for. If that's the bar your product has to clear, here's how we work.

If you're running agents on regulated data, what's your hardest governance problem right now? Logging, BAAs, or keeping humans in the loop without killing the UX?

We let AI coding agents into our codebase. The modular monolith won.

Nguyen Thien — Tue, 30 Jun 2026 11:58:48 +0000

For a decade, "microservices vs monolith" was an argument about human teams: Conway's Law, independent deploys, blast radius. In 2026 a new participant walked into the codebase and quietly changed the math: the AI coding agent.

We build software for a living. Our work is production systems for regulated industries, and we've spent the last year with agents (Claude Code, Cursor, the usual suspects) reading and writing real code alongside us. The pattern is consistent enough to say out loud: agents reason far better over a well-structured modular monolith than over a fleet of microservices. And we're not alone in moving that direction. The CNCF has reported a wave of teams consolidating services rather than splitting further.

Here's why, and where it still breaks.

Microservices were optimized for a constraint AI doesn't have

The original case for microservices was largely organizational: let many teams ship independently without stepping on each other. That's a real benefit for humans, at scale.

But an AI agent's bottleneck isn't team coordination. It's context. An agent is only as good as what it can see and hold at once. And microservices are, by design, an architecture of hidden context:

The logic for one user action is smeared across N repositories. To change a behavior, the agent has to discover, clone, and correlate code it can't see from where it started.
A function call became a network call. The agent can read a function and reason about it; it cannot "read" a flaky gRPC hop, a retry storm, or a partial failure between services.
Eventual consistency replaced transactions. The agent can't reason cleanly about state that's correct "soon."
There's no single stack trace. When something breaks, the truth is spread across logs in five services.

These are the same costs we wrote about in our trip to "microservices hell": the network tax, the observability tax, the eventual-consistency headache. For a human team they're an operational drag. For an AI agent they're a reasoning wall, because the information it needs to be correct is precisely the information the architecture hides.

A modular monolith is an agent's best-case environment

Flip every one of those and you get the modular monolith, with strong module boundaries inside a single deployable:

One repository = one context. The agent can load the whole picture: the call site, the function, the data model, the test, in one place. Repository-level understanding, the thing every 2026 agent is racing to do better, is trivial when there's one repository.
In-process calls. A call is a call: typed, traceable, refactorable. The agent can follow it and change both sides atomically.
Real transactions. State is consistent now, so the agent's mental model matches reality.
One stack trace. When a test fails, the agent sees the whole failure and can iterate, which is exactly how agentic coding loops work.

The modular monolith keeps the design benefit of microservices (clean separation, clear boundaries) while removing the operational fog that both humans and agents trip on. You get roughly 90% of the architectural benefit at about 10% of the cost, and now there's a second reason it matters: it's the difference between an agent that can safely refactor your system and one that flails across repos it can't hold in its head.
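"Real transactions" is the easiest of those four points to demonstrate. In the sketch below two modules write inside one database transaction in one process. The module split, table names and functions are hypothetical, and SQLite stands in for whatever database the monolith actually uses.

```python
import sqlite3


# --- orders module: owns the orders table ---
def create_order(conn: sqlite3.Connection, sku: str, qty: int) -> None:
    conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))


# --- inventory module: owns the stock table ---
def reserve_stock(conn: sqlite3.Connection, sku: str, qty: int) -> None:
    cur = conn.execute(
        "UPDATE stock SET qty = qty - ? WHERE sku = ? AND qty >= ?", (qty, sku, qty)
    )
    if cur.rowcount != 1:
        raise ValueError(f"insufficient stock for {sku}")


# --- application service: one call path, one transaction ---
def place_order(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Both modules' writes commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            create_order(conn, sku, qty)
            reserve_stock(conn, sku, qty)
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('widget', 5)")
conn.commit()

ok = place_order(conn, "widget", 3)
failed = place_order(conn, "widget", 10)  # the order insert is rolled back with the failed reservation
print(ok, failed)
```

With two services the same flow needs a saga or an outbox plus compensation logic. Here the failure path is a few lines that a person or an agent can read and test in one place.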

"But our agents will just handle the complexity"

The hope is that agents get good enough to manage distributed systems for us. Maybe, eventually. But today, handing an agent a microservices estate mostly multiplies the surface area where it can be confidently wrong, and distributed-systems bugs are the most expensive kind to be wrong about. Giving the agent a smaller, coherent world isn't a limitation; it's how you get trustworthy output. The teams getting the most out of agents in 2026 aren't the ones with the most services. They're the ones whose codebase an agent can actually understand.

When you should still split (the rule hasn't changed)

This isn't anti-microservices zealotry. Split a module into its own service when you have a clear, painful, obvious reason: a component with a wildly different scaling profile, a hard security or compliance isolation boundary, or a team that genuinely must deploy on its own cadence. That's a refactoring step you earn, not a starting point, and not a default you adopt because a conference talk said so. Start with the modular monolith; extract a service the day a specific module forces your hand, and not before.

The takeaway

Microservices solved a human-team problem and charged an operational tax to do it. AI agents don't have the problem, yet they pay the tax twice, because the architecture hides exactly the context they need to reason. If you're rebuilding your team around AI agents in 2026, the highest-leverage architectural decision you can make is to give them a codebase they can hold in one head: a modular monolith.

We build this way on purpose, on production systems where an agent (and a new senior engineer, and an auditor) can understand the whole thing. If that's the kind of software you need built, here's how we do it.

What's your experience letting agents loose on microservices vs a monolith? I'd genuinely like to hear where this breaks.