DEV Community: Dennis Traub

[Boost]

Dennis Traub — Tue, 26 May 2026 20:50:39 +0000

Home network tools and DIY container builds

Jess Lee for The DEV Team

May 26

Your agent keeps using that word ...

Dennis Traub — Thu, 21 May 2026 20:52:57 +0000

"You keep using that word. I do not think it means what you think it means."
— Inigo Montoya, The Princess Bride (1987)

Two people using the same word while meaning completely different things has been a staple of comedy for centuries.

Lewis Carroll built an entire scene around it in 1871:

"When I use a word," Humpty Dumpty tells Alice, "it means just what I choose it to mean - neither more nor less."

Now imagine what happens if you give Humpty Dumpty a code editor and 1M tokens of context...

An AI coding agent is the most confident version of Humpty Dumpty you'll ever encounter. It will interpret your vocabulary however it sees fit, and it will never tell you it's confused. It won't raise its hand and ask "wait, do you mean a purchase order or a delivery order?" It will pick one interpretation, generate plausible code, and move on. As confidently as ever.

How AI agents amplify vocabulary ambiguity

The problem isn't new. Teams have always talked past each other. But in traditional development, ambiguity encounters friction: code reviews, pair programming, whiteboard sessions, the "wait, what exactly do you mean by 'order'?" conversation over coffee. Misunderstandings cascade, but they mostly get caught eventually.

With AI agents, that friction disappears. The code compiles, the tests pass, and the semantic mismatch only surfaces when a downstream system interprets the same word differently. Or when the wrong action leads to a production incident.

In "How Creating a Ubiquitous Language Ensures AI Builds What You Actually Want", Daniel Schleicher wrote: "LLMs are amplifiers. When we give an AI agent ambiguous instructions where 'order' could mean a dozen different things, it amplifies the chaos by generating code that reflects our confusion."

The flip side is equally true: precise vocabulary gets amplified into correct class names, correct lifecycle states, and correct boundaries. The agent works with whatever you give it - if you give it ambiguity, then that's what it amplifies.

Russell Miles tells a story about a finance agent that confused booking (revenue recognition in one bounded context) and reservation (risk allocation in another):

"The agent mashed them like a bad DJ, and began taking action that would've landed the CFO in handcuffs."
— Russell Miles, Domain Driven Agent Design

Ambiguous words, different contexts, no clear definitions, and code that compiled just fine all the way to production.

Ubiquitous Language: DDD's pattern for shared vocabulary

The good news is that Domain-Driven Design (DDD), a widely adopted methodology introduced by Eric Evans in 2003, provides both the vocabulary and the pattern for a solution. It's called Ubiquitous Language:

Use the model as the backbone of a language. Commit the team to exercising that language relentlessly in all communication within the team and in the code.
— Eric Evans, Domain-Driven Design (2003)

Martin Fowler describes it as "the practice of building up a common, rigorous language between developers and users."

And Vaughn Vernon sharpens it even more:

"[...] a shared language developed by the team - a team comprised of both domain experts and software developers"
— Vaughn Vernon, Implementing Domain-Driven Design (2013)

The important distinction here: It's not a glossary imposed from above, not industry jargon adopted wholesale. It is a language the team builds together through use.

But note that - despite the name - the Ubiquitous Language is anything but ubiquitous. In Vernon's words: "There is one Ubiquitous Language per Bounded Context."

What does "Customer" really mean?

The term "Customer" seems universal - it shows up in the codebase of virtually every single business application.

But look at how each subdomain actually refers to the same person: marketing calls them a lead, sales calls them a prospect, the warehouse calls them a consignee, finance calls them a debtor, and support calls them a claimant.

All these subdomains may refer to the same person, but the words they use aren't synonyms. They carry different attributes, different lifecycle states, different invariants, and different behavior.

And nobody really "creates" or "deletes" anything: a lead is captured, a prospect is qualified, a purchase intent is confirmed, a shipment is dispatched. Each subdomain has its own language, refined over decades - or centuries - of practice.

Flattening all of this into a Customer class with a type field and CRUD operations may feel like it's simplifying things. But it actually erases the precision and clarity each domain has built, and replaces it with generic ambiguity, which the code, the database schema, the team, and - more often than not - the users of the software have to work around.

When writing the original book, Evans described the Ubiquitous Language for teams of humans - developers, product managers, and domain experts - modelling a shared domain.

But now, we have a new participant: the coding agent. And when the agent reads your prompt, your system instructions, your project rules, it's doing something analogous to a new developer onboarding onto the team.

Except it onboards every session, with no memory of last time's disambiguation, and no instinct to ask when a term's exact meaning is unclear. Tell it "delete the order" and it will. Generically, ambiguously, and very confidently.

Practitioners applying DDD's Ubiquitous Language to AI agents

I'm not the only one seeing this. Across the industry, people are arriving at the same pattern from different directions.

Martin Fowler described at the Pragmatic Summit 2026 how a colleague is "developing a precise language to communicate about the domain with the agent... that's basically the kind of model building, language building, domain-driven design stuff that we're used to doing, but it makes him more efficient to talk to the agent."

Matt Pocock built a glossary skill that scans the codebase, extracts terminology into markdown, and feeds it to the agent. Reading the agent's thinking traces, he found it improved implementation alignment while even reducing verbosity - an unexpected benefit in terms of token use and cost.

Daniel Schleicher formalized it with the "Spec Ambiguity Resolver" pattern: a living glossary where AI flags ambiguous terms, proposes definitions, and waits for human approval. Once approved, consistency is enforced project-wide:

"The greatest leverage in modern software development with AI lies not in accelerating typing, but in establishing semantic agreement before implementation begins."
— Daniel Schleicher, How Creating a Ubiquitous Language ...

Tim Oleson named "Language Consistency - usage of ubiquitous language in agent communications" as a first-class domain alignment metric for agent systems.

Whether they came to it through practicing Domain-Driven Design or arrived at it independently, the pattern is the same. And it shows up in places you might not expect: every MCP tool name is a vocabulary decision the LLM reasons over. confirm_purchase_intent() and submit_order() produce different agent behavior even when the underlying operation is identical. Chris Hughes writes that the tools layer "protects your domain from the LLM's interface requirements - translating between 'strings the LLM can reason about' and 'rich domain objects your code works with.'"

Marc Brooker's Specification Loop explains how this compounds over time: each resolved ambiguity becomes a stable term that enriches the shared vocabulary. The next conversation starts from a higher baseline, and the loop gets cheaper as the vocabulary grows. As Brooker puts it: "I believe that the trips around the loop are fundamental to the success of the whole enterprise. They're what we've been doing all along."

How to define domain vocabulary for your coding agent

The Ubiquitous Language isn't a document you have to write upfront and rigidly apply. It's a living artifact that evolves through use.

A common practice in teams: whenever someone realizes there's an ambiguity - two people using the same word differently, or one word meaning two things - the team pauses, discusses, and agrees on a term. And this doesn't have to be the academically "correct" term. What matters is that everyone on the team knows exactly what it means. Then you write it down, and use it consistently. In conversations, in code, and in the documentation.

This practice naturally extends to conversations with a coding agent. Whenever there's a misunderstanding, missing clarity, or ambiguity in what the agent produces, spend a moment to establish a specific term for the concept. Define it. Add it to whatever artifact represents the current shared state of the vocabulary: your CLAUDE.md or AGENTS.md, your Cursor rules, or just a simple domain-terms.md, referenced as the canonical glossary of terms. The agent will start using it from that point forward, each disambiguation compounds, and every new conversation starts from a higher baseline.

But remember: each bounded context has its own vocabulary. If your application has a fulfillment module and a billing module, each gets its own language section in that module's CLAUDE.md (or whatever file your agent reads when entering a directory). The terms in one context don't need to match the terms in another - that's the whole point.

Here's what a modules/fulfillment/CLAUDE.md might contain:

## Fulfillment Domain Language

**Consignee**:
The person or entity receiving a shipment. Identified by delivery
address, contact phone, and access instructions. A consignee may
differ from the purchaser.
*Avoid*: Customer, Buyer, User

**Consignment**:
The full set of items to be delivered to a consignee. May be
fulfilled as one or more Shipments depending on stock location
and logistics.
*Avoid*: Order, Delivery (a delivery is one shipment arriving,
not the full set)

**Shipment**:
A physical package dispatched to a consignee via a carrier. Has
tracking number, weight, dimensions, and a declared value.
*Avoid*: Order, Package (ambiguous with code packages)

**Dispatch**:
The act of handing a shipment to a carrier. Irreversible - once
dispatched, the shipment is the carrier's responsibility.
*Avoid*: Send, Ship (verb), Process

And modules/billing/CLAUDE.md in the same application, even though it might refer to the same physical person or purchase event, is using terms native to its own domain:

## Billing Domain Language

**Debtor**:
The legal entity liable for payment. Identified by tax ID, billing
address, and agreed payment terms (net-30, net-60, etc.).
*Avoid*: Customer, Account, User

**Invoice**:
A legal demand for payment tied to one or more line items. Carries
a due date, VAT status, and regulatory classification. An invoice
is never deleted - it is reversed with a **Credit Note**.
*Avoid*: Order, Bill (ambiguous with infrastructure billing)

**Settlement**:
The act of a debtor satisfying an invoice in full or in part.
Records payment method, date, and reconciliation reference.
*Avoid*: Payment (ambiguous - is it the act or the money?),
Transaction

Both examples describe the same person making the same purchase. But the vocabularies are completely different, because the contexts model different responsibilities, different invariants, different behaviors, and different lifecycles.

Tell an agent "handle the customer's order" without this, and you get a generic Customer class with a type field. Tell it within a defined context, and you get a Consignee with a delivery address or a Debtor with payment terms.
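To make that contrast concrete, here is a minimal sketch of how the two glossaries might surface as types. The field names are assumptions derived from the glossary entries above, `reverse` is a hypothetical method name, and the sample values are placeholders. The point is only that each context gets its own model, and that billing has no delete operation:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


# --- Fulfillment context ---
@dataclass(frozen=True)
class Consignee:
    delivery_address: str
    contact_phone: str
    access_instructions: str = ""


# --- Billing context ---
@dataclass(frozen=True)
class Debtor:
    tax_id: str
    billing_address: str
    payment_terms: str  # e.g. "net-30"


@dataclass(frozen=True)
class CreditNote:
    invoice_number: str
    amount: Decimal


@dataclass
class Invoice:
    number: str
    debtor: Debtor
    amount: Decimal
    reversed_by: Optional[CreditNote] = None

    def reverse(self) -> CreditNote:
        """An invoice is never deleted; it is reversed with a credit note."""
        if self.reversed_by is not None:
            raise ValueError(f"Invoice {self.number} is already reversed")
        self.reversed_by = CreditNote(self.number, self.amount)
        return self.reversed_by


# Placeholder values, for illustration only.
debtor = Debtor(
    tax_id="EXAMPLE-TAX-ID",
    billing_address="1 Example Street",
    payment_terms="net-30",
)
invoice = Invoice("INV-001", debtor, Decimal("99.90"))
credit_note = invoice.reverse()
print(credit_note)
```

Nothing here shares a base class called Customer, and there is no `delete` on Invoice to call by accident.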

The vocabulary manifests in an architecture that captures the richness of the business domains it models. And - whichever context you're in - the agent knows exactly what you mean.
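The *Avoid* lists in these glossaries can also be checked mechanically. The following is a minimal sketch, assuming the glossary format shown above (abbreviated here to two entries). The function names `parse_avoid_terms` and `find_avoided_terms` and the sample snippet are hypothetical, and the parser deliberately ignores parenthetical notes:

```python
import re

# Abbreviated copy of the fulfillment glossary from above.
GLOSSARY = """\
## Fulfillment Domain Language

**Consignee**:
The person or entity receiving a shipment.
*Avoid*: Customer, Buyer, User

**Shipment**:
A physical package dispatched to a consignee via a carrier.
*Avoid*: Order, Package (ambiguous with code packages)
"""


def parse_avoid_terms(glossary: str) -> dict:
    """Map each avoided word (lowercased) to the preferred term."""
    avoid = {}
    current_term = None
    for line in glossary.splitlines():
        term_match = re.match(r"\*\*(.+?)\*\*:", line)
        if term_match:
            current_term = term_match.group(1)
            continue
        if line.startswith("*Avoid*:") and current_term:
            # Drop parenthetical notes such as "(ambiguous with ...)".
            entries = re.sub(r"\([^)]*\)", "", line[len("*Avoid*:"):])
            for word in entries.split(","):
                word = word.strip().lower()
                if word:
                    avoid[word] = current_term
    return avoid


def find_avoided_terms(source: str, avoid: dict) -> list:
    """Return (avoided word, preferred term) pairs found in identifiers."""
    # Split snake_case and CamelCase identifiers into individual words.
    words = re.findall(r"[A-Z]?[a-z]+", source)
    found = []
    for word in words:
        key = word.lower()
        if key in avoid and (key, avoid[key]) not in found:
            found.append((key, avoid[key]))
    return found


avoid_terms = parse_avoid_terms(GLOSSARY)
generated = "def notify_customer(order_id): ..."
print(find_avoided_terms(generated, avoid_terms))
```

For the sample snippet, this flags `customer` (prefer Consignee) and `order` (prefer Shipment). A check like this won't catch every misuse, but it turns the glossary from something the agent reads into something the build can enforce.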

And remember: You don't need a formal glossary to start.

All you need is one line the next time something goes wrong: "In this project, 'order' means a confirmed purchase intent, not a request or a reservation."

Add it. Use it. Let it accumulate.

If you want to learn more about how Domain-Driven Design maps to the age of coding agents, I recently wrote about how engineers building MCP servers are reaching for DDD's Bounded Contexts and Anti-Corruption Layers.

AI isn't replacing junior devs. Your org chart is.

Dennis Traub — Tue, 19 May 2026 15:50:26 +0000

Two of this year's most-shared pieces on AI's effect on junior software engineers read as direct contradictions.

In "What about juniors?", Marc Brooker (VP, Distinguished Engineer at AWS) argues that junior developers have a structural advantage. Their accumulated heuristics haven't calcified into reflex yet, so they don't have to unlearn anything. The field, he says, is "more powerful than ever" for someone willing to expand their scope.

Around the same time, Mark Russinovich (CTO, Deputy CISO at Microsoft Azure) and Scott Hanselman (VP, Member of Technical Staff at Microsoft) published "Redefining the Software Engineering Profession for AI", a paper arguing the exact opposite: AI imposes a "drag" on early-career developers. Juniors must steer, verify, and integrate AI output before they have an opportunity to develop the judgment required to do it well. If organizations focus only on short-term efficiency, they "risk hollowing out the next generation of technical leaders."

Both pieces were widely discussed, both were treated as the definitive take, and they appear to flatly disagree.

What if they don't?

What if both sides are right about AI and junior developers?

Brooker is describing what individual junior developers can do. The person who arrives without wrong heuristics, who's comfortable expanding scope into business context and system ownership, who carries a pager and learns economics alongside code. That person will thrive. The absence of accumulated assumptions becomes an advantage when the assumptions are expiring anyway.

Russinovich and Hanselman are describing what happens to the cohort of junior developers. When companies stop hiring juniors, or hire them and pair them with AI agents instead of experienced engineers, or measure only short-term output velocity, the organizational pipeline dries out, because the structure fundamental to developing judgment has been removed.

Both claims are true; they operate at different scales. Brooker is talking about individual disposition. Russinovich and Hanselman are talking about organizational design. The apparent contradiction doesn't exist: they're just zooming in at different levels.

Which means the question worth asking isn't "will AI hurt or help junior developers?" but "who is making that decision, and are we making it intentionally?"

How automation has reorganized professions before

Arthur M. Wellington, a 19th-century railroad engineer known for his work on infrastructure economics, defined engineering as "the art of doing well with one dollar what any bungler can do with two after a fashion."

The "art" in his example is the art of building economically. When execution becomes cheap, the profession reorganizes around judgment, trade-offs, and constraint management.

And this is a pattern, not a prediction.

The clearest quantitative example comes from manufacturing. When CNC machines automated metalworking from the 1970s onward, operators running hand-operated tools were displaced. But the profession didn't shrink. In "Computerized Machine Tools and the Transformation of US Manufacturing", a 2022 NBER study tracking four decades of data, the research team found that employment for college graduates in affected industries rose 86% from a low base, while high school graduates saw losses of 7-8%.

The tasks that grew in demand were, in the researchers' words, "more conceptual and socially connected, and require more training, preparation, and learning" than the ones that declined.

Execution became programmable, and the profession reorganized around judgment, customization, and system-level thinking.

You've probably already heard the spreadsheet version of this story. After Excel's introduction, the number of bookkeepers and clerks shrank from 2 million to 1.5 million, but the ranks of accountants, auditors, and financial managers surged. When the ledger work got cheap, the profession moved upstream into analysis and advisory.

The most structurally similar case, though, is playing out in a different discipline right now: large law firms are shrinking junior ranks as discovery and contract review are being automated.

The human center shifts to narrative construction, litigation strategy, and client counseling. But moving professions upstream isn't that easy: discovery was historically where junior lawyers developed judgment. The grind of reviewing thousands of documents taught pattern recognition, relevance filtering, and adversarial thinking. Automating the execution layer removed the drudgery - along with the training ground.

Sound familiar?

The pattern is consistent: when execution gets cheap, professions reorganize around judgment. The question that varies, case by case, is whether the reorganization includes the next generation, or discards them.

The hiring decisions you're already making - whether you're aware or not

Every company that chose not to open a junior headcount this quarter made this decision. Every team that paired a senior engineer with an AI coding agent instead of an early-career developer made it too. Every manager who measures output velocity without accounting for the mentorship that isn't happening anymore makes it by default.

These are decisions being made today. Mostly unintentionally.

The complication that makes this harder than "just hire juniors and pair them up": Brooker's other essay from around the same time, "My heuristics are wrong. What now?", describes an "extinction-level event for rules of thumb." Seniors are simultaneously revising their own heuristics while being asked to transmit judgment. The experienced engineer's mental model of system maintainability, code costs, API design, and service boundary assumptions is actively being invalidated in near-real time.

This is why Russinovich and Hanselman's preceptor model (structured mentoring through paired practice) specifies a year-long pairing as equals, not a senior dispensing wisdom to an intern. The model works because both parties are learning together. The senior contributes pattern recognition and system context. The junior contributes comfort with AI tooling and freedom from heuristics that no longer apply. Neither is the teacher. Both are learning.

Your choice, if you're hiring, is between making this structural decision consciously or letting it happen unintentionally - having a hundred headcount conversations without ever considering your pipeline.

If you're a junior: How to build judgment without waiting for your org

Everything above is about forces acting on you. Forces you can't change, like organizational decisions, cohort dynamics, and pipeline economics.

But you can still optimize, regardless of which way your company chooses.

Brooker's prescription is straightforward: expand scope. Engage with business context, customer needs, economics, and trade-offs. Own systems. Carry a pager. Don't wait for the structure to develop your judgment. Build it yourself by going where the ambiguity is.

Albert Zhao, a fellow AWS engineer, made this concrete in a recent video: rather than "learn to code better," the path is "learn to make decisions about systems": understand why a service boundary exists where it does, why a particular trade-off was chosen, and what the second-order effects are when requirements change.

The organizational design question is one you can't control. What you can control is whether you're building the kind of judgment that matters on either side of the decision.

Rediscovering Domain-Driven Design, one MCP server at a time

Dennis Traub — Mon, 18 May 2026 13:17:09 +0000

A few days ago, a devops engineer posted on r/devops:

"MCP servers just showed up in our infrastructure and I genuinely have no idea how to secure them, anyone been through this?"

Filesystem access, shell permissions, database connectors - all callable by agents without human approval. At the time I'm writing this, the thread has 76 upvotes and 39 comments from fellow engineers improvising solutions: "separate by blast radius," "don't mix list_files and execute_shell in one server," "three security surfaces, not one."

They're all describing the same thing, rediscovering patterns that Eric Evans described in Domain-Driven Design (DDD).

In his book, Eric introduced concepts like Bounded Contexts and Anti-Corruption Layers, which gave us the vocabulary we've been using for system boundaries ever since. They helped us survive the microservices transition, and they apply directly to the architectural problems AI systems are creating right now.

We made the same mistakes a decade ago

In the 2010s, many teams adopted microservices without understanding what made them work. They took monoliths, split them apart, and called the pieces "services." More often than not, the result was distributed monoliths - all the operational complexity of distribution with none of the architectural benefits of real, well-defined boundaries.

The correction took years. We learned (painfully) that a microservice boundary isn't where you split the code. It's where you split the mental model you have of the application. A payment service and a user service don't just live in different containers - they have different vocabularies, different invariants, different reasons to change.

And the same mistake is happening again, this time with MCP servers. We wrap existing REST APIs one-to-one and call it AI integration. David Soria Parra, one of the creators of MCP, said "it's a bit cringe, it just results in horrible things" at AI Engineer World's Fair 2025. The Thoughtworks Technology Radar placed "MCP by default" as a Caution. And if you dive into the argument they make, they're both saying the same thing: we're building distributed monoliths again.

But the correction doesn't have to take years this time. The vocabulary already exists - and has been battle-tested for more than 20 years.

Bounded Contexts: one server, one model, one language

A Bounded Context defines where a particular (data or object) model is valid. Inside the boundary, terms have precise meanings: a "transaction" in the finance context means money changing hands, a "transaction" in the booking context means a reservation. Inside each boundary lives one language and one set of rules. Across boundaries, you expect translation.

And MCP is particularly interesting from this angle: the protocol already enforces bounded contexts at the topology level.

MCP's architecture uses a one-client-per-server model. The host spawns a separate client for each MCP server, and each client talks to exactly one server. An MCP server for your database cannot accidentally leak data to an MCP server for your file system. Unlike microservices, where any service can trivially call any other over the network, an MCP server has no protocol-level way to reach another server's tools. You have to deliberately build that bridge. Cross-boundary coupling becomes visible and intentional rather than accidental.
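That topology can be sketched in a few lines of plain Python. This is a toy model, not the MCP SDK: `Host`, `Client`, and `Server` are hypothetical classes that only mirror the one-client-per-server shape described above, and the two tools are made up for illustration.

```python
class Server:
    """A toy stand-in for an MCP server: a name plus its own tools."""

    def __init__(self, name, tools):
        self.name = name
        self._tools = dict(tools)

    def call(self, tool, *args):
        if tool not in self._tools:
            raise LookupError(f"{self.name!r} has no tool {tool!r}")
        return self._tools[tool](*args)


class Client:
    """Holds a connection to exactly one server."""

    def __init__(self, server):
        self._server = server

    def call_tool(self, tool, *args):
        return self._server.call(tool, *args)


class Host:
    """Spawns one client per server; servers hold no reference to each other."""

    def __init__(self, servers):
        self._clients = {server.name: Client(server) for server in servers}

    def call_tool(self, server_name, tool, *args):
        return self._clients[server_name].call_tool(tool, *args)


files = Server("files", {"list_files": lambda path: [f"{path}/notes.txt"]})
database = Server("database", {"count_rows": lambda table: 0})
host = Host([files, database])

print(host.call_tool("files", "list_files", "/tmp"))
```

Asking the `files` server for `count_rows` raises a `LookupError`: in this model, the only way to combine the two is for the host to make two explicit calls.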

But only if you design your servers as bounded contexts.

The failure mode is an MCP server that exposes everything, including filesystem access, shell execution, and database connectors in a single server. That's three separate concerns crammed into one boundary - the equivalent of a microservice that owns users, payments, and notifications.

The commenter on Reddit who wrote "don't mix list_files and execute_shell in one server" was actually designing context boundaries, even if they didn't know the term.

Anti-Corruption Layers: separating the tools from domain logic

An Anti-Corruption Layer (ACL) prevents one system's model from contaminating another. It translates between two different worldviews.

In AI systems, two fundamentally different models collide every time an agent calls a tool:

For the LLM, everything is strings, parameters are simple, and context is a token window. It reasons in natural language to generate structured calls.
The domain consists of rich types, configuration, state, complex error handling, and business invariants that must hold regardless of how they're invoked.

The tools layer sits between these two worlds. In Chris Hughes’s words, it “protects your domain from the LLM’s interface requirements - translating between ‘strings the LLM can reason about’ and ‘rich domain objects your code works with’.”

Here's a tool that ignores this principle:

# Everything in one function - LLM interface mixed with domain logic
from decimal import Decimal

@mcp.tool()
async def transfer_funds(from_account: str, to_account: str, amount: str):
    amount_decimal = Decimal(amount)
    from_acc = await db.get_account(from_account)

    if from_acc.balance < amount_decimal:
        return "Insufficient funds"

    if from_acc.is_frozen:
        return "Account frozen"

    await db.execute_transfer(from_acc, to_account, amount_decimal)
    await audit_log.record(from_account, to_account, amount_decimal)

    return f"Transferred {amount} from {from_account} to {to_account}"

And here's the same operation with a proper separation:

from decimal import Decimal

# Tool layer: thin adapter (the ACL)
@mcp.tool()
async def transfer_funds(from_account: str, to_account: str, amount: str):
    result = await transfer_service.execute(
        from_account=from_account,
        to_account=to_account,
        amount=Decimal(amount)
    )
    return result.to_agent_summary()

# Service layer: domain logic, testable without the LLM
class TransferService:
    async def execute(self, from_account, to_account, amount) -> TransferResult:
        account = await self.accounts.get(from_account)
        account.validate_transfer(amount)  # raises on invariant violation
        transfer = account.initiate_transfer(to_account, amount)
        await self.transfers.save(transfer)
        await self.audit.record(transfer)
        return TransferResult(transfer)

The second version gives you:

Testability: the service works without an LLM. Run it from tests, CLI, scripts.
Replaceability: change the LLM interface (tool parameters, response format) without touching business logic. Change business rules without touching the tool layer.
Composability: other MCP servers, other agents, or humans can call the same service through their own interface.

The ACL protects both sides. The domain doesn't get contaminated by the LLM's string-based worldview. The LLM doesn't get overwhelmed by domain complexity it can't reason about.
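To illustrate the testability point, here is a minimal sketch of exercising the domain rules with no LLM and no MCP server in the loop. It is a trimmed, self-contained variant of the service above: `InMemoryAccounts` and `TransferRejected` are hypothetical names, persistence and audit logging are left out, and the account IDs and amounts are placeholders.

```python
import asyncio
from dataclasses import dataclass
from decimal import Decimal


class TransferRejected(Exception):
    """Raised when a transfer would violate an account invariant."""


@dataclass
class Account:
    account_id: str
    balance: Decimal
    is_frozen: bool = False

    def validate_transfer(self, amount: Decimal) -> None:
        if self.is_frozen:
            raise TransferRejected("Account frozen")
        if self.balance < amount:
            raise TransferRejected("Insufficient funds")


class InMemoryAccounts:
    """Hypothetical test double for the account repository."""

    def __init__(self, accounts):
        self._accounts = {a.account_id: a for a in accounts}

    async def get(self, account_id: str) -> Account:
        return self._accounts[account_id]


class TransferService:
    """Trimmed version of the service above: validation and debit only."""

    def __init__(self, accounts):
        self.accounts = accounts

    async def execute(self, from_account: str, to_account: str, amount: Decimal) -> str:
        account = await self.accounts.get(from_account)
        account.validate_transfer(amount)  # raises on invariant violation
        account.balance -= amount
        return f"Transferred {amount} from {from_account} to {to_account}"


def run_checks() -> list:
    accounts = InMemoryAccounts([
        Account("acc-1", Decimal("100.00")),
        Account("acc-2", Decimal("100.00"), is_frozen=True),
    ])
    service = TransferService(accounts)
    outcomes = [asyncio.run(service.execute("acc-1", "acc-3", Decimal("40.00")))]

    # Both of these should be rejected by the domain, not by the tool layer.
    for source, amount in [("acc-1", "500.00"), ("acc-2", "1.00")]:
        try:
            asyncio.run(service.execute(source, "acc-3", Decimal(amount)))
        except TransferRejected as exc:
            outcomes.append(str(exc))
    return outcomes


print(run_checks())
```

The same checks would be much harder to write against the first version, where the rules only exist inside a function the agent calls.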

The same vocabulary in a new domain

Back to that Reddit thread.

"Separate MCP servers by blast radius." That's bounded context design. Each server owns one domain. The blast radius is contained because the boundary is real.

"Three security surfaces, not one - tool capability, tool description, and tool call chains." The ACL decomposed into its responsibilities. Tool capability is what the domain allows. Tool description is what the LLM thinks it can do. Tool call chains are cross-boundary interactions that need explicit orchestration.

"The dangerous part is not one tool in isolation. It is the chain." In DDD terms: an aggregate invariant violation. A sequence of operations crossing bounded contexts without coordination. Each operation succeeds locally while the system fails globally.

Same patterns, same structural problem, discovered independently because the problem is real.

The "abstraction tax" is the ACL doing its job

One fair criticism is that MCP adds a layer. The Thoughtworks Tech Radar calls this the "abstraction tax" - every protocol layer between an agent and an API loses fidelity. Simon Willison notes that "almost everything I might achieve with an MCP can be handled by a CLI tool instead."

This is correct. And it's exactly the same argument people made against microservice boundaries, API gateways, and anti-corruption layers in traditional systems. The translation layer comes with costs: you lose directness.

But this loss is intentional. It's the ACL doing its job. The LLM doesn't need to know about your domain's internal types, retry logic, or state management. The domain doesn't need to accommodate the LLM's string-based reasoning model. The "tax" buys you isolation, replaceability, and, ultimately, peace of mind.

It's only a mistake if we're paying this tax without getting the architectural benefit - which is exactly what REST-to-MCP 1:1 wrappers do. They add the layer without adding the boundary: all cost, no benefit.

The vocabulary already exists. Let's keep using it.

We don't have to reinvent these patterns - DDD has 20+ years of battle scars. We've learned the hard way where to draw boundaries, how to enforce them, and what happens when we don't. AI or no AI, Eric Evans's Domain-Driven Design is still the canonical reference for complex software systems.

MCP is already designed to establish bounded contexts; the tools layer is already an anti-corruption layer. Name your MCP servers after the domain they own, not the API they wrap, and when someone on your team says "separate by blast radius" - let them know that there are established patterns for what they're describing.

If you're interested in how vocabulary ambiguity gets amplified by AI coding agents - and what you can do about it - I wrote a follow-up: Your agent keeps using that word ...

8 Agents Wrote Perfect Components - And Nothing Worked

Dennis Traub — Fri, 27 Mar 2026 22:50:44 +0000

TL;DR: Parallel AI agents don't coordinate on shared contracts, such as column names, URL paths, parameter formats, or identifiers. Extract those contracts into a single reference file before generation, and run a review agent that traces end-to-end data flows once the parallel agents are done. This single step fixed all 17 bugs in one pass.

I launched 8 AI agents in parallel to build a full-stack app on AWS: infrastructure stacks, a React frontend, and a Java backend. Each agent owned one piece, and they all delivered clean, compiling code. The CDK type-checked, the Java backend followed Spring Boot conventions, the React UI looked nice.

But when I tried to wire them together I hit bugs at every single boundary.

The architecture

A full-stack app on AWS with a lot of moving parts. Multiple CDK stacks for the infrastructure (IAM, VPC, DB with seed functions, Cognito, CodePipeline, CloudFront/WAF), a Spring Boot backend on ECS Fargate, and a React frontend hosted on S3.

The implementation plan was thorough and covered every component. But it wasn't detailed enough for agents that need to agree on shared contracts.

The bugs

The first two block everything. Bugs 3 through 5 only show up after you fix the previous ones.

Bug 1: The Spring Boot app won't even start

The seed data function creates a schema with passenger_id and full_name, but the Spring Boot entity maps to id and name:

-- Agent 1: seed data function creates the schema
CREATE TABLE passengers (
    passenger_id   VARCHAR(64) PRIMARY KEY,
    full_name      VARCHAR(255) NOT NULL,
    ...
);

// Agent 2: The Spring Boot entity maps the table
@Column(name = "id")       // Schema says "passenger_id"
@Column(name = "name")     // Schema says "full_name"

With ddl-auto: validate, Hibernate checks the mapping on startup. But the columns don't exist, so the ECS task crashes before serving a single request.
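A rough version of that startup check can run long before deployment by comparing the two column lists directly. A minimal sketch, assuming both lists have already been extracted from the agents' output; only the four column names shown above are used, and `missing_columns` is a hypothetical helper:

```python
def missing_columns(schema_columns: set, mapped_columns: set) -> set:
    """Columns the entity maps to that the schema does not define."""
    return mapped_columns - schema_columns


schema = {"passenger_id", "full_name"}  # from the CREATE TABLE statement
entity = {"id", "name"}                 # from the @Column annotations

print(sorted(missing_columns(schema, entity)))
```

Any non-empty result means the backend won't start.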

Bug 2: Every call returns 404

The CDK stack registers ALB routes for /approve and /generate while the Java client sends requests to /voucher/approve and /voucher/generate:

CDK ALB routes:  /approve, /generate
Java client:     /voucher/approve, /voucher/generate

Both agents wrote correct, working code in isolation, but the CDK stack used clean paths while the Java client added a service prefix. Neither checked the other.

Bug 3: Missing request fields

A downstream service validates four required fields. The Java client sends three:

Lambda expects:  escalationId, passengerId, amount, situation
Java sends:      escalationId, passengerId, amount

Even with the URLs from bug 2 fixed, every approval returns 400.
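As a minimal sketch of why the request is rejected, the following Python stand-in mimics a validator that requires all four fields. The function name, the payload values, and the status codes are illustrative assumptions, not the actual downstream code.

```python
# Field names come from the example above; everything else is hypothetical.
REQUIRED_FIELDS = {"escalationId", "passengerId", "amount", "situation"}


def validate_approval(payload: dict) -> tuple[int, list[str]]:
    """Return an HTTP-style status code and the sorted list of missing fields."""
    missing = sorted(REQUIRED_FIELDS - set(payload))
    return (400, missing) if missing else (200, [])


# Payload shaped like the one the Java client sends (placeholder values).
java_payload = {"escalationId": "esc-1", "passengerId": "pax-1", "amount": 100}
status, missing = validate_approval(java_payload)
print(status, missing)  # 400 ['situation']
```

Listing the missing field names in the error response is a design choice that makes this kind of mismatch much quicker to spot.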

Bug 4: User lookup doesn't work

This one was the most interesting: three systems work with the user, and each of them created their own identifier:

Cognito custom attribute:  custom:passenger_id = "pax-a1b2c3d4-e5f6-..."
RDS seed data:             passenger_id = "PAX-a1b2c3d4-e5f6-..."
JWT subject claim:         sub = "a1b2c3d4-e5f6-..."  (Cognito UUID)

The backend uses jwt.getSubject() to look up the user. That's a Cognito UUID - neither prefixed with pax- nor with PAX-. No user lookup ever returns a result.

Three agents. Three naming conventions. Zero coordination.

Bug 5: Every status lookup returns "not found"

A downstream service returns JSON. The Java client parses XML:

{"status": "FOUND_LOCAL", "location": "Warehouse-B-Shelf-47"}

String status = extractXmlElement(xml, "status");  // Looks for <status>...</status>

No XML tags in a JSON string. extractXmlElement returns empty for every single request.

The agent that wrote the downstream service followed one spec (JSON). The agent that wrote the Java client followed a different spec (XML).
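The mismatch is easy to reproduce. In this sketch, `extract_xml_element` is a hypothetical Python stand-in for the Java helper (its real implementation isn't shown here), applied to the JSON body from the example:

```python
import json
import re

# Response body copied from the example above.
RESPONSE_BODY = '{"status": "FOUND_LOCAL", "location": "Warehouse-B-Shelf-47"}'


def extract_xml_element(body: str, tag: str) -> str:
    """Hypothetical stand-in for the Java helper: "" when the tag is absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, body, re.DOTALL)
    return match.group(1) if match else ""


xml_status = extract_xml_element(RESPONSE_BODY, "status")  # what the client sees
json_status = json.loads(RESPONSE_BODY)["status"]          # what the service sent

print(repr(xml_status), repr(json_status))  # '' 'FOUND_LOCAL'
```

Neither side raises an error, which is why this only shows up as "not found" at runtime.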

Bugs 6 to 17: SSM parameter path mismatches

One CDK stack writes an SSM parameter. Another CDK stack reads it. But they never coordinated on paths:

Producer stack writes:  /${AppName}/test/data/rds-secret-arn
Consumer stack reads:   /${AppName}/${Env}/data/rds-password-secret-arn

...

Twelve SSM parameters mismatched between producer and consumer stacks. The app fails on every one of them.
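A mismatch like this can be caught with a plain set difference before anything is deployed. The sketch below uses only the one pair shown above and compares the path templates literally; the function name is hypothetical, and a real check would first have to collect the paths from both stacks.

```python
def unmatched_reads(written: set[str], read: set[str]) -> list[str]:
    """Return the paths a consumer reads that no producer writes."""
    return sorted(read - written)


# The single pair from the example; placeholders are left unresolved on purpose.
producer_writes = {"/${AppName}/test/data/rds-secret-arn"}
consumer_reads = {"/${AppName}/${Env}/data/rds-password-secret-arn"}

dangling = unmatched_reads(producer_writes, consumer_reads)
print(dangling)  # ['/${AppName}/${Env}/data/rds-password-secret-arn']
```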

Why parallel agents can't catch this

Each agent had context about the overall plan and its own component. But none of them could see the implementation details that the others came up with.

When I write an app, I hold the contracts in working memory. "The column is passenger_id, so I'll use that in both the migration and the entity." But an AI agent writing the migration doesn't know what the entity agent chose for its column name - and vice versa.

The plan contained all the high-level information, but the agents were reading different sections and making their own calls on the shared details.

Each agent wrote correct code that followed good conventions. But they never coordinated. Like digging a tunnel from two sides of a mountain - without ever checking in with each other.

How I found all of them at once

After generation, but before actually deploying the app, I ran an architecture review agent with a simple instruction:

Trace the actual data flow from user login through form submission to the downstream service calls, following every cross-component boundary.

It found every one of the bugs in a single pass.

The review agent started at the user-facing entry point, traced the request through every boundary, and at each one checked whether what one component sent actually matched what the next one expected. It's the same thing integration tests do after deployment, except you catch the problems before deploying anything.

How to prevent seam bugs

Before launching parallel agents, pull every shared contract out of the plan into a single reference file and pass it to every agent as mandatory context.
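One possible shape for such a reference file, sketched in Python. The structure, the key names, and the choice of which variant wins (for example `/voucher/approve` over `/approve`) are assumptions made for illustration; the point is only that every agent receives the same exact strings.

```python
import json

# Hypothetical contract set built from the mismatches described above.
SHARED_CONTRACTS = {
    "database": {
        "table": "passengers",
        "columns": {"id": "passenger_id", "name": "full_name"},
    },
    "routes": {"approve": "/voucher/approve", "generate": "/voucher/generate"},
    "approval_request_fields": ["escalationId", "passengerId", "amount", "situation"],
    "status_response_format": "json",
}


def render_reference(contracts: dict) -> str:
    """Serialize the contracts deterministically so every agent sees identical text."""
    return json.dumps(contracts, indent=2, sort_keys=True)


def build_agent_prompt(task: str, contracts: dict) -> str:
    """Prepend the shared contracts to an agent's task description."""
    return (
        "Use these shared contracts exactly as written:\n"
        f"{render_reference(contracts)}\n\n"
        f"Task: {task}"
    )


prompt = build_agent_prompt("Write the Spring Boot entity", SHARED_CONTRACTS)
print(prompt)
```

The same rendered text would go to the agent writing the seed function, the CDK stacks, and the client, so none of them has to invent a name on its own.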

Then, after your parallel agents have done their thing, run a review agent that traces a few real user flows across all the boundaries.

Fix the seam bugs in one pass, then deploy.

FAQ

What are seam bugs in AI-generated code?

Seam bugs are integration defects at the boundaries between components built by different AI agents. Each agent writes correct, working code in isolation, but the components don't fit together because the agents each made their own decisions about shared details - things like what a column is called, what path an API lives at, or what format an identifier uses.

Why does parallel AI code generation produce integration bugs?

Each agent only sees its own component and the plan it was given. When two agents need to agree on something - say, what a database column is called - they each pick a reasonable name independently. Those names often don't match. The plan says what the column should represent, but not necessarily the exact string both sides should use.

How do you catch integration bugs from parallel AI agents?

Run a single review agent after generation that traces real user flows across all the boundaries. Give it a prompt like "trace the data flow from user login through the frontend, backend, to databases and downstream service calls, checking every boundary." It will catch the mismatches in one pass.

My 8 Agents Wrote Perfect Components - And Nothing Worked

Dennis Traub — Fri, 27 Mar 2026 22:50:44 +0000

Missing from the MCP debate: Who holds the keys when 50 agents access 50 APIs?

Dennis Traub — Wed, 18 Mar 2026 12:35:57 +0000

There are two debates happening right now:

CLI vs MCP - should agents call existing CLIs or use an MCP server? And API vs MCP - does wrapping a REST API in an MCP server add value, or just complexity?

Both focus on how agents call tools. What neither asks is who holds the credentials when they do.

Fifty agents, fifty sets of keys

When one developer runs one agent on one laptop, credentials are simple. You store them locally, maybe rotate them, and move on.

But that's not where we're heading. Dozens of agents per team, each needing access to Slack, GitHub, Jira, Office 365, that legacy CRM, multiple SaaS tools, and all your internal APIs.

Some of those have CLIs. Most don't - they're SaaS products with REST APIs. And that's if you're lucky - who knows how many production systems still use a global, password-protected admin account.

So every agent needs a separate API key, OAuth token, or username/password pair. For each downstream system. On every machine. And if you've ever managed API keys for a team, you know where this goes. Keys in .env files, shared over Slack, committed to repos, never rotated.

Now hand that problem to fifty autonomous agents.

What happened to SSO?

Most organizations with any sense of security have established SSO and spent years consolidating identity. Every SaaS tool, every internal system, every third-party integration flows through one identity provider.

When someone leaves, you disable a single account. When compliance asks about access controls, there's one answer - and you know exactly where to find it.

And now, agents are about to blow a wide-open hole in everything you've built. Whether your agent calls a CLI, hits a REST API, or talks to an MCP server, it needs credentials. And if those credentials live on the agent's machine, they live outside your identity boundary.

Imagine a contractor wrapping up on Friday. You disable their SSO account, but their laptop still has three agents with API keys for your CRM, your internal docs, and your deployment pipeline. Those keys don't expire with the SSO account. Those agents can continue calling your APIs long after the contractor has moved on.

Remote MCP servers are identity boundaries

This is where remote MCP servers earn their place in both debates.

The CLI vs MCP crowd argues about token efficiency. The API vs MCP crowd argues about unnecessary abstraction. Neither side is talking about the nightmare of decentralized credential management.

Charles Chen makes this point well in MCP is Dead; Long Live MCP!. Most of the debate ignores the difference between MCP over stdio (local, and yeah, mostly pointless compared to raw curl or a CLI) and MCP over streamable HTTP (remote, centralized). Once MCP runs as a centralized server, users authenticate via OAuth and never touch the downstream keys.

As he puts it: "An engineer leaves your team? Revoke their OAuth token and access to the MCP server; they never had access to other keys and secrets to start with."

Now take that one step further. In most organizations, that OAuth isn't standalone - it flows through SSO. The MCP server becomes an identity boundary. Your users never store any API keys, custom tokens, or service accounts. One auth mechanism instead of one per machine per agent per API.

Disable the SSO account, and every agent loses access. To everything.
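To make the idea concrete, here is a toy model in Python. It is not an MCP implementation: the class, its methods, and the key values are all hypothetical, and a simple set of enabled accounts stands in for the OAuth/SSO check a real server would perform.

```python
class McpGateway:
    """Toy model: downstream keys live on the server, never on the agent's machine."""

    def __init__(self, downstream_keys: dict[str, str]):
        self._downstream_keys = dict(downstream_keys)
        self._enabled_accounts: set[str] = set()

    def enable_account(self, account: str) -> None:
        self._enabled_accounts.add(account)

    def disable_account(self, account: str) -> None:
        self._enabled_accounts.discard(account)

    def call(self, account: str, system: str) -> str:
        if account not in self._enabled_accounts:
            raise PermissionError(f"{account} is not an active account")
        _key = self._downstream_keys[system]  # used server-side only, never returned
        return f"{system}: request made on behalf of {account}"


gateway = McpGateway({"crm": "hypothetical-crm-key", "docs": "hypothetical-docs-key"})
gateway.enable_account("contractor")
first_call = gateway.call("contractor", "crm")

gateway.disable_account("contractor")  # the Friday offboarding step
try:
    gateway.call("contractor", "docs")
    access_after_offboarding = True
except PermissionError:
    access_after_offboarding = False

print(first_call, access_after_offboarding)
```

Because the agent never receives a downstream key, disabling the account is the only revocation step needed in this model.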

But we already learned this, right? Every microservice managing its own database credentials was a nightmare until we centralized secrets management. Agent credentials are the same problem, just one layer up.

3 Things I Wish I Knew Before Setting Up a UV Workspace

Dennis Traub — Fri, 27 Feb 2026 17:57:09 +0000

I love uv - it's so much better than pip - but I'm still learning the ins and outs. Today I was setting up a Python monorepo with uv workspaces and ran into a few issues whose fixes were trivial once I knew about them.

1. Give the Root a Distinct Name

First, a virtual root (package = false) still needs a [project] name - and it can't match any member package.

I had both the root and my core package using the same name, e.g. my-app:

my-app/                   # workspace root
  pyproject.toml          # name = "my-app" <- problem!
  packages/
    core/
      pyproject.toml      # name = "my-app"
      src/core/
    cli/
      pyproject.toml      # name = "my-app-cli"
      src/cli/

When I ran uv sync, it refused outright:

$ uv sync
error: Two workspace members are both named `my-app`:
  `/path/to/my-app` and `/path/to/my-app/packages/core`

Even though the root has package = false, uv still registers its name as a workspace member identity. Same name, two members, no way to disambiguate.

The fix - give the root a workspace-specific name:

# Root pyproject.toml
[project]
name = "my-app-workspace"  # NOT "my-app"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = []

[tool.uv]
package = false

[tool.uv.workspace]
members = ["packages/*"]

[dependency-groups]
dev = [
    "pytest",
    "ruff",
]

Two things to note: package = false means "don't install me", not "don't need a name". And dev dependencies go in [dependency-groups] (PEP 735), not [project.dependencies] - the root is virtual, so project dependencies are just metadata.

2. Use `workspace = true` for Inter-Package Deps

When one workspace package depends on another, you need two things: a normal dependency declaration and a [tool.uv.sources] entry telling uv to resolve it locally.

# packages/cli/pyproject.toml
[project]
name = "my-app-cli"
dependencies = [
    "my-app",
]

[tool.uv.sources]
my-app = { workspace = true }

Without the [tool.uv.sources] entry, uv sync fails with a helpful but initially confusing error:

$ uv sync
  x Failed to build `my-app-cli @ file:///path/to/packages/cli`
  |-- Failed to parse entry: `my-app`
  \-- `my-app` is included as a workspace member, but is missing
      an entry in `tool.uv.sources`
      (e.g., `my-app = { workspace = true }`)

At least uv tells you exactly what to add.

The [project.dependencies] list stays PEP 621 compliant, so any standard Python tool can read it. The [tool.uv.sources] table is uv-specific and only affects resolution. And uv sync installs the local package as editable automatically - changes are immediately visible without reinstalling.

3. Use `importlib` Mode for pytest

When running pytest across a workspace where multiple packages have tests/ directories with same-named test files (e.g. both have test_helpers.py), pytest's default import mode breaks:

$ uv run pytest packages/ -v
collected 1 item / 1 error

ERROR collecting packages/core/tests/test_helpers.py
import file mismatch:
imported module 'test_helpers' has this __file__ attribute:
  /path/to/packages/cli/tests/test_helpers.py
which is not the same as the test file we want to collect:
  /path/to/packages/core/tests/test_helpers.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename

Pytest's default prepend import mode treats both test_helpers.py files as the same module. It imports the first one, caches it under the name test_helpers, then errors when the second file doesn't match.

The fix - add importlib mode to your root pyproject.toml:

# Root pyproject.toml
[tool.pytest.ini_options]
addopts = "--import-mode=importlib"

$ uv run pytest packages/ -v
packages/cli/tests/test_helpers.py::test_cli_helper PASSED    [ 50%]
packages/core/tests/test_helpers.py::test_core_helper PASSED  [100%]

Note: Don't add __init__.py to your test directories as a workaround - with importlib mode, that can actually cause a silent bug where pytest resolves both files to the same cached module and runs the wrong tests without any error.

This isn't uv-specific - it's a Python monorepo thing. But uv workspaces make monorepos easy to set up, so you're likely to hit it early.

How to Use Strands Agents' Built-In Session Persistence

Dennis Traub — Tue, 17 Feb 2026 16:41:10 +0000

Today I learned that the Strands Agents SDK has a built-in persistence layer for conversation history.

Pass a SessionManager to the Agent constructor, and every message and state change is persisted automatically through lifecycle hooks. No manual save/load calls.

The Code

Save this as session_demo.py and run it with uv run session_demo.py.

The # /// script block is PEP 723 inline metadata - uv run reads it to install dependencies automatically, no venv or pip needed. All you need is uv and AWS credentials configured.

# /// script
# requires-python = ">=3.10"
# dependencies = ["strands-agents"]
# ///

from strands import Agent
from strands.session.file_session_manager import FileSessionManager

SESSION_ID = "user-abc-123"
STORAGE_DIR = "./sessions"  # defaults to /tmp/strands/sessions

# First agent instance - ask a question
agent1 = Agent(
    model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
    agent_id="assistant",
    session_manager=FileSessionManager(
        session_id=SESSION_ID, storage_dir=STORAGE_DIR
    ),
)
prompt1 = "What's the capital of France?"
print(f"Prompt: {prompt1}")
agent1(prompt1)
print()

# Second agent instance - same session_id, loads conversation from disk
agent2 = Agent(
    model="global.anthropic.claude-haiku-4-5-20251001-v1:0",
    agent_id="assistant",
    session_manager=FileSessionManager(
        session_id=SESSION_ID, storage_dir=STORAGE_DIR
    ),
)
prompt2 = "What did I just ask you?"
print(f"Prompt: {prompt2}")
agent2(prompt2)
print()

What's happening here:

agent1 and agent2 are separate Agent instances - they share no memory
agent2 can answer "What did I just ask you?" because FileSessionManager restored the conversation from disk when the second instance was created
The agent_id identifies which agent's state to save and restore - required when using a session manager

What Gets Persisted

The session manager saves three things:

Conversation history - all user and assistant messages (the messages/ directory)
Agent state - a JSON-serializable key-value dict you can use for your own data (agent.json)
Session metadata - timestamps and session type (session.json)

After running the script, here's what's on disk:

sessions/
└── session_user-abc-123
    ├── agents
    │   └── agent_assistant
    │       ├── agent.json
    │       └── messages
    │           ├── message_0.json
    │           ├── message_1.json
    │           ├── message_2.json
    │           └── message_3.json
    ├── multi_agents
    └── session.json

Each message is a separate JSON file:

{
  "message": {
    "role": "user",
    "content": [
      {
        "text": "What's the capital of France?"
      }
    ]
  },
  "message_id": 0,
  "created_at": "2026-02-17T14:45:31.439081+00:00"
}

User and assistant turns alternate through message_0.json to message_3.json.
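Since the files are plain JSON, you can read a transcript back without the SDK. The sketch below assumes the directory layout and record shape shown above, which are implementation details that may change between versions; `load_messages` is a hypothetical helper, and the demo writes its own throwaway files with a placeholder assistant reply.

```python
import json
import tempfile
from pathlib import Path


def load_messages(storage_dir: str, session_id: str, agent_id: str) -> list[dict]:
    """Read every message_<n>.json for one agent, ordered by message_id."""
    messages_dir = (
        Path(storage_dir) / f"session_{session_id}" / "agents" / f"agent_{agent_id}" / "messages"
    )
    records = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in messages_dir.glob("message_*.json")
    ]
    return sorted(records, key=lambda record: record["message_id"])


# Demo against a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "session_user-abc-123" / "agents" / "agent_assistant" / "messages"
    demo_dir.mkdir(parents=True)
    samples = [("user", "What's the capital of France?"), ("assistant", "(placeholder reply)")]
    for index, (role, text) in enumerate(samples):
        record = {"message": {"role": role, "content": [{"text": text}]}, "message_id": index}
        (demo_dir / f"message_{index}.json").write_text(json.dumps(record), encoding="utf-8")
    transcript = [
        (m["message"]["role"], m["message"]["content"][0]["text"])
        for m in load_messages(tmp, "user-abc-123", "assistant")
    ]

print(transcript)
```

Sorting by message_id rather than by file name avoids message_10.json sorting before message_2.json.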

Built-In Backends

The example uses FileSessionManager, but the SDK ships three backends:

Manager	Use Case
`FileSessionManager`	Local development, single-process
`S3SessionManager`	Production, distributed, multi-container
`RepositorySessionManager`	Custom backend (implement `SessionRepository`)

Tips and Notes

Troubleshooting tips

uv: command not found
Install uv: curl -LsSf https://astral.sh/uv/install.sh | sh (macOS/Linux) or powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" (Windows). See uv installation docs.

NoCredentialsError or Unable to locate credentials
AWS credentials aren't configured. Run aws configure to set up a default profile, or export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. See AWS CLI configuration.

AccessDeniedException when calling the model
Your AWS credentials don't have permission to invoke the Bedrock model. Make sure your IAM user or role has bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream permissions.

Good to know

FileSessionManager is not safe for concurrent same-session writes

Two API requests with the same session_id writing to FileSessionManager concurrently can corrupt the session data. The storage layer has no locking - it reads, appends, and writes the full JSON file without coordination.

For development, this is fine. Single-process, single-user development (CLI, local testing) will never hit this. Sequential requests to the same session are safe.

For production, you have three options:

Per-session locking in your API layer - serialize requests per session_id before they reach the Agent
S3SessionManager - uses atomic S3 operations for safe concurrent writes
A custom SessionRepository - implement your own with proper concurrency handling (database-backed, etc.)
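The first option can be sketched with one lock per session_id. Everything here is hypothetical scaffolding: `fake_agent_turn` stands in for the agent call and deliberately does an unsafe read-append-write, and the locks only help within a single process. Multiple containers would need a shared lock or one of the other two options.

```python
import threading
import time
from collections import defaultdict


class SessionLocks:
    """Hands out one lock per session_id so turns for a session run one at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def for_session(self, session_id):
        with self._guard:  # protects the dictionary itself
            return self._locks[session_id]


session_locks = SessionLocks()
history = {"user-abc-123": []}  # hypothetical stand-in for the session files


def fake_agent_turn(session_id, prompt):
    """Unsafe read-append-write, like a storage layer with no locking."""
    snapshot = list(history[session_id])
    time.sleep(0.001)  # widen the race window
    snapshot.append(prompt)
    history[session_id] = snapshot


def handle_request(session_id, prompt):
    with session_locks.for_session(session_id):
        fake_agent_turn(session_id, prompt)


threads = [
    threading.Thread(target=handle_request, args=("user-abc-123", f"prompt {i}"))
    for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(history["user-abc-123"]))  # 8
```

Note that this sketch never removes locks, so a long-running service would want to evict entries for idle sessions.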

agent_id is required with a session manager

If you omit agent_id when using a SessionManager, you'll get ValueError: agent_id needs to be defined. The session system uses it as a directory key to separate state for different agents within the same session.

FileSessionManager defaults to /tmp

Without an explicit storage_dir, sessions are written to /tmp/strands/sessions - which most operating systems wipe on reboot. Set it to a project-local path like ./sessions or .data/sessions.

[Boost]

Dennis Traub — Sun, 15 Feb 2026 11:20:33 +0000

From Zero to Agentic Coding: Running Claude Code with Amazon Bedrock

Gunnar Grosch for AWS ・ Feb 15

#aws #ai #beginners #productivity

Why Your Chatbot is the BlackBerry of the 2020s

Dennis Traub — Fri, 13 Feb 2026 01:58:38 +0000

This is my third major technological shift, and every time I hear the same question echo through the C-Suites:

"But where's the ROI?"

In the 90s, we put our print brochures and yellow pages on the web - and called it digital transformation. In the 2000s, we put tiny keyboards or a Windows start button on a phone and called it mobile computing.

Both times, the answer wasn't better brochures or smaller buttons. The answer was Google and Facebook. It was the iPhone and Android. New ideas, by companies that didn't try to replicate the old world - with all its constraints - on a new medium. They built something entirely new. They embraced the new medium and built what couldn't have existed before.

And now we're doing it again, with AI.

Every demo is a chatbot.
Every pilot is "adding AI to our existing workflow."
Every enterprise use case tries to optimize what already exists.

But this time, it kind of works. And that's the trap.

The web's brochureware visibly broke. The stylus on Windows Mobile was physically painful. But AI chatbots deliver just enough value to feel like progress.

We're settling for 10% of what's possible and calling it "revolutionizing the [industry of your choice]".

But the real inflection point isn't better support agents or travel booking chatbots. It's when we stop replicating what's already there and start building something that never existed before.

How a subtle MCP server bug almost cost me $230 a month

Dennis Traub — Wed, 11 Feb 2026 18:23:12 +0000

An important part of my job is to collect and distill feedback into recommendations for product and engineering teams. Sure, not quite as glamorous as traveling the world, but it's a lot of fun: I'm getting paid to experiment with brand new tech - and I get to directly influence the developer experience of our products.

But - just like every job - there's also a lot of routine work involved, including the boring type. And if there's anything my ADHD brain hates - with a passion! - it's boring routine work. And there's one thing right at the top of the list: processing tasks in a project management tool.

So I built an AI agent to help me: triage tasks, add context, draft comments, move items around - all through an MCP server.

As I said. Routine work. Boring and predictable. Right?

Right. Until I noticed how long it took to make what should have been simple updates.

Three Calls, Zero Updates

The MCP server has an update_task tool, and the agent called it with a custom_fields parameter. The server took the request, processed it, and returned success.

But when the agent continued, the custom fields were unchanged. It tried updating again - with a different format. Success. But nothing changed. Third attempt. Success. Still nothing.

Three success responses. Zero successful updates. And the API never told the agent that anything was wrong.

Tracking It Down

So after multiple failed attempts, the agent started investigating on its own. It checked whether it had the right access permissions, and whether it could use curl to bypass the MCP layer entirely.

curl worked, showing that the problem wasn't permissions. So it must be the tool itself.

After some more back and forth, the agent discovered that create_batch_request - a completely different MCP tool - is the only way to update custom fields. The update_task tool accepts the custom_fields parameter without complaint, but the parameter isn't actually in the tool's schema. The tool silently drops it, updates everything else, and returns a success message.

Maybe a small issue, if it happened only once.

But my logs showed 16 silently failed attempts across 7 tasks. The same cycle every time: try, get "success," see nothing changed, investigate, try again, finally find a workaround.

The agent kept hitting the same wall because the API never told it the wall existed.

A crash would have been so much better - the agent would see the error and immediately try a different tool. Instead, it got a success response and had to figure out through downstream verification that the "successful" call hadn't been successful after all.

Each time a new agent instance worked on a task, it went through the same process, wasting ~93K tokens just to figure out that there is a problem - and how to solve it. Learning wasn't possible, because there was no error to learn from.

Let's Look at the Math

Every number below comes directly from my session logs.

Metric	Value
Wasted tokens per failed attempt	~93,000
Average failed attempts per task	2.3
Cost per attempt (Claude Opus at $5/MTok)	~$0.47
Cost per task	~$1.08

Now imagine a 5-person team with 10 tasks per person per day.

Timeframe	Per person	Team of 5
Per day (10 tasks)	$10.80	$54
Per month (22 working days)	$238	$1,190
Per year	$2,856	$14,280
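The figures in both tables can be reproduced with a few lines of Python. One assumption here: the numbers appear to be rounded at each step (cents for per-attempt and per-task costs, whole dollars for the monthly figure), and the 12-month year is inferred from $238 x 12 = $2,856.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded(value: Decimal, step: str) -> Decimal:
    """Round half up to the given step, e.g. "0.01" for cents or "1" for dollars."""
    return value.quantize(Decimal(step), rounding=ROUND_HALF_UP)


TOKENS_PER_ATTEMPT = Decimal(93_000)
PRICE_PER_MTOK = Decimal("5")
ATTEMPTS_PER_TASK = Decimal("2.3")
TASKS_PER_DAY, WORKING_DAYS, MONTHS, TEAM_SIZE = 10, 22, 12, 5

cost_per_attempt = rounded(TOKENS_PER_ATTEMPT * PRICE_PER_MTOK / 1_000_000, "0.01")
cost_per_task = rounded(ATTEMPTS_PER_TASK * cost_per_attempt, "0.01")
per_day = cost_per_task * TASKS_PER_DAY
per_month = rounded(per_day * WORKING_DAYS, "1")
per_year = per_month * MONTHS

for label, amount in [("day", per_day), ("month", per_month), ("year", per_year)]:
    print(f"Per {label}: ${amount} per person, ${amount * TEAM_SIZE} for the team")
```

Without the intermediate rounding, the team's monthly figure comes out at $1,188 instead of $1,190, which doesn't change the conclusion.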

This is one parameter, in one MCP tool, on a single workflow. And even that scales really fast.

Two things are worth mentioning: this models the cost before the workaround is discovered. Once you know to use create_batch_request, the waste drops to zero. And it assumes every task hits the bug - which was true in my case, since every triage task needed custom field updates.

The point isn't the exact dollar figure. It's that silent failures delay discovery - possibly indefinitely.

A crash costs one attempt. The agent sees the error, adjusts, moves on. But silent acceptance costs multiple attempts, every time, until you realize there's a problem - if you realize it at all.

In software engineering, there's a concept called "graceful degradation". But if your API - MCP server or not - accepts parameters it can't handle and returns success, you're not being graceful. You're being expensive.

What You Can Do

If you build APIs or MCP tools: Add input validation. Reject or warn on unrecognized parameters. My data says that one check would have saved every token and every dollar in this story.
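A minimal sketch of that check in Python. The parameter set and the return shape are hypothetical, since the real tool's schema isn't listed here. What matters is that an unknown parameter produces an error instead of a success response.

```python
# Hypothetical schema; the real tool's parameters are not listed in this post.
UPDATE_TASK_PARAMS = {"task_id", "name", "notes", "completed"}


def update_task(**params):
    """Reject unknown parameters instead of silently dropping them."""
    unknown = sorted(set(params) - UPDATE_TASK_PARAMS)
    if unknown:
        raise ValueError(f"update_task does not accept: {', '.join(unknown)}")
    changed = sorted(set(params) - {"task_id"})
    return {"status": "success", "updated_fields": changed}


try:
    update_task(task_id="task-1", custom_fields={"priority": "high"})
    outcome = "success"
except ValueError as error:
    outcome = str(error)

print(outcome)  # update_task does not accept: custom_fields
```

With an error like this, the agent learns on the first attempt that it needs a different tool.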

If you build agents: Verify state after every state-changing call. Don't trust success responses - confirm the change actually happened. And make sure your agent surfaces anything it didn't expect.
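On the agent side, a read-after-write check could look like the sketch below. `SilentlyDroppingServer` is a hypothetical stand-in that mimics the behavior described above (it ignores custom_fields but reports success), and `update_and_verify` is an illustrative helper, not part of any MCP SDK.

```python
import copy


class SilentlyDroppingServer:
    """Hypothetical stand-in: ignores custom_fields but still reports success."""

    def __init__(self):
        self.tasks = {"task-1": {"name": "Triage", "custom_fields": {}}}

    def update_task(self, task_id, **changes):
        changes.pop("custom_fields", None)  # the silent drop
        self.tasks[task_id].update(changes)
        return {"status": "success"}

    def get_task(self, task_id):
        return copy.deepcopy(self.tasks[task_id])


def update_and_verify(server, task_id, **changes):
    """Apply an update, read the task back, and return the changes that did not stick."""
    server.update_task(task_id, **changes)
    actual = server.get_task(task_id)
    return {key: value for key, value in changes.items() if actual.get(key) != value}


server = SilentlyDroppingServer()
not_applied = update_and_verify(
    server, "task-1", name="Triage bug", custom_fields={"priority": "high"}
)
print(not_applied)  # {'custom_fields': {'priority': 'high'}}
```

A non-empty result is the signal to surface: the call "succeeded", but the state says otherwise.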

Some say we don't need to worry about architecture anymore, because AI agents are able to figure things out.

But I think it's the opposite: Software architecture principles become more important with AI, not less.

This is Part 1 of "The Inconsistency Tax" - a 3-part series on what happens when AI agents meet inconsistent APIs. Next: why these failures aren't random, and why "just wrapping an API in an MCP server" doesn't automatically make it agent-ready.
