DEV Community: Benjamin Cane

YOLO Is a Terrible Strategy for Validating Production Changes

Benjamin Cane — Fri, 08 May 2026 00:00:00 +0000

YOLO is a terrible strategy for validating production changes.

How many times have you seen it?

Your platform is running smoothly. No alerts, no issues. Then suddenly, something breaks.

After digging in, you discover the cause: another system you depend on made a change, and that change broke your platform.

They didn’t notice it broke. You did, much too late…

How many times have you been the cause of another platform breaking?

🥶 Cold Reality

I wish the above scenario were rare, but it happens constantly across the technology industry.

It happens between internal teams, third-party integrations, and shared infrastructure teams.

These scenarios make you wonder, “How was that change validated?”

Maybe they tested it, and their validation had gaps. Maybe they did little validation at all. If any.

Either way, the result is the same: they validated their change with 100% of production traffic. Bad plan.

💡 Better Ways to Validate Changes

There are many ways teams can reduce production risk when rolling out changes, and the best teams combine the following approaches.

Canary Releases 🐤

I talk about canary deployments often.

Instead of moving 100% of traffic at once, move small percentages gradually and observe behavior closely.

That observed part matters. Look at error rates, latency changes (beyond normal platform warmup), resource spikes, and unexpected retries. All of these indicate customer impact.

Canary deployments are one of the best ways to reduce the blast radius of changes, identify problems quickly, and self-correct.

Shadow Traffic 🪞

Traffic mirroring sends production traffic to a new version before routing live traffic there.

Responses are ignored, but you observe behavior and monitor the same signals you would with a canary release without sacrificing a customer request.

Synthetic Traffic 🤖

Synthetic traffic simulates user behavior continuously. It’s great for monitoring customer experience, but also a great way to validate new deployments.

Route synthetic traffic to upgraded instances first and verify behavior before moving real traffic. If it fails with synthetic traffic, it likely won’t survive real traffic.

Smoke Tests 😶‍🌫️

The classic approach. After deployment, run a small set of fast tests to confirm the platform is fundamentally working.

Smoke tests don’t need to be fancy; they can be shell scripts, API calls, read-only requests, a test file, or full end-to-end validation.

Their purpose is simple: to quickly catch obvious breakage.

🧠 Final Thoughts

Don’t think of the above methods as mutually exclusive choices. Combine them.

Some platforms I work on combine canary releases, shadow traffic, and synthetic traffic. Others use smoke tests plus canary releases.

The more layers of validation you have, the more likely you are to catch issues before your customers do. Because having your customers validate changes for you is a poor strategy.

Deterministic routing is one of the most effective ways distributed systems reduce consistency problems at scale

Benjamin Cane — Fri, 01 May 2026 00:00:00 +0000

Deterministic routing is one of the most effective ways distributed systems reduce consistency problems at scale.

It is a foundational technique used by many modern databases, caches, and large-scale platforms. Understand how it works and you can apply the same pattern in your own systems.

🤔 Understanding the Problem

At some point, every successful system hits the limits of a single database instance.

A single server can only handle so many connections, queries, writes, storage capacity, or CPU/memory demands. Even with the best hardware, performance eventually degrades. So systems scale horizontally.

Instead of sending all traffic to a single database server, requests are distributed across multiple nodes.

At the same time, resiliency matters. If one server fails and all data resides there, the outage can be severe.

So modern databases spread data across multiple nodes, availability zones, and regions.

Distributing load and data solves both capacity and resiliency problems. But it introduces another challenge.

How do you keep request behavior consistent when data is distributed across multiple systems?

⚠️ Why Replication Is Not Enough

Replication helps, but it does not solve every consistency problem.

Imagine a write lands on Server 1. Immediately after, a read request for the same data lands on Server 67. Will Server 67 have the latest version? Maybe, but often not.

Asynchronous Replication

With asynchronous replication, Server 1 will accept the write and replicate the data to other servers in the background. That means a follow-up read on any other node may return stale data.

Synchronous Replication

With synchronous replication, the write on Server 1 will wait for an acknowledgment from all replicas before returning a success. While this improves consistency guarantees, it increases latency.

The farther apart a replica is, the worse this gets. Local writes may be fast, but cross-region writes will be slow. Plus, is it really feasible to replicate data across every single node?

So the question becomes: How do you preserve consistency, without paying latency taxes?

🔀 Route Requests to the Data

A highly effective answer is deterministic routing.

Instead of moving data to where requests might land, move requests to where the data already exists.

If requests for the same key can go to the same node, you gain predictable ownership, reduced stale reads, lower coordination overhead, and easier horizontal scaling.

👨‍🏫 How Deterministic Routing Works

At a high level, the system needs a repeatable way to decide where requests should go.

A common approach is hashing.

A hash of user123 always goes to Node 7
A hash of user456 always goes to Node 42

As long as the same key produces the same result, requests can be consistently routed to the same owner. Many modern databases implement deterministic routing through techniques like consistent hashing, partition maps, and shard ranges.

🗺️ Where Routing Logic Lives

Different systems solve routing in different places.

Client-side Routing

The client library knows the partition map and sends requests directly to the correct node. Used by many distributed caches and databases.

Proxy / Router Tier

A small router sits in front of nodes and forwards traffic appropriately. Useful when client behavior cannot be influenced.

Server-side Forwarding

Requests land anywhere, and the receiving node forwards internally to the owning node. Simple for clients, doesn’t introduce a proxy failure point, but introduces complex cluster discovery/health monitoring.

Each model has tradeoffs.

🧰 Routing Does Not Replace Replication

Deterministic routing is powerful, but not magic. What happens when the owning node is down? You still need replication.

Modern databases combine both: deterministic routing for performance and ownership, plus replication for durability and failover.

🧠 Why This Matters Beyond Databases

Distributed databases use this approach, but it is not unique to them.

Deterministic routing can be used to solve: session ownership, user affinity, in-memory workflow coordination, work queue partitioning, and more.

I’ve used deterministic routing many times to solve load distribution and consistency problems.

At scale, the answer is not always more/better hardware. Consistency and availability problems are not always solved with replication alone.

Sometimes the best answer is simply to send the request to the right place.

When you think of microservices, you probably think of centralized shared services. But there's another valid pattern that is...

Benjamin Cane — Fri, 24 Apr 2026 00:00:00 +0000

When you think of microservices, you probably think of centralized shared services. But there’s another valid pattern that is rarely discussed: running the same microservice inside multiple platforms.

🧩 How It Usually Works

Most microservice designs follow the same model:

Break systems into capabilities, teams, or functions
Deploy one shared service for each capability
Any platform that needs it calls that centralized service

That works well for many cases, but it’s not the only model.

🏗️ How We Got Here

Before microservices, many organizations used Service-Oriented Architecture (SOA).

Despite being labeled as antiquated, SOA and microservices are not that different. Both break down systems into capabilities that communicate with each other. The biggest difference is scope.

In SOA, a “Payments Service” might own:

Message parsing
Validation
Balance checks
Currency conversion
Settlement logic

While other SOA services would own “Users” or “Accounting”. Today, that payment service would be considered an entire platform, with each of those capabilities implemented as microservices within that domain.

Microservices are often the same idea as SOA, just at a more granular level.

🎯 Why Centralization Became the Default

One reason microservices gained traction was the need to avoid duplication. Capabilities were often rebuilt across multiple systems. For example, Currency Conversion is needed in Payments, Accounting, and many other platforms.

Duplication is not just wasteful, it creates real problems: logic drift, coordination overhead, and inconsistent outcomes across systems. Packaging that capability as a standalone service solved real problems: build once, reuse everywhere.

⚠️ The Downside of Centralization

In cell-based architectures, platforms are usually designed to be self-contained and failure-isolated. That means a mission-critical platform depending on a centralized service shared by other platforms can become a design smell.

Cross-cell dependencies
Added latency
Shared failure domains
Complex failover scenarios

So teams, once again, solve these problems by rebuilding the same capability locally.

🔁 Another Option

Instead of rebuilding the capability each time, deploy the same microservice codebase inside multiple platforms. If both Payments and Accounting need a currency conversion service, deploy the same service within each platform.

It’s the same codebase and capability, but with local ownership and resilience. You get reuse without forced centralization.

🧪 Caveats from Experience

This pattern works when applied carefully.

1️⃣ Strong Ownership

A shared codebase still needs a clear owning team. Others can contribute, but someone must own quality, roadmap, and releases.

2️⃣ Pick the Right Capabilities

Not everything is a great fit. Something like currency conversion is well-scoped, relatively stateless, and doesn’t have unique business logic based on which platform is calling it. It’s a strong example.

But other services that have unique logic for each platform domain or require consistency across different platforms are less of a fit.

3️⃣ Operational Discipline

Using the same codebase doesn’t automatically solve all problems; you can still run into drift across platforms if each is running a different version. Changes in behavior still sometimes need coordination.

But with a single codebase, these issues are far easier to address.

💭 Final Thoughts

Microservices gave us reusable building blocks. Sometimes the best use of a microservice is not one centralized deployment. Sometimes it’s many local deployments of the same capability.

Just reuse the software while maintaining autonomy.

Are you using traffic mirroring in production? If not, try it out.

Benjamin Cane — Fri, 17 Apr 2026 00:00:00 +0000

Are you using traffic mirroring in production? If not, you might be missing one of the safest ways to test and observe production changes.

🚦 What is Traffic Mirroring?

Traffic mirroring in Istio or Envoy Proxy lets you send a copy of live traffic to a secondary target.

When enabled, traffic to /service routes to cluster1 as normal, and a mirrored copy is sent to cluster2.

The key: mirrored traffic is fire-and-forget. Responses are ignored and never impact the primary request.

🧪 Why It’s Powerful

1️⃣ Shadow Traffic for Safe Testing

The most common use case is shadow traffic.

When migrating platforms or deploying a new version of an application, you can send real traffic to the new system, observe behavior, and validate responses.

All without impacting users. No risky cutovers. You see exactly how the new system behaves under real load.

2️⃣ Out-of-Band Traffic Inspection

Another powerful use case is traffic inspection.

Inline inspection is risky. It adds latency, introduces new failure points, and becomes part of the critical path.

With traffic mirroring, you can inspect traffic, analyze requests, and detect anomalies.

All without impacting the primary path.

😶‍🌫️ Reality Check

It’s not perfect. There is some overhead.

Mirroring adds load to the sidecar, which may or may not be acceptable for your system. In my experience, it’s negligible, but it’s something you should measure in your own environment before deploying to production.

🧠 Final Thoughts

Traffic mirroring is one of the safest ways to validate migrations, test new systems, and observe real production behavior.

The hard part isn’t mirroring traffic. It’s running two production systems in parallel. That’s the real cost, and the real tradeoff.

But if you can afford that cost, traffic mirroring is an incredibly powerful tool.

If you want to dig deeper:

Istio traffic mirroring docs explain the workflow.
Envoy request mirror policy docs cover the lower-level routing behavior.

Agent Skills Are Becoming the Best Way to Capture Institutional Knowledge

Benjamin Cane — Fri, 10 Apr 2026 00:00:00 +0000

Use Agent Skills to capture institutional knowledge and make it usable by coding agents.

Every organization has institutional knowledge.

Internal frameworks
Preferred practices
Platform-specific capabilities

It exists everywhere. But it’s often undocumented… or buried in a wiki no one reads.

As coding agents take on more work, this problem gets worse.

If you ask an agent to build a new service, you want it to use your internal framework, follow your patterns, and respect your organizational constraints.

A human engineer would ask questions. An agent won’t, unless you give it that context.

📚 Agent Skills as Knowledge Distribution

Most people think about Agent Skills as actions:

Convert markdown to PDF
Review this pull request
Commit my changes

But the more interesting use case is guidance.

Skills aren’t just for doing things. They’re for shaping agent output.

Agents discover and use skills based on intent.

If a user asks: “Create a new Python service.”

The agent looks for relevant skills:

Language conventions (PEP 8, etc.)
Internal frameworks
Organizational standards

That’s where institutional knowledge belongs.

Instead of hoping engineers remember to tell the agent:

“We use Flask, not Django.”
“Stick to the standard library.”
“Follow this service layout.”

You capture that into a skill. The agent applies it automatically.

🧠 Why This Matters

Institutional knowledge only works if it's:

Discoverable
Applied consistently

Agent Skills give you both.

They turn tribal knowledge into something agents can find, understand, and use.

⚠️ The Tradeoff (For Now)

Right now, this introduces duplication.

Most teams already have internal docs, style guides, & wikis.

And now you’re putting the same information into skills. Which feels like extra work.

But it poses an interesting question:

As agents become the primary interface… Will engineers read the wiki? Or ask the agent?

🧠 Final Thoughts

As agents take on more of the implementation work, where you store knowledge becomes more important. Making that knowledge accessible to agents becomes essential.

Agent Skills aren’t just automation tools.

They are becoming the interface for standards, practices, and institutional knowledge.

And teams that embrace that early will see more consistent output from both humans and agents.

Saved Prompts Are Dead. Agent Skills Are the Future.

Benjamin Cane — Fri, 03 Apr 2026 00:00:00 +0000

Saved prompts are dead. Agent Skills are the next step.

If you’ve been around for a while, you probably have a file full of bash one-liners.

Small scripts or commands you saved because they solved a problem you didn’t want to automate properly.

When coding agents arrived, prompts became the new one-liners.

Useful prompts were saved, reused, and eventually turned into “prompt files”, then slash commands like /do-something.

But that model has already evolved.

⚙️ Agent Skills

Agent Skills are the next iteration.

At a basic level, a skill looks a lot like a saved prompt: a directory with a markdown file.

What makes it different is how it’s used.

Skills include metadata like name and description, allowing agents to discover them.

Instead of explicitly calling a prompt every time, the agent can determine when to use a skill based on intent.

This is referred to as progressive disclosure:

Agent loads skill metadata
Matches it to your task
Then loads and executes the full skill when needed

You can still call skills directly (/, $, @), but you don’t always have to.

🧠 More Than Just Prompts

The real differentiator is that skills aren’t just prompts.

They can include reference documentation, templates, and scripts.

This means you’re no longer just telling the agent what to do.

You’re giving it tools and context to execute and validate tasks.

For more complex workflows, it’s often easier to write a script and teach the agent how to use it than to encode everything in a prompt.

⚠️ A Word of Caution

This power comes with risk.

Skills can include executable logic and tell agents to perform tasks.

That means a shared skill can contain malicious or unsafe behavior.

Treat them like any script you install:

Understand what they do
Know where they come from
Review before using (watch out for hidden text or obfuscated instructions)

🧠 Final Thoughts

Agent skills are a meaningful step forward.

They let you codify workflows, preferences, and repeatable agent tasks in a way that agents can discover.

They’re a strong productivity accelerator and a powerful way to capture institutional knowledge in a form agents can actually use.

(More on that in the next post.)

Generating Code Faster Is Only Valuable If You Can Validate Every Change With Confidence

Benjamin Cane — Fri, 27 Mar 2026 00:00:00 +0000

Generating code faster is only valuable if you can validate every change with confidence.

Software engineering has never really been about writing code. Coding is often the easy part.

Testing is harder, and many teams struggle with it.

As tools make it easier to generate code quickly, that gap widens. If you can produce changes faster than you can validate them, you eventually create more code than you can safely operate.

Which begs the question: What does good testing actually look like?

🔍 What Good Looks Like

One of the biggest challenges I see is that teams struggle to understand what “good” testing means and never define it.

Pipelines are often built early in a project, when the team is small, and they rarely keep pace with the system and organization as they grow.

My starting principle is simple:

At pull request time, you should have strong confidence that the change will not break the service or platform being modified.
Within a day of merging, you should have strong confidence that the change hasn’t broken the full customer journey that the platform supports.

🔁 On Pull Request

For backend platforms, I like to see three levels of automated testing before merging.

Code Tests (Unit Tests)

This level is the foundation. Unit tests validate internal logic, error handling, and edge cases. Techniques such as fuzz testing and benchmarking also reveal issues early. As the test pyramid tells us, this is where the majority of testing and logic validation should take place.

Service-Level Functional Tests

Too many teams stop at unit tests for pull requests. Functional tests should also be run in CI for every pull request.

Services should be tested in isolation with functional tests. Dependencies can be mocked, but things like databases should ideally run for real (Dockerized).

This is where API contracts are validated and regressions can be identified without wondering whether the issue came from this change or another service.

Platform-Level Functional Tests

Testing a service alone isn’t enough. Changes can break upstream or downstream dependencies. Platform-level tests spin up the entire platform in CI and validate that services interact correctly.

These tests ensure the platform continues to work as a system.

For platforms with strict latency or resiliency requirements, I recommend introducing light stress tests at both the service and platform levels. These aren’t full performance tests, but they act as early indicators of performance regressions.

If these three layers pass, you should have high confidence in the change. But not complete confidence.

🌙 Nightly Testing

Some failures take time to appear.

Memory leaks, performance degradation, and cross-platform integration issues may not show up immediately.

That’s why I like to run a nightly build (or every few hours).

This environment runs end-to-end customer journey tests, performance tests, and chaos tests.

These are typically the same tests used during release validation, but running them continuously accelerates feedback. If something breaks, you learn about it early, before the pressure of a release.

🧠 Final Thoughts

There is no universal approach everyone can follow.

Different systems have different needs; mission-critical systems may focus heavily on correctness and resilience. Non-mission-critical systems may focus more on validating core functionality.

Your testing strategy depends heavily on architecture, dependencies, and operational constraints. But if your organization is increasing its ability to generate code quickly, your testing capabilities must evolve at the same pace.

AI-generated code becomes much easier to review when you already have high confidence in your testing.

When You Go to Production with gRPC, Make Sure You’ve Solved Load Distribution First

Benjamin Cane — Fri, 20 Mar 2026 00:00:00 +0000

When you go to production with gRPC, make sure you’ve solved load distribution first.

I was recently talking with another engineer who is rolling out gRPC into production. He asked what the biggest gotchas were.

My first answer: Load Distribution.

🚦 HTTP/1 vs. HTTP/2

Most teams first implement services using REST over HTTP/1 and then migrate to gRPC as they seek its performance benefits.

That shift introduces a subtle but important change in how traffic gets distributed across instances.

With HTTP/1, requests are generally tied closely to connections. A client opens a connection, sends a request, waits for the response, and then sends another (if connection re-use is enabled).

HTTP/2 (which underpins gRPC) works differently.

HTTP/2 multiplexes requests over persistent connections. A client can send many requests over the same connection without waiting for responses.

This is one of the reasons gRPC provides a performance boost, but it can create unexpected load distribution issues.

If your infrastructure isn’t built for an HTTP/2 world, you’ll quickly find traffic becoming unevenly distributed.

🏗️ Infrastructure Support

In an HTTP/1 world, load balancing at the connection (Layer 4) level often works well enough. But with HTTP/2, connections live much longer and carry far more concurrent traffic.

If your load balancer distributes traffic based only on connections, a busy client may hammer a single instance while others sit idle.

Unfortunately, much of the infrastructure still doesn’t fully support HTTP/2-aware load balancing.

Depending on your environment, your load balancers or ingress controllers may operate primarily at Layer 4. That works fine for HTTP/1, but once you introduce HTTP/2 via gRPC, the effectiveness changes significantly.

⚙️ Supporting gRPC

To get the most out of gRPC, the best approach is to use infrastructure that understands HTTP/2 and load-balances requests rather than just connections.

If that’s not possible, another option is client-side load balancing.

Many gRPC clients support opening a pool of connections and distributing requests across them. You still benefit from HTTP/2’s persistent connections, but you avoid concentrating all traffic on a single backend instance.

🧠 Final Thoughts

gRPC offers many advantages, including performance, strongly typed contracts, and efficient communication. But it also introduces different networking behavior.

If you’re rolling out gRPC into production, make sure your load balancing infrastructure is ready for an HTTP/2 world.

You may be building for availability, but are you building for resiliency?

Benjamin Cane — Fri, 13 Mar 2026 00:00:00 +0000

You may be building for availability, but are you building for resiliency? Many teams design for availability. Far fewer design for resiliency.

A concept that took me a while to really grasp is that building highly available systems and highly resilient systems is not the same thing.

The difference is how the system reacts to failure.

🚄 High Availability

When you build for high availability, the goal is simple: ensure there is always another path.

If something fails, traffic can be redirected somewhere else.

For example, a service might run across multiple availability zones or regions. If one fails, traffic is routed to another.

Detecting failures and redirecting traffic are core elements of building for high availability.

Availability is about rerouting traffic when something fails.

🚂 High Resiliency

Building for resiliency is different.

The solution to failure isn’t another path; it’s how the system handles the error.

When a dependency fails, the decision becomes:

Do we retry? Do we continue without that dependency? Do we degrade functionality? Do we stop processing altogether?

Resiliency is about defining what happens when things go wrong.

Sometimes you can continue processing. Sometimes you can defer work and fix it later.

Resiliency is absorbing failure instead of avoiding it.

🧩 A Simple Example

When you design systems with resiliency in mind, you tend to treat dependencies differently.

A simple example is configuration.

Many systems use distributed configuration services so that runtime behavior can change without redeployment.

But that configuration service then becomes a dependency. To avoid turning it into a hard dependency, many systems cache the configuration in memory.

When updates occur, the system fetches the new configuration and switches only after it’s fully loaded into memory.

If configuration refresh fails, the system continues operating with the last known configuration. Transient failures don’t bring the system down.

That’s resiliency.

🧠 Final Thoughts

When I talk about non-functional requirements, you’ll hear me say:

“Highly available and resilient systems”

I separate them intentionally because the approaches are different.

Availability ensures there is always another path. Resiliency ensures the system can continue operating when failures occur.

Availability routes around failure. Resiliency survives failure. You need both.

When your coding agent doesn’t understand your project, you’ll get junk

Benjamin Cane — Fri, 06 Mar 2026 00:00:00 +0000

When your coding agent doesn’t understand your project, you’ll get junk.

Junk in, junk out.

One of the best ways to get more from agentic coding tools is to give the agent context.

The more an agent understands your project, the better its work will be.

If you ask an agent to add a method to a class, it will. It might read the file. It might infer some structure. But it won’t understand the project's intent.

If you asked a human engineer to make the same change, they would have questions.

What is the purpose of this project? How is it used? What constraints exist?

If they skipped that step, you’d get exactly what you asked for, even if it was wrong.

That’s the same challenge many face with coding agents. A lack of context means it only does what it’s told — which isn’t always what you actually need.

But when it understands a project, it operates with far more clarity.

🧙‍♂️ My “Old School” Method

Before I start serious work with an agent, I have it learn the project.

Read the docs 📚 Review the codebase ⚙️ Understand the architecture 🏙️ Learn how to build, test, and run the project locally 👩‍🔧

I even ask the agent to summarize its understanding back to me.

This started as a saved prompt, turned into a slash command, and is now a skill.

This step is a huge productivity boost.

🤖 Agents Files (`AGENTS.md`)

Over the past year, an open standard for providing agents with structured context has emerged.

Instead of prompting the agent to rediscover your project every time, document that context once — and the agent will reference it going forward.

Most modern agents support an Agents.md file and reference it during each interaction.

💽 What Goes in an Agents File?

Think of the Agents file as onboarding documentation, but for an agent.

Project context:

Purpose
Architecture
Layout
CI/CD instructions

Team context:

Code style preferences
Testing philosophy (TDD or YOLO)
Tech stack constraints

Any tribal knowledge you’d expect a new team member to learn belongs in an Agents file.

👨‍💻 Personal Agent Files

Many tools also support a personal Agents file in your home directory.

That’s where your workflow preferences live. Are you a two-space tabs person? Do you want your agent to prefer table tests?

If you have preferences you want to apply to every project, but are unique to you, they go in the personal Agents file.

🧠 Final Thoughts

Using an Agents file dramatically improves agent quality.

Even then, I still use my “learn-this” slash command — sometimes that extra context makes a difference.

If you wouldn’t drop a new engineer into a project without context, don’t do it to your agents.

You can have 100% Code Coverage and still have ticking time bombs in your code. 💣

Benjamin Cane — Fri, 27 Feb 2026 00:00:00 +0000

You can have 100% Code Coverage and still have ticking time bombs in your code. 💣

I was listening to a team recently, and an engineer was discussing how a coding agent added additional tests to a project that already had 100% code coverage.

The conversation reminded me that coverage is directional and often mistaken for quality. Just because your coverage shows 100% doesn’t mean your software is fully tested.

👨‍🏫 Understanding How Coverage Is Measured

Code Coverage measures the percentage of executable lines that run during code tests. Executed doesn’t mean well-tested.

Just because every function runs doesn’t mean it’s free of logic errors or safe.

😃 Happy Path Testing

A common challenge teams face with testing is focusing too much on the happy path.

Suppose you have a function that accepts an array. In your tests, you always pass 5 elements — because that’s the expected usage. Coverage shows all branches executed. You’re good, right?

What happens if you pass 4 elements? Or 0?

If you never test fewer than 5, how do you know? You may say: “But wait, it’s only ever called with 5 elements.” That may be true, for now.

⚠️ Protecting Against Your Future Self

Code is rarely static; someone will come along and change things. That might be you, it might be someone else.

Eventually someone changes that function. Will they add tests for new edge cases? Maybe. Assume they won’t.

When you write tests, don’t just focus on how you know a function is going to be used; also include tests that misuse the function.

Rather than sending an array with 5 elements, send one with 4, 0, and send a nil value.

Rather than sending strings that match an expected pattern, send junk that doesn’t.

Does the function still behave correctly? Should it?

The more you test outside the happy path, the more resilient your code becomes — and the less likely it is to break later.

🧠 Final Thoughts

Code coverage is a guide, don’t let it give you false confidence. Test the happy path, and the unexpected ones. Validate function outputs against the input you provide.

100% Coverage is easy. Writing reliable code is not.

Getting More Out of Agentic Coding Tools

Benjamin Cane — Fri, 20 Feb 2026 00:00:00 +0000

Are you getting the most out of Agentic Coding Tools?

Software engineering is changing fast.

Agentic coding tools became widely available last year, and if you’re not using them today, you’re already behind. But many still struggle to move beyond the “fancy chat” experience.

Just like any tool in our engineering tool belts, knowing how to use it effectively matters.

🤖 Agents Are More Than A Better Chat

Last year, most were using tab-complete with a useful chat interface where you could ask questions, get suggestions, and maybe copy/paste into your code.

But agents can do much more than make suggestions — they can understand your codebase and act.

Instead of asking an agent:

“Can you suggest additional tests?”

Tell your agent:

“Create additional test cases, then run make tests and validate they pass.”

An agent can create tests, run them, inspect failures, adjust the implementation, and re-run the suite until it passes.

This isn’t about suggestions anymore; agents have more autonomy.

I think of coding agents as assistants working toward a shared goal. They do some work, you do some, and you iterate together.

🏆 Moving from Direction to Outcomes

A big mental shift is moving away from simple directions to defining an outcome with guidance & guardrails.

Agents don’t just perform a single task; they can execute multiple steps (and even parallelize them). You don’t need to spoon-feed each directive one by one.

Instead, define the outcome you want, along with guidance and guardrails.

The clearer you are on the outcomes, constraints, and context around what you are trying to do, the better the agent will perform.

📋 Examples: Real-world tasks I’ve asked Agents to handle

“Using the existing DB Driver X as a reference, create a set of table tests for driver Y. The tests should be structured similarly to the existing driver, surface any logic issues, concurrency issues, and act as a clear insurance against the defined interface.”

“Update CI workflows to Go 1.26.0, find and update any references to 1.25.6, then run tests to ensure everything still builds and passes”

I also use agents for mundane work like git commits and opening pull requests. They consistently produce better commit messages and PR descriptions than I would.

Agents don’t always get it exactly right, but with a bit of feedback and occasional adjustment, you can get a lot done quickly.

Avoid going down the rabbit hole of endless refinement, sometimes it’s better to reset with a clearer prompt.

👨‍🏫 Context is Key

If you want the best results from agents, you need to give them context.

Before I do serious work on a project, I have the agent:

Read the Docs 📚
Review the Architecture 🏙️
Understand the Project Structure 📐
Understand how to build, test, and run the application locally 👩‍🔧

The same steps that a human would take. Agents are no different.

(I’ll dive deeper into Agent files, skills, and effective ways to provide more context in a future post)

🧠 Final Thoughts

Engineers are doing amazing things with agents, and new capabilities are being added daily. But you don’t need to be at the bleeding edge to get more out of them (I certainly am not).

Don’t worry about the hype. Understand what these tools can do, making small adjustments in how you use them can drastically change what you get back.

DEV Community: Benjamin Cane

YOLO Is a Terrible Strategy for Validating Production Changes

🥶 Cold Reality

💡 Better Ways to Validate Changes

Canary Releases 🐤

Shadow Traffic 🪞

Synthetic Traffic 🤖

Smoke Tests 😶‍🌫️

🧠 Final Thoughts

Deterministic routing is one of the most effective ways distributed systems reduce consistency problems at scale

🤔 Understanding the Problem

⚠️ Why Replication Is Not Enough

Asynchronous Replication

Synchronous Replication

🔀 Route Requests to the Data

👨‍🏫 How Deterministic Routing Works

🗺️ Where Routing Logic Lives

Client-side Routing

Proxy / Router Tier

Server-side Forwarding

🧰 Routing Does Not Replace Replication

🧠 Why This Matters Beyond Databases

When you think of microservices, you probably think of centralized shared services. But there's another valid pattern that is...

🧩 How It Usually Works

🏗️ How We Got Here

🎯 Why Centralization Became the Default

⚠️ The Downside of Centralization

🔁 Another Option

🧪 Caveats from Experience

1️⃣ Strong Ownership

2️⃣ Pick the Right Capabilities

3️⃣ Operational Discipline

💭 Final Thoughts

Are you using traffic mirroring in production? If not, try it out.

🚦 What is Traffic Mirroring?

🧪 Why It’s Powerful

1️⃣ Shadow Traffic for Safe Testing

2️⃣ Out-of-Band Traffic Inspection

😶‍🌫️ Reality Check

🧠 Final Thoughts

Agent Skills Are Becoming the Best Way to Capture Institutional Knowledge

📚 Agent Skills as Knowledge Distribution

🧠 Why This Matters

⚠️ The Tradeoff (For Now)

🧠 Final Thoughts

Saved Prompts Are Dead. Agent Skills Are the Future.

⚙️ Agent Skills

🧠 More Than Just Prompts

⚠️ A Word of Caution

🧠 Final Thoughts

Generating Code Faster Is Only Valuable If You Can Validate Every Change With Confidence

🔍 What Good Looks Like

🔁 On Pull Request

Code Tests (Unit Tests)

Service-Level Functional Tests

Platform-Level Functional Tests

🌙 Nightly Testing

🧠 Final Thoughts

When You Go to Production with gRPC, Make Sure You’ve Solved Load Distribution First

🚦 HTTP/1 vs. HTTP/2

🏗️ Infrastructure Support

⚙️ Supporting gRPC

🧠 Final Thoughts

You may be building for availability, but are you building for resiliency?

🚄 High Availability

🚂 High Resiliency

🧩 A Simple Example

🧠 Final Thoughts

When your coding agent doesn’t understand your project, you’ll get junk

🧙‍♂️ My “Old School” Method

🤖 Agents Files (AGENTS.md)

💽 What Goes in an Agents File?

👨‍💻 Personal Agent Files

🧠 Final Thoughts

You can have 100% Code Coverage and still have ticking time bombs in your code. 💣

👨‍🏫 Understanding How Coverage Is Measured

😃 Happy Path Testing

⚠️ Protecting Against Your Future Self

🧠 Final Thoughts

Getting More Out of Agentic Coding Tools

🤖 Agents Files (`AGENTS.md`)