Sasha Podlesniuk

Originally published at Medium

Making Coding Agent Verification Fast, Accurate, and Lightweight with Architectural Tests

When you build software with the help of an LLM, your workflow naturally splits into two complementary phases: guiding and checking.

Guiding is all about context-setting. You feed the model prompts, examples, architectural rules, and any tribal knowledge your team has accumulated over the years. You’re essentially teaching the LLM how things should be done.

Checking is where you verify the model’s output through human review, automated guardrails, or even an additional “verification prompt” that asks the LLM to critique its own work.

Today, let’s zoom in on this second phase and explore how architectural unit tests can rebalance an entire AI-assisted development workflow.

Architectural unit tests help ensure that a project is built in a consistent, sensible way. Instead of testing what the code does, they check that the code is put in the right folders, depends on the right things, and follows the patterns the team agreed on.

A Typical Setup: Layers, Rules, and the Chaos Between Them

Imagine a fairly standard architecture inside your org:

  • Controllers, Events, Background Jobs: Your main entry points.
  • Application & Domain layers: Where all your business logic lives.
  • Infra / Repositories / Persistence: The nuts and bolts behind the scenes.
  • A forest of internal rules: Dependency directions, DTO requirements, guidance on connecting controllers, and dozens of conventions born from real-world constraints.

This is the system your LLM must navigate, consistently, predictably, and without going rogue.

Naturally, the first instinct is:

Let’s just write all this down.

So you add everything into a markdown file. If you’re using Claude Code, maybe it’s CLAUDE.md. Eventually you introduce something like skills to structure that guidance better.

And you know what?

It does work… at first.

But as your rulebook grows, the cracks start to show.

When Guidance Starts Fighting Itself

Anyone who has used LLMs for code generation knows this moment well: the instruction you thought was perfectly clear gets ignored, or worse, overridden by another instruction you forgot you wrote.

Some real examples from my setup:

  • I had a long description explaining that our application must use CQRS (commands and queries) instead of Service classes. For example, instead of:
public class SomeController {
  private final UserService userService;

  public ResponseEntity<User> getUser(String userId) {
    return ResponseEntity.ok(userService.get(userId));
  }
}

Should be:

public class SomeController {

  private final Mediator mediator;

  public ResponseEntity<User> getUser(String userId) {
    return ResponseEntity.ok(mediator.send(new GetUserById(userId)));
  }
}
  • I added another block clarifying that all controller responses must use DTOs, never domain objects, for instance:
public class SomeController {
  private final Mediator mediator;

  public ResponseEntity<UserDto> getUser(String userId) {
    return ResponseEntity.ok(UserDto.fromUser(mediator.send(new GetUserById(userId))));
  }
}
  • Later, after issues popped up, I added rules forbidding cross-package communication within the application layer (no handler-to-handler commands):
public class GetUserByIdHandler implements Command.Handler<GetUserById, User> {
  private final Mediator mediator; // NOT ALLOWED

  public User handle(GetUserById query) {
    mediator.send(new AnotherCommand()); // NOT ALLOWED
    return null;
  }
}

Each addition fixed something… but also increased the cognitive load on both me and the LLM.

To make things more complex, in a typical mixed codebase, where the old implementation still coexists with the new one (Service classes live alongside CQRS), the model often sees both patterns and becomes unsure which one to follow.

Soon you’re in a loop of:

I’ll add one more rule; that should fix it.

Until the rulebook becomes a labyrinth.

Guidance Alone Isn’t Enough

At some point, scaling guidance becomes expensive both in tokens and in mental overhead. What you really need is a fast, deterministic way to verify output and give the LLM immediate feedback.

Ideally something:

  • Predictable (clear pass/fail)
  • Cheap (no giant prompt blocks)
  • Quick to run (so feedback can be automated)

And this is where an unexpected hero enters.

Using Architectural Tests as LLM Feedback Signals

Architectural testing tools like ArchUnit were originally designed to keep software engineers from violating architectural constraints.

Turns out, they are excellent at correcting LLMs, too.

Instead of endlessly describing rules in prose, you can:

  1. Encode the rule as an architectural test.
  2. Have the LLM generate code.
  3. Run the tests.
  4. Feed the failures back to the LLM.

Now the model isn’t guessing what you meant; it is responding to a precise, machine-enforced contract.

This shifts the system from:

LLM tries to remember everything

to

LLM generates a proposal → tests verify correctness → LLM fixes only what failed.

Suddenly, your guidance becomes lighter, your rulebook becomes smaller, and your architecture becomes much easier for the model to follow.

What Exactly Can We Enforce? A Practical Tour of Rules

Now let’s look at the kinds of architectural rules we can enforce and how they help us keep an LLM on the rails.

For the examples below, we’ll use a Spring Boot project paired with ArchUnit tests.

We’ll tackle the same challenges described in the earlier section.
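
All the rules below run against a shared set of imported classes. A minimal test scaffold might look like this (a sketch assuming the root package is com.example; adjust to your project):

import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import com.tngtech.archunit.core.importer.ImportOption;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.classes;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.methods;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ArchitectureTest {

  // Import production classes once; every rule below checks against this set
  private static final JavaClasses importedClasses = new ClassFileImporter()
    .withImportOption(ImportOption.Predefined.DO_NOT_INCLUDE_TESTS)
    .importPackages("com.example");
}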

1. Eliminating Service Packages (to Push CQRS)

One of the recurring issues was the LLM drifting back toward a traditional Service Layer approach, often by creating a brand-new service package out of thin air.

A simple rule shuts that down immediately:

@Test
@DisplayName("Classes should not reside in 'service' packages")
void noClassesInServicePackages() {
  noClasses()
    .should().resideInAPackage("..service..")
    .because("use 'application' package with mediator instead of 'service'")
    .check(importedClasses);
}

This is small but incredibly effective.

If the LLM tries to “helpfully” reintroduce Service classes, this test fails instantly, steering it back toward the CQRS pattern.

This was reinforced with one more constraint:

private static final String[] ALLOWED_ROOT_PACKAGES = {
  "..application..",
  "..controller..",
  "..config..",
};

@Test
@DisplayName("Classes must reside in allowed root-level packages only")
void classesResideInAllowedPackagesOnly() {
  classes()
    .should().resideInAnyPackage(ALLOWED_ROOT_PACKAGES)
    .because("enforces standard package structure: application, controller, config")
    .check(importedClasses);
}

This guards the root package from accumulating unexpected folders like utils, common, manager, or anything else the LLM might creatively invent.

2. Restricting the Controller Layer

Controllers should depend on only two kinds of things:

  1. DTOs: Plain transport objects with no DI annotations.
  2. Controller classes: Annotated with @RestController and extending your internal ApiController abstraction (which provides the mediator for sending commands/queries).

Anything else — domain objects, application handlers, repository types — is forbidden.
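
The ApiController abstraction referenced in these tests is assumed to look roughly like this (a hypothetical sketch; only the name com.example.ApiController appears in the rules):

package com.example;

// Hypothetical base class: hands every controller the mediator
public abstract class ApiController {

  protected final Mediator mediator;

  protected ApiController(Mediator mediator) {
    this.mediator = mediator;
  }
}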

You can enforce that with a rule like:

@Test
@DisplayName("Controller package should only contain DTOs and RestControllers")
void controllerPackageOnlyContainsDtosAndRestControllers() {
  noClasses()
    .that().resideInAPackage("..controller..")
    .and().areNotAnnotatedWith("org.springframework.web.bind.annotation.RestController")
    .should().beAnnotatedWith("org.springframework.stereotype.Service")
    .orShould().beAnnotatedWith("org.springframework.stereotype.Component")
    .orShould().beAnnotatedWith("org.springframework.stereotype.Repository")
    .because("non-controller classes must be plain DTOs without DI annotations")
    .check(importedClasses);
}

@Test
@DisplayName("RestControllers must extend ApiController")
void restControllersMustExtendApiController() {
  classes()
    .that().areAnnotatedWith("org.springframework.web.bind.annotation.RestController")
    .should().beAssignableTo("com.example.ApiController")
    .because("controllers must extend ApiController to access the mediator")
    .check(importedClasses);
}

These two constraints prevent the LLM from “accidentally” injecting half your system into the controller layer.

3. Enforcing CQRS in the Application Layer

All public classes in the application package should follow CQRS patterns:

  • Commands / Queries
  • CommandHandlers / QueryHandlers

Everything else should remain package-private.
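
For context, here is a hypothetical sketch of the Command, Handler, and Mediator contracts the rule below refers to (your actual signatures may differ):

// Command.java: a command or query carrying its result type R
public interface Command<R> {

  // Each handler processes exactly one command type
  interface Handler<C extends Command<R>, R> {
    R handle(C command);
  }
}

// Mediator.java: routes a command to its single handler
public interface Mediator {
  <R> R send(Command<R> command);
}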

A rule might look like this:

@Test
@DisplayName("Public classes in application layer must implement CQRS interfaces")
void applicationLayerPublicClassesMustImplementCqrs() {
  classes()
    .that().resideInAPackage("..application..")
    .and().arePublic()
    .and().areNotInterfaces()
    .should().beAssignableTo("com.example.Command")
    .orShould().beAssignableTo("com.example.Command$Handler") // nested type, hence the '$'
    .because("public classes must be Commands or Handlers - catches accidental UserService or OrderManager")
    .check(importedClasses);
}

This automatically catches cases where an LLM introduces a UserService or OrderManager, especially if it makes it public to expose it to controllers.
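
For instance, a hypothetical class this test would reject:

package com.example.application.user;

// Public, yet neither a Command nor a Handler: the rule fails the build
public class UserService {

  public String findUserName(String userId) {
    return "unknown";
  }
}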

4. Preventing Application → Application Communication

Another common violation: the LLM tries to send commands or queries between application packages, e.g., user handlers calling invoice handlers.

That breaks the vertical-slice structure.

ArchUnit allows us to forbid this and immediately suggest a fix:

@Test
@DisplayName("Application layer should not import Pipeline")
void applicationLayerShouldNotImportPipeline() {
  noClasses()
    .that().resideInAPackage("..application..")
    .should().dependOnClassesThat().haveFullyQualifiedName("com.example.Mediator")
    .because("Application handlers should not call other handlers via Mediator. FIX: consider using domain events")
    .check(importedClasses);
}

This turns a subtle architecture rule into a crisp, enforceable contract.
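
The suggested fix is to decouple slices through domain events. A minimal sketch using Spring's ApplicationEventPublisher (CreateUser, User, and UserCreated are hypothetical types):

package com.example.application.user;

import com.example.Command;
import org.springframework.context.ApplicationEventPublisher;

public class CreateUserHandler implements Command.Handler<CreateUser, User> {

  // Allowed: publishing an event, unlike injecting the Mediator
  private final ApplicationEventPublisher events;

  CreateUserHandler(ApplicationEventPublisher events) {
    this.events = events;
  }

  @Override
  public User handle(CreateUser command) {
    User user = User.create(command.name());
    // Other slices subscribe to UserCreated instead of receiving a command
    events.publishEvent(new UserCreated(user.id()));
    return user;
  }
}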

5. Ensuring Controllers Return DTOs Only

A bonus check validates that controller return types never leak domain entities or events:

@Test
@DisplayName("Controller methods should not return domain entities or events")
void controllerMethodsShouldOnlyReturnDtos() {
  ArchCondition<JavaMethod> notReturnDomainOrEvents = new ArchCondition<>("not return domain or event types") {
    @Override
    public void check(JavaMethod method, ConditionEvents events) {
      method.getReturnType().getAllInvolvedRawTypes().stream()
        .filter(type -> type.getPackageName().contains(".domain.") ||
          type.getPackageName().contains(".events."))
        .forEach(type -> events.add(SimpleConditionEvent.violated(
          method,
          String.format("%s returns forbidden type %s", method.getFullName(), type.getName())
        )));
    }
  };
  methods()
    .that().areDeclaredInClassesThat().areAnnotatedWith("org.springframework.web.bind.annotation.RestController")
    .and().arePublic()
    .should(notReturnDomainOrEvents)
    .because("controllers must return DTOs only - prevents leaking domain internals")
    .check(importedClasses);
}

This rule alone helps maintain API stability and prevents accidental exposure of internals.

And of course, this is only the beginning. You can add many more invariant checks such as naming conventions, allowed dependency flows, type usage rules, and other architectural guarantees that help keep the system consistent and easier for the LLM to navigate.

Real Feedback the LLM Receives

With these tests in place, mistakes become precise, actionable feedback:

CASE 1: LLM creates a service package with a service class

Feedback:

  • Unexpected package service detected
  • Only allowed root packages are A, B, C
  • Service classes are forbidden, use application

CASE 2: LLM adds a public Service inside application

Feedback:

  • Public class violates application-layer rules
  • Internal abstractions are allowed, but public Service classes are not

CASE 3: LLM sends a command from one application slice to another

Feedback:

  • Application → Application communication is not allowed
  • Use Domain Events instead of cross-slice commands

6. Preventing LLMs from Modifying the ArchUnit Tests Themselves

If you want coding agents to work autonomously, you need to guard the tests from being “helpfully fixed” by the model.

For example, a simple Claude Code hook can block edits to test files:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | grep -qE '/architecture/.*Test\\.java$' && exit 2 || exit 0"
          }
        ]
      }
    ]
  }
}

The hook reads the file path of every Edit or Write call and exits with code 2 for architecture test files; in Claude Code, exit code 2 from a PreToolUse hook blocks the tool call. Engineers can manually update the tests; the agent should not.

Handling Mixed Codebases with Freezing Rules

Many teams operate in a mixed codebase: part legacy, part modern. In our case, the new approach is CQRS, while the existing implementation still uses traditional Service classes. Fixing everything at once is not realistic, and exposing both implementations to the LLM creates confusion: the model sees two competing patterns and does not always pick the correct one.

Freezing Rules help stabilise this situation by letting you introduce new architectural constraints without breaking existing code or overwhelming the LLM with conflicting signals.

ArchRule rule = FreezingArchRule.freeze(
  noClasses().should().resideInAPackage("..service..") // your actual ArchRule
);
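
On its first run, a frozen rule records every existing violation in a violation store (plain text files by default) and afterwards fails only on new violations, so legacy Service classes stay green while newly generated ones are rejected. The store location can be configured in archunit.properties (illustrative values):

freeze.store.default.path=src/test/resources/frozen
freeze.refreeze=false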

Putting It All Together

This architecture-focused approach gives you:

  • A tight feedback loop
  • Deterministic, test-driven constraints
  • Cleaner prompts (smaller CLAUDE.md / SKILLS files)
  • A more predictable LLM-assisted workflow
  • Less “LLM drift” into legacy patterns

It will not verify everything; ensuring that a repository query uses the right index, for example, would require deeper static or runtime analysis. But it significantly reduces architectural mistakes while keeping guidance lean.

What’s Next?

ArchUnit also exposes metrics.

Next time, we can explore whether those metrics can help keep autonomously generated codebases:

  • clean,
  • analysable,
  • and maintainable

…for both software engineers and coding agents.
