Sasha Podlesniuk

Originally published on Medium

From Prompts to Invariants: Re-architecting Systems with ArchUnit and LLMs

Architectural drift is a slow tax on every codebase.

Layers blur. Boundaries erode. Patterns decay. Over time, what started as intentional design becomes accidental structure.

AI coding agents and autonomous loops promise a way out: give the agent a task, let it iterate, and stop when it “looks done”. This works well for local refactors and mechanical changes.

But architectural change is different.

Architecture is not a one-off transformation.

It is a target state defined by invariants.

What if we could run an autonomous loop, but instead of relying on natural language prompts to guide it, we defined the desired architectural outcome as executable constraints?

That is where ArchUnit becomes the control mechanism for autonomous architectural change.

Reframing Autonomous Loops

In the Ralph Wiggum autonomous loop technique — a lightweight pattern for running LLM agents in self-correcting cycles — an agent repeatedly iterates on a task with minimal human intervention. The loop typically looks like this:

  1. Give the agent a task description
  2. Let it modify the code
  3. Evaluate the result (tests, output, heuristics)
  4. Repeat until a stop condition is met

The weak point is step 3.

For architectural tasks, “did it work?” is rarely a binary answer. The agent must infer intent from prompts and guess whether the desired architectural state has been reached.

We can keep the loop, but replace the stop condition.

Instead of asking:

Does this look architecturally correct?

we ask:

Do the architectural tests pass?

When the stop condition is defined by ArchUnit tests, the loop gains a deterministic, machine-checkable exit criterion.

Architecture as the Loop Condition

In this model, ArchUnit tests define the desired architectural change, not just static restrictions.

The loop looks like this:

  1. Write an ArchUnit test that expresses the target architecture
  2. Run the test suite (it fails)
  3. Ask the agent to make the tests pass
  4. Repeat until all architectural tests are green

This is an autonomous loop, but one that converges on explicit invariants rather than emergent behavior.

The agent is not discovering architecture.

It is satisfying constraints.
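
In practice the outer loop is a small piece of glue code wrapped around the test run. Below is a minimal driver sketch, assuming a Maven build and a hypothetical agent CLI that edits the working tree; both are placeholders for whatever build tool and coding agent you actually use.

import java.io.IOException;

public class ArchitecturalLoop {

    public static void main(String[] args) throws IOException, InterruptedException {
        int maxIterations = 10; // safety valve so a stuck agent cannot loop forever

        for (int i = 0; i < maxIterations; i++) {
            // Stop condition: the ArchUnit tests (and the rest of the suite) are green.
            if (run("mvn", "-q", "test") == 0) {
                System.out.println("Architectural target reached.");
                return;
            }
            // Otherwise hand the failing build back to the agent.
            // "agent" is a placeholder command, not a specific tool.
            run("agent", "run", "--task", "Make the failing architecture tests pass");
        }
        System.out.println("No convergence after " + maxIterations + " iterations.");
    }

    private static int run(String... command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}

Everything interesting lives in the test suite; the driver only re-runs it and re-invokes the agent until the build is green or the iteration budget runs out.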

Why This Is Different from “Using ArchUnit for Rules”

This is not just about enforcing rules after the fact.

The key shift is using ArchUnit tests as the primary driver of change:

  • The test describes where the system should go
  • Failures show what still violates the target state
  • Passing tests mean the architectural migration is complete

In other words, the ArchUnit test is the prompt, the feedback mechanism, and the stop condition.

Example 1: Driving a Shift from Unit Tests to Integration Tests

Suppose you want to move from unit tests with mocked repositories to integration tests.

Instead of telling an agent:

“Refactor tests to stop mocking repositories”

You encode the desired end state:

// Imports assume JUnit 5 and a recent ArchUnit version.
import static com.tngtech.archunit.base.DescribedPredicate.not;
import static com.tngtech.archunit.core.domain.JavaClass.Predicates.resideInAPackage;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.fields;

import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

@Test
void repositoriesShouldNotBeMockedInTests() {
    JavaClasses importedClasses = new ClassFileImporter()
            .importPackages("com.example.demo");

    // No @Mock field may have a raw type from a ..repository.. package.
    fields()
        .that().areAnnotatedWith(org.mockito.Mock.class)
        .should().haveRawType(not(resideInAPackage("..repository..")))
        .because("repositories should not be mocked - use integration tests instead")
        .allowEmptyShould(true)
        .check(importedClasses);
}

This test initially fails.

Now the autonomous loop is trivial:

  • Run tests
  • See the failure
  • Ask the agent to fix the violations
  • Repeat

The loop ends naturally when no repository mocks remain.
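
For illustration, this is roughly the shape each migrated test converges on. The sketch assumes a Spring Boot project with Spring Data JPA; OrderRepository and Order are hypothetical names, not taken from the article.

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.orm.jpa.DataJpaTest;

@DataJpaTest
class OrderRepositoryIntegrationTest {

    // The real repository is injected instead of declared as a @Mock field,
    // so the ArchUnit rule above has nothing left to flag.
    // OrderRepository and Order are hypothetical types for this sketch.
    @Autowired
    private OrderRepository repository;

    @Test
    void persistsAndReadsBackAnOrder() {
        Order saved = repository.save(new Order("book", 2));

        assertThat(repository.findById(saved.getId())).isPresent();
    }
}

Because the rule uses allowEmptyShould(true), it keeps passing once no @Mock fields remain at all, rather than failing on an empty selection.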

Example 2: Migrating Stripe Usage with a Test-Driven Loop

Consider a system that uses Stripe’s static API configuration:

Stripe.apiKey = "sk_test_123";
Charge charge = Charge.create(params);

You want to migrate to explicit client usage:

StripeClient client = new StripeClient("sk_test_123");
client.v1().charges().create(params);

Instead of instructing the agent to “refactor Stripe usage”, you define the invariant:

// Imports assume JUnit 5 and a recent ArchUnit version.
import static com.tngtech.archunit.core.domain.JavaModifier.STATIC;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

import com.tngtech.archunit.base.DescribedPredicate;
import com.tngtech.archunit.core.domain.JavaCall;
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

@Test
void stripeStaticApiCallsMustNotBeUsed() {
    JavaClasses importedClasses = new ClassFileImporter()
            .importPackages("com.example.demo");

    // Flags any static call into com.stripe.model (e.g. Charge.create),
    // while still allowing the generated builder() factories.
    DescribedPredicate<JavaCall<?>> isStripeStaticCall =
        DescribedPredicate.describe(
            "static method in com.stripe.model (excluding builder())",
            call ->
                call.getTarget().getOwner().getPackage().getName().startsWith("com.stripe.model")
                && call.getTarget().resolveMember()
                    .map(member -> member.getModifiers().contains(STATIC))
                    .orElse(false)
                && !call.getTarget().getName().equals("builder")
        );

    noClasses()
        .should().callCodeUnitWhere(isStripeStaticCall)
        .because("Stripe static API calls should not be used - inject StripeClient instead")
        .check(importedClasses);
}

This test now defines the migration target.

The autonomous loop runs until every static Stripe call is eliminated.
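
The rule only says what must disappear; what replaces it is left to the agent. One plausible end state, assuming Spring is used for dependency injection (StripeConfig, PaymentService, and the stripe.api-key property are illustrative names, not from the article):

import com.stripe.StripeClient;
import com.stripe.exception.StripeException;
import com.stripe.model.Charge;
import com.stripe.param.ChargeCreateParams;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
class StripeConfig {

    // One explicit client instance replaces the global Stripe.apiKey mutation.
    @Bean
    StripeClient stripeClient(@Value("${stripe.api-key}") String apiKey) {
        return new StripeClient(apiKey);
    }
}

@Service
class PaymentService {

    private final StripeClient stripe;

    PaymentService(StripeClient stripe) {
        this.stripe = stripe;
    }

    Charge charge(ChargeCreateParams params) throws StripeException {
        // Instance call on the injected client - no static com.stripe.model
        // call left for the ArchUnit rule to flag.
        return stripe.v1().charges().create(params);
    }
}

Once every call goes through the injected client, the only remaining Stripe usage is instance-level, which the rule above permits.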

What the Agent Actually Does

From the agent’s perspective, this is ideal:

  • Failures point directly to violating code
  • because(...) explains the expected fix
  • The search space narrows with every iteration

There is no need for the agent to understand Stripe, testing philosophy, or architectural intent in abstract terms.

It only needs to make the build green.

Benefits of Test-Driven Autonomous Architecture

Using ArchUnit as the loop condition gives you:

  • Deterministic stopping criteria
  • Predictable convergence
  • Clear auditability for human reviewers
  • Architecture that stays enforced after the migration

Most importantly, the architecture does not live in prompts or documentation.

It lives in executable tests.

Closing Thought

Autonomous loops are powerful, but architecture requires precision.

By defining the desired architectural change as ArchUnit tests, we can keep the loop while removing ambiguity. The agent iterates, the tests judge, and the system converges on a clearly defined target state.

This approach is not without trade-offs. Architectural invariants can be over-constrained, hard to express, or temporarily at odds with incremental change. Poorly designed tests can block valid evolution just as easily as they can prevent drift. The discipline shifts from writing better prompts to designing better constraints — and those constraints still require human judgment.

This is not replacing human architectural thinking.

It is making architectural intent explicit, reviewable, and enforceable — so machines can apply it reliably, and humans can reason about it clearly.
