Manoj Sharma
The Engineering Guardrails We Need After AI Started Writing the Code

Today, AI removes the typing cost. You ship faster, tests are green, code reviews pass. But it also removes the natural pauses where you, as an engineer, normally think through the hard parts. This is exactly where you should not be on autopilot:

  • retries
  • idempotency 👑
  • network timeouts
  • concurrency
  • rich suite of tests
  • observability

The result is Reasoning Debt, and it keeps growing.

The code works. But nobody can explain why it was written that way, and more importantly, nobody knows what it does when things go wrong.

Quietly similar to technical debt, but different:

Technical debt 👉 messy implementation
Reasoning debt 👉 unclear intent under failure

In production you need that reasoning. One day something breaks, and you find yourself in an endless debugging session with two or three other engineers, trying to reconstruct why you wrote what you wrote. Over time, this can quietly erode your code-reading and reasoning skills.

But we can set up a good set of engineering guardrails. And honestly, AI is getting more capable every day; we may see it handling these concerns by default in the near future. Until then, let's talk about a few things worth focusing on.


Guardrail #1 — Fully Covered Tests Become Code Reasoning Documentation

AI can generate logic quickly, but intent is rarely obvious without tests. Consider this user eligibility logic:

public boolean eligible(User user) {
    return user.isActive()
        && user.getBalance() > 1000
        || user.isPremium();
}

Looks reasonable. But questions appear immediately:

  • Does isPremium() bypass the balance check entirely?
  • Does an inactive premium user get in?
  • Was operator precedence here intentional or accidental?

Without tests, this becomes reasoning hell later. The fix isn't just adding parentheses: this is a business rule encoded as an expression, and the expression is ambiguous. Make sure you instruct AI to generate intent-covering tests, not just happy-path ones.

Adding tests turns this into a code manual:

@Test
@DisplayName("Premium users are eligible regardless of balance")
void premiumUserEligibleRegardlessOfBalance() {}

@Test
@DisplayName("Inactive users are never eligible, even if premium")
void inactiveUserNotEligible() {}

@Test
@DisplayName("Active non-premium users need sufficient balance")
void activeUserWithBalanceEligible() {}

Now the logic has to be unambiguous, because the tests force you to state every case explicitly. And when someone asks "what was this supposed to do?" — the tests answer that faster than any comment.
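Once the tests pin down the intent, the expression itself can be made explicit. Here is a minimal sketch of the rule the tests above describe, using a plain static method as a stand-in for the assumed `User` accessors:

```java
class Eligibility {

    // Stand-in for User.isActive()/getBalance()/isPremium() from the example.
    // The parentheses encode the intent the tests demand: activity is
    // mandatory, and premium only bypasses the balance requirement.
    static boolean eligible(boolean active, long balance, boolean premium) {
        return active && (balance > 1000 || premium);
    }
}
```

Note that `&&` binds tighter than `||` in Java, so the original expression actually meant `(active && balance > 1000) || premium`, which lets an inactive premium user in. That is exactly the ambiguity the tests catch.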


Guardrail #2 — AI Writes Happy Paths. Systems Fail Unexpectedly.

AI naturally writes linear success flows:

public void processOrder(String orderId) {
    payment.capture(orderId);
    inventory.reserve(orderId);
    shipping.create(orderId);
}

Completely correct... until retries happen. Production environments introduce retries that catch you off guard, and if they're not handled you can end up with significant state drift that is hard to roll back:

  • SQS retries
  • Kafka re-delivery
  • HTTP timeouts
  • Client retries after 502 or 409

Now imagine a failure mid-flow:

Attempt 1:
  payment.capture()   → SUCCESS ✅
  inventory.reserve() → SUCCESS ✅
  shipping.create()   → TIMEOUT ❌

Queue retries the message.

Attempt 2:
  payment.capture()   → CAPTURED AGAIN 💸
  inventory.reserve() → RESERVED AGAIN 📦
  shipping.create()   → SUCCESS ✅

Now you have:

  • double charge incidents
  • inventory drift
  • duplicate shipments
  • reconciliation nightmares
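The damage is easy to reproduce in miniature. Here is a toy, non-idempotent capture (hypothetical names, not a real payment SDK) that books the amount again on every redelivery:

```java
import java.util.ArrayList;
import java.util.List;

// Toy payment store: capture() blindly appends a charge, with no dedup
// on orderId, so a queue redelivery of the same order charges twice.
class NaivePayments {
    final List<Long> charges = new ArrayList<>();

    void capture(String orderId, long amountCents) {
        charges.add(amountCents);
    }

    long totalCapturedCents() {
        return charges.stream().mapToLong(Long::longValue).sum();
    }
}
```

Two deliveries of the same message, and the customer has paid double.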

The missing idea is usually idempotency — not logic. You must protect the state, and this is something worth explicitly instructing AI to reason about. A simple attempt:

public void processOrder(String orderId) {
    if (orderStateRepository.isProcessed(orderId)) {
        log.info("Order {} already processed, skipping", orderId);
        return;
    }

    if (!payment.isAlreadyCaptured(orderId)) {
        payment.capture(orderId);
    }

    if (!inventory.isReserved(orderId)) {
        inventory.reserve(orderId);
    }

    if (!shipping.exists(orderId)) {
        shipping.create(orderId);
    }

    orderStateRepository.markProcessed(orderId);
}

Honestly, this is subjective: some engineers naturally think about it and instruct upfront, others don't. But it should always be on your mental checklist.
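One caveat worth knowing: the check-then-act pattern above still races if two redeliveries arrive concurrently, because both can pass `isProcessed` before either marks the order. A minimal sketch of an atomic claim, using an in-memory stand-in for what a database unique-constraint insert would give you:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Atomic claim: Set.add returns true for exactly one caller per key,
// even under concurrent redelivery. In a real service this would be an
// INSERT guarded by a unique constraint on orderId.
class IdempotencyGuard {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller claiming this orderId.
    boolean tryClaim(String orderId) {
        return processed.add(orderId);
    }
}
```

Only the caller that wins the claim runs the order flow; everyone else skips. (A real implementation also has to release or expire the claim if processing fails.)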


Guardrail #3 — Instruct AI to Test System Behaviour, Not Just Code Interaction

Without explicit instruction, AI tends to validate code interaction using mocks instead of actual system behaviour.

Problem 1 — Mocking Infrastructure Clients (e.g. OpenSearch)

@Test
void shouldIndexOrder() {
    OpenSearchClient client = mock(OpenSearchClient.class);
    SearchService service = new SearchService(client);

    service.index("order-123");

    verify(client).index(any());
}

Test proves: Client method was called.

Test does not prove:

  • index exists
  • mapping is correct
  • serialization works
  • AWS config works
  • search actually works

This is testing Java interactions, not search behaviour.

Problem 2 — Mocking DB Interaction (When You Have Liquibase)

@ExtendWith(MockitoExtension.class)
class OrderRepositoryTest {

    @Mock
    OrderRepository repo;

    @Test
    void shouldSaveOrder() {
        // The stub is the only reason this passes; no SQL ever runs.
        when(repo.findById("123")).thenReturn(Optional.of(new Order("123")));

        repo.save(new Order("123"));
        assertTrue(repo.findById("123").isPresent());
    }
}

Looks fine. Hidden problems:

  • Liquibase never executed
  • schema never validated
  • constraints never validated
  • JSON columns may fail in production
  • migrations may fail at startup

Instruct AI to Write Real Integration Tests

Point AI toward Testcontainers — it knows about it, it just won't reach for it unless you ask:

@Testcontainers
@SpringBootTest
class OrderSystemIT {

    @Container
    static PostgreSQLContainer<?> postgres =
        new PostgreSQLContainer<>("postgres:15");

    @Container
    static LocalStackContainer localstack =
        new LocalStackContainer(DockerImageName.parse("localstack/localstack"))
            .withEnv("SERVICES", "opensearch");

    // Point Spring (and therefore Liquibase, via
    // spring.liquibase.change-log: classpath:db/changelog.xml)
    // at the containerized database.
    @DynamicPropertySource
    static void datasourceProps(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    OrderService orderService;

    @Test
    void shouldCreateOrderAndSearch() {
        orderService.createOrder("order-123");

        List<Order> results = orderService.search("order-123");

        assertThat(results).hasSize(1);
    }
}

Now you're testing persist and search behaviour — not Java mocks.


Guardrail #4 — Instruct AI to Log With Context, Not Just Confirmation

AI generated logging looks completely reasonable on the surface:

public void processPayment(String orderId, BigDecimal amount) {
    log.info("Processing payment for order {}", orderId);
    Payment result = payment.process(orderId, amount);
    log.info("Payment processed for order {}", orderId);
}

We ship it to production, hit a failure, and open the log aggregator. What we find:

Processing payment for order 123
Processing payment for order 234
Processing payment for order 345

No error. No context. No idea which order failed, what amount was involved, what the gateway returned, or whether a retry is safe.

Instruct AI to write structured logs with failure context from the start:

public void processPayment(String orderId, BigDecimal amount) {

    log.info("Payment processing started",
        kv("orderId", orderId),
        kv("amount", amount)
    );

    try {
        Payment result = payment.process(orderId, amount);

        log.info("Payment captured",
            kv("orderId", orderId),
            kv("transactionId", result.getTransactionId()),
            kv("duration", result.getDuration())
        );

    } catch (CustomException e) {
        log.error("Payment capture failed",
            kv("orderId", orderId),
            kv("errorCode", e.getErrorCode()),
            kv("retryable", e.isRetryable())
        );
        throw e;
    }
}

With structured logs, your aggregator can answer questions like "how many non-retryable payment failures happened in the last hour, broken down by error code?".
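The `kv()` calls above assume a structured encoder such as logstash-logback-encoder on the classpath. If you don't have one, the same idea can be sketched by hand: emit the event name plus machine-parseable key=value pairs (a simplified stand-in, not a replacement for a real encoder):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal structured log line: event name plus key=value fields, so an
// aggregator can filter on fields like errorCode or retryable.
class StructuredLine {
    static String of(String event, Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder(event);
        fields.forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
        return sb.toString();
    }
}
```

The point isn't the formatting; it's that every failure line carries the fields you'll want to group and count by later.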

Final Thoughts

None of these are new concepts. Idempotency, intent-driven tests, integration testing, structured observability — senior engineers have known these for years.
The speed AI gives you is real. Just don't let it replace the thinking.
Write the guardrails into your prompts, your PR reviews, your definition of done. AI will follow — it just won't lead.
