DEV Community

Pablo Ifrán

Posted on • Originally published at elpic.Medium

Reviewing AI-Generated Code: Check Boundaries Before Logic

The code came back. It looks right. Do you ship it?

Post 2 gave you structured prompts that produce better AI output. But "better" isn't "perfect." Even a well-constrained prompt will occasionally slip a database call into an application service, or sneak a business rule into a route handler. You still have to review the diff.

Reviewing AI-generated code the same way you'd review a colleague's PR is slow, frustrating, and misses the real failures. Here's a workflow that's actually fast and checks the right things first.


Why Reviewing AI Code Is Different

When a colleague sends you a PR, you can ask them questions. You can understand their intent.

With AI output, you can't. The AI made decisions about where code lives, what it imports, and how it structures logic, based on pattern-matching from its context window. It had no architectural intent. It was optimizing for "generates code that compiles and passes the tests I can see."

With a colleague's code, you're asking: Does this solve the problem correctly?

With AI-generated code, you need to ask that too, but first: Does this code belong where it landed?

Logic bugs are easy to find. Boundary violations are quiet. A function that calculates the wrong total will fail a test. A business rule in the wrong layer will pass all the tests you wrote alongside it and only surface as a problem when you try to change something six months later.

The dangerous failures are structural, not logical.


The Review Order

Deliberately backward from how most code reviews go:

1. Check the file — does it belong in this layer?
2. Check the imports — does it import only what that layer allows?
3. Check the interface — does it match the contract you specified?
4. Check the body — is there logic here that shouldn't be?
5. Check the logic — is it actually correct?

Most reviews go 5 → 4 → 3 → 2 → 1, or skip to 5 entirely.

Steps 1-4 take about 30 seconds combined. Step 5 can take minutes. If the code fails steps 1-4, the logic review is wasted: you're regenerating anyway.

Spend 30 seconds on structure first.

[Image: The Review Order — Inverted Priority]


The Five Structural Signals

[Image: The Five-Signal Review Card]

Signal 1: The Wrong Import

Look at the import block. Does this file import from a layer it shouldn't depend on?

# application/order_service.py - WRONG
from fastapi import HTTPException        # ← API layer in application layer
from sqlalchemy.orm import Session       # ← infrastructure in application layer
from api.models import PlaceOrderRequest # ← importing from the layer above

None of these cause a runtime error. The code works. But now you can't test the service without a database session, and you can't reuse it outside FastAPI.

# application/order_service.py — correct
from domain.models import Order, LoyaltyAccount
from domain.ports import OrderRepository, CustomerRepository

[Image: Signal 1 — The Wrong Import Anatomy]

What to do: Scan the first 10 lines. Wrong import? Stop. Don't review the logic. Ask the AI to regenerate with the explicit constraint.
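The 10-line scan is mechanical enough to automate. Here's a minimal sketch of a layer-aware import check using Python's `ast` module; the layer names and forbidden prefixes are assumptions you'd adapt to your own structure:

```python
import ast

# Assumed layer rules: module prefixes each layer must NOT import
FORBIDDEN = {
    "application": ("fastapi", "sqlalchemy", "api"),
    "domain": ("fastapi", "sqlalchemy", "api", "application"),
}

def forbidden_imports(source: str, layer: str) -> list:
    """Return imported module names that violate the layer's rules."""
    banned = FORBIDDEN.get(layer, ())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        violations += [n for n in names if n.split(".")[0] in banned]
    return violations
```

Run it over `application/order_service.py` from the example above and `fastapi` and `sqlalchemy` come back as violations before you've read a single line of logic.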

Signal 2: Logic in the Wrong Place

The AI puts logic wherever it fits syntactically. Here's a loyalty points check that shouldn't be in the route:

# api/orders.py — business rule leaked into route handler
@router.post("/")
def place_order(body: PlaceOrderRequest, service: OrderService = Depends(get_order_service)):
    # This validation is a business rule — it belongs in domain/
    if body.redeem_points and body.loyalty_points < 100:
        raise HTTPException(status_code=400, detail="Not enough points to redeem")

    order = service.place_order(...)
    return {"order_id": order.id}

That condition "must have at least 100 points" is a business rule. When it changes, you'll find it in the route handler, not the domain. And if you added a mobile API endpoint last week, there's a copy there too.

What to do: Look for conditionals in route handlers that involve domain concepts. If the condition only makes sense when you understand the business domain, not just the HTTP request, it's in the wrong place.
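The fix is to move the rule next to the model it describes. A minimal sketch, assuming a `LoyaltyAccount` domain model and a hypothetical `can_redeem` method (the 100-point threshold comes from the example above):

```python
# domain/models.py — hypothetical sketch: the redemption rule lives in the domain
from dataclasses import dataclass

MIN_REDEEMABLE_POINTS = 100  # the business threshold, defined once

@dataclass(frozen=True)
class LoyaltyAccount:
    points: int

    def can_redeem(self) -> bool:
        # Every caller (web route, mobile route, batch job) shares this rule
        return self.points >= MIN_REDEEMABLE_POINTS
```

Now the route handler just calls service code that consults `can_redeem()`, and changing the threshold touches one file instead of every endpoint that copied the check.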

Signal 3: The Invented Abstraction

The AI introduces a new pattern that doesn't exist anywhere else in your codebase:

# First time this decorator appears anywhere
@router.post("/")
@validate_order(require_items=True, require_customer=True)
def place_order(body: PlaceOrderRequest, service: OrderService = Depends(get_order_service)):
    ...

Locally clean. But now you have a pattern that needs to be understood, maintained, and either adopted everywhere or deleted. The AI introduced it because it was convenient, not because it fits your codebase's patterns.

What to do: Ask: "Does anything else in this codebase do this?" If the answer is no, consider simplifying. Not always wrong, but it should be a conscious choice.
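Simplifying usually means inlining the checks as plain code rather than keeping a one-off abstraction alive. A hedged sketch, with the validated fields assumed from the decorator's arguments above:

```python
from typing import List, Optional

# Plain-function version of the invented decorator's checks; no new pattern to maintain
def validate_place_order(items: List[str], customer_id: Optional[str]) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    if not items:
        errors.append("order must contain at least one item")
    if customer_id is None:
        errors.append("customer is required")
    return errors
```

Two `if` statements cost nothing to understand; a decorator that appears nowhere else costs a context switch for every future reader.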

Signal 4: The Silent State Mutation

Looks pure, isn't:

# domain/models.py looks like it returns a new LoyaltyAccount
def earn(self, spend_amount: float) -> "LoyaltyAccount":
    self.points += int(spend_amount * self.POINTS_PER_DOLLAR)  # ← mutation
    return self                                                  # ← returns self

Compare to correct:

def earn(self, spend_amount: float) -> "LoyaltyAccount":
    earned = int(spend_amount * self.POINTS_PER_DOLLAR)
    return LoyaltyAccount(points=self.points + earned)  # ← new instance

[Image: Signal 4 - Pure vs. Mutating Method]

The mutating version passes every test the AI wrote alongside it. It fails the immutability test, but only if you wrote that test before reviewing the code.

What to do: For any domain method the prompt said should be "pure", check the return statement. Returns self → mutating. Creates a new instance → pure. Five-second check.
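The five-second check can also be frozen into a test so it runs on every regeneration. A sketch assuming the `earn` signature from above (the `POINTS_PER_DOLLAR` value here is an assumption for the sketch):

```python
from dataclasses import dataclass

@dataclass
class LoyaltyAccount:
    points: int
    POINTS_PER_DOLLAR = 1  # assumed rate for the sketch

    def earn(self, spend_amount: float) -> "LoyaltyAccount":
        earned = int(spend_amount * self.POINTS_PER_DOLLAR)
        return LoyaltyAccount(points=self.points + earned)  # new instance, not self

def test_earn_is_pure():
    account = LoyaltyAccount(points=50)
    result = account.earn(25.0)
    assert result.points == 75   # new value lives on the returned instance
    assert account.points == 50  # original untouched; a mutating earn() fails here
```

Declaring the dataclass with `frozen=True` is an even stronger guard: a mutating `earn()` then raises `FrozenInstanceError` instead of silently passing.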

Signal 5: The Missing Error Path

# application/order_service.py happy path works, error path is silent
def place_order(self, customer_id: str, items: list, redeem_points: bool) -> Order:
    customer = self.customer_repo.find_by_id(customer_id)
    # What if customer is None? AttributeError waiting to happen.
    loyalty = LoyaltyAccount(points=customer.loyalty_points)
    ...

When find_by_id() returns None, this crashes with an AttributeError. The correct behavior is raise KeyError(customer_id) so the route handler can return a 404. The AI wrote the happy path and left the error case unhandled.

What to do: For every data fetch in a service method, ask: "What happens if this returns nothing?" The answer should be raise KeyError(id). If it's not there, add it.
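The guard itself is mechanical. A sketch assuming a repository with the `find_by_id` interface from the example; the in-memory repo here is a hypothetical stand-in for testing:

```python
class InMemoryCustomerRepo:
    """Test stand-in for the real repository interface."""
    def __init__(self, customers):
        self._customers = customers

    def find_by_id(self, customer_id):
        return self._customers.get(customer_id)  # None when missing

def get_customer_or_raise(repo, customer_id: str):
    customer = repo.find_by_id(customer_id)
    if customer is None:
        # Explicit failure; the route layer translates KeyError into a 404
        raise KeyError(customer_id)
    return customer
```

Three lines per fetch, and the `AttributeError` time bomb becomes a deliberate, catchable error.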


The Workflow in Practice

You asked for GET /orders/{order_id}. The AI generates:

# api/orders.py — first attempt
@router.get("/{order_id}")
def get_order(order_id: str, db: Session = Depends(get_db)):
    order = db.query(Order).filter_by(id=order_id).first()
    if not order:
        raise HTTPException(status_code=404)
    customer = db.query(Customer).filter_by(id=order.customer_id).first()
    loyalty_balance = customer.loyalty_points if customer else 0
    return {
        "order_id": order.id,
        "status": order.status,
        "total": order.total,
        "loyalty_points": loyalty_balance,
    }

Signal 1 check: with db: Session = Depends(get_db), the route is importing the database session directly. Infrastructure in the API layer.

Stop. You don't need to read the rest. Send a corrected prompt:

Revise GET /{order_id} in api/orders.py.

Route should call service.get_order(order_id) and return the result.
Convert KeyError to 404. No direct DB calls. Route body: 10 lines or fewer.

Also add get_order(order_id: str) to application/order_service.py.
Fetch order and customer via repos. Raise KeyError if order not found.
Return a dict with order_id, status, total, loyalty_points.
No FastAPI. No SQLAlchemy types.

The AI regenerates:

# application/order_service.py
def get_order(self, order_id: str) -> dict:
    order = self.order_repo.find_by_id(order_id)
    if order is None:
        raise KeyError(order_id)
    customer = self.customer_repo.find_by_id(order.customer_id)
    loyalty_points = customer.loyalty_points if customer else 0
    return {
        "order_id": order.id,
        "status": order.status,
        "total": order.total().amount,
        "loyalty_points": loyalty_points,
    }
# api/orders.py
@router.get("/{order_id}")
def get_order(order_id: str, service: OrderService = Depends(get_order_service)):
    try:
        return service.get_order(order_id)
    except KeyError:
        raise HTTPException(status_code=404)

Five-signal check on the second version: passes all five. Now review the logic. Two minutes total.

[Image: The Full Workflow — Two Passes]


The Two Antipatterns

Rubber-stamping: Accepting AI output without running the structural check because "it looks fine". The imports are at the top where you'd skip them. The logic reads correctly. The tests pass. Six weeks later you're finding copied loyalty logic across three route handlers. The five-signal check takes 30 seconds. Do it every time.

Over-scrutinizing style: Spending 10 minutes debating whether the AI should have used a dataclass or a TypedDict for the return type. Style matters in final polish. The first-pass question is: is this code in the right place doing the right job? Note style issues, move on, address them in a cleanup pass.


The Feedback Loop

After a few weeks of this workflow, something shifts. You stop bracing for surprise. You have a systematic check: five signals, 30 seconds. Either it passes or it doesn't.

You also start calibrating your prompts based on what slips through. If Signal 2 keeps flagging logic in route handlers, your prompts for that type of feature need better negative constraints. If Signal 1 keeps catching wrong imports in application services, your architectural context block needs a stronger rule.

The review and the prompting are a feedback loop. Each failed review tells you what to add to the next prompt.
