Your CI is green. Your linter is happy. The PR has three approvals. And yet — three weeks later, 2 a.m. PagerDuty.
Sound familiar?
The bugs that cause real production outages rarely look wrong. They pass lint. They pass review. They often pass tests. They emerge from the interaction between two functions, where neither author anticipated what happens when their assumptions silently collide.
This is the problem Logic-Lens was built to solve.
The Root Cause: Pattern Matching vs. Reasoning
When you ask an AI to review code without structure, it pattern-matches. It compares your code to patterns it has seen before. This works well for style. It fails for logic bugs — because logic bugs live in syntax-clean, lint-passing code that looks perfectly normal.
The research is unambiguous. Models using structured semi-formal reasoning achieve 87–93% accuracy on interprocedural code semantics tasks. Unstructured chain-of-thought: 76–78% — and the gap is largest on exactly the class of bugs that cause production incidents.
The difference isn't model capability. It's methodology.
The Fix: Structured Execution Tracing
The key insight: force the model to build an explicit execution trace before reaching any conclusion.
Instead of "does this look right?", the model must:
- State premises — every assumption about types, nullability, and preconditions
- Trace execution — follow the actual path step by step, crossing function boundaries
- Identify divergence — find the exact point where a premise breaks
- Prescribe remedy — fix the root cause, not just the symptom
This is the methodology behind Logic-Lens — an open-source plugin for Claude Code, Codex CLI, and Gemini CLI that enforces structured execution tracing on every code review.
What It Looks Like in Practice
Here's a deceptively ordinary Python function:
def process_order(order_id, items, discount_code=None):
order = db.get_order(order_id)
total = sum(item['price'] * item['qty'] for item in items)
if discount_code:
discount = coupon_service.get_discount(discount_code)
total = total * (1 - discount)
order['total'] = total
order['items'] = items
db.save_order(order)
email_service.send_confirmation(order['email'], total)
Looks fine. Three approvals. Ships to production. Here's what Logic-Lens produces:
Logic Health: 31/100
🔴 L6 — Callee Contract Mismatch
Premises: coupon_service.get_discount(discount_code) → assumed float
Trace: get_discount returns None for expired codes (documented in coupon_service.py:47)
Divergence: total * (1 - None) raises TypeError at runtime
Remedy: Guard with `if discount is not None` before applying. Add contract test.
🔴 L3 — Boundary Blindspot
Premises: items assumed non-empty
Trace: sum() over [] returns 0 → order saved with total = $0.00
Divergence: No validation before db.save_order
Remedy: Assert len(items) > 0 or raise ValueError("Order must have at least one item")
🟡 L5 — Control Flow Escape
Premises: email_service.send_confirmation assumed non-throwing
Trace: SMTPException propagates before db connection cleanup
Divergence: Connection pool exhausted under sustained email failures
Remedy: Wrap email send in try/finally; release connection unconditionally
Every finding includes all four sections — Premises, Trace, Divergence, Remedy. That's the Iron Law of Logic-Lens: no finding ships without showing its work.
Nine Logic Risk Categories
Logic-Lens evaluates code across nine dimensions:
| Code | Name | What It Catches |
|---|---|---|
| L1 | Shadow Override | Variable shadowing across scopes |
| L2 | Type Contract Breach | Type assumptions that break at runtime |
| L3 | Boundary Blindspot | Edge cases (empty, zero, max) |
| L4 | State Mutation Hazard | Shared mutable state side effects |
| L5 | Control Flow Escape | Exception paths that skip cleanup |
| L6 | Callee Contract Mismatch | Return value assumptions that fail |
| L7 | Concurrency/Async Hazard | Race conditions, await misuse |
| L8 | Resource Lifecycle Issue | Leaked connections, handles, memory |
| L9 | Time/Locale Hazard | Timezone, clock, and locale bugs |
How It Compares
| Logic-Lens | ESLint/Pylint | GitHub Copilot Review | Plain AI | |
|---|---|---|---|---|
| Explicit execution trace | ✅ | ❌ | ❌ | ❌ |
| Premises → Trace → Divergence → Remedy | ✅ | ❌ | ❌ | ❌ |
| Interprocedural bug detection | ✅ | ❌ | ~ | ~ |
| Zero config, any language | ✅ | ❌ | ✅ | ✅ |
| Auditable / reproducible reasoning | ✅ | ✅ | ❌ | ❌ |
Logic-Lens doesn't replace your linter. It catches what linters structurally cannot: behavioral bugs in syntax-clean code.
Benchmark: 91% vs. 19%
Across three real-world codebases with documented production bugs:
- Logic-Lens: 91% pass rate on interprocedural, boundary, and state-mutation scenarios
- Plain AI (unstructured): 19%
The gap isn't what the model can find with perfect prompting. It's what it consistently finds, across every run, with a traceable reasoning chain that shows its work every time.
Six Skills, One Install
logic-lens → Full structured trace (the full review)
logic-lens-quick → Fast path for time-sensitive reviews
logic-lens-security → OWASP-mapped security focus
logic-lens-perf → Bottleneck and complexity hunting
logic-lens-diff → PR diff review (interprocedural focus)
logic-lens-report → Team-ready output with severity scoring
Install in 60 Seconds
Claude Code:
/plugin marketplace add hyhmrright/logic-lens
/plugin install logic-lens@logic-lens-marketplace/logic-review```
{% endraw %}
**Gemini CLI:**
{% raw %}
/extensions install https://github.com/hyhmrright/logic-lens
**Codex CLI:**
See the [README](https://github.com/hyhmrright/logic-lens) for the skill installer command.
---
## Try It
If you've shipped a bug that passed review, it's worth running Logic-Lens on the function that caused it. The trace output is often illuminating even in retrospect.
⭐ [github.com/hyhmrright/logic-lens](https://github.com/hyhmrright/logic-lens)
**Which of the nine risk categories (L1–L9) have you hit most in production?** Drop a comment — happy to run Logic-Lens on a representative example and share the raw output.
---
## Related
If you also care about *why* your architecture has decay risks — not just where behavioral bugs live — I wrote a companion piece grounding AI code review in 12 classic engineering books:
[Show DEV: brooks-lint — an AI code reviewer that cites Fowler, Martin, and Brooks](https://dev.to/hyhmrright/i-synthesized-12-classic-engineering-books-into-an-ai-code-reviewer-heres-what-it-caught-3ed1)
The two tools cover different failure modes and work well together: Logic-Lens catches runtime behavioral bugs via execution tracing; brooks-lint diagnoses architectural decay against Fowler, Martin, Evans, and nine others.
Top comments (0)