Kamaumbugua-dev
I Built an AI That Finds Your Bugs and Rewrites Your Code to Fix Them.

How I built CodeLens — a Groq-powered code review tool that detects SQL injection, memory leaks, and O(n²) algorithms, then rewrites your entire file with all issues resolved. Full breakdown of the architecture, prompt engineering tricks, and the LLM hallucination problem I had to solve.

Every developer has shipped a bug they should have caught.

Not because they were careless. Because code review is expensive. You're scanning hundreds of lines for subtle patterns: a missing conn.close(), an f-string wired directly into a SQL query, a nested loop that looks innocent at n = 10 but detonates at n = 10,000.

I wanted to build a tool that never gets tired, never misses a pattern, and can tell you exactly what will go wrong in production — before you push.

That's CodeLens.


What It Does

Paste any code. In seconds you get:

  • A health score (0–100) with an animated gauge
  • Every vulnerability categorized by severity: CRITICAL, WARNING, INFO
  • Exact line numbers, descriptions, fix suggestions, and predicted production impact
  • A "Rework Code" button that rewrites your entire file with every issue resolved, with inline # FIX: comments explaining each change

Here's what it catches on a simple Python file:

CRITICAL  SQL Injection            L7     f-string in cursor.execute()
CRITICAL  Hardcoded Credentials    L27    password = "admin123"
CRITICAL  Unsafe eval()            L29    eval(open("config.txt").read())
CRITICAL  Plaintext Card Numbers   L15    print(f"...card {card_number}")
WARNING   Resource Leak            L16    file handle never closed
WARNING   Resource Leak            L42    db connection never closed
WARNING   O(n²) Complexity         L46    nested loop over same list
WARNING   Unbounded Cache          L38    dict with no eviction policy
INFO      Division by Zero Risk    L50    len(transactions) unchecked

Health score: 28 / 100.

One click later, the LLM rewrites the file. Every issue fixed. Every change commented.


The Stack

Deliberately lean:

React 19 (Vercel)  →  FastAPI (Render)  →  Groq API (llama-3.3-70b)

No database. No auth. No queue. Every request is stateless — code goes in, analysis comes out.

The frontend is a three-panel layout:

  1. Code editor — line numbers highlight affected lines in red
  2. Analysis dashboard — health gauge, metric bars, issue list with severity filters
  3. Vulnerability slides — right panel with CSS scroll-snap, one full-height card per vulnerability

The backend has three endpoints worth talking about: /analyze, /fix, and /github/analyze.


The Hard Part: Getting the LLM to Return Valid JSON Every Time

The analysis response needs to be machine-parseable. Every time. Across any language, any code quality, any edge case.

This is harder than it sounds. By default, models wrap JSON in markdown fences, add explanatory preamble, or truncate responses mid-object when they hit a token limit. Any of these breaks the frontend.

My system prompt ends with:

Return ONLY valid JSON. No markdown, no code fences, no explanation outside the JSON.

And I strip artifacts post-response with:

import json
import re

raw_text = re.sub(r"^```(?:json)?\s*", "", raw_text)
raw_text = re.sub(r"\s*```$", "", raw_text)
analysis = json.loads(raw_text)

This handles 99% of cases. The remaining 1% raises a json.JSONDecodeError that returns a structured 500 to the client.
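Put together, the strip-then-parse step can live in one helper. A minimal sketch — the function name parse_llm_json is mine, not the actual CodeLens code:

```python
import json
import re


def parse_llm_json(raw_text: str) -> dict:
    """Strip the markdown fences models sometimes add, then parse.

    Hypothetical helper mirroring the cleanup above; the remaining
    ~1% of bad responses still raise json.JSONDecodeError, which the
    endpoint converts into a structured 500.
    """
    cleaned = raw_text.strip()
    cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*```$", "", cleaned)
    return json.loads(cleaned)
```

A fence-wrapped response and bare JSON both come back as the same parsed dict, so the frontend never sees the difference.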


The Line Number Hallucination Problem

This was the most interesting bug I fixed.

Early versions of CodeLens would confidently report issues on lines that didn't exist. A 50-line file would get issues flagged at lines 73, 91, 108. The model was pattern-matching against training data — it recognized the type of bug and estimated a line number based on where it typically appears in codebases it had seen, not in the code you gave it.

The fix is obvious in hindsight: give the model line numbers to reference.

Instead of sending:

import sqlite3

def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"

I send:

1 | import sqlite3
2 |
3 | def get_user(username):
4 |     query = f"SELECT * FROM users WHERE username = '{username}'"

And I add an explicit constraint to the prompt:

The code has 50 lines total. You MUST only reference line numbers
that actually exist (1 to 50).

The implementation:

def add_line_numbers(code: str) -> str:
    lines = code.splitlines()
    width = len(str(len(lines)))
    return "\n".join(
        f"{str(i+1).rjust(width)} | {line}"
        for i, line in enumerate(lines)
    )

Hallucinated line numbers dropped to near zero. The model now has a concrete anchor instead of a floating reference.
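A cheap belt-and-suspenders check on top of the prompt constraint is to validate the model's output against the real file length before it reaches the UI. A sketch — the "line" field name and the drop-invalid policy are my assumptions, not necessarily what CodeLens ships:

```python
def clamp_issue_lines(issues: list[dict], total_lines: int) -> list[dict]:
    # Discard any issue whose reported line falls outside the file,
    # so a hallucinated "line 73" in a 50-line file never renders.
    return [
        issue for issue in issues
        if 1 <= issue.get("line", 0) <= total_lines
    ]
```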


The Rework Pipeline

The "Rework Code" feature is a second LLM call chained to the first.

After analysis, the frontend sends the original code + the full issue list to /fix:

from typing import Any, List

from pydantic import BaseModel


class FixRequest(BaseModel):
    code: str
    language: str
    issues: List[Any]

The fix prompt encodes every issue as a line-referenced instruction:

Fix ALL of the following issues in this python code:

ISSUES TO FIX:
  - [Line 7] [CRITICAL] SQL Injection: Use parameterized queries
  - [Line 27] [CRITICAL] Hardcoded Credentials: Use os.environ.get(...)
  - [Line 29] [CRITICAL] Unsafe eval(): Use json.load() instead
  ...

ORIGINAL CODE:
{code}

Return the complete fixed code with inline FIX comments.
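That encoding step can be sketched as a small prompt builder. The issue field names ("line", "severity", "title", "fix") are my assumptions, not CodeLens's exact schema:

```python
def build_fix_prompt(code: str, language: str, issues: list[dict]) -> str:
    # Hypothetical builder mirroring the prompt above: each issue
    # becomes one line-referenced instruction the model must act on.
    issue_lines = "\n".join(
        f'  - [Line {i["line"]}] [{i["severity"]}] {i["title"]}: {i["fix"]}'
        for i in issues
    )
    return (
        f"Fix ALL of the following issues in this {language} code:\n\n"
        f"ISSUES TO FIX:\n{issue_lines}\n\n"
        f"ORIGINAL CODE:\n{code}\n\n"
        "Return the complete fixed code with inline FIX comments."
    )
```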

The system prompt is strict:

Return RAW CODE ONLY. No markdown fences, no explanation, no preamble.
Add inline comments prefixed with # FIX: explaining each change.

The result gets placed back into the editor. The user sees their fixed file immediately.


The CORS Bug That Burned Two Hours

Deploying to Vercel + Render exposed something I'd glossed over: the combination of allow_origins=["*"] and allow_credentials=True is invalid per the CORS specification.

Browsers enforce this at the preflight stage. Your OPTIONS request returns 200, but the browser rejects the response because the spec says wildcard origins cannot coexist with credentials. You get a cryptic console error and a silent failure in the UI.

The fix is one line:

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=False,  # must be False with wildcard origin
    allow_methods=["*"],
    allow_headers=["*"],
)

Worth knowing before you spend two hours debugging network tab preflight responses.


The Vulnerability Slides

The right panel uses CSS scroll snapping. The container sets:

scroll-snap-type: y mandatory;

and each vulnerability gets its own full-height card:

<div style={{ height: "100%", scrollSnapAlign: "start" }}>
  <VulnSlide issue={issue} />
</div>

There's a dot navigation sidebar that syncs with the scroll position:

onScroll={(e) => {
  const idx = Math.round(
    e.target.scrollTop / e.target.clientHeight
  );
  setActiveSlide(idx);
}}

Rounding (not flooring) prevents the active dot from flickering during the snap animation — the snap always settles on an integer, but scrollTop passes through fractional values mid-animation.
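The arithmetic is easy to check. Assuming an illustrative card height of 600px (the real value is whatever the panel height is), a mid-animation scrollTop of 595 should already light up dot 1, but flooring snaps it back to 0:

```python
import math

card_height = 600  # illustrative; the actual card height is the panel height

# scrollTop passes through fractional slide indices during the snap animation
for scroll_top in (595, 600, 605):
    frac = scroll_top / card_height
    print(scroll_top, "floor:", math.floor(frac), "round:", round(frac))
```

floor reports 0, 1, 1 across those three values while round reports 1, 1, 1 — rounding settles on the destination slide early instead of flickering through the previous one.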

Each slide has a "SLIDE" button in the issue list that calls:

slidesRef.current.scrollTo({
  top: idx * slidesRef.current.clientHeight,
  behavior: "smooth"
});

Bi-directional sync between the list and the slides, no state management library needed.


Deployment Notes

A few things that bit me:

Render cold starts. The free tier sleeps services after 15 minutes of inactivity. The first request after sleep takes 30–50 seconds. I added a loading state with an explanation so users wait instead of leaving.

Vite bakes env vars at build time. VITE_API_BASE is injected into the bundle when Vercel builds — not at runtime. Old preview deployment URLs serve old bundles permanently. The production domain always reflects the latest build. If your frontend is still hitting the wrong backend, you're on an old preview URL.

Railway port mismatch. I originally deployed on Railway. The dashboard had the networking port set to 8000, but the $PORT environment variable was 8080. Internal healthchecks passed (Railway probed the container directly), but external traffic failed at the edge with persistent 502s. Moved to Render, problem gone.
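If you stay on a platform that injects $PORT, the safe pattern is to read the variable instead of hardcoding a port. A sketch — the helper name is my own invention:

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str], default: int = 8000) -> int:
    """Bind to the platform-injected $PORT, falling back for local dev.

    Hypothetical helper: hardcoding 8000 while the platform routes
    traffic to 8080 is exactly the edge-502 failure described above.
    """
    return int(env.get("PORT", default))


# e.g. uvicorn.run("main:app", host="0.0.0.0", port=resolve_port(os.environ))
```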


Try It

Live: codelens-new.vercel.app

Source: github.com/Kamaumbugua-dev/CODELENS

Paste the worst code you can find. The demo loads a Python file with SQL injection, hardcoded secrets, unsafe eval(), and an O(n²) algorithm. Hit Analyze, then Rework. The whole thing takes about 10 seconds on a warm backend.


Built by Steven K. — Head of AXON LATTICE LABS™

CodeLens™ — See your code's future before it ships.
