Hindsight caught repeated AST traversal bugs
“Why is it flagging recursion again?” I checked the logs, and the agent wasn’t looping—it was using Hindsight to call out the same AST traversal mistake I’d made three commits ago.
Last night I assumed my AST walker was fixed; by morning, Hindsight had surfaced the exact same traversal bug across three different submissions I thought were unrelated.
What I actually built
I’ve been working on Codemind, a coding practice platform that doesn’t just run user code—it tries to understand how someone is solving a problem and guide them while they’re doing it.
At a high level, the system looks like this:
- A frontend where users write and submit code
- A Python backend that parses and analyzes submissions
- A sandboxed execution layer (Firecracker-style isolation) to safely run code
- An analysis pipeline that walks ASTs and applies rules + taint tracking
- A memory layer powered by Hindsight that stores patterns across attempts
The interesting part isn’t execution—it’s the analysis loop:
- Parse code into an AST
- Traverse it to detect patterns (recursion misuse, unsafe flows, etc.)
- Generate feedback
- Store what happened
- Use that history to influence future feedback
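As a rough sketch, one iteration of that loop could look like the following. Everything here is illustrative, not Codemind's actual API: the plain `history` list stands in for the Hindsight memory layer, and the only "pattern" detected is a bare `while True:` loop.

```python
import ast

def analyze(submission: str, history: list) -> dict:
    """One pass of the loop: parse, traverse, give feedback, record."""
    tree = ast.parse(submission)                   # Step 1: parse into an AST
    issues = []
    for node in ast.walk(tree):
        # Step 2: detect patterns -- here, just bare `while True:` loops.
        if isinstance(node, ast.While) and isinstance(node.test, ast.Constant) \
                and node.test.value is True:
            issues.append("possible non-terminating loop")
    feedback = issues or ["looks fine"]            # Step 3: generate feedback
    history.append({"issues": issues})             # Step 4: store what happened
    repeats = sum(1 for e in history if e["issues"] == issues)  # Step 5: use history
    return {"feedback": feedback, "times_seen": repeats}
```

Calling `analyze` twice with the same broken submission would report `times_seen: 2` on the second run, which is the hook that later feedback escalation hangs on.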
That last step is where things got weird—in a good way.
The problem I thought I had
Originally, I thought my biggest challenge would be writing good static analysis rules.
Things like:
- Detect recursion without base case
- Identify unsafe variable flows (taint analysis)
- Catch incorrect loop termination
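Of those three, taint analysis is the least self-explanatory, so here is a deliberately naive sketch of the idea. `find_tainted_sinks` and its source/sink lists are hypothetical, and it only follows direct `x = input()` style assignments, not re-assignments or expressions:

```python
import ast

def find_tainted_sinks(code, sources=("input",), sinks=("eval", "exec")):
    """Flag sink calls whose argument traces back to a taint source."""
    tree = ast.parse(code)
    tainted = set()
    findings = []
    for node in ast.walk(tree):
        # Mark variables assigned straight from a source call as tainted.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            fn = node.value.func
            if isinstance(fn, ast.Name) and fn.id in sources:
                tainted |= {t.id for t in node.targets if isinstance(t, ast.Name)}
        # Flag sink calls that receive a tainted variable.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id in sinks:
            for arg in node.args:
                if isinstance(arg, ast.Name) and arg.id in tainted:
                    findings.append(f"tainted '{arg.id}' passed to {node.func.id}()")
    return findings
```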
So I built a fairly standard AST walker. Something like:
```python
import ast

def visit(node):
    if isinstance(node, ast.FunctionDef):
        check_recursion(node)
    for child in ast.iter_child_nodes(node):
        visit(child)
```
Pretty normal. Walk the tree, inspect nodes, apply rules.
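For reference, Python's stdlib already provides this dispatch via `ast.NodeVisitor`; the recursion check inlined below is a crude stand-in for `check_recursion`, not the real rule:

```python
import ast

class RecursionVisitor(ast.NodeVisitor):
    """Same walk as above, using the stdlib's visitor dispatch."""

    def __init__(self):
        self.flagged = []

    def visit_FunctionDef(self, node):
        # Crude stand-in for check_recursion: flag functions that
        # call themselves anywhere in their body.
        if any(isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
               and n.func.id == node.name for n in ast.walk(node)):
            self.flagged.append(node.name)
        self.generic_visit(node)  # keep descending into nested defs
```

Usage is just `RecursionVisitor().visit(ast.parse(code))`, after which `flagged` holds the names of self-calling functions.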
And it mostly worked.
But then I started noticing something annoying:
I kept fixing the same class of bugs over and over again.
Not in the codebase—in the submissions.
Users (including me, while testing) would:
- Write a recursive function
- Forget the base condition
- Or place it incorrectly
- Or structure traversal in a subtly broken way
And every time, the analyzer would treat it as a fresh problem.
Stateless. No memory.
That’s when I added Hindsight.
What I expected Hindsight to do
I expected Hindsight to act like a simple log:
- Store past attempts
- Maybe surface similar ones
- Help generate slightly better hints
Basically, better context.
Instead, it started behaving like pattern recognition across time.
I integrated it using the official repo.
The idea was simple: every analysis run produces an “event”.
```python
event = {
    "user_id": user_id,
    "code": submission,
    "issues": detected_issues,
    "ast_patterns": extracted_patterns,
}
store_event(event)
```
Then, before generating feedback:
```python
history = hindsight.query(user_id=user_id)
similar = find_similar_patterns(history, current_patterns)
if similar:
    adjust_feedback(similar)
```
That’s it. No magic.
But the behavior changed dramatically.
The bug that wouldn’t stay fixed
The first time I noticed something different was with a recursion check.
My rule looked something like this:
```python
def check_recursion(func_node):
    calls = find_recursive_calls(func_node)
    if calls and not has_base_case(func_node):
        report("Missing base case in recursion")
```
Clean. Straightforward.
But users kept triggering false positives or weird edge cases:
- Base case present but unreachable
- Base case after recursive call
- Nested conditions that break termination
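Telling those cases apart comes down to where the base case sits relative to the recursive call. A minimal classifier might look like this; `base_case_position` is a hypothetical helper that only inspects top-level statements, so unreachable or deeply nested base cases fall through to `"missing"`:

```python
import ast

def base_case_position(func_node):
    """Classify where the base case sits relative to the recursive call."""
    def is_recursive(stmt):
        return any(isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                   and n.func.id == func_node.name for n in ast.walk(stmt))

    seen_recursion = False
    for stmt in func_node.body:
        if is_recursive(stmt):
            seen_recursion = True
        elif isinstance(stmt, (ast.If, ast.Return)):
            # A branch or return with no recursive call inside: a base case.
            return "after_recursive_call" if seen_recursion else "before_recursive_call"
    return "missing"
```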
Individually, each case looked different.
Stateless analysis treated them as separate.
Hindsight didn’t.
Hindsight started grouping my mistakes
Once I started storing AST-derived patterns, I began extracting simple features:
```python
patterns = {
    "has_recursion": True,
    "base_case_position": "after_recursive_call",
    "branching_depth": compute_depth(node),
}
```
Nothing fancy. Just structural hints.
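Here is a sketch of how such features could be distilled from a tree. The helper is illustrative, not the exact Codemind implementation, and it assumes one top-level function per submission:

```python
import ast

def extract_patterns(tree):
    """Distill an AST into flat, comparable structural features."""
    func = next((n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)), None)
    if func is None:
        return {"has_recursion": False}
    recursive = any(isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                    and n.func.id == func.name for n in ast.walk(func))

    def depth(node):
        # Branching depth: how deeply If/For/While nodes nest.
        kids = [depth(c) for c in ast.iter_child_nodes(node)]
        own = 1 if isinstance(node, (ast.If, ast.For, ast.While)) else 0
        return own + (max(kids) if kids else 0)

    return {"has_recursion": recursive, "branching_depth": depth(func)}
```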
But when I queried history, I started seeing clusters:
- Same incorrect ordering
- Same missing condition structure
- Same flawed traversal shape
Even across different users.
That’s when the feedback changed from:
“Missing base case”
to:
“Your base case is placed after the recursive call, which matches a previous failing pattern.”
That’s a completely different experience.
The moment it clicked
“This node ordering looks familiar.”
That line came from a debug print I had added while comparing ASTs.
I realized Hindsight wasn’t just storing failures—it was letting me compare structure over time.
I added a crude similarity function:
```python
def is_similar(p1, p2):
    return (
        p1["has_recursion"] == p2["has_recursion"] and
        p1["base_case_position"] == p2["base_case_position"]
    )
```
And suddenly:
- Different submissions mapped to the same pattern
- Feedback became consistent
- I stopped chasing individual bugs
I was debugging classes of mistakes, not instances.
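Grouping into classes can be as simple as bucketing stored events by their feature tuple, the same comparison `is_similar` makes, just applied across the whole history. Helper and field names are hypothetical:

```python
from collections import defaultdict

def cluster_by_pattern(events):
    """Group stored events whose distilled features match exactly."""
    clusters = defaultdict(list)
    for event in events:
        p = event["patterns"]
        key = (p.get("has_recursion"), p.get("base_case_position"))
        clusters[key].append(event["user_id"])
    return dict(clusters)
```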
Where my design broke
The first version of this system had a big flaw:
I stored too much raw data.
Entire ASTs, full code blobs, verbose logs.
It made retrieval slow and noisy.
Worse, similarity comparisons became meaningless.
I refactored to store only distilled features:
```python
event = {
    "user_id": user_id,
    "patterns": extract_patterns(ast_tree),
    "issues": issues,
    "timestamp": now(),
}
```
This made two things easier:
- Fast comparison
- Clear grouping of mistakes
Tradeoff: I lost some context.
But for feedback generation, structure mattered more than raw code.
How feedback actually changed
Before Hindsight:
- Each submission analyzed in isolation
- Generic messages
- No sense of progression
After Hindsight:
- Feedback references past mistakes
- Patterns influence hint priority
- Repeated issues get stricter guidance
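The escalation itself can be a plain lookup on repeat count. Thresholds and messages here are invented for illustration; `history` is just a list of pattern keys from past events:

```python
def escalate_feedback(pattern_key, history):
    """Pick feedback severity from how often a pattern has recurred."""
    repeats = history.count(pattern_key)
    if repeats == 0:
        return "Check your recursion logic."
    if repeats == 1:
        return "You've hit this pattern before -- re-check your base case."
    return ("You've repeated a pattern where the base case runs after the "
            "recursive call. Move it before the recursive call.")
```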
Example:
Before:
“Check your recursion logic.”
After:
“You’ve repeated a pattern where the base case is evaluated after recursion. Try moving it before the recursive call.”
That’s not just better—it’s targeted.
One real scenario
A user writes:
```python
def factorial(n):
    return n * factorial(n - 1) if n > 1 else 1
```
Looks fine.
Then they modify it:
```python
def factorial(n):
    return factorial(n - 1) * n
```
No base case.
First submission:
→ flagged as missing base case
Second submission (after hint):
→ same mistake, slightly different structure
Without memory: same feedback again.
With Hindsight:
- Recognizes repeated pattern
- Escalates feedback
- Suggests exact structural fix
That escalation turned out to be key.
What surprised me most
I didn’t expect Hindsight to help me debug my own analyzer.
But it did.
By clustering mistakes, I could see:
- Where my rules were too broad
- Where they were missing edge cases
- Which patterns kept slipping through
In a way, user mistakes became test cases.
Lessons learned
1. Stateless analysis hides patterns
If you only look at one submission at a time, you miss the bigger picture. Most bugs repeat in slightly different forms.
2. Store structure, not raw data
AST features worked better than full trees. Smaller, comparable representations made everything easier.
3. Similarity doesn’t need to be fancy
Simple comparisons (position, presence, ordering) were enough to group meaningful patterns.
4. Feedback quality depends on memory
Better rules didn’t improve feedback nearly as much as remembering past mistakes did.
5. Your system will learn things you didn’t design for
I added Hindsight to improve hints. It ended up exposing flaws in my analyzer and changing how I debug.
What I’d do differently
If I rebuilt this:
- I’d design pattern extraction first, not last
- I’d define similarity metrics upfront
- I’d treat memory as a core system, not an add-on
Right now, Hindsight sits alongside the analyzer.
It probably belongs inside it.
Closing
I went in thinking I was building a better static analyzer.
What I ended up building was a system that remembers how people fail—and uses that to guide them forward.
And somewhere along the way, it started catching my own repeated AST traversal bugs before I even noticed them.