<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Devika_ND</title>
    <description>The latest articles on DEV Community by Devika_ND (@devika2605).</description>
    <link>https://dev.to/devika2605</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3839426%2Ff000671b-7e7d-4cba-85b0-1fd25f8cf8f5.png</url>
      <title>DEV Community: Devika_ND</title>
      <link>https://dev.to/devika2605</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devika2605"/>
    <language>en</language>
    <item>
      <title>How I Built the Engine That Makes Our AI Mentor Actually Work</title>
      <dc:creator>Devika_ND</dc:creator>
      <pubDate>Mon, 23 Mar 2026 09:57:06 +0000</pubDate>
      <link>https://dev.to/devika2605/how-i-built-the-engine-that-makes-our-ai-mentor-actually-work-5ac6</link>
      <guid>https://dev.to/devika2605/how-i-built-the-engine-that-makes-our-ai-mentor-actually-work-5ac6</guid>
      <description>&lt;p&gt;By: Devika N D - Code Execution &amp;amp; Behavioral Signal Module&lt;/p&gt;

&lt;p&gt;Hindsight Hackathon - Team 1/0 coders&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“The moment I submitted an infinite loop and watched the server hang forever —&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I realized the execution engine is the most dangerous part of the whole system.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nobody told me that when you let users run arbitrary Python code on your server, you are one while True: pass away from a dead process. I found out the hard way.&lt;/p&gt;

&lt;h2&gt;
  What I Owned
&lt;/h2&gt;

&lt;p&gt;My job in this project was building the code execution engine, the problem store, the behavioral signal tracker, and the cognitive pattern analyzer. This is the story of what I built, what broke, and what I'd do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;The AI Coding Mentor is a system where students submit Python solutions to coding problems, get evaluated against real test cases, and receive personalized hints based on how they actually behave — not what they tell us about themselves.&lt;/p&gt;

&lt;p&gt;The stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;FastAPI&lt;/strong&gt; backend for all routing and execution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Groq (LLaMA 3.3 70B)&lt;/strong&gt; for AI-generated feedback and problems&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hindsight&lt;/strong&gt; for persistent behavioral memory across sessions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;React&lt;/strong&gt; frontend with live code editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My modules owned the entire middle of this pipeline — from the moment a user hits Submit to the moment a cognitive pattern label gets stored in memory.&lt;/p&gt;

&lt;p&gt;The system doesn't ask users how they learn. It watches them. Every edit count, every second spent staring at the problem, every failed test case — all of it feeds into a behavioral profile that gets smarter with each session.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Execution Engine — Harder Than It Looks
&lt;/h2&gt;

&lt;p&gt;The core function is run_user_code() in execution_service.py. It takes user code as a raw string, compiles it, runs it against test cases, and returns a structured result. Simple enough on paper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def run_user_code(user_code: str, function_name: str, test_cases: list[dict]) -&amp;gt; dict:
    # (total and start_time are set earlier in the full function)
    namespace = {}

    try:
        exec(compile(user_code, "&amp;lt;user_code&amp;gt;", "exec"), namespace)
    except SyntaxError as e:
        return _error_result(f"Syntax error: {e}", total, start_time)
    except Exception as e:
        return _error_result(f"Runtime error on load: {e}", total, start_time)

    if function_name not in namespace:
        return _error_result(
            f"Function '{function_name}' not found.",
            total, start_time
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first version worked perfectly for normal code. Then I tested it with an infinite loop. The server froze. No response, no error, no timeout. Just a dead process hanging indefinitely while uvicorn stopped serving everything else.&lt;/p&gt;

&lt;p&gt;On Linux you can use signal.SIGALRM to time out a function. On Windows — which is what I’m running — that doesn’t exist. So I used threading instead:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def _run_with_timeout(fn, kwargs, timeout_sec=5):
    result = {"value": None, "error": None}

    def target():
        try:
            result["value"] = fn(**kwargs)
        except Exception as e:
            result["error"] = str(e)

    t = threading.Thread(target=target)
    t.start()
    t.join(timeout=timeout_sec)

    if t.is_alive():
        result["error"] = "Time limit exceeded (5s)"
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Important caveat: on Windows, threading timeout doesn’t actually kill the thread — it just stops waiting. The thread keeps running in the background. This means an infinite loop will still consume CPU even after the timeout fires. For a hackathon demo, this is good enough. For production you’d want a subprocess-based sandbox.&lt;/p&gt;
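&lt;p&gt;For reference, here is a minimal sketch of that subprocess-based approach (illustrative names, not our hackathon code). Unlike a daemon thread, the child process is actually killed when the timeout fires:&lt;/p&gt;

```python
import subprocess
import sys

def run_sandboxed(user_code: str, timeout_sec: int = 5) -> dict:
    """Run untrusted code in a child interpreter; kill it on timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", user_code],
            capture_output=True,
            text=True,
            timeout=timeout_sec,
        )
        return {"stdout": proc.stdout, "error": proc.stderr or None}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers
        return {"stdout": "", "error": f"Time limit exceeded ({timeout_sec}s)"}
```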

&lt;h2&gt;
  
  
  The Problem Store and AI Generator
&lt;/h2&gt;

&lt;p&gt;I started with a hardcoded problems.json — 9 problems covering arrays, strings, loops, and recursion. Each problem has a function_name, test_cases with exact inputs and expected outputs, and starter_code.&lt;/p&gt;
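&lt;p&gt;To make that shape concrete, here is a hypothetical entry (values invented for illustration). The loop at the end shows why the test case input keys have to match the function parameters exactly:&lt;/p&gt;

```python
# Hypothetical problem entry in the shape described above
sample_problem = {
    "id": "p001",
    "function_name": "sum_list",
    "starter_code": "def sum_list(nums):\n    pass",
    "test_cases": [
        {"input": {"nums": [1, 2, 3]}, "expected": 6},
        {"input": {"nums": []}, "expected": 0},
    ],
}

def reference_solution(nums):
    return sum(nums)

# input keys become keyword arguments, so they must match parameter names
for case in sample_problem["test_cases"]:
    assert reference_solution(**case["input"]) == case["expected"]
```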

&lt;p&gt;Then I realized 9 problems is not enough for an adaptive system. I added an AI problem generator using Groq:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_problem(topic: str, difficulty: str) -&amp;gt; dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3
    )
    raw = response.choices[0].message.content
    raw = re.sub(r"```json|```", "", raw).strip()  # strip markdown fences
    problem = json.loads(raw)
    problem["id"] = "gen_" + str(uuid.uuid4())[:8]
    _generated_cache[problem["id"]] = problem
    return problem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The endpoint is GET /problems/generate/{topic}/{difficulty}. Hit it with recursion/medium and you get a fresh problem with 3 test cases, a function signature, and starter code — all ready to run through the execution engine immediately.&lt;/p&gt;

&lt;p&gt;The trickiest part was prompt engineering. My first attempt generated Tower of Hanoi — a 4-parameter function — but test cases only passed n. Every test case failed with "missing 3 required positional arguments." The fix: be explicit in the prompt that functions must have 1 or 2 parameters maximum and input keys must match exactly.&lt;/p&gt;
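&lt;p&gt;A sketch of what those explicit constraints can look like (wording is illustrative, not our exact prompt):&lt;/p&gt;

```python
# Illustrative prompt template; the real prompt lives in the generator module
PROBLEM_PROMPT = """Generate a Python coding problem as strict JSON.
Rules:
- The function must take 1 or 2 parameters, never more.
- Every test case "input" key must exactly match a parameter name.
- Return only JSON, with no markdown fences.
Topic: {topic}. Difficulty: {difficulty}."""

prompt = PROBLEM_PROMPT.format(topic="recursion", difficulty="medium")
```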

&lt;p&gt;Generated problems get saved to problems.json permanently — so the dataset grows every time someone generates one. No manual JSON writing needed.&lt;/p&gt;
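&lt;p&gt;The persistence step can be a simple read-append-write helper (a sketch; the helper name is illustrative):&lt;/p&gt;

```python
import json
from pathlib import Path

def save_problem(problem: dict, path: str = "problems.json") -> None:
    """Append a generated problem to the JSON dataset on disk."""
    p = Path(path)
    problems = json.loads(p.read_text()) if p.exists() else []
    problems.append(problem)
    p.write_text(json.dumps(problems, indent=2))
```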

&lt;p&gt;This started as a workaround for a small dataset and ended up being one of the most useful features in the whole project — infinite problems, fully wired into the same execution pipeline as the static ones.&lt;/p&gt;

&lt;h2&gt;
  Capturing Behavioral Signals
&lt;/h2&gt;

&lt;p&gt;Beyond just running code, I needed to understand how users were behaving while they solved problems.&lt;/p&gt;

&lt;p&gt;Every time a user submits code, signal_tracker.py captures the raw behavioral data:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def capture_signals(submission: CodeSubmission, result: EvalResult) -&amp;gt; dict:
    return {
        "user_id":         submission.user_id,
        "attempt_number":  submission.attempt_number,
        "time_taken_sec":  submission.time_taken,
        "code_edit_count": submission.code_edit_count,
        "all_passed":      result.all_passed,
        "error_types":     classify_errors(result.error_types),
        "failed_cases": [
            ec for ec in result.edge_case_results
            if not ec.get("passed")
        ]
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;These signals feed into cognitive_analyzer.py — five patterns: overthinking, guessing, rushing, concept_gap, and boundary_weakness. Each returns a confidence score between 0 and 1.&lt;/p&gt;

&lt;p&gt;Here’s the rushing detector — the one that catches users who submit without reading:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def _check_rushing(signals: dict) -&amp;gt; list:
    score = 0.0

    if signals["time_taken_sec"] &amp;lt; 15:
        score += 0.4   # submitted too fast

    if "syntax_error" in signals["error_types"]:
        score += 0.4   # didn't even read the code

    if signals["code_edit_count"] &amp;lt;= 2:
        score += 0.2   # barely touched the editor

    if score &amp;gt;= 0.4:
        return [{"pattern": "rushing", "confidence": round(score, 2)}]
    return []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under 15 seconds, a syntax error, barely any edits — that’s a user who copy-pasted something without reading the problem. Confidence 0.8, stored in memory, hint tone adapts accordingly.&lt;/p&gt;
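&lt;p&gt;The aggregation that turns detector results into the dominant_pattern fields stored in memory can be sketched like this (the stub detector stands in for the five real checks; the exact aggregation logic is assumed):&lt;/p&gt;

```python
def _stub_detector(signals: dict) -> list:
    # stands in for detectors like _check_rushing in the real module
    if signals.get("attempt_number", 0) >= 3:
        return [{"pattern": "guessing", "confidence": 0.6}]
    return []

def analyze_patterns(signals: dict) -> dict:
    """Run every detector and pick the highest-confidence pattern."""
    detectors = [_stub_detector]  # the real module registers all five checks
    patterns = []
    for detector in detectors:
        patterns.extend(detector(signals))
    dominant = max(patterns, key=lambda p: p["confidence"], default=None)
    return {
        "patterns": patterns,
        "dominant_pattern": dominant["pattern"] if dominant else None,
        "dominant_confidence": dominant["confidence"] if dominant else 0.0,
    }
```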

&lt;h2&gt;
  Wiring the Full Pipeline
&lt;/h2&gt;

&lt;p&gt;The submit_code route is where all five modules connect in sequence:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@router.post("/submit_code", response_model=EvalResult)
def submit_code(submission: CodeSubmission):
    # 1. Load problem
    problem = get_problem_by_id(submission.problem_id)

    # 2. Run the code
    result_dict = run_user_code(
        user_code=submission.code,
        function_name=problem["function_name"],
        test_cases=problem["test_cases"]
    )
    result = EvalResult(**result_dict)

    # 3. Capture behavioral signals
    signals = capture_signals(submission, result)

    # 4. Detect cognitive patterns
    patterns = analyze_patterns(signals)

    # 5. Store into Hindsight memory
    store_session(user_id=submission.user_id, session_data={
        "patterns":            patterns["patterns"],
        "dominant_pattern":    patterns["dominant_pattern"],
        "dominant_confidence": patterns["dominant_confidence"],
        "time_taken_seconds":  signals["time_taken_sec"],
        "solved":              result.all_passed,
    })

    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One request, five things happen: code runs, signals captured, patterns detected, memory stored, result returned. The judge can submit code and immediately call GET /memory/recall/{user_id} to see the pattern stored in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Route Order Bug That Would Have Killed Our Demo
&lt;/h2&gt;

&lt;p&gt;FastAPI registers routes in declaration order. I had this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# WRONG — dynamic route declared first
@router.get("/get_problem/{problem_id}")         # swallows everything
@router.get("/problems/difficulty/{difficulty}") # never reached

# CORRECT — static routes before dynamic
@router.get("/problems/difficulty/{difficulty}") # matched first
@router.get("/problems/topic/{topic}")           # matched first
@router.get("/get_problem/{problem_id}")         # dynamic last
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When a judge hit GET /problems/difficulty/easy, FastAPI tried to find a problem with ID "difficulty" and returned 404. Found this during final integration testing — not during development when I was only testing /get_problem/p001 directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Windows threading timeout doesn’t kill threads.&lt;/strong&gt; It stops waiting but the thread lives on. Design for subprocess isolation if you’re running untrusted code in production.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prompt engineering for structured output is iteration.&lt;/strong&gt; My first AI problem generator produced functions whose test cases didn’t match the signature. Being extremely explicit about parameter constraints in the prompt fixed it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Route order in FastAPI is load-bearing.&lt;/strong&gt; Dynamic routes swallow everything declared before them. Always put static routes first.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Owning the full pipeline from execution to signals forced cleaner interfaces.&lt;/strong&gt; Because I controlled both ends of the data contract, field name mismatches between modules were caught immediately — not during final integration when they're painful to fix.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Behavioral signals are more honest than user input.&lt;/strong&gt; No one types “I tend to rush” into a profile form. But submit in 8 seconds with a syntax error twice in a row and the system knows exactly what’s happening.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources &amp;amp; Links
&lt;/h2&gt;

&lt;p&gt;Hindsight GitHub: &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;https://github.com/vectorize-io/hindsight&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hindsight Docs: &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;https://hindsight.vectorize.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent Memory: &lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;https://vectorize.io/features/agent-memory&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>microsoft</category>
      <category>ai</category>
      <category>fastapi</category>
    </item>
  </channel>
</rss>
