<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tyler H</title>
    <description>The latest articles on DEV Community by Tyler H (@tyy130).</description>
    <link>https://dev.to/tyy130</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3774657%2F4abfa93c-2ccc-4352-92a7-e5b2dbf7adf6.jpeg</url>
      <title>DEV Community: Tyler H</title>
      <link>https://dev.to/tyy130</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tyy130"/>
    <language>en</language>
    <item>
      <title>NeuroGuard: AI-Native Code Security Using Gemma 4's Glass-Box Thinking Mode</title>
      <dc:creator>Tyler H</dc:creator>
      <pubDate>Wed, 13 May 2026 10:00:38 +0000</pubDate>
      <link>https://dev.to/tyy130/neuroguard-ai-native-code-security-using-gemma-4s-glass-box-thinking-mode-1hgh</link>
      <guid>https://dev.to/tyy130/neuroguard-ai-native-code-security-using-gemma-4s-glass-box-thinking-mode-1hgh</guid>
      <description>&lt;p&gt;&lt;em&gt;Submitted to the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Build With Gemma 4&lt;/a&gt; track of the Dev.to Google Gemma 4 Challenge.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built &lt;code&gt;neuroguard&lt;/code&gt; — a CLI that uses Gemma 4's &lt;code&gt;ThinkingConfig(include_thoughts=True)&lt;/code&gt; API to stream the model's full cognitive trace in a split-pane terminal UI while it finds security vulnerabilities and produces a SAST-verified secure rewrite. &lt;strong&gt;&lt;a href="https://neuroguard-psi.vercel.app" rel="noopener noreferrer"&gt;Live demo →&lt;/a&gt;&lt;/strong&gt; | Install: &lt;code&gt;pip install neuroguard-ai&lt;/code&gt; | Source: &lt;a href="https://github.com/tyy130/neuroguard-ai" rel="noopener noreferrer"&gt;github.com/tyy130/neuroguard-ai&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem I Kept Running Into
&lt;/h2&gt;

&lt;p&gt;Studies report that a large share of AI-generated applications ship to production with OWASP Top 10 vulnerabilities, and I've seen it firsthand. The worst cases aren't SQL injections from typos. They're &lt;strong&gt;hallucinated bypasses&lt;/strong&gt;: an AI agent removes authentication middleware to resolve a compilation error, silently stripping the application of its entire security layer.&lt;/p&gt;

&lt;p&gt;The frustrating thing is that a human reviewer wouldn't make this mistake, because they'd &lt;em&gt;reason&lt;/em&gt; about what the code does before deleting it. The AI just optimized for "code compiles" without the security reasoning step.&lt;/p&gt;

&lt;p&gt;The root cause is &lt;strong&gt;opacity&lt;/strong&gt;. When a black-box LLM generates insecure code, you can't see why. You get the output without the reasoning. And without the reasoning, you can't tell if the model considered security at all — or silently decided to ignore it.&lt;/p&gt;

&lt;p&gt;I wanted to fix that.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Makes Gemma 4 Different
&lt;/h2&gt;

&lt;p&gt;Two approaches existed before Gemma 4:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hidden (OpenAI o1/o3)&lt;/strong&gt;: These models run a real reasoning process, but the raw trace is never returned. Through Chat Completions you get a &lt;code&gt;reasoning_tokens&lt;/code&gt; count in the usage object and nothing else (the Responses API can add a model-written summary, not the trace itself). You can't route the reasoning, log it, or build on top of it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline text (local R1, prompted CoT)&lt;/strong&gt;: When you run a reasoning model locally — like R1 via Ollama — or prompt any model to think step by step, the reasoning ends up in the same string as the response, separated only by &lt;code&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;&lt;/code&gt; tags. You can see it, but you have to parse it out. Tags can split across stream chunks, the model can reopen reasoning after &lt;code&gt;&amp;lt;/think&amp;gt;&lt;/code&gt;, and there's no API-level guarantee about the boundary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 does something different. &lt;code&gt;ThinkingConfig(include_thoughts=True)&lt;/code&gt; emits reasoning as &lt;strong&gt;structurally separate stream parts&lt;/strong&gt;: each reasoning part carries a &lt;code&gt;thought=True&lt;/code&gt; flag, while response parts don't. The reasoning and the response are separated at the API level, not by text parsing.&lt;/p&gt;

&lt;p&gt;That API-level separation is what makes NeuroGuard possible. I can route thought parts to a left pane and response parts to a right pane in real time, with no regex parsing, no risk of the boundary getting confused, and no thought tokens leaking into the final output.&lt;/p&gt;


&lt;h2&gt;
  
  
  How NeuroGuard Works
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;┌─────────────────────────────┬────────────────────────────┐&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="err"&gt;🧠&lt;/span&gt; &lt;span class="n"&gt;Gemma&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="n"&gt;Thinking&lt;/span&gt;        &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="err"&gt;🔒&lt;/span&gt; &lt;span class="n"&gt;Secure&lt;/span&gt; &lt;span class="n"&gt;Rewrite&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="err"&gt;─────────────────────────&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="err"&gt;─────────────────────────&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;SQL&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;47&lt;/span&gt; &lt;span class="n"&gt;concatenates&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;directly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;        &lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;classic&lt;/span&gt; &lt;span class="n"&gt;injection&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;fix&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;parameterized&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;                 &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;            &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="mi"&gt;62&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;executes&lt;/span&gt; &lt;span class="n"&gt;arbitrary&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;    &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;     &lt;span class="err"&gt;│&lt;/span&gt;      &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connect&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="n"&gt;This&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;RCE&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;             &lt;span class="err"&gt;│&lt;/span&gt;      &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;       &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;                             &lt;span class="err"&gt;│&lt;/span&gt;          &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM    │
│                             │           users WHERE id = │
│                             │           ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;└─────────────────────────────┴────────────────────────────┘&lt;/span&gt;
  &lt;span class="n"&gt;Bandit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="err"&gt;✓&lt;/span&gt; &lt;span class="nc"&gt;CLEAN &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rewrite&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The left pane streams as Gemma 4 reasons. The right pane fills in as it produces the secure rewrite. At the end, Bandit re-scans the rewrite to confirm that the findings it flagged in the original are gone.&lt;/p&gt;
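&lt;p&gt;NeuroGuard draws this view with Rich's live layout (&lt;code&gt;ui.py&lt;/code&gt; in the architecture below). To show the layout idea without that dependency, here is a stdlib-only sketch. &lt;code&gt;render_split&lt;/code&gt; and the 29/28 column widths are my own illustration, not NeuroGuard's code. Re-rendering the whole box each time a new fragment arrives is what produces the streaming effect.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import textwrap
from itertools import zip_longest


def render_split(thinking, rewrite, left_width=29, right_width=28):
    """Lay out the reasoning trace (left) and the rewrite (right) in a boxed view.

    Hypothetical helper: the free-form reasoning is word-wrapped, while the
    rewrite keeps its own line breaks and indentation so the code stays readable.
    """
    left = textwrap.wrap(thinking, left_width - 2) or [""]
    right = rewrite.splitlines() or [""]
    rows = ["┌" + "─" * left_width + "┬" + "─" * right_width + "┐"]
    for l_text, r_text in zip_longest(left, right, fillvalue=""):
        # Pad, then truncate, each cell so the column borders always line up.
        l_cell = (" " + l_text).ljust(left_width)[:left_width]
        r_cell = (" " + r_text).ljust(right_width)[:right_width]
        rows.append("│" + l_cell + "│" + r_cell + "│")
    rows.append("└" + "─" * left_width + "┴" + "─" * right_width + "┘")
    return rows


# Redraw this on every streamed fragment; print("\n".join(preview)) to view it.
preview = render_split(
    "...the SQL query on line 47 concatenates user input directly.",
    "from flask import Flask\nimport sqlite3",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;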
&lt;h3&gt;
  
  
  The Core API Call
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from google import genai
from google.genai import types


def stream_review(client: genai.Client, prompt: str, thinking_budget: int):
    """Yield thought fragments (tagged for the left pane) and response fragments."""
    response = client.models.generate_content_stream(
        model="gemma-4-31b-it",
        contents=[types.Content(role="user", parts=[types.Part(text=prompt)])],
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,  # defined elsewhere in the project
            thinking_config=types.ThinkingConfig(
                include_thoughts=True,
                thinking_budget=thinking_budget,  # scales with SAST severity
            ),
        ),
    )

    for chunk in response:
        # Some chunks (e.g. the final one) can arrive without candidates or parts.
        if not chunk.candidates or chunk.candidates[0].content is None:
            continue
        for part in chunk.candidates[0].content.parts or []:
            if not part.text:
                continue
            if getattr(part, "thought", False):
                yield f"&amp;lt;think&amp;gt;{part.text}"   # → left pane
            else:
                yield part.text               # → right pane
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


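&lt;p&gt;That's it. No regex, no guessing where the reasoning ends. The &lt;code&gt;thought=True&lt;/code&gt; flag on stream parts is the entire separation mechanism; the &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; prefix only labels each fragment for the UI, and because it's added per part, it can never be split across chunks.&lt;/p&gt;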
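&lt;p&gt;Because the split happens on a structured flag, the routing logic can be tested without calling the API at all. A minimal sketch, assuming only the attribute shape used above (&lt;code&gt;chunk.candidates[0].content.parts&lt;/code&gt;, &lt;code&gt;part.thought&lt;/code&gt;, &lt;code&gt;part.text&lt;/code&gt;): the hypothetical &lt;code&gt;route_parts&lt;/code&gt; yields &lt;code&gt;(pane, text)&lt;/code&gt; tuples instead of a string prefix, and &lt;code&gt;fake_chunk&lt;/code&gt; is a stand-in built with &lt;code&gt;SimpleNamespace&lt;/code&gt;, not a real SDK type.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from types import SimpleNamespace


def route_parts(chunks):
    """Yield ("thought", text) or ("answer", text) for every non-empty part.

    Illustrative helper, not NeuroGuard's thinking_parser.py.
    """
    for chunk in chunks:
        candidates = getattr(chunk, "candidates", None)
        if not candidates or candidates[0].content is None:
            continue  # e.g. a final chunk that only carries metadata
        for part in candidates[0].content.parts or []:
            if not part.text:
                continue
            # The flag, not the text, decides the pane.
            pane = "thought" if getattr(part, "thought", False) else "answer"
            yield pane, part.text


def fake_chunk(text, thought=False):
    """Hypothetical test helper: an object shaped like one streamed chunk."""
    part = SimpleNamespace(text=text, thought=thought)
    content = SimpleNamespace(parts=[part])
    return SimpleNamespace(candidates=[SimpleNamespace(content=content)])


stream = [
    fake_chunk("line 47 concatenates user input", thought=True),
    fake_chunk("Use a parameterized query."),
]
events = list(route_parts(stream))
# events == [("thought", "line 47 concatenates user input"),
#            ("answer", "Use a parameterized query.")]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Yielding tuples keeps the pane decision structural all the way to the UI, so nothing downstream ever has to look inside the text.&lt;/p&gt;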
&lt;h3&gt;
  
  
  Making the Thinking Load-Bearing
&lt;/h3&gt;

&lt;p&gt;The key design decision was making the thinking trace &lt;em&gt;load-bearing&lt;/em&gt;, not decorative. I inject SAST findings from Bandit/semgrep directly into the prompt before the model starts reasoning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SAST pre-scan findings (ground truth — confirm or refute each in your reasoning):

  [HIGH] B608 hardcoded_sql_expressions — line 47
  [HIGH] B307 eval() — line 62
  [MEDIUM] B105 hardcoded_password_string — line 12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the model's thinking trace is explicitly reasoning about concrete, tool-reported findings. The prompt tells it to address every one: confirm the finding and fix it, or explain why it's a false positive. Either way, you get an auditable chain of evidence tied to specific lines.&lt;/p&gt;
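&lt;p&gt;A sketch of how such a block could be assembled from Bandit's JSON report (&lt;code&gt;bandit -f json&lt;/code&gt;), using the &lt;code&gt;test_id&lt;/code&gt;, &lt;code&gt;test_name&lt;/code&gt;, &lt;code&gt;issue_severity&lt;/code&gt; and &lt;code&gt;line_number&lt;/code&gt; fields of each entry in &lt;code&gt;results&lt;/code&gt;. The function names and the exact line format are assumptions, not NeuroGuard's &lt;code&gt;prompts.py&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import subprocess

# Hypothetical helpers; NeuroGuard's own prompt builder may differ.
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
HEADER = "SAST pre-scan findings (ground truth: confirm or refute each in your reasoning):"


def run_bandit(path):
    """Run Bandit on one file and return its parsed JSON report."""
    # Bandit exits with status 1 when it finds issues, so no check=True here.
    proc = subprocess.run(["bandit", "-f", "json", path], capture_output=True, text=True)
    return json.loads(proc.stdout)


def format_findings(report, keep=("HIGH", "MEDIUM")):
    """Render Bandit results as the prompt block the model must address."""
    results = [r for r in report.get("results", []) if r["issue_severity"] in keep]
    # Most severe first, then by line number.
    results.sort(key=lambda r: (SEVERITY_RANK[r["issue_severity"]], r["line_number"]))
    lines = [HEADER, ""]
    for r in results:
        lines.append(f"  [{r['issue_severity']}] {r['test_id']} {r['test_name']}, line {r['line_number']}")
    return "\n".join(lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping &lt;code&gt;run_bandit&lt;/code&gt; separate from &lt;code&gt;format_findings&lt;/code&gt; means the formatting can be unit-tested against a canned report without Bandit installed.&lt;/p&gt;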

&lt;p&gt;The thinking budget scales automatically: &lt;code&gt;4096 + HIGH_count × 512 + MEDIUM_count × 256&lt;/code&gt; tokens, capped at 16384. Each HIGH finding adds 512 tokens of reasoning room and each MEDIUM adds 256, so riskier files get deeper reasoning.&lt;/p&gt;
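&lt;p&gt;The formula translates directly into code. A sketch with hypothetical parameter names. With only HIGH findings, the 16384 cap is reached at 24 of them (4096 + 24 × 512).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def thinking_budget(severities, base=4096, per_high=512, per_medium=256, cap=16384):
    """Hypothetical helper: base + 512 per HIGH + 256 per MEDIUM, capped."""
    high = sum(1 for s in severities if s == "HIGH")
    medium = sum(1 for s in severities if s == "MEDIUM")
    return min(cap, base + high * per_high + medium * per_medium)


print(thinking_budget(["HIGH", "HIGH", "MEDIUM"]))  # 5376
print(thinking_budget(["HIGH"] * 30))               # 16384 (capped)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;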




&lt;h2&gt;
  
  
  What It Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;The built-in demo (&lt;code&gt;demo/vuln_sample.py&lt;/code&gt;) is a Flask app with 5 intentional vulnerabilities. Here's an excerpt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# demo/vuln_sample.py — intentionally vulnerable
&lt;/span&gt;
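&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SECRET_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supersecret123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;   &lt;span class="c1"&gt;# hardcoded secret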
&lt;/span&gt;
&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;            &lt;span class="c1"&gt;# no auth check
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;admin_panel&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Admin panel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# SQL injection
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/eval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_code&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;      &lt;span class="c1"&gt;# RCE
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
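&lt;p&gt;The SQL injection in &lt;code&gt;get_user&lt;/code&gt; is the easiest of these to demonstrate end to end. A self-contained sketch with an in-memory &lt;code&gt;sqlite3&lt;/code&gt; database (the table and rows are made up for illustration) shows what the f-string lets through, and how the parameterized form the rewrite switches to shuts it down:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3


def get_user_unsafe(conn, user_id):
    # Mirrors the demo: user input is pasted straight into the SQL text.
    return conn.execute(f"SELECT id, name FROM users WHERE id = {user_id}").fetchall()


def get_user_safe(conn, user_id):
    # The ? placeholder makes sqlite3 bind user_id as a value, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchall()


# Illustrative table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "1 OR 1=1"
print(len(get_user_unsafe(conn, payload)))  # 2: the payload matches every row
print(get_user_safe(conn, payload))         # []: it is just a string that matches no id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;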



&lt;p&gt;Running &lt;code&gt;neuroguard review demo/vuln_sample.py&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Bandit finds 4 HIGH/MEDIUM findings in the original&lt;/li&gt;
&lt;li&gt;Those findings are injected into the prompt&lt;/li&gt;
&lt;li&gt;Gemma 4 streams its reasoning — you watch it identify the injection vector, explain the attack path, and reason through the fix&lt;/li&gt;
&lt;li&gt;The secure rewrite uses parameterized queries, removes &lt;code&gt;eval()&lt;/code&gt;, and moves the secret to an environment variable&lt;/li&gt;
&lt;li&gt;Bandit runs on the rewrite: &lt;strong&gt;0 findings&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
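&lt;p&gt;Put together, the five steps amount to a short orchestration function. A hedged sketch: &lt;code&gt;review_file&lt;/code&gt; and the injected &lt;code&gt;scan&lt;/code&gt; and &lt;code&gt;generate_rewrite&lt;/code&gt; callables are hypothetical, roughly standing in for what &lt;code&gt;tools/sast.py&lt;/code&gt; and &lt;code&gt;agent.py&lt;/code&gt; do. Injecting them keeps the flow testable offline.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import tempfile


def review_file(path, scan, generate_rewrite):
    """Hypothetical pipeline: scan, prompt and reason, rewrite, then re-scan.

    scan(path) returns a list of finding dicts; generate_rewrite(source, findings)
    returns the rewritten source (the model call would live in there).
    """
    before = scan(path)
    with open(path, encoding="utf-8") as f:
        source = f.read()
    rewrite = generate_rewrite(source, before)

    # Re-scan the rewrite from a temp file that keeps the original extension,
    # so the scanner picks the same language rules.
    suffix = os.path.splitext(path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(rewrite)
        after = scan(tmp_path)
    finally:
        os.remove(tmp_path)
    return {"before": before, "after": after, "rewrite": rewrite}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;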

&lt;p&gt;The thinking trace is the proof of work. You don't have to trust the rewrite blindly — you can see the exact chain of reasoning that produced it.&lt;/p&gt;




&lt;h2&gt;
  
  
  SAST + LLM: Two Layers of Confidence
&lt;/h2&gt;

&lt;p&gt;One thing I deliberately avoided was making this "just an LLM." Bandit (for Python) and semgrep/regex patterns (for JS/TS) run &lt;em&gt;before&lt;/em&gt; the model sees the code. The findings are facts fed into the reasoning layer.&lt;/p&gt;

&lt;p&gt;After the rewrite, they run again. The exit code is non-zero if the original had HIGH/MEDIUM findings — so in CI/CD, your pipeline fails on vulnerable code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/neuroguard.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Security review&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;neuroguard review src/ --format json&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GEMINI_API_KEY }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
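&lt;p&gt;A sketch of that exit-code rule, reusing the summary line format from the diagram earlier in the post. &lt;code&gt;summarize&lt;/code&gt; and &lt;code&gt;BLOCKING&lt;/code&gt; are hypothetical names; the point is that the exit code looks at the original file's findings, not the rewrite's.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;BLOCKING = {"HIGH", "MEDIUM"}  # severities that fail the build


def summarize(before, after):
    """Return the one-line Bandit summary and the process exit code (hypothetical helper).

    The exit code depends on the ORIGINAL file's findings, so CI still fails
    on vulnerable code even when the rewrite scans clean.
    """
    status = "✓ CLEAN" if not after else "✗ NOT CLEAN"
    line = f"Bandit: {len(before)} findings → {status} ({len(after)} findings in rewrite)"
    exit_code = 1 if any(f["issue_severity"] in BLOCKING for f in before) else 0
    return line, exit_code


line, exit_code = summarize([{"issue_severity": "HIGH"}] * 4, [])
# line == "Bandit: 4 findings → ✓ CLEAN (0 findings in rewrite)"
# exit_code == 1; a CLI would finish with sys.exit(exit_code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;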



&lt;p&gt;You can also get a Slack notification with Gemma 4's reasoning excerpt, post a GitHub PR comment automatically, or pipe JSON to any webhook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;neuroguard review app.py &lt;span class="nt"&gt;--notify-slack&lt;/span&gt; https://hooks.slack.com/...
neuroguard review app.py &lt;span class="nt"&gt;--format&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.thinking'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
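&lt;p&gt;For the Slack path, here is a minimal stdlib sketch of a Block Kit message posted to an incoming webhook. The payload shape (a &lt;code&gt;header&lt;/code&gt; block, an &lt;code&gt;mrkdwn&lt;/code&gt; section, and a top-level &lt;code&gt;text&lt;/code&gt; fallback) follows Slack's Block Kit format; the function names, message wording, and truncation margin are assumptions, not NeuroGuard's &lt;code&gt;integrations.py&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

# Hypothetical helpers for illustration.
SECTION_LIMIT = 3000  # Slack rejects section text longer than 3000 characters


def slack_payload(path, findings_before, thinking):
    """Build a Block Kit message: a header plus a reasoning excerpt."""
    excerpt = thinking[: SECTION_LIMIT - 100]  # leave room for the bold label
    return {
        "text": f"NeuroGuard: {findings_before} findings in {path}",  # notification fallback
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": f"NeuroGuard review: {path}"[:150]}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*Gemma 4 reasoning excerpt*\n{excerpt}"}},
        ],
    }


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming-webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;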






&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;neuroguard/
├── agent.py           # Gemma 4 streaming client — ThinkingConfig, retry/fallback
├── thinking_parser.py # Routes &amp;lt;think&amp;gt; parts to left pane, response to right
├── prompts.py         # Language-aware prompt + SAST findings injection
├── cli.py             # Typer CLI: review, install-hooks, --format json/text
├── integrations.py    # Slack Block Kit, webhook, GitHub PR comments
├── tools/
│   ├── sast.py        # Bandit wrapper → Python findings
│   └── js_sast.py     # semgrep + regex fallback → JS/TS findings
└── ui.py              # Rich split-pane Live layout (12fps)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Model fallback:&lt;/strong&gt; If the 31B dense model hits a rate limit, NeuroGuard automatically falls back to &lt;code&gt;gemma-4-26b-a4b-it&lt;/code&gt; (MoE, ~4B active params), so a throttled request doesn't stall the demo.&lt;/p&gt;
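&lt;p&gt;The fallback rule is essentially "try the next model only on a rate limit". A sketch that keeps the sketch offline by using a local &lt;code&gt;RateLimited&lt;/code&gt; exception as a stand-in for the client's HTTP 429 error. With google-genai you would check the raised error's HTTP status code instead; treat that detail as an assumption to verify against the SDK. &lt;code&gt;generate_with_fallback&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MODELS = ("gemma-4-31b-it", "gemma-4-26b-a4b-it")


class RateLimited(Exception):
    """Hypothetical stand-in for the API client's HTTP 429 error."""


def is_rate_limit(exc):
    # Assumption: a real version would inspect the SDK error's status code (429).
    return isinstance(exc, RateLimited)


def generate_with_fallback(call, models=MODELS):
    """Try each model in order, moving on only when a model is rate-limited."""
    last_error = None
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:
            if not is_rate_limit(exc):
                raise  # real failures surface instead of triggering a fallback
            last_error = exc
    raise last_error or RuntimeError("no models configured")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One caveat for streaming calls: a rate-limit error may only surface once you start iterating the stream, so the function passed as &lt;code&gt;call&lt;/code&gt; should pull at least the first chunk before returning.&lt;/p&gt;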

&lt;p&gt;&lt;strong&gt;Language support:&lt;/strong&gt; Python, JavaScript, TypeScript, JSX, TSX.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;neuroguard-ai
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key   &lt;span class="c"&gt;# free at https://aistudio.google.com/apikey&lt;/span&gt;

&lt;span class="c"&gt;# against your own code&lt;/span&gt;
neuroguard review app.py

&lt;span class="c"&gt;# against the built-in vulnerable demo&lt;/span&gt;
git clone https://github.com/tyy130/neuroguard-ai
&lt;span class="nb"&gt;cd &lt;/span&gt;neuroguard-ai
neuroguard review demo/vuln_sample.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see Gemma 4's full reasoning trace in real time, then a clean, Bandit-verified secure rewrite.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters Beyond the Demo
&lt;/h2&gt;

&lt;p&gt;The shift happening in software development right now is that AI generates the first draft of most code. That's not going to stop. But "vibe coding" — accepting AI output without verification — is already producing an epidemic of OWASP vulnerabilities in production systems.&lt;/p&gt;

&lt;p&gt;The answer isn't to distrust AI-generated code. It's to &lt;strong&gt;demand transparency from the model before you trust the output&lt;/strong&gt;. Gemma 4's Thinking Mode makes that possible at the API level.&lt;/p&gt;

&lt;p&gt;NeuroGuard is a concrete demonstration of what that looks like: the model can't silently delete an auth check if its reasoning is visible. The audit trail is the security control.&lt;/p&gt;

&lt;p&gt;Apache 2.0. Gemma 4's weights are also available on Kaggle, so the model itself can run on-premise and your code never has to leave your network.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/tyy130/neuroguard-ai" rel="noopener noreferrer"&gt;github.com/tyy130/neuroguard-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/neuroguard-ai/" rel="noopener noreferrer"&gt;pypi.org/project/neuroguard-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Landing page: &lt;a href="https://neuroguard-psi.vercel.app" rel="noopener noreferrer"&gt;neuroguard-psi.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gemmachallenge</category>
      <category>security</category>
      <category>python</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>Notion Cortex: A Multi-Agent AI Research System Where Notion Is the Operating System</title>
      <dc:creator>Tyler H</dc:creator>
      <pubDate>Tue, 31 Mar 2026 21:51:17 +0000</pubDate>
      <link>https://dev.to/tyy130/notion-cortex-a-multi-agent-ai-research-system-where-notion-is-the-operating-system-3i58</link>
      <guid>https://dev.to/tyy130/notion-cortex-a-multi-agent-ai-research-system-where-notion-is-the-operating-system-3i58</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/notion-2026-03-04"&gt;Notion MCP Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Notion Cortex&lt;/strong&gt; is a multi-agent AI research system that uses Notion as its operating system — not just an output destination, but the shared coordination layer where agents think, communicate, and await human approval.&lt;/p&gt;

&lt;p&gt;Give it any topic, and a pipeline of specialized AI agents takes over, starting with Scouts that fan out in parallel:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scout agents&lt;/strong&gt; (x5, up to 3 at a time) research different angles in parallel, extracting structured entities into a Knowledge Graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyst&lt;/strong&gt; cross-references all findings, identifies patterns and gaps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesizer&lt;/strong&gt; streams a structured synthesis directly into Notion as it thinks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval Gate&lt;/strong&gt; pauses execution and waits for you to review in Notion — set Status to "Approved" to continue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writer&lt;/strong&gt; produces a publication-ready intelligence brief with headings, entity tables, and conclusions&lt;/li&gt;
&lt;/ol&gt;
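&lt;p&gt;At its core the Approval Gate is a polling loop (&lt;code&gt;approval-gates.ts&lt;/code&gt; in the architecture below). Here it is sketched in Python rather than the project's TypeScript; &lt;code&gt;wait_for_approval&lt;/code&gt;, the polling interval, and the &lt;code&gt;"Rejected"&lt;/code&gt; status are assumptions, and &lt;code&gt;get_status&lt;/code&gt; is a hypothetical callable standing in for the Notion read.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


def wait_for_approval(get_status, poll_seconds=5.0, max_polls=720, sleep=time.sleep):
    """Poll an approval gate until it is decided (720 polls × 5 s is about an hour).

    get_status is a hypothetical callable returning the gate page's Status value.
    "Rejected" is an assumed value; the pipeline only relies on "Approved".
    sleep is injectable so tests don't actually wait.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "Approved":
            return True
        if status == "Rejected":
            return False
        sleep(poll_seconds)  # slow polling also respects Notion's rate limits
    raise TimeoutError(f"approval gate still pending after {max_polls} polls")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;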

&lt;p&gt;Every agent's reasoning streams into its own &lt;strong&gt;Working Memory&lt;/strong&gt; page in real time. You can literally watch them think in Notion.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;notion-cortex &lt;span class="s2"&gt;"The rise of autonomous AI agents in software engineering"&lt;/span&gt;
&lt;span class="go"&gt;
🧠 Notion Cortex — starting run for: "The rise of autonomous AI agents..."

📋 Bootstrapping Notion workspace...
✅ Workspace ready (1.6s)

🧩 Decomposing topic into research angles...
   5 angles identified

🚀 Running 5 Scout agents (concurrency: 3)...
  ✅ Scout 1 done
  ✅ Scout 2 done
&lt;/span&gt;&lt;span class="c"&gt;  ...
&lt;/span&gt;&lt;span class="go"&gt;
📊 All Scouts complete (103s). Running Analyst...
✅ Analyst done (31s)

🕸️  Computing knowledge graph relations...
✅ Relations linked (8s)

🔗 Running Synthesizer...
✅ Synthesis written (23s)

✍️  Running Writer...
✅ Writer done (43s)

🎉 Done in 229s! Intelligence brief: https://notion.so/...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
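&lt;p&gt;Streaming reasoning into Working Memory needs batching; the architecture below describes &lt;code&gt;streaming.ts&lt;/code&gt; as a token buffer with a timed Notion block flush. A Python sketch of that idea (the project itself is TypeScript), where &lt;code&gt;TokenBuffer&lt;/code&gt;, its one-second default, and the injectable clock are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


class TokenBuffer:
    """Collect streamed tokens and hand them out in timed batches.

    Hypothetical sketch of a "token buffer → timed block flush": appending one
    Notion block per token would burn through API rate limits, so each
    returned batch is meant to become a single appended block.
    """

    def __init__(self, flush_every=1.0, clock=time.monotonic):
        self.flush_every = flush_every  # seconds between flushes (assumed default)
        self.clock = clock              # injectable for tests
        self.pending = []
        self.last_flush = clock()

    def add(self, token):
        """Buffer one token; return the batched text when a flush is due, else None."""
        self.pending.append(token)
        if self.clock() - self.last_flush >= self.flush_every:
            return self.flush()
        return None

    def flush(self):
        """Drain the buffer; also call this once when the stream ends."""
        text = "".join(self.pending)
        self.pending.clear()
        self.last_flush = self.clock()
        return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;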



&lt;h2&gt;
  
  
  Video Demo
&lt;/h2&gt;


&lt;div class="ltag_asciinema"&gt;
  
&lt;/div&gt;



&lt;p&gt;The demo shows a complete run from &lt;code&gt;notion-cortex "topic"&lt;/code&gt; through all 5 agent phases to the final intelligence brief in Notion.&lt;/p&gt;




&lt;h2&gt;
  
  
  Show us the code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tyy130/notion-cortex" rel="noopener noreferrer"&gt;github.com/tyy130/notion-cortex&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
  index.ts              CLI entry point + setup wizard
  cleanup.ts            Archives all cortex-* databases for a fresh start
  orchestrator.ts       Pipeline coordinator
  llm.ts                Dual-provider streaming (OpenAI + Anthropic)
  streaming.ts          Token buffer → timed Notion block flush
  concurrency.ts        Write queue (p-limit) + exponential backoff retry
  types.ts              Zod schemas for all database entry types
  agents/
    scout.ts            Research + entity extraction via MCP
    analyst.ts          Cross-scout analysis + KG enrichment
    synthesizer.ts      Structured synthesis streamed to Working Memory
    writer.ts           Final brief written to Outputs database
  notion/
    bootstrap.ts        Idempotent 5-database workspace creation
    client.ts           Notion SDK singleton
    mcp-client.ts       Notion MCP server (stdio transport)
    task-bus.ts         Agent task queue CRUD
    working-memory.ts   Streaming page writer + content reader
    knowledge-graph.ts  Entity store with serialized upsert
    approval-gates.ts   Human-in-the-loop polling
    outputs.ts          Final page publisher
    markdown-blocks.ts  Markdown → Notion block converter
    utils.ts            Shared helpers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Technical Decisions
&lt;/h3&gt;

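&lt;p&gt;&lt;strong&gt;Serialized KG upsert&lt;/strong&gt;: Parallel scouts can discover the same entity simultaneously. An unguarded check-then-create would let both see "not found" and create duplicate entries. A &lt;code&gt;pLimit(1)&lt;/code&gt; queue wraps the check-then-create operation so only one upsert runs at a time, which makes the upsert atomic within the process without a database lock.&lt;/p&gt;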
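&lt;p&gt;Here is a minimal sketch of a serialized upsert that makes the race concrete. It assumes an in-memory object in place of the Knowledge Graph database and chains promises by hand instead of importing &lt;code&gt;p-limit&lt;/code&gt;. The names &lt;code&gt;upsertEntity&lt;/code&gt;, &lt;code&gt;findEntity&lt;/code&gt;, and &lt;code&gt;createEntity&lt;/code&gt; are hypothetical, not the project's code.&lt;/p&gt;

```typescript
interface Entity {
  name: string;
  claims: string[];
}

// In-memory stand-in for the Knowledge Graph database, keyed by lower-cased name.
const store: { [key: string]: Entity } = {};
let createCalls = 0;

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Pretend lookup: it awaits like a Notion query would, which is what opens the race window.
async function findEntity(name: string) {
  await delay(5);
  const hit: Entity | undefined = store[name.toLowerCase()];
  return hit;
}

async function createEntity(entity: Entity) {
  await delay(5);
  createCalls += 1;
  store[entity.name.toLowerCase()] = entity;
  return entity;
}

// Tail of a promise chain: a hand-rolled equivalent of pLimit(1).
let tail = Promise.resolve();

function upsertEntity(name: string, claim: string) {
  const run = tail.then(async () => {
    const existing = await findEntity(name); // check...
    if (existing) {
      existing.claims.push(claim);
      return existing;
    }
    return createEntity({ name, claims: [claim] }); // ...then create, with no interleaving
  });
  // Later upserts wait for this one, even if it fails.
  tail = run.then(
    () => undefined,
    () => undefined,
  );
  return run;
}

// Two scouts report the same entity at the same moment.
Promise.all([
  upsertEntity("GitHub Copilot", "claim from scout 1"),
  upsertEntity("GitHub Copilot", "claim from scout 2"),
]).then(() => {
  console.log(createCalls, store["github copilot"].claims.length); // 1 2
});
```

&lt;p&gt;Without the chain, both lookups would finish before either create ran, producing two entities. The guarantee only holds inside one Node process, which matches a single CLI run.&lt;/p&gt;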

&lt;p&gt;&lt;strong&gt;Two-queue concurrency design&lt;/strong&gt;: &lt;code&gt;writeQueue&lt;/code&gt; (pLimit(3)) caps in-flight Notion API calls at three to stay within Notion's rate limits. &lt;code&gt;kgUpsertQueue&lt;/code&gt; (pLimit(1)) handles logical atomicity. Different concerns, different queues.&lt;/p&gt;
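&lt;p&gt;The split can be sketched with a small hand-written limiter that behaves like &lt;code&gt;p-limit&lt;/code&gt;: at most &lt;code&gt;max&lt;/code&gt; tasks run at once, and the rest wait in order. The names &lt;code&gt;createLimiter&lt;/code&gt; and &lt;code&gt;measurePeak&lt;/code&gt; are illustrative, and the 10 ms delay stands in for a Notion API call.&lt;/p&gt;

```typescript
// Minimal p-limit-style limiter (hand-written for this sketch, not the p-limit package):
// at most `max` tasks run at once; the rest wait in FIFO order.
function createLimiter(max: number) {
  let active = 0;
  const waiting: (() => void)[] = [];

  function next() {
    if (active >= max) return;
    const start = waiting.shift();
    if (start) start();
  }

  return function limit(task: () => unknown) {
    return new Promise((resolve, reject) => {
      waiting.push(() => {
        active += 1;
        const done = () => {
          active -= 1;
          next();
        };
        Promise.resolve()
          .then(task)
          .then(
            (value) => {
              resolve(value);
              done();
            },
            (error) => {
              reject(error);
              done();
            },
          );
      });
      next();
    });
  };
}

// Two independent queues, one per concern.
const writeQueue = createLimiter(3); // caps in-flight Notion API calls
const kgUpsertQueue = createLimiter(1); // serializes knowledge-graph upserts

// Pushes `count` fake 10 ms API calls through a queue and reports the peak concurrency.
async function measurePeak(queue: typeof writeQueue, count: number) {
  let inFlight = 0;
  let peak = 0;
  const fakeApiCall = async () => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 10));
    inFlight -= 1;
  };
  await Promise.all(Array.from({ length: count }, () => queue(fakeApiCall)));
  return peak;
}

measurePeak(writeQueue, 6).then((peak) => console.log("writeQueue peak:", peak)); // 3
measurePeak(kgUpsertQueue, 6).then((peak) => console.log("kgUpsertQueue peak:", peak)); // 1
```

&lt;p&gt;Keeping the limiters separate means tuning write throughput (the &lt;code&gt;3&lt;/code&gt;) cannot accidentally weaken the atomicity guarantee (the &lt;code&gt;1&lt;/code&gt;), and vice versa.&lt;/p&gt;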

&lt;p&gt;&lt;strong&gt;Idempotent bootstrap with archived filtering&lt;/strong&gt;: &lt;code&gt;bootstrapWorkspace&lt;/code&gt; searches for existing &lt;code&gt;cortex-*&lt;/code&gt; databases and reuses them. It filters out archived databases (Notion's search API returns them by default) and uses &lt;code&gt;databases.update&lt;/code&gt; to ensure schema migrations apply to pre-existing databases.&lt;/p&gt;
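&lt;p&gt;The reuse-or-create decision reduces to a pure function over search results. This sketch assumes a trimmed-down result shape (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;archived&lt;/code&gt;, and a rich-text &lt;code&gt;title&lt;/code&gt;) and uses hypothetical database names, since the article only says they share the &lt;code&gt;cortex-&lt;/code&gt; prefix. &lt;code&gt;planBootstrap&lt;/code&gt; is likewise an illustrative name.&lt;/p&gt;

```typescript
// Simplified database object from a Notion search (assumption: only the fields used here).
interface DatabaseResult {
  id: string;
  archived: boolean;
  title: { plain_text: string }[];
}

// Hypothetical names; the article only says the databases are prefixed "cortex-".
const REQUIRED_DATABASES = [
  "cortex-task-bus",
  "cortex-working-memory",
  "cortex-knowledge-graph",
  "cortex-approval-gates",
  "cortex-outputs",
];

interface BootstrapPlan {
  reuse: { [name: string]: string }; // database name -> existing database id
  create: string[]; // names that still need a new database
}

function planBootstrap(results: DatabaseResult[]): BootstrapPlan {
  const reuse: { [name: string]: string } = {};
  for (const db of results) {
    if (db.archived) continue; // archived databases are never reused
    const name = db.title.map((t) => t.plain_text).join("");
    // Ignore unrelated databases and keep the first live match per name.
    if (!REQUIRED_DATABASES.includes(name) || reuse[name] !== undefined) continue;
    reuse[name] = db.id;
  }
  const create = REQUIRED_DATABASES.filter((name) => reuse[name] === undefined);
  return { reuse, create };
}

const plan = planBootstrap([
  { id: "db-1", archived: false, title: [{ plain_text: "cortex-task-bus" }] },
  { id: "db-2", archived: true, title: [{ plain_text: "cortex-outputs" }] },
  { id: "db-3", archived: false, title: [{ plain_text: "unrelated notes" }] },
]);
console.log(plan.reuse); // { 'cortex-task-bus': 'db-1' }
console.log(plan.create.length); // 4, because the only cortex-outputs match is archived
```

&lt;p&gt;In the real flow, each reused ID would then go through &lt;code&gt;databases.update&lt;/code&gt; so properties added in newer versions reach databases created by older runs, and only the names in &lt;code&gt;create&lt;/code&gt; would need &lt;code&gt;databases.create&lt;/code&gt;.&lt;/p&gt;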

&lt;p&gt;&lt;strong&gt;Dual-provider LLM abstraction&lt;/strong&gt;: Supports OpenAI (default) and Anthropic with streaming and multi-turn tool-use loops. Switch with one env var.&lt;/p&gt;
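&lt;p&gt;One way to structure the one-variable switch is a shared provider interface plus a selector function. In this sketch the providers are offline stubs (no SDK calls or API keys), and both the &lt;code&gt;LLMProvider&lt;/code&gt; interface and the &lt;code&gt;LLM_PROVIDER&lt;/code&gt; variable name are assumptions, since the article does not name the variable.&lt;/p&gt;

```typescript
// Hypothetical common interface; real adapters would wrap the OpenAI and Anthropic SDKs.
interface LLMProvider {
  name: string;
  stream(prompt: string, onToken: (token: string) => void): void;
}

// Stub that "streams" the prompt back word by word so the sketch runs offline.
function makeStubProvider(name: string): LLMProvider {
  return {
    name,
    stream(prompt, onToken) {
      for (const word of `[${name}] ${prompt}`.split(" ")) onToken(word + " ");
    },
  };
}

const openaiProvider = makeStubProvider("openai");
const anthropicProvider = makeStubProvider("anthropic");

// Picks the provider from an environment-like object; LLM_PROVIDER is an assumed name.
function selectProvider(env: { [key: string]: string | undefined }): LLMProvider {
  const choice = (env.LLM_PROVIDER ?? "openai").toLowerCase(); // OpenAI is the default
  switch (choice) {
    case "openai":
      return openaiProvider;
    case "anthropic":
      return anthropicProvider;
    default:
      throw new Error(`Unsupported LLM_PROVIDER "${choice}"`);
  }
}

// In the CLI this would be selectProvider(process.env).
const provider = selectProvider({ LLM_PROVIDER: "anthropic" });
let output = "";
provider.stream("summarize the findings", (token) => {
  output += token;
});
console.log(provider.name, output.trim()); // anthropic [anthropic] summarize the findings
```

&lt;p&gt;The sketch fails fast on an unknown value so that a typo stops the run before any scout starts, rather than silently falling back to the default.&lt;/p&gt;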

&lt;p&gt;&lt;strong&gt;55 tests across 13 files&lt;/strong&gt;: The tests cover the orchestrator pipeline, all agents, the concurrency utilities, the markdown converter, and the Notion data layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tyy130/notion-cortex.git
&lt;span class="nb"&gt;cd &lt;/span&gt;notion-cortex
npm &lt;span class="nb"&gt;install
&lt;/span&gt;notion-cortex setup    &lt;span class="c"&gt;# interactive wizard&lt;/span&gt;
notion-cortex &lt;span class="s2"&gt;"your research topic"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used Notion MCP
&lt;/h2&gt;

&lt;p&gt;Notion isn't just where output ends up — it's the runtime substrate. The &lt;strong&gt;Notion MCP server&lt;/strong&gt; (&lt;code&gt;@notionhq/notion-mcp-server&lt;/code&gt;) runs as a stdio subprocess, giving Scout agents access to &lt;code&gt;notion_search&lt;/code&gt; — they check what knowledge already exists in the workspace before extracting new entities, avoiding redundant work across runs.&lt;/p&gt;
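&lt;p&gt;Under the hood, MCP's stdio transport is newline-delimited JSON-RPC 2.0 over the subprocess's stdin and stdout. The sketch below builds a &lt;code&gt;tools/call&lt;/code&gt; request and extracts the text parts of a reply. The tool name comes from the article. The &lt;code&gt;query&lt;/code&gt; argument is an assumption, so the server's &lt;code&gt;tools/list&lt;/code&gt; output is the place to confirm the real schema. The &lt;code&gt;initialize&lt;/code&gt; handshake that must precede tool calls is omitted, and &lt;code&gt;encodeToolCall&lt;/code&gt; and &lt;code&gt;decodeToolResult&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

interface ToolContent {
  type: string;
  text?: string;
}

let nextRequestId = 1;

// Serializes one request as a single line; JSON.stringify escapes any newlines inside strings.
function encodeToolCall(name: string, args: { [key: string]: unknown }): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

// Parses one response line and joins the text items of the tool result.
function decodeToolResult(line: string): string {
  const message = JSON.parse(line);
  if (message.error) {
    throw new Error(`JSON-RPC error ${message.error.code}: ${message.error.message}`);
  }
  const content: ToolContent[] = message.result.content ?? [];
  const text = content
    .filter((item) => item.type === "text")
    .map((item) => item.text ?? "")
    .join("\n");
  // Tool-level failures arrive as a normal result flagged with isError.
  if (message.result.isError) throw new Error(`Tool error: ${text}`);
  return text;
}

const requestLine = encodeToolCall("notion_search", { query: "GitHub Copilot" });
console.log(requestLine.trim()); // one JSON object, ready to write to the server's stdin

// A made-up reply, for illustration only.
const reply = '{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"2 pages found"}]}}';
console.log(decodeToolResult(reply)); // 2 pages found
```

&lt;p&gt;In practice the official MCP TypeScript SDK (&lt;code&gt;@modelcontextprotocol/sdk&lt;/code&gt;) handles this framing, the handshake, and request IDs. The sketch only shows what travels over the pipe.&lt;/p&gt;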

&lt;p&gt;Beyond MCP search, each database works as infrastructure through the Notion SDK:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Task Bus (agent coordination)
&lt;/h3&gt;

&lt;p&gt;The orchestrator creates tasks, scouts claim them by setting &lt;code&gt;assigned_agent&lt;/code&gt;, and changes in task status (&lt;code&gt;pending&lt;/code&gt;, &lt;code&gt;active&lt;/code&gt;, &lt;code&gt;done&lt;/code&gt;, &lt;code&gt;blocked&lt;/code&gt;) drive the pipeline forward. This is a distributed task queue implemented entirely in Notion.&lt;/p&gt;
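&lt;p&gt;The Task Bus rules can be sketched as a small state machine. The status names come from the article. The transition table (for example, that a &lt;code&gt;blocked&lt;/code&gt; task may be re-queued as &lt;code&gt;pending&lt;/code&gt;) and the &lt;code&gt;transition&lt;/code&gt; and &lt;code&gt;claim&lt;/code&gt; helpers are assumptions for illustration.&lt;/p&gt;

```typescript
type TaskStatus = "pending" | "active" | "done" | "blocked";

// Assumed transition table: the article names the statuses, not every allowed move.
const ALLOWED: { [S in TaskStatus]: TaskStatus[] } = {
  pending: ["active"],
  active: ["done", "blocked"],
  blocked: ["pending"], // hypothetical: a blocked task can be re-queued
  done: [],
};

interface Task {
  id: string;
  status: TaskStatus;
  assigned_agent: string | null;
}

// Returns a copy of the task with the new status, or throws on an illegal move.
function transition(task: Task, to: TaskStatus): Task {
  if (!ALLOWED[task.status].includes(to)) {
    throw new Error(`Illegal move ${task.status} -> ${to} for task ${task.id}`);
  }
  return { ...task, status: to };
}

// A scout claims a pending task by recording its name in assigned_agent.
function claim(task: Task, agent: string): Task {
  if (task.assigned_agent !== null) {
    throw new Error(`Task ${task.id} is already claimed by ${task.assigned_agent}`);
  }
  return { ...transition(task, "active"), assigned_agent: agent };
}

let task: Task = { id: "task-1", status: "pending", assigned_agent: null };
task = claim(task, "scout-1");
task = transition(task, "done");
console.log(task.status, task.assigned_agent); // done scout-1
```

&lt;p&gt;In a Notion-backed version, each returned task would correspond to a page update on the Task Bus row. Rejecting illegal moves up front keeps a stray update from, say, reopening a finished task.&lt;/p&gt;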

&lt;h3&gt;
  
  
  2. Working Memory (streaming scratchpad)
&lt;/h3&gt;

&lt;p&gt;Each agent gets a dedicated Notion page. As tokens stream from the LLM, a timed buffer flushes them as paragraph blocks to the page every second. You can open a scout's Working Memory page and watch it think in real time.&lt;/p&gt;
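&lt;p&gt;A timed buffer like this one can be sketched as follows. The block shape matches Notion's paragraph block (&lt;code&gt;rich_text&lt;/code&gt; holding text objects). Text is split every 2,000 characters because Notion caps the content of a single text object at that length. &lt;code&gt;TokenBuffer&lt;/code&gt; and &lt;code&gt;toParagraphBlocks&lt;/code&gt; are hypothetical names. In a real run the flush callback would wrap &lt;code&gt;blocks.children.append&lt;/code&gt; for the agent's page.&lt;/p&gt;

```typescript
// Notion paragraph block shape, as accepted by blocks.children.append.
interface ParagraphBlock {
  object: "block";
  type: "paragraph";
  paragraph: { rich_text: { type: "text"; text: { content: string } }[] };
}

// Notion limits the content of a single text object to 2000 characters.
const MAX_TEXT_LENGTH = 2000;

function toParagraphBlocks(text: string): ParagraphBlock[] {
  const blocks: ParagraphBlock[] = [];
  for (let start = 0; text.length > start; start += MAX_TEXT_LENGTH) {
    const content = text.slice(start, start + MAX_TEXT_LENGTH);
    blocks.push({
      object: "block",
      type: "paragraph",
      paragraph: { rich_text: [{ type: "text", text: { content } }] },
    });
  }
  return blocks;
}

// Hypothetical buffer: tokens accumulate in memory and are handed to flushFn on a timer.
class TokenBuffer {
  private pending = "";
  private stopTimer: (() => void) | null = null;
  private readonly flushFn: (blocks: ParagraphBlock[]) => unknown;

  constructor(flushFn: (blocks: ParagraphBlock[]) => unknown) {
    this.flushFn = flushFn;
  }

  push(token: string) {
    this.pending += token;
  }

  // Sends everything buffered so far; does nothing if the buffer is empty.
  flush() {
    if (this.pending === "") return;
    const text = this.pending;
    this.pending = "";
    this.flushFn(toParagraphBlocks(text));
  }

  start(intervalMs = 1000) {
    if (this.stopTimer) return; // already running
    const handle = setInterval(() => this.flush(), intervalMs);
    this.stopTimer = () => clearInterval(handle);
  }

  // Stops the timer and writes whatever is left when the stream ends.
  stop() {
    if (this.stopTimer) this.stopTimer();
    this.stopTimer = null;
    this.flush();
  }
}

const buffer = new TokenBuffer((blocks) => console.log(`flush: ${blocks.length} block(s)`));
buffer.start(); // flush once per second while tokens stream in
for (const token of ["Scanning ", "sources ", "for ", "agents..."]) buffer.push(token);
buffer.stop(); // prints "flush: 1 block(s)"
```

&lt;p&gt;Flushing on a fixed interval instead of per token keeps the number of Notion API calls roughly constant per second, however fast the model streams.&lt;/p&gt;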

&lt;h3&gt;
  
  
  3. Knowledge Graph (structured entity store)
&lt;/h3&gt;

&lt;p&gt;Scouts extract entities (companies, products, trends, concepts) with claims, confidence levels, and source URLs. A serialized upsert queue (&lt;code&gt;pLimit(1)&lt;/code&gt;) prevents duplicate entities when parallel scouts find the same thing. After the Analyst pass, &lt;code&gt;computeAndStoreRelations&lt;/code&gt; scans all entities and auto-links them using Notion's relation property — if "GitHub Copilot" appears in another entity's claim, they get linked.&lt;/p&gt;
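&lt;p&gt;The linking rule in the last sentence can be sketched as a pairwise scan. This version assumes case-insensitive substring matching and a relation property named &lt;code&gt;Related&lt;/code&gt;. Both are illustrative, as are &lt;code&gt;computeRelations&lt;/code&gt; and &lt;code&gt;toRelationUpdates&lt;/code&gt;. The &lt;code&gt;{ relation: [{ id }] }&lt;/code&gt; payload is the format Notion's API uses for relation properties, so each entry could be sent with &lt;code&gt;pages.update&lt;/code&gt;.&lt;/p&gt;

```typescript
interface KgEntity {
  id: string; // Notion page id of the entity row
  name: string;
  claims: string[];
}

// Returns [sourceId, targetId] pairs where the source's claims mention the target's name.
function computeRelations(entities: KgEntity[]): [string, string][] {
  const pairs: [string, string][] = [];
  for (const source of entities) {
    const claimText = source.claims.join(" ").toLowerCase();
    for (const target of entities) {
      if (target.id === source.id) continue; // no self-links
      if (claimText.includes(target.name.toLowerCase())) pairs.push([source.id, target.id]);
    }
  }
  return pairs;
}

// Groups pairs into per-page property updates for a relation property (name assumed).
function toRelationUpdates(pairs: [string, string][]) {
  const updates: { [pageId: string]: { Related: { relation: { id: string }[] } } } = {};
  for (const [from, to] of pairs) {
    if (!updates[from]) updates[from] = { Related: { relation: [] } };
    updates[from].Related.relation.push({ id: to });
  }
  return updates;
}

const entities: KgEntity[] = [
  { id: "page-copilot", name: "GitHub Copilot", claims: ["AI pair programmer"] },
  { id: "page-cursor", name: "Cursor", claims: ["Often compared with GitHub Copilot"] },
];
console.log(JSON.stringify(toRelationUpdates(computeRelations(entities))));
// {"page-cursor":{"Related":{"relation":[{"id":"page-copilot"}]}}}
```

&lt;p&gt;The scan is quadratic in the number of entities and links on incidental substrings (a short name like "AI" also matches inside "said"). That is tolerable for one research run, but word-boundary matching would be worth adding at larger scale.&lt;/p&gt;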

&lt;h3&gt;
  
  
  4. Approval Gates (human-in-the-loop)
&lt;/h3&gt;

&lt;p&gt;Before the Writer runs, an approval gate creates a Notion database entry with status "Pending" and a link to the synthesis. The system polls with exponential backoff until you change the status to "Approved" or "Rejected" in Notion. This is genuine human-in-the-loop control — not a dialog box, but a Notion workflow.&lt;/p&gt;
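&lt;p&gt;The polling loop can be sketched as follows. &lt;code&gt;check&lt;/code&gt; stands in for reading the gate's status from Notion, the timing values are illustrative rather than the project's, and &lt;code&gt;sleep&lt;/code&gt; is injectable so the loop can be exercised without real waits. &lt;code&gt;pollUntilDecided&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
interface BackoffOptions {
  initialMs: number; // wait after the first "Pending" read
  maxMs: number; // upper bound for any single wait
  factor: number; // growth factor per attempt
  sleep?: (ms: number) => unknown; // injectable for tests; defaults to a real timer
}

// Polls `check` until it reports "Approved" or "Rejected", backing off exponentially.
async function pollUntilDecided(check: () => unknown, opts: BackoffOptions) {
  const sleep = opts.sleep ?? ((ms: number) => new Promise((resolve) => setTimeout(resolve, ms)));
  let delay = opts.initialMs;
  for (;;) {
    const status = await check(); // e.g. the gate page's status property
    if (status === "Approved" || status === "Rejected") return status;
    await sleep(delay);
    delay = Math.min(delay * opts.factor, opts.maxMs);
  }
}

// Fake gate that a reviewer approves on the third read; sleep records instead of waiting.
const fakeStatuses = ["Pending", "Pending", "Approved"];
const waits: number[] = [];
pollUntilDecided(async () => fakeStatuses.shift(), {
  initialMs: 2000,
  maxMs: 30000,
  factor: 2,
  sleep: (ms) => {
    waits.push(ms);
  },
}).then((decision) => console.log(decision, waits)); // Approved [ 2000, 4000 ]
```

&lt;p&gt;Capping the delay with &lt;code&gt;maxMs&lt;/code&gt; matters for a human gate: without it, a reviewer who approves after an hour could wait almost another hour before the next poll notices.&lt;/p&gt;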

&lt;h3&gt;
  
  
  5. Outputs (final deliverables)
&lt;/h3&gt;

&lt;p&gt;The Writer converts its markdown output into native Notion blocks — headings, bullet lists, numbered lists, tables, code blocks, bold/italic, and links — using a custom &lt;code&gt;markdownToNotionBlocks&lt;/code&gt; converter. The result is a proper Notion page, not a pasted text blob.&lt;/p&gt;
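&lt;p&gt;A much smaller converter than the one described still shows the basic mapping from Markdown lines to Notion block types. This sketch handles only headings, bullet and numbered items, fenced code, and plain paragraphs. Inline bold, italic, links, and tables are out of scope, and &lt;code&gt;toNotionBlocks&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
const FENCE = "`".repeat(3); // built at runtime so it cannot end an enclosing fence

type RichText = { type: "text"; text: { content: string } }[];

interface NotionBlock {
  object: "block";
  type: string;
  [key: string]: unknown; // the type-specific body lives under a key named after `type`
}

function richText(content: string): RichText {
  return [{ type: "text", text: { content } }];
}

function makeBlock(type: string, body: object): NotionBlock {
  const block: NotionBlock = { object: "block", type };
  block[type] = body;
  return block;
}

// Hypothetical line-based converter covering a subset of Markdown.
function toNotionBlocks(markdown: string): NotionBlock[] {
  const blocks: NotionBlock[] = [];
  const lines = markdown.split("\n");
  for (let i = 0; lines.length > i; i++) {
    const line = lines[i];

    if (line.startsWith(FENCE)) {
      const language = line.slice(FENCE.length).trim() || "plain text";
      const body: string[] = [];
      i++;
      while (lines.length > i) {
        if (lines[i].startsWith(FENCE)) break;
        body.push(lines[i]);
        i++;
      }
      blocks.push(makeBlock("code", { rich_text: richText(body.join("\n")), language }));
      continue; // the loop's i++ steps past the closing fence
    }

    const heading = line.match(/^(#{1,3})\s+(.*)$/);
    if (heading) {
      blocks.push(makeBlock(`heading_${heading[1].length}`, { rich_text: richText(heading[2]) }));
      continue;
    }
    const bullet = line.match(/^[-*]\s+(.*)$/);
    if (bullet) {
      blocks.push(makeBlock("bulleted_list_item", { rich_text: richText(bullet[1]) }));
      continue;
    }
    const numbered = line.match(/^\d+\.\s+(.*)$/);
    if (numbered) {
      blocks.push(makeBlock("numbered_list_item", { rich_text: richText(numbered[1]) }));
      continue;
    }
    if (line.trim() !== "") {
      blocks.push(makeBlock("paragraph", { rich_text: richText(line.trim()) }));
    }
  }
  return blocks;
}

const sample = ["# Brief", "Plain paragraph text.", "- first point", "1. first step", FENCE + "typescript", "const x = 1;", FENCE].join("\n");
console.log(toNotionBlocks(sample).map((b) => b.type).join(", "));
// heading_1, paragraph, bulleted_list_item, numbered_list_item, code
```

&lt;p&gt;Notion's &lt;code&gt;code&lt;/code&gt; blocks only accept languages from a fixed list, and &lt;code&gt;blocks.children.append&lt;/code&gt; takes at most 100 children per request. A full converter would therefore also need to normalize fence tags and send long briefs in batches.&lt;/p&gt;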




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The most surprising thing about this project was how naturally Notion works as an agent coordination layer. Databases become task queues. Pages become working memory. Relations become a knowledge graph. Status properties become approval gates. It's not a hack — it's genuinely the right tool for this.&lt;/p&gt;

&lt;p&gt;The human-in-the-loop approval gate is my favorite feature. Most agent systems are either fully autonomous or require you to babysit a terminal. With Cortex, you get a Notion notification, review the synthesis at your own pace, and approve when ready. The agents wait patiently.&lt;/p&gt;

&lt;p&gt;MIT licensed. PRs welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>notionchallenge</category>
      <category>devchallenge</category>
    </item>
  </channel>
</rss>
