<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nikolaos Petridis</title>
    <description>The latest articles on DEV Community by Nikolaos Petridis (@nikolaospetridhs).</description>
    <link>https://dev.to/nikolaospetridhs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3934812%2Fd42e0c46-d9a3-4612-bd54-372c87152c4f.jpg</url>
      <title>DEV Community: Nikolaos Petridis</title>
      <link>https://dev.to/nikolaospetridhs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nikolaospetridhs"/>
    <language>en</language>
    <item>
      <title>I built an LLM-powered compliance scanner that points at the actual line of code</title>
      <dc:creator>Nikolaos Petridis</dc:creator>
      <pubDate>Sat, 16 May 2026 16:02:43 +0000</pubDate>
      <link>https://dev.to/nikolaospetridhs/i-built-an-llm-powered-compliance-scanner-that-points-at-the-actual-line-of-code-5d7p</link>
      <guid>https://dev.to/nikolaospetridhs/i-built-an-llm-powered-compliance-scanner-that-points-at-the-actual-line-of-code-5d7p</guid>
      <description>&lt;p&gt;A few weeks ago I went down a rabbit hole. I'd been reading about how every SaaS company eventually has to deal with GDPR / SOC 2 / HIPAA, and how the existing tooling space basically goes like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Do you have a password policy document?"&lt;br&gt;
"Yes."&lt;br&gt;
"Great, you're compliant."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That checks the &lt;em&gt;policy&lt;/em&gt;. It doesn't check whether your login route actually stores passwords with MD5. Which felt like… kind of the wrong layer to look at?&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;&lt;a href="https://github.com/Nikolaospet/themida" rel="noopener noreferrer"&gt;Themida&lt;/a&gt;&lt;/strong&gt; — an open-source compliance scanner that reads the actual code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Point it at a GitHub repo (or a local directory). It returns findings like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/auth/login.ts:41
CRITICAL  GDPR Art. 5(1)(f), 32(1)(a)
Password hashed with broken MD5
Maximum fine: €20M or 4% of revenue
Fix → bcrypt at cost 12+, or Argon2id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every finding has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The exact &lt;strong&gt;file&lt;/strong&gt; and &lt;strong&gt;line number&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;legal article&lt;/strong&gt; that the code violates&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;maximum fine&lt;/strong&gt; for context&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;code fix&lt;/strong&gt; you can paste straight into a PR&lt;/li&gt;
&lt;li&gt;A severity rating (CRITICAL / HIGH / MEDIUM / LOW)&lt;/li&gt;
&lt;/ul&gt;
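
&lt;p&gt;To make that list concrete, here is a minimal sketch of how one finding could be modelled and printed in TypeScript. The type and field names (&lt;code&gt;Finding&lt;/code&gt;, &lt;code&gt;formatFinding&lt;/code&gt;, and so on) are assumptions for illustration, not Themida's actual schema; only the sample values come from the output above.&lt;/p&gt;

```typescript
// Hypothetical shape for one finding. Field names are illustrative,
// not copied from the Themida source.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  articles: string; // legal reference, e.g. "GDPR Art. 5(1)(f), 32(1)(a)"
  title: string;
  maxFine: string;
  fix: string;
}

// Render one finding in the five-line layout of the sample output.
function formatFinding(f: Finding): string {
  return [
    f.file + ":" + f.line,
    f.severity + "  " + f.articles,
    f.title,
    "Maximum fine: " + f.maxFine,
    "Fix → " + f.fix,
  ].join("\n");
}

// Values taken from the sample finding shown earlier.
const example: Finding = {
  file: "src/auth/login.ts",
  line: 41,
  severity: "CRITICAL",
  articles: "GDPR Art. 5(1)(f), 32(1)(a)",
  title: "Password hashed with broken MD5",
  maxFine: "€20M or 4% of revenue",
  fix: "bcrypt at cost 12+, or Argon2id",
};

console.log(formatFinding(example));
```

&lt;p&gt;Keeping the severity as a closed union means a typo like "CRITCAL" fails at compile time instead of slipping into a report.&lt;/p&gt;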

&lt;p&gt;You can export the whole report as a PDF if you need to share it with someone non-technical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why an LLM and not regex?
&lt;/h2&gt;

&lt;p&gt;Honest answer: I tried regex first. It was awful.&lt;/p&gt;

&lt;p&gt;Pattern-matching catches the easy cases (&lt;code&gt;crypto.createHash('md5')&lt;/code&gt;) but produces a tidal wave of false positives on real codebases. MD5 used to hash a password is a crime. MD5 used as a cache key is fine. A regex can't tell the difference. An LLM can, if you give it enough context and the right prompt.&lt;/p&gt;
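
&lt;p&gt;A tiny sketch of that false-positive problem: the two source strings and the pattern below are made up for illustration (they are not Themida rules), but they show a pattern rule flagging both uses of MD5 identically.&lt;/p&gt;

```typescript
// Two illustrative snippets that both call MD5, with very different risk.
const passwordHashing =
  "const digest = crypto.createHash('md5').update(password).digest('hex');";
const cacheKey =
  "const key = crypto.createHash('md5').update(requestUrl).digest('hex');";

// A naive pattern rule: flag every MD5 hash construction.
const md5Pattern = /createHash\(\s*['"]md5['"]\s*\)/;

function regexFlags(source: string): boolean {
  return md5Pattern.test(source);
}

console.log(regexFlags(passwordHashing)); // true
console.log(regexFlags(cacheKey)); // true, which is the false positive
```

&lt;p&gt;Telling the two apart means knowing what flows into &lt;code&gt;update()&lt;/code&gt;, and that is context a single-line pattern doesn't have.&lt;/p&gt;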

&lt;p&gt;The scanner runs three passes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Recon&lt;/strong&gt;: a small, cheap LLM scans the file tree and picks ~15 suspect paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep scan&lt;/strong&gt;: a bigger LLM reads those files line by line and produces findings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt;: a final pass that drops hallucinated paths and findings already mitigated nearby&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Splitting it this way keeps the cost under control. A scan of a medium-sized repo costs around 5–20 cents depending on which models you pick.&lt;/p&gt;
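
&lt;p&gt;The three passes can be sketched roughly like this. Every name here (&lt;code&gt;recon&lt;/code&gt;, &lt;code&gt;deepScan&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;, &lt;code&gt;Repo&lt;/code&gt;) is hypothetical, and the two model passes are replaced by simple heuristics so the sketch runs offline. The verify step only covers the hallucinated-path check, not the "already mitigated nearby" check.&lt;/p&gt;

```typescript
// Every name below is hypothetical. The two "model" passes are stubbed with
// simple heuristics so the sketch runs offline; a real scanner would call an LLM.
interface Repo {
  files: { [path: string]: string };
}

interface Finding {
  path: string;
  line: number;
  message: string;
}

// Pass 1 (recon): pick a bounded list of suspect paths from the file tree.
function recon(repo: Repo, limit: number): string[] {
  const suspects: string[] = [];
  for (const path of Object.keys(repo.files)) {
    if (/auth|login|crypto|session/.test(path)) {
      suspects.push(path);
    }
  }
  return suspects.slice(0, limit);
}

// Pass 2 (deep scan): read each suspect file line by line and emit findings.
// The stub also emits one finding for a file that does not exist, to mimic
// a hallucinated path.
function deepScan(repo: Repo, paths: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const path of paths) {
    const lines = (repo.files[path] || "").split("\n");
    lines.forEach(function (text, index) {
      if (text.indexOf("md5") !== -1) {
        findings.push({ path: path, line: index + 1, message: "MD5 in use" });
      }
    });
  }
  findings.push({ path: "src/ghost.ts", line: 3, message: "hallucinated" });
  return findings;
}

// Pass 3 (verify): keep only findings that point at a real file and line.
function verify(repo: Repo, findings: Finding[]): Finding[] {
  return findings.filter(function (f) {
    const source = repo.files[f.path];
    if (source === undefined) {
      return false;
    }
    return source.split("\n")[f.line - 1] !== undefined;
  });
}

function scan(repo: Repo): Finding[] {
  const suspects = recon(repo, 15);
  return verify(repo, deepScan(repo, suspects));
}

const demoRepo: Repo = {
  files: {
    "README.md": "# demo",
    "src/auth/login.ts": "import crypto from 'crypto';\nconst digest = md5(password);",
  },
};

console.log(scan(demoRepo));
```

&lt;p&gt;The cost argument shows up in the shape: only the bounded list from &lt;code&gt;recon&lt;/code&gt; ever reaches the expensive pass, and &lt;code&gt;verify&lt;/code&gt; is a plain file lookup that needs no model at all.&lt;/p&gt;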

&lt;h2&gt;
  
  
  Provider-agnostic
&lt;/h2&gt;

&lt;p&gt;You bring your own LLM key. The scanner ships adapters for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic (Claude)&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Anything that speaks OpenAI's Chat Completions API: OpenRouter, Groq, Together, vLLM, llama.cpp server, Ollama, LiteLLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick a provider with a single env var, &lt;code&gt;LLM_PROVIDER&lt;/code&gt;, then set that provider's key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLM_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openai
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://openrouter.ai/api/v1  &lt;span class="c"&gt;# optional, defaults to OpenAI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Self-hosters running a local model are first-class citizens: the cost tracker just records 0 cents and moves on.&lt;/p&gt;
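
&lt;p&gt;As a rough sketch of how that configuration and the zero-cost case might fit together: only the three variable names (&lt;code&gt;LLM_PROVIDER&lt;/code&gt;, &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;, &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt;) come from the snippet above. The helper names, the &lt;code&gt;Price&lt;/code&gt; shape, and the prices are assumptions, and the local URL is just an example of an OpenAI-compatible endpoint.&lt;/p&gt;

```typescript
// Hypothetical helpers. Only the three variable names LLM_PROVIDER,
// OPENAI_API_KEY and OPENAI_BASE_URL come from the post.
interface Env {
  [name: string]: string | undefined;
}

interface OpenAiConfig {
  apiKey: string;
  baseUrl: string;
}

// Resolve settings for an OpenAI-compatible adapter from an env-like object
// (in Node you would pass process.env).
function resolveOpenAiConfig(env: Env): OpenAiConfig {
  if (env["LLM_PROVIDER"] !== "openai") {
    throw new Error("this sketch only handles LLM_PROVIDER=openai");
  }
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }
  // OPENAI_BASE_URL is optional; fall back to OpenAI's own endpoint.
  return {
    apiKey: apiKey,
    baseUrl: env["OPENAI_BASE_URL"] || "https://api.openai.com/v1",
  };
}

// Prices in cents per million tokens. The numbers used below are made up.
interface Price {
  inputPerMillion: number;
  outputPerMillion: number;
}

// A model with no price entry, such as a local one, costs 0 cents.
function costInCents(
  price: Price | undefined,
  inputTokens: number,
  outputTokens: number
): number {
  if (price === undefined) {
    return 0;
  }
  const total =
    inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion;
  return total / 1000000;
}

const config = resolveOpenAiConfig({
  LLM_PROVIDER: "openai",
  OPENAI_API_KEY: "sk-placeholder",
  OPENAI_BASE_URL: "http://localhost:11434/v1", // example local endpoint
});

console.log(config.baseUrl);
console.log(costInCents(undefined, 50000, 10000)); // 0
console.log(costInCents({ inputPerMillion: 100, outputPerMillion: 400 }, 50000, 10000)); // 9
```

&lt;p&gt;Treating "no price entry" as zero keeps local models on the same code path as hosted ones, with no special-casing in the scan itself.&lt;/p&gt;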

&lt;h2&gt;
  
  
  What's done, what's not
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Done:&lt;/strong&gt; GDPR (5 rules), EU AI Act (5 rules), full scan pipeline, dashboard, real-time progress, PDF export, GitHub App integration, local CLI path (&lt;code&gt;pnpm dev:scan&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open issues, PRs welcome:&lt;/strong&gt; HIPAA, SOC 2, ISO 27001, OWASP Top 10, PCI DSS. Each rule pack is a single TypeScript file with a fairly readable schema, so adding rules is the easiest way to contribute.&lt;/p&gt;
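
&lt;p&gt;To give a feel for what "a single TypeScript file" per rule pack could mean, here is a purely hypothetical sketch. The &lt;code&gt;Rule&lt;/code&gt; and &lt;code&gt;RulePack&lt;/code&gt; shapes and the &lt;code&gt;validatePack&lt;/code&gt; helper are assumptions, and the real schema in the repo may look quite different. Check the existing packs before writing one.&lt;/p&gt;

```typescript
// Hypothetical rule-pack shape. The real schema lives in the Themida repo
// and may look different.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Rule {
  id: string;
  severity: Severity;
  articles: string;
  summary: string;
  fix: string;
}

interface RulePack {
  name: string;
  rules: Rule[];
}

// One rule, reusing the values from the sample finding earlier in the post.
const gdprSketch: RulePack = {
  name: "gdpr-sketch",
  rules: [
    {
      id: "weak-password-hash",
      severity: "CRITICAL",
      articles: "GDPR Art. 5(1)(f), 32(1)(a)",
      summary: "Password hashed with broken MD5",
      fix: "bcrypt at cost 12+, or Argon2id",
    },
  ],
};

// Return a list of problems; an empty list means the pack passed these checks.
function validatePack(pack: RulePack): string[] {
  const problems: string[] = [];
  const seen: { [id: string]: boolean } = {};
  for (const rule of pack.rules) {
    if (seen[rule.id]) {
      problems.push("duplicate rule id: " + rule.id);
    }
    seen[rule.id] = true;
    if (rule.summary.trim() === "") {
      problems.push("empty summary: " + rule.id);
    }
  }
  return problems;
}

console.log(validatePack(gdprSketch)); // []
```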

&lt;p&gt;&lt;strong&gt;On the roadmap:&lt;/strong&gt; Better local-LLM ergonomics, VS Code extension, eval suite for measuring rule accuracy as packs grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Nikolaospet/themida
&lt;span class="nb"&gt;cd &lt;/span&gt;themida
pnpm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env.local
&lt;span class="c"&gt;# edit .env.local — pick a provider&lt;/span&gt;
pnpm dev:scan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's also a &lt;a href="https://github.com/Nikolaospet/themida" rel="noopener noreferrer"&gt;sample report on OWASP NodeGoat&lt;/a&gt; you can poke around without setting anything up.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is a personal project
&lt;/h2&gt;

&lt;p&gt;I want to be upfront about this: Themida isn't a company, it doesn't have funding, there's no "managed version" hiding behind the OSS face. It's a side project I'm building in the open because I find the problem interesting and I think devs are tired of compliance tools that don't read code.&lt;/p&gt;

&lt;p&gt;It's released under &lt;strong&gt;AGPL-3.0&lt;/strong&gt;: use it, modify it, run it for your team, fork it. The license just stops someone from wrapping it in a SaaS and closing it back up.&lt;/p&gt;

&lt;p&gt;If you try it and something breaks, &lt;a href="https://github.com/Nikolaospet/themida/issues" rel="noopener noreferrer"&gt;open an issue&lt;/a&gt;. If you want to add a rule pack or an LLM adapter, or improve the eval suite, PRs are warmly welcome; there's a &lt;a href="https://github.com/Nikolaospet/themida/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;CONTRIBUTING.md&lt;/a&gt; and a &lt;a href="https://github.com/Nikolaospet/themida/blob/main/.github/PULL_REQUEST_TEMPLATE.md" rel="noopener noreferrer"&gt;PR template&lt;/a&gt; ready to go.&lt;/p&gt;

&lt;p&gt;The repo is here: &lt;strong&gt;&lt;a href="https://github.com/Nikolaospet/themida" rel="noopener noreferrer"&gt;github.com/Nikolaospet/themida&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you build software in regulated industries (fintech, health, EU-anything, anywhere with AI Act exposure), I'd love to hear which rule packs would be most useful to ship next. Drop a comment.&lt;/p&gt;

&lt;p&gt;Thanks for reading 🙏&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>llm</category>
      <category>gdpr</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
