<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: FeDEX</title>
    <description>The latest articles on DEV Community by FeDEX (@fedex).</description>
    <link>https://dev.to/fedex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3859682%2F9f6e427e-f1e9-4aeb-a32c-b4a7ae83d7ff.jpg</url>
      <title>DEV Community: FeDEX</title>
      <link>https://dev.to/fedex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fedex"/>
    <language>en</language>
    <item>
      <title>"Please perform a comprehensive security audit" - and why it doesn't work</title>
      <dc:creator>FeDEX</dc:creator>
      <pubDate>Fri, 03 Apr 2026 15:33:12 +0000</pubDate>
      <link>https://dev.to/fedex/please-perform-a-comprehensive-security-audit-and-why-it-doesnt-work-28bh</link>
      <guid>https://dev.to/fedex/please-perform-a-comprehensive-security-audit-and-why-it-doesnt-work-28bh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hi there! We are &lt;a href="https://aisafe.io" rel="noopener noreferrer"&gt;AISafe Labs&lt;/a&gt;, a crew of security researchers with only one goal: making security accessible to everyone!&lt;/p&gt;

&lt;p&gt;We've all witnessed the recent rise of LLMs, which has been nothing short of revolutionary. We're reaching a point where the power of a single prompt surpasses any expectation. From &lt;a href="https://x.com/emilylambert/status/2020288299345330512" rel="noopener noreferrer"&gt;generating entire codebases&lt;/a&gt; to &lt;a href="https://x.com/MattEpstein16/status/2026674945145577588" rel="noopener noreferrer"&gt;replacing entire workflows&lt;/a&gt;, the bar keeps getting raised. Building this much trust in the power of a prompt naturally raises the question: &lt;strong&gt;"Can you secure your application with a &lt;em&gt;single prompt&lt;/em&gt;?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this article, we put that question to the test. We will assess how far a single prompt can take you when it comes to securing a real application, then compare that to a more targeted prompt armed with domain knowledge and specific instructions, and finally, stack both against AISafe, our specialized source code auditing platform. Same codebase, same goal: will the difference be significant or negligible?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Findings&lt;/th&gt;
&lt;th&gt;Vulnerabilities with direct impact&lt;/th&gt;
&lt;th&gt;Weaknesses without direct impact&lt;/th&gt;
&lt;th&gt;Accepted risk&lt;/th&gt;
&lt;th&gt;False positives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code with Skills&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code /security-review&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Security&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AISafe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's begin! 🍿&lt;/p&gt;

&lt;h2&gt;
  
  
  The Target
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Forceu/Gokapi" rel="noopener noreferrer"&gt;Gokapi&lt;/a&gt; is an open source file hosting application with around &lt;code&gt;2.7k&lt;/code&gt; stars on GitHub, &lt;code&gt;1.5M&lt;/code&gt; Docker pulls, and roughly &lt;code&gt;35k&lt;/code&gt; lines of code (&lt;code&gt;Go &amp;amp; JavaScript&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F152t4gkwcp0ql66bx62z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F152t4gkwcp0ql66bx62z.png" width="800" height="472"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Preview of Gokapi app&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It sits in a sweet spot for our case study: large enough to present a complex threat model, with modern web app features such as file handling, API keys, and user permissions with different access levels, yet small enough to audit in a reasonable time.&lt;/p&gt;

&lt;p&gt;All experiments were run against commit &lt;code&gt;a7c4273b819b8a48f85b866e1803632c089f60a2&lt;/code&gt;, pushed on March 1st, 2026.&lt;/p&gt;
&lt;h2&gt;
  
  
  Anthropic
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;Our first candidate is &lt;a href="https://claude.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, running &lt;strong&gt;Opus 4.6&lt;/strong&gt; with high effort. We are going to start with the simplest setup possible: &lt;strong&gt;a single prompt&lt;/strong&gt;, just the kind that is being &lt;a href="https://x.com/jddeep003/status/2031775353019236624" rel="noopener noreferrer"&gt;recommended on social media&lt;/a&gt; every other day.&lt;/p&gt;

&lt;p&gt;To begin, we drop Claude Code into the Gokapi folder, and type &lt;code&gt;"Please perform a security audit of this repository"&lt;/code&gt;. It will spin up a few agents and start looking at different parts of the codebase:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d0qhayu4mdvpbkzad6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d0qhayu4mdvpbkzad6f.png" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few minutes later, we are already looking at 24 findings 👀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5juvqtexxkvt4oqkqr1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5juvqtexxkvt4oqkqr1j.png" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4 &lt;strong&gt;criticals&lt;/strong&gt;? Let's break them down!&lt;/p&gt;

&lt;p&gt;The first two are "SHA-1 Password Hashing" and "Zero-Filled Nonces in AES-GCM Encryption". &lt;/p&gt;

&lt;p&gt;Gokapi does use &lt;code&gt;SHA-1&lt;/code&gt; for password hashing, and that is definitely not a great idea. Still, it is a &lt;em&gt;weakness&lt;/em&gt; rather than an exploitable vulnerability: a Low severity finding at best.&lt;/p&gt;

&lt;p&gt;The "zero-filled nonce" finding sounds scary until you take a closer look and realize that the nonce gets handed off to the &lt;code&gt;sio-go&lt;/code&gt; library, which appends its own nonce on top, and the server also uses per-file keys. Again, the real-world impact is practically &lt;em&gt;none&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The other two &lt;strong&gt;critical&lt;/strong&gt; findings flag "XSS in templates" and an "open redirect". Unfortunately, both are &lt;em&gt;false positives&lt;/em&gt;: the templates rely on Go's &lt;code&gt;html/template&lt;/code&gt; package, which handles output escaping out of the box, and the redirect URL is only ever set by the instance admin.&lt;/p&gt;

&lt;p&gt;Four critical vulnerabilities reported, but no real security issues were identified.&lt;/p&gt;

&lt;p&gt;Moving on to the &lt;strong&gt;high&lt;/strong&gt; impact findings. Three of the seven are about missing cookie attributes and security headers like &lt;code&gt;HttpOnly&lt;/code&gt;, &lt;code&gt;Secure&lt;/code&gt;, &lt;code&gt;SameSite&lt;/code&gt;, &lt;code&gt;Content-Security-Policy&lt;/code&gt;, and &lt;code&gt;X-Frame-Options&lt;/code&gt;. Not a bad report, but again, not really vulnerabilities either. Another finding flags missing CSRF protection, which has generally had limited impact ever since &lt;a href="https://web.dev/articles/samesite-cookies-explained#default-behavior-changes" rel="noopener noreferrer"&gt;SameSite=Lax became the default&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, there is one finding that points us to something actually vulnerable: a "Header Injection" issue, where &lt;code&gt;file.Name&lt;/code&gt; is potentially attacker-controlled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Disposition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"attachment; filename=""+file.Name+"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the impact is overstated. Go validates header values and blocks CRLF injection, so any corruption is confined to this single header.&lt;/p&gt;
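
&lt;p&gt;To see this behavior in isolation, here is a minimal sketch that is not Gokapi code. It assumes a throwaway &lt;code&gt;httptest&lt;/code&gt; server, and both the &lt;code&gt;probeHeaderInjection&lt;/code&gt; helper and the &lt;code&gt;X-Injected&lt;/code&gt; header name are made up for the example. On the Go versions we are aware of, &lt;code&gt;net/http&lt;/code&gt; replaces CR and LF in response header values with spaces before writing them, so the smuggled header should never reach the client:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
)

// probeHeaderInjection serves one response whose Content-Disposition value is
// built from fileName, then reports what a client actually receives.
// Hypothetical helper, written only for this illustration.
func probeHeaderInjection(fileName string) (disposition, injected string, err error) {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Same concatenation pattern as the flagged line.
        w.Header().Set("Content-Disposition", "attachment; filename=\""+fileName+"\"")
    }))
    defer srv.Close()

    resp, err := http.Get(srv.URL)
    if err != nil {
        return "", "", err
    }
    defer resp.Body.Close()

    // X-Injected is the header the attacker tries to smuggle in.
    return resp.Header.Get("Content-Disposition"), resp.Header.Get("X-Injected"), nil
}

func main() {
    disposition, injected, err := probeHeaderInjection("a.txt\r\nX-Injected: 1")
    if err != nil {
        panic(err)
    }
    fmt.Printf("Content-Disposition: %q\n", disposition)
    fmt.Printf("X-Injected: %q\n", injected)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The attacker-controlled text still ends up inside the &lt;code&gt;Content-Disposition&lt;/code&gt; value, so the finding is worth fixing, but it does not turn into a second header.&lt;/p&gt;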

&lt;p&gt;Lastly, the final two &lt;strong&gt;high&lt;/strong&gt; impact findings describe header-based auth, which is intentional by design, and a missing TLS configuration that turns out to be another false positive.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;mediums&lt;/strong&gt; are all theoretical with no meaningful real-world impact.&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary:
&lt;/h4&gt;

&lt;p&gt;At first glance, the results might look like a success. Twenty-four findings sound impressive, and the report looks thorough. But once you dig in, you realize that while some of the findings are fix-worthy, most of them have little to no real impact.&lt;/p&gt;

&lt;p&gt;We've tried several other prompt variations, but the results were largely the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code with bug-class-specific prompts
&lt;/h3&gt;

&lt;p&gt;If you blame the generic prompt for the previous results, don't worry. In this experiment we up our prompt game and ask for something more specific: &lt;code&gt;"Look for business logic bugs and permission problems"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm682zk660tx0de9tbz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm682zk660tx0de9tbz7.png" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time we only got 10 total findings, and things are already looking more promising. The single &lt;strong&gt;critical&lt;/strong&gt; finding points to a function that runs every hour to clean up file requests belonging to deleted users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;cleanInvalidFileRequests&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;getUserMap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fileRequest&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetAllFileRequests&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exists&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fileRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserId&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;exists&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetAllMetadata&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UploadRequestId&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;fileRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="n"&gt;DeleteFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DeleteFileRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fileRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is an obvious typo here, and a pretty painful one: the &lt;code&gt;DeleteFile&lt;/code&gt; call sits outside the &lt;code&gt;if&lt;/code&gt; block that was meant to guard it. If the function encounters a file request whose owner no longer exists, it will &lt;em&gt;delete every single file from the application&lt;/em&gt;. That said, the trigger condition is narrow, as the comment sitting right above the function makes explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// cleanInvalidFileRequests removes file requests and the associated files from the database if their associated owner is not a valid user.&lt;/span&gt;
&lt;span class="c"&gt;// Normally this should not be a problem, but if a user was manually deleted from the database,&lt;/span&gt;
&lt;span class="c"&gt;// this could cause issues otherwise.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This only fires when a user is deleted directly from the database, not through the application's logic. Still, this is a real bug, just not as alarming as the title sounds. Medium impact would be a more appropriate classification.&lt;/p&gt;
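
&lt;p&gt;For illustration, here is a minimal sketch of what the cleanup was presumably meant to do, with the request ID check actually guarding the delete. It is not Gokapi's fix: the &lt;code&gt;FileRequest&lt;/code&gt; and &lt;code&gt;File&lt;/code&gt; structs are simplified stand-ins, the &lt;code&gt;int&lt;/code&gt; type for &lt;code&gt;UserId&lt;/code&gt; is an assumption, and &lt;code&gt;orphanedFileIds&lt;/code&gt; is a hypothetical helper that only collects the IDs that should be deleted:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;package main

import "fmt"

// Simplified stand-ins for Gokapi's models (assumed shapes, for illustration only).
type FileRequest struct {
    Id     string
    UserId int
}

type File struct {
    Id              string
    UploadRequestId string
}

// orphanedFileIds returns the IDs of files that belong to a file request
// whose owner no longer exists. Files of other requests are left alone.
func orphanedFileIds(users map[int]bool, requests []FileRequest, files []File) []string {
    var ids []string
    for _, fileRequest := range requests {
        if users[fileRequest.UserId] {
            continue // owner still exists, nothing to clean up
        }
        for _, file := range files {
            // The selection happens inside the if, unlike the original loop.
            if file.UploadRequestId == fileRequest.Id {
                ids = append(ids, file.Id)
            }
        }
    }
    return ids
}

func main() {
    users := map[int]bool{1: true} // user 2 was removed directly from the database
    requests := []FileRequest{{Id: "r1", UserId: 1}, {Id: "r2", UserId: 2}}
    files := []File{
        {Id: "f1", UploadRequestId: "r1"},
        {Id: "f2", UploadRequestId: "r2"},
        {Id: "f3"}, // regular upload, not tied to any file request
    }
    fmt.Println(orphanedFileIds(users, requests, files)) // prints [f2]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the check in place, only the file uploaded through the orphaned request is selected, while the other two files survive.&lt;/p&gt;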

&lt;p&gt;Two other genuine findings also made it through:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A user without &lt;code&gt;UserPermGuestUploads&lt;/code&gt; permission can generate a temporary API key with &lt;code&gt;ApiPermManageFileRequests&lt;/code&gt; and use it to create File Requests they should never have been able to create. A solid finding that breaks a real permission assumption.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A copy-paste mistake in &lt;code&gt;apiUploadRequestListSingle&lt;/code&gt; that checks &lt;code&gt;UserPermDeleteOtherUploads&lt;/code&gt; on a function meant for listing, not deleting:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;apiUploadRequestListSingle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="n"&gt;requestParser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;paramURequestListSingle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"invalid parameter passed"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;uploadRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;filerequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusNotFound&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errorcodes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NotFound&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"FileRequest does not exist with the given ID"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;uploadRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserId&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Id&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HasPermission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserPermDeleteOtherUploads&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errorcodes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NoPermission&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"No permission to delete this upload request"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Marshal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uploadRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;helper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A valid bug, though with limited security implications in practice.&lt;/p&gt;

&lt;p&gt;The rest of the findings circle back to familiar territory: file passwords stored as hashed values in a short-lived browser cookie (which appears intentional), a missing &lt;code&gt;HttpOnly&lt;/code&gt; flag, and a non-constant-time comparison.&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary:
&lt;/h4&gt;

&lt;p&gt;This technique improves on the previous one, but not all bug-class-specific prompts are equal. When we try &lt;code&gt;“please report all XSS vulnerabilities in this app”&lt;/code&gt;, the results are much worse: of the 5 reported vulnerabilities, 4 are false positives, and the remaining one, an admin-triggered self-XSS in expired hotlinks, also turned out to be a false positive. Worse, it missed a stored XSS vulnerability that we will discuss later, when a different tool discovers it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code with Skills
&lt;/h3&gt;

&lt;p&gt;Next, what if instead of just prompting more specifically, we actually empower Claude with top-tier infosec knowledge? Introducing Claude Skills: specialized, prompt-based instructions and tools that extend the LLM's capabilities. We will explore pairing Claude Code with &lt;a href="https://github.com/trailofbits/skills" rel="noopener noreferrer"&gt;Trail of Bits's open source security Skills&lt;/a&gt; and see what additional value it brings.&lt;/p&gt;

&lt;p&gt;We enable a handful of skills, relevant to the target we are auditing: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;audit-context-building&lt;/code&gt;, &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fp-check&lt;/code&gt;, &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;insecure-defaults&lt;/code&gt;, &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sharp-edges&lt;/code&gt;, &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;static-analysis&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;testing-handbook-skills&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;variant-analysis&lt;/code&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then ask for a &lt;code&gt;"comprehensive security audit"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmkepmue56wn4qchbqpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmkepmue56wn4qchbqpf.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude picked up the &lt;code&gt;audit-context-building:audit-context&lt;/code&gt; skill and got to work. The results, though, tell a familiar story. The one improvement is that the &lt;code&gt;fp-check&lt;/code&gt; skill lets us ask Claude to filter out its own false positives, trimming the report from 21 findings down to 7.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50zfmjgzcjbnnz1roryx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50zfmjgzcjbnnz1roryx.png" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of those 7, the only new addition is a world-readable config file. Everything else we had already seen before.&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary:
&lt;/h4&gt;

&lt;p&gt;While the vulnerability discovery process itself did not improve much, the benefit of having a false positive detection skill is noticeable: the report took less time to triage and the findings that made it through are far more actionable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code /security-review
&lt;/h3&gt;

&lt;p&gt;We cannot wrap up our Claude testing without trying what is arguably their most established security feature. The &lt;code&gt;/security-review&lt;/code&gt; command is designed to review pending changes in the current branch, but with a bit of Git magic we can make the entire repository appear as pending changes. Close enough.&lt;/p&gt;

&lt;p&gt;The results are as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0snvqyhms9mwp10s42k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0snvqyhms9mwp10s42k.png" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The only finding that made it through is the SHA-1 password hashing. No trace of the business logic bugs we uncovered earlier.&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary:
&lt;/h4&gt;

&lt;p&gt;Quite a disappointing result, especially given that &lt;code&gt;/security-review&lt;/code&gt; is probably &lt;a href="https://x.com/_tenZdhon_/status/1953222429951803482" rel="noopener noreferrer"&gt;used a lot by developers&lt;/a&gt; who want an accessible AI-assisted security audit without having to think about prompts at all. It did not just fail to find anything new; it actually fell short of what raw Claude Code had already caught with a single generic prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code Security
&lt;/h3&gt;

&lt;p&gt;We did not manage to get access to &lt;a href="https://x.com/claudeai/status/2024907535145468326" rel="noopener noreferrer"&gt;Claude Code Security&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Codex
&lt;/h3&gt;

&lt;p&gt;Of course, we also had to put Codex through the same experiment that started this whole journey: &lt;strong&gt;a single prompt&lt;/strong&gt;, asking it to &lt;code&gt;"perform a comprehensive security audit"&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Compared to Claude, Codex is noticeably more restrained with its findings, but also more systematic in its approach, starting by mapping out the security boundaries, endpoints, and trust model before diving in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23mheiekpmtm6sb6zuhb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23mheiekpmtm6sb6zuhb.png" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It came back with 3 findings: &lt;strong&gt;1 high&lt;/strong&gt; and &lt;strong&gt;2 mediums&lt;/strong&gt;. &lt;br&gt;
The high is the &lt;code&gt;UserPermGuestUploads&lt;/code&gt; bypass via an API key with &lt;code&gt;ApiPermManageFileRequests&lt;/code&gt;, which we already discussed in the Anthropic section. &lt;/p&gt;

&lt;p&gt;The two mediums are ZIP downloads preserving unsafe filenames and allowing archive path traversal (not a vulnerability in Gokapi's context), and the familiar missing security cookie flags. &lt;/p&gt;
&lt;h4&gt;
  
  
  Summary:
&lt;/h4&gt;

&lt;p&gt;On the bright side, fewer false positives than Claude, but unfortunately fewer findings overall too. Not quite what you would hope for when asking for a "comprehensive security audit".&lt;/p&gt;
&lt;h3&gt;
  
  
  Codex Security
&lt;/h3&gt;

&lt;p&gt;While we did not manage to get our hands on Claude Code Security, we did get access to its OpenAI counterpart. &lt;a href="https://x.com/OpenAI/status/2029985250512920743" rel="noopener noreferrer"&gt;Codex Security&lt;/a&gt; is OpenAI's application security agent. Fresh out of research preview and already generating buzz, it felt like a natural candidate to explore. Codex Security officially only supports scanning commits from the past two months, but with a few Git tricks we can make it scan the full Gokapi repository.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bml5xxkqe8wa6x2jr0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bml5xxkqe8wa6x2jr0q.png" width="800" height="650"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result surprised us. Only a single finding: the &lt;code&gt;Content-Disposition&lt;/code&gt; header injection we already flagged in the Claude section, where &lt;code&gt;file.Name&lt;/code&gt; is potentially attacker-controlled. As we established earlier, CRLF injection is not actually possible here thanks to Go's header validation. So we are looking at one finding, and it has limited impact.&lt;/p&gt;

&lt;p&gt;We thought the first result was a fluke, so we ran it again. This time things looked very different:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsw3p7pf49fiyg1onkmck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsw3p7pf49fiyg1onkmck.png" width="367" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;45 findings. That is a lot to unpack. Let's see how many of them actually hold up.&lt;/p&gt;

&lt;p&gt;The first &lt;strong&gt;high&lt;/strong&gt; targets the OpenID Connect SSO integration. Gokapi lets you restrict access to specific OIDC "Authorised groups", with wildcard support:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch476pau2wrp5mconb28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fch476pau2wrp5mconb28.png" width="800" height="137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The function that checks a user's group against the configured allowlist looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;matchesWithWildcard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;components&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;components&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// if len is 1, there are no *'s, return exact match pattern&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;regexp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MatchString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"^"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;"$"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Builder&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;literal&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;components&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Replace * with .*&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WriteString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".*"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="c"&gt;// Quote any regular expression meta characters in the&lt;/span&gt;
        &lt;span class="c"&gt;// literal text.&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WriteString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regexp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QuoteMeta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;regexp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MatchString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"^"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="s"&gt;"$"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Noticed the bug? The default branch (when wildcards are present) correctly calls &lt;code&gt;regexp.QuoteMeta&lt;/code&gt; on each literal segment of the pattern, but the early exit branch (no wildcards) skips it entirely, so the configured group is interpreted as a raw regular expression. This means that if the configured allowed group is &lt;code&gt;john.doe&lt;/code&gt;, someone with the group &lt;code&gt;johnXdoe&lt;/code&gt; will sail right through the allowlist, because an unescaped &lt;code&gt;.&lt;/code&gt; matches any character. A genuine vulnerability, though we would call it a medium in practice given that an attacker would need some degree of control over SSO identities to exploit it (Gokapi's maintainer reached a similar conclusion and decided not to issue a CVE for it).&lt;/p&gt;
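&lt;p&gt;To make the difference between the two branches concrete, here is a minimal, self-contained sketch. The helper names &lt;code&gt;matchUnquoted&lt;/code&gt; and &lt;code&gt;matchQuoted&lt;/code&gt; are ours, not Gokapi's, and the second one is just one possible fix, not necessarily the patch the maintainer shipped.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
)

// matchUnquoted mirrors the early-exit branch: the pattern is used as a raw
// regular expression, so "." matches any single character.
func matchUnquoted(pattern, input string) (bool, error) {
	return regexp.MatchString("^"+pattern+"$", input)
}

// matchQuoted is a hypothetical fix: escape the pattern first so every
// character is compared literally.
func matchQuoted(pattern, input string) (bool, error) {
	return regexp.MatchString("^"+regexp.QuoteMeta(pattern)+"$", input)
}

func main() {
	for _, group := range []string{"john.doe", "johnXdoe"} {
		raw, _ := matchUnquoted("john.doe", group)
		quoted, _ := matchQuoted("john.doe", group)
		fmt.Printf("%-9s unquoted=%v quoted=%v\n", group, raw, quoted)
	}
}
```

&lt;p&gt;The unquoted variant accepts both groups, while the quoted one accepts only the exact name. Since no wildcard is involved in this branch, a plain string comparison would arguably do the job just as well.&lt;/p&gt;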

&lt;p&gt;The second &lt;strong&gt;high&lt;/strong&gt; impact vulnerability, "External StreamSaver MITM", flags a situation where Gokapi is running over plain HTTP. Since service workers cannot be registered over insecure connections, the app falls back to a proxy hosted on the creator's GitHub Pages. The man-in-the-middle potential is real, but this looks like a deliberate tradeoff: the service worker exists specifically to avoid buffering large encrypted files in memory (and frankly, if you are running a production file service over HTTP you have bigger problems - why not just MITM the connection itself?).&lt;/p&gt;

&lt;p&gt;The other two &lt;strong&gt;highs&lt;/strong&gt; are unauthenticated setup and header-based auth, both of which we have already seen and both of which are intentional behaviours.&lt;/p&gt;

&lt;p&gt;Moving on to the 8 &lt;strong&gt;mediums&lt;/strong&gt;, we were able to confirm 2 as valid, and they both hit the same feature: hotlinking. You cannot create hotlinks for HTML pages, but what if you create a hotlink for a PNG and then swap it out for an HTML file? The check gets bypassed. The same goes for SVGs. Both result in a stored XSS reachable by any authenticated user. Nice finding.&lt;/p&gt;

&lt;p&gt;The remaining mediums fall into the "weakness or false positive" bucket given Gokapi's threat model.&lt;/p&gt;

&lt;p&gt;Most of the &lt;strong&gt;lows&lt;/strong&gt; follow the same pattern, with a few exceptions worth calling out: the "Cleanup deletes all files" bug we already know well by now, the SVG hotlink XSS surfacing again, and a log injection vector via username.&lt;/p&gt;

&lt;p&gt;So out of 45 findings, only a handful have real, direct impact. Notably, the business logic bug around &lt;code&gt;UserPermGuestUploads&lt;/code&gt; vs &lt;code&gt;ApiPermManageFileRequests&lt;/code&gt;, which Codex itself found in the previous run, is nowhere to be seen this time around. The rest of the findings are still worth knowing about and fixing, but they do not pose an immediate risk.&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary:
&lt;/h4&gt;

&lt;p&gt;The signal-to-noise ratio is rough, but the signal itself is worth noting. The stored XSS that every other tool had missed up to this point is a genuinely good find, and the SSO bug is a nice catch too. Less impressive is the tendency to report the same vulnerability multiple times from different angles rather than consolidating findings around a root cause, which makes the report harder to navigate than it needs to be.&lt;/p&gt;

&lt;p&gt;It also has to be said that Codex Security is clearly still early: the interface has quite a few rough UI/UX bugs that make the experience more painful than it should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  AISafe Labs
&lt;/h2&gt;

&lt;p&gt;Thus far we have explored individual LLMs and how well they perform when given increasingly rich context and more detailed instructions: from a single generic prompt, to bug-class specific ones, to a full suite of infosec skills bolted on top. Now we have reached the final boss: how much extra value will the custom orchestration, knowledge base and tools built into AISafe bring?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpwk18t805tqs7ebf1tv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpwk18t805tqs7ebf1tv.png" width="800" height="386"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Preview of AISafe app&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The audit returned 2 &lt;strong&gt;high&lt;/strong&gt; impact vulnerabilities.&lt;/p&gt;

&lt;p&gt;The first one is titled "Data Leak in Upload Status Stream". Gokapi uses &lt;a href="https://en.wikipedia.org/wiki/Server-sent_events" rel="noopener noreferrer"&gt;HTTP server-sent events&lt;/a&gt; to stream upload progress to the user's browser. The problem is that authorization on this endpoint is completely broken: any authenticated user receives events about every file upload happening across the entire application. That includes chunk IDs, which could allow an attacker to interfere with another user's active upload, and file IDs, which would let them download files they were never meant to see. This is a serious finding, and it is genuinely surprising that &lt;strong&gt;none of the other tools in this experiment spotted it&lt;/strong&gt;.&lt;/p&gt;
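&lt;p&gt;As a rough illustration of the missing check, the sketch below filters status events by owner before they reach a subscriber. The &lt;code&gt;UploadEvent&lt;/code&gt; struct, its fields and the &lt;code&gt;visibleTo&lt;/code&gt; helper are hypothetical; Gokapi's real types and the actual fix may look quite different.&lt;/p&gt;

```go
package main

import "fmt"

// UploadEvent is a hypothetical status event; Gokapi's real struct may differ.
type UploadEvent struct {
	OwnerID int
	ChunkID string
	Status  string
}

// visibleTo returns only the events that belong to the given user, so a
// subscriber never sees chunk or file IDs from other accounts.
func visibleTo(userID int, events []UploadEvent) []UploadEvent {
	var out []UploadEvent
	for _, ev := range events {
		if ev.OwnerID == userID {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	events := []UploadEvent{
		{OwnerID: 1, ChunkID: "chunk-a", Status: "uploading"},
		{OwnerID: 2, ChunkID: "chunk-b", Status: "finished"},
	}
	// Only user 1's event is printed; a real handler would write each
	// event to that user's SSE stream instead.
	for _, ev := range visibleTo(1, events) {
		fmt.Printf("data: %s %s\n", ev.ChunkID, ev.Status)
	}
}
```

&lt;p&gt;The important part is that the filter runs on the server for every subscriber, instead of broadcasting everything and relying on the client to ignore foreign events.&lt;/p&gt;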

&lt;p&gt;The second high is the unauthenticated setup wizard, which as we have established is an accepted risk for Gokapi.&lt;/p&gt;

&lt;p&gt;14 &lt;strong&gt;mediums&lt;/strong&gt;. Let's go through them, because there is a mix of familiar faces and some interesting new ones.&lt;/p&gt;

&lt;p&gt;Two we have already seen: the &lt;code&gt;UserPermGuestUploads&lt;/code&gt; vs &lt;code&gt;ApiPermManageFileRequests&lt;/code&gt; privilege escalation, and the stored XSS in hotlinks.&lt;/p&gt;

&lt;p&gt;The first novel finding is a nice catch around API key demotion. When a user gets demoted, previously issued API keys do not automatically lose the privileges they should. In practice, a stale key can retain &lt;code&gt;ApiPermManageFileRequests&lt;/code&gt; and keep managing upload requests long after the account behind it no longer has that capability.&lt;/p&gt;
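&lt;p&gt;One way to think about a fix is to trim every existing key down to what the user is still allowed to do at the moment of demotion. The sketch below models permissions as a simple set for readability; &lt;code&gt;PermSet&lt;/code&gt;, &lt;code&gt;APIKey&lt;/code&gt;, &lt;code&gt;restrictKeys&lt;/code&gt; and the &lt;code&gt;PermViewFiles&lt;/code&gt; name are assumptions made for illustration, not Gokapi's data model.&lt;/p&gt;

```go
package main

import "fmt"

// PermSet is a hypothetical permission set keyed by permission name.
type PermSet map[string]bool

// APIKey is a simplified stand-in for a stored API key.
type APIKey struct {
	ID    string
	Perms PermSet
}

// restrictKeys drops every key permission the owner no longer holds.
// Calling it on demotion keeps stale keys from outliving the user's rank.
func restrictKeys(keys []APIKey, userPerms PermSet) {
	for _, key := range keys {
		for perm := range key.Perms {
			if !userPerms[perm] {
				delete(key.Perms, perm)
			}
		}
	}
}

func main() {
	keys := []APIKey{{
		ID: "key-1",
		Perms: PermSet{
			"ApiPermManageFileRequests": true,
			"PermViewFiles":             true, // hypothetical permission name
		},
	}}
	// After demotion the user may only view files.
	restrictKeys(keys, PermSet{"PermViewFiles": true})
	fmt.Println(keys[0].ID, keys[0].Perms)
}
```

&lt;p&gt;An alternative design is to leave stored keys untouched and intersect key and user permissions on every request, which also covers any future code path that changes a user's rank.&lt;/p&gt;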

&lt;p&gt;The second is a privilege escalation in the file replacement feature. Cross-user replacement is guarded too loosely: if you have permission to list other users' uploads, you can point &lt;code&gt;idNewContent&lt;/code&gt; at someone else's file during a replacement and, with &lt;code&gt;deleteNewFile=true&lt;/code&gt;, have the storage layer silently delete it without any stronger authorization check. The victim's file is gone.&lt;/p&gt;

&lt;p&gt;The third new finding targets chunk uploads. The size of the current HTTP request body is checked, but the total declared file size is taken from attacker-controlled metadata and then used during allocation and completion. By splitting a file into chunks and lying about the right fields in the metadata, an attacker can smuggle a file that exceeds the configured upload limit. A clean quota bypass.&lt;/p&gt;
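&lt;p&gt;The underlying idea of a fix is to count the bytes the server actually receives rather than trusting a total declared by the client. Here is a minimal sketch of that idea; &lt;code&gt;uploadTracker&lt;/code&gt; and &lt;code&gt;addChunk&lt;/code&gt; are hypothetical names, and the sketch assumes &lt;code&gt;chunkLen&lt;/code&gt; is the measured length of the request body.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

var errTooLarge = errors.New("upload exceeds configured size limit")

// uploadTracker counts bytes the server has actually received per upload,
// instead of trusting a client-declared total.
type uploadTracker struct {
	limit    int64
	received map[string]int64
}

func newUploadTracker(limit int64) uploadTracker {
	return uploadTracker{limit: limit, received: map[string]int64{}}
}

// addChunk records a chunk and fails once the running total passes the limit.
// chunkLen is assumed to be the measured body length, not a metadata field.
func (t uploadTracker) addChunk(uploadID string, chunkLen int64) error {
	total := t.received[uploadID] + chunkLen
	if total > t.limit {
		return errTooLarge
	}
	t.received[uploadID] = total
	return nil
}

func main() {
	tracker := newUploadTracker(10) // 10-byte limit keeps the demo small
	for _, size := range []int64{4, 4, 4} {
		err := tracker.addChunk("upload-1", size)
		fmt.Println("chunk of", size, "bytes, error:", err)
	}
}
```

&lt;p&gt;With a 10-byte limit, the first two 4-byte chunks are accepted and the third is rejected, no matter what the upload's metadata claims about the total size.&lt;/p&gt;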

&lt;p&gt;That is 2 familiar findings and 3 new ones. What about the remaining 9?&lt;/p&gt;

&lt;p&gt;Two are CSRF reports, one for the login page and one for the password reset page, but this time with actual impact attached. AISafe noticed that the password reset functionality uses &lt;code&gt;r.Form.Get("newpw")&lt;/code&gt;, which accepts both POST and GET parameters with no method check, effectively bypassing the &lt;code&gt;SameSite=Lax&lt;/code&gt; default protection that would otherwise limit CSRF exposure.&lt;/p&gt;
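&lt;p&gt;The reason this matters: &lt;code&gt;SameSite=Lax&lt;/code&gt; cookies are still sent on top-level GET navigations, and in Go's &lt;code&gt;net/http&lt;/code&gt; the &lt;code&gt;r.Form&lt;/code&gt; map merges URL query parameters with body parameters, whereas &lt;code&gt;r.PostForm&lt;/code&gt; only contains the body. The sketch below contrasts the two; the handler names, the &lt;code&gt;/reset&lt;/code&gt; path and the response text are made up for the demo and are not Gokapi's code.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// lenientReset mimics the risky pattern: after ParseForm, r.Form merges URL
// query parameters with the request body, so a plain GET link is enough.
func lenientReset(w http.ResponseWriter, r *http.Request) {
	_ = r.ParseForm()
	if r.Form.Get("newpw") != "" {
		fmt.Fprint(w, "password changed")
		return
	}
	http.Error(w, "missing newpw", http.StatusBadRequest)
}

// strictReset is a hypothetical hardening: require POST and read only
// body parameters via r.PostForm.
func strictReset(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	_ = r.ParseForm()
	if r.PostForm.Get("newpw") != "" {
		fmt.Fprint(w, "password changed")
		return
	}
	http.Error(w, "missing newpw", http.StatusBadRequest)
}

// statusFor sends a GET request carrying newpw in the query string and
// returns the HTTP status code the handler answered with.
func statusFor(handler http.HandlerFunc) int {
	req := httptest.NewRequest(http.MethodGet, "/reset?newpw=hunter2", nil)
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("lenient:", statusFor(lenientReset)) // prints "lenient: 200"
	fmt.Println("strict:", statusFor(strictReset))   // prints "strict: 405"
}
```

&lt;p&gt;A method check alone does not replace a proper CSRF token, but it does restore the protection that &lt;code&gt;SameSite=Lax&lt;/code&gt; is meant to provide for state-changing requests.&lt;/p&gt;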

&lt;p&gt;Two more are denial of service findings: one in the E2E Metadata Parser, where the &lt;code&gt;content&lt;/code&gt; parameter is stored in memory twice, creating an amplification factor; and one in the Upload Status SSE Endpoint, which we have already seen and determined to have no meaningful real-world impact.&lt;/p&gt;

&lt;p&gt;The remaining five are a mixed bag:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSRF in the setup page: accepted risk&lt;/li&gt;
&lt;li&gt;Authorization "bypass" via file add API: determined by the maintainer to have no security implications&lt;/li&gt;
&lt;li&gt;Race condition in chunk file upload: theoretical due to a very narrow race window&lt;/li&gt;
&lt;li&gt;Header spoofing in header auth mode: already discussed, intentional behaviour&lt;/li&gt;
&lt;li&gt;Sessions not invalidated after password change: a valid weakness worth fixing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The remaining 4 &lt;strong&gt;lows&lt;/strong&gt; are mostly theoretical or intentional behaviour.&lt;/p&gt;

&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;AISafe identified the SSE broadcast vulnerability, 1 DoS, and 3 unique logic bugs that resulted in CVEs, none of which the other tools discovered. Why? It looks like the custom orchestration, knowledge base, guided threat modeling, and the other secret sauce that makes up AISafe have a big effect.&lt;/p&gt;

&lt;p&gt;On the other hand, the valid findings that other tools caught and AISafe did not, such as the file list vs. replace permission mismatch, the OIDC bug, the file request cleanup typo, and some of the low severity findings, are worth putting in context: only the first two have some security impact, and the second has serious preconditions for exploitation. The low impact findings are better described as security improvements than actual vulnerabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Findings&lt;/th&gt;
&lt;th&gt;Vulnerabilities with direct impact&lt;/th&gt;
&lt;th&gt;Weaknesses without direct impact&lt;/th&gt;
&lt;th&gt;Accepted risk&lt;/th&gt;
&lt;th&gt;False positives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code w/ Skills&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code /security-review&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Security&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AISafe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So, back to our initial question: &lt;strong&gt;can a single prompt catch all the vulnerabilities in your application?&lt;/strong&gt; Based on what we have seen today, &lt;strong&gt;no&lt;/strong&gt;. The gap between a raw LLM and a purpose-built security tool is not just about how many tokens you throw at the problem; it comes down to how you spend those tokens. Flooding a report with low-impact findings and suggestions is easy; finding the complex logic issues is not.&lt;/p&gt;

&lt;p&gt;At the end of the day, the results will always tell their own story. Security is not something you can afford to overlook. Fixing 45 reported issues might feel like a great achievement, but quality matters more than quantity. That's why choosing the right tool matters just as much as choosing to act in the first place. We hope that this case study has given you a clearer picture of where things stand today.&lt;/p&gt;

&lt;p&gt;Quality over quantity is not just a nice principle in security research; it is the whole point, and it is why we strive to make quality accessible to everyone!&lt;/p&gt;

&lt;p&gt;Check out our &lt;a href="https://aisafe.io" rel="noopener noreferrer"&gt;Code Audit service&lt;/a&gt; or &lt;a href="mailto:contact@aisafe.io"&gt;get in touch&lt;/a&gt; directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  CVEs &amp;amp; advisories
&lt;/h3&gt;

&lt;p&gt;We'd also like to thank Gokapi's maintainer &lt;a href="https://github.com/Forceu" rel="noopener noreferrer"&gt;Forceu&lt;/a&gt; for a great collaboration on resolving and classifying the issues!&lt;/p&gt;

&lt;p&gt;7 CVEs were assigned in total for the issues found by AISafe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-3c22-5j5m-4jq7" rel="noopener noreferrer"&gt;CVE-2026-28683&lt;/a&gt;: Stored XSS in SVG Hotlinks&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-c36c-7pc2-f2ph" rel="noopener noreferrer"&gt;CVE-2026-28682&lt;/a&gt;: Data Leak in Upload Status Stream&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-q658-hfpg-35qc" rel="noopener noreferrer"&gt;CVE-2026-29061&lt;/a&gt;: Privilege escalation via incomplete API-key permission revocation on user rank demotion&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-hcff-qv74-7hr4" rel="noopener noreferrer"&gt;CVE-2026-29084&lt;/a&gt;: CSRF in Login Endpoint&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-j6jp-78w8-34x6" rel="noopener noreferrer"&gt;CVE-2026-30943&lt;/a&gt;: Privilege Escalation in File Replace&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-qwc6-vc2v-2ggj" rel="noopener noreferrer"&gt;CVE-2026-30955&lt;/a&gt;: DoS in API endpoint&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Forceu/Gokapi/security/advisories/GHSA-45vh-rpc8-hxpp" rel="noopener noreferrer"&gt;CVE-2026-30961&lt;/a&gt;: File Request MaxSize Limit Bypassed in File Request&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
