<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Narendra Singh Shekhawat</title>
    <description>The latest articles on DEV Community by Narendra Singh Shekhawat (@nshekhawat).</description>
    <link>https://dev.to/nshekhawat</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F323507%2F32755718-6508-4fdc-89c5-6187aaef56cb.jpeg</url>
      <title>DEV Community: Narendra Singh Shekhawat</title>
      <link>https://dev.to/nshekhawat</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nshekhawat"/>
    <language>en</language>
    <item>
      <title>wfguard: a GitHub Actions supply-chain auditor</title>
      <dc:creator>Narendra Singh Shekhawat</dc:creator>
      <pubDate>Sun, 24 May 2026 13:09:18 +0000</pubDate>
      <link>https://dev.to/nshekhawat/wfguard-a-github-actions-supply-chain-auditor-3obe</link>
      <guid>https://dev.to/nshekhawat/wfguard-a-github-actions-supply-chain-auditor-3obe</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;wfguard is a Go CLI that audits GitHub Actions workflows for supply-chain attack patterns. It combines two engines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A deterministic Go pass that catches what regex catches: pwn-request triggers, mutable-tag pins from unverified publishers, missing &lt;code&gt;permissions:&lt;/code&gt; blocks, references to known-compromised actions. I seeded the &lt;code&gt;tj-actions/changed-files&lt;/code&gt; 2025 incident in the known-bad list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Gemma 4 agent loop that catches what regex can't: cross-step taint flow, action-source review, severity calls that depend on the workflow's trigger surface.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tool outputs a Markdown report, SARIF 2.1.0 for GitHub's code-scanning UI, and, with &lt;code&gt;--harden&lt;/code&gt;, a unified diff. You run &lt;code&gt;wfguard scan ./repo --harden&lt;/code&gt;, then apply the patch with &lt;code&gt;git apply report.patch&lt;/code&gt;. Gemma 4 produces the corrected file, and wfguard checks that it parses as YAML before including it in the patch. Only files that changed appear in the diff.&lt;/p&gt;

&lt;p&gt;A real CI workflow in the wild had this snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;export VERSION="${GITHUB_REF#refs/tags/v}"&lt;/span&gt;
    &lt;span class="s"&gt;sed -i "s/version=.*/version=\"${VERSION}\",/" setup.py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



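&lt;p&gt;&lt;code&gt;$GITHUB_REF&lt;/code&gt; for a release trigger is &lt;code&gt;refs/tags/&amp;lt;tagname&amp;gt;&lt;/code&gt;. Git tag names reject spaces and a few characters such as &lt;code&gt;~&lt;/code&gt;, &lt;code&gt;^&lt;/code&gt;, &lt;code&gt;:&lt;/code&gt;, and &lt;code&gt;\&lt;/code&gt;, but they accept quotes, semicolons, and slashes. Bash does not re-parse the expanded &lt;code&gt;${VERSION}&lt;/code&gt;, so the tag is not run as shell directly; it is spliced into the &lt;code&gt;sed&lt;/code&gt; program. A tag that contains &lt;code&gt;/&lt;/code&gt; closes the replacement early, and the rest of the tag is parsed as &lt;code&gt;sed&lt;/code&gt; flags and commands. GNU &lt;code&gt;sed&lt;/code&gt; has an &lt;code&gt;e&lt;/code&gt; flag that executes the pattern space as a command, so this is an injection path. My static rules didn't catch it: no &lt;code&gt;${{ ... }}&lt;/code&gt; interpolation to anchor on, just a runner env var threading through string interpolation into a &lt;code&gt;sed&lt;/code&gt; script. Pattern-based rules can't see this. The agent can.&lt;/p&gt;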
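&lt;p&gt;To make the taint path concrete, here is a minimal Go sketch that rebuilds the &lt;code&gt;sed&lt;/code&gt; program string the two shell lines produce. It is not wfguard code: &lt;code&gt;sedProgram&lt;/code&gt;, &lt;code&gt;delimiters&lt;/code&gt;, and the second tag are made up for illustration. The intended program has three &lt;code&gt;/&lt;/code&gt; delimiters; a tag that carries its own slash yields four, so &lt;code&gt;sed&lt;/code&gt; no longer treats the tail of the tag as replacement text.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// sedProgram mirrors the two shell lines: strip the literal "refs/tags/v"
// prefix, then splice the remainder into the substitution unchanged.
func sedProgram(ref string) string {
	version := strings.TrimPrefix(ref, "refs/tags/v")
	return fmt.Sprintf(`s/version=.*/version="%s",/`, version)
}

// delimiters counts "/" in the program. The intended program has exactly three.
func delimiters(program string) int {
	return strings.Count(program, "/")
}

func main() {
	// The second ref is a hypothetical tag name chosen only to show the effect.
	for _, ref := range []string{"refs/tags/v1.2.3", "refs/tags/v1/e;#"} {
		p := sedProgram(ref)
		fmt.Printf("%-20s %s (delimiters: %d)\n", ref, p, delimiters(p))
	}
}
```

&lt;p&gt;A deterministic rule could count delimiters like this, but only if it already knew which variable reaches which command. Finding that chain is the part the agent does.&lt;/p&gt;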

&lt;p&gt;Three design choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;submit_finding&lt;/code&gt; is the agent's only output channel.&lt;/strong&gt; The tool ignores anything the model says outside a tool call. That gives structured output without strict JSON mode, and the model can't report a finding that doesn't fit the schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default &lt;code&gt;--min-severity high&lt;/code&gt;.&lt;/strong&gt; wfguard computes hygiene findings (unpinned &lt;code&gt;actions/*&lt;/code&gt; tags, missing &lt;code&gt;permissions:&lt;/code&gt; blocks) but hides them by default. Most workflow scanners drown users in these. The LLM agent uses them as context; the human sees them only with &lt;code&gt;--min-severity low&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;UnpinnedRule&lt;/code&gt; is narrow.&lt;/strong&gt; It fires for unverified publishers or actions with a known compromise history. &lt;code&gt;actions/checkout@v4&lt;/code&gt; is fine. &lt;code&gt;random-vendor/some-tool@v1&lt;/code&gt; is not. The OpenSSF "pin everything to a SHA" advice is correct in theory but produces ~80% noise on real repos.&lt;/li&gt;
&lt;/ul&gt;
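&lt;p&gt;The first choice can be sketched in a few lines of Go. This is an assumption about the shape, not the actual wfguard types: &lt;code&gt;Part&lt;/code&gt;, &lt;code&gt;Finding&lt;/code&gt;, the argument keys, and the severity list are all hypothetical. The idea is that prose and other tool calls never become findings, and a &lt;code&gt;submit_finding&lt;/code&gt; call that misses the schema is dropped.&lt;/p&gt;

```go
package main

import "fmt"

// Part is a hypothetical, simplified view of one piece of a model turn.
type Part struct {
	Text     string            // free-form prose, never trusted
	ToolName string            // empty when the part is plain text
	Args     map[string]string // tool arguments, flattened for this sketch
}

// Finding is a hypothetical schema for one reported issue.
type Finding struct {
	RuleID, Severity, File, Message string
}

// knownSeverity is an assumed set of levels; the post only names low and high.
var knownSeverity = map[string]bool{"low": true, "medium": true, "high": true, "critical": true}

// collectFindings keeps only well-formed submit_finding calls.
func collectFindings(parts []Part) []Finding {
	var out []Finding
	for _, p := range parts {
		if p.ToolName != "submit_finding" {
			continue // prose and other tools are not an output channel
		}
		f := Finding{
			RuleID:   p.Args["rule_id"],
			Severity: p.Args["severity"],
			File:     p.Args["file"],
			Message:  p.Args["message"],
		}
		if f.RuleID == "" || f.File == "" || !knownSeverity[f.Severity] {
			continue // schema violation: drop instead of guessing
		}
		out = append(out, f)
	}
	return out
}

func main() {
	parts := []Part{
		{Text: "I think there may be an injection somewhere."},
		{ToolName: "submit_finding", Args: map[string]string{
			"rule_id": "tag-to-sed", "severity": "high",
			"file": ".github/workflows/release.yml", "message": "tag text reaches sed",
		}},
	}
	for _, f := range collectFindings(parts) {
		fmt.Println(f.Severity, f.RuleID, f.File)
	}
}
```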

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/nshekhawat/wfguard" rel="noopener noreferrer"&gt;https://github.com/nshekhawat/wfguard&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Primary model: Gemma 4 31B Dense (&lt;code&gt;gemma-4-31b-it&lt;/code&gt;).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I picked 31B for three model properties and one problem property:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;256K context.&lt;/strong&gt; Workflow YAML is small. Referenced action source code is large. Calling &lt;code&gt;get_action_source('actions/checkout@v4')&lt;/code&gt; returns &lt;code&gt;action.yml&lt;/code&gt; plus &lt;code&gt;dist/index.js&lt;/code&gt;, which can be hundreds of KB of bundled JavaScript. A 32K model can't hold that, and an 8K model would force me to pre-summarize everything, which defeats the point of letting the model read.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strongest dense reasoning in the Gemma 4 family.&lt;/strong&gt; Multi-hop taint analysis is where smaller models stop being useful. The &lt;code&gt;$GITHUB_REF → sed&lt;/code&gt; finding requires reasoning across three steps: the tag arrives in an env var, bash interpolates it into a &lt;code&gt;sed&lt;/code&gt; argument, the shell runs it. Each step is trivial in isolation. The chain is where 31B matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native function calling.&lt;/strong&gt; wfguard's agent has seven tools: &lt;code&gt;list_workflows&lt;/code&gt;, &lt;code&gt;get_workflow&lt;/code&gt;, &lt;code&gt;get_action_source&lt;/code&gt;, &lt;code&gt;resolve_reference&lt;/code&gt;, &lt;code&gt;lookup_advisories&lt;/code&gt;, &lt;code&gt;trace_expression_flow&lt;/code&gt;, &lt;code&gt;submit_finding&lt;/code&gt;. The model picks tools, my Go dispatcher executes them, results go back as &lt;code&gt;function_response&lt;/code&gt; parts. Strict JSON-mode workarounds would have been more code to get right.&lt;/li&gt;
&lt;/ul&gt;
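&lt;p&gt;The tool-calling loop described in the last bullet might look roughly like this in Go. It is a sketch under assumptions: &lt;code&gt;Model&lt;/code&gt;, &lt;code&gt;Tool&lt;/code&gt;, &lt;code&gt;Call&lt;/code&gt;, and &lt;code&gt;runAgent&lt;/code&gt; are illustrative names, the model is a scripted stand-in, and results are plain strings instead of &lt;code&gt;function_response&lt;/code&gt; parts. The loop ends when the model returns a turn with no tool call or when the step budget runs out.&lt;/p&gt;

```go
package main

import "fmt"

// Call is one tool request from the model (hypothetical shape).
type Call struct {
	Tool string
	Arg  string
}

// Model is a stand-in: given the transcript so far it returns the next
// tool call, or ok=false for a plain no-tool-call turn.
type Model func(transcript []string) (call Call, ok bool)

// Tool executes one request and returns its result as text.
type Tool func(arg string) string

// runAgent dispatches tool calls until the model stops or maxSteps is used up.
// finished reports whether the model ended the loop on its own.
func runAgent(m Model, tools map[string]Tool, maxSteps int) (transcript []string, finished bool) {
	for step := 0; maxSteps > step; step++ {
		call, ok := m(transcript)
		if !ok {
			return transcript, true
		}
		tool, known := tools[call.Tool]
		if !known {
			transcript = append(transcript, "error: unknown tool "+call.Tool)
			continue
		}
		transcript = append(transcript, call.Tool+": "+tool(call.Arg))
	}
	return transcript, false
}

// scripted replays a fixed list of calls, one per step, then stops.
func scripted(calls []Call) Model {
	return func(transcript []string) (Call, bool) {
		if len(transcript) >= len(calls) {
			return Call{}, false
		}
		return calls[len(transcript)], true
	}
}

func main() {
	tools := map[string]Tool{
		"list_workflows": func(string) string { return "release.yml" },
		"get_workflow":   func(name string) string { return "contents of " + name },
	}
	m := scripted([]Call{{Tool: "list_workflows"}, {Tool: "get_workflow", Arg: "release.yml"}})
	transcript, finished := runAgent(m, tools, 5)
	fmt.Println(transcript, finished)
}
```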

&lt;p&gt;&lt;strong&gt;Comparison I ran: Gemma 4 E4B (&lt;code&gt;gemma-4-e4b-it-mlx&lt;/code&gt;) via LM Studio.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built wfguard backend-neutral. A &lt;code&gt;Generator&lt;/code&gt; interface has two implementations: one for the Gemini API, one for any OpenAI-compatible server (LM Studio, vLLM, llama.cpp, Ollama, Unsloth). Everything stays the same except the wire format. Switching models is one &lt;code&gt;--backend&lt;/code&gt; flag.&lt;/p&gt;
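&lt;p&gt;A minimal sketch of that pattern, assuming a single-method interface. The real &lt;code&gt;Generator&lt;/code&gt; in wfguard may look different; the method name, the struct names, and the backend values &lt;code&gt;gemini&lt;/code&gt; and &lt;code&gt;openai&lt;/code&gt; are guesses, and both implementations are stubs that make no network calls.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Generator is a guess at the interface shape: the audit loop only needs
// something that turns a prompt into a reply.
type Generator interface {
	Generate(prompt string) (string, error)
}

// geminiGenerator would speak the Gemini API wire format.
type geminiGenerator struct{ model string }

func (g geminiGenerator) Generate(prompt string) (string, error) {
	// Stub: a real implementation would send an HTTP request here.
	return fmt.Sprintf("gemini[%s] got %d bytes", g.model, len(prompt)), nil
}

// openAICompatGenerator would speak the OpenAI-compatible wire format.
type openAICompatGenerator struct{ baseURL, model string }

func (g openAICompatGenerator) Generate(prompt string) (string, error) {
	// Stub: same contract, different wire format.
	return fmt.Sprintf("openai-compatible[%s at %s] got %d bytes", g.model, g.baseURL, len(prompt)), nil
}

// newGenerator maps a hypothetical --backend value to an implementation.
func newGenerator(backend, model, baseURL string) (Generator, error) {
	switch strings.ToLower(backend) {
	case "gemini":
		return geminiGenerator{model: model}, nil
	case "openai":
		if baseURL == "" {
			return nil, fmt.Errorf("backend %q needs a base URL", backend)
		}
		return openAICompatGenerator{baseURL: baseURL, model: model}, nil
	default:
		return nil, fmt.Errorf("unknown backend %q", backend)
	}
}

func main() {
	g, err := newGenerator("openai", "gemma-4-e4b-it-mlx", "http://localhost:1234/v1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	reply, _ := g.Generate("audit this workflow")
	fmt.Println(reply)
}
```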

&lt;p&gt;E4B works. It calls tools and produces valid SARIF. Three weaknesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decisiveness.&lt;/strong&gt; E4B keeps calling tools past the point where it should stop. With &lt;code&gt;--max-steps 5&lt;/code&gt;, it hits the limit. 31B returns a clean no-tool-call turn in 4-7 steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardening fidelity.&lt;/strong&gt; E4B's hardening output drops unrelated comments. The security fix is correct, but the user loses context. 31B keeps the comments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-step reasoning.&lt;/strong&gt; The &lt;code&gt;$GITHUB_REF → sed&lt;/code&gt; finding came from 31B. E4B didn't surface it on the same workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-off works: E4B for a free local pass during development (single workflow in ~2 minutes on an M-series Mac, no API spend), 31B for production hardening.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E2B&lt;/strong&gt;: not used. The smallest variant doesn't have the context for action-source reading and would force a redesign of the tool set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;26B-A4B MoE&lt;/strong&gt;: not benchmarked here. Natural follow-up: with ~4B active params it should land between E4B and 31B on cost and quality. One &lt;code&gt;--backend&lt;/code&gt; flag to compare.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hardening pass uses Gemma 4 in a different mode.&lt;/strong&gt; The audit loop is tool-calling. The hardener is codegen: "here's a workflow YAML and a list of confirmed findings; produce a corrected version, output only YAML, no fences, no prose." wfguard diffs the output against the original and emits a unified patch. 31B's code-generation strength matters most here. The model writes YAML that has to round-trip through &lt;code&gt;yaml.Unmarshal&lt;/code&gt; and &lt;code&gt;git apply&lt;/code&gt; without breaking. Most outputs do.&lt;/p&gt;
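&lt;p&gt;The gate between model output and patch can be sketched as below. This is an illustration, not the wfguard source: &lt;code&gt;patchCandidates&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;toyParse&lt;/code&gt; is a placeholder that only rejects tab indentation so the example stays in the standard library. In the real tool the check is &lt;code&gt;yaml.Unmarshal&lt;/code&gt;. Unchanged files are skipped, and files that fail to parse are rejected before any diff is built.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// patchCandidates returns the paths whose hardened text differs from the
// original and passes the parser, plus the paths rejected by the parser.
func patchCandidates(original, hardened map[string]string, parses func(string) error) (keep, rejected []string) {
	for path, fixed := range hardened {
		before, known := original[path]
		if !known || fixed == before {
			continue // not a file we scanned, or nothing changed
		}
		if err := parses(fixed); err != nil {
			rejected = append(rejected, path)
			continue
		}
		keep = append(keep, path)
	}
	sort.Strings(keep) // map iteration order is random; sort for stable output
	sort.Strings(rejected)
	return keep, rejected
}

// toyParse is a stand-in for a real YAML parser. It only knows that YAML
// does not allow tabs for indentation.
func toyParse(doc string) error {
	for _, line := range strings.Split(doc, "\n") {
		if strings.HasPrefix(line, "\t") {
			return errors.New("tab indentation is not valid YAML")
		}
	}
	return nil
}

func main() {
	original := map[string]string{
		"ci.yml":      "on: push\n",
		"release.yml": "on: release\n",
		"docs.yml":    "on: push\n",
	}
	hardened := map[string]string{
		"ci.yml":      "on: push\n",                                     // unchanged
		"release.yml": "on: release\npermissions:\n  contents: read\n", // changed, parses
		"docs.yml":    "on: push\npermissions:\n\tcontents: read\n",    // changed, broken
	}
	keep, rejected := patchCandidates(original, hardened, toyParse)
	fmt.Println("patch:", keep, "rejected:", rejected)
}
```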

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
