<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zhijie Wong</title>
    <description>The latest articles on DEV Community by Zhijie Wong (@zhijiewong).</description>
    <link>https://dev.to/zhijiewong</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3523370%2Fdbce5ad4-386a-479e-b1ac-e04498dcf897.jpeg</url>
      <title>DEV Community: Zhijie Wong</title>
      <link>https://dev.to/zhijiewong</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zhijiewong"/>
    <language>en</language>
    <item>
      <title>Why Pattern-Matching Scanners Miss Structural Bugs (and What I Built Instead)</title>
      <dc:creator>Zhijie Wong</dc:creator>
      <pubDate>Wed, 22 Apr 2026 10:48:03 +0000</pubDate>
      <link>https://dev.to/zhijiewong/why-pattern-matching-scanners-miss-structural-bugs-and-what-i-built-instead-34k9</link>
      <guid>https://dev.to/zhijiewong/why-pattern-matching-scanners-miss-structural-bugs-and-what-i-built-instead-34k9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom90i7tpmkt1kvbgjd2l.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom90i7tpmkt1kvbgjd2l.gif" alt=" " width="1278" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Pattern-matching scanners (Semgrep, Snyk, CodeQL) find what their rulebook encodes. Bugs that arrive as &lt;strong&gt;structural variants&lt;/strong&gt; — the sink is three calls away, the taint flows through an unusual shape, the CVE matters but the pattern doesn't match verbatim — slip through.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;mythos-agent&lt;/strong&gt;, an open-source AI code reviewer (MIT, TypeScript, &lt;a href="https://github.com/mythos-agent/mythos-agent" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;), to layer an LLM-based hypothesis stage on top of a traditional SAST foundation. This post is the technical writeup: what the pipeline looks like, what bug classes it surfaces that regex-only scanners miss, and where it still gets things wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx mythos-agent scan     &lt;span class="c"&gt;# pattern scan, no API key&lt;/span&gt;
npx mythos-agent hunt     &lt;span class="c"&gt;# full AI hypothesis + analyzer pipeline&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  1. The problem: rulebook coverage vs. bug space
&lt;/h2&gt;

&lt;p&gt;A pattern scanner's ruleset is a finite set of &lt;code&gt;(sink, source, condition)&lt;/code&gt; triples. A security reviewer reading the same code carries a much larger implicit model — they notice that &lt;em&gt;this&lt;/em&gt; DB transaction reads and writes the same row without locking, that &lt;em&gt;this&lt;/em&gt; handler joins a user-supplied path against a config root without resolving symlinks, that &lt;em&gt;this&lt;/em&gt; &lt;code&gt;eval&lt;/code&gt; receives a value that's been stringified three functions upstream.&lt;/p&gt;

&lt;p&gt;Concrete example. Semgrep's default TypeScript ruleset catches this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/run&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;           &lt;span class="c1"&gt;// flagged: eval() on request input&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; catch this, even though it's the same bug:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;normalise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/run&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)();&lt;/span&gt;        &lt;span class="c1"&gt;// not flagged: sink ≠ eval, source is 2 calls away&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern rule is looking for &lt;code&gt;eval(&amp;lt;tainted&amp;gt;)&lt;/code&gt; literally. The real bug is &lt;code&gt;&amp;lt;any dynamic-code sink&amp;gt;(&amp;lt;tainted, possibly transformed, possibly renamed&amp;gt;)&lt;/code&gt;. You can write a Semgrep rule for this variant — but you can only write rules for variants you've already thought of. The space of "things that behave like eval" is open-ended.&lt;/p&gt;
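
&lt;p&gt;To make the gap concrete, here is a minimal sketch in TypeScript. It is not how Semgrep or mythos-agent work internally: &lt;code&gt;literalRule&lt;/code&gt;, &lt;code&gt;sinkClassRule&lt;/code&gt;, and the sink list are hypothetical names invented for illustration, and a regex over a single line stands in for a real rule engine.&lt;/p&gt;

```typescript
// Illustrative only: line-level regexes standing in for scanner rules.

// A rule that encodes exactly one sink, eval(...).
const literalRule = (line: string): boolean => /\beval\s*\(/.test(line);

// A hand-written (hypothetical, incomplete) list of dynamic-code sinks.
const DYNAMIC_CODE_SINKS: RegExp[] = [
  /\beval\s*\(/,
  /\bnew\s+Function\s*\(/,
  /\bvm\.runIn\w*Context\s*\(/,
];

// A rule that matches any sink in the list above.
const sinkClassRule = (line: string): boolean =>
  DYNAMIC_CODE_SINKS.some((sink) => sink.test(line));

const direct = "eval(req.query.code);";
const variant = "new Function(payload)();";

console.log(literalRule(direct), literalRule(variant));     // true false
console.log(sinkClassRule(direct), sinkClassRule(variant)); // true true
```

&lt;p&gt;Even the wider rule only knows the three sinks someone wrote down, and it still says nothing about whether the argument is tainted. That is the open-ended part.&lt;/p&gt;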




&lt;h2&gt;
  
  
  2. The approach: hypothesis generation per function
&lt;/h2&gt;

&lt;p&gt;The mythos-agent pipeline is four stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recon → Hypothesize → Analyze → Exploit (optional)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The interesting stage is &lt;strong&gt;Hypothesize&lt;/strong&gt;. For each function the parser extracts, a prompted LLM agent produces specific, code-grounded security claims — not CWE labels, but statements about &lt;em&gt;this&lt;/em&gt; code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This handler reads &lt;code&gt;req.query.path&lt;/code&gt; and passes it to &lt;code&gt;fs.readFileSync&lt;/code&gt; via &lt;code&gt;path.join(ROOT, userPath)&lt;/code&gt; without resolving symlinks. Potential path traversal if the filesystem contains symlinks pointing outside &lt;code&gt;ROOT&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;"This transaction reads &lt;code&gt;balance&lt;/code&gt; at line 42 and writes &lt;code&gt;balance - amount&lt;/code&gt; at line 51, without wrapping in &lt;code&gt;SELECT … FOR UPDATE&lt;/code&gt; or an equivalent lock. Potential TOCTOU race allowing double-spend under concurrent requests."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The hypotheses are inputs to the next stage, not outputs to the user.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The analyzer: grading hypotheses against the code
&lt;/h2&gt;

&lt;p&gt;A separate analyzer agent re-reads the function with the hypothesis attached and decides whether the claim actually holds given the control flow, input reachability, and sink characteristics. Findings get a confidence score in &lt;code&gt;[0, 1]&lt;/code&gt;; &lt;code&gt;--severity high&lt;/code&gt; only surfaces results above a threshold.&lt;/p&gt;

&lt;p&gt;This two-stage split matters. The hypothesis stage is allowed to be speculative — it's cheap to generate a hypothesis that turns out to be wrong, and the analyzer will filter it. The analyzer stage is allowed to be conservative. Running them together in a single prompt collapses the useful separation: the model both proposes and evaluates, and in practice that means it emits plausibility-matched false positives.&lt;/p&gt;
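
&lt;p&gt;The shape of that split can be sketched as follows. This is an assumption-laden illustration, not mythos-agent's code: the &lt;code&gt;Hypothesis&lt;/code&gt; and &lt;code&gt;Finding&lt;/code&gt; types, &lt;code&gt;runPipeline&lt;/code&gt;, the stub functions, and the 0.8 threshold are all hypothetical, and the stubs return canned values where a real system would call a model.&lt;/p&gt;

```typescript
// Hypothetical types; the real tool's schema may differ.
interface Hypothesis {
  file: string;
  line: number;
  claim: string;
}

interface Finding extends Hypothesis {
  confidence: number; // score in [0, 1]
}

type Hypothesizer = (source: string) => Hypothesis[];
type Analyzer = (source: string, hypothesis: Hypothesis) => number;

// Stage 1 proposes freely, stage 2 grades each claim, the threshold filters.
function runPipeline(
  source: string,
  hypothesize: Hypothesizer,
  analyze: Analyzer,
  threshold: number,
): Finding[] {
  return hypothesize(source)
    .map((h) => ({ ...h, confidence: analyze(source, h) }))
    .filter((f) => f.confidence >= threshold);
}

// Stub standing in for the speculative LLM stage (canned output).
const stubHypothesizer: Hypothesizer = (_source) => [
  { file: "transfer.ts", line: 38, claim: "read-modify-write of balance without row lock" },
  { file: "transfer.ts", line: 12, claim: "amount may be negative" },
];

// Stub standing in for the conservative analyzer stage (canned scores).
const stubAnalyzer: Analyzer = (source, h) => {
  if (h.claim.includes("row lock")) {
    return source.includes("FOR UPDATE") ? 0.05 : 0.88;
  }
  return 0.3;
};

const unlocked = "SELECT balance FROM accounts WHERE id = 1";
const locked = "SELECT balance FROM accounts WHERE id = 1 FOR UPDATE";

const reported = runPipeline(unlocked, stubHypothesizer, stubAnalyzer, 0.8);
const suppressed = runPipeline(locked, stubHypothesizer, stubAnalyzer, 0.8);
console.log(reported.length, suppressed.length);
```

&lt;p&gt;Because the two stages are separate functions, each can be tuned or swapped without touching the other, which is the property a single combined prompt gives up.&lt;/p&gt;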

&lt;p&gt;Example output (real, from scanning a test corpus):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; ✗ src/api/transfer.ts:38   [HIGH, conf 0.88]
   Hypothesis: read-modify-write of `balance` without row lock;
               concurrent requests can double-spend.
   Evidence:   line 42 reads `balance`, line 51 writes `balance - amount`;
               no FOR UPDATE / transaction isolation in scope.
   Suggested:  wrap in BEGIN ... SELECT ... FOR UPDATE ... COMMIT,
               or use SERIALIZABLE isolation level.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. Structural variant analysis
&lt;/h2&gt;

&lt;p&gt;Given a reference CVE (from NVD, or a user-supplied patch), the variant analyzer searches the codebase for AST-shape-similar regions with semantic-role matching on inputs/sinks. Similar in spirit to what Google Project Zero described in the public &lt;strong&gt;Big Sleep&lt;/strong&gt; writeup, applied to an open-source TypeScript toolchain.&lt;/p&gt;

&lt;p&gt;The use case this actually solves: &lt;em&gt;"we patched bug X in module A; are there other places in the codebase that look like module A before the patch?"&lt;/em&gt; Regex search over &lt;code&gt;git diff&lt;/code&gt; misses these because the variant can rename the variables, reorder the statements, split a helper out, etc.&lt;/p&gt;
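
&lt;p&gt;A rough way to see why renaming defeats text search but not shape matching is the sketch below. It is a deliberate simplification: it normalises tokens rather than a real AST, and &lt;code&gt;shapeOf&lt;/code&gt; and the &lt;code&gt;KEEP&lt;/code&gt; set are hypothetical names, not part of mythos-agent. Keywords and known sink names are kept verbatim; every other identifier collapses to a placeholder.&lt;/p&gt;

```typescript
// Tokens kept verbatim: a few keywords plus sink names that carry meaning.
// This list is an illustrative assumption, not a complete grammar.
const KEEP = new Set([
  "const", "let", "var", "return", "function", "new", "if", "else",
  "eval", "Function",
]);

// Reduce a snippet to its "shape": identifiers, numbers, and strings
// become placeholders so that renamed variables compare as equal.
function shapeOf(code: string): string {
  const tokenPattern = /[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|"[^"]*"|'[^']*'|[^\s\w]/g;
  const tokens: string[] = code.match(tokenPattern) ?? [];
  return tokens
    .map((t) => {
      if (KEEP.has(t)) return t;
      if (/^[A-Za-z_$]/.test(t)) return "ID";
      if (/^\d/.test(t)) return "NUM";
      if (/^["']/.test(t)) return "STR";
      return t; // punctuation stays as-is
    })
    .join(" ");
}

const patched = "const payload = buildPayload(req.query.code); new Function(payload)();";
const renamed = "const body = prepare(request.query.script); new Function(body)();";
const harmless = "const body = prepare(request.query.script); console.log(body);";

console.log(shapeOf(patched) === shapeOf(renamed));  // true
console.log(shapeOf(patched) === shapeOf(harmless)); // false
```

&lt;p&gt;A token-level shape like this cannot handle reordered statements or an extracted helper. Those cases need tree-level comparison, which is why the analyzer described above works on AST regions instead.&lt;/p&gt;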




&lt;h2&gt;
  
  
  5. What's in the box
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;43 scanner categories&lt;/strong&gt; (15 production-wired, 28 experimental): SQL injection, SSRF, path traversal, command injection, XSS, JWT algorithm confusion, session handling, race conditions, crypto audit, secrets, IaC misconfig, supply chain, AI/LLM security, API security, cloud misconfig, zero trust, privacy/GDPR, GraphQL, WebSocket, CORS, OAuth, SSTI, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;329+ built-in rules&lt;/strong&gt; across &lt;strong&gt;8 languages&lt;/strong&gt; (TypeScript, JavaScript, Python, Go, Java, PHP, C/C++, Rust). Rules compose — "SQL injection" is N smaller rules, not one regex.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: SARIF 2.1.0 (drop-in for GitHub Code Scanning), HTML reports, JSON for piping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backends&lt;/strong&gt;: Claude, GPT-4o, Ollama, or any OpenAI-compatible endpoint. &lt;strong&gt;Pattern-only mode works offline without any API key&lt;/strong&gt; — the hypothesis stage is opt-in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Releases are Sigstore-signed&lt;/strong&gt; (cosign) with CycloneDX SBOMs attached to each GitHub release.&lt;/li&gt;
&lt;/ul&gt;
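
&lt;p&gt;For readers who have not worked with SARIF, here is a minimal sketch of what one finding looks like as a SARIF 2.1.0 log. The &lt;code&gt;Finding&lt;/code&gt; shape, &lt;code&gt;toSarif&lt;/code&gt;, &lt;code&gt;levelFor&lt;/code&gt;, the tool name, and the severity-to-level mapping are assumptions for illustration and may not match what mythos-agent emits. The SARIF field names themselves come from the 2.1.0 specification.&lt;/p&gt;

```typescript
// Hypothetical finding shape; field names are assumptions.
interface Finding {
  ruleId: string;
  file: string;
  line: number;
  severity: "HIGH" | "MEDIUM" | "LOW";
  confidence: number;
  message: string;
}

// SARIF result levels are "error", "warning", "note", or "none".
// This mapping is one possible choice, not a standard.
function levelFor(severity: Finding["severity"]): "error" | "warning" | "note" {
  if (severity === "HIGH") return "error";
  if (severity === "MEDIUM") return "warning";
  return "note";
}

// Build a minimal SARIF 2.1.0 log with one run.
function toSarif(findings: Finding[]) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "example-scanner" } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: levelFor(f.severity),
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
          // Extra data goes in the SARIF property bag.
          properties: { confidence: f.confidence },
        })),
      },
    ],
  };
}

const sarifLog = toSarif([
  {
    ruleId: "race/read-modify-write",
    file: "src/api/transfer.ts",
    line: 38,
    severity: "HIGH",
    confidence: 0.88,
    message: "read-modify-write of balance without row lock",
  },
]);

console.log(JSON.stringify(sarifLog, null, 2));
```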




&lt;h2&gt;
  
  
  6. Where it still gets things wrong
&lt;/h2&gt;

&lt;p&gt;Hypothesis-driven scanning is not free. Honest limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamically typed languages&lt;/strong&gt; (Python, JS) produce more noise than statically typed ones. Type information is a signal the analyzer leans on heavily; without it, confidence scores drift lower and the high-severity filter discards more findings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inter-procedural taint across package boundaries&lt;/strong&gt; still loses signal. If the tainted value crosses into a third-party dep with no source, the hypothesis stage has to reason about the dep's public surface, and it often over-generates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;. Running the hypothesis stage across a 100k-LOC codebase with Claude or GPT-4o is not free. The &lt;code&gt;--severity high&lt;/code&gt; filter helps; incremental scans on changed files help more. CI integration should scope to diff-only by default.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  7. Try it
&lt;/h2&gt;

&lt;p&gt;One command, no install, no API key needed for pattern-only mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx mythos-agent quick       &lt;span class="c"&gt;# 10-second security check&lt;/span&gt;
npx mythos-agent scan        &lt;span class="c"&gt;# full pattern scan&lt;/span&gt;
npx mythos-agent hunt        &lt;span class="c"&gt;# AI-guided scan (needs a model endpoint)&lt;/span&gt;
npx mythos-agent fix &lt;span class="nt"&gt;--apply&lt;/span&gt; &lt;span class="c"&gt;# AI-generated patches for high-confidence findings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mythos-agent/mythos-agent" rel="noopener noreferrer"&gt;https://github.com/mythos-agent/mythos-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Landing / docs&lt;/strong&gt;: &lt;a href="https://mythos-agent.com" rel="noopener noreferrer"&gt;https://mythos-agent.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community (EN)&lt;/strong&gt;: &lt;a href="https://mythos-agent.com/discord" rel="noopener noreferrer"&gt;https://mythos-agent.com/discord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community (CN · 飞书)&lt;/strong&gt;: &lt;a href="https://mythos-agent.com/feishu" rel="noopener noreferrer"&gt;https://mythos-agent.com/feishu&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Releases&lt;/strong&gt;: Sigstore-signed, SBOM attached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. v4.0.0 shipped today. If you have a codebase you'd want tested against hypothesis generation (public or a redacted snippet), open an issue or a discussion — I'm specifically looking for cases where the analyzer produces unexpected false positives, since those are the most useful signal for tuning the prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Questions I'd value technical feedback on
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;For &lt;strong&gt;per-function hypothesis generation&lt;/strong&gt;, where has the "speculate then analyze" split produced the most noise in systems you've built or used?&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;structural variant analysis on dynamically typed languages&lt;/strong&gt;, what's your experience with AST-shape normalisation to get useful similarity scores across Python or JS?&lt;/li&gt;
&lt;li&gt;Which &lt;strong&gt;SARIF 2.1.0 consumers beyond GitHub Code Scanning&lt;/strong&gt; actually render SARIF well, and which silently drop half the fields?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thanks for reading. ⭐Star on GitHub if this is useful; open an issue if you find a bug.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Pawdig is an AI document intelligence tool</title>
      <dc:creator>Zhijie Wong</dc:creator>
      <pubDate>Fri, 03 Apr 2026 13:13:09 +0000</pubDate>
      <link>https://dev.to/zhijiewong/pawdig-is-an-ai-document-intelligence-tool-2hme</link>
      <guid>https://dev.to/zhijiewong/pawdig-is-an-ai-document-intelligence-tool-2hme</guid>
      <description>&lt;p&gt;If you’ve ever tried to copy-paste a table from a PDF, invoice, or contract, you know the pain. &lt;/p&gt;

&lt;p&gt;The formatting breaks. The cells merge. You end up manually re-typing 200 rows of data.&lt;/p&gt;

&lt;p&gt;So, I built a better way. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://pawdig.com/sign-in" rel="noopener noreferrer"&gt;Pawdig&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pawdig&lt;/strong&gt; is an AI document intelligence tool that does more than extract messy data instantly: it turns your files into your own private knowledge base.&lt;/p&gt;

&lt;p&gt;Just drag and drop. Pawdig instantly structures your data, and our built-in AI agents let you chat directly with your documents to extract insights, summarize pages, or find exact clauses.&lt;/p&gt;

&lt;p&gt;It handles:&lt;br&gt;
✅ Building an instantly searchable AI knowledge base&lt;br&gt;
✅ Borderless tables &amp;amp; complex merged cells&lt;br&gt;
✅ Scanned images and poor-quality invoices&lt;br&gt;
✅ Documents with very high page counts&lt;br&gt;
✅ Instant export to Excel, CSV, JSON, or Markdown&lt;/p&gt;

&lt;p&gt;If you deal with invoices, reports, or contracts daily, try it out. 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pawdig.com/sign-in" rel="noopener noreferrer"&gt;https://pawdig.com/sign-in&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>I built an open-source project OpenHarness🪼</title>
      <dc:creator>Zhijie Wong</dc:creator>
      <pubDate>Wed, 01 Apr 2026 13:54:34 +0000</pubDate>
      <link>https://dev.to/zhijiewong/i-built-an-open-source-project-openharness-2lc6</link>
      <guid>https://dev.to/zhijiewong/i-built-an-open-source-project-openharness-2lc6</guid>
      <description>&lt;p&gt;I built &lt;strong&gt;OpenHarness — an open-source terminal coding agent&lt;/strong&gt; with 17 tools and 16 slash commands. It works with Ollama (free, local), OpenAI, Anthropic, OpenRouter, Deepseek, Qwen or any OpenAI-compatible API.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Coding agents are amazing but locked to cloud models.&lt;br&gt;
I wanted the same experience with my local Ollama models: free, private,&lt;br&gt;
no API key needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What I built&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;17 tools: file read/edit/write, bash, grep, glob, web search, task management, jupyter notebooks, sub-agents&lt;/li&gt;
&lt;li&gt;16 slash commands: /diff /undo /commit /cost /compact /plan /review&lt;/li&gt;
&lt;li&gt;Git-safe: every AI edit auto-committed, /undo reverts instantly&lt;/li&gt;
&lt;li&gt;Headless mode: &lt;code&gt;oh run "fix tests" --json&lt;/code&gt; for CI/CD&lt;/li&gt;
&lt;li&gt;Permission gates: ask/trust/deny — approve before the agent acts&lt;/li&gt;
&lt;li&gt;React+Ink terminal UI with markdown rendering&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Install&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm install -g @zhijiewang/openharness
oh --model ollama/llama3
oh --model ollama/qwen2.5:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tech stack&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;TypeScript, React+Ink, Zod for tool schemas, async generators for streaming.&lt;/p&gt;

&lt;p&gt;Everyone is welcome to join and build it together. 👏&lt;br&gt;
GitHub: &lt;a href="https://github.com/zhijiewong/openharness" rel="noopener noreferrer"&gt;https://github.com/zhijiewong/openharness&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
