DEV Community: Zhijie Wong

How a pure-TypeScript flex layout engine closed the last WASM-Yoga gap

Zhijie Wong — Sat, 23 May 2026 09:06:24 +0000

TL;DR

I've been building Pilates, a flex layout engine for terminal UIs in pure TypeScript. As of last week, the pure-TS engine is faster than WASM Yoga (the engine Ink uses) on all 9 scenarios in my bench suite. That includes the structural-mutation workload (append + remove a row per frame), where Yoga led by ~5× until phases 15–17 closed the gap and flipped it to a ~1.7× Pilates win.

No native bindings. No WASM port. The fix was algorithmic, and the algorithmic fix worked in TS.

The numbers

Median latency, win32-x64, Node 22, ~5s tinybench windows with bootstrap CI95:

Scenario	Pilates	yoga-layout (WASM)	Ratio
tiny (10 nodes)	4.5µs	19.0µs	4.2× faster
realistic (~100)	121µs	328µs	2.7× faster
stress (~1000)	601µs	1.94ms	3.2× faster
big (~5000)	3.32ms	9.17ms	2.8× faster
huge (~10000)	8.62ms	18.5ms	2.1× faster
hot-relayout	16.3µs	83.0µs	5.1× faster
hot-relayout + boundaries	15.8µs	77.8µs	4.9× faster
hot-relayout (text mutation)	8.9µs	90.6µs	10× faster
hot-structural	71.3µs	118.3µs	1.7× faster

Caveats up front: 9 hand-picked scenarios, not a universal claim. Reproduce with pnpm bench — about 5 minutes on a recent machine.

Why pure TS can beat WASM here

Terminal UI is a curiously hostile workload for a WASM engine. Trees are small (10–10,000 nodes), but updates are frequent — one keystroke, one tick, one frame. The crossing cost from JS into WASM dominates: Yoga's per-call kernel is a few microseconds, but node.setWidth(N) from JS to WASM is also a few microseconds. A pure-TS engine pays no crossing cost.

That was the thesis going in. Phases 15–17 are evidence the thesis holds even in the worst case — the workload where Yoga's compute kernel is exactly what's being measured, with the tree pre-built and only the structural-mutation layout timed.

How hot-structural went from ~450µs to ~70µs

Two algorithmic changes did the work.

1. Linear-recurrence main-axis positions

The original main-axis position rule was a cumulative sum: each cell's position depended on the size of every prior sibling. A 100-cell row in the stress fixture meant ~300 dependency edges per row.

// Old rule — every cell reads every prior sibling
mainPos[N] = sum(siblings[0..N-1].mainSize + margin + gap)

Replaced with a linear recurrence — each cell only reads the cell immediately before it:

// New rule — each cell only reads the previous one
mainPos[N] = mainPos[N-1] + prev.mainSize + prev.marginEnd + me.marginStart + gap

Reverse-direction (row-reverse / column-reverse) keeps the cumulative-sum fallback because the recurrence depends on the prior cell's already-resolved position, which doesn't hold when iteration is reversed.
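To make the difference between the two rules concrete, here is a minimal sketch. The `Cell` interface and both function names are illustrative assumptions, not the actual Pilates internals; the point is only that both rules produce the same positions while the recurrence reads one prior cell instead of all of them.

```typescript
// Hypothetical cell shape for illustration; not the real Pilates types.
interface Cell {
  mainSize: number;
  marginStart: number;
  marginEnd: number;
}

// Old shape: cell N re-reads every prior sibling (O(N) reads per cell).
function positionsBySum(cells: Cell[], gap: number): number[] {
  return cells.map((me, n) => {
    let pos = me.marginStart;
    for (let i = 0; i < n; i++) {
      const s = cells[i];
      pos += s.marginStart + s.mainSize + s.marginEnd + gap;
    }
    return pos;
  });
}

// New shape: cell N reads only cell N-1 (O(1) reads per cell).
function positionsByRecurrence(cells: Cell[], gap: number): number[] {
  const pos: number[] = [];
  for (let n = 0; n < cells.length; n++) {
    if (n === 0) {
      pos.push(cells[0].marginStart);
      continue;
    }
    const prev = cells[n - 1];
    pos.push(pos[n - 1] + prev.mainSize + prev.marginEnd + cells[n].marginStart + gap);
  }
  return pos;
}

const row: Cell[] = [
  { mainSize: 10, marginStart: 1, marginEnd: 2 },
  { mainSize: 5, marginStart: 0, marginEnd: 0 },
  { mainSize: 7, marginStart: 3, marginEnd: 1 },
];
const bySum = positionsBySum(row, 1);               // [1, 14, 23]
const byRecurrence = positionsByRecurrence(row, 1); // [1, 14, 23]
```

In a dependency-tracked engine the second form matters because each position has a constant number of incoming edges, so a change to one cell invalidates a chain rather than a fan.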

2. Fold default-valued style inputs

Observation: roughly half of all input fields in the grammar were sitting at default values forever — margin: 0, minWidth: 0, maxWidth: undefined, etc. They still consumed dirty-flag slots, propagated through dependents, and appeared in dependency sets.

Phase 17 folds these defaults into compile-time constants at grammar-build time. Each per-cell node went from ~15 fields to ~7. The classifier's nodeSig was extended with fold-predicate bits so that mutating from default → non-default correctly triggers a structural rebuild.

Combined, hot-structural went from ~450µs to ~70µs.
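The folding idea can be sketched roughly as follows. The field list, the defaults table, and the names `foldSignature`, `liveFields`, and `needsStructuralRebuild` are assumptions made for this example; the real grammar and `nodeSig` layout are not shown in this post.

```typescript
// Hypothetical style fields and defaults, for illustration only.
type Style = {
  width?: number;
  marginStart?: number;
  marginEnd?: number;
  minWidth?: number;
  maxWidth?: number;
};

const FIELDS = ["width", "marginStart", "marginEnd", "minWidth", "maxWidth"] as const;
type FieldName = (typeof FIELDS)[number];

const DEFAULTS: Record<FieldName, number | undefined> = {
  width: undefined,
  marginStart: 0,
  marginEnd: 0,
  minWidth: 0,
  maxWidth: undefined,
};

function isDefault(style: Style, field: FieldName): boolean {
  return (style[field] ?? DEFAULTS[field]) === DEFAULTS[field];
}

// Fields that must stay live inputs; everything else is folded to a constant.
function liveFields(style: Style): FieldName[] {
  return FIELDS.filter((f) => !isDefault(style, f));
}

// Bit i is set when FIELDS[i] holds a non-default value.
function foldSignature(style: Style): number {
  let sig = 0;
  FIELDS.forEach((f, i) => {
    if (!isDefault(style, f)) sig |= 1 << i;
  });
  return sig;
}

// A value change keeps the signature; default <-> non-default changes it.
function needsStructuralRebuild(before: Style, after: Style): boolean {
  return foldSignature(before) !== foldSignature(after);
}

const valueOnly = needsStructuralRebuild({ width: 20 }, { width: 30 });                  // false
const unfolds = needsStructuralRebuild({ width: 20 }, { width: 20, marginStart: 2 });    // true
```

The signature check is the safety net: a folded field has no dirty-flag slot, so the only correct response to it becoming non-default is to rebuild the node's structure.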

Why pure TS over a native rewrite

I considered porting the engine to a native-compiled-to-WASM language before doing the algorithmic work. Glad I didn't.

Yoga's advantage wasn't speed of arithmetic — its C++ kernel is fast and well-tuned, but speed of arithmetic wasn't the bottleneck on this workload. The advantage was the structural-mutation algorithm: Yoga handled it natively, the pure-TS engine was redoing too much work per mutation.

A native-compiled port from my side would have inherited the same algorithmic shape and reached parity at best. The fix was algorithmic, and the algorithmic fix worked in TypeScript. "Pure TS is competitive with native code on this workload" is the actually-interesting result.

Validation, including a same-day hotfix story

1,470 unit + integration tests pass
Structural-differential fuzzer green at 3,000 runs
33 Yoga oracle fixtures (cell-for-cell comparison)
Byte-identical cached-vs-cold differential mode at 833 runs

A small incident worth mentioning: within hours of publishing 2.0.0, the fast-check property fuzzer caught a real bug — createStyleDirtier was throwing on a node whose entire style had been folded out, a case my analysis said couldn't happen. The fuzzer immediately found it. 2.0.1 shipped same day with the fix and a pinned regression test, and 2.0.0 was deprecated on npm pointing at 2.0.1.

Property-based fuzzing earns its keep. I had been on the fence about whether the fuzzer was worth maintaining; this answered it.
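For readers unfamiliar with the cached-vs-cold differential idea, here is a small self-contained sketch. It is a hand-rolled stand-in with a seeded PRNG, not the fast-check setup the project uses, and the offset cache is a toy model rather than the Pilates layout cache.

```typescript
// Small seeded PRNG so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Cold path: recompute every start offset from scratch.
function coldOffsets(sizes: number[]): number[] {
  const out: number[] = [];
  let pos = 0;
  for (const s of sizes) {
    out.push(pos);
    pos += s;
  }
  return out;
}

// Cached path: after sizes[k] changes, only offsets after k are recomputed.
function updateCached(sizes: number[], offsets: number[], k: number, size: number): void {
  sizes[k] = size;
  for (let i = k + 1; i < sizes.length; i++) {
    offsets[i] = offsets[i - 1] + sizes[i - 1];
  }
}

// Returns the index of the first run where cached and cold disagree, or -1.
function runDifferential(seed: number, runs: number): number {
  const rand = mulberry32(seed);
  const sizes = Array.from({ length: 16 }, () => Math.floor(rand() * 20));
  const offsets = coldOffsets(sizes);
  for (let run = 0; run < runs; run++) {
    const k = Math.floor(rand() * sizes.length);
    updateCached(sizes, offsets, k, Math.floor(rand() * 20));
    const cold = coldOffsets(sizes);
    if (cold.some((v, i) => v !== offsets[i])) return run;
  }
  return -1;
}

const firstFailure = runDifferential(42, 500); // -1 when the cache is correct
```

A property-testing library adds shrinking and better input generators on top of this, which is what makes a counterexample like the fully folded node easy to act on.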

API stability

Public calculateLayout() is byte-identical between 1.x and 2.x. The SemVer-major bump reflects internal API and memory-characteristic shifts:

Typed-array runtime (Field.id integer + array storage replacing Map<Field, X>)
LayoutPool grows unbounded (I tried FinalizationRegistry-based recycling in phase 15C; it caused a 2× regression, so I removed it)
Per-property dirty bitmask replacing single dirty bool
Linear recurrence + fold default values (the algorithmic changes above)

If you're using only the documented public API, you upgrade and the speedup is transparent.
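As a rough picture of what the typed-array runtime and per-property dirty bitmask from the list above could look like, here is a sketch. The field ids, the `Store` shape, and the function names are assumptions for illustration; the post only states that `Field.id` is an integer index into array storage.

```typescript
// Hypothetical integer field ids (the real Field.id values are not shown here).
const WIDTH = 0;
const HEIGHT = 1;
const MARGIN_START = 2;
const FIELDS_PER_NODE = 3;

interface Store {
  values: Float64Array; // node * FIELDS_PER_NODE + field
  dirty: Uint32Array;   // one bitmask per node, one bit per field
}

function createStore(nodeCount: number): Store {
  return {
    values: new Float64Array(nodeCount * FIELDS_PER_NODE),
    dirty: new Uint32Array(nodeCount),
  };
}

// Writes a value and marks only that property dirty; no-op writes stay clean.
function setField(store: Store, node: number, field: number, value: number): void {
  const slot = node * FIELDS_PER_NODE + field;
  if (store.values[slot] === value) return;
  store.values[slot] = value;
  store.dirty[node] |= 1 << field;
}

function isDirty(store: Store, node: number, field: number): boolean {
  return (store.dirty[node] & (1 << field)) !== 0;
}

function clearDirty(store: Store, node: number): void {
  store.dirty[node] = 0;
}

const store = createStore(4);
setField(store, 1, WIDTH, 40);
const widthDirty = isDirty(store, 1, WIDTH);   // true
const heightDirty = isDirty(store, 1, HEIGHT); // false
```

Compared with a single dirty boolean, the bitmask lets a relayout skip work that depends only on properties that did not change.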

Try it

git clone https://github.com/pilatesjs/pilates
cd pilates
pnpm install
pnpm bench   # ~5 min

Or install the engine directly:

npm install @pilates/core

Full React stack (reconciler + widgets):

npm install @pilates/react @pilates/widgets react

Adversarial benchmarks are very welcome — if there's a workload where this approach breaks down, I'd genuinely like to find it. That's the most valuable feedback the project can get right now.

Repo (MIT): https://github.com/pilatesjs/pilates

npm: https://www.npmjs.com/package/@pilates/core

Why Pattern-Matching Scanners Miss Structural Bugs (and What I Built Instead)

Zhijie Wong — Wed, 22 Apr 2026 10:48:03 +0000

TL;DR

Pattern-matching scanners (Semgrep, Snyk, CodeQL) find what their rulebook encodes. Bugs that arrive as structural variants — the sink is three calls away, the taint flows through an unusual shape, the CVE matters but the pattern doesn't match verbatim — slip through.

I built mythos-agent, an open-source AI code reviewer (MIT, TypeScript, GitHub), to layer an LLM-based hypothesis stage on top of a traditional SAST foundation. This post is the technical writeup: what the pipeline looks like, what bug classes it surfaces that regex-only scanners miss, and where it still gets things wrong.

npx mythos-agent scan     # pattern scan, no API key
npx mythos-agent hunt     # full AI hypothesis + analyzer pipeline

1. The problem: rulebook coverage vs. bug space

A pattern scanner's ruleset is a finite set of (sink, source, condition) triples. A security reviewer reading the same code carries a much larger implicit model — they notice that this DB transaction reads and writes the same row without locking, that this handler joins a user-supplied path against a config root without resolving symlinks, that this eval receives a value that's been stringified three functions upstream.

Concrete example. Semgrep's default TypeScript ruleset catches this:

app.get('/run', (req, res) => {
  eval(req.query.code);           // flagged: eval() on request input
});

It does not catch this, even though it's the same bug:

function normalise(input: unknown) {
  return String(input).trim();
}

function buildPayload(raw: string) {
  return normalise(raw);
}

app.get('/run', (req, res) => {
  const payload = buildPayload(req.query.code as string);
  new Function(payload)();        // not flagged: sink ≠ eval, source is 2 calls away
});

The pattern rule is looking for eval(<tainted>) literally. The real bug is <any dynamic-code sink>(<tainted, possibly transformed, possibly renamed>). You can write a Semgrep rule for this variant — but you can only write rules for variants you've already thought of. The space of "things that behave like eval" is open-ended.
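The gap is easy to reproduce with a deliberately naive stand-in for a literal rule. This is not how Semgrep is implemented (its rules match on syntax trees, not regexes), and `literalEvalRule` and `flagged` are names invented for this sketch.

```typescript
// Naive literal rule: eval( directly on a request property.
const literalEvalRule = /\beval\s*\(\s*req\.(query|body|params)\b/;

function flagged(source: string): boolean {
  return literalEvalRule.test(source);
}

const direct = "app.get('/run', (req, res) => { eval(req.query.code); });";

const variant = [
  "function normalise(input: unknown) { return String(input).trim(); }",
  "function buildPayload(raw: string) { return normalise(raw); }",
  "app.get('/run', (req, res) => {",
  "  const payload = buildPayload(req.query.code as string);",
  "  new Function(payload)();",
  "});",
].join("\n");

const directFlagged = flagged(direct);   // true
const variantFlagged = flagged(variant); // false: different sink, indirect source
```

Widening the rule to also match `new Function(` would catch this one variant, but it would still say nothing about whether the argument is tainted, and the next variant would need another rule.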

2. The approach: hypothesis generation per function

The mythos-agent pipeline is four stages:

Recon → Hypothesize → Analyze → Exploit (optional)

The interesting stage is Hypothesize. For each function the parser extracts, a prompted LLM agent produces specific, code-grounded security claims — not CWE labels, but statements about this code:

"This handler reads req.query.path and passes it to fs.readFileSync via path.join(ROOT, userPath) without resolving symlinks. Potential path traversal if the filesystem contains symlinks pointing outside ROOT."

"This transaction reads balance at line 42 and writes balance - amount at line 51, without wrapping in SELECT … FOR UPDATE or an equivalent lock. Potential TOCTOU race allowing double-spend under concurrent requests."

The hypotheses are inputs to the next stage, not outputs to the user.

3. The analyzer: grading hypotheses against the code

A separate analyzer agent re-reads the function with the hypothesis attached and decides whether the claim actually holds given the control flow, input reachability, and sink characteristics. Findings get a confidence score in [0, 1]; --severity high only surfaces results above a threshold.

This two-stage split matters. The hypothesis stage is allowed to be speculative — it's cheap to generate a hypothesis that turns out to be wrong, and the analyzer will filter it. The analyzer stage is allowed to be conservative. Running them together in a single prompt collapses the useful separation: the model both proposes and evaluates, and in practice that means it emits plausibility-matched false positives.

Example output (real, from scanning a test corpus):

 ✗ src/api/transfer.ts:38   [HIGH, conf 0.88]
   Hypothesis: read-modify-write of `balance` without row lock;
               concurrent requests can double-spend.
   Evidence:   line 42 reads `balance`, line 51 writes `balance - amount`;
               no FOR UPDATE / transaction isolation in scope.
   Suggested:  wrap in BEGIN ... SELECT ... FOR UPDATE ... COMMIT,
               or use SERIALIZABLE isolation level.
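The shape of the two-stage split can be sketched as below. All names here (`Hypothesis`, `Finding`, `Analyzer`, `surface`) and the 0.8 threshold are assumptions for illustration, the second hypothesis is made up, and the stub analyzer stands in for what would be a model call in the real pipeline.

```typescript
interface Hypothesis {
  file: string;
  line: number;
  claim: string;
}

interface Finding extends Hypothesis {
  confidence: number; // in [0, 1]
  evidence: string;
}

// The analyzer re-reads the code with the claim attached; null means "rejected".
type Analyzer = (h: Hypothesis) => Finding | null;

// Only findings at or above the threshold reach the user.
function surface(hypotheses: Hypothesis[], analyze: Analyzer, threshold: number): Finding[] {
  const out: Finding[] = [];
  for (const h of hypotheses) {
    const f = analyze(h);
    if (f !== null && f.confidence >= threshold) out.push(f);
  }
  return out;
}

// Stub analyzer with hard-coded scores, standing in for the LLM stage.
const stubAnalyzer: Analyzer = (h) =>
  h.claim.includes("without row lock")
    ? { ...h, confidence: 0.88, evidence: "line 42 reads balance, line 51 writes balance - amount" }
    : { ...h, confidence: 0.3, evidence: "input is validated upstream" };

const hypotheses: Hypothesis[] = [
  { file: "src/api/transfer.ts", line: 38, claim: "read-modify-write of balance without row lock" },
  { file: "src/api/profile.ts", line: 12, claim: "possible open redirect via returnTo" },
];

const surfaced = surface(hypotheses, stubAnalyzer, 0.8); // only the transfer.ts finding
```

Keeping the generator and the grader behind separate function boundaries is what lets the first be speculative and the second conservative.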

4. Structural variant analysis

Given a reference CVE (from NVD, or a user-supplied patch), the variant analyzer searches the codebase for AST-shape-similar regions with semantic-role matching on inputs/sinks. Similar in spirit to what Google Project Zero described in the public Big Sleep writeup, applied to an open-source TypeScript toolchain.

The use case this actually solves: "we patched bug X in module A; are there other places in the codebase that look like module A before the patch?" Regex search over git diff misses these because the variant can rename the variables, reorder the statements, split a helper out, etc.
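A toy version of shape matching shows why renames alone should not defeat a variant search. This sketch works on tokens rather than a real AST, handles only renaming (not reordering or helper extraction), ignores string literals, and the `KEEP` list of role-bearing names is an assumption chosen for the example.

```typescript
// Names that carry semantic role (keywords, modules, sinks) are kept verbatim.
const KEEP = new Set(["const", "let", "var", "return", "path", "fs", "join", "readFileSync"]);

// Replace every other identifier with a positional placeholder (v0, v1, ...).
function shape(code: string): string {
  const names = new Map<string, string>();
  const tokens: string[] = code.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w]/g) ?? [];
  return tokens
    .map((t) => {
      if (!/^[A-Za-z_$]/.test(t) || KEEP.has(t)) return t;
      if (!names.has(t)) names.set(t, `v${names.size}`);
      return names.get(t)!;
    })
    .join(" ");
}

const prePatch = "const p = path.join(ROOT, userPath); return fs.readFileSync(p);";
const renamed = "const target = path.join(BASE_DIR, name); return fs.readFileSync(target);";
const different = "const p = path.resolve(ROOT, userPath); return fs.readFileSync(p);";

const sameShape = shape(prePatch) === shape(renamed);    // true
const otherShape = shape(prePatch) === shape(different); // false
```

A text search for `userPath` would miss the renamed copy entirely, while the normalised shapes are identical.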

5. What's in the box

43 scanner categories (15 production-wired, 28 experimental): SQL injection, SSRF, path traversal, command injection, XSS, JWT algorithm confusion, session handling, race conditions, crypto audit, secrets, IaC misconfig, supply chain, AI/LLM security, API security, cloud misconfig, zero trust, privacy/GDPR, GraphQL, WebSocket, CORS, OAuth, SSTI, and more.
329+ built-in rules across 8 languages (TypeScript, JavaScript, Python, Go, Java, PHP, C/C++, Rust). Rules compose — "SQL injection" is N smaller rules, not one regex.
Output: SARIF 2.1.0 (drop-in for GitHub Code Scanning), HTML reports, JSON for piping.
Backends: Claude, GPT-4o, Ollama, or any OpenAI-compatible endpoint. Pattern-only mode works offline without any API key — the hypothesis stage is opt-in.
Releases are Sigstore-signed (cosign) with CycloneDX SBOMs attached to each GitHub release.
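For anyone consuming the SARIF output, the minimal document shape looks roughly like this. The field layout follows the SARIF 2.1.0 object model (`runs`, `tool.driver`, `results`, `physicalLocation`); the `Finding` input type, the `toSarif` name, and the rule id are assumptions for this sketch and not mythos-agent's actual internals.

```typescript
interface Finding {
  ruleId: string;
  level: "error" | "warning" | "note";
  message: string;
  file: string;
  line: number;
}

// Builds a minimal SARIF 2.1.0 log with one run.
function toSarif(toolName: string, findings: Finding[]) {
  const ruleIds = Array.from(new Set(findings.map((f) => f.ruleId)));
  return {
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: toolName, rules: ruleIds.map((id) => ({ id })) } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: f.level,
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}

const sarif = toSarif("mythos-agent", [
  {
    ruleId: "race-condition/rmw-no-lock", // hypothetical rule id
    level: "error",
    message: "read-modify-write of balance without row lock",
    file: "src/api/transfer.ts",
    line: 38,
  },
]);
const sarifJson = JSON.stringify(sarif, null, 2);
```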

6. Where it still gets things wrong

Hypothesis-driven scanning is not free. Honest limits:

Dynamically-typed languages (Python, JS) produce more noise than statically-typed ones. Type information is a signal the analyzer leans on heavily; without it, confidence scores drift lower and the high-severity filter leaves more on the floor.
Inter-procedural taint across package boundaries still loses signal. If the tainted value crosses into a third-party dep with no source, the hypothesis stage has to reason about the dep's public surface, and it often over-generates.
Cost. Running the hypothesis stage across a 100k-LOC codebase with Claude or GPT-4o is not free. The --severity high filter helps; incremental scans on changed files help more. CI integration should scope to diff-only by default.

7. Try it

One command, no install, no API key needed for pattern-only mode:

npx mythos-agent quick       # 10-second security check
npx mythos-agent scan        # full pattern scan
npx mythos-agent hunt        # AI-guided scan (needs a model endpoint)
npx mythos-agent fix --apply # AI-generated patches for high-confidence findings

GitHub: https://github.com/mythos-agent/mythos-agent
Landing / docs: https://mythos-agent.com
Community (EN): https://mythos-agent.com/discord
Community (CN · 飞书): https://mythos-agent.com/feishu
Releases: Sigstore-signed, SBOM attached

MIT licensed. v4.0.0 shipped today. If you have a codebase you'd want tested against hypothesis generation (public or a redacted snippet), open an issue or a discussion — I'm specifically looking for cases where the analyzer produces unexpected false positives, since those are the most useful signal for tuning the prompt.

Questions I'd value technical feedback on

For per-function hypothesis generation, where has the "speculate then analyze" split produced the most noise in systems you've built or used?
For structural variant analysis on dynamically-typed languages, what's your experience with AST-shape normalisation to get useful similarity scores across Python or JS?
Which SARIF 2.1.0 consumers beyond GitHub Code Scanning actually render SARIF well, and which silently drop half the fields?

Thanks for reading. ⭐ Star on GitHub if this is useful; open an issue if you find a bug.

Pawdig is an AI document intelligence tool

Zhijie Wong — Fri, 03 Apr 2026 13:13:09 +0000

If you’ve ever tried to copy-paste a table from a PDF, invoice, or contract, you know the pain.

The formatting breaks. The cells merge. You end up manually re-typing 200 rows of data.

So, I built a better way.

Pawdig

Pawdig is an AI document intelligence tool that doesn't just extract messy data instantly; it also turns your files into your own private knowledge base.

Just drag and drop. Pawdig instantly structures your data, and our built-in AI agents let you chat directly with your documents to extract insights, summarize pages, or find exact clauses.

It handles:
✅ Building an instantly searchable AI knowledge base
✅ Borderless tables & complex merged cells
✅ Scanned images and poor-quality invoices
✅ Massive multi-page documents
✅ Instant export to Excel, CSV, JSON, or Markdown

If you deal with invoices, reports, or contracts daily, try it out. 👇

https://pawdig.com/sign-in

I built an open-source project: OpenHarness 🪼

Zhijie Wong — Wed, 01 Apr 2026 13:54:34 +0000

I built OpenHarness, an open-source terminal coding agent with 17 tools and 16 slash commands. It works with Ollama (free, local), OpenAI, Anthropic, OpenRouter, DeepSeek, Qwen, or any OpenAI-compatible API.

The problem

Coding agents are amazing, but they are locked to cloud models. I wanted the same experience with my local Ollama models: free, private, no API key needed.

What I built

17 tools: file read/edit/write, bash, grep, glob, web search, task management, jupyter notebooks, sub-agents
16 slash commands: /diff /undo /commit /cost /compact /plan /review
Git-safe: every AI edit auto-committed, /undo reverts instantly
Headless mode: oh run "fix tests" --json for CI/CD
Permission gates: ask/trust/deny — approve before the agent acts
React+Ink terminal UI with markdown rendering

Install


npm install -g @zhijiewang/openharness
oh --model ollama/llama3
oh --model ollama/qwen2.5:7b

Tech stack

TypeScript, React+Ink, Zod for tool schemas, async generators for streaming.

Everyone is welcome to join and build it together. 👏
GitHub: https://github.com/zhijiewong/openharness