Hytham H

Posted on Jun 30

Building Phinq: How a Cronjob Failure Forced Me to Redesign Agent Governance From Scratch

#ai #typescript #security #opensource


By Hytham H -- June 29, 2026

The Incident

Hermes overwrote a file. Phinq didn't catch it.

Not because the code was wrong. Two reasons.

First, the skill wasn't even loaded in that session. The agent had no instructions to consult itself against. No prompt, no rule, no safety net.

Second, even if it had been loaded, it still wouldn't have caught it. The skill only watched file operations. The cronjob mutated state through an API -- a different surface, a different path, and completely invisible to a file-op hook.

I was trying to govern an actor that didn't know it was being governed.

The Diagnosis

That failure forced an honest question: why did a tool built specifically to prevent this fail to prevent it?

The answer is uncomfortable. A markdown skill is advisory. The agent has to choose to consult it, mid-task and under token pressure, and nothing forces it to.

Governance the governed party can ignore is not governance.

I could keep adding hooks for every tool in existence -- file operations, API calls, database queries, network requests -- but that's whack-a-mole. You're always one tool behind. The next cronjob, the next API surface, the next thing you didn't think to watch. You can't enumerate danger one surface at a time and expect to stay ahead of an agent that can call anything.

The problem wasn't the missing hook. The problem was the architecture.

The Pivot

Instead of trying to intercept every tool individually, move enforcement to the one chokepoint every tool call already passes through: the LLM API call itself.

Every action your agent takes starts as a tool call that the model emits in a response from an LLM provider -- OpenAI, Anthropic, OpenRouter, anything that speaks those APIs. Trap it there and you don't need to know what tools exist. You don't need a hook per surface. You sit at the gate every action has to walk through.

That's what the proxy does: it's a Fastify/TypeScript intercepting server. Point your agent's base URL at it, and every request and response flows through the proxy on its way to and from the upstream provider. Tool calls get classified, checked against your declared rules, and then either pass, get blocked, or pause for your approval.
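The provider returns tool calls in its response body, so the gating step can be sketched as a pure function over that body. This assumes the OpenAI Chat Completions shape (`choices[].message.tool_calls`, with `function.arguments` as a JSON string) and the Anthropic Messages shape (`tool_use` content blocks with an `input` object). `extractToolCalls`, `gateResponse`, and the `decide` callback are hypothetical names, not Phinq's API, and the Fastify plumbing is left out.

```typescript
// Hypothetical helpers for illustration; not Phinq's actual API.

// Minimal view of a tool call, independent of provider format.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

type Verdict = "pass" | "hold";

// Collect tool calls from an OpenAI Chat Completions body
// (choices[].message.tool_calls[]) or an Anthropic Messages body
// (content[] blocks with type "tool_use").
function extractToolCalls(body: any): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const choice of body?.choices ?? []) {
    for (const tc of choice?.message?.tool_calls ?? []) {
      let args: Record<string, unknown>;
      try {
        // OpenAI sends arguments as a JSON-encoded string.
        args = JSON.parse(tc.function.arguments || "{}");
      } catch {
        // Keep malformed arguments visible to the classifier instead of dropping the call.
        args = { _raw: tc.function.arguments };
      }
      calls.push({ id: tc.id, name: tc.function.name, args });
    }
  }
  for (const block of body?.content ?? []) {
    if (block?.type === "tool_use") {
      // Anthropic sends input as an already-parsed object.
      calls.push({ id: block.id, name: block.name, args: block.input ?? {} });
    }
  }
  return calls;
}

// Decide for the whole response: a single held call is enough to hold it.
function gateResponse(
  body: unknown,
  decide: (call: ToolCall) => Verdict,
): { verdict: Verdict; held: ToolCall[] } {
  const held = extractToolCalls(body).filter((c) => decide(c) === "hold");
  return { verdict: held.length > 0 ? "hold" : "pass", held };
}
```

Because the function only looks at what the model asked to do, it never needs to know which tools exist on the agent's side.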

No code changes. No SDK. No per-tool hook maintenance. Just swap the URL.

(There's also an SDK path for in-process gating, but the proxy is the simpler story.)
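The post doesn't describe the SDK's interface, so this is only a guess at the general shape of in-process gating: a hypothetical `gateTool` wrapper that checks each invocation against a `decide` callback before the real tool function runs, and throws a hypothetical `HeldError` instead of running it when the action is held.

```typescript
// Hypothetical in-process gate; names are illustrative, not Phinq's SDK.
type Decision = "pass" | "hold";

// Thrown instead of running the tool when the action is held.
class HeldError extends Error {
  readonly toolName: string;
  constructor(toolName: string) {
    super(`Tool call "${toolName}" held for approval`);
    this.name = "HeldError";
    this.toolName = toolName;
  }
}

// Check the call first; run the real tool only on "pass".
function gateTool<A, R>(
  name: string,
  impl: (args: A) => R,
  decide: (name: string, args: A) => Decision,
): (args: A) => R {
  return (args: A): R => {
    if (decide(name, args) === "hold") {
      throw new HeldError(name);
    }
    return impl(args);
  };
}
```

The trade-off is visible in the sketch: the check runs inside the agent's own process, so it only covers tools that someone remembered to wrap.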

What the Classifier Actually Does

A deterministic rule set scores every tool call into one of five levels:

Level	Behaviour	Examples
RISK_REDUCING	Always pass	Cancelling a task, reverting a change
REVERSIBLE	Always pass	Reading a file, writing a draft
IRREVERSIBLE_LOW	Pass	Single email, single file write
IRREVERSIBLE_MEDIUM	HOLD	Deletions, comms volume, config changes
IRREVERSIBLE_HIGH	HOLD + escalate	Credential access, billing, disable safeguards

Plus structural triggers that always escalate -- bulk deletes, credential reads, permission changes, anything that disables safeguards.
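A minimal sketch of a rule set like this follows. Only the five level names and the pass/HOLD/escalate behaviour come from the table; the tool names, regex patterns, bulk-delete threshold, and the choice to fail closed on unknown tools are assumptions for illustration, since the post doesn't publish the real rules.

```typescript
// The five levels from the table above.
type Level =
  | "RISK_REDUCING"
  | "REVERSIBLE"
  | "IRREVERSIBLE_LOW"
  | "IRREVERSIBLE_MEDIUM"
  | "IRREVERSIBLE_HIGH";
type Action = "pass" | "hold" | "hold_escalate";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Illustrative rules keyed on hypothetical tool names.
const RULES: Array<{ pattern: RegExp; level: Level }> = [
  { pattern: /^(cancel_task|revert_change)$/, level: "RISK_REDUCING" },
  { pattern: /^(read_file|write_draft)$/, level: "REVERSIBLE" },
  { pattern: /^(send_email|write_file)$/, level: "IRREVERSIBLE_LOW" },
  { pattern: /^(delete_file|update_config)$/, level: "IRREVERSIBLE_MEDIUM" },
  { pattern: /^(read_credentials|update_billing)$/, level: "IRREVERSIBLE_HIGH" },
];

const ACTIONS: Record<Level, Action> = {
  RISK_REDUCING: "pass",
  REVERSIBLE: "pass",
  IRREVERSIBLE_LOW: "pass",
  IRREVERSIBLE_MEDIUM: "hold",
  IRREVERSIBLE_HIGH: "hold_escalate",
};

const BULK_DELETE_THRESHOLD = 10; // hypothetical cut-off

// Structural triggers escalate no matter which rule would have matched.
function structuralTrigger(call: ToolCall): boolean {
  const paths = call.args["paths"];
  const bulkDelete =
    call.name.startsWith("delete") && Array.isArray(paths) && paths.length >= BULK_DELETE_THRESHOLD;
  const target = `${call.name} ${String(call.args["path"] ?? "")}`;
  const credentialRead = /credential|secret|\.env\b/i.test(target);
  const permissionChange = /chmod|chown|permission|grant_role/i.test(call.name);
  const disablesSafeguard = /^disable_/i.test(call.name);
  return bulkDelete || credentialRead || permissionChange || disablesSafeguard;
}

function classify(call: ToolCall): Level {
  if (structuralTrigger(call)) return "IRREVERSIBLE_HIGH";
  const rule = RULES.find((r) => r.pattern.test(call.name));
  // Unknown tools fail closed: hold rather than guess.
  return rule ? rule.level : "IRREVERSIBLE_MEDIUM";
}

function actionFor(call: ToolCall): Action {
  return ACTIONS[classify(call)];
}
```

Everything here is a table lookup or a regex, so the same call always gets the same verdict, which is what makes the replay calibration meaningful.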

Reversible actions pass through with no latency you'd notice. The only thing that ever waits is a risky action held for your approval. Which is exactly the point.
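The waiting side can be sketched with a hypothetical in-memory `ApprovalQueue`: a held action gets a promise that settles when a human approves or denies it, and actions that pass never enter the queue at all. The class, its method names, and the in-memory storage are assumptions for illustration.

```typescript
interface Waiting {
  toolName: string;
  settle: (approved: boolean) => void;
}

// Hypothetical in-memory queue. Only held actions ever enter it;
// passing actions never touch it, so they never wait.
class ApprovalQueue {
  private nextId = 1;
  private readonly waiting = new Map<number, Waiting>();

  // Park a held action. The promise settles when a human decides.
  hold(toolName: string): { id: number; decision: Promise<boolean> } {
    const id = this.nextId++;
    const decision = new Promise<boolean>((resolve) => {
      this.waiting.set(id, { toolName, settle: resolve });
    });
    return { id, decision };
  }

  // Names of the actions currently waiting for approval, oldest first.
  pending(): string[] {
    return Array.from(this.waiting.values(), (w) => w.toolName);
  }

  // Record the human decision and release whoever is awaiting it.
  decide(id: number, approved: boolean): void {
    const entry = this.waiting.get(id);
    if (!entry) throw new Error(`No pending action with id ${id}`);
    this.waiting.delete(id);
    entry.settle(approved);
  }
}
```

In a proxy built this way, the handler for a held response would await `decision` before either forwarding the response or returning a refusal to the agent.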

No ML, no second black box. Just hard rules enforced at the only point where enforcement is guaranteed.

The Replay Discipline

Before you turn enforcement on, you calibrate. The proxy can run in pass-through mode -- classify every action, log every decision, but never block anything. You replay a corpus of real agent actions through it, check for false HOLDs, and tune the thresholds.

Only turn enforcement on once you have zero false HOLDs on routine operations.
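A sketch of that calibration loop, assuming an operator has labeled each recorded action in the replay corpus with a hypothetical `routine` flag (true for actions that should never be held). `replay`, `nextMode`, and the mode names are illustrative, not Phinq's CLI or config.

```typescript
type Action = "pass" | "hold" | "hold_escalate";
type Mode = "pass-through" | "enforce";

interface ReplayCase {
  toolName: string;
  routine: boolean; // hypothetical operator label: harmless, should never be held
}

interface ReplayReport {
  total: number;
  holds: number;
  falseHolds: string[]; // routine actions the classifier would have held
}

// Classify every recorded action but block nothing: pass-through only counts.
function replay(corpus: ReplayCase[], classify: (toolName: string) => Action): ReplayReport {
  const falseHolds: string[] = [];
  let holds = 0;
  for (const c of corpus) {
    if (classify(c.toolName) !== "pass") {
      holds++;
      if (c.routine) falseHolds.push(c.toolName);
    }
  }
  return { total: corpus.length, holds, falseHolds };
}

// Enforcement only switches on after a replay with zero false HOLDs.
function nextMode(report: ReplayReport): Mode {
  return report.falseHolds.length === 0 ? "enforce" : "pass-through";
}
```

The loop is: replay, read `falseHolds`, adjust the rules, and replay again until the list is empty.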

This isn't a nice-to-have. It's the difference between a tool operators keep installed and one they rip out after the third time it blocks a harmless action. The false-positive rate determines whether governance survives contact with real workflows. If you're building a governance tool and you haven't designed a calibration loop, you haven't finished the product.

The Audit Trail

Self-reported audit logs from the entity being audited are worth nothing without tamper evidence.

Every governed action is written to a hash-chained JSONL file. The first entry is a genesis block with a random log ID. Every subsequent entry includes the hash of the previous entry. Change a single byte anywhere in the chain and verification fails.
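A minimal in-memory sketch of such a chain, assuming SHA-256 from Node's built-in `node:crypto` and one JSON object per line. The field names (`seq`, `prevHash`, `hash`) and the helper functions are hypothetical, not Phinq's actual log format; joining the lines with newlines gives the JSONL file.

```typescript
import { createHash, randomUUID } from "node:crypto";

interface Entry {
  seq: number;
  prevHash: string; // hash of the previous entry; "" for the genesis entry
  payload: unknown; // genesis: the log ID; afterwards: one governed action
  hash: string; // SHA-256 over seq, prevHash and payload
}

function hashOf(seq: number, prevHash: string, payload: unknown): string {
  return createHash("sha256").update(JSON.stringify({ seq, prevHash, payload })).digest("hex");
}

// First line of a new log: a genesis entry carrying a random log ID.
function genesis(): string {
  const payload = { type: "genesis", logId: randomUUID() };
  const entry: Entry = { seq: 0, prevHash: "", payload, hash: hashOf(0, "", payload) };
  return JSON.stringify(entry);
}

// Each new line links to the hash of the line before it.
function append(lines: string[], payload: unknown): string[] {
  const prev = JSON.parse(lines[lines.length - 1]) as Entry;
  const seq = prev.seq + 1;
  const entry: Entry = { seq, prevHash: prev.hash, payload, hash: hashOf(seq, prev.hash, payload) };
  return [...lines, JSON.stringify(entry)];
}

function parseEntry(line: string): Entry | null {
  try {
    return JSON.parse(line) as Entry;
  } catch {
    return null;
  }
}

// Walk the chain: recompute every hash and check every back-link.
function verify(lines: string[]): boolean {
  if (lines.length === 0) return false;
  let prevHash = "";
  for (let i = 0; i < lines.length; i++) {
    const e = parseEntry(lines[i]);
    if (!e || e.seq !== i || e.prevHash !== prevHash) return false;
    if (hashOf(e.seq, e.prevHash, e.payload) !== e.hash) return false;
    prevHash = e.hash;
  }
  return true;
}
```

A chain like this is tamper-evident rather than tamper-proof: anyone who can rewrite the whole file can recompute every hash, so the latest hash needs to be kept somewhere the audited agent cannot write.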

One command to prove the history is intact.

This isn't about catching malicious agents. It's about the fact that if an agent can write to a log file, it can quietly edit that log file. The audit trail has to make any edit by the thing being audited detectable. Otherwise you're asking the fox to maintain the chicken count.

What's Next

Phinq is MIT licensed. Two components on GitHub:

github.com/phinq-co/phinq -- the proxy, SDK, classifier, and audit logger
github.com/phinq-co/phinq-governance -- the original Agent Skill (lighter option, no infrastructure)

The hosted layer (cross-session dashboards, anomaly detection, shareable reports) is in development. Join the waitlist if that's useful to you.

The cronjob failure was the best thing that happened to this project. It proved that governance has to be environmental -- actions physically pass through it, or they don't pass at all. You cannot govern an actor by asking it nicely.

Reversible actions pass through. Irreversible ones pause.

That's the whole idea.
