We Built an Autonomous Code Guardian in a Weekend: Here's What Happened

#jaseci #ai #jaclang #hackathon

So the pitch we walked into JacHacks with was a little unhinged. Build a thing that watches a repo, notices when a dependency change looks shady, runs that dependency in a sandbox to actually prove it's malicious, writes a fix, and opens its own pull request. No human in the loop until the very end. We called it GhostWatch. It took 2nd in the agentic track.

This is the build log: what we made, the moment Jac clicked, and the stuff that absolutely did not click. We're Aaron and Ayush, two CS students who just finished freshman year at the University of Michigan. We split the work simply. We both lived in the backend, and I (Ayush) owned the frontend.

The problem, before any of the tooling

Two things about code review have bugged us for a while, and "add another linter" fixes neither.

First one: tools are blind to blast radius. A PR touches some core function and the reviewer sees a diff in a box. What imports that function? What tests cover it? What quietly breaks three files away? Nobody's looking, because the tool isn't built to look.

Second one is worse. A lot of supply-chain attacks never open a PR at all. Someone compromises a maintainer's account, pushes a poisoned version straight to the registry, and the postinstall script runs on every machine that pulls it. There's no diff. There's no review. CI doesn't fire because nothing in your repo changed. You find out after it's already happened.

We wanted one system for both: spatial awareness of the codebase, plus an autonomous loop for the dependency case. The one rule we gave ourselves was that the security decisions stay deterministic and explainable. No "the model thought it looked sketchy."

Where Jac clicked

We'd never written a line of Jac before that weekend. Going in, our mental model of a multi-agent system was the one everybody has: a Python script, some LLM calls, a state dict you pass around, and vibes. We'd shipped stuff like that before. Hand-rolled agent loops hitting the model API directly, threading context by hand, serializing everything ourselves.

Jac's whole move is that the codebase is the data structure. You model your domain as a graph of node and edge types, and you write walkers, little agents that physically move through that graph. For a tool whose entire job is figuring out how a code change spreads, that was almost cheating. Files are nodes. Imports are edges. Blast radius stops being a metaphor and becomes, literally, a graph walk.

Here's the whole schema for a file and an import:

```
node FileNode {
    has path: str;
    has content: str = "";
    has language: str = "jac";
    has risk_score: int = 0;
    has is_test: bool = False;
}

edge ImportEdge {
    has is_direct: bool = True;
    has import_type: str = "static";
}
```
And the walker that maps blast radius doesn't compute anything clever. It just walks outward from the files that changed:

```
walker BlastRadiusMapperWalker {
    has changed_nodes: list[str] = [];
    has affected_nodes: list[str] = [];
    has max_hops: int = 5;
    has risk_score: int = 0;

    can start_from_root with Root entry;
    can map_blast with FileNode entry;   # fires on every FileNode it lands on
}
```
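If the walker idea is hard to picture, here is the same "walk outward from what changed" logic as a plain-Python breadth-first search. It's a sketch, not the GhostWatch code: the `imported_by` map, the file paths, and the choice to follow edges from a file to the files that import it are all assumptions made for illustration. Only `max_hops` mirrors a field on the walker above.

```python
from collections import deque


def blast_radius(imported_by, changed, max_hops=5):
    """Return {path: hops} for every file reachable from `changed` within `max_hops`."""
    hops = {path: 0 for path in changed}
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # hop budget is spent on this branch
        for dependent in imported_by.get(current, []):
            if dependent not in hops:  # first visit is the shortest path in a BFS
                hops[dependent] = hops[current] + 1
                queue.append(dependent)
    return hops


# Hypothetical repo: each key is imported by the files in its list.
imported_by = {
    "core/auth.py": ["api/login.py", "api/signup.py"],
    "api/login.py": ["tests/test_login.py"],
    "api/signup.py": [],
}

affected = blast_radius(imported_by, ["core/auth.py"])
# {'core/auth.py': 0, 'api/login.py': 1, 'api/signup.py': 1, 'tests/test_login.py': 2}
```

In Jac the queue and the visited set are what the walker and the graph give you; here you carry them yourself.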

Two more things caught us off guard. One: persistence is just free. Connect a node to Jac's root and it sticks around between runs. No database, no ORM, no migration. Our incident records and the whole repo graph survive restarts because they live on the graph. Two: `by llm`. You declare a function, hand its body to a model with one keyword, and keep the name and types as the contract:

```
def _merge_findings(security: Any, compat: Any, blast: Any, pr_url: str) -> VerdictObject by llm();
```

That one line is an LLM call returning a typed object. No prompt file, no parse the JSON and pray. That was the "oh, that's the point" moment for us. The line between deterministic code and model calls is a property of the signature, not a pile of glue you maintain.
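For contrast, a rough plain-Python sketch of the glue a line like that replaces. Nothing here is GhostWatch code: the `Verdict` dataclass and its fields are hypothetical (the real `VerdictObject` isn't shown in this post), and `call_model` stands in for whatever API client you'd actually use.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    # Hypothetical fields, chosen only for this example.
    risk: str
    summary: str


ALLOWED_RISK = {"low", "medium", "high"}


def merge_findings(security: str, compat: str, blast: str,
                   call_model: Callable[[str], str]) -> Verdict:
    # 1. Build the prompt by hand.
    prompt = (
        "Merge these review findings into one verdict.\n"
        f"Security: {security}\nCompatibility: {compat}\nBlast radius: {blast}\n"
        'Reply with JSON only: {"risk": "low|medium|high", "summary": "..."}'
    )
    # 2. Call the model: any function that maps prompt text to reply text.
    raw = call_model(prompt)
    # 3. Parse and validate, because nothing guarantees the reply's shape.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return JSON: {raw!r}") from exc
    if not isinstance(data, dict) or data.get("risk") not in ALLOWED_RISK:
        raise ValueError(f"unexpected verdict payload: {data!r}")
    return Verdict(risk=data["risk"], summary=str(data.get("summary", "")))


def fake_model(prompt: str) -> str:
    """Stand-in for a real model client so the sketch runs offline."""
    return '{"risk": "high", "summary": "postinstall script opens a socket"}'


verdict = merge_findings("socket call in postinstall", "no API changes",
                         "3 files affected", fake_model)
```

Every numbered step is code you own and have to keep in sync with the return type by hand.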

The friction

It was not all clean. Honestly:

Jac is not Python and your hands don't know that yet. Semicolons on every statement. Braces, not indentation. `has` for fields instead of `self.x`. `def` for normal methods, but `can` only for the event-driven abilities. Our first couple of hours were pure muscle-memory parse errors. The one that actually got me: I wrote a walker ability like a Python method, assigned state with `self.field = ...`, and dropped half the semicolons while I was at it. Then I sat there reading the parse error backwards for a solid twenty minutes before it clicked that the field has to be declared with `has` up top and that every single statement wants its semicolon. Dumb. Obvious in hindsight. Cost me twenty minutes anyway.

The frontend reactivity trap. Jac's client layer is React underneath, so React's rules apply. I had a panel that just would not update no matter what the walker handed back. Turned out I was mutating a list in place with `items.append(x)` and expecting a re-render. You have to reassign (`items = items + [x]`) so the state holds a new list. Once I stopped poking the list and started replacing it, the panel came alive. Felt stupid. Felt great.
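Why in-place mutation goes unnoticed is easier to see in a toy model. The `Panel` class below is hypothetical and is not how React or the Jac client layer is implemented; it only imitates the one rule that matters here, that a change is detected by comparing object identity, not contents.

```python
class Panel:
    """Toy state holder: re-renders only when handed a different list object."""

    def __init__(self):
        self.items = []
        self.render_count = 0

    def set_items(self, new_items):
        if new_items is self.items:
            return  # same object, so this looks like "nothing changed"
        self.items = new_items
        self.render_count += 1


panel = Panel()

# In-place mutation: the contents change, the object identity does not.
same_list = panel.items
same_list.append("alert-1")
panel.set_items(same_list)
renders_after_mutation = panel.render_count   # 0

# Reassignment: a brand-new list, so the change is visible.
panel.set_items(panel.items + ["alert-2"])
renders_after_reassign = panel.render_count   # 1
```

The data was there after the first step. The panel just had no reason to look at it again.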

Setup and the MCP tooling. Getting the environment up took longer than we wanted, and the docs were thin in a couple of the exact spots where we needed them. What kept us moving was the MCP server's validate step. We fell into a rhythm of writing a chunk of Jac, validating it before running anything, and fixing the parse errors right there instead of finding out at runtime. Boring loop. Saved us a lot of confused staring.

Reason we're bothering to list all this: every one is a real doc gap. Naming it helps the next person way more than pretending the weekend was frictionless.

How it stacks up against what we'd done before

Honest comparison: our prior agent stuff was raw Python plus LLM API calls. You write the orchestration. You own the state dict. You serialize everything. The graph of your system only exists in your head. Jac ate a lot of that. The graph is an actual language construct instead of a thing you fake with dictionaries. State persistence is the default instead of a database you bolt on after. The model boundary is `by llm` instead of forty lines of request building and response parsing. We're not going to tell you it was effortless. Scroll back up to the friction. But the program ended up shaped a lot more like the problem than any of our Python prototypes ever did.

What we actually shipped

- A persistent Jac graph of a repo (files, imports, dependencies, tests, docs) built deterministically.
- System 1: review walkers for security, compatibility, and blast radius, merged into one risk verdict.
- System 2: the autonomous dependency pipeline. Manifest diff, rule-based risk classification, sandbox run, deterministic fix, auto-fix PR, Discord alert. Persistent incident state, and idempotency so retries don't spew duplicate PRs.
- Deterministic helpers (manifest parsing, name normalization, risk rules, fix inversion) under test. `jac test tests/test_system2.jac`: 10 passing.
- A maintainer and contributor frontend, also in Jac, same project.
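To make "deterministic helpers" concrete, here is a small plain-Python sketch of three of the pieces named in the list: name normalization, manifest diff, and an idempotency key. It's illustrative only. The function names, the two risk rules, the key format, and the sample manifests are assumptions, not the actual GhostWatch rules. The normalization follows the PEP 503 convention for Python package names.

```python
import hashlib
import re


def normalize_name(name):
    """PEP 503 style: lowercase, with runs of '-', '_' and '.' collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def diff_manifest(old, new):
    """Return (name, old_version, new_version) for each added or changed dependency."""
    old_n = {normalize_name(k): v for k, v in old.items()}
    new_n = {normalize_name(k): v for k, v in new.items()}
    return [(name, old_n.get(name), version)
            for name, version in sorted(new_n.items())
            if old_n.get(name) != version]


def classify(old_version, has_install_script):
    """Rule-based risk: a level plus the reasons behind it, so the call is explainable."""
    reasons = []
    if old_version is None:
        reasons.append("new dependency")
    if has_install_script:
        reasons.append("declares an install script")
    if has_install_script:
        return "high", reasons
    return ("medium" if reasons else "low"), reasons


def incident_key(repo, name, new_version):
    """Same inputs always give the same key, so a retry maps to the same incident."""
    return hashlib.sha256(f"{repo}:{name}@{new_version}".encode()).hexdigest()[:16]


def should_open_pr(key, seen):
    """Open a fix PR only the first time a key shows up."""
    if key in seen:
        return False
    seen.add(key)
    return True


old = {"requests": "2.31.0", "Flask_Cors": "4.0.0"}
new = {"requests": "2.32.0", "flask-cors": "4.0.0", "left-pad": "1.3.0"}
changes = diff_manifest(old, new)
# [('left-pad', None, '1.3.0'), ('requests', '2.31.0', '2.32.0')]
```

Because each helper is a pure function of its inputs, it can be tested without a sandbox, a network, or a model in the loop.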

The honest part: the sandbox is local subprocess instrumentation for npm and pip right now, not a fully wired cloud microVM, and some of the dashboard still runs on demo data. We left those seams visible on purpose. Hackathon polish shouldn't quietly become a lie.

Demo: https://www.youtube.com/watch?v=ZN0UVnNUpRs

Repo: https://github.com/ayushmk7/GhostWatch

Would we use Jac again?

For an agentic system over structured data, yeah, and faster next time. The walker-over-a-graph model fit a code defense problem so well that most of our arguing was about the actual security logic, not plumbing. The friction was real, but it was front-loaded into the first few hours of a brand new language. After that it mostly got out of the way. And for a weekend build that had to cover a graph backend, an autonomous pipeline, and a frontend, keeping it all in one stack was the difference between a cool idea and a thing we could stand up and demo.