When we started building Radarix.ai, the visible product was always a map: layers of public-source signals stitched together so a person can see what's happening in air, at sea, and across borders in one glance.
The interesting engineering, though, didn't end up living in the map. It lived in the boring layer underneath — the part nobody tweets about: state.
This post is a build-log on what we've learned trying to scale a live, multi-source monitoring product without drowning in our own automation.
The shape of the problem
Most of us already know we should be watching more signal streams than we are: launches in our space, competitor activity, new directory listings, mentions of our project, market shifts, source data we depend on. The reason we don't is operational, not technical: doing it manually doesn't scale, and doing it semi-automatically usually devolves into a graveyard of scripts that nobody dares to touch six months later.
We hit the same wall, just a louder version of it: our domain (live public-event monitoring) means signals are noisy, fast-moving, and contradictory. So we ended up reducing the entire operation to one loop:
- collect relevant signals from public sources;
- classify what actually matters from what's just noise;
- produce an action queue for downstream work;
- automate the repeatable follow-up checks;
- keep humans in approval control for anything public-facing.
It looks obvious written out. The hard part is step 3 onwards.
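Compressed into a runnable toy, the loop is roughly this. The dict fields and the 0.7 relevance threshold are invented for illustration; they aren't our actual schema.

```python
def cycle(signals: list[dict], registry: dict) -> None:
    relevant = [s for s in signals if s["score"] >= 0.7]    # 2. classify: signal vs noise
    registry.setdefault("queue", []).extend(relevant)       # 3. action queue
    for job in relevant:
        if job.get("public"):
            registry.setdefault("needs_approval", []).append(job)  # 5. human gate
        else:
            registry.setdefault("auto_recheck", []).append(job)    # 4. automated follow-up

registry: dict = {}
cycle([{"score": 0.9, "public": True}, {"score": 0.2}], registry)
print(registry)  # the 0.9 signal lands in the queue and the approval lane
```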
The real bottleneck wasn't filling forms
We expected the slow part of "growth ops" to be browser automation: captcha walls, OAuth chains, anti-bot defenses. All of that is annoying, but it's not the bottleneck.
The bottleneck is maintaining state.
For every external surface we touch — directories we submit to, public profiles, content feeds, third-party listings — we need to know, with high confidence:
- where the project is registered (and under which account);
- what was submitted, when, with what payload;
- what is pending review, and how long it's been pending;
- what requires a manual step we haven't done yet;
- which public profile copy is stale relative to current product positioning.
Without that registry, every cycle re-discovers what it should already know, re-submits things that shouldn't be re-submitted, and quietly accumulates a long tail of zombie listings nobody is tracking.
So early on we made a deliberate, slightly boring choice: one source of truth, in SQLite.
```sql
CREATE TABLE submissions (
  id             INTEGER PRIMARY KEY,
  target         TEXT NOT NULL,   -- normalized URL of the surface
  status         TEXT NOT NULL,   -- queued|in_progress|submitted|pending_review|registered|blocked
  last_action_at TEXT NOT NULL,
  last_evidence  TEXT,            -- URL or path to evidence file
  blocker_reason TEXT,            -- captcha|paywall|login_required|unclear_success
  payload_ref    TEXT,            -- pointer to the prepared payload
  ai_review      TEXT             -- last AI assessment of the evidence
);
```
That table is the spine of the whole operation. Everything else — the browser workers, the AI review steps, the cron — reads from it and writes back into it. If we lose anything else, we can rebuild. If we lose the registry, we're starting over.
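One consequence is that workers never flip a row directly; every status change goes through a guarded transition. A minimal sketch of that pattern against the table above (the transition graph here is our illustration, not gospel):

```python
import sqlite3

# Legal status transitions; any other write is refused.
ALLOWED = {
    "queued": {"in_progress"},
    "in_progress": {"submitted", "blocked"},
    "submitted": {"pending_review", "registered"},
    "pending_review": {"registered", "blocked"},
}

def transition(db: sqlite3.Connection, sub_id: int, new_status: str) -> bool:
    """Move a submission to new_status only if its current status allows it."""
    row = db.execute("SELECT status FROM submissions WHERE id = ?", (sub_id,)).fetchone()
    if row is None or new_status not in ALLOWED.get(row[0], set()):
        return False  # refuse the write; the registry stays consistent
    db.execute(
        "UPDATE submissions SET status = ?, last_action_at = datetime('now') WHERE id = ?",
        (new_status, sub_id),
    )
    db.commit()
    return True
```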
Two cheap browser worker VPSes do more than you'd expect
We don't run browser automation on the control plane. The control plane is a small machine that holds the database, runs cron, and orchestrates work. The actual browser work lives on two separate, deliberately small VPSes running Playwright inside Docker.
Why two?
- Concurrency without contention on a single Xvfb session.
- IP / fingerprint diversification for surfaces that quietly flag a single VPS doing 50 form fills in a row.
- Failover when one of them has a Playwright lockup, a disk fill, or just decides to be sad.
The controller distributes jobs in balanced mode: pick the worker with the lower in-flight count, fall back to the other on a health-check failure. If both die at the same time, the queue stays put; nothing gets corrupted, because the registry never received a completion write.
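In sketch form (names invented), the balanced-mode pick is just:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BrowserWorker:
    name: str
    in_flight: int
    healthy: bool  # result of the last health check

def pick_worker(workers: list[BrowserWorker]) -> Optional[BrowserWorker]:
    """Balanced mode: healthy worker with the fewest in-flight jobs.
    Returns None when every worker is down, so the job stays queued."""
    alive = [w for w in workers if w.healthy]
    return min(alive, key=lambda w: w.in_flight) if alive else None
```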
The lesson here turned out to be surprisingly generic: separate "work that can fail" from "state you can't lose". Browser work fails routinely. The registry has to not.
AI as a reviewer, not a doer
We tried, briefly, the obvious thing: let a model drive the browser. It "worked" in the demo sense and broke in every interesting way: hallucinated buttons that didn't exist, claimed submissions succeeded based on a flash message that was actually an error, picked the wrong account on multi-tenant surfaces.
What turned out to work much better was treating the model as a reviewer of evidence, not a driver of actions.
The flow:
- Playwright collects the deterministic evidence — screenshots, HTML snapshots, final URL after submit, any visible message.
- A small classifier marks the surface as `submitted_clean`, `pending_review`, `blocked_captcha`, `blocked_login`, `unclear`, etc.
- The model is given the evidence + the classifier's guess and asked: does this evidence actually support that label, or does it tell a different story?
- The model's verdict gets written into `ai_review` alongside the human-readable explanation.
This split — deterministic action, probabilistic review — is the cheapest way we found to get the upside of model judgment without paying for its over-confidence. The browser worker doesn't trust the model. The model doesn't trust the browser worker. The registry, slowly, trusts both.
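A sketch of that split, assuming a hypothetical `ask_model` callable standing in for whatever LLM client you use:

```python
def classify_evidence(evidence: dict) -> str:
    """Deterministic first pass over what Playwright captured. The string
    matching here is deliberately dumb; the model reviews it afterwards."""
    message = evidence.get("visible_message", "").lower()
    if "captcha" in message:
        return "blocked_captcha"
    if "log in" in message or "sign in" in message:
        return "blocked_login"
    if "thank" in message or "/success" in evidence.get("final_url", ""):
        return "submitted_clean"
    if "review" in message:
        return "pending_review"
    return "unclear"

def review(evidence: dict, ask_model) -> dict:
    guess = classify_evidence(evidence)
    verdict = ask_model(
        f"The classifier labeled this '{guess}'. Does the evidence below "
        f"actually support that label, or does it tell a different story?\n{evidence}"
    )
    return {"classifier": guess, "ai_review": verdict}  # both land in the registry
```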
Native cron beat n8n for our use case
We started with a fancier scheduler. We removed it within a few weeks.
Not because anything was wrong with it — it's a perfectly reasonable tool. It just didn't fit our shape. Our scheduling needs are:
- "every two hours, run one well-defined cycle, hold a flock so it doesn't overlap";
- "every twelve hours, recheck the things we claimed were pending and confirm or downgrade them";
- "every hour, sweep memory and aggregate state into one digest";
- "once a day, write an audit and report it".
That fits a `0 */2 * * *` crontab line and a `flock -n /tmp/cycle.lock ./cycle.sh` invocation. No visual graph required. The lesson we keep relearning is that boring beats clever when the operational interface is "did the thing run? what did it write?"
There's a related subtlety we got bitten by, which is worth one paragraph on its own:
When a cron job's command pipes a 25-minute pipeline into `| tail -200` at the end, `tail` doesn't print anything until EOF. If something downstream of cron (a runner, a watcher, an LLM CLI) has a "no output for N seconds → kill" rule, you'll kill the process before it ever produces output. Diagnosis: command runs for exactly the idle timeout, dies, no log lines. Fix: stream output directly, or emit a heartbeat line every 30–60s from a wrapper. We discovered this the unglamorous way.
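A minimal heartbeat wrapper along those lines, as a sketch (the timings and the demo command are illustrative, not our production script):

```python
import subprocess, sys, threading, time

def run_with_heartbeat(cmd: str, interval: int = 30) -> int:
    """Run a long shell pipeline while printing a line every `interval`
    seconds, so idle-output watchdogs don't kill it before tail flushes."""
    proc = subprocess.Popen(cmd, shell=True)

    def beat() -> None:
        while proc.poll() is None:
            print(f"[heartbeat] pipeline alive at {time.strftime('%H:%M:%S')}", flush=True)
            time.sleep(interval)

    threading.Thread(target=beat, daemon=True).start()
    return proc.wait()

if __name__ == "__main__":
    # fall back to a demo command long enough to show two heartbeats
    sys.exit(run_with_heartbeat(" ".join(sys.argv[1:]) or "sleep 65; echo done"))
```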
Humans stay in the loop for public actions
This is the one rule we won't compromise on, and it's why our throughput targets are deliberately modest.
The system can:
- prepare drafts of public posts;
- detect stale profile copy;
- queue listing updates;
- propose tone changes for a given audience;
- assemble a publish-ready payload with image, title, body, and metadata.
The system cannot:
- publish the post;
- create a new account on a sensitive surface;
- pay for placements;
- bypass captchas or anti-bot defenses via a paid solver;
- post in a community under a borrowed identity.
The reason is straightforward: automation that publishes is hard to recall. The internet remembers. Even a single misaligned post on a small subreddit can poison a launch for that surface for months. So we wire approval gates anywhere a public action would be observable, and we make the human review fast: a short summary, the exact text, the destination, and a yes/no.
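A sketch of how small that review surface can be (field names are illustrative):

```python
def approve(action: dict) -> bool:
    """Show the human exactly what would go out, then take a yes/no."""
    print(f"Summary:     {action['summary']}")
    print(f"Destination: {action['destination']}")
    print(f"--- exact text ---\n{action['body']}\n------------------")
    return input("Publish? [y/N] ").strip().lower() == "y"
```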
What surprised us was how much we don't lose by doing this. Most of the throughput in submission/visibility work is in the prep — finding the right surface, finding the right copy, finding the right account, queuing the right payload. The actual "press publish" step is seconds. The bottleneck was never the human; it was every step before them.
A few lessons we'd give our six-month-ago selves
- Pick a source of truth on day one. Not on day forty when the contradictions become unworkable. A single SQLite file is fine. The schema can grow.
- Separate work that fails from state that can't. Browser/network failures are routine. Don't let them touch the registry directly — they go through a write step you control.
- Use the model as a reviewer. Probabilistic verdict on deterministic evidence is much more reliable than the reverse.
- Heartbeat your long-running jobs. Anything that runs for more than ~5 minutes without producing output will be killed by something — a runner, a sidecar, a watchdog. Print something every minute or get used to mysterious mid-pipeline deaths.
- Approval gates are cheaper than retractions. Build the human-in-the-loop early; it's much harder to bolt on after you have an embarrassing post you have to apologize for.
What we're building toward
A practical operating system for monitoring, submission, content maintenance, and public-channel updates — one that runs as a small, observable, mostly-boring stack you can reason about end to end. Not a pile of agents that surprise you. Not a no-code graph that nobody can debug at 2am.
If you're building something in this space — growth ops, OSINT tooling, monitoring products, anything that has to talk to a lot of external surfaces and not lose its mind — I'd love to compare notes. The state-management trenches are lonely; everyone re-discovers them.
Live product: radarix.ai (free, no signup, OSINT radar covering aviation, maritime, and cross-border signals).
— RadarixAI