DEV Community: WinstonRedGuard

Why your OSINT tool lies to you

WinstonRedGuard — Mon, 01 Jun 2026 17:09:45 +0000

Open almost any OSINT tool, run a username, and you get a wall of green checkmarks. Found on 40 sites. Phone traced to a carrier. Email confirmed. Every line is rendered in the same confident styling as a real breach hit confirmed by a cryptographic check.

Most of those checkmarks are lying to you. Not on purpose. The tool simply has no way to show the difference between "a cryptographic check confirmed this" and "a web page returned HTTP 200, so I guessed."

The uncertainty is real, and good analysts carry it in their heads. They know a phone "carrier" field is often wrong, that a username hit on an obscure site is close to a coin flip, that an email "exists" only means the domain accepts mail. But that knowledge lives in the analyst, not in the tool. It evaporates the second a junior reads the report, or the result gets pasted into a slide where the green is all anyone sees.

So I wrote a small library that moves the uncertainty out of the analyst's head and into the result itself. The idea is one sentence: every lookup gets wrapped in an envelope, and the verdict is capped at the highest level the source type can honestly support. Not the highest level this particular result reached. The highest that kind of source can ever reach. And the cap lives in the code, so a downstream UI cannot accidentally promote a guess into a fact.

There are four levels: verified, inferred, heuristic, unverified. The ladder is the easy part. The useful part is where each source type is forbidden from climbing.
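A minimal sketch of the ladder and the ceiling, to make the idea concrete. The names here (`Verdict`, `CEILINGS`, `Envelope`, `wrap`) and the source-type keys are illustrative assumptions, not the library's actual API. The ceilings follow the rules described in this post.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verdict(IntEnum):
    """The four levels, ordered so a higher value means stronger evidence."""
    UNVERIFIED = 0
    HEURISTIC = 1
    INFERRED = 2
    VERIFIED = 3


# Hypothetical ceilings per source type (assumed keys, for illustration only).
CEILINGS = {
    "phone": Verdict.INFERRED,
    "email": Verdict.INFERRED,
    "username_404_scrape": Verdict.HEURISTIC,
    "dnssec_resolve": Verdict.VERIFIED,
}


@dataclass(frozen=True)  # frozen: downstream code cannot reassign the verdict
class Envelope:
    source_type: str
    value: str
    verdict: Verdict


def wrap(source_type: str, value: str, claimed: Verdict) -> Envelope:
    """Wrap a raw result, clamping the claimed verdict to the source ceiling."""
    # Assumption: unknown source types get the most conservative ceiling.
    ceiling = CEILINGS.get(source_type, Verdict.UNVERIFIED)
    return Envelope(source_type, value, min(claimed, ceiling))
```

Whatever the caller claims, `wrap("phone", ..., Verdict.VERIFIED)` comes back as `INFERRED`, because the clamp happens inside the wrapper rather than in the UI.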

A phone number can never return verified. Not with libphonenumber, not with a messenger-presence hit, not with a paid reverse-lookup. Number portability already broke the prefix-to-carrier guess years ago. A WhatsApp hit proves the number is reachable, not who owns it. A paid lookup gives you a current carrier, never an identity. So the wrapper caps at inferred and ships a warning you cannot switch off: number_portability_not_reflected.

An email can never return verified either. An MX record proves the domain accepts mail. It does not prove that this mailbox exists, that a human reads it, or that your target owns it. Aliases, forwarders and catch-all rules are invisible from the outside. SPF and DMARC describe how the domain wants mail handled, not who owns the address. Cap: inferred, with a permanent mailbox_existence_not_proven.
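One way a warning "you cannot switch off" might look in code, sketched under assumptions: the function name `envelope`, the `RULES` table and the dict return shape are hypothetical, and only the two warning strings and the inferred cap come from the text above.

```python
from enum import IntEnum


class Verdict(IntEnum):
    UNVERIFIED = 0
    HEURISTIC = 1
    INFERRED = 2
    VERIFIED = 3


# (ceiling, permanent warning) per source type, as described in the post.
RULES = {
    "phone": (Verdict.INFERRED, "number_portability_not_reflected"),
    "email": (Verdict.INFERRED, "mailbox_existence_not_proven"),
}


def envelope(source_type, claimed, extra_warnings=()):
    """Return a capped verdict plus warnings; the permanent one is always present."""
    ceiling, permanent = RULES[source_type]
    # Callers can add warnings, but there is no parameter that removes the
    # permanent one: it is prepended unconditionally and de-duplicated.
    extras = tuple(w for w in extra_warnings if w != permanent)
    return {
        "verdict": min(claimed, ceiling).name.lower(),
        "warnings": (permanent,) + extras,
    }
```

The design choice worth noting is the absence of an opt-out flag: the only way to lose the caveat is to discard the envelope.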

A username hit from 404-scraping is heuristic by construction, because status-code detection is fragile and false positives are the normal case. It only earns a promotion when independent platforms agree, or a per-site reliability history backs it up. A single hit on one site stays a guess, and the tool says so.
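The promotion rule can be sketched roughly like this. Everything specific is an assumption for illustration: the function name, the thresholds (two agreeing platforms, a 0.9 reliability score), and treating "independent" as simply "distinct site names". The post does not state the real criteria.

```python
def username_verdict(hits, reliability=None, min_agreeing=2, min_reliability=0.9):
    """Classify a username lookup from per-site status-code hits.

    hits:        mapping of site name -> True if the profile page looked present
    reliability: optional mapping of site name -> historical hit accuracy (0..1)
    """
    reliability = reliability or {}
    found = [site for site, hit in hits.items() if hit]
    if not found:
        return "unverified"
    # Promotion path 1: several distinct platforms agree.
    corroborated = len(found) >= min_agreeing
    # Promotion path 2: at least one hit comes from a site with a good track record.
    trusted = any(reliability.get(site, 0.0) >= min_reliability for site in found)
    # A lone hit on a site with no history stays a guess.
    return "inferred" if (corroborated or trusted) else "heuristic"
```

Note that even the promoted result stops at inferred in this sketch; nothing in a status-code scrape reaches verified.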

The same logic runs the other way for the sources that deserve trust. A DNSSEC-validated domain resolve, a TLS handshake, a HIBP k-anonymity password check: those are real, and they reach verified. The point was never to be cynical about everything. It was to stop one honest signal and one lucky guess from rendering identically.

And the pipeline that combines all of them inherits the verdict of its weakest link, because an investigation is only as trustworthy as its shakiest input.
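The weakest-link rule is a one-liner once the levels are ordered. A small sketch, with `Verdict` and `pipeline_verdict` as hypothetical names and the empty-pipeline behaviour as my own assumption:

```python
from enum import IntEnum


class Verdict(IntEnum):
    UNVERIFIED = 0
    HEURISTIC = 1
    INFERRED = 2
    VERIFIED = 3


def pipeline_verdict(step_verdicts):
    """A combined result is only as strong as its weakest input."""
    steps = list(step_verdicts)
    # Assumption: a pipeline with no inputs has nothing to stand on.
    if not steps:
        return Verdict.UNVERIFIED
    return min(steps)
```

So a chain of a DNSSEC resolve (verified), an MX check (inferred) and a single username scrape (heuristic) reports heuristic overall.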

This is the part most tools skip. The library does not claim its own numbers are accurate. The confidence anchors are hand-calibrated tradecraft, not measured precision and recall. They have not been validated against a labelled corpus, and the README says exactly that, in plain text. They express a relative ordering (an MX record beats a regex match, and a DNSSEC resolve beats both), not a probability you should bet money on. A tool whose entire job is fighting overclaiming has no business overclaiming about itself.

None of this is exotic. It is the stuff every OSINT analyst already knows and every OSINT interface throws away the moment it renders a result. The only real move is refusing to throw it away: put the ceiling in the code, ship the caveat as a warning nobody can silence, and stay honest that the numbers are judgment, not measurement.

If you build OSINT tooling, surface the verdict and the warnings. Not a bare green checkmark.

osint-trust-envelope is a small, zero-dependency Python library. It performs no lookups of its own; you bring the raw result, it assigns the honest ceiling. https://github.com/WRG-11/osint-trust-envelope

I Asked My AI Agents to Find Me a Product. Their Best Work Was Killing Two of My Own Ideas.

WinstonRedGuard — Sun, 31 May 2026 14:28:23 +0000

I run a small fleet of AI coding agents. They build and maintain my projects: one writes code, one reviews it, one verifies, and one routes the work between them. Most people point a setup like this at "ship faster." Last week I pointed mine at a different question: out of everything I've built, what's the one thing worth turning into a real product?

The most useful thing they did was tell me no. Twice. With citations.

Here's how that went, because the discipline behind it is more useful than the answer.

The first idea I was wrong about

I had a favorite. A small memory layer for AI agents, the kind of thing that watches what your assistant does and remembers the patterns. I'd already built it, it was on PyPI, and I'd half-convinced myself it was the one. So I did the thing I've learned to always do before I get attached: I handed it to one of the agents and told it to take the idea apart. Not to help me improve it. To refute it.

It came back having broken every reason I had.

The one that ended the argument: the platform my tool runs on now ships that exact feature natively. Free, automatic, zero setup, doing the thing my tool asked you to do by hand. You can't win a market where the host gives your product away, better, for nothing.

The second blow was the distribution plan I'd been quietly counting on, the registries where people "discover" these tools. I'd assumed they were a discovery channel. The data said they're a graveyard. Most listings get zero installs, and the only thing that ranks you is the reputation you already had before you showed up. The channel I thought would do my marketing was just my marketing problem with extra steps.

I dropped the idea the same afternoon. It stung a little. It also saved me a few weeks of building something dead on arrival.

The second idea, where I caught myself cheating

So I pivoted to what looked like the obvious survivor: a security tool I'd already launched, the verifiable, runs-in-your-browser, nothing-leaves-your-machine kind. This time I didn't trust a single agent. I ran three in parallel. One mapped the competition. One measured whether the underlying problem was even real. And one had a single job: kill the idea. Default to "this won't work," and only fail to kill it after genuinely trying.

The problem turned out to be very real. There's a documented trail of engineers pasting secrets into chatbots, companies banning the tools over it, hard numbers on how often it happens. Demand was not the issue.

The slot was. While I'd been admiring my own demo, a well-funded incumbent had shipped the exact retention feature I was planning, free for individual developers, already sitting at number one in the marketplace I'd have to compete in. The clever "verify it yourself" angle I thought was my moat? Nine other free tools already make the identical claim, word for word. My differentiator was table stakes.

Dead again. But the part worth keeping is what the third agent flagged on its way out: my framing was rigged. I had quietly demoted the stronger candidate to make room for the one I found more exciting. I'd picked the idea I wanted to be true over the one the evidence supported, and I'd written the framing to hide that from myself. The agent's word for it was "force-fit confirmed." It was right.

The discipline, not the verdict

None of this works if the agent's job is to make you feel good. The whole thing turns on one instruction: try to break it, default to skeptical, and if you can't break it after real effort, that absence is the only signal worth trusting. I make them list what they couldn't verify, so a confident-sounding answer can't smuggle in a guess. I make them check my framing for motivated reasoning, because the failure mode isn't a wrong fact, it's me arranging true facts into a flattering shape.

That last one is the part I can't do reliably on my own. I'm too close to my own ideas to notice when I'm protecting them. An adversary with no stake in my ego, pointed at my reasoning instead of my code, turns out to be the highest-leverage thing in the whole setup.

What I actually learned

The hype version of AI assistance is "it builds whatever you ask, faster." The version that earned its keep this week did the opposite. It refused to build, twice, and showed its work both times.

Agreement is cheap. Anything will agree with you. An agent that disagrees with you and brings receipts is rare, and it's worth more than ten that ship your bad idea at high speed. Most people aim these tools at "build my idea." The move that paid off was aiming one at "now try to kill it."

It cost me two ideas I was attached to. It saved me from building either one. I'll take that trade every time.

I run a fleet of AI agents that maintain a zero-dependency security monorepo. The open-source pieces live at github.com/WRG-11.

A client-side secret scanner that physically can't exfiltrate your code (and why you shouldn't trust mine either)

WinstonRedGuard — Sat, 30 May 2026 18:49:25 +0000

There's an irony in most "paste your config to check for leaked secrets" web tools: pasting a secret into a random website is the leak. You're trusting a server you can't see.

So I built devguard-scan the other way around — it runs 100% in your browser, zero dependencies, and makes no network calls at all.

Don't take my word for it. Open DevTools → Network, scan a file, and watch zero requests fire. The source has no fetch, XMLHttpRequest, WebSocket, or sendBeacon — grep it yourself. It can't exfiltrate what it never calls.
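If you would rather script the grep than eyeball it, here is a rough Python sketch that searches a JavaScript source string for the four API names listed above. The function name is made up, and this is a plain text match: it will also flag the names inside comments or strings, and it cannot see dynamically constructed property access, so treat a clean result as a starting point alongside the Network tab, not as proof.

```python
import re

# The four network APIs named in the post.
NETWORK_APIS = ("fetch", "XMLHttpRequest", "WebSocket", "sendBeacon")
# \b keeps identifiers such as "prefetched" from matching "fetch".
PATTERN = re.compile(r"\b(" + "|".join(NETWORK_APIS) + r")\b")


def find_network_calls(source: str):
    """Return (line_number, api_name) for every textual occurrence."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

To use it, read the scanner's source file into a string and pass it in; an empty list means none of the four names appear anywhere in the text.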

The 10 detection rules (OpenAI, AWS, GitHub classic + fine-grained PAT, Stripe, Google API, Slack token + webhook, private-key blocks, generic assignments) aren't a weaker JS port — they're the exact regex set from a canonical Python scanner, parity-checked byte-for-byte so the convenience of "in-browser" doesn't cost you detection coverage.
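A byte-for-byte parity check between two rule sets could look something like the sketch below. It assumes both scanners can export their rules as a name-to-pattern mapping of strings, which is my assumption about the setup rather than a description of how the project does it. The example patterns are placeholders for illustration, not the project's actual rules.

```python
def parity_mismatches(reference, port):
    """Return the rule names whose pattern source differs or is missing.

    reference, port: mappings of rule name -> regex source string.
    """
    mismatched = []
    for name in sorted(set(reference) | set(port)):
        a = reference.get(name)
        b = port.get(name)
        # Compare the encoded bytes so visually identical but differently
        # encoded characters still count as a mismatch.
        if a is None or b is None or a.encode("utf-8") != b.encode("utf-8"):
            mismatched.append(name)
    return mismatched


# Placeholder rules, for illustration only.
python_rules = {"aws_key": r"AKIA[0-9A-Z]{16}", "private_key": r"-----BEGIN"}
js_rules = {"aws_key": r"AKIA[0-9A-Z]{16}", "private_key": r"-----BEGIN"}
```

An empty list from `parity_mismatches(python_rules, js_rules)` means the two sets match exactly; any drift in a single character shows up by rule name.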

It's a proof of concept, MIT-licensed, and open to rule requests: github.com/WRG-11/devguard-scan

The broader point: for a security tool, "trust me" isn't good enough. The design should make the safety property verifiable by the user — here, an empty Network tab. What other dev tools should be provable rather than promised?