DEV Community: lweiss01

I got tired of watching 40 Kalshi tabs, so I built a self-hosted signal monitor

lweiss01 — Mon, 06 Jul 2026 12:54:00 +0000

I kept hearing about Kalshi. The commercials, the mentions, and then one morning CNN was talking about Kalshi prediction odds like they were a weather report. So I went and looked. And I had no idea what I was seeing.

Kalshi is really hard to understand when you're new to it. You get markets, contracts, prices that are also probabilities, volume, movement, and none of it tells you what's actually worth paying attention to. I wanted something that would translate what I was looking at into something I could understand and, ideally, act on. That was the whole original goal: make the firehose legible.

Then, of course, I kept adding to it, because once you can read the flow you start wanting an edge in it. Who doesn't. So Trade Hunter grew from a translation layer into a translation layer with detection on top. This is a writeup of what it became, why I made the design choices I made, and the parts I'm still not sure about. I'd rather you poke holes in it now than find the breakpoints the hard way. If you think a decision here is wrong, please let me know. The comments are the point of this post, not an afterthought.

Where this came from

Basically, I couldn't read Kalshi, and I wanted to. Trade Hunter is the original tool I built to fix that for myself, and it's the one I still run when I want a live view. So this isn't a polished sequel to anything. It's the thing I built because I wanted it to exist, flaws and design bets included, and I'd rather show you those directly.

The core idea never changed even as I piled features on: watch live Kalshi WebSocket feeds and surface an unusual move while it's still moving, rather than reading about it after the fact, or on CNN the next morning.

What it actually does

Trade Hunter subscribes to live Kalshi feeds across every market you track. Multi-contract series like the Fed rate decisions or who will win Top Chef fan out automatically to all open contracts, so you point it at one thing and it watches the whole family. When something moves unusually, it runs the move through a spike detector with configurable volume, price, baseline, and cooldown thresholds, and surfaces it on a local dashboard at 127.0.0.1:8765 with a signal feed, trade flow, and a market tape.

The signals sort into tiers, roughly from "noticeable" to "this one cleared every gate at once." The tier I care about most is the top one, where the score, price, and volume conditions all trip simultaneously rather than one at a time.
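To make the gate-and-tier idea concrete, here is a minimal sketch. Every field name, default value, and tier label below is hypothetical and chosen for illustration only; none of it is Trade Hunter's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative placeholders, not Trade Hunter defaults.
    min_volume_multiple: float = 3.0   # recent volume vs. baseline average
    min_price_move: float = 0.05       # absolute move in probability (0 to 1)
    min_score: float = 50.0            # composite score gate
    cooldown_seconds: int = 300        # quiet period after a signal fires


def gates_tripped(score, price_move, volume_multiple, t):
    """Return the names of every gate the move cleared."""
    checks = {
        "score": score >= t.min_score,
        "price": abs(price_move) >= t.min_price_move,
        "volume": volume_multiple >= t.min_volume_multiple,
    }
    return [name for name, passed in checks.items() if passed]


def tier(score, price_move, volume_multiple, t=Thresholds()):
    """Map the number of cleared gates to a tier label (labels are made up)."""
    cleared = len(gates_tripped(score, price_move, volume_multiple, t))
    return {0: None, 1: "notable", 2: "strong", 3: "top"}[cleared]


def in_cooldown(last_signal_ts, now_ts, t=Thresholds()):
    """True while a market is still inside its post-signal quiet period."""
    return last_signal_ts is not None and now_ts - last_signal_ts < t.cooldown_seconds
```

In this sketch the top tier is simply "all three gates at once," which matches the description above; how the real tiers are weighted is not stated in the post.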

That's the boring part. Here's the part I actually want feedback on.

The two-layer read

A raw spike detector produces a lot of noise. Volume jumps for dull reasons constantly. So there are two AI passes sitting on top of the detector.

Layer one is a per-signal analyst. Every spike gets an immediate read: signal, noise, or uncertain, with a direction, a confidence level, and a plain-English rationale for why the flow looks the way it does. This runs on Claude by default with automatic fallback to OpenAI/ChatGPT, Gemini, Perplexity, or xAI/Grok. You bring your own API keys. If you configure none of them, the detector still runs; you just don't get the narrated readout.

Layer two is a tuning advisor. This is the piece I'm most attached to and least certain about. It looks across your recent analyst-labeled signals, finds patterns in your false positives, and suggests concrete threshold changes you can apply with one click. The claim I'm careful to make honestly: the underlying model doesn't retrain. Nothing about the LLM gets smarter. What improves is your detector configuration, tuned to the specific markets you watch, using your own accumulated signal history. It compounds in the sense that a spreadsheet of good heuristics compounds, not in the sense that a neural net does. I'd rather undersell that than dress it up.

The one detector rule I'm proud of

Whale clustering. A single large trade isn't that interesting; large trades happen. What's interesting is several of them bunched together in a way that's unlikely to be coincidence. So the "whale-cluster" flag fires when at least three trades at or above the 99th percentile by volume land inside a 120-second window, and a Poisson model puts the probability of that clustering by chance below one percent. That's the closest thing here to "coordinated flow" that I felt comfortable labeling as such. Tell me if the threshold is naive.
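The arithmetic behind that rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped detector: the function names are hypothetical, and the baseline rate of whale-sized trades is assumed to be estimated elsewhere from the market's own history. For a Poisson count N with mean μ = rate × window, the tail is P(N ≥ 3) = 1 − e^(−μ)(1 + μ + μ²/2).

```python
import math


def poisson_tail(k, mu):
    """P(N >= k) for N ~ Poisson(mu), via the complement of the CDF."""
    cdf = sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def whale_cluster(whale_timestamps, baseline_rate_per_sec,
                  window_sec=120, min_trades=3, max_p=0.01):
    """Flag if any window holds >= min_trades whale trades that a Poisson
    process at the baseline rate would produce with probability < max_p."""
    mu = baseline_rate_per_sec * window_sec
    if poisson_tail(min_trades, mu) >= max_p:
        return False  # clustering this dense is not unusual for this market
    ts = sorted(whale_timestamps)
    # Check each run of min_trades consecutive whale trades.
    for i in range(len(ts) - min_trades + 1):
        if ts[i + min_trades - 1] - ts[i] <= window_sec:
            return True
    return False
```

One thing this makes visible: the tail probability is computed for a single fixed window, while a sliding window gets many chances to fire over a session, so the effective false-alarm rate is higher than the per-window one percent.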

Run it and walk away

Here's the feature I probably undersell: Trade Hunter pushes signals to Discord over webhooks, with per-topic channel routing so different markets land in different channels. That changes how you actually use it. You don't park yourself in front of the dashboard. You start it, let it run, and the analyst readouts come to you: the signal, the tier, and the plain-English read, dropped into Discord wherever you are. Then you go act on Kalshi from your phone.

For a tool whose entire job is catching a move while it's still moving, being able to leave your desk and still not miss it is most of the point. The dashboard is where I look when I'm at the machine. Discord is why I don't have to be.
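Per-topic routing over Discord webhooks can be sketched with the standard library alone. The routing table, topic names, message format, and User-Agent string here are all hypothetical; the only parts taken from Discord itself are that a webhook accepts a JSON POST with a `content` field and that `content` is capped at 2000 characters.

```python
import json
import urllib.request

# Hypothetical routing table: topic -> Discord webhook URL (placeholders).
ROUTES = {
    "fed": "https://discord.com/api/webhooks/<fed-id>/<fed-token>",
    "default": "https://discord.com/api/webhooks/<default-id>/<default-token>",
}


def route_for(topic, routes=ROUTES):
    """Pick the webhook for a topic, falling back to the default channel."""
    return routes.get(topic, routes["default"])


def format_signal(ticker, tier, readout):
    """Build the webhook payload, truncated to Discord's 2000-character cap."""
    text = f"[{tier}] {ticker}: {readout}"
    return {"content": text[:2000]}


def send_signal(topic, ticker, tier, readout, routes=ROUTES):
    """POST one signal to the channel mapped to its topic."""
    body = json.dumps(format_signal(ticker, tier, readout)).encode("utf-8")
    req = urllib.request.Request(
        route_for(topic, routes),
        data=body,
        headers={"Content-Type": "application/json",
                 "User-Agent": "signal-monitor-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping the routing and formatting as pure functions means they can be checked without touching the network; only `send_signal` needs a real webhook URL.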

The decisions you're going to question

I'd rather raise these myself than have them raised for me.

It's self-hosted, and that's deliberate. Every other Kalshi monitor I found is a hosted SaaS. Your data lives on their server, their model improves for their benefit, and when you stop paying you're left with nothing. Trade Hunter runs on your machine, persists to a local SQLite database with a seven-day retention window, and the source is yours to read and modify. I work in information security by day, and "your market positions and your API credentials leave your machine and sit on someone else's box" is a tradeoff I didn't want to make or ask anyone else to make.
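A seven-day retention window on a local SQLite file comes down to one timestamped column and a periodic delete. The sketch below assumes a hypothetical `signals` table and column names; the actual schema is not described in the post.

```python
import sqlite3
import time

RETENTION_SECONDS = 7 * 24 * 60 * 60  # seven-day window from the post


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) signals table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signals ("
        " id INTEGER PRIMARY KEY,"
        " ticker TEXT NOT NULL,"
        " tier TEXT NOT NULL,"
        " created_at REAL NOT NULL)"  # Unix seconds
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_signals_created ON signals(created_at)"
    )
    return conn


def prune(conn, now=None):
    """Delete rows older than the retention window; return rows removed."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM signals WHERE created_at < ?",
            (now - RETENTION_SECONDS,),
        )
    return cur.rowcount
```

Calling `prune` on a timer, or once per collection cycle, is enough to keep the file bounded; the index on `created_at` keeps the delete cheap as history accumulates.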

Live mode touches your Kalshi private key, and you should be suspicious of that. This is the honest trust ask. To go live you drop a .env file and your Kalshi private key next to the program, and it signs its own feed subscriptions. That's a real thing to hand to software from a stranger on the internet. My answer isn't "trust me." My answer is that the full Python source ships with it, the key never leaves your machine, and there's no server for it to phone home to. If that's not enough for you, that's a completely reasonable position, and I'd rather you inspect the source than take my word.

There's an exe, and there's also full source. The exe is there so someone who wants signals but doesn't write Python can double-click and go, though that convenience is Windows-only. The source is there so someone who does write Python isn't stuck with my choices, and it runs anywhere Python 3.11+ does: Windows, Mac, or Linux, with a little extra setup on Mac and Linux. If shipping a binary at all feels wrong to you for a tool that handles credentials, I want to hear that argument, because I went back and forth on it.

What I'm deliberately not claiming

It doesn't place trades. It's informational only. It surfaces things for your review and you decide what to do on Kalshi. It's not financial advice, it doesn't promise outcomes, and the addressable audience is frankly narrow: people who trade on Kalshi and are comfortable running local software. I'm not going to pretend this is for everyone.

Try it without trusting me at all

Simulation mode is the whole point of this section. It runs the entire dashboard on synthetic data with no credentials, no API keys, and no Kalshi account. You can see exactly what the detection, the tiers, and the two-layer read look like in about sixty seconds, and decide for yourself whether the thing is any good before money or trust enters the picture.

Trade Hunter Simulation

What I would like from you

Where did I get it wrong? The Poisson threshold, the "it doesn't retrain" honesty, the binary-plus-credentials decision, the whole self-hosted premise, or something else? I built this because I wanted it to exist and I use it myself, but I've been staring at it alone for too long. Is there something that bothers you about this? Tell me about it.

If you run the sim version and decide you want it pointed at real Kalshi market flows, that's the commercial build: same dashboard, live feed, full Python source, your keys, all running on your machine. It's on an introductory launch price right now at tradehunter.site.

AI Work Doesn't Fail All At Once. It Drifts

lweiss01 — Tue, 23 Jun 2026 13:05:00 +0000

A few days ago I was looking at Microsoft's AI Engineering Coach project. It analyzes coding-agent logs after a session ends and surfaces patterns and anti-patterns in how developers worked with AI. The implementation isn't what caught me. It was the assumption underneath it: AI work generates operational signals. That's a bigger idea than it sounds.

For the last year, most AI tooling has focused on one of three things: better agents, better memory, or better orchestration. All three matter, but they all assume the same thing. The agent is the center of the system, and the work is whatever it hands back at the end. We judge a session by whether the final diff looks right, not by what happened on the way there, because nothing in the workflow asks us to look. Lately I've been wondering if that's backwards.

Agents Don't Work. They Take Shifts.

In a previous article, I argued that agents don't really "work." They take shifts. Claude works for a while, then Codex, then Cursor. Sometimes a human steps in, then another model takes over. The work continues. The participants rotate, and each one only sees the part of the work that happened on their own shift.

That's a continuity problem. If nobody, not the agent and not you, has a continuous view of the whole project, things get re-explained, re-decided, and occasionally undone by whoever shows up next without knowing better. That observation led me to build Holistic, a system focused on preserving project continuity across those shifts.

But closing that gap surfaced a second, harder question, and it's the one this piece is actually about. Even within a single shift, one agent, one session, nobody handing off to anyone, the work isn't always doing what it looks like it's doing. Projects rarely fail all at once. They drift. The agent gets stuck in a retry loop. Requirements slowly fall out of focus. Research expands without execution. Scope quietly grows beyond the original task. Tests keep failing for an hour while everyone hopes the next attempt will somehow be different. Nothing is technically broken. The session is still running, the agent is still producing output, but the work is degrading underneath all of it.

Drift Is Expensive, and It Hides

When work drifts, it creates waste, and not just wasted tokens. Wasted engineering time. Wasted compute. Wasted attention. The cost is often invisible because the project still appears healthy: the agent is busy, files are changing, commits are happening, progress is being reported. But underneath the surface, effort is accumulating without producing meaningful forward movement. By the time the failure becomes obvious, the waste has already been incurred.

You've seen the shape of this even if you haven't named it. A retry loop runs for an hour. A requirement gets forgotten halfway through implementation. A solution gets built, removed, and rebuilt. A new agent spends thirty minutes rediscovering decisions that were already made an hour earlier. These aren't isolated mistakes. They're forms of operational waste, and traditional manufacturing systems have spent decades trying to catch exactly this kind of thing before it compounds.

What Manufacturing Already Knows

Andon boards exist to surface problems while work is still in progress, not after the shift ends and not after production wraps. While the problem is still recoverable. That idea feels increasingly relevant to AI engineering.

What if we treated AI work as an operational system, where agent activity isn't just output but telemetry? A failing test isn't merely a failing test, it's a signal. Repeated edits to the same files are signals. Reversing decisions is a signal. Expanding scope is a signal. Repeatedly asking for information already sitting in context is a signal. Viewed individually, these events look like noise. Viewed together, they become findings, and findings can drive intervention.
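As a rough illustration of "events become findings when viewed together," here is a small sketch. The event names, window length, and threshold are all hypothetical, and this is not how Andon is implemented; it only shows the shape of the idea: count repeated signals inside a time window and report the kinds that cross a threshold.

```python
from collections import Counter, deque

# Hypothetical signal kinds, taken from the examples in the paragraph above.
SIGNAL_KINDS = {"test_failed", "file_reedited", "decision_reversed",
                "scope_expanded", "context_reasked"}


class DriftWatch:
    """Turn individual session events into findings once they repeat."""

    def __init__(self, window_sec=900, threshold=3):
        self.window_sec = window_sec
        self.threshold = threshold
        self.events = deque()  # (timestamp, kind), oldest first

    def observe(self, ts, kind):
        """Record one event; return the kinds at or over threshold right now."""
        if kind in SIGNAL_KINDS:
            self.events.append((ts, kind))
        # Drop events that have aged out of the window.
        while self.events and ts - self.events[0][0] > self.window_sec:
            self.events.popleft()
        counts = Counter(k for _, k in self.events)
        return sorted(k for k, n in counts.items() if n >= self.threshold)
```

A single failing test returns nothing here; the third failure inside the window returns a finding, which is the point at which something or someone could intervene.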

That's the real shift I'm pointing at: not "here's what went wrong yesterday," but "something is drifting right now." Retrospective coaching is useful. Real-time supervision changes outcomes. The goal isn't to understand drift after the fact, it's to interrupt it before the waste compounds.

Holistic Remembers. Andon Watches.

This is the idea behind Andon, which already ships inside Holistic as an experimental add-on. It's incomplete by design right now, but the foundation is there: the pieces needed to start recognizing drift patterns and surfacing findings, not just recording checkpoints.

Holistic asks: "What does the project remember?" Andon asks: "What needs attention right now?" The stronger Holistic gets, the more context Andon has to work with. The more Andon catches, the more valuable Holistic's checkpoints become. One preserves continuity. The other watches for drift. They're two different jobs, and I don't think most tooling right now is doing either one on purpose.

I don't think this is a new agent framework, a memory system, or an orchestration platform. It's operational intelligence for AI work, and I think that's its own category.

Microsoft's Coach analyzes completed sessions and surfaces patterns that would otherwise be easy to miss. That's worth having. But if AI-generated work keeps becoming a bigger share of how software gets built, we won't just need systems that explain the wreckage after the fact. We'll need systems that catch it while it's still just a crack.

Because AI work doesn't fail all at once. It drifts, and drift creates waste. The job of supervision is catching that drift before the work becomes unrecoverable.

Maybe We're Using the Wrong Metaphor for AI

lweiss01 — Sat, 20 Jun 2026 17:39:08 +0000

A friend asked me recently what I'd been spending so much time working on. I told her I'd been experimenting with AI agents. She nodded, thought about it for a second, and then asked the obvious question:

"So... ChatGPT?"

At first, that felt like an easy question to answer. I started explaining that an agent can use tools, work through tasks over time, read files, modify code, and interact with systems around it. Some frameworks even coordinate multiple agents, each with different roles and responsibilities. The explanation wasn't wrong, but the longer I talked, the less satisfied I became with it. I realized I wasn't struggling to explain the technology. I was struggling with the metaphor.

Most of the language we use to describe modern AI systems comes from organizations. We talk about workers, managers, planners, reviewers, and teams. We discuss delegation, coordination, communication, and hierarchy. If you spend enough time reading about agent architectures, it starts to sound less like software engineering and more like organizational design. At first that seemed perfectly reasonable. Agents behave enough like people that the comparison feels natural.

Lately, though, I've started wondering whether that metaphor is quietly shaping how we think about the problem itself.

Looking at the Work

Most conversations about AI focus on the worker. Which model should we use? Which framework is best? How many agents should be involved? What roles should they have? How should they coordinate? These are useful questions, and it's easy to understand why they've become the center of the discussion. After all, the workers are the visible part of the system.

What I've found myself paying attention to, however, is not the worker but the work itself.

When I step back and look at what is actually happening, I don't really see an organization. I see work moving through a process. Information enters the system, decisions get made, tasks move forward, stall, loop, branch, and occasionally have to be done again. The thing that increasingly captures my attention isn't the individual agent performing a task. It's the flow of the work itself and the way that flow changes over time.

That distinction may sound subtle, but I think it leads to a different set of questions.

If we view AI primarily as a workforce problem, then the obvious goal is to build better workers. Smarter models, better prompting techniques, improved coordination between agents, and more sophisticated frameworks all become natural areas of focus. Much of the current AI ecosystem is focused on exactly those things.

What I find interesting is that many of the frustrations people encounter with AI don't actually feel like failures of intelligence. The models are already astonishingly capable. They can write software, summarize research, analyze documents, generate designs, and solve problems that would have seemed extraordinary only a few years ago. Yet despite all of that capability, the same kinds of problems continue to appear. Context gets lost. Work gets repeated. Requirements drift. Effort accumulates without producing meaningful progress. Teams of agents sometimes create more complexity than clarity.

Those don't strike me as intelligence problems. They strike me as process problems.

A Different Tradition of Thinking

Once I started looking at AI through that lens, I found myself thinking less about artificial intelligence and more about systems. Manufacturing, operations, quality management, and reliability engineering have spent decades studying how work moves through complex systems. They ask questions about waste, variation, bottlenecks, feedback loops, and early warning signals. When problems emerge, the goal is not simply to identify who made a mistake. The goal is to understand what the system is producing and why.

The more I think about AI, the more relevant those questions seem.

That's why many discussions about agents feel incomplete to me. We spend enormous amounts of time debating models, frameworks, team structures, and architectures. Those discussions matter, but they all tend to assume that the worker is the center of the story. Increasingly, I'm convinced that the system deserves at least as much attention.

A factory can employ brilliant workers and still generate waste if the process is poorly designed. A software team can be filled with talented engineers and still struggle if the workflow is unhealthy. And an AI system can contain remarkably capable models and still produce disappointing outcomes if the surrounding process is fragile.

The Factory Behind the Workers

Maybe that's why I keep finding myself drawn toward concepts like continuity, feedback loops, drift, supervision, operational signals, and process health. Those ideas feel less like questions about intelligence and more like questions about systems. They focus less on the capabilities of individual workers and more on the health of the environment those workers operate within.

To be clear, I don't think the workers stop mattering. Better models matter. Better tools matter. Better frameworks matter. But I increasingly suspect we're reaching a point where making the workers smarter is only part of the story.

The system matters too.

I don't know whether the factory metaphor is ultimately the right one. What I do know is that the more I work with AI, the more I find myself paying attention to the movement of the work rather than the characteristics of the worker. And if that's the right direction, then some of the next breakthroughs in AI may come not from making the workers smarter, but from understanding the systems they're working within.

AI Agents Don't Work. They Take Shifts.

lweiss01 — Wed, 10 Jun 2026 14:00:00 +0000

I think I've been looking at AI projects backwards.

For months, agents were the center of everything I thought about. Agent memory, agent orchestration, agent frameworks, agent autonomy. Like most people working in this space, I put the agent at the center of the picture because the agent was the thing doing the work. Optimizing it seemed obvious.

Then I noticed something I haven't been able to stop thinking about since: every time I looked at a real project, the agent kept changing. Claude would spend a while on a feature. Later, Codex would pick it up. Cursor would get involved. Sometimes I'd step in myself, and then another model would take over. The work kept moving forward, but the participants kept rotating in and out.

At first I read this as a memory problem. Context seemed to disappear between sessions. Decisions had to be rediscovered, failed approaches retraced, the same explanations given over and over. The obvious conclusion was that agents were forgetting.

But the more I watched these transitions, the less convincing that explanation became. Nothing had actually been forgotten. Claude hadn't forgotten anything. Neither had Codex. The next participant simply hadn't been there when the previous decisions were made. What I was seeing wasn't memory loss. It was the friction that appears whenever work has to survive a participant change.

That distinction sent me down a very different path.

We've become accustomed to putting the wrong thing at the center of the diagram. We talk about agents the way a factory might talk about workers. We measure their capabilities, compare their strengths and weaknesses, and pour enormous energy into making them more effective. Those are useful conversations, but they left me wondering whether we were focusing on the participant instead of the thing being produced.

A factory doesn't exist for its workers. It exists so cars can roll off the line.

That sounds almost too obvious to say out loud, but it's worth sitting with for a moment. When you walk through a factory, nobody mistakes the worker for the product. The workers matter. They contribute. Some are more skilled than others, some faster. But none of them are the thing moving through the system. The car is. The entire factory is organized around helping that car move from one stage of production to the next. Workers participate in the process, but the car is what persists. It survives shift changes, vacations, retirements, the constant replacement of individual participants.

Factories have spent decades learning how to handle that reality: they don't build continuity systems because workers are forgetful. They build continuity systems because workers leave. The day shift goes home, the night shift arrives, and the work continues.

Once I saw that, I couldn't stop seeing the same pattern in AI development. Projects outlive agents. Projects outlive sessions. Projects outlive models. The project I'm working on today will almost certainly be touched by systems that don't exist yet. Models will improve, frameworks will come and go, entire categories of tooling will appear and disappear. The project remains.

That's what finally made the idea click.

I had been centering the wrong object. I was treating the agent as the thing moving through the system and the project as a container for the work. But the longer I looked at real projects, the more backwards that seemed. Everything else rotated. The project was the only thing that stayed.

Claude isn't the thing. Codex isn't the thing. Cursor isn't the thing. I'm not even the thing. We're all participants acting on the thing.

The project is the thing.

I want to sit with that for a second, because it's easy to nod along and miss what it actually implies.

It implies that the agent I'm using today is a shift worker. A skilled one, maybe my favorite one, but a shift worker. When the session ends, the shift ends. Whatever that agent understood about the work either made it into the project or it didn't. If it didn't, it's gone, and no amount of model improvement fixes that, because the next participant wasn't there.

It also implies that I'm a shift worker, which was the part that took longest to accept. I close the laptop at night, and tomorrow a version of me shows up with most of the context evaporated, asking the same questions a fresh agent would ask. The gap between me and Claude at the start of a session is smaller than I'd like to admit.

And strangely, the demotion is freeing. If the project is the thing, I can stop asking which agent is smartest and start asking what each participant leaves behind in the project when their shift ends. That's a different question with different answers. The smartest agent that leaves nothing behind is worth less to the project than an average one that writes everything down.

That realization changed how I think about continuity. A checkpoint is not memory for an agent. A checkpoint is continuity for a project. A handoff isn't helping an agent remember what happened, it's helping the work survive a shift change. The purpose of continuity was never to preserve the participant. It's to preserve the work.

Which is why I've become less interested in perfect memory and more interested in continuity. Perfect memory helps a participant. Continuity helps a project. And projects are the things that actually have to survive.

The funny thing is that the title of this post isn't really the conclusion. It's just the observation that got me here. AI agents don't work. They take shifts. The real conclusion is something else entirely:

The project is the unit of continuity. The agent is the unit of execution.

Once I started looking at AI systems through that lens, a lot of ideas that had previously felt separate (checkpoints, handoffs, project memory, supervision, even Andon) started looking like different parts of the same system. They're all answering the same question.

How does the work survive the shift change?

I Built a Prediction Market Insider Trading Detector in One Afternoon (Because George Santos)

lweiss01 — Wed, 03 Jun 2026 16:58:13 +0000

This morning I woke up to news that George Santos is under federal investigation for insider trading on Kalshi. Within a few hours, I had a working anomaly detector monitoring 410 politically sensitive prediction markets with a live dashboard.

Here's how it happened, what I found, and how you can run it yourself.

What Santos Actually Did

The short version is almost embarrassingly simple. The day before Trump's State of the Union address, Santos posted a video on X saying he'd be in the gallery. Traders on Kalshi piled into "yes he'll attend" contracts and the odds shot up.

What Santos didn't mention was that he'd already placed bets that he wouldn't attend. When he posted "Watching SOTU from an airport tv was not part of the plan! FML" while Trump was speaking, those odds cratered and Santos walked away with tens of thousands of dollars.

Kalshi caught it because they have internal account data. They knew which account placed those trades. They froze it, referred the matter to the DOJ and the CFTC, and here we are.

The thing that stuck with me reading the story was a simpler question: what could you see from the outside?

The Public API Question

Kalshi has a solid public REST API. No auth required for market data. I'd poked at it before for other projects, so I knew the shape of it. The question was whether the historical trade data was rich enough to detect the Santos pattern after the fact.

The Santos pattern, stripped down, is:

Someone with private knowledge establishes a position
A public statement moves the market significantly
Their position was already on the right side of that move

You can't see who placed trades from the public API. But you can see that unusual volume concentration happened before a market moved. That's the tipping-point signal -- not proof of anything, but a flag worth examining.

So I started pulling on the thread.

What I Found in the API

After some exploration (the historical endpoint pagination is a bit of an adventure), I found that Kalshi's public API exposes:

Full trade history per market -- ticker, price, contract size, timestamp, block trade flag
Candlestick data at 1-minute, 1-hour, and 1-day intervals
Market metadata including series, category, open/close time, volume

What it doesn't expose: user identity, order placement timestamps (only fill time), or anything that would let you name a specific trader.
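Cursor-based pagination over the trade history can be sketched like this. The base URL, the `/markets/trades` path, the `ticker`/`limit`/`cursor` parameters, and the `trades`/`cursor` response keys are my assumptions about the public API's shape and should be checked against Kalshi's documentation; the function names are hypothetical and this is not pmwatch's collector.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL and endpoint; verify against Kalshi's API docs.
BASE_URL = "https://api.elections.kalshi.com/trade-api/v2"


def fetch_page(ticker, cursor=None, limit=1000):
    """Fetch one page of trades as a dict (assumed keys: 'trades', 'cursor')."""
    params = {"ticker": ticker, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    url = f"{BASE_URL}/markets/trades?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def all_trades(ticker, fetch=fetch_page, max_pages=500):
    """Follow cursors until the API returns an empty cursor or no trades."""
    trades, cursor = [], None
    for _ in range(max_pages):  # hard cap so a repeating cursor cannot loop forever
        page = fetch(ticker, cursor)
        batch = page.get("trades") or []
        trades.extend(batch)
        cursor = page.get("cursor")
        if not cursor or not batch:
            break
    return trades
```

Passing the page fetcher in as a parameter keeps the loop testable with canned pages, and the page cap guards against the case where a cursor never empties.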

The Santos SOTU markets themselves turned out to be inaccessible -- they were structured as multivariate markets with a different ticker format that doesn't surface cleanly through the standard search. But that's fine. The Santos case is the argument for why this tool should exist, not the data it runs on.

What I found instead was more interesting.

The Watchlist

I spent about an hour mapping out which Kalshi series carry genuine MNPI risk -- markets where someone with access to non-public government information would have a meaningful trading edge.

The list got long fast:

Executive Actions -- Cabinet departures (KXCABOUT), next AG (KXNEXTAG), next SecDef, next DNI, pardons, insurrection act invocation, martial law. Anyone in the White House personnel office or inner circle knows these things before markets do.

SCOTUS -- resignation markets, court size change, next justice confirmation. Clerks and justices themselves know when a retirement is coming.

Economic Data -- Fed rate decisions, CPI, GDP. BLS and BEA staff have the data weeks before release.

Geopolitical -- Greenland acquisition, Panama Canal, Taiwan recognition, Zelensky-Putin talks. NSC and State Department staff work these negotiations in real time.

Congressional -- impeachment, house/senate control, government shutdown, veto override. Congressional whips know vote counts before anyone else.

That's 33 series, 410 open markets, and a combined trade history of over 123,000 records from just the first collection run.

The Anomaly Scorer

With data flowing in, I built a three-signal scorer:

Volume Z-Score -- compares recent trade volume against a rolling baseline. A spike of 8+ standard deviations above normal is a strong signal. This is the Santos pattern in statistical form: someone piling into a position before a market-moving event.

Block Trade Ratio -- the API flags block trades (large privately-negotiated contracts) separately. Heavy block trade concentration on a political market before an announcement is suspicious in a way that retail chatter isn't.

Price Divergence -- detects sudden directional price movement inconsistent with gradual drift. The signature of a market that already knows something.

The compound score combines all three with a block trade modifier. Thresholds: yellow at 25, red at 60.
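Here is a minimal sketch of the volume z-score and the threshold mapping. The yellow and red cutoffs come from the text above; the `compound_score` weights and the way the block-trade modifier is applied are placeholders invented for illustration, since the actual formula is not given here.

```python
import statistics

YELLOW, RED = 25, 60  # thresholds from the post


def volume_zscore(baseline, recent):
    """Z-score of the recent volume against a baseline of earlier windows."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        return 0.0  # flat baseline: no meaningful z-score
    return (recent - mean) / stdev


def compound_score(vol_z, block_ratio, price_move):
    """Hypothetical combination: the weights and modifier are placeholders."""
    base = 10.0 * max(vol_z, 0.0) + 100.0 * abs(price_move)
    return base * (1.0 + block_ratio)  # block trades scale the score up


def level(score):
    """Map a compound score to its alert level."""
    if score >= RED:
        return "red"
    if score >= YELLOW:
        return "yellow"
    return "none"
```

With a baseline of `[10, 12, 8, 10]` contracts per window and a recent window of 30, the z-score is about 14.1, well past the 8-sigma mark described above.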

What Flagged on Day One

Four markets scored above threshold within hours of the first collection run:

Market	Score	What it is
KXNEXTAG-29-TBLA	167	Todd Blanche for AG, 22% price move
KXNEXTODNI-29-RCRA	156	Rick Crawford for DNI, 129% price move
KXGREENTERRITORY-29	152	US acquires Greenland, vol z=11.6
KXCPI-26MAY-T0.5	105	CPI above 0.5%, moving DOWN one week before June 10 release

To be clear: high scores don't mean insider trading happened. They mean unusual market activity that warrants a closer look.

TBLA is Todd Blanche -- Trump's former personal attorney who represented him through the criminal trials. RCRA is Rick Crawford, the Arkansas congressman who has been floated for intelligence roles. Both markets were flagging compound anomalies with volume z-scores above 8 on the first run of pmwatch, the detector described in this post.

The Todd Blanche signal is worth noting specifically. It strengthened from 118 to 167 between the first and second hourly collection runs, which is the pattern you'd expect from a position being built ahead of an announcement rather than random noise. And the CPI market moving down with unusual volume a week before BLS publishes the May number? That's a flag worth watching.

I don't know what any of this means. That's kind of the point. pmwatch surfaces candidates. Investigators with subpoena power figure out the rest.

Day Two Update (June 4, 2026)

Overnight the signal count grew from 6 to 21. A few notable developments:

Todd Blanche (KXNEXTAG-29-TBLA) is now at 377 -- up from 167 at launch. That's the strongest and most persistent signal on the board, strengthening across every hourly collection run since yesterday.

New: Ghislaine Maxwell pardon market flagged at 346 (KXTRUMPPARDONS-29JAN21-GMAX) -- vol z-score of 24.6, the highest single reading pmwatch has produced. The price hasn't moved yet, which means large volume is accumulating without public market-moving information to explain it. Detected at 11:46 AM ET, June 4. Screenshot timestamped and saved.

Sean Combs pardon market also appeared (KXTRUMPPARDONS-29JAN21-SCOM) as a lower-score signal in the same collection run.

CPI cluster is now 4 signals strong ahead of the June 10 BLS release -- multiple CPI strike markets flagging simultaneously with scores ranging from 99 to 267.

To be clear: none of this proves anything. pmwatch flags unusual market activity. Whether any of it represents insider trading is for investigators with subpoena power to determine. But the Maxwell signal at 346 with a 24.6 z-score is the kind of reading this tool was designed to surface.

The repo is live and the scheduler is running. Come back after the June 10 CPI release to see if the markets were right.

The Stack

Nothing exotic here:

Python for the collector and scorer
SQLite for storage (plenty for this use case)
APScheduler to run collection every 60 minutes
FastAPI for the API layer
Plain HTML + JS for the dashboard -- no build step, deploys anywhere
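For a sense of why SQLite is enough here, this is a minimal sketch of storing one score per market per collection run and reading back the latest flagged markets. The table name, column names, and helper functions are assumptions of mine, not the schema in the repo.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the score store. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS signals (
               ticker TEXT NOT NULL,
               run_at TEXT NOT NULL,
               score  REAL NOT NULL,
               PRIMARY KEY (ticker, run_at)
           )"""
    )
    return conn


def record_score(conn: sqlite3.Connection, ticker: str, run_at: str, score: float) -> None:
    """Store one market's score for one collection run (ISO 8601 timestamp)."""
    conn.execute(
        "INSERT OR REPLACE INTO signals (ticker, run_at, score) VALUES (?, ?, ?)",
        (ticker, run_at, score),
    )
    conn.commit()


def latest_flagged(conn: sqlite3.Connection, threshold: float = 25.0) -> list[tuple[str, float]]:
    """Most recent score per market, keeping only those at or above the threshold."""
    rows = conn.execute(
        """SELECT s.ticker, s.score
           FROM signals AS s
           JOIN (SELECT ticker, MAX(run_at) AS run_at
                 FROM signals GROUP BY ticker) AS last
             ON s.ticker = last.ticker AND s.run_at = last.run_at
           WHERE s.score >= ?
           ORDER BY s.score DESC""",
        (threshold,),
    )
    return rows.fetchall()
```

ISO 8601 timestamps sort correctly as plain text, which is what lets `MAX(run_at)` pick the newest run without any date parsing.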

The entire thing runs on the public Kalshi API with no authentication. Setup is five commands:

git clone https://github.com/lweiss01/pmwatch.git
cd pmwatch
pip install -r requirements.txt
python scheduler.py
uvicorn api:app --port 8000

What This Can and Can't Do

Can detect: unusual volume concentration before events, block trade anomalies, price divergence inconsistent with public information flow.

Can't do: identify specific traders, prove intent, access order placement timestamps (only fill time is public), or monitor Polymarket (offshore, separate API).

This is a tipping-point detector. It finds markets that look wrong. Kalshi and the CFTC have the identity layer to take it further.

Why It Matters Right Now

The CFTC issued an advisory on prediction market insider trading in February 2026. H.R. 7004, the Public Integrity in Financial Prediction Markets Act, was introduced in January after a trader made $400,000 on the Polymarket Maduro capture market right before the US military operation. The Senate passed a resolution in April preventing its own members from using prediction markets.

The regulatory infrastructure is being built right now. Public monitoring tools that work from open data are part of that ecosystem -- the same role that STOCK Act trackers like Quiver Quantitative play for congressional stock trading, but for the prediction market era.

GovGreed tracks 190,000 congressional stock trades and scores them with a 7-layer AI model. Nobody is doing that for prediction markets yet.

pmwatch is a start.

What's Next

The immediate roadmap:

Social media correlation -- cross-reference X post timestamps with trade anomaly windows
Government calendar integration -- BLS release schedule, FOMC blackout periods
Polymarket support
Webhook alerts for high-score signals

The repo is at https://github.com/lweiss01/pmwatch. Issues and PRs welcome. If you work in fintech, policy, or prediction markets and want to talk about where this goes, I'm easy to find.

Built June 3, 2026. The same day the Santos story broke. Sometimes timing is everything.

I Thought My AI Agents Had a Memory Problem. They Didn't.

lweiss01 — Tue, 02 Jun 2026 17:15:00 +0000

When I started building Holistic, I thought I was working on a memory problem.

Like many people using AI coding agents, I kept running into the same frustration. A session would end, a new one would begin, and suddenly the project was paying a tax. Decisions had to be rediscovered. Prior attempts had to be reconstructed. The same explanations kept getting repeated because the next participant didn't seem to know what the previous one had already learned.

The obvious explanation was memory. The agent forgot. So I started building checkpoints. Not transcripts. Not giant chat histories. Not larger context windows. I wanted durable project state: a record of what had changed, what had been tried, what had been decided, and what still needed to happen.

That eventually became Holistic. At the time, I thought I was building an agent memory tool. Looking back, I think that description was wrong. Not completely wrong, but incomplete. The clue was a contradiction I couldn't explain.

The Contradiction

As I started using multiple agents more frequently, the memory explanation stopped fitting the evidence. Claude would work on a task, then Codex, then Cursor, and sometimes back to Claude again. Each transition seemed to produce the same failure mode. Context disappeared. Decisions became unclear. Progress slowed while the next participant figured out where things stood.

I kept calling this memory loss until I realized that wasn't actually what I was observing. Claude hadn't forgotten anything. Codex hadn't forgotten anything. Cursor hadn't forgotten anything either.

The next participant simply hadn't been there.

That sounds like a small distinction, but it completely changed how I thought about the problem. The information wasn't being forgotten. It was failing to survive the handoff. Memory and handoff are not the same thing. One is about preserving knowledge inside a participant. The other is about preserving progress across participants.

The more I looked, the more I found examples where memory wasn't the bottleneck at all. The issue appeared precisely when work crossed a boundary: from one agent to another, from one session to another, or from an agent back to a human. The failure wasn't forgetting. The failure was continuity. I just didn't have that word yet.

Following The Wrong Theory

Once the memory explanation started breaking down, I moved to the next obvious theory. Maybe the problem wasn't memory. Maybe it was coordination. Agents needed better ways to communicate. Better handoffs. Better shared artifacts.

That explanation got further because it described more of what I was seeing, but it still didn't explain everything. What caught my attention was that people kept arriving at remarkably similar solutions from completely different directions. Some were using shared files. Others relied on conversation logs. Some created structured handoff records. Others experimented with append-only histories. The technologies varied, but the pattern didn't.

Nobody was arguing about whether the problem existed. They were arguing about how to solve it. That was an important clue, because it suggested I wasn't looking at a personal annoyance or a workflow quirk. I was looking at a category of problem that was beginning to emerge wherever people used multiple agents over time.

Then Andon Happened

Around the same time, I started designing Andon. At first it felt unrelated. Holistic was about preserving state. Andon was about supervision. I wanted a way to answer operational questions:

What is my agent doing?
Is it healthy?
Is it stuck?
Is it drifting?
Do I need to intervene?

The name came from Toyota's andon system, where workers can signal abnormal conditions before defects continue down the production line. I liked the metaphor immediately because it aligned with something I had spent years thinking about professionally: systems, quality, process improvement, and how work flows through organizations. But it wasn't until much later that I realized why the metaphor felt so natural.

The interesting part of Toyota's andon system isn't the board, the lights, or even the cord itself. The interesting part is the assumption underneath it all: the work continues even when the workers change. Factories operate across shifts. Hospitals operate across shifts. Airlines operate across shifts. Large software systems operate across shifts. Participants come and go, but the work remains.

Once I saw that, everything else started to click into place.

The Work Must Survive The Worker

That is the idea I had been circling all along. Not memory. Not coordination. Continuity. Every mature operational system eventually discovers the same requirement: work must survive participant replacement.

That's why organizations create runbooks, status boards, shift reports, architecture records, incident logs, and operating procedures. Their purpose is not simply to help someone remember. Their purpose is to help the work continue when somebody else takes over.

Seen through that lens, Holistic and Andon stopped looking like separate ideas. Holistic preserves continuity. Andon protects continuity. Same problem, different layer.

Holistic answers the question:

What happened before?

Andon answers the question:

What is happening now?

Both exist because the work cannot depend on any single participant, whether that participant is a human, Claude, Codex, Cursor, or whatever comes next. The project persists. The participants do not.

The Thesis Revealed Itself

For a long time I thought I was building tools. Only later did I realize the tools were pulling me toward a theory: the project persists, the participants do not. Once that becomes clear, the question changes. It is no longer "How do we make agents remember?" It becomes "How does the work continue when the worker changes?"

That's a different category of problem. It's not memory, it's continuity. And eventually the thesis condensed into two sentences:

Memory belongs to participants.

Continuity belongs to projects.

I didn't start with that idea. I backed into it through a series of observations that refused to fit my original explanation. Checkpoint by checkpoint, handoff by handoff, and tool by tool, the same pattern kept reappearing until it was impossible to ignore.

Looking Ahead

The future of AI development is probably not one agent with perfect recall. It's many agents, many sessions, many models, many handoffs. Claude this morning. Codex this afternoon. Cursor tonight. A human reviewer tomorrow. A completely different model six months from now.

When I started building Holistic, I thought I was building an agent memory tool. When I started building Andon, I thought I was building a supervision system. Looking back, both were pointing at the same thing. The tools came first. The thesis took longer to reveal itself. But I think I've finally found the idea I was actually building toward:

The work must survive the worker.

Holistic is open source if you want to see how the handoff layer works in practice: github.com/lweiss01/holistic

The Agent-to-Agent Continuity Gap Nobody Is Talking About

lweiss01 — Wed, 27 May 2026 13:55:44 +0000

TL;DR: Most AI agent memory discussions still assume one agent talking to itself across sessions. But real coding workflows already involve Claude, Codex, Cursor, and Gemini touching the same repo in the same week. The hard problem is not "how does an agent remember." It is "how do multiple agents stay coordinated on the same project without stepping on each other." That problem does not live inside any one agent. It lives in the repo.

I wrote a post last week arguing that AI coding agent memory belongs in the repository, not the chat window. Checkpoints, not transcripts.

Sitting with that argument for a few days, I realized it is actually downstream of a bigger one I had not made explicitly yet. The checkpoint primitive only matters because of a problem the current agent stack does not have a name for.

So here it is.

The Industry Map Has A Blind Spot

There is a really good 2026 agent stack map going around right now from Paolo Perrone. Six layers. Models, protocols, memory, frameworks, eval, guardrails. It is a useful map.

But read the memory layer carefully and you notice something.

Every memory tier on that map assumes one agent.

In-context state lives inside one agent's context window
Vector retrieval lives inside one agent's RAG pipeline
Persistent memory services like Letta, Zep, and Mem0 are designed for one agent learning across sessions

That is a real problem and worth solving. But it is not the problem most coding workflows actually have.

Most Coding Workflows Already Use Multiple Agents

Look at how anyone serious is shipping code right now.

Claude for architecture and review.
Codex for implementation.
Cursor for inline iteration.
Gemini for exploration.
A human approving and editing all of it.

That is not a future scenario. That is a Tuesday.

And every single one of those agents has its own context window, its own session, its own memory, its own opinions about the codebase. None of them know what the other ones did an hour ago.

The continuity problem is not "Claude forgot what we discussed yesterday."

The continuity problem is "Claude does not know what Codex implemented this morning, that Cursor reverted half of it at lunch, or that the human merged something different from a different branch."

That is a coordination problem dressed up as a memory problem.

Agent-to-Agent Memory Does Not Exist Yet

Perrone's map notes this honestly. MCP standardized how agents call tools. It says nothing about how agents talk to each other. IBM has ACP. Google has A2A. Neither is a standard. Neither is widely adopted. Neither solves the coding workflow case.

So in practice, every team running a multi-agent coding workflow is solving this themselves. Usually badly. Usually by re-explaining context to every new session by hand.

The dedicated memory vendors do not solve this either, because they are designed to give one agent a longer memory. Plugging Cursor and Claude Code into the same Mem0 instance and hoping they coordinate is not a thing that works today.

Memory infrastructure is single-agent infrastructure. The multi-agent coordination layer is missing.

The Repo Is The Only Shared Surface

Here is the thing that kept hitting me.

When Claude, Codex, Cursor, and Gemini are all working on the same project, there is exactly one piece of infrastructure all of them already see.

The repository.

They all read it. They all write to it. They all already have file system access through MCP or equivalent. Git already tracks who changed what and when.

The repo is the shared substrate. It is the only shared substrate. Everything else is per-agent.

So if you want continuity across agents, the continuity artifacts have to live in the repo. Not in a vector database that one agent is plugged into. Not in a hosted memory service that another agent does not know about. In the repo. In files. Versioned. Auditable. Diffable. Visible to every agent that can read the file system.

That is what makes checkpoints the right primitive. Not because vector search is bad. Vector search is great for what it does. But you cannot retrieve from a vector store that the next agent has never heard of.

The Reframe

When you stop framing the problem as "agent memory" and start framing it as "multi-agent coordination on a shared artifact," a lot of the tooling debates collapse.

Bigger context windows do not help. The next agent has a different context window.
Better RAG does not help. The next agent has a different RAG pipeline, or no RAG at all.
Hosted memory services do not help unless every agent in your workflow is plugged into the same one, which they are not.
Transcripts do not help, because they are noise and the next agent does not have your transcript.

What does help is a small, structured, versioned record of what was decided, what is in progress, what is at risk, and what the next agent should pick up. Sitting in the repo. Where everyone can see it.

That is the continuity primitive the current stack map does not have a slot for.
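As a rough illustration of that record sitting in the repo, here is a sketch where any agent writes a small JSON checkpoint and the next agent reads the most recent one, whoever wrote it. This is not Holistic's actual format: the `.checkpoints` directory, the numbered filename scheme, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

CHECKPOINT_DIR = ".checkpoints"  # hypothetical location inside the repo


def write_checkpoint(repo_root: Path, agent: str, seq: int, record: dict) -> Path:
    """Write one checkpoint file; zero-padded seq keeps filenames in order."""
    directory = repo_root / CHECKPOINT_DIR
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{seq:04d}-{agent}.json"
    payload = {"agent": agent, **record}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_latest_checkpoint(repo_root: Path) -> Optional[dict]:
    """Return the newest checkpoint regardless of which agent wrote it."""
    directory = repo_root / CHECKPOINT_DIR
    if not directory.is_dir():
        return None
    files = sorted(directory.glob("*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text(encoding="utf-8"))
```

Because these are plain files, Git versions and diffs them like any other change, and no agent needs an SDK or account to read them.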

What I Am Building

Holistic is an open source CLI exploring this idea. Repo-native checkpoints. Agent-agnostic. No vendor account, no hosted service, no per-agent SDK. Just files in your repo that any agent can read and any agent can update.

It is still early. The thesis is what I am most interested in pressure testing right now.

If you are running a multi-agent coding workflow and you have your own answer to the coordination problem, I want to hear it. If you think I am wrong about the repo being the right substrate, I really want to hear that.

Repo: https://github.com/lweiss01/holistic

The repo remembers, not the window. And no single agent remembers for the others.

Checkpoints, Not Transcripts: Rethinking AI Coding Agent Memory

lweiss01 — Sun, 24 May 2026 18:40:21 +0000

TL;DR: AI coding agent memory should live in the repository, not the chat window. Bigger context windows and vector databases are solving the wrong problem. Here is the case for treating the repo itself as the durable cognitive surface.

Everyone is trying to solve AI agent memory right now.

Longer context windows.
Vector databases.
Conversation replay.
Semantic retrieval.
Infinite transcripts.

But after spending months building workflows across Claude, Codex, Gemini, Cursor, and other coding agents, I've started to think we may be treating the wrong thing as the source of truth.

The problem is not:

"How do we make the model remember everything forever?"

The problem is:

"How does a software project remain cognitively coherent across sessions, compaction, agent switches, and time?"

Those are very different problems.

The Context Window Is Not Durable Infrastructure

Modern AI coding workflows are surprisingly fragile.

An agent works for hours. The context window fills up. Compaction happens. Then suddenly:

architectural reasoning disappears
unresolved work gets forgotten
regressions come back
agents undo each other
humans re-explain the same context repeatedly

The industry response so far has mostly been: store more. Bigger context windows, vector databases, hosted memory services, semantic retrieval over giant transcripts.

But transcripts are not understanding.

And replaying giant chat histories is not the same thing as preserving operational continuity.

In practice, most coding workflows do not fail because information disappeared entirely. They fail because the important state was never extracted from the conversation in the first place.

Checkpoints, Not Transcripts

The idea I have been exploring is pretty simple:

Instead of preserving entire conversations forever, preserve structured checkpoints at meaningful moments.

Not:

every token
every thought
every conversational detour

But the things that actually matter:

current state
architectural decisions
unresolved threads
regression risks
next recommended actions
implementation reasoning
handoff context

The checkpoint becomes the durable source of truth.

The live context window becomes disposable working memory.

That distinction changes a lot.
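To show how small such a checkpoint can be, here is a sketch of the fields listed above as a Python dataclass that round-trips through JSON. The class name, field names, and types are my own illustration of the idea, not the schema Holistic uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Checkpoint:
    """One structured checkpoint; fields mirror the list above (names assumed)."""

    current_state: str
    architectural_decisions: list[str] = field(default_factory=list)
    unresolved_threads: list[str] = field(default_factory=list)
    regression_risks: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)
    implementation_reasoning: str = ""
    handoff_context: str = ""

    def to_json(self) -> str:
        """Serialize to JSON text suitable for committing to the repo."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Checkpoint":
        """Rebuild a checkpoint from JSON produced by to_json."""
        return cls(**json.loads(text))
```

A record like this runs to a few dozen lines at most, which is the contrast with a transcript: it holds the extracted state, not the conversation that produced it.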

The Repo Should Remember

One realization that kept hitting me while working across multiple coding agents:

The repository itself is the only thing that actually persists.

Agents change.
Models change.
Sessions end.
Windows compact.

But the repo stays.

So instead of treating continuity as something trapped inside a chat session, I started treating continuity as a repo-native concern.

That means:

continuity artifacts live in the repo
handoffs live in the repo
operational state lives in the repo
regression memory lives in the repo
checkpoints live in the repo

The repo remembers, not the window.

Multi-Agent Development Is Already Here

A lot of tooling still assumes:

one human, one agent, one session.

That is not how many people are actually working anymore.

Real workflows increasingly look like:

Claude for architecture
Codex for implementation
Cursor for iteration
Gemini for exploration
a human reviewing all of it
another session tomorrow continuing the work

Continuity is no longer just memory. It is coordination across interchangeable execution surfaces. And once you frame it that way, the chat window stops looking like the right place to store anything important.

AI Agents Are Temporary. Repositories Persist.

I think we are entering a phase where software repositories themselves become cognitive systems:

accumulating decisions
preserving continuity
coordinating work
surviving agent turnover
carrying operational memory forward over time

Not because the models became infinitely smart.

But because the continuity stopped depending entirely on the model session.

That is the direction I have been exploring with Holistic, an open-source CLI for repo-native continuity across agents: https://github.com/lweiss01/holistic

Still early. Still evolving quickly. If you are working across multiple coding agents and running into the continuity problem, I would genuinely love feedback, critiques, or just a conversation about how you are solving it.

The repo remembers, not the window.