DEV Community

Self-Correcting Systems
Self-Correcting Systems

Posted on

The Agent Gets the API Key. You Get the Guinea Pig Seat.

A friend texted me this week, and within a year someone you know is going to send you the same message.

He had seen that you can now connect an AI directly to a brokerage account through an API. He was sure that with the right prompts it could catch every low and sell at every high. Start it with a few hundred dollars, let it run, collect passive income. He believed in it enough to offer me a thousand dollars to set it up.

I told him I would do it for free. Not because the work is worth nothing. Because the only honest version of that work is one I will not charge a friend for, and the dishonest version I will not build for any amount.

Here is why he is not crazy for asking. Robinhood launched agentic trading accounts in May: dedicated accounts, dedicated funds, alerts, pause controls, and MCP-based agent connections. Coinbase's developer platform now documents Coinbase for Agents through CLI/MCP tooling, and its x402 protocol is explicitly built for AI agents to make programmatic stablecoin payments for API access. This is not a rumor or a jailbreak. It is a product direction, built by serious companies.

The infrastructure for handing an AI agent your money shipped in the last few weeks.

The evidence that an AI agent deserves your money did not ship with it. It does not exist yet. And I can prove that gap to you with my own receipts, because I have spent months on both sides of it.

The wave always looks like this

I watched this exact pattern play out in crypto, up close, with people I know.

Crypto has real opportunity in it. But most people only reach for it when the chart is already vertical. They buy the top because the top is when their friends start talking. Then the correction comes, and instead of asking what they actually understood about the thing they bought, they blame the market. The market never changed its nature. They just never studied it before acting on it.

Now watch the same shape arriving in AI. People meet an agent and assume it is an oracle. They hand it a task it was never built for, watch it fail, and conclude AI is a scam. Then they tell the next person, and the misconception spreads in both directions at once: the believers think agents are magic, the burned think agents are useless, and almost nobody in either crowd ran a single controlled test before forming the opinion.

Acting before understanding, then outsourcing the blame. That is the whole wave, every time, in every market. The only people who consistently get hurt are the ones who arrive at the moment of maximum excitement carrying zero evidence. There is a name for the seat they are sitting in. It is the guinea pig seat, and the platforms just installed a fresh row of them.

The question that cuts through all of it

Sit with this one before you connect anything to your money.

If an AI agent plugged into a brokerage API could reliably catch lows and sell highs, why would the brokerage hand you the API?

They have more capital than you, more data than you, better engineers than you, and direct access to the exact same models. An agent that printed money would be the most valuable proprietary system in their building. It would never be a consumer feature. It would be the business.

Instead, it is a consumer feature. Ask why.

Platforms earn on activity, not on your outcomes. Every trade your agent executes generates revenue for the platform whether you win or lose, and an agent never sleeps, never hesitates, and never gets tired of clicking. From the platform's side of the table, an autonomous agent is the perfect customer: a human's bankroll with a machine's trading frequency. The incentive behind the product is more trades, not better ones.

That is not a scandal and it is not a conspiracy. It is an incentive structure sitting in plain sight, and once you see it, the launch announcements read completely differently.

And before your agent's supposed edge ever gets tested, the friction arrives. A few hundred dollars of stake bleeds through spreads, fees, and the inference costs of the model making the decisions. My friend's plan was to start small and compound. Small accounts do not die from bad calls first. They die from costs, quietly, while the prompts keep sounding confident.

What my own receipts say

I run a public AI evaluation research program: a claim ledger of thirty agent-memory claims, with the recent claims frozen and publicly timestamped before results exist, failures published first. I also built my own trading signal system, and I ran it the slow way: paper only, every signal written down before the market moved, opening price captured, closing line compared, settled outcomes only.

Here is the most honest number that system ever handed me. When I audited its confidence scores, the signals that won averaged 0.738 confidence. The signals that lost averaged 0.739.

Read that again. Identical. At that stage, the system felt exactly as sure about its losers as its winners. That number came from an earlier version, and surfacing it is exactly what honest instrumentation is for: it told me what to improve before real money could teach me the same lesson at a markup. The system has evolved a lot since then, and it keeps evolving. But here is the part that matters for you: I only knew any of that because every signal was logged before the outcome existed. The discipline found the flaw. A prompt with no paper trail finds its flaws in your account balance.

Full honesty, since this whole article is about evidence: I have not actively worked on that trading system in weeks. The research lane took over my time. But the monitoring agents never stopped. The day I prepared this article, I checked: my BTC monitor had logged same-day structured events, and has been recording market regime, bias, and confidence the entire time I was busy elsewhere. The dataset kept growing without me.

The baseball side told me something even better. Its odds source went stale weeks ago, and instead of fabricating signals from dead data, the system refused to write any. The dataset stopped growing, on purpose, and flagged the reason.

I want you to notice what that refusal is, because it is the entire lesson of this article in one behavior. A system that keeps producing confident output after its data source dies is exactly the thing that will lose you money. My system would rather go quiet than guess. That property did not come from a clever prompt. It came from months of unglamorous evaluation discipline, and it is the same property I test in my memory research: the clock can say valid while the world says otherwise, and the gate has to believe the world.

The paper sample it preserved is small and I will not dress it up: 29 settled rows, positive but below the sample size I would call meaningful. Here is the whole thing, caveats included:

Metric Value
Settled rows 29 (system flags: insufficient, needs 30+)
Beat closing line 17 of 29 (58.6%)
Avg CLV +3.55 price points
Benchmark best-available local book, not a sharp reference
Money at risk none, paper only

Insufficient evidence, honestly labeled. That label is the product. Most people selling AI trading have never once generated it.

Access is not edge

Everything I publish follows one shape: two things that look identical under hype turn out to be different under pressure.

Relevance is not authority. A memory can match your query perfectly and have no right to govern the action.

Signed is not fresh. A response can be cryptographically valid and still describe a world that no longer exists.

Permission is not purpose. An action can be fully authorized and still be outside what the agent is for.

This is the next layer down, and it is the one that costs real people rent money:

Access is not edge. An API key is permission to execute. It is not evidence of judgment.

The platforms just made access nearly free. They cannot ship the edge alongside it, because the edge was never theirs to give. Edge is built the way mine is still being built: logged decisions, frozen thresholds, settled samples, and the humility to stay on paper when the numbers say coin flip.

What I'm actually doing for my friend

I am not telling him no. I am building it with him, for free, and the honest version looks like this:

The agent connects read-only first. It observes, analyzes, touches nothing. Every decision it would have made gets logged on paper with the price at decision time, so there is no retroactive genius. Before any of it starts, we freeze the gate in writing: the agent must beat simply buying and holding, over a settled sample, by a margin we set in advance. Numbers first, money later, or money never.

If it passes, it will have earned what no prompt can claim. If it fails, the system will have saved him the bag instead of costing him one, and that is a win he could not have bought for a thousand dollars.

The build takes a weekend. The evidence takes months. People keep paying for the build. The evidence was always the only part worth anything.

The honest close

Agents trading real money will probably work someday. When it does, it will arrive through the boring door: decision logs, frozen gates, settled samples, published failures. It will not arrive through a midnight prompt that promises every low and every high.

Until then, understand what is actually being sold. The platforms shipped the access and kept the incentive. The influencers are selling the dream and keeping the course fee. The only thing nobody is handing out is evidence, because evidence cannot be handed out. It has to be grown, slowly, in public, with receipts.

Do the research before the action. Understand what the thing is before you hand it what you have. That is not anti-AI. I build with these systems every single day, and that is exactly why I will not lie to you about them. Helping people see clearly is the whole job.

The guinea pig seats are filling up fast, and they are free to sit in.

The exit row costs months of paper. I know which seat I am in.


Not financial advice. I am not claiming agents can never trade. I am claiming evidence must precede execution, and right now the infrastructure has shipped ahead of the evidence. My evaluation harness, claim ledger, and failure record are public if you want to check whether I hold my own work to the standard I just described.

Source links:

Top comments (0)