DEV Community

Self-Correcting Systems
Self-Correcting Systems

Posted on

The Agent Gets the API Key. You Get the Guinea Pig Seat.

A friend texted me this week, and within a year someone you know is going to send you the same message.

He had seen that you can now connect an AI directly to a brokerage account through an API. He was sure that with the right prompts it could catch every low and sell at every high. Start it with a few hundred dollars, let it run, collect passive income. He believed in it enough to offer me a thousand dollars to set it up.

I told him I would do it for free. Not because the work is worth nothing. Because the only honest version of that work is one I will not charge a friend for, and the dishonest version I will not build for any amount.

Here is why he is not crazy for asking. Robinhood launched agentic trading accounts in May: dedicated accounts, dedicated funds, alerts, pause controls, and MCP-based agent connections. Coinbase's developer platform now documents Coinbase for Agents through CLI/MCP tooling, and its x402 protocol is explicitly built for AI agents to make programmatic stablecoin payments for API access. This is not a rumor or a jailbreak. It is a product direction, built by serious companies.

The infrastructure for handing an AI agent your money shipped in the last few weeks.

The evidence that an AI agent deserves your money did not ship with it. It does not exist yet. And I can prove that gap to you with my own receipts, because I have spent months on both sides of it.

The wave always looks like this

I watched this exact pattern play out in crypto, up close, with people I know.

Crypto has real opportunity in it. But most people only reach for it when the chart is already vertical. They buy the top because the top is when their friends start talking. Then the correction comes, and instead of asking what they actually understood about the thing they bought, they blame the market. The market never changed its nature. They just never studied it before acting on it.

Now watch the same shape arriving in AI. People meet an agent and assume it is an oracle. They hand it a task it was never built for, watch it fail, and conclude AI is a scam. Then they tell the next person, and the misconception spreads in both directions at once: the believers think agents are magic, the burned think agents are useless, and almost nobody in either crowd ran a single controlled test before forming the opinion.

Acting before understanding, then outsourcing the blame. That is the whole wave, every time, in every market. The only people who consistently get hurt are the ones who arrive at the moment of maximum excitement carrying zero evidence. There is a name for the seat they are sitting in. It is the guinea pig seat, and the platforms just installed a fresh row of them.

The question that cuts through all of it

Sit with this one before you connect anything to your money.

If an AI agent plugged into a brokerage API could reliably catch lows and sell highs, why would the brokerage hand you the API?

They have more capital than you, more data than you, better engineers than you, and direct access to the exact same models. An agent that printed money would be the most valuable proprietary system in their building. It would never be a consumer feature. It would be the business.

Instead, it is a consumer feature. Ask why.

Platforms earn on activity, not on your outcomes. Every trade your agent executes generates revenue for the platform whether you win or lose, and an agent never sleeps, never hesitates, and never gets tired of clicking. From the platform's side of the table, an autonomous agent is the perfect customer: a human's bankroll with a machine's trading frequency. The incentive behind the product is more trades, not better ones.

That is not a scandal and it is not a conspiracy. It is an incentive structure sitting in plain sight, and once you see it, the launch announcements read completely differently.

And before your agent's supposed edge ever gets tested, the friction arrives. A few hundred dollars of stake bleeds through spreads, fees, and the inference costs of the model making the decisions. My friend's plan was to start small and compound. Small accounts do not die from bad calls first. They die from costs, quietly, while the prompts keep sounding confident.

What my own receipts say

I run a public AI evaluation research program: a claim ledger of thirty agent-memory claims, with the recent claims frozen and publicly timestamped before results exist, failures published first. I also built my own trading signal system, and I ran it the slow way: paper only, every signal written down before the market moved, opening price captured, closing line compared, settled outcomes only.

Here is the most honest number that system ever handed me. When I audited its confidence scores, the signals that won averaged 0.738 confidence. The signals that lost averaged 0.739.

Read that again. Identical. At that stage, the system felt exactly as sure about its losers as its winners. That number came from an earlier version, and surfacing it is exactly what honest instrumentation is for: it told me what to improve before real money could teach me the same lesson at a markup. The system has evolved a lot since then, and it keeps evolving. But here is the part that matters for you: I only knew any of that because every signal was logged before the outcome existed. The discipline found the flaw. A prompt with no paper trail finds its flaws in your account balance.

Full honesty, since this whole article is about evidence: I have not actively worked on that trading system in weeks. The research lane took over my time. But the monitoring agents never stopped. The day I prepared this article, I checked: my BTC monitor had logged same-day structured events, and has been recording market regime, bias, and confidence the entire time I was busy elsewhere. The dataset kept growing without me.

The baseball side told me something even better. Its odds source went stale weeks ago, and instead of fabricating signals from dead data, the system refused to write any. The dataset stopped growing, on purpose, and flagged the reason.

I want you to notice what that refusal is, because it is the entire lesson of this article in one behavior. A system that keeps producing confident output after its data source dies is exactly the thing that will lose you money. My system would rather go quiet than guess. That property did not come from a clever prompt. It came from months of unglamorous evaluation discipline, and it is the same property I test in my memory research: the clock can say valid while the world says otherwise, and the gate has to believe the world.

The paper sample it preserved is small and I will not dress it up: 29 settled rows, positive but below the sample size I would call meaningful. Here is the whole thing, caveats included:

Metric Value
Settled rows 29 (system flags: insufficient, needs 30+)
Beat closing line 17 of 29 (58.6%)
Avg CLV +3.55 price points
Benchmark best-available local book, not a sharp reference
Money at risk none, paper only

Insufficient evidence, honestly labeled. That label is the product. Most people selling AI trading have never once generated it.

Access is not edge

Everything I publish follows one shape: two things that look identical under hype turn out to be different under pressure.

Relevance is not authority. A memory can match your query perfectly and have no right to govern the action.

Signed is not fresh. A response can be cryptographically valid and still describe a world that no longer exists.

Permission is not purpose. An action can be fully authorized and still be outside what the agent is for.

This is the next layer down, and it is the one that costs real people rent money:

Access is not edge. An API key is permission to execute. It is not evidence of judgment.

The platforms just made access nearly free. They cannot ship the edge alongside it, because the edge was never theirs to give. Edge is built the way mine is still being built: logged decisions, frozen thresholds, settled samples, and the humility to stay on paper when the numbers say coin flip.

What I'm actually doing for my friend

I am not telling him no. I am building it with him, for free, and the honest version looks like this:

The agent connects read-only first. It observes, analyzes, touches nothing. Every decision it would have made gets logged on paper with the price at decision time, so there is no retroactive genius. Before any of it starts, we freeze the gate in writing: the agent must beat simply buying and holding, over a settled sample, by a margin we set in advance. Numbers first, money later, or money never.

If it passes, it will have earned what no prompt can claim. If it fails, the system will have saved him the bag instead of costing him one, and that is a win he could not have bought for a thousand dollars.

The build takes a weekend. The evidence takes months. People keep paying for the build. The evidence was always the only part worth anything.

The honest close

Agents trading real money will probably work someday. When it does, it will arrive through the boring door: decision logs, frozen gates, settled samples, published failures. It will not arrive through a midnight prompt that promises every low and every high.

Until then, understand what is actually being sold. The platforms shipped the access and kept the incentive. The influencers are selling the dream and keeping the course fee. The only thing nobody is handing out is evidence, because evidence cannot be handed out. It has to be grown, slowly, in public, with receipts.

Do the research before the action. Understand what the thing is before you hand it what you have. That is not anti-AI. I build with these systems every single day, and that is exactly why I will not lie to you about them. Helping people see clearly is the whole job.

The guinea pig seats are filling up fast, and they are free to sit in.

The exit row costs months of paper. I know which seat I am in.


Not financial advice. I am not claiming agents can never trade. I am claiming evidence must precede execution, and right now the infrastructure has shipped ahead of the evidence. My evaluation harness, claim ledger, and failure record are public if you want to check whether I hold my own work to the standard I just described.

Source links:

Top comments (4)

Collapse
 
mehmetcanfarsak profile image
Mehmet Can Farsak

Solid take on the 'act before understanding' problem. This is exactly what happens when AI agents skip the ideation phase — they get an API key and start executing without thinking through the implications. I built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) as a hook-based plugin that adds PreToolUse gates to force agents into thinking modes before tool execution. The divergent/actionable/academic modes help ensure the agent actually evaluates a situation before acting on it. Infrastructure-level guardrails, not just better prompts.

Collapse
 
zep1997 profile image
Self-Correcting Systems

I appreciate this, brainstorm-mode sounds like a real attempt at the right problem. forcing
a deliberation phase before tool calls is a layer i would want in any agent touching
money.

the distinction i would add from the trading side: a thinking gate improves the quality
of the decision, but it still reads the agent's own reasoning. the gate i keep coming
back to has to read something external to the agent: logged decisions against settled
outcomes, price at decision time, and a paper trail the agent cannot edit after the fact.

an agent can deliberate beautifully and still be wrong at 0.739 confidence.

the two layers stack though. deliberation before the call, evidence before the money.
curious what brainstorm-mode logs after the thinking phase ends, because that is where
the receipts would live.

Collapse
 
ggle_in profile image
HARD IN SOFT OUT

This is the most honest piece I've read about AI trading — specifically the part where confidence scores for winners and losers were identical (0.738 vs 0.739). That's not just a number; that's a punchline hiding in plain sight.

A few things that landed:

  • The "access is not edge" framing is perfect. Giving an agent an API key feels like progress, but it's just turning the key. The real work happens before that, in the paper‑only months you described. Most people skip that because it's boring and doesn't produce screenshots.

  • I love that your system refused to write signals when the odds source went stale. That's rarer than it should be. Most systems would hallucinate something confident just to keep the UI happy. The fact you built in "silent refusal" is probably the most valuable feature nobody pays for.

One tiny suggestion (almost a joke, but not really): you mentioned the paper sample is small (29 rows). What if the agent itself proposed when to stop waiting and start trading? Something like "I've seen 500 paper cycles, variance dropped below X, I'm ready for 1% of target capital." That shifts the decision from human guesswork to the same agent that wants to trade — but now it has to earn the right to ask.

Also, the guinea pig seat metaphor is going to stick with me. It's exactly what happened with early crypto, early algo trading, and now agentic finance. The platforms sell the chairs. The guinea pigs bring the optimism.

Anyway, this is the kind of post that should be pinned somewhere. Not because it's anti‑AI — you're clearly building with it — but because it's pro‑evidence. That's rarer than any trading signal.

(Also, the GitHub claim ledger link is a nice flex. Most people don't publish their failures first.)

Collapse
 
zep1997 profile image
Self-Correcting Systems

This is exactly the line i was trying to hit: the failure is not the agent being wrong
once. the failure is letting confidence become permission to touch money.

the 0.738 vs 0.739 number humbled me because it removed the story i wanted to tell
myself. the system sounded calibrated, but the winners and losers were carrying almost
the same confidence. that is the kind of failure people usually hide because it feels
embarrassing. but hidden failures do not teach. logged failures do.

and i agree with your suggestion, with one boundary: the agent can propose graduation
from paper to tiny capital, but it cannot grant that right to itself. “i have seen 500
paper cycles, variance dropped below X, and i am ready to request 1% capital” is a strong
receipt. but the approval still has to come from outside the agent, against a frozen rule
written before the sample exists.

that is the part most people skip. they want the comeback without the audit trail. but
failure only becomes useful if you take the right position toward it. one mistake does
not define the system. the response to the mistake does.

that is why the stale-odds refusal mattered so much to me. going silent was not a lack of
intelligence. it was discipline. and honestly, that is probably the first kind of edge an
agent has to earn before anyone talks about trading.