
Austin Starks

Posted on • Originally published at nexustrade.io

Coinbase calls their chatbot an agent. I got fired for building a real one.

Note from the author: You're reading a Dev.to adaptation. The original on NexusTrade includes interactive trace viewers, animated diagrams, equity curve visualizations, and embedded course exercises. Read it there for the full experience.


Coinbase fired me for building something they couldn't.

Coinbase termination email — fired for building NexusTrade

Coinbase Advisor's own FAQ asks it directly: "Does the AI make trades on my behalf?" The answer: No. It answers your questions. It suggests a portfolio. Then it waits for you to click. That's the architecture of a chatbot: one question, one answer, you do the rest.

Coinbase Advisor FAQ — Does the AI make trades on my behalf? No.

NexusTrade is fundamentally different. It runs a loop. You send one message. Aurora figures out what it needs to know, calls tools, reads the results, and keeps going until the task is done. No clicking through each step. No waiting for your approval on every action unless you want it.

User sends 'Build an options trading bot' — Aurora responds with a full planning breakdown before calling a single tool

Tell Aurora to research a stock, build a strategy, backtest it across three market regimes, and stage it for live trading. It chains all of that together on its own. Whether trades execute automatically or require your sign-off is up to you.


The Actual Difference

Coinbase built a 24/7 AI advisor. I built an autonomous agent. Here's the gap.

Coinbase's marketing for Advisor is well-written. "Elite financial advice, democratized." "Turn your questions into actionable financial plans." Strong copy. The product behind it is a chatbot that generates recommendations you then manually execute.

A team of engineers and millions of dollars, and they built a chatbot that waits for you to click. Here's what that buys you versus an actual agent:

| Capability | Coinbase Advisor | NexusTrade (Aurora) |
| --- | --- | --- |
| Answers financial questions | Yes | Yes |
| Executes trades autonomously | No, requires explicit approval | Yes, fully automated mode |
| Multi-step task chaining | No | Yes, up to 50 iterations |
| Builds and backtests a strategy | No | Yes, in a single agent run |
| Spawns subagents for parallel work | No | Yes |
| Human-in-the-loop approval controls | Required for all actions | Optional, per-action toggle |

The bottom row is the one that matters. Coinbase Advisor requires approval for everything because it has no loop. It answers, then stops. NexusTrade makes approval optional because Aurora has a loop. It can keep going on its own, and the approval controls let you decide how much of that autonomy you want.

The loop is the product. Everything else is a feature.


The Problem

A chatbot answers once. That's not enough.

You ask a chatbot: "Build me a momentum strategy and backtest it." It generates a paragraph describing what a momentum strategy should look like. Then it stops. If you want the backtest, you take what it said and do the work yourself.

That's the one-shot problem. A language model answers the question it's given. It doesn't ask the next question, run the next tool, or check if the answer it gave was actually correct. It responds and waits.

For a lot of tasks, that's fine. Answering a question is useful. But there's a whole class of work that requires chaining actions together: research a stock, pull its indicators, build a strategy based on what you find, backtest it, check if it survived 2022, adjust the parameters, backtest again. No single response handles that. You need something that runs until the task is done.

You need a loop.

Q: What is the difference between a chatbot and an AI agent?

A: A chatbot responds once and waits for you. An AI agent runs a loop: think, act, observe, repeat. It chains multiple tool calls together until the task is done without you directing each step.


The Loop

Thought. Action. Observation. Repeat.

In 2022, a research team at Google published a paper called ReAct: Synergizing Reasoning and Acting in Language Models. The core idea was simple: instead of asking a model to produce a final answer, ask it to produce a thought, then an action, then read the result, then think again. Repeat until done.

Thought → Action → Observation is the pattern every agent today runs on. Cursor uses it when you ask it to refactor a file and it reads the file, makes an edit, checks the diff, and continues. Claude Code uses it when it plans a multi-step task, runs each step, observes the output, and adjusts. Aurora uses it every time you send it a complex request.

That loop is not a metaphor. It's a while loop in production code. Aurora runs it on a background worker that polls every 500ms. Each iteration increments a counter, calls the model, executes the tool, saves the result, and goes again until the model sets finalAnswer or the iteration limit is hit.

# the ReAct loop, stripped to its core
messages = [system_prompt, user_task]  # full conversation history

while iteration < max_iterations:
    output  = llm.complete(messages)   # Thought → Action → Input

    if output.action == "final_answer":
        return output.answer           # task complete

    result  = tools[output.action](output.input)  # execute tool
    messages += [output, result]       # model sees what it did + what happened
    iteration += 1

return summarize(messages)  # hit the limit — compress + return best answer so far

LangChain runs that same while loop internally:

from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm   = ChatOpenAI(model="gpt-4o-mini")
tools = [stock_screener, backtest_tool, create_strategy]

# ReAct prompt tells the model to output Thought / Action / Observation
prompt   = hub.pull("hwchase17/react")
agent    = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    "input": "Research NVDA and build a momentum strategy"
})
# verbose=True prints each Thought / Action / Observation to the terminal.
# You'll see the while loop running in real time.

Aurora implements the same loop directly, without LangChain, on a TypeScript background worker with a state machine that handles summarization, parallel subagents, and iteration limits. The pattern is identical. The infrastructure is purpose-built for trading. LangChain abstracts the loop, but hides the cost controls — per-iteration token metering, configurable summarization thresholds, and circuit breakers that halt a runaway agent are not concerns it was designed to handle.
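To make the cost-control point concrete, here's a minimal sketch of per-iteration token metering with a hard budget acting as a circuit breaker. Every name here (`TokenMeter`, `BudgetExceeded`, `run_loop`) is illustrative, not Aurora's actual code; it just shows the kind of gate a hand-rolled loop can enforce that an off-the-shelf executor doesn't:

```python
class BudgetExceeded(Exception):
    pass

class TokenMeter:
    """Deducts tokens as the loop runs; trips once the budget is gone."""
    def __init__(self, budget: int):
        self.budget = budget
        self.spent = 0

    def charge(self, tokens: int) -> None:
        if self.spent + tokens > self.budget:
            raise BudgetExceeded(f"would spend {self.spent + tokens} of {self.budget}")
        self.spent += tokens

def run_loop(step_fn, meter: TokenMeter, max_iterations: int = 20):
    """step_fn returns (tokens_used, answer_or_None). Halts on budget or cap."""
    for _ in range(max_iterations):
        tokens, answer = step_fn()
        meter.charge(tokens)          # meter BEFORE acting on the output
        if answer is not None:
            return answer
    return "hit iteration cap"

# A runaway agent that never finishes is cut off by the meter, not the cap:
meter = TokenMeter(budget=5_000)
try:
    run_loop(lambda: (2_000, None), meter)
except BudgetExceeded:
    print("circuit breaker tripped after", meter.spent, "tokens")
```

The important detail is ordering: the meter charges before the loop acts on the model's output, so a budget overrun stops the run rather than being discovered after the fact.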


Production Reality

What it actually looks like when Aurora runs.

The screenshots below are from a real Aurora session. The task: build a fully autonomous 0DTE SPY options bot on a $25,000 account.

Before the ReAct loop starts, a separate step runs first: the planner. This is a specialized prompt, distinct from the main loop, that takes the user's request, reasons about what it needs to know, and generates a structured plan. It's not iteration 1. It's the step before iteration 1. Aurora asks clarifying questions here, not because the loop told it to, but because the planner is designed to gather everything the loop will need before the first tool call fires:

Aurora planning phase — asking clarifying questions before taking action

Once the planner has what it needs, it produces a structured plan and hands it to the main agent. The ReAct loop starts. This is the Thought + Action visible in the approval modal:

Aurora agent — Thought and Action visible in the approval modal, showing the ReAct loop structure

After that action runs, the result comes back as an observation. The agent reads it, generates the next Thought, and continues. Here's the SPY regime data Aurora used to decide which options hypothesis to test first:

Aurora observation step — SPY price, SMA, and VIX chart result after tool call

And here's the plan Aurora generated for the full strategy before calling any tool:

Aurora fully automated — options strategy plan with subagent spawn

War story — $300 in one day: When I launched strategy-triggered agents, I made one mistake: I didn't charge tokens for the planning phase. The planner is a separate LLM call that happens before the loop. I had token checks on the iterations. The planner, at launch, was free. A user configured a strategy with conditions that were almost always true, set to trigger as frequently as possible. Every time the condition fired: plan, stop (out of tokens), fire again. Plan. Stop. Fire. $300 in API costs. One day. Hundreds of planning calls. None made it past iteration 1. The fix was two lines: deduct tokens before the planner runs, and deactivate the strategy if the user can't afford it. The step before the loop is just as real as the loop. Gate them all.
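The two-line fix above can be sketched like this. The function names, fields, and the 500-token planner cost are all illustrative assumptions, not NexusTrade's actual code; the point is only the ordering — deduct before the planner's LLM call fires, and deactivate the strategy so it stops retriggering:

```python
PLANNER_COST = 500  # estimated tokens for one planning call (assumption)

def trigger_strategy(user: dict, strategy: dict, run_planner):
    if user["tokens"] < PLANNER_COST:
        strategy["active"] = False      # kill the retrigger loop entirely
        return None
    user["tokens"] -= PLANNER_COST      # deduct BEFORE the LLM call fires
    return run_planner(strategy)

user = {"tokens": 1200}
strategy = {"active": True}
trigger_strategy(user, strategy, lambda s: "plan")   # affordable
trigger_strategy(user, strategy, lambda s: "plan")   # affordable
trigger_strategy(user, strategy, lambda s: "plan")   # can't cover it: deactivates
print(user["tokens"], strategy["active"])            # 200 False
```

Without the deactivation line, an always-true trigger condition just fires again on the next poll, which is exactly how the $300 day happened.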


Autonomy Controls

Fully automated or semi-automated. You choose how much to trust it.

Every agent system has to answer one question: how much should the agent do on its own before checking in with a human? The naive answer is "let it run." The production answer is: every tool your agent can call should be explicitly approved for autonomous execution, or it should require human sign-off first.

Think of it as a whitelist. Some tools are cheap, fast, and reversible: reading market data, running a screener, generating a plan. Those can run without asking. Others are expensive, slow, or irreversible: submitting a live trade, deploying a strategy, deleting something. Those should pause and wait. The toggle in Aurora's UI is the implementation of that concept.
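The whitelist idea reduces to a small gate in front of every tool call. This sketch is illustrative only — the tool names, mode strings, and a hypothetical per-action "whitelist" mode are assumptions, not Aurora's real configuration:

```python
AUTO_SAFE = {"read_market_data", "run_screener", "generate_plan"}  # cheap, reversible

def needs_approval(action: str, mode: str) -> bool:
    if mode == "fully_automated":
        return False              # every tool call executes immediately
    if mode == "semi_automated":
        return True               # pause before every action
    # hypothetical per-action mode: only tools outside the safe set pause
    return action not in AUTO_SAFE

def run_action(action, args, tools, mode, approve):
    if needs_approval(action, mode) and not approve(action, args):
        return {"status": "rejected", "action": action}
    return {"status": "done", "result": tools[action](args)}

tools = {"run_screener": lambda a: ["NVDA", "AMD"],
         "submit_trade": lambda a: "order-123"}
# reading data runs without asking; a live trade pauses and can be rejected
print(run_action("run_screener", {}, tools, "whitelist", approve=lambda *_: False))
print(run_action("submit_trade", {}, tools, "whitelist", approve=lambda *_: False))
```

The `approve` callback is where the UI modal lives: in production it blocks until the human clicks Approve or Reject.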

Aurora has two modes:

  • Fully Automated — runs the entire loop without stopping. Every tool call executes immediately. You see the results when it's done.
  • Semi-Automated — pauses before every action for your approval. You see the Thought and the proposed Action before anything executes.

Aurora in Semi-Automated mode — pauses for approval at each action

In semi-automated mode, Aurora shows you its Thought and proposed Action before running any tool. You can approve, reject with feedback, or switch to fully automated if you've seen enough to trust it.

Agent Approval Required — plan modal showing strategy, Approve and Reject buttons

This is what "human-in-the-loop" actually means in production. It's not a philosophical stance about AI safety. It's a checkbox in the UI. Most experienced users start in semi-automated mode to verify the plan, then switch to fully automated once they trust the direction.

Q: Why use subagents instead of giving one agent all the tools and letting it run everything?

A: Context window limits. By iteration 15-20, thousands of tokens of Thought/Action/Observation history are accumulating and reasoning quality drops. Subagents keep each context small and focused on one task.
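The mechanics behind that answer can be sketched in a few lines. Each subagent gets a fresh context — a system prompt plus a one-paragraph brief, not the parent's accumulated history — which also makes them trivially parallelizable. This is illustrative, not Aurora's actual subagent implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(llm, task: str, brief: str) -> str:
    # Fresh context: two small messages, not 20 iterations of
    # Thought/Action/Observation history inherited from the parent.
    messages = [
        {"role": "system", "content": "You are a focused research subagent."},
        {"role": "user", "content": f"Context: {brief}\nTask: {task}"},
    ]
    return llm(messages)

def fan_out(llm, tasks, brief):
    # Independent contexts mean the subagents can run concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: run_subagent(llm, t, brief), tasks))
```

The parent agent then folds the returned findings back into its own loop as observations, so only the conclusions — not the subagents' working history — occupy its context.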


The Engineering Problem Nobody Talks About

Long loops make agents dumb. Here's how Aurora solves it.

The ReAct loop has a problem that shows up around iteration 15-20: the context window fills up. Every Thought, Action, and Observation gets appended to the conversation. By the time you're 20 iterations deep, the model is attending to thousands of tokens just to decide what to do next. Reasoning quality drops. The agent starts making worse decisions.

The standard advice is "use subagents to keep contexts small." That's true and Aurora does it. But there's a second mechanism that's less talked about: conversation summarization.

At iteration 20, Aurora doesn't stop. It summarizes. The model compresses everything it has learned: findings, portfolios created, what worked, what didn't. That summary becomes the context for a new conversation. The loop restarts with a clean window and the knowledge of everything that came before.

The hard cap is separate: totalIterations accumulates across all conversation resets. When you hit your configured maximum (default: 20 total, up to 50 on premium), the agent stops and delivers a final answer with whatever it accomplished. The summarization is how it stays sharp. The hard cap is how you control cost.
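The interplay of the two mechanisms fits in one function: the message window resets every time it fills, while the running total keeps counting toward the hard cap across resets. A minimal sketch, with illustrative names and thresholds matching the numbers above:

```python
SUMMARIZE_EVERY = 20   # window size that triggers a summarization reset

def run_with_resets(step_fn, summarize_fn, hard_cap: int = 50):
    messages, total = [], 0
    while total < hard_cap:
        output = step_fn(messages)
        messages.append(output)
        total += 1
        if output == "final_answer":
            return messages, total
        if len(messages) >= SUMMARIZE_EVERY:
            # compress the window into a single summary message;
            # total keeps accumulating across the reset
            messages = [summarize_fn(messages)]
    return messages, total   # hit the hard cap

# An agent that never finishes gets two summarization resets, then the cap:
msgs, total = run_with_resets(lambda m: "step", lambda m: "summary")
print(total, msgs[0])   # 50 summary
```

Note that `summarize_fn` is itself an LLM call, so in a production system it needs to be token-metered like everything else.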

Aurora also queries its own memory across sessions. If you've run five agent tasks this week, it can synthesize findings from all of them and bring relevant context into a new run. That's not the base model, which has no memory. That's the app layer, built on top of a stateless model, giving it continuity across sessions that the model itself can't have.
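A toy version of that app-layer memory looks like this: persist each run's findings, then pull the most relevant ones into the next session's context. This sketch scores relevance by keyword overlap purely for illustration — a real system would typically use embedding similarity — and none of the names are Aurora's:

```python
memory = []   # would be a database in production

def save_run(task: str, findings: str) -> None:
    memory.append({"task": task, "findings": findings})

def recall(query: str, top_k: int = 3) -> list[str]:
    # naive relevance: count shared words between the query and each task
    words = set(query.lower().split())
    scored = sorted(memory,
                    key=lambda m: len(words & set(m["task"].lower().split())),
                    reverse=True)
    return [m["findings"] for m in scored[:top_k]]

save_run("momentum strategy for NVDA", "RSI-based entries beat SMA crossover")
save_run("0DTE SPY options research", "IV crush dominates after 2pm")
print(recall("build an NVDA strategy"))
```

Whatever `recall` returns gets prepended to the new session's context, which is all "memory" means for a stateless model: the app decides what the model gets to see again.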

Aurora memory query — synthesizing findings from past agent runs into a new session


Module 3

Reading about the loop is not the same as running it.

The ReAct loop looks simple on paper. In practice, the interesting questions are the ones that only come up when you run it: Why did it pick that tool instead of a different one? Why did the reasoning change after iteration 5? What happens when a tool call fails?

Module 3 of AI Agents from Scratch puts you in the loop directly. You send Aurora a real task in fully automated mode and watch each Thought → Action → Observation cycle play out in real time. Then you switch to semi-automated and approve or reject actions one at a time.

You'll also see what happens when context starts to accumulate, when the summarization triggers, and what the compressed summary actually looks like. These aren't simulations. It's the live production system.

Aurora agent — detailed plan with subagent spawn visible in approval modal

Start Module 3 — free, no credit card

Or open Aurora directly


Part 3 of 5 in the AI Agents from Scratch series.

Try NexusTrade's AI trading agent free: https://nexustrade.io

Top comments (0)