Austin Starks

Posted on Apr 13 • Originally published at nexustrade.io

Everyone thinks ChatGPT is an AI agent. It's not.

#programming #ai #chatgpt #tutorial

Note from the author: You're reading a Dev.to adaptation. The original on NexusTrade includes interactive trace viewers, animated diagrams, equity curve visualizations, and embedded course exercises. Read it there for the full experience.

Everyone thinks ChatGPT is an AI agent. It isn't.

It's a chatbot with tools. And that difference is the reason most "AI agent" startups don't actually work.

The distinction isn't semantic. It changes what you can build, what breaks, and why. If you're building an agent, evaluating one, or wondering why the product you're using doesn't do what it claims, this is the answer.

Start Here

A language model knows nothing. That's by design.

A raw language model is stateless. It has no memory of you. It doesn't know what happened in markets today. It can't look anything up. All it can do is take whatever text you hand it and predict what should come next.

That sounds limiting. It is. But it's also the foundation everything else builds on. The OpenAI Playground is the closest thing to a language model in its purest form. No apps layered on top. No tools. Just a system prompt, a conversation, and a model responding to exactly what you give it.

The OpenAI Playground shows you the raw model. No tools. No memory. No app layer. Ask it your name and it doesn't know. Give it your name in the system prompt and now it knows. Everything the model knows in a given conversation came from somewhere in the prompt: system message, user message, or tool results. Nothing else.

ChatGPT is an app built on top of that model. It knows your name because it has memory. It can search the web because it has tools. Those things aren't the model. They're layers the app added. Strip them away and you're back to the Playground.

ChatGPT still operates as a back-and-forth conversation where you're the one directing every move. That's a chatbot. An agent is something that can direct itself.

An agent runs a loop. It thinks, picks an action, executes it through a tool, observes the result, and repeats until the task is done or it can't continue. You don't direct each step. The agent does. That loop is what separates it from every chatbot you've ever used. Tools and system prompts are how you build the loop. The loop is what makes it an agent.

Q: What is the difference between ChatGPT and an AI agent?

A: ChatGPT responds once per message and waits for you to reply. An AI agent runs a loop: it thinks, takes an action through a tool, observes the result, and decides what to do next — all without waiting for you to direct each step.

System Prompts

The instructions the user never sees.

Before any user message reaches a language model in a production app, there's a system prompt. It runs first, every time. It tells the model who it is, what it can do, what format to respond in, and how to handle edge cases.

A well-designed system prompt isn't a paragraph of vague instructions. It has structure: an identity section, explicit directives, data sources or context, examples of correct behavior, and output format rules. The model's responses are only as good as the system prompt shaping them.

# INSTRUCTIONS
You are Aurora, an AI trading assistant for NexusTrade.
You help users build, backtest, and manage trading strategies.
Always respond in JSON. forceJSON: true.
Never recommend a specific stock without a supporting backtest.
If the request is ambiguous, ask one clarifying question before proceeding.

# EXAMPLES
User: "I want to back test a trading strategy"
Assistant: {"tool": "backtest", "portfolio_id": "...", "start": "2022-01-01", "end": "2024-01-01"}

User: "Screen for high momentum stocks"
Assistant: {"tool": "screener", "query": "SELECT ticker FROM stocks WHERE rsi_14 > 70 ORDER BY momentum DESC"}

# OUTPUT FORMAT
Always respond in syntactically valid JSON.
No markdown fences. No explanation unless explicitly asked.
Schema: {"tool": string, "parameters": object}

What makes that system prompt work? Each section has a specific job. Instructions pin the model's identity and hard constraints. If it's not written down, the model will invent behavior. Examples show the model what correct output looks like without having to explain it in prose — one good example beats three paragraphs of description. Output format eliminates ambiguity about structure. Without it, the model might respond in JSON sometimes and plain text other times, and your parser breaks.

The bad version of this prompt is four words: "You are a trading assistant." The model will try to be helpful and will fail in unpredictable ways. No output contract means you'll get markdown one response and raw JSON the next. No examples means the model guesses what "backtest" should return. No constraints means it'll recommend NVDA when it shouldn't, apologize when it doesn't need to, and ask five clarifying questions instead of one. Every missing line is a failure mode you'll discover in production.

Prompt engineering is designing the instructions that run silently before the user types anything. In production, that's the difference between an AI that does what you need and one that does something close but wrong in ways you can't predict.

Zero-shot vs. one-shot — same prompt, different parser:

Zero-shot (no examples)	One-shot (one example added)
Output format is inconsistent — sometimes JSON, sometimes prose	Output format is reliable — the model mirrors the example
Edge cases produce unpredictable structures	Edge cases degrade gracefully
Parser breaks in production	Parser handles it

From the course: In Module 2's first exercise, you build a real system prompt from scratch and run it against Gemini using a token grant we give you. You write the instructions, the examples, and the output format rules. Then you render it and see exactly what the model receives. Most people have never seen a production system prompt in full.

Tools

The AI doesn't execute anything. Your code does.

Here's the thing most people get wrong about AI agents: the model doesn't actually do anything. It generates text. Your system reads that text, figures out what to do with it, and executes the action. The result comes back. The model sees it and continues.

That's a tool call. The model outputs a structured JSON object that describes what it wants to do. Your code parses the JSON and runs the actual function. Nothing happens until your system does something with the output.

A concrete example. If the model outputs this:

{
  "tool": "backtest",
  "portfolio_id": "abc123",
  "start_date": "2022-01-01",
  "end_date": "2024-01-01"
}

The JSON itself does nothing. Your system reads it, calls the backtest API with those parameters, gets the results, and feeds them back into the conversation. Now the model can see what happened and decide what to do next.

This is why "the AI is doing it" is a slightly misleading frame. The AI is deciding what to do. Your infrastructure is doing it. The distinction matters because it means every tool an agent has is something a human explicitly built and wired up. Agents don't gain new capabilities on their own.

Q: An AI agent outputs a tool call to "buy 10 shares of AAPL." What actually executes the trade?

A: Your code does. The agent generates a JSON object describing what it wants. Your system parses it, calls the brokerage API with those parameters, and returns the result. The model never touches the market directly.

Production Reality

How this scales: 23 sub-prompts and one classifier.

Once you understand system prompts and tools, you can build an agent that does one thing well. The harder problem is building one that does many things well without the system prompt becoming impossible to maintain.

The answer most production apps land on is the same: don't build one giant prompt. Build many focused ones and route between them.

War story — Aurora V1 (2023): The first version ran on GPT-3. 2,048-token context window. One giant prompt, but the output window was so small it couldn't generate a full portfolio object in a single call. So I chained three separate prompts: portfolio → conditions → actions → an orchestration step to stitch the pieces together. JSON mode didn't exist yet. I'd instruct the model to respond in JSON, it would partially comply, I'd parse the output, watch it fail, then retry up to three times with a message explaining exactly where the JSON broke. Every prompt was a hardcoded string in the source code. Changing one instruction meant a code deploy. Aurora v1 did exactly one thing: create portfolios. That's it.

The classifier exists because I built that version first.

The controller is the decision layer that sits between the user and every sub-agent. In NexusTrade, every message you send to Aurora hits the classifier first. It reads your message and a list of 23 specialized sub-prompts, each with its own description. It picks the one that should handle your request and routes to it. That sub-prompt has a tight system prompt, a narrow tool list, and examples specific to its job. The main model only ever sees one task at a time.

The classifier is gemini-3.1-flash-lite-001 at temperature: 0 with forceJSON: true. Fast, cheap, deterministic. It runs on every message. The expensive models only run when a message reaches them.

Four engineering reasons this wins over a single giant prompt:

Focus. Each sub-prompt sees only the tools and instructions relevant to its task. The model isn't confused by 200 rules that don't apply.
Debuggability. When a route breaks, you know exactly which sub-prompt to fix. No hunting through a monolith.
Incremental scaling. Add a new capability by writing a new sub-prompt and a trigger description. Nothing else changes.
Cost control. Only the matched sub-prompt runs against the expensive model. The classifier is cheap by design.

This is the architecture almost every production AI app at scale converges on. ChatGPT's Custom GPTs are sub-prompts. Claude's Projects are sub-prompts. Cursor routes your request before invoking the right tool. You've been using this pattern without knowing what to call it.

One More Thing

MCP: the same concept with a standard interface.

The AI industry has a naming problem. Function calling, tool use, skills, MCP servers. They all describe the same core concept: a list of things the agent is allowed to do, with defined inputs and outputs, so it can generate parameters and your system can execute the call.

MCP (Model Context Protocol) is Anthropic's open standard for this. Think of it as USB for AI agents. Before USB, every device had its own connector. MCP creates one standard so any agent can connect to any tool that exposes an MCP server.

NexusTrade runs an MCP server. Here's what that actually looks like in practice. You add one entry to your Claude Desktop config:

{
  "mcpServers": {
    "nexustrade": {
      "url": "https://nexustrade.io/api/mcp",
      "headers": { "Authorization": "Bearer <your-api-key>" }
    }
  }
}

That's it. After that, you open Claude Desktop and ask: "What's the current RSI of NVDA?" — Claude calls screen_stocks on the NexusTrade MCP server, the server returns the live value, Claude reads it and responds with the number and what it means in context. The same tool engine Aurora uses inside NexusTrade. No copy-paste. No API docs. One tool implementation, available from any MCP-compatible client.

The name changes depending on the ecosystem. The pattern doesn't.

Module 2

Reading this isn't enough.

Reading about system prompts and writing one that works are different skills. Understanding the classifier pattern and knowing where it breaks are different things. The only way to close that gap is to build something and watch it fail.

Module 2 has two exercises built around this. In the first, you write a real system prompt from scratch (instructions, examples, output format) and render it against a live Gemini model using tokens we give you. You see exactly what the model receives and how it responds. In the second, you run the real NexusTrade classifier. You read the sub-prompt descriptions. You type messages and watch them route. Then you try to find edge cases that break it.

Both exercises use real infrastructure. Real models. Real NexusTrade prompts. Nothing is simulated.

Start Module 2 — free, no credit card

Part 2 of 5 in the AI Agents from Scratch series.

Try NexusTrade's AI trading agent free: https://nexustrade.io