Sumit V

Building an AI Agent from Scratch: A Step-by-Step Journey

AI agents aren't magic. They're just a loop. This workshop breaks down how to build a real Todo Agent in 11 commits, teaching you the core pattern behind every AI agent system.

Everyone's talking about AI agents, but most explanations are either too abstract ("agents can reason and act!") or too complex (production systems with 50 dependencies).

This workshop takes a different approach: build one from scratch, one commit at a time. By the end, you'll understand the fundamental loop that powers everything from ChatGPT plugins to autonomous coding assistants.

GitHub Repository: https://github.com/sumitvairagar/simple-agent-workshop

The Journey: 11 Commits to Understanding

Step 1: Set Up the Project Skeleton

Every project starts somewhere. We begin with just a .gitignore and a README. Nothing runs yet, but we know what we're building: a Todo Assistant that can add tasks, list them, mark them done, and even prioritize using AI.

Key Learning: Start simple. Define the goal before writing code.


Step 2: Add Dependencies and TypeScript Config

We install the essentials:

  • openai → Official SDK to talk to GPT
  • dotenv → Load API keys securely
  • tsx → Run TypeScript without compilation
  • typescript → Type safety and autocomplete

Key Learning: Modern AI development needs surprisingly few dependencies.


Step 3: Say Hello to GPT 👋

Our first message to GPT and back!

import "dotenv/config";   // loads OPENAI_API_KEY from .env
import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY automatically

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }]
});

console.log(response.choices[0].message.content);

What to Notice:

  • We send a messages array with a role and content
  • GPT replies in choices[0].message.content
  • finish_reason tells us WHY GPT stopped ("stop" = it's done)

Key Learning: The OpenAI API is just HTTP requests. No magic.
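To see that it really is just HTTP, here's a sketch (not from the workshop; `buildChatRequest` is a hypothetical helper) that constructs the same request by hand, using the public Chat Completions endpoint:

```typescript
// Build the raw HTTP request that the SDK assembles for us.
// The result can be passed straight to fetch().
function buildChatRequest(apiKey: string, content: string) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4o",
        messages: [{ role: "user", content }],
      }),
    },
  };
}

// Usage (uncomment to actually call the API):
// const { url, init } = buildChatRequest(process.env.OPENAI_API_KEY!, "Hello!");
// const data = await (await fetch(url, init)).json();
// console.log(data.choices[0].message.content);
```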


Step 4: Give GPT a Memory with Conversation History

The Problem: GPT has NO memory between calls. Every request starts fresh.

The Solution: Keep a messages array and send the full conversation every time.

The Golden Rule:

  1. Push the user message into history
  2. Call the API with the full history array
  3. Push GPT's reply into history too — never skip this!

const messages = [];
messages.push({ role: "user", content: "My name is Alice" });
// ... call API ...
messages.push(response.choices[0].message);

messages.push({ role: "user", content: "What's my name?" });
// GPT correctly remembers: "Your name is Alice"

Key Learning: Conversation memory is just an array. You manage it, not GPT.


Step 5: Stream GPT's Reply Word by Word ✨

Instead of waiting for the full response, tokens appear as GPT writes them. This makes everything feel alive and instant.

const stream = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}

Key Learning: Streaming is the difference between "loading..." and feeling like you're talking to something intelligent.


Step 6: Give GPT Tools to Work With 🔧

Tools give GPT superpowers beyond just chatting. Each tool is a description that tells GPT:

  • What it's called (add_todo, list_todos, mark_done)
  • When to use it (the description)
  • What inputs to pass
const tools = [
  {
    type: "function",
    function: {
      name: "add_todo",
      description: "Add a new task to the todo list",
      parameters: {
        type: "object",
        properties: {
          task: { type: "string", description: "The task to add" }
        },
        required: ["task"]
      }
    }
  }
];

Important: GPT does NOT run tools itself. It just tells us "please run add_todo with this input" and we do the actual work.

When GPT wants a tool, finish_reason changes to "tool_calls".

Key Learning: Tools are just JSON schemas. GPT reads them and decides when to use them.


Step 7: Actually Run the Tools GPT Asks For 🏃

Now we close the loop — when GPT says "call this tool", we do it!

The Flow:

  1. Find all tool_calls in the response
  2. Run the matching function (add_todo, list_todos, mark_done)
  3. Push each result back as a "tool" role message
  4. Include the tool_call_id so GPT knows which result matches which request
for (const toolCall of response.choices[0].message.tool_calls) {
  // arguments arrive as a JSON string, so parse before executing
  const args = JSON.parse(toolCall.function.arguments);
  const result = executeTool(toolCall.function.name, args);

  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify(result)
  });
}

Key Learning: Tool execution is just function calls. You write the functions, GPT decides when to call them.
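The executeTool dispatcher is left abstract above. Here's a minimal sketch, assuming todos live in an in-memory array (the real repo may structure this differently):

```typescript
// Hypothetical in-memory store and dispatcher for the three tools.
type Todo = { id: number; task: string; done: boolean };
const todos: Todo[] = [];
let nextId = 1;

function executeTool(name: string, args: { task?: string; id?: number }) {
  switch (name) {
    case "add_todo": {
      const todo = { id: nextId++, task: args.task ?? "", done: false };
      todos.push(todo);
      return todo;
    }
    case "list_todos":
      return todos;
    case "mark_done": {
      const todo = todos.find((t) => t.id === args.id);
      if (todo) todo.done = true;
      return todo ?? { error: `No todo with id ${args.id}` };
    }
    default:
      return { error: `Unknown tool: ${name}` };
  }
}
```

Note that errors are returned as objects instead of thrown: the result goes back to GPT as a tool message either way, so the model can explain the failure to the user.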


Step 8: The Agent Loop — This Is Where the Magic Happens 🤖

This is THE commit. Everything before was setup. This is the agent.

The whole idea in plain English:

Keep asking GPT what to do next.

If it wants a tool → run it, hand GPT the result, ask again.

If it says it's done → stop and show the answer.

while (true) {
  const response = await callGPT(messages);
  const choice = response.choices[0];
  messages.push(choice.message); // keep the assistant turn in history

  if (choice.finish_reason === "tool_calls") {
    // Run the tools and push their results into messages
    executeTools(choice.message.tool_calls);
    continue; // Ask GPT again
  }

  if (choice.finish_reason === "stop") {
    break; // GPT is done
  }
}

That loop is what makes this an "agent" instead of a chatbot.

GPT decides:

  • Which tools to call
  • How many times
  • When to stop

We just follow its lead.

Key Learning: The agent loop is the heart of every AI agent system. Master this pattern, and you understand 90% of AI agents.


Step 9: Add a System Prompt to Guide the Agent's Behaviour

Without instructions, GPT will do something... but inconsistently.

The system prompt is like a job description — it tells GPT:

  • What role it's playing (todo list assistant)
  • The rules to follow (add, list, mark done)
  • How to behave (brief and friendly)
messages.push({
  role: "system",
  content: `You are a helpful todo list assistant.

Rules:
- Use add_todo to add tasks
- Use list_todos to show all tasks
- Use mark_done to complete tasks
- Be brief and friendly`
});

Try This: Comment out the system prompt and run again. The difference is night and day.

Key Learning: System prompts are how you control agent behavior. They're more important than you think.


Step 10: Stream the Final Answer to the User in Real Time

After all the tools are done, we do one final call to GPT and stream the summary token by token to the terminal.

Why Two Separate Calls?

  • During the loop we need the complete response so we can inspect its tool-call requests
  • Streaming while handling tool calls is messy: the tool-call arguments arrive as fragmented deltas you would have to reassemble yourself
  • So: plain calls for tool use in the loop, streaming for the final pretty answer

Key Learning: Production systems often separate "thinking" (tool use) from "presentation" (streaming).
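
A sketch of that final call; the `streamFinalAnswer` name is mine, and `openai` is the client from the earlier steps (typed loosely here to keep the sketch self-contained):

```typescript
// After the tool loop finishes, make one last call with stream: true
// and print tokens as they arrive.
async function streamFinalAnswer(openai: any, messages: any[]): Promise<string> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content || "";
    process.stdout.write(token); // show each token immediately
    full += token;
  }
  return full; // keep the full text so it can be pushed back into history
}
```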


Step 11: Add a Prioritize Tool That Calls GPT Under the Hood 🪆

One GPT call using another GPT call as a tool.

The main agent manages the todo workflow. When the user asks to prioritize, it hands off the thinking to a smaller focused GPT call (gpt-4o-mini) that reorders tasks.

Main agent → calls prioritize → GPT sub-call → reordered list → back to main agent

Key Learning: This "agent inside an agent" pattern is how real production systems handle tasks that are too big or specialized for one model to do alone.
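
A hedged sketch of what that handoff can look like; `prioritizeTodos` and the prompt wording are mine, not the repo's:

```typescript
// From the agent's point of view, prioritize is just another function.
// Inside, it makes its own GPT call with a narrow, focused prompt.
async function prioritizeTodos(openai: any, tasks: string[]): Promise<string[]> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // smaller, cheaper model for a narrow job
    messages: [
      {
        role: "user",
        content:
          "Reorder these tasks from most to least urgent. " +
          "Reply with one task per line, nothing else:\n" +
          tasks.join("\n"),
      },
    ],
  });

  const text = response.choices[0].message.content ?? "";
  return text
    .split("\n")
    .map((line: string) => line.trim())
    .filter(Boolean); // drop blank lines from the model's reply
}
```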


The Big Picture

After these 11 steps, you've built a real AI agent. Not a toy, not a demo — a working system that:

  • Maintains conversation memory
  • Uses tools to interact with the world
  • Makes decisions autonomously
  • Streams responses for great UX
  • Can even delegate to other AI calls

And the core pattern? It's just a loop:

1. User sends message
2. GPT decides which tool to use
3. Execute the tool
4. Send result back to GPT
5. GPT responds to user

That's it. That's the pattern behind ChatGPT plugins, GitHub Copilot, autonomous coding agents, and every other AI agent system.
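
Those five steps collapse into one small function. This is a simplified sketch under my own naming; `openai`, `tools`, and the tool executor come from the earlier steps and are passed in here to keep the sketch self-contained:

```typescript
// One user turn: loop until GPT stops asking for tools, then return its answer.
async function runTurn(
  openai: any,
  messages: any[],
  tools: any[],
  execute: (name: string, args: any) => unknown
) {
  while (true) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages,
      tools,
    });
    const choice = response.choices[0];
    messages.push(choice.message); // keep the assistant turn in history

    if (choice.finish_reason !== "tool_calls") {
      return choice.message.content; // GPT is done; this is the final answer
    }

    // Run every requested tool and feed the results back
    for (const toolCall of choice.message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(execute(toolCall.function.name, args)),
      });
    }
  }
}
```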


Try It Yourself

The workshop is designed to be hands-on. Clone the repo and use the demo script to navigate through each commit:

git clone https://github.com/sumitvairagar/simple-agent-workshop.git
cd simple-agent-workshop
./simple-agent-workshop-demo.sh

Commands:

  • n — Next step
  • p — Previous step
  • l — List all steps
  • g 5 — Jump to step 5
  • q — Quit

At each step, read the code, run npm start, and see how it works.


Why Software Engineers Need to Understand This

AI agents aren't replacing software engineers. They're becoming a core tool in our toolkit.

Understanding how they work means:

  • You can build AI features into your products
  • You can debug when they fail (and they will)
  • You can architect systems that use AI effectively
  • You can evaluate which problems AI can actually solve

This isn't optional knowledge anymore. It's foundational.

Just as every engineer should understand HTTP, databases, and async programming, understanding the agent loop is becoming a core skill.


What's Next?

This workshop teaches the fundamentals. Real production systems add:

  • Error handling and retries
  • Rate limiting and cost controls
  • Observability and logging
  • Multi-agent orchestration
  • Memory systems beyond conversation history
  • Security and sandboxing

But they all build on this same core loop.

Master the basics first. Then level up.


Resources

  • Workshop repository: https://github.com/sumitvairagar/simple-agent-workshop

Built something cool with this? Share it in the comments! I'd love to see what you create.
