Sumit V

Building an AI Agent from Scratch: A Step-by-Step Journey

AI agents aren't magic. They're just a loop. This workshop breaks down how to build a real Todo Agent in 11 commits, teaching you the core pattern behind every AI agent system.

Everyone's talking about AI agents, but most explanations are either too abstract ("agents can reason and act!") or too complex (production systems with 50 dependencies).

This workshop takes a different approach: build one from scratch, one commit at a time. By the end, you'll understand the fundamental loop that powers everything from ChatGPT plugins to autonomous coding assistants.

GitHub Repository: https://github.com/sumitvairagar/simple-agent-workshop

The Journey: 11 Commits to Understanding

Step 1: Set Up the Project Skeleton

Every project starts somewhere. We begin with just a .gitignore and a README. Nothing runs yet, but we know what we're building: a Todo Assistant that can add tasks, list them, mark them done, and even prioritize using AI.

Key Learning: Start simple. Define the goal before writing code.


Step 2: Add Dependencies and TypeScript Config

We install the essentials:

  • openai → Official SDK to talk to GPT
  • dotenv → Load API keys securely
  • tsx → Run TypeScript without compilation
  • typescript → Type safety and autocomplete

Key Learning: Modern AI development needs surprisingly few dependencies.


Step 3: Say Hello to GPT 👋

Our first message to GPT and back!

import "dotenv/config";   // loads OPENAI_API_KEY from .env
import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY automatically

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }]
});

console.log(response.choices[0].message.content);

What to Notice:

  • We send a messages array with a role and content
  • GPT replies in choices[0].message.content
  • finish_reason tells us WHY GPT stopped ("stop" = it's done)

Key Learning: The OpenAI API is just HTTP requests. No magic.
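To see that it really is just HTTP, here's a sketch (not from the workshop; `buildChatRequest` is a hypothetical helper) that constructs the same request by hand, using the public Chat Completions endpoint:

```typescript
// Build the raw HTTP request that the SDK assembles for us.
// The result can be passed straight to fetch().
function buildChatRequest(apiKey: string, content: string) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4o",
        messages: [{ role: "user", content }],
      }),
    },
  };
}

// Usage (uncomment to actually call the API):
// const { url, init } = buildChatRequest(process.env.OPENAI_API_KEY!, "Hello!");
// const data = await (await fetch(url, init)).json();
// console.log(data.choices[0].message.content);
```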


Step 4: Give GPT a Memory with Conversation History

The Problem: GPT has NO memory between calls. Every request starts fresh.

The Solution: Keep a messages array and send the full conversation every time.

The Golden Rule:

  1. Push the user message into history
  2. Call the API with the full history array
  3. Push GPT's reply into history too — never skip this!

const messages = [];
messages.push({ role: "user", content: "My name is Alice" });
// ... call API ...
messages.push(response.choices[0].message);

messages.push({ role: "user", content: "What's my name?" });
// GPT correctly remembers: "Your name is Alice"

Key Learning: Conversation memory is just an array. You manage it, not GPT.


Step 5: Stream GPT's Reply Word by Word ✨

Instead of waiting for the full response, tokens appear as GPT writes them. This makes everything feel alive and instant.

const stream = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}

Key Learning: Streaming is the difference between "loading..." and feeling like you're talking to something intelligent.


Step 6: Give GPT Tools to Work With 🔧

Tools give GPT superpowers beyond just chatting. Each tool is a description that tells GPT:

  • What it's called (add_todo, list_todos, mark_done)
  • When to use it (the description)
  • What inputs to pass
const tools = [
  {
    type: "function",
    function: {
      name: "add_todo",
      description: "Add a new task to the todo list",
      parameters: {
        type: "object",
        properties: {
          task: { type: "string", description: "The task to add" }
        },
        required: ["task"]
      }
    }
  }
];

Important: GPT does NOT run tools itself. It just tells us "please run add_todo with this input" and we do the actual work.

When GPT wants a tool, finish_reason changes to "tool_calls".

Key Learning: Tools are just JSON schemas. GPT reads them and decides when to use them.


Step 7: Actually Run the Tools GPT Asks For 🏃

Now we close the loop — when GPT says "call this tool", we do it!

The Flow:

  1. Find all tool_calls in the response
  2. Run the matching function (add_todo, list_todos, mark_done)
  3. Push each result back as a "tool" role message
  4. Include the tool_call_id so GPT knows which result matches which request
for (const toolCall of response.choices[0].message.tool_calls) {
  // arguments arrive as a JSON string, so parse before executing
  const args = JSON.parse(toolCall.function.arguments);
  const result = executeTool(toolCall.function.name, args);

  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify(result)
  });
}

Key Learning: Tool execution is just function calls. You write the functions, GPT decides when to call them.
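The executeTool dispatcher is left abstract above. Here's a minimal sketch, assuming todos live in an in-memory array (the real repo may structure this differently):

```typescript
// Hypothetical in-memory store and dispatcher for the three tools.
type Todo = { id: number; task: string; done: boolean };
const todos: Todo[] = [];
let nextId = 1;

function executeTool(name: string, args: { task?: string; id?: number }) {
  switch (name) {
    case "add_todo": {
      const todo = { id: nextId++, task: args.task ?? "", done: false };
      todos.push(todo);
      return todo;
    }
    case "list_todos":
      return todos;
    case "mark_done": {
      const todo = todos.find((t) => t.id === args.id);
      if (todo) todo.done = true;
      return todo ?? { error: `No todo with id ${args.id}` };
    }
    default:
      return { error: `Unknown tool: ${name}` };
  }
}
```

Note that errors are returned as objects instead of thrown: the result goes back to GPT as a tool message either way, so the model can explain the failure to the user.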


Step 8: The Agent Loop — This Is Where the Magic Happens 🤖

This is THE commit. Everything before was setup. This is the agent.

The whole idea in plain English:

Keep asking GPT what to do next.

If it wants a tool → run it, hand GPT the result, ask again.

If it says it's done → stop and show the answer.

while (true) {
  const response = await callGPT(messages);
  const choice = response.choices[0];
  messages.push(choice.message); // keep the assistant turn in history

  if (choice.finish_reason === "tool_calls") {
    // Run the tools and push their results into messages
    executeTools(choice.message.tool_calls);
    continue; // Ask GPT again
  }

  if (choice.finish_reason === "stop") {
    break; // GPT is done
  }
}

That loop is what makes this an "agent" instead of a chatbot.

GPT decides:

  • Which tools to call
  • How many times
  • When to stop

We just follow its lead.

Key Learning: The agent loop is the heart of every AI agent system. Master this pattern, and you understand 90% of AI agents.


Step 9: Add a System Prompt to Guide the Agent's Behaviour

Without instructions, GPT will do something... but inconsistently.

The system prompt is like a job description — it tells GPT:

  • What role it's playing (todo list assistant)
  • The rules to follow (add, list, mark done)
  • How to behave (brief and friendly)
messages.push({
  role: "system",
  content: `You are a helpful todo list assistant.

Rules:
- Use add_todo to add tasks
- Use list_todos to show all tasks
- Use mark_done to complete tasks
- Be brief and friendly`
});

Try This: Comment out the system prompt and run again. The difference is night and day.

Key Learning: System prompts are how you control agent behavior. They're more important than you think.


Step 10: Stream the Final Answer to the User in Real Time

After all the tools are done, we do one final call to GPT and stream the summary token by token to the terminal.

Why Two Separate Calls?

  • During the loop we need the complete response so we can inspect its tool-call requests
  • Streaming while handling tool calls is messy: the tool-call arguments arrive as fragmented deltas you would have to reassemble yourself
  • So: plain calls for tool use in the loop, streaming for the final pretty answer

Key Learning: Production systems often separate "thinking" (tool use) from "presentation" (streaming).
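
A sketch of that final call; the `streamFinalAnswer` name is mine, and `openai` is the client from the earlier steps (typed loosely here to keep the sketch self-contained):

```typescript
// After the tool loop finishes, make one last call with stream: true
// and print tokens as they arrive.
async function streamFinalAnswer(openai: any, messages: any[]): Promise<string> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content || "";
    process.stdout.write(token); // show each token immediately
    full += token;
  }
  return full; // keep the full text so it can be pushed back into history
}
```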


Step 11: Add a Prioritize Tool That Calls GPT Under the Hood 🪆

One GPT call using another GPT call as a tool.

The main agent manages the todo workflow. When the user asks to prioritize, it hands off the thinking to a smaller focused GPT call (gpt-4o-mini) that reorders tasks.

Main agent → calls prioritize → GPT sub-call → reordered list → back to main agent

Key Learning: This "agent inside an agent" pattern is how real production systems handle tasks that are too big or specialized for one model to do alone.
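
A hedged sketch of what that handoff can look like; `prioritizeTodos` and the prompt wording are mine, not the repo's:

```typescript
// From the agent's point of view, prioritize is just another function.
// Inside, it makes its own GPT call with a narrow, focused prompt.
async function prioritizeTodos(openai: any, tasks: string[]): Promise<string[]> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // smaller, cheaper model for a narrow job
    messages: [
      {
        role: "user",
        content:
          "Reorder these tasks from most to least urgent. " +
          "Reply with one task per line, nothing else:\n" +
          tasks.join("\n"),
      },
    ],
  });

  const text = response.choices[0].message.content ?? "";
  return text
    .split("\n")
    .map((line: string) => line.trim())
    .filter(Boolean); // drop blank lines from the model's reply
}
```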


The Big Picture

After these 11 steps, you've built a real AI agent. Not a toy, not a demo — a working system that:

  • Maintains conversation memory
  • Uses tools to interact with the world
  • Makes decisions autonomously
  • Streams responses for great UX
  • Can even delegate to other AI calls

And the core pattern? It's just a loop:

1. User sends message
2. GPT decides which tool to use
3. Execute the tool
4. Send result back to GPT
5. GPT responds to user

That's it. That's the pattern behind ChatGPT plugins, GitHub Copilot, autonomous coding agents, and every other AI agent system.
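
Those five steps collapse into one small function. This is a simplified sketch under my own naming; `openai`, `tools`, and the tool executor come from the earlier steps and are passed in here to keep the sketch self-contained:

```typescript
// One user turn: loop until GPT stops asking for tools, then return its answer.
async function runTurn(
  openai: any,
  messages: any[],
  tools: any[],
  execute: (name: string, args: any) => unknown
) {
  while (true) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages,
      tools,
    });
    const choice = response.choices[0];
    messages.push(choice.message); // keep the assistant turn in history

    if (choice.finish_reason !== "tool_calls") {
      return choice.message.content; // GPT is done; this is the final answer
    }

    // Run every requested tool and feed the results back
    for (const toolCall of choice.message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(execute(toolCall.function.name, args)),
      });
    }
  }
}
```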


Try It Yourself

The workshop is designed to be hands-on. Clone the repo and use the demo script to navigate through each commit:

git clone https://github.com/sumitvairagar/simple-agent-workshop.git
cd simple-agent-workshop
./simple-agent-workshop-demo.sh

Commands:

  • n — Next step
  • p — Previous step
  • l — List all steps
  • g 5 — Jump to step 5
  • q — Quit

At each step, read the code, run npm start, and see how it works.


Why Software Engineers Need to Understand This

AI agents aren't replacing software engineers. They're becoming a core tool in our toolkit.

Understanding how they work means:

  • You can build AI features into your products
  • You can debug when they fail (and they will)
  • You can architect systems that use AI effectively
  • You can evaluate which problems AI can actually solve

This isn't optional knowledge anymore. It's foundational.

Just as every engineer should understand HTTP, databases, and async programming, understanding the agent loop is becoming a core skill.


What's Next?

This workshop teaches the fundamentals. Real production systems add:

  • Error handling and retries
  • Rate limiting and cost controls
  • Observability and logging
  • Multi-agent orchestration
  • Memory systems beyond conversation history
  • Security and sandboxing

But they all build on this same core loop.

Master the basics first. Then level up.


Resources

  • Workshop repository: https://github.com/sumitvairagar/simple-agent-workshop

Built something cool with this? Share it in the comments! I'd love to see what you create.
