
gsk-007


Building an AI Agent with Gemini and TypeScript

Hey devs! 👋

Recently, I took a deep dive into building AI agents — the kind that can think, plan, and act on your behalf. Inspired by Scott Moss’s "Agent From Scratch" course, I decided to reimplement the core ideas using Google's Gemini API and a modern TypeScript + Node.js stack.

The result is a modular, extensible, and hackable project:
👉 github.com/gsk-007/ai-agent-gemini


🧠 What the Agent Does

This AI agent:

  1. Takes in a user-defined goal
  2. Uses Gemini 2.0 Flash to reason through the steps
  3. Executes actions via pluggable tools (like fetching Reddit posts or generating images)
  4. Stores memory between steps
  5. Loops until the goal is completed — completely autonomously
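
The five steps above can be sketched as a small loop. This is a minimal illustration, not the code from the repository: `runAgent`, `ModelReply`, and the `maxSteps` cap are hypothetical names, and the model is passed in as a plain function so the loop can be tried without an API key.

```typescript
// Hypothetical shapes for illustration; none of these names come from the repository.
type ModelReply =
  | { kind: "tool"; tool: string; input: string } // the model wants to act
  | { kind: "done"; answer: string }; // the model considers the goal met

type Model = (goal: string, memory: string[]) => Promise<ModelReply>;
type ToolFn = (input: string) => Promise<string>;

async function runAgent(
  goal: string,
  model: Model,
  tools: Map<string, ToolFn>,
  maxSteps = 10, // assumed safety cap so a confused model cannot loop forever
): Promise<{ answer: string | null; memory: string[] }> {
  const memory: string[] = [];

  for (let step = 0; step < maxSteps; step++) {
    // Reason: ask the model what to do next, given the goal and prior results.
    const reply = await model(goal, memory);
    if (reply.kind === "done") {
      return { answer: reply.answer, memory };
    }

    // Execute: run the requested tool, or record that it does not exist.
    const tool = tools.get(reply.tool);
    const result = tool
      ? await tool(reply.input)
      : `Unknown tool: ${reply.tool}`;

    // Remember: store the outcome so the next step can build on it.
    memory.push(`${reply.tool}(${reply.input}) -> ${result}`);
  }

  // Step budget exhausted without the model declaring the goal complete.
  return { answer: null, memory };
}
```

In a real setup, the `model` argument would wrap a Gemini call and the `tools` map would hold the Reddit, dad joke, and image tools.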

⚙️ Tech Stack

Here’s what powers the project under the hood:

🔧 Core Technologies

  • TypeScript – Strictly typed and modular
  • Node.js (via Volta) – Runtime (v20.17.0)
  • Google Gemini API – Language + image generation
  • LowDB – Lightweight JSON-based memory system
  • dotenv – Loads environment variables from a .env file
  • Ora + Colors – Friendly CLI feedback
  • TSX – Seamless TypeScript execution during development

🔌 Tool Integrations

The real power comes from its extensible tooling system. Right now, the agent supports:

  1. Reddit Reader
    Fetches trending posts from
    https://www.reddit.com/.json?limit=5

  2. Dad Joke Fetcher
    Uses the classic
    https://icanhazdadjoke.com/ API

  3. Gemini Image Generator
    Converts text prompts into images using Gemini’s multimodal API

You can easily add your own tools by following a consistent interface pattern. Tools are dynamically selected by the agent based on task needs.
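
To give a feel for what a "consistent interface pattern" could look like, here is a rough sketch. The `Tool` interface, `ToolRegistry` class, and `makeDadJokeTool` factory are assumptions made for illustration, and the repository's actual contract may differ. The HTTP call is injected as a `fetchJson` function (also hypothetical) so the example runs without network access.

```typescript
// Hypothetical tool contract; the repository's actual interface may differ.
interface Tool {
  name: string;
  description: string; // shown to the model so it can pick the right tool
  run(input: string): Promise<string>;
}

// Minimal fetch-like signature (an assumption) so tools can be tried offline.
type FetchJson = (url: string, headers: Record<string, string>) => Promise<unknown>;

function makeDadJokeTool(fetchJson: FetchJson): Tool {
  return {
    name: "dad_joke",
    description: "Returns a random dad joke.",
    async run() {
      // icanhazdadjoke.com returns JSON when asked via the Accept header.
      const data = await fetchJson("https://icanhazdadjoke.com/", {
        Accept: "application/json",
      });
      const joke =
        typeof data === "object" && data !== null
          ? (data as { joke?: unknown }).joke
          : undefined;
      return typeof joke === "string" ? joke : "No joke available.";
    },
  };
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // One line per tool, suitable for appending to a system prompt.
  describe(): string {
    return Array.from(this.tools.values())
      .map((t) => `- ${t.name}: ${t.description}`)
      .join("\n");
  }

  async run(name: string, input: string): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.run(input);
  }
}
```

With this shape, adding a tool means writing one object with `name`, `description`, and `run`, then registering it. On Node 20, `fetchJson` could be a thin wrapper around the built-in `fetch`.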


🧱 Agent Architecture

The agent follows a simplified but powerful cognitive loop:

Goal -> Plan -> Reason -> Execute -> Remember -> Repeat

Each component is modular:

  • agent.ts: The main reasoning loop
  • ai.ts: Interacts with Gemini
  • toolRunner.ts: Delegates tool use
  • memory.ts: Stores past tasks
  • systemPrompt.ts: Shapes Gemini's behavior
  • ui.ts: Command-line interface

This decoupled design makes it ideal for building more advanced agents — from AutoGPT-like projects to task-specific copilots.
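
As one example of how small each module can be, here is a sketch of a memory component. It is an assumption-heavy stand-in: the project uses LowDB, while this version keeps entries in an array and serializes them to a JSON string so it has no dependencies. The `Memory` class and `MemoryEntry` fields are hypothetical names.

```typescript
// Hypothetical record shape; the repository may store different fields.
interface MemoryEntry {
  role: "user" | "assistant" | "tool";
  content: string;
  createdAt: string; // ISO 8601 timestamp
}

class Memory {
  private entries: MemoryEntry[] = [];

  // `now` is injectable so timestamps are predictable in tests.
  add(role: MemoryEntry["role"], content: string, now: Date = new Date()): void {
    this.entries.push({ role, content, createdAt: now.toISOString() });
  }

  // Return only the most recent entries so the prompt stays within a size budget.
  recent(limit: number): MemoryEntry[] {
    return limit > 0 ? this.entries.slice(-limit) : [];
  }

  // Serialize to JSON text, roughly what a JSON-file store would write to disk.
  serialize(): string {
    return JSON.stringify({ entries: this.entries }, null, 2);
  }

  // Rebuild a Memory from previously serialized JSON text.
  static deserialize(json: string): Memory {
    const parsed = JSON.parse(json) as { entries?: MemoryEntry[] };
    const memory = new Memory();
    memory.entries = Array.isArray(parsed.entries) ? parsed.entries : [];
    return memory;
  }
}
```

Because the rest of the agent would only call `add` and `recent`, the storage behind them can change without touching the reasoning loop.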


💡 Why This Can Be a Template

This project was designed to be plug-and-play:

  • Want to add search capabilities? Add a new tool.
  • Need better memory? Swap out LowDB for Pinecone or ChromaDB.
  • Want to run it on the web? Wire it into a React front-end or an Express API.

The base is strong — all you need to do is build on top.


🧪 Challenges & Lessons

  • Prompt engineering for Gemini: Getting reliable tool selection and reasoning took trial and error.
  • Streaming support: Streaming Gemini responses in Node was not straightforward in my setup, so feedback handling needed tweaks.
  • Image generation: The multimodal API is powerful, but requires slightly different prompting strategies.
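
On the tool selection point, one approach that may help is to ask the model for a JSON reply and validate it before acting. The sketch below assumes a reply format of `{"tool": "<name>", "input": "<text>"}`. Both that format and the `parseToolCall` helper are hypothetical and are not taken from the project.

```typescript
// Assumed reply format: {"tool": "<name>", "input": "<text>"}.
type ParsedCall =
  | { ok: true; tool: string; input: string }
  | { ok: false; error: string };

function parseToolCall(raw: string, allowedTools: string[]): ParsedCall {
  // Models often wrap JSON in extra prose; keep only the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "No JSON object found in reply." };
  }

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return { ok: false, error: "Reply contained invalid JSON." };
  }

  // Only accept tool names the agent actually knows about.
  const { tool, input } = data as { tool?: unknown; input?: unknown };
  if (typeof tool !== "string" || !allowedTools.includes(tool)) {
    return { ok: false, error: `Unknown or missing tool: ${String(tool)}` };
  }
  return { ok: true, tool, input: typeof input === "string" ? input : "" };
}
```

When parsing fails, the `error` string could be sent back to the model as the next message so it gets a chance to correct itself instead of crashing the loop.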

🚀 What’s Next

  • 🔍 Add a Google Search or Wikipedia tool
  • 📂 File system access for longer tasks
  • 🧠 Use vector memory for smarter recall
  • 🌐 Build a web UI with Next.js or Electron

📢 Final Thoughts

If you’re curious about building autonomous agents — not just running chatbots — this project is a great starting point.

Use it, fork it, break it, and make it your own. Let’s push the boundaries of what AI can automate for us.

🔗 GitHub Repo: github.com/gsk-007/ai-agent-gemini

