
gsk-007


Building an AI Agent with Gemini and TypeScript

Hey devs! 👋

Recently, I took a deep dive into building AI agents: the kind that can think, plan, and act on your behalf. Inspired by Scott Moss's "Agent From Scratch" course, I decided to reimplement the core ideas using Google's Gemini API and a modern TypeScript + Node.js stack.

The result is a modular, extensible, and hackable project:
👉 github.com/gsk-007/ai-agent-gemini


🧠 What the Agent Does

This AI agent:

  1. Takes in a user-defined goal
  2. Uses Gemini 2.0 Flash to reason through the steps
  3. Executes actions via pluggable tools (like fetching Reddit posts or generating images)
  4. Stores memory between steps
  5. Loops autonomously until the goal is completed

⚙️ Tech Stack

Here's what powers the project under the hood:

🔧 Core Technologies

  • TypeScript – Strictly typed and modular
  • Node.js (via Volta) – Runtime (v20.17.0)
  • Google Gemini Pro – Language + image generation
  • LowDB – Lightweight JSON-based memory system
  • dotenv – Loads environment variables from a .env file
  • Ora + Colors – Friendly CLI feedback
  • TSX – Runs TypeScript directly during development

🔌 Tool Integrations

The real power comes from its extensible tooling system. Right now, the agent supports:

  1. Reddit Reader
    Fetches trending posts from
    https://www.reddit.com/.json?limit=5

  2. Dad Joke Fetcher
    Uses the classic
    https://icanhazdadjoke.com/ API

  3. Gemini Image Generator
    Converts text prompts into images using Gemini's multimodal API

You can easily add your own tools by following a consistent interface pattern. Tools are dynamically selected by the agent based on task needs.
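
To make the "consistent interface pattern" concrete, here is a minimal sketch of what a tool contract and registry could look like. The names `Tool`, `GetJson`, `createDadJokeTool`, and `ToolRegistry` are my own illustrations and are not taken from the repository, whose actual interface may differ. The only external detail relied on is that icanhazdadjoke.com returns JSON when the request sends an `Accept: application/json` header.

```typescript
// Hypothetical tool contract; the repository's real interface may differ.
interface Tool {
  name: string;
  description: string; // shown to the model so it can decide when to use the tool
  run(input: string): Promise<string>;
}

// Minimal HTTP helper type so a tool can be exercised without network access.
type GetJson = (url: string, headers: Record<string, string>) => Promise<unknown>;

// Default helper built on the fetch global that ships with Node 18+.
const getJson: GetJson = async (url, headers) => {
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
};

// Dad joke tool: the API answers with JSON when asked via the Accept header.
function createDadJokeTool(http: GetJson = getJson): Tool {
  return {
    name: "dad_joke",
    description: "Fetches a random dad joke.",
    async run() {
      const data = (await http("https://icanhazdadjoke.com/", {
        Accept: "application/json",
      })) as { joke?: string };
      return data.joke ?? "No joke returned.";
    },
  };
}

// Registry keyed by tool name, so adding a tool is a single register() call.
class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  list(): string[] {
    return Array.from(this.tools.keys());
  }

  async run(name: string, input: string): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.run(input);
  }
}
```

With a shape like this, registering a tool is `registry.register(createDadJokeTool())`, and the names from `registry.list()` can be handed to the model as the set of tools it may choose from. Passing the HTTP helper in as a parameter is a design choice that keeps the tool easy to test with a stub.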


🧱 Agent Architecture

The agent follows a simplified but powerful cognitive loop:

Goal -> Plan -> Reason -> Execute -> Remember -> Repeat

Each component is modular:

  • agent.ts: The main reasoning loop
  • ai.ts: Interacts with Gemini
  • toolRunner.ts: Delegates tool use
  • memory.ts: Stores past tasks
  • systemPrompt.ts: Shapes Gemini's behavior
  • ui.ts: Command-line interface

This decoupled design makes it a good base for building more advanced agents, from AutoGPT-like projects to task-specific copilots.
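
The Goal -> Plan -> Reason -> Execute -> Remember -> Repeat loop could be wired together roughly like this. Treat it as a sketch under assumptions, not the code in agent.ts: `Step`, `MemoryEntry`, `Planner`, `RunTool`, and `runAgent` are hypothetical names, the planner stands in for the Gemini call, and the `maxSteps` guard is my own addition.

```typescript
// All names here are illustrative; they are not taken from the repository.
type Step =
  | { type: "tool"; name: string; input: string } // the model wants to act
  | { type: "done"; answer: string }; // the model considers the goal met

interface MemoryEntry {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Stand-in for the Gemini call: given the history so far, pick the next step.
type Planner = (history: MemoryEntry[]) => Promise<Step>;

// Stand-in for the tool runner: execute a named tool and return its output.
type RunTool = (name: string, input: string) => Promise<string>;

async function runAgent(
  goal: string,
  plan: Planner,
  runTool: RunTool,
  maxSteps = 10, // guard so a confused model cannot loop forever
): Promise<{ answer: string; history: MemoryEntry[] }> {
  // Goal: the history starts with the user's request.
  const history: MemoryEntry[] = [{ role: "user", content: goal }];

  for (let i = 0; i < maxSteps; i++) {
    // Plan + Reason: ask the model what to do next.
    const step = await plan(history);
    if (step.type === "done") {
      history.push({ role: "assistant", content: step.answer });
      return { answer: step.answer, history };
    }

    // Execute: run the tool the model selected.
    const result = await runTool(step.name, step.input);

    // Remember: record both the call and its result for the next iteration.
    history.push({ role: "assistant", content: `call ${step.name}(${step.input})` });
    history.push({ role: "tool", content: result });
  }

  throw new Error(`Goal not reached within ${maxSteps} steps`);
}
```

Because the planner and tool runner are passed in as functions, the loop itself never needs to know about Gemini or any specific tool, which is the kind of decoupling the module list above describes.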


💡 Why This Can Be a Template

This project was designed to be plug-and-play:

  • Want to add search capabilities? Add a new tool.
  • Need better memory? Swap out LowDB for Pinecone or ChromaDB.
  • Want to run it on the web? Wire it into a React front-end or an Express API.

The base is strong; all you need to do is build on top.
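
Swapping the memory backend is easiest when the rest of the agent depends only on a small interface. The sketch below assumes a hypothetical `MemoryStore` contract and uses a plain in-memory array as a stand-in; it does not use LowDB's, Pinecone's, or ChromaDB's actual APIs, and none of these names come from the repository.

```typescript
// Hypothetical message shape and storage contract (not from the repository).
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface MemoryStore {
  add(message: Message): Promise<void>;
  recent(limit: number): Promise<Message[]>;
  clear(): Promise<void>;
}

// In-memory stand-in. A LowDB- or vector-database-backed class would
// implement the same three methods against its own storage.
class InMemoryStore implements MemoryStore {
  private messages: Message[] = [];

  async add(message: Message): Promise<void> {
    this.messages.push({ ...message }); // copy so later caller mutations don't leak in
  }

  async recent(limit: number): Promise<Message[]> {
    // slice(-0) would return everything, so handle non-positive limits explicitly.
    return limit > 0 ? this.messages.slice(-limit) : [];
  }

  async clear(): Promise<void> {
    this.messages = [];
  }
}

// Agent-side code depends only on the interface, so the backend is swappable.
async function buildContext(store: MemoryStore, limit: number): Promise<string> {
  const messages = await store.recent(limit);
  return messages.map((m) => `${m.role}: ${m.content}`).join("\n");
}
```

The methods are asynchronous even for the in-memory version so that a file-backed or network-backed store can slot in later without changing any caller.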


🧪 Challenges & Lessons

  • Prompt engineering for Gemini: Getting reliable tool selection and reasoning took trial and error.
  • Streaming support: Gemini doesn't stream easily via Node yet, so feedback handling needed tweaks.
  • Image generation: The multimodal API is powerful, but requires slightly different prompting strategies.
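
One way to make tool selection more dependable, alongside prompt tuning, is to validate whatever the model returns before acting on it. The sketch below assumes the model is prompted to answer with a JSON object like `{"tool": "...", "input": "..."}`; that response format and the names `ToolChoice`, `ParseResult`, and `parseToolChoice` are assumptions for illustration, not the project's actual protocol.

```typescript
// Hypothetical response format: {"tool": "<name>", "input": "<text>"}.
type ToolChoice = { tool: string; input: string };

type ParseResult =
  | { ok: true; choice: ToolChoice }
  | { ok: false; error: string }; // the error can be fed back to the model for a retry

function parseToolChoice(raw: string, allowedTools: string[]): ParseResult {
  // Models often wrap JSON in Markdown code fences; strip them before parsing.
  const cleaned = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  let parsed: unknown;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return { ok: false, error: "Response was not valid JSON" };
  }

  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "Response was not a JSON object" };
  }

  const { tool, input } = parsed as Record<string, unknown>;

  // Reject tools that are not registered instead of trusting the model.
  if (typeof tool !== "string" || allowedTools.indexOf(tool) === -1) {
    return { ok: false, error: `Unknown tool: ${String(tool)}` };
  }
  if (typeof input !== "string") {
    return { ok: false, error: "Missing string input" };
  }

  return { ok: true, choice: { tool, input } };
}
```

Returning a result object rather than throwing lets the calling loop decide whether to retry with the error message appended to the prompt or to give up.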

🚀 What's Next

  • 🔍 Add a Google Search or Wikipedia tool
  • 📂 File system access for longer tasks
  • 🧠 Use vector memory for smarter recall
  • 🌐 Build a web UI with Next.js or Electron

📢 Final Thoughts

If you're curious about building autonomous agents, not just running chatbots, this project is a great starting point.

Use it, fork it, break it, and make it your own. Let's push the boundaries of what AI can automate for us.

🔗 GitHub Repo: github.com/gsk-007/ai-agent-gemini

