Building an AI Agent with Gemini and TypeScript

#ai #typescript #node #gemini

Hey devs! 👋

Recently, I took a deep dive into building AI agents — the kind that can think, plan, and act on your behalf. Inspired by Scott Moss’s "Agent From Scratch" course, I decided to reimplement the core ideas using Google's Gemini API and a modern TypeScript + Node.js stack.

The result is a modular, extensible, and hackable project:
👉 github.com/gsk-007/ai-agent-gemini

🧠 What the Agent Does

This AI agent:

Takes in a user-defined goal
Uses Gemini 2.0 Flash to reason through the steps
Executes actions via pluggable tools (like fetching Reddit posts or generating images)
Stores memory between steps
Loops autonomously until the goal is completed

⚙️ Tech Stack

Here’s what powers the project under the hood:

🔧 Core Technologies

TypeScript – Strictly typed and modular
Node.js (via Volta) – Runtime (v20.17.0)
Google Gemini – Reasoning and image generation
LowDB – Lightweight JSON-based memory system
dotenv – Loads API keys and other configuration from a .env file
Ora + Colors – Friendly CLI feedback
TSX – Seamless TypeScript execution during development

🔌 Tool Integrations

The real power comes from its extensible tooling system. Right now, the agent supports:

Reddit Reader: Fetches trending posts from https://www.reddit.com/.json?limit=5
Dad Joke Fetcher: Uses the classic https://icanhazdadjoke.com/ API
Gemini Image Generator: Converts text prompts into images using Gemini’s multimodal API

You can easily add your own tools by following a consistent interface pattern. Tools are dynamically selected by the agent based on task needs.
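
To make the "consistent interface pattern" concrete, here is a minimal sketch of what a tool and a tool runner could look like. The names Tool, FetchLike, createDadJokeTool, createRedditTool, and runTool are my own illustrations and are not taken from the repo, so treat the shapes as assumptions. The fetch function is passed in so the tools can be exercised without hitting the network.

```typescript
// Hypothetical tool shape; the interface used in the repository may differ.
interface Tool {
  name: string;
  description: string; // shown to the model so it can decide when to call the tool
  run(args: Record<string, unknown>): Promise<string>;
}

// Minimal fetch-like signature (an assumption), injected so tools work without network access in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function createDadJokeTool(fetchFn: FetchLike): Tool {
  return {
    name: "dad_joke",
    description: "Returns a random dad joke.",
    async run() {
      // icanhazdadjoke returns JSON when asked for it via the Accept header.
      const res = await fetchFn("https://icanhazdadjoke.com/", {
        headers: { Accept: "application/json" },
      });
      if (!res.ok) throw new Error(`dad_joke request failed with status ${res.status}`);
      const body = (await res.json()) as { joke?: string };
      return body.joke ?? "";
    },
  };
}

function createRedditTool(fetchFn: FetchLike): Tool {
  return {
    name: "reddit",
    description: "Lists the titles of trending Reddit posts.",
    async run() {
      const res = await fetchFn("https://www.reddit.com/.json?limit=5");
      if (!res.ok) throw new Error(`reddit request failed with status ${res.status}`);
      // A Reddit listing nests each post under data.children[].data.
      const body = (await res.json()) as {
        data?: { children?: { data: { title: string } }[] };
      };
      return (body.data?.children ?? []).map((child) => child.data.title).join("\n");
    },
  };
}

// Look up a tool by name and run it. Unknown names return a message instead of
// throwing, so the model gets a chance to pick a different tool.
async function runTool(
  tools: Map<string, Tool>,
  name: string,
  args: Record<string, unknown>
): Promise<string> {
  const tool = tools.get(name);
  if (!tool) return `Unknown tool: ${name}`;
  return tool.run(args);
}
```

On Node 18+ you could pass the global fetch as fetchFn. Adding a new tool under this sketch means writing one more factory and registering it in the Map.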

🧱 Agent Architecture

The agent follows a simplified but powerful cognitive loop:

Goal -> Plan -> Reason -> Execute -> Remember -> Repeat

Each component is modular:

agent.ts: The main reasoning loop
ai.ts: Interacts with Gemini
toolRunner.ts: Delegates tool use
memory.ts: Stores past tasks
systemPrompt.ts: Shapes Gemini's behavior
ui.ts: Command-line interface

This decoupled design makes it ideal for building more advanced agents — from AutoGPT-like projects to task-specific copilots.
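
Here is a rough sketch of how the Goal -> Plan -> Reason -> Execute -> Remember -> Repeat loop could be wired together. It is not the code from agent.ts: the types Message and ModelReply, the callModel and executeTool callbacks, and the maxSteps cap are all assumptions I made for illustration. The model call is passed in as a function, so the Gemini client stays out of the loop itself.

```typescript
// Hypothetical types; none of these names come from the repository.
type Message = { role: "user" | "assistant" | "tool"; content: string };

type ModelReply =
  | { kind: "final"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> };

type CallModel = (history: Message[]) => Promise<ModelReply>;
type ExecuteTool = (tool: string, args: Record<string, unknown>) => Promise<string>;

async function runAgent(
  goal: string,
  callModel: CallModel,
  executeTool: ExecuteTool,
  maxSteps = 10 // assumed safety cap so a confused model cannot loop forever
): Promise<{ answer: string; history: Message[] }> {
  // Goal: the conversation starts with the user's request.
  const history: Message[] = [{ role: "user", content: goal }];

  for (let step = 0; step < maxSteps; step++) {
    // Plan / Reason: ask the model what to do next given everything so far.
    const reply = await callModel(history);

    if (reply.kind === "final") {
      history.push({ role: "assistant", content: reply.text });
      return { answer: reply.text, history };
    }

    // Execute: run the requested tool.
    history.push({
      role: "assistant",
      content: JSON.stringify({ tool: reply.tool, args: reply.args }),
    });
    const result = await executeTool(reply.tool, reply.args);

    // Remember: feed the tool output back in before repeating.
    history.push({ role: "tool", content: result });
  }

  throw new Error(`Agent stopped after ${maxSteps} steps without finishing`);
}
```

Because the model and the tool runner are plain function parameters here, either one can be swapped or stubbed without touching the loop.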

💡 Why This Can Be a Template

This project was designed to be plug-and-play:

Want to add search capabilities? Add a new tool.
Need better memory? Swap out LowDB for Pinecone or ChromaDB.
Want to run it on the web? Wire it into a React front-end or an Express API.

The base is strong — all you need to do is build on top.
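
To show what "swap out the memory" might look like in practice, here is a small sketch of a storage interface with an in-memory implementation. MemoryEntry, MemoryStore, InMemoryStore, and buildContext are hypothetical names, and the real memory.ts may store different fields. The idea is only that a LowDB-backed or vector-store-backed class could implement the same two methods.

```typescript
// Hypothetical shapes; the repository's memory module may look different.
interface MemoryEntry {
  role: "user" | "assistant" | "tool";
  content: string;
  createdAt: string; // ISO timestamp
}

interface MemoryStore {
  add(entry: Omit<MemoryEntry, "createdAt">): Promise<void>;
  recent(limit: number): Promise<MemoryEntry[]>;
}

// Simplest possible backend: an array that lives for the duration of the process.
class InMemoryStore implements MemoryStore {
  private entries: MemoryEntry[] = [];

  async add(entry: Omit<MemoryEntry, "createdAt">): Promise<void> {
    this.entries.push({ ...entry, createdAt: new Date().toISOString() });
  }

  async recent(limit: number): Promise<MemoryEntry[]> {
    // slice(-0) would return everything, so handle non-positive limits explicitly.
    return limit > 0 ? this.entries.slice(-limit) : [];
  }
}

// Code that only depends on the interface keeps working when the backend changes.
async function buildContext(store: MemoryStore, limit: number): Promise<string> {
  const entries = await store.recent(limit);
  return entries.map((entry) => `${entry.role}: ${entry.content}`).join("\n");
}
```

The methods are async even for the in-memory version so that a file-backed or remote store can drop in without changing the callers.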

🧪 Challenges & Lessons

Prompt engineering for Gemini: Getting reliable tool selection and reasoning took trial and error.
Streaming support: Streaming Gemini responses in Node was not straightforward at the time of writing, so feedback handling needed tweaks.
Image generation: The multimodal API is powerful, but requires slightly different prompting strategies.
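
On the tool-selection point, one common mitigation is to validate whatever the model returns before acting on it. The sketch below is not what the repo does. It assumes the model is prompted to answer with a JSON object like {"tool": "...", "args": {...}}, and ToolCall and parseToolCall are names I made up for this example.

```typescript
type ToolCall = { tool: string; args: Record<string, unknown> };

// Returns a validated tool call, or null if the reply is not usable.
function parseToolCall(raw: string, knownTools: readonly string[]): ToolCall | null {
  // Models often wrap JSON in a markdown code fence; strip it if present.
  const fence = "`".repeat(3);
  let text = raw.trim();
  if (text.startsWith(fence) && text.endsWith(fence)) {
    // Drop the opening fence line (with its optional language tag) and the closing fence.
    text = text.slice(text.indexOf("\n") + 1, text.length - fence.length).trim();
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;

  const { tool, args } = parsed as { tool?: unknown; args?: unknown };
  // Reject tools the agent does not actually have.
  if (typeof tool !== "string" || !knownTools.includes(tool)) return null;

  const safeArgs =
    typeof args === "object" && args !== null && !Array.isArray(args)
      ? (args as Record<string, unknown>)
      : {};
  return { tool, args: safeArgs };
}
```

When this returns null, the agent can send a short corrective message back to the model instead of crashing or running the wrong tool.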