Hey devs!
Recently, I took a deep dive into building AI agents: the kind that can think, plan, and act on your behalf. Inspired by Scott Moss's "Agent From Scratch" course, I decided to reimplement the core ideas using Google's Gemini API and a modern TypeScript + Node.js stack.
The result is a modular, extensible, and hackable project:
github.com/gsk-007/ai-agent-gemini
What the Agent Does
This AI agent:
- Takes in a user-defined goal
- Uses Gemini 2.0 Flash to reason through the steps
- Executes actions via pluggable tools (like fetching Reddit posts or generating images)
- Stores memory between steps
- Loops until the goal is completed, fully autonomously
Tech Stack
Here's what powers the project under the hood:
Core Technologies
- TypeScript: Strictly typed and modular
- Node.js (via Volta): Runtime (v20.17.0)
- Google Gemini Pro: Language + image generation
- LowDB: Lightweight JSON-based memory system
- dotenv: Loads environment variables (such as API keys) from a .env file
- Ora + Colors: Friendly CLI feedback
- TSX: Seamless TypeScript execution during development
Tool Integrations
The real power comes from its extensible tooling system. Right now, the agent supports:
- Reddit Reader: Fetches trending posts from https://www.reddit.com/.json?limit=5
- Dad Joke Fetcher: Uses the classic https://icanhazdadjoke.com/ API
- Gemini Image Generator: Converts text prompts into images using Gemini's multimodal API
You can easily add your own tools by following a consistent interface pattern. Tools are dynamically selected by the agent based on task needs.
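To make the interface pattern concrete, here's a rough sketch of what a tool and a small registry could look like. The names (`Tool`, `dadJokeTool`, `findTool`, `describeTools`) are illustrative assumptions, not necessarily the exact code in the repo. The only external detail relied on is that icanhazdadjoke.com returns JSON with a `joke` field when the request sends `Accept: application/json`.

```typescript
// Hypothetical tool contract; the repo's actual interface may differ.
interface Tool {
  name: string;
  description: string; // shown to the model so it can pick a tool
  run(input: string): Promise<string>;
}

// Example tool: fetches one random joke as plain text.
const dadJokeTool: Tool = {
  name: "dad_joke",
  description: "Fetches a random dad joke.",
  async run() {
    const res = await fetch("https://icanhazdadjoke.com/", {
      headers: { Accept: "application/json" },
    });
    if (!res.ok) throw new Error(`Dad joke request failed: ${res.status}`);
    const data = (await res.json()) as { joke: string };
    return data.joke;
  },
};

// Adding a tool means adding one more entry to this array.
const tools: Tool[] = [dadJokeTool];

// Look up the tool the model asked for; undefined if it named an unknown one.
function findTool(name: string): Tool | undefined {
  return tools.find((t) => t.name === name);
}

// Render the tool list as text that could be embedded in a system prompt.
function describeTools(): string {
  return tools.map((t) => `- ${t.name}: ${t.description}`).join("\n");
}
```

With this shape, a new capability only needs a `name`, a `description`, and a `run` function; nothing else in the agent has to change.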
Agent Architecture
The agent follows a simplified but powerful cognitive loop:
Goal -> Plan -> Reason -> Execute -> Remember -> Repeat
Each component is modular:
- agent.ts: The main reasoning loop
- ai.ts: Interacts with Gemini
- toolRunner.ts: Delegates tool use
- memory.ts: Stores past tasks
- systemPrompt.ts: Shapes Gemini's behavior
- ui.ts: Command-line interface
This decoupled design makes it a good base for building more advanced agents, from AutoGPT-like projects to task-specific copilots.
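The Goal -> Plan -> Reason -> Execute -> Remember -> Repeat loop can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the repo's agent.ts: `Step`, `Decide`, `runAgent`, `fakeDecide`, and `demoTools` are hypothetical names, and the model call is replaced by a scripted stand-in so the example runs without an API key.

```typescript
// All names are hypothetical; they only mirror the loop described above.
type Step =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "done"; answer: string };

type Decide = (goal: string, memory: string[]) => Promise<Step>;
type ToolFn = (input: string) => Promise<string>;

async function runAgent(
  goal: string,
  decide: Decide, // stands in for the Gemini call
  tools: Map<string, ToolFn>, // stands in for the tool runner
  maxSteps = 5, // safety limit so the loop always terminates
): Promise<string> {
  const memory: string[] = []; // stands in for persistent memory
  for (let i = 0; i < maxSteps; i++) {
    const step = await decide(goal, memory); // Plan + Reason
    if (step.kind === "done") return step.answer;
    const tool = tools.get(step.tool);
    const result = tool
      ? await tool(step.input) // Execute
      : `Unknown tool: ${step.tool}`;
    memory.push(`${step.tool}(${step.input}) -> ${result}`); // Remember
  }
  return "Stopped: step limit reached.";
}

// Scripted stand-in for the model: call one tool, then finish.
const fakeDecide: Decide = async (_goal, memory) =>
  memory.length === 0
    ? { kind: "tool", tool: "echo", input: "hello" }
    : { kind: "done", answer: `Finished after ${memory.length} tool call(s).` };

const demoTools = new Map<string, ToolFn>([
  ["echo", async (input) => input.toUpperCase()],
]);
```

Calling `runAgent("say hello", fakeDecide, demoTools)` resolves to "Finished after 1 tool call(s)." The `maxSteps` cap is a design choice worth keeping in any real version, since a model that never signals completion would otherwise loop forever.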
Why This Can Be a Template
This project was designed to be plug-and-play:
- Want to add search capabilities? Add a new tool.
- Need better memory? Swap out LowDB for Pinecone or ChromaDB.
- Want to run it on the web? Wire it into a React front-end or an Express API.
The base is strong; all you need to do is build on top.
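Swapping the memory backend is easiest when the agent talks to a small interface rather than to LowDB directly. Here's a minimal sketch of that idea; `MemoryEntry`, `MemoryStore`, `InMemoryStore`, and `buildContext` are hypothetical names of my own, and the in-memory array stands in for whichever backend you plug in.

```typescript
// Hypothetical names throughout; memory.ts in the repo may be shaped differently.
interface MemoryEntry {
  role: "user" | "agent" | "tool";
  content: string;
}

// Async methods so a file, database, or remote vector store can implement it.
interface MemoryStore {
  add(entry: MemoryEntry): Promise<void>;
  recent(limit: number): Promise<MemoryEntry[]>;
}

// Simplest possible backend: an array in process memory.
class InMemoryStore implements MemoryStore {
  private entries: MemoryEntry[] = [];

  async add(entry: MemoryEntry): Promise<void> {
    this.entries.push(entry);
  }

  async recent(limit: number): Promise<MemoryEntry[]> {
    // slice(-0) would return everything, so handle non-positive limits explicitly.
    return limit > 0 ? this.entries.slice(-limit) : [];
  }
}

// The agent depends only on the interface, so the backend can change freely.
async function buildContext(store: MemoryStore, limit = 3): Promise<string> {
  const entries = await store.recent(limit);
  return entries.map((e) => `${e.role}: ${e.content}`).join("\n");
}
```

A LowDB-backed or vector-backed store would implement the same two methods, and `buildContext` would not need to change.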
Challenges & Lessons
- Prompt engineering for Gemini: Getting reliable tool selection and reasoning took trial and error.
- Streaming support: Streaming Gemini responses in Node wasn't straightforward for me, so feedback handling needed tweaks.
- Image generation: The multimodal API is powerful, but requires slightly different prompting strategies.
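One common way to make tool selection more dependable is to validate the model's reply before acting on it. The sketch below assumes the system prompt asks the model to answer with JSON like `{"tool": "dad_joke", "input": ""}`; that reply format and the names `ToolCall` and `parseToolCall` are illustrative assumptions, not the repo's actual approach.

```typescript
// Assumed reply format: {"tool": "<name>", "input": "<text>"}. Illustrative only.
interface ToolCall {
  tool: string;
  input: string;
}

function parseToolCall(reply: string, knownTools: string[]): ToolCall | null {
  // Models often wrap JSON in prose, so extract the outermost {...} span.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null; // not valid JSON: let the caller retry or ask again
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { tool, input } = parsed as Record<string, unknown>;
  // Reject tools that were never registered instead of trusting the model.
  if (typeof tool !== "string" || !knownTools.includes(tool)) return null;
  return { tool, input: typeof input === "string" ? input : "" };
}
```

Returning `null` instead of throwing lets the agent loop decide what to do next, for example re-prompting the model with a reminder of the expected format.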
What's Next
- Add a Google Search or Wikipedia tool
- File system access for longer tasks
- Use vector memory for smarter recall
- Build a web UI with Next.js or Electron
Final Thoughts
If you're curious about building autonomous agents, not just running chatbots, this project is a great starting point.
Use it, fork it, break it, and make it your own. Let's push the boundaries of what AI can automate for us.
GitHub Repo: github.com/gsk-007/ai-agent-gemini