Hey devs! 👋
Recently, I took a deep dive into building AI agents — the kind that can think, plan, and act on your behalf. Inspired by Scott Moss’s "Agent From Scratch" course, I decided to reimplement the core ideas using Google's Gemini API and a modern TypeScript + Node.js stack.
The result is a modular, extensible, and hackable project:
👉 github.com/gsk-007/ai-agent-gemini
🧠 What the Agent Does
This AI agent:
- Takes in a user-defined goal
- Uses Gemini 2.0 Flash to reason through the steps
- Executes actions via pluggable tools (like fetching Reddit posts or generating images)
- Stores memory between steps
- Loops until the goal is completed — completely autonomously
⚙️ Tech Stack
Here’s what powers the project under the hood:
🔧 Core Technologies
- TypeScript – Strictly typed and modular
- Node.js (via Volta) – Runtime (v20.17.0)
- Google Gemini Pro – Language + image generation
- LowDB – Lightweight JSON-based memory system
- dotenv – Secure environment variables
- Ora + Colors – Friendly CLI feedback
- TSX – Seamless TypeScript execution during development
🔌 Tool Integrations
The real power comes from its extensible tooling system. Right now, the agent supports:
- Reddit Reader – Fetches trending posts from https://www.reddit.com/.json?limit=5
- Dad Joke Fetcher – Uses the classic https://icanhazdadjoke.com/ API
- Gemini Image Generator – Converts text prompts into images using Gemini’s multimodal API
You can easily add your own tools by following a consistent interface pattern. Tools are dynamically selected by the agent based on task needs.
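To make the "consistent interface pattern" concrete, here is a minimal sketch of what a tool contract and registry could look like. The names `Tool`, `FetchJson`, `createDadJokeTool`, and `ToolRegistry` are my own placeholders for illustration, not necessarily what the repo uses. The only external fact assumed is that icanhazdadjoke.com returns JSON with a `joke` field when you send `Accept: application/json`.

```typescript
// Hypothetical tool contract; the repo's real interface may differ.
interface Tool {
  name: string;
  description: string; // shown to the model so it can choose a tool
  run(input: string): Promise<string>;
}

// Fetches JSON from a URL. Injectable so tools can be exercised offline.
type FetchJson = (url: string, headers: Record<string, string>) => Promise<unknown>;

const defaultFetchJson: FetchJson = async (url, headers) => {
  const res = await fetch(url, { headers }); // global fetch, available in Node 18+
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
};

// Example tool wrapping https://icanhazdadjoke.com/ (JSON via the Accept header).
function createDadJokeTool(fetchJson: FetchJson = defaultFetchJson): Tool {
  return {
    name: "dad_joke",
    description: "Returns a random dad joke.",
    async run() {
      const data = (await fetchJson("https://icanhazdadjoke.com/", {
        Accept: "application/json",
      })) as { joke?: string };
      return data.joke ?? "No joke found.";
    },
  };
}

// Registry keyed by tool name, so the agent can look up whatever the model asks for.
class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) throw new Error(`Duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // Plain-text listing that could be embedded in a system prompt.
  describe(): string {
    return Array.from(this.tools.values())
      .map((t) => `${t.name}: ${t.description}`)
      .join("\n");
  }

  async run(name: string, input: string): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) return `Unknown tool: ${name}`; // handed back to the model as text
    return tool.run(input);
  }
}
```

With a shape like this, adding a tool is one `registry.register(...)` call, and returning "Unknown tool" as text (instead of throwing) gives the model a chance to correct itself on the next step.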
🧱 Agent Architecture
The agent follows a simplified but powerful cognitive loop:
Goal -> Plan -> Reason -> Execute -> Remember -> Repeat
Each component is modular:
- agent.ts: The main reasoning loop
- ai.ts: Interacts with Gemini
- toolRunner.ts: Delegates tool use
- memory.ts: Stores past tasks
- systemPrompt.ts: Shapes Gemini's behavior
- ui.ts: Command-line interface
This decoupled design makes it ideal for building more advanced agents — from AutoGPT-like projects to task-specific copilots.
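The Goal -> Plan -> Reason -> Execute -> Remember -> Repeat loop can be sketched in a few dozen lines. This is a simplified illustration, not the code in agent.ts: the types `Message`, `ModelReply`, `Model`, and `ToolFn` and the function `runAgent` are hypothetical, and the model is passed in as a plain function so the sketch makes no assumptions about the Gemini SDK.

```typescript
// All names here are hypothetical; the model call is injected on purpose.
type Message = { role: "user" | "assistant" | "tool"; content: string };

// The model either asks for a tool or gives a final answer.
type ModelReply =
  | { type: "tool_call"; tool: string; input: string }
  | { type: "final"; content: string };

type Model = (history: Message[]) => Promise<ModelReply>;
type ToolFn = (input: string) => Promise<string>;

async function runAgent(
  goal: string,
  model: Model,
  tools: Map<string, ToolFn>,
  maxSteps = 10 // safety cap so a confused model cannot loop forever
): Promise<{ answer: string; history: Message[] }> {
  const history: Message[] = [{ role: "user", content: goal }]; // Goal

  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(history); // Plan + Reason

    if (reply.type === "final") {
      history.push({ role: "assistant", content: reply.content });
      return { answer: reply.content, history };
    }

    const tool = tools.get(reply.tool);
    const result = tool
      ? await tool(reply.input) // Execute
      : `Unknown tool: ${reply.tool}`;

    // Remember: record both the request and its result, then Repeat.
    history.push({ role: "assistant", content: `call ${reply.tool}(${reply.input})` });
    history.push({ role: "tool", content: result });
  }

  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

The `maxSteps` cap is my addition: a fully autonomous loop needs some exit condition besides "the model says it is done", otherwise a bad prompt can burn through API quota.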
💡 Why This Can Be a Template
This project was designed to be plug-and-play:
- Want to add search capabilities? Add a new tool.
- Need better memory? Swap out LowDB for Pinecone or ChromaDB.
- Want to run it on the web? Wire it into a React front-end or an Express API.
The base is strong — all you need to do is build on top.
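Swapping the memory backend is easiest when the agent depends on a small interface instead of a specific database. Here is a rough sketch of that idea. `MemoryStore`, `InMemoryStore`, and `JsonFileStore` are hypothetical names, and the file-backed class uses plain `node:fs` as a stand-in for what LowDB does, so treat it as an illustration of the seam, not the repo's memory.ts.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

type MemoryEntry = { role: string; content: string; createdAt: string };

// Hypothetical contract: the agent talks to this, never to a concrete database.
interface MemoryStore {
  add(role: string, content: string): void;
  recent(limit: number): MemoryEntry[];
}

// Keeps everything in process memory. Handy for tests and short runs.
class InMemoryStore implements MemoryStore {
  private entries: MemoryEntry[] = [];

  add(role: string, content: string): void {
    this.entries.push({ role, content, createdAt: new Date().toISOString() });
  }

  recent(limit: number): MemoryEntry[] {
    return limit <= 0 ? [] : this.entries.slice(-limit);
  }
}

// JSON-file backend: a plain-fs stand-in for a LowDB-style store.
class JsonFileStore implements MemoryStore {
  private entries: MemoryEntry[] = [];
  private path: string;

  constructor(path: string) {
    this.path = path;
    // Load earlier entries so memory survives restarts.
    if (existsSync(path)) {
      this.entries = JSON.parse(readFileSync(path, "utf8")) as MemoryEntry[];
    }
  }

  add(role: string, content: string): void {
    this.entries.push({ role, content, createdAt: new Date().toISOString() });
    writeFileSync(this.path, JSON.stringify(this.entries, null, 2));
  }

  recent(limit: number): MemoryEntry[] {
    return limit <= 0 ? [] : this.entries.slice(-limit);
  }
}
```

A vector store such as Pinecone or ChromaDB would most likely need an async version of this interface plus a similarity-based lookup, but the principle is the same: the agent loop only ever sees `MemoryStore`.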
🧪 Challenges & Lessons
- Prompt engineering for Gemini: Getting reliable tool selection and reasoning took trial and error.
- Streaming support: I couldn't get Gemini streaming to work smoothly in Node, so the CLI feedback handling needed tweaks.
- Image generation: The multimodal API is powerful, but requires slightly different prompting strategies.
🚀 What’s Next
- 🔍 Add a Google Search or Wikipedia tool
- 📂 File system access for longer tasks
- 🧠 Use vector memory for smarter recall
- 🌐 Build a web UI with Next.js or Electron
📢 Final Thoughts
If you’re curious about building autonomous agents — not just running chatbots — this project is a great starting point.
Use it, fork it, break it, and make it your own. Let’s push the boundaries of what AI can automate for us.
🔗 GitHub Repo: github.com/gsk-007/ai-agent-gemini