Hey devs!
Recently, I took a deep dive into building AI agents, the kind that can think, plan, and act on your behalf. Inspired by Scott Moss's "Agent From Scratch" course, I decided to reimplement the core ideas using Google's Gemini API and a modern TypeScript + Node.js stack.
The result is a modular, extensible, and hackable project:
github.com/gsk-007/ai-agent-gemini
What the Agent Does
This AI agent:
- Takes in a user-defined goal
- Uses Gemini 2.0 Flash to reason through the steps
- Executes actions via pluggable tools (like fetching Reddit posts or generating images)
- Stores memory between steps
- Loops autonomously until the goal is completed
Tech Stack
Here's what powers the project under the hood:
Core Technologies
- TypeScript: strictly typed and modular
- Node.js (via Volta): runtime (v20.17.0)
- Google Gemini Pro: language + image generation
- LowDB: lightweight JSON-based memory system
- dotenv: loads environment variables (such as the API key) from a .env file
- Ora + Colors: friendly CLI feedback
- TSX: seamless TypeScript execution during development
Tool Integrations
The real power comes from its extensible tooling system. Right now, the agent supports:
- Reddit Reader: Fetches trending posts from https://www.reddit.com/.json?limit=5
- Dad Joke Fetcher: Uses the classic https://icanhazdadjoke.com/ API
- Gemini Image Generator: Converts text prompts into images using Gemini's multimodal API
You can easily add your own tools by following a consistent interface pattern. Tools are dynamically selected by the agent based on task needs.
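The "consistent interface pattern" could look something like the sketch below. The `Tool` shape, the `buildRegistry` helper, and the injected `getJson` function are assumptions made for illustration, not code taken from the repository; only the icanhazdadjoke.com URL comes from the list above.

```typescript
// Hypothetical tool contract: a name and description the model can see,
// plus an async function that does the work.
interface Tool {
  name: string;
  description: string;
  run(args: Record<string, unknown>): Promise<string>;
}

// Minimal HTTP helper signature, injected so tools stay easy to test.
// In a real project this would typically wrap fetch().
type GetJson = (url: string, headers: Record<string, string>) => Promise<unknown>;

// Builds a dad joke tool around the icanhazdadjoke.com endpoint.
function createDadJokeTool(getJson: GetJson): Tool {
  return {
    name: "dad_joke",
    description: "Returns a random dad joke.",
    async run() {
      const data = await getJson("https://icanhazdadjoke.com/", {
        Accept: "application/json",
      });
      // With a JSON Accept header the API responds with an object
      // that has a `joke` string field.
      const joke = (data as { joke?: unknown }).joke;
      return typeof joke === "string" ? joke : "No joke available.";
    },
  };
}

// Registry keyed by tool name, so a runner can look tools up by the
// name the model selected.
function buildRegistry(tools: Tool[]): Map<string, Tool> {
  return new Map(tools.map((tool): [string, Tool] => [tool.name, tool]));
}
```

Under this sketch, adding a tool means writing one more factory like `createDadJokeTool` and passing its result to `buildRegistry`. The descriptions are what you would surface to the model so it can choose between tools.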
Agent Architecture
The agent follows a simplified but powerful cognitive loop:
Goal -> Plan -> Reason -> Execute -> Remember -> Repeat
Each component is modular:
- agent.ts: The main reasoning loop
- ai.ts: Interacts with Gemini
- toolRunner.ts: Delegates tool use
- memory.ts: Stores past tasks
- systemPrompt.ts: Shapes Gemini's behavior
- ui.ts: Command-line interface
This decoupled design makes it a good base for building more advanced agents, from AutoGPT-like projects to task-specific copilots.
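To make the Goal -> Plan -> Reason -> Execute -> Remember -> Repeat loop concrete, here is a minimal sketch. It is not the repository's `agent.ts`: the `ModelDecision` union, the `DecideFn` and `ToolFn` signatures, and the `maxSteps` cap are all assumptions, and the model call is passed in as a plain function so the loop can be read (and tested) without a Gemini key.

```typescript
// One step the model can ask for: call a tool, or finish with an answer.
type ModelDecision =
  | { type: "tool"; tool: string; input: string }
  | { type: "done"; answer: string };

// Hypothetical signatures: the model sees the goal plus the history so far
// and decides the next step; a tool maps a string input to a string result.
type DecideFn = (goal: string, history: string[]) => Promise<ModelDecision>;
type ToolFn = (input: string) => Promise<string>;

async function runAgent(
  goal: string,
  decide: DecideFn,
  tools: Map<string, ToolFn>,
  maxSteps = 10,
): Promise<{ answer: string; history: string[] }> {
  const history: string[] = []; // "Remember": results carried into the next step

  for (let step = 0; step < maxSteps; step++) {
    const decision = await decide(goal, history); // "Plan" and "Reason"
    if (decision.type === "done") {
      return { answer: decision.answer, history };
    }

    const tool = tools.get(decision.tool);
    const result = tool
      ? await tool(decision.input) // "Execute"
      : `Unknown tool: ${decision.tool}`;
    history.push(`${decision.tool}(${decision.input}) -> ${result}`);
  }

  // Step cap so a confused model cannot loop forever.
  return { answer: "Stopped: step limit reached.", history };
}
```

Reporting an unknown tool back through `history` instead of throwing gives the model a chance to correct itself on the next iteration, which is one reasonable design choice among several.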
Why This Can Be a Template
This project was designed to be plug-and-play:
- Want to add search capabilities? Add a new tool.
- Need better memory? Swap out LowDB for Pinecone or ChromaDB.
- Want to run it on the web? Wire it into a React front-end or an Express API.
The base is strong; all you need to do is build on top.
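Swapping the memory backend is easiest when the agent only talks to an interface. The sketch below assumes a hypothetical `MemoryStore` contract and `MemoryEntry` shape (neither is taken from the repository) and ships the simplest backend, an in-process array. A LowDB, Pinecone, or ChromaDB adapter would implement the same two methods.

```typescript
// Hypothetical record and store contract; not the repository's actual types.
interface MemoryEntry {
  role: "user" | "agent" | "tool";
  content: string;
}

interface MemoryStore {
  add(entry: MemoryEntry): Promise<void>;
  recent(limit: number): Promise<MemoryEntry[]>;
}

// Simplest possible backend: an array held in process memory.
class InMemoryStore implements MemoryStore {
  private entries: MemoryEntry[] = [];

  async add(entry: MemoryEntry): Promise<void> {
    this.entries.push(entry);
  }

  async recent(limit: number): Promise<MemoryEntry[]> {
    // slice(-0) would return the whole array, so handle
    // non-positive limits explicitly.
    return limit > 0 ? this.entries.slice(-limit) : [];
  }
}

// Agent code depends only on the interface, so the backend can be swapped
// without touching the call sites.
async function rememberToolResult(store: MemoryStore, content: string): Promise<void> {
  await store.add({ role: "tool", content });
}
```

The methods are async even for the in-memory case so that a file-backed or network-backed store can drop in later without changing callers.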
Challenges & Lessons
- Prompt engineering for Gemini: Getting reliable tool selection and reasoning took trial and error.
- Streaming support: Gemini doesn't stream easily via Node yet, so feedback handling needed tweaks.
- Image generation: The multimodal API is powerful, but requires slightly different prompting strategies.
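On the tool-selection point above, one thing that tends to help is validating the model's output before acting on it. The sketch below assumes the prompt asks the model to reply with JSON such as `{"tool":"dad_joke","input":""}`. That convention, the `ToolCall` type, and `parseToolCall` are illustrative assumptions, not the project's actual protocol.

```typescript
// Shape we ask the model to produce, e.g. {"tool":"dad_joke","input":""}.
// This JSON convention is an assumption for illustration.
interface ToolCall {
  tool: string;
  input: string;
}

function parseToolCall(raw: string, allowedTools: string[]): ToolCall | null {
  // Models often wrap JSON in prose or code fences, so take the span from
  // the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // Not valid JSON: let the caller re-prompt.
  }

  if (typeof parsed !== "object" || parsed === null) return null;
  const { tool, input } = parsed as { tool?: unknown; input?: unknown };

  // Reject tools the agent does not actually have.
  if (typeof tool !== "string" || !allowedTools.includes(tool)) return null;
  return { tool, input: typeof input === "string" ? input : "" };
}
```

Returning `null` instead of throwing lets the loop treat a malformed reply as a normal event: it can re-prompt with a reminder of the expected format.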
What's Next
- Add a Google Search or Wikipedia tool
- File system access for longer tasks
- Use vector memory for smarter recall
- Build a web UI with Next.js or Electron
Final Thoughts
If you're curious about building autonomous agents, not just running chatbots, this project is a great starting point.
Use it, fork it, break it, and make it your own. Let's push the boundaries of what AI can automate for us.
GitHub Repo: github.com/gsk-007/ai-agent-gemini