Mariano Gobea Alcoba

Posted on Mar 8 • Originally published at mgatc.com

AI Agents Run Smoothly on Gemini's Free Tier!

#deeplearning #largelanguagemodels #gemini #openclaw

Introduction

Running a one-person tech agency means covering content, sales, security, and operations alone. This article walks through the architecture and implementation of a system in which four AI agents handle content creation, sales lead generation, security scanning, and operations for a tech agency, all on the free tier of Gemini 2.5 Flash. The setup brings the monthly LLM cost to zero and keeps infrastructure expenses minimal.

Architecture Overview

Key Components

AI Agents: Four AI agents built using OpenClaw, an open-source framework.
LLM: Gemini 2.5 Flash free tier, providing 1,500 requests per day.
Operating Environment: WSL2 on a local machine, managed by systemd timers.
Infrastructure: Vercel (hobby plan) and Firebase (free plan).
Automation Tools: Telegram Bot, Resend, Jina Reader.

Workflow

The workflow of the AI agents is designed to be highly efficient and token-optimized. Each agent follows a specific sequence of steps to minimize the number of tokens used:

Read Pre-Computed Intelligence Files: Local Markdown files supply the context. Reading them costs no tokens or API requests; only the text that is injected into the prompt counts toward usage.
Focused Prompt with Context Injection: A single, focused prompt is generated with all necessary context included.
Generate Response: The LLM generates a response based on the prompt.
Parse and Act: The response is parsed and the agent performs the required action.
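
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names, the JSON reply format, and the `fake_llm` stand-in are all assumptions made so the example runs offline. In a real agent, `llm` would wrap the single Gemini request.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_context(directory: Path) -> str:
    """Step 1: concatenate local Markdown files; no LLM call is involved."""
    parts = [p.read_text(encoding="utf-8") for p in sorted(directory.glob("*.md"))]
    return "\n\n".join(parts)


def build_prompt(task: str, context: str) -> str:
    """Step 2: one focused prompt with all context injected up front."""
    return (
        f"Context:\n{context}\n\n"
        f"Task: {task}\n"
        'Reply with JSON only, e.g. {"action": "...", "text": "..."}'
    )


def run_agent(task: str, directory: Path, llm: Callable[[str], str]) -> dict:
    """Steps 3 and 4: a single LLM call, then parse the reply into an action."""
    reply = llm(build_prompt(task, load_context(directory)))
    return json.loads(reply)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the real model call."""
    return json.dumps({"action": "post", "text": "Draft based on local notes"})


with tempfile.TemporaryDirectory() as tmp:
    # A hypothetical pre-computed intelligence file
    Path(tmp, "trends.md").write_text("# Trends\n- WSL2 tips", encoding="utf-8")
    result = run_agent("Write one social post", Path(tmp), fake_llm)
```

Passing the model in as a callable keeps the request count visible: each call to `run_agent` spends exactly one request.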

Agent Functions

Content Generation

The content generation agent is responsible for creating high-quality social media posts. It operates as follows:

Research Pipeline: Uses RSS feeds, Hacker News, and web scraping to gather relevant information. This step does not involve the LLM.
Content Creation: Generates 8 social posts daily across various platforms.
Quality Gates: Each post undergoes a self-review process. If the quality score is below a certain threshold, the post is rewritten.
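
One way to picture the quality gate is a bounded rewrite loop. The sketch below is an assumption about how such a gate could look: the article does not say how the score is produced or what the threshold is, so `threshold`, `max_rewrites`, and the toy scoring and rewriting functions are hypothetical placeholders.

```python
from typing import Callable, Tuple


def quality_gate(
    draft: str,
    score: Callable[[str], float],
    rewrite: Callable[[str], str],
    threshold: float = 0.7,   # hypothetical value
    max_rewrites: int = 2,    # hypothetical value
) -> Tuple[str, float, int]:
    """Rewrite a draft until it reaches the threshold or the rewrite budget runs out.

    Returns (final_text, final_score, rewrites_used).
    """
    current = draft
    current_score = score(current)
    rewrites = 0
    while current_score < threshold and rewrites < max_rewrites:
        current = rewrite(current)
        current_score = score(current)
        rewrites += 1
    return current, current_score, rewrites


# Toy stand-ins so the sketch runs offline: longer text scores higher.
def toy_score(text: str) -> float:
    return min(len(text) / 40, 1.0)


def toy_rewrite(text: str) -> str:
    return text + " with a concrete example"


final_text, final_score, used = quality_gate("Short post", toy_score, toy_rewrite)
```

If both scoring and rewriting are LLM calls, a post costs at most `1 + 2 * max_rewrites` extra requests, so the cap on rewrites also caps quota usage.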

Sales Lead Generation

The sales lead generation agent identifies and qualifies potential clients. Its workflow includes:

Lead Identification: Scans online platforms and forums for potential leads.
Qualification: Evaluates the leads based on predefined criteria.
Outreach: Sends personalized messages to qualified leads using Resend.

Security Scanning

The security scanning agent ensures the security of the tech agency's systems and applications. It performs the following tasks:

Vulnerability Scanning: Uses tools like OWASP ZAP to identify vulnerabilities.
Threat Analysis: Analyzes the identified threats and provides recommendations.
Reporting: Generates detailed reports and sends them to the relevant stakeholders.

Operations

The operations agent handles the day-to-day management of the tech agency. Its responsibilities include:

Health Checks: Monitors the status of various services and systems.
Task Scheduling: Manages the execution of tasks using systemd timers.
Notifications: Sends alerts and updates via Telegram and Discord.
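
A health check of this kind can be sketched with the standard library alone. The service names and URLs below are placeholders, and the injectable `probe` argument is an assumption added so the example can run without network access; the resulting report string is what would be handed to a notifier.

```python
import urllib.request
from typing import Callable, Dict


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def check_services(
    services: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every service once and map its name to up/down."""
    return {name: probe(url) for name, url in services.items()}


def format_report(results: Dict[str, bool]) -> str:
    """Turn the probe results into a one-line status message."""
    failing = sorted(name for name, ok in results.items() if not ok)
    if not failing:
        return "All systems are operational."
    return "Services down: " + ", ".join(failing)


# Offline demo with a fake probe that treats the API endpoint as down.
results = check_services(
    {"site": "https://example.com", "api": "https://example.com/api"},
    probe=lambda url: not url.endswith("/api"),
)
report = format_report(results)
```

Note that none of this touches the LLM, so health checks do not draw on the daily request quota.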

Technical Implementation

OpenClaw Framework

OpenClaw is an open-source framework that simplifies the development and deployment of AI agents. It provides a robust set of tools and libraries to build, train, and manage AI models. The framework supports integration with various LLMs, including Gemini 2.5 Flash.

WSL2 and Systemd Timers

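WSL2 (Windows Subsystem for Linux 2) runs the agents on a local machine, and systemd timers trigger each agent at fixed times. If systemd is not already running in the WSL2 distribution, it can be enabled by setting systemd=true under [boot] in /etc/wsl.conf. A timer unit activates the service unit of the same name unless Unit= says otherwise. In the example below, the timer fires every six hours (00:00, 06:00, 12:00, and 18:00), and Persistent=true makes systemd run a missed job at the next start, which matters on a machine that is not always on.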

# Example systemd timer unit file
[Unit]
Description=Run content generation agent

[Timer]
OnCalendar=*-*-* 00,06,12,18:00:00
Persistent=true

[Install]
WantedBy=timers.target

Token Optimization

To minimize the number of tokens used, the agents follow a strict workflow:

Pre-Computed Intelligence Files: Local Markdown files store context and data, reducing the need for token-heavy queries.
Focused Prompts: Each request is a single, focused prompt with all necessary context included.
Efficient Parsing: Responses are parsed and acted upon immediately, avoiding unnecessary follow-up requests.

Infrastructure

Vercel (Hobby Plan)

Vercel is used to host the web application and API endpoints. The hobby plan provides sufficient resources for a small-scale operation.

vercel.json configuration:
{
  "version": 2,
  "builds": [
    {
      "src": "index.js",
      "use": "@vercel/node"
    }
  ],
  "routes": [
    {
      "src": "/api/(.*)",
      "dest": "/api/$1"
    },
    {
      "src": "/(.*)",
      "dest": "/index.html"
    }
  ]
}

Firebase (Free Plan)

Firebase is used for real-time database and authentication services. The free plan offers enough capacity for the current workload.

firebase.json configuration:
{
  "hosting": {
    "public": "public",
    "ignore": [
      "firebase.json",
      "**/.*",
      "**/node_modules/**"
    ],
    "rewrites": [
      {
        "source": "**",
        "destination": "/index.html"
      }
    ]
  }
}

Automation Tools

Telegram Bot

A Telegram bot is used for health checks and notifications. It monitors the status of various services and sends alerts when issues are detected.

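# Example Telegram bot script
import os

import requests

# Credentials come from the environment (variable names are an assumption)
TELEGRAM_BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

def send_telegram_message(chat_id, text):
    url = f"https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/sendMessage"
    payload = {
        'chat_id': chat_id,
        'text': text
    }
    response = requests.post(url, json=payload, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of ignoring them
    return response.json()

# Usage
send_telegram_message(CHAT_ID, "All systems are operational.")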

Resend

Resend is used for sending personalized emails to sales leads. It integrates seamlessly with the sales lead generation agent.

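// Example Resend script
const { Resend } = require('resend');

// API key comes from the environment (variable name is an assumption)
const resendClient = new Resend(process.env.RESEND_API_KEY);

async function sendEmail(to, subject, text) {
    // The SDK resolves to { data, error } rather than throwing on API errors
    const { data, error } = await resendClient.emails.send({
        from: 'noreply@yourdomain.com', // replace with an address on a domain verified in Resend
        to: to,
        subject: subject,
        text: text
    });
    if (error) {
        throw new Error(`Resend failed: ${error.message}`);
    }
    return data;
}

// Usage
sendEmail('lead@example.com', 'Follow-Up on Your Inquiry', 'Thank you for your interest...')
    .catch(console.error);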

Jina Reader

Jina Reader is used for web scraping and data extraction. It is a hosted service that returns a clean, LLM-friendly text version of a page when the page's URL is appended to https://r.jina.ai/, and the research pipeline passes that text on to the content generation agent.

# Example Jina Reader script
import requests

def scrape_website(url):
    # Prefixing the target URL with https://r.jina.ai/ returns the page as readable text
    response = requests.get(f"https://r.jina.ai/{url}", timeout=30)
    response.raise_for_status()
    return response.text

# Usage
content = scrape_website('https://example.com')
print(content)

Real Numbers and Performance

Social Media Impact

27 Automated Threads Accounts: With a total of 12K+ followers and 3.3M+ views.
Engagement Loop Bug: An early bug made the agent iterate through every post instead of only the top N, which burned 800 requests in a single day, more than half of the 1,500 requests-per-day (RPD) allowance.
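
The engagement loop bug comes down to a missing slice. A minimal sketch of the intended selection is shown here; the article does not say how posts are ranked, so the `views` field and the sample data are assumptions for illustration only.

```python
from typing import Dict, List


def top_n_posts(posts: List[Dict], n: int, metric: str = "views") -> List[Dict]:
    """Return the n posts with the highest value for `metric`."""
    return sorted(posts, key=lambda post: post[metric], reverse=True)[:n]


# Hypothetical sample data
posts = [
    {"id": "a", "views": 120},
    {"id": "b", "views": 950},
    {"id": "c", "views": 40},
    {"id": "d", "views": 610},
]

selected = top_n_posts(posts, n=2)
# One LLM request per selected post, instead of one per post in the account
requests_needed = len(selected)
```

Ranking and slicing happen locally, so the request count is bounded by `n` no matter how many posts an account accumulates.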

Resource Utilization

RPD Utilization: 7% (105 of 1,500 daily requests), leaving 93% headroom.
Monthly Cost: $0 LLM + ~$5 infra (Vercel hobby + Firebase free).

Lessons Learned

API Key Management: Creating an API key from a billing-enabled GCP project instead of AI Studio led to a $127 Gemini bill in 7 days. Always create keys from AI Studio directly.
Rate Limiting: Implement rate limiting to prevent accidental overuse of the LLM.
Duplicate Messages: A conflict between the Telegram health check and the gateway's long-polling resulted in 18 duplicate messages in 3 minutes.
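
For the rate-limiting lesson, one possible guard is a small daily counter that every agent consults before calling the LLM. This is a sketch under assumptions: the class name, the JSON state file, and the use of the local date as the reset boundary are all hypothetical, and the provider's actual quota reset time may differ.

```python
import json
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


class DailyBudget:
    """Persist a per-day request counter and refuse calls beyond the limit."""

    def __init__(self, path: Path, limit: int):
        self.path = path
        self.limit = limit

    def _used(self, today: str) -> int:
        """Requests already spent today; a new day starts from zero."""
        if not self.path.exists():
            return 0
        state = json.loads(self.path.read_text(encoding="utf-8"))
        return state["count"] if state.get("day") == today else 0

    def try_spend(self, n: int = 1, today: Optional[str] = None) -> bool:
        """Reserve n requests; return False (and change nothing) if over budget."""
        today = today or date.today().isoformat()
        used = self._used(today)
        if used + n > self.limit:
            return False
        self.path.write_text(
            json.dumps({"day": today, "count": used + n}), encoding="utf-8"
        )
        return True


# Demo with a tiny limit; a real setup would sit below the 1,500 RPD quota.
with tempfile.TemporaryDirectory() as tmp:
    budget = DailyBudget(Path(tmp) / "rpd.json", limit=3)
    outcomes = [budget.try_spend(today="2025-01-01") for _ in range(4)]
    next_day = budget.try_spend(today="2025-01-02")
```

The file is not locked, so agents that run at the same moment could race on it; staggering the timers or adding a file lock would be needed in that case.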

Conclusion

Running a one-person tech agency with AI agents on the free tier of Gemini 2.5 Flash is a viable and cost-effective solution. By leveraging open-source frameworks, local computing resources, and efficient token management, it is possible to automate various aspects of the business without incurring significant costs. The architecture described in this article provides a solid foundation for scaling and expanding the capabilities of the AI agents.

For consulting services and further assistance in implementing similar solutions, please visit https://www.mgatc.com.

Originally published in Spanish at www.mgatc.com/blog/ai-agents-run-smoothly-on-geminis-free-tier/

Top comments (1)

Agntable • Mar 11

The pre-computed intelligence files approach is smart — keeping context out of the token count is one of those optimizations that sounds obvious in hindsight but most people skip. The $127 accidental bill from the wrong API key source is a painful lesson that deserves its own warning banner somewhere. Curious how the quality gate threshold for the content agent was calibrated — is that a numeric score the LLM self-assigns, or something rule-based on the output?