<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexey Leshchenko</title>
    <description>The latest articles on DEV Community by Alexey Leshchenko (@alexey_leshchenko_fc0ec66).</description>
    <link>https://dev.to/alexey_leshchenko_fc0ec66</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3619356%2F328d66a1-f1c6-4bc8-941d-59ae4d58619c.jpg</url>
      <title>DEV Community: Alexey Leshchenko</title>
      <link>https://dev.to/alexey_leshchenko_fc0ec66</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alexey_leshchenko_fc0ec66"/>
    <language>en</language>
    <item>
      <title>Free 17,500 LLM Requests a Day</title>
      <dc:creator>Alexey Leshchenko</dc:creator>
      <pubDate>Wed, 04 Feb 2026 12:33:18 +0000</pubDate>
      <link>https://dev.to/alexey_leshchenko_fc0ec66/free-17500-llm-requests-a-day-2an5</link>
      <guid>https://dev.to/alexey_leshchenko_fc0ec66/free-17500-llm-requests-a-day-2an5</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: Rate Limits Kill Projects
&lt;/h2&gt;

&lt;p&gt;We’ve all been there. You’re building a bot or research tool, and just when it gets interesting, you hit a rate limit or your credits run out. Everything goes dark, and it's incredibly frustrating.&lt;/p&gt;

&lt;p&gt;The fix isn't finding one "perfect" free API. It’s about building a system that treats every provider as a disposable spare part. I built a Go-based gateway that handles 17,500+ requests a day for $0. Here’s how.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Backstory: Tired of Broken Bots
&lt;/h2&gt;

&lt;p&gt;I didn't actually want to write a Go service; I did it because I was sick of my antispam bot crashing.&lt;/p&gt;

&lt;p&gt;I started with Python and n8n, which worked for about five minutes. As traffic grew, the setup crumbled. Free models on OpenRouter changed weekly, and my bot would quit whenever an API vanished. I tried Cloudflare’s AI Gateway, but it disconnected under heavy load. To get 100% uptime on a budget, I had to build a tool I could actually control.&lt;/p&gt;

&lt;p&gt;The real hurdle was my hardware: a $3/month VDS with 700MB of RAM. Tools like LiteLLM used half my memory just idling. I needed a lightweight binary that could handle thousands of requests without breaking a sweat.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Plan: Building a "Meta-Tier"
&lt;/h2&gt;

&lt;p&gt;Instead of relying on one provider, I grouped several free APIs into a "Meta-Tier." If one provider throttles or goes offline, the gateway instantly moves to the next one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5i2tfzfbwvlrwlm2dmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5i2tfzfbwvlrwlm2dmi.png" alt="Aggregation of Free Tiers" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Capacity Breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Groq (Free)&lt;/strong&gt;: ~15,000 Req/Day (Llama 3.3 70B) — Industry-leading inference speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini (AI Studio)&lt;/strong&gt;: 1,500 Req/Day (Gemini 1.5 Flash) — Massive context window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt;: 1,000 Req/Day (GPT-OSS / Qwen) — Access to niche/experimental models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral (Exp)&lt;/strong&gt;: Variable Capacity (Mistral Small) — Excellent for complex logic fallback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: &lt;strong&gt;17,500+ requests/day for $0.00/month&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Gateway Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdp1j5koln9gacyqx4p5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdp1j5koln9gacyqx4p5.png" alt="How the Gateway Works - Request Flow" width="794" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a specialized load balancer designed around LLM-specific failure modes. To keep the binary lean, it avoids heavyweight dependencies and sticks to a simple, robust request flow:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Request Flow:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client (Bot/App)&lt;/strong&gt; → Sends HTTPS request to Nginx.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nginx&lt;/strong&gt; → Proxies via Unix Socket to the Go Gateway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go Gateway&lt;/strong&gt; → Performs internal Auth &amp;amp; Token check.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sequential Rotator&lt;/strong&gt; → Picks the first available provider (e.g., Groq).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failover Logic&lt;/strong&gt; → If Provider A returns a 429 (Rate Limit), the Gateway instantly retries with Provider B (Gemini) or Provider C (OpenRouter).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging&lt;/strong&gt; → Every success and failure is saved as structured JSON for monitoring.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
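The Nginx → Unix-socket hop in the flow above is standard reverse proxying. A minimal sketch of the relevant server block (the domain and socket path are assumptions, not taken from the repo):

```nginx
server {
    listen 443 ssl;
    server_name gateway.example.com;  # hypothetical domain

    location / {
        # Forward to the Go gateway listening on a local Unix socket;
        # no TCP port is exposed, which keeps the attack surface small.
        proxy_pass http://unix:/run/ai-gateway.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

A Unix socket also skips the loopback TCP stack, which matters on a 700MB VDS.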

&lt;h2&gt;
  
  
  Why Go?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvdr0v3juiky5tfkl20h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvdr0v3juiky5tfkl20h.png" alt="Go is lightweight" width="542" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The 700MB RAM limit dictated the architecture. Python is too bloated for this hardware. This Go gateway is a small binary that sips ~15MB of RAM, leaving the rest of the server for your actual apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Catching the Errors
&lt;/h2&gt;

&lt;p&gt;The "brain" is a Sequential Rotator that is "429-aware." When a provider returns a rate-limit error, the gateway catches it and retries with the next provider in milliseconds. Your application never sees the failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Get it Running
&lt;/h2&gt;

&lt;p&gt;First off, clone &lt;a href="https://github.com/leshchenko1979/ai-gateway" rel="noopener noreferrer"&gt;https://github.com/leshchenko1979/ai-gateway&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Setup
&lt;/h3&gt;

&lt;p&gt;Copy the example config and add your API keys.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp config.yaml.example config.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
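The post doesn't show the config format, so treat `config.yaml.example` in the repo as the source of truth. Purely for orientation, a provider-chain config for a gateway like this typically looks something like (all field names hypothetical):

```yaml
# Hypothetical sketch -- check config.yaml.example in the repo
# for the real schema; these keys are illustrative only.
providers:
  - name: groq
    api_key: YOUR_GROQ_KEY
  - name: gemini
    api_key: YOUR_GEMINI_KEY
  - name: openrouter
    api_key: YOUR_OPENROUTER_KEY
```

The order of entries is what the sequential rotator walks, so put your highest-capacity provider first.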



&lt;h3&gt;
  
  
  2. Install
&lt;/h3&gt;

&lt;p&gt;Skip Docker to save resources. Use the script to build and install the systemd service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./install.sh build
./install.sh install-service
sudo systemctl start ai-gateway
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Remote Deploy
&lt;/h3&gt;

&lt;p&gt;Deploy from your local machine straight to your server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp .env.example .env
SSH_HOST=your-server.com ./install.sh deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring
&lt;/h2&gt;

&lt;p&gt;The gateway logs everything in JSON. Run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;journalctl -u ai-gateway -f 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to watch it swap providers in real time as rate limits are reached.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it Out
&lt;/h2&gt;

&lt;p&gt;Once running, the stack works like a single OpenAI-compatible endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer YOUR_INTERNAL_TOKEN" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Hello!"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
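Because the endpoint speaks the OpenAI chat-completions format, the same call works from any HTTP client. A small Go sketch that builds the equivalent request (the URL, token, and model are the placeholders from the curl example above):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newChatRequest builds an OpenAI-style chat request against the
// gateway. baseURL and token come from your own deployment.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("http://localhost:8080", "YOUR_INTERNAL_TOKEN", "gpt-oss-120b", "Hello!")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Dispatch it with `http.DefaultClient.Do(req)`; the response body is a standard chat-completions JSON object, whichever upstream provider actually served it.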



&lt;p&gt;By owning this layer, you've built a private "meta-tier" that’s more reliable than any single API on its own.&lt;/p&gt;

&lt;p&gt;See the repo: &lt;a href="https://github.com/leshchenko1979/ai-gateway" rel="noopener noreferrer"&gt;https://github.com/leshchenko1979/ai-gateway&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>go</category>
      <category>llm</category>
    </item>
    <item>
      <title>How I Built an MCP Server to Give AI Assistants Real Telegram Powers</title>
      <dc:creator>Alexey Leshchenko</dc:creator>
      <pubDate>Wed, 19 Nov 2025 17:15:17 +0000</pubDate>
      <link>https://dev.to/alexey_leshchenko_fc0ec66/how-i-built-an-mcp-server-to-give-ai-assistants-real-telegram-powers-28d</link>
      <guid>https://dev.to/alexey_leshchenko_fc0ec66/how-i-built-an-mcp-server-to-give-ai-assistants-real-telegram-powers-28d</guid>
      <description>&lt;p&gt;I've been working on AI integrations for a while, and one thing always bugged me: why can't AI assistants just... use Telegram like humans do? Search conversations, send messages, manage contacts - without all the complexity.&lt;/p&gt;

&lt;p&gt;Building &lt;strong&gt;fast-mcp-telegram&lt;/strong&gt; was my answer. It's a complete MCP server that lets AI assistants interact with Telegram naturally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem I Needed to Solve
&lt;/h2&gt;

&lt;p&gt;Working with AI assistants, I kept running into limitations. They could analyze data, but actually using Telegram was clunky. Direct API calls required complex session management, bot frameworks were for user-facing chatbots, and search tools couldn't send messages or manage contacts.&lt;/p&gt;

&lt;p&gt;I needed something that gave AI assistants full Telegram capabilities in a natural way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This Different
&lt;/h2&gt;

&lt;p&gt;After trying various approaches, I built something specifically for AI assistants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tools&lt;/strong&gt;: Direct Telegram access through the &lt;code&gt;invoke_mtproto&lt;/code&gt; tool and standard messaging/search functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Bridge&lt;/strong&gt;: For no-code tools like n8n and Make.com that can't use MCP directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Setup&lt;/strong&gt;: Handles authentication and generates config files automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Support&lt;/strong&gt;: Bearer tokens, session isolation, and proper error handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Search&lt;/strong&gt;: Multi-query support with deduplication and filtering for AI assistants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Messaging&lt;/strong&gt;: Send, edit, reply, share files, even message phone numbers not in contacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Handling&lt;/strong&gt;: Works with URLs or local files, handles security and albums&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where This Fits In
&lt;/h2&gt;

&lt;p&gt;Other Telegram projects solve specific problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search tools are great for finding messages, but can't send replies or manage contacts&lt;/li&gt;
&lt;li&gt;Bot frameworks work for user-facing chatbots, but not for AI assistants needing programmatic access&lt;/li&gt;
&lt;li&gt;Other MCP servers connect specific tools; this brings the entire Telegram ecosystem to AI assistants via direct MTProto API access&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Try the live demo&lt;/strong&gt;: &lt;a href="https://tg-mcp.redevest.ru/setup" rel="noopener noreferrer"&gt;https://tg-mcp.redevest.ru/setup&lt;/a&gt; - log in and download a config file, no installation needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For developers&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fast-mcp-telegram
fast-mcp-telegram-setup &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_id"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-hash&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_hash"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--phone-number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"+123456789"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
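The setup flow generates the client config for you, so prefer the downloaded file. For orientation only, MCP clients such as Cursor or Claude Desktop register a server with a JSON entry of this general shape (the server name and command here are hypothetical, not taken from the project docs):

```json
{
  "mcpServers": {
    "telegram": {
      "command": "fast-mcp-telegram",
      "args": []
    }
  }
}
```

Whatever the exact entry, the client launches the server process and speaks MCP to it over stdio.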



&lt;p&gt;Full docs: &lt;a href="https://github.com/leshchenko1979/fast-mcp-telegram/#readme" rel="noopener noreferrer"&gt;https://github.com/leshchenko1979/fast-mcp-telegram/#readme&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Use It For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily news summaries&lt;/strong&gt;: n8n automation searches my subscribed channels for the last 24 hours and sends AI-summarized digests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart spam detection&lt;/strong&gt;: Uses HTTP-MTProto Bridge to get full user profiles (beyond regular Bot API) for better spam scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content creation&lt;/strong&gt;: AI assistants analyze my previous posts' formatting to maintain consistent style&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer service&lt;/strong&gt;: AI can respond to Telegram inquiries instead of just reading them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research workflows&lt;/strong&gt;: Search across channels and summarize findings&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;This started as a solution to my own AI-Telegram integration frustrations. The MCP protocol makes it natural for AI assistants to use, and the full feature set enables real applications beyond just demos.&lt;/p&gt;

&lt;p&gt;If this sounds useful, try the demo - setup takes 2 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo&lt;/strong&gt;: &lt;a href="https://tg-mcp.redevest.ru/setup" rel="noopener noreferrer"&gt;https://tg-mcp.redevest.ru/setup&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/leshchenko1979/fast-mcp-telegram" rel="noopener noreferrer"&gt;https://github.com/leshchenko1979/fast-mcp-telegram&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://github.com/leshchenko1979/fast-mcp-telegram/blob/master/docs/" rel="noopener noreferrer"&gt;https://github.com/leshchenko1979/fast-mcp-telegram/blob/master/docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post is AI-assisted. But nowadays, everything is, right?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>telegram</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
