eagerspark

Posted on Jun 16

How I Built a Telegram AI Bot That Saved Me Thousands

#ai #api #programming #tutorial

Okay so let me set the scene. It's 2026, I'm running my little SaaS side project, and my DMs are FLOODED with people asking the same questions over and over. Pricing. Features. How to integrate. You know the drill if you've ever shipped anything users actually like.

I was honestly gonna just answer them manually forever. I really was. But then I caught myself typing the same response for like the 47th time one Tuesday afternoon and I was like... no, we're done here. Time to build a Telegram bot.

What I didn't expect? How CHEAP it'd end up being. And how much of a pain in the ass the whole thing could've been if I'd picked the wrong setup. Let me walk you through what I did, the numbers, and the stuff I wish someone had told me upfront.

The "Wait, Why Is Everything So Expensive" Moment

Here's the thing nobody warns you about when you start poking around at AI APIs. The popular ones are BRUTAL on the wallet. I was looking at GPT-4o thinking "okay this is the safe bet" and then I saw the output pricing. $10.00 per million tokens. Ten. Dollars.

Now I don't know about you, but my little bot was gonna be handling hundreds of conversations a day. Do the math on that and suddenly my "side project" is costing me actual rent money. NOPE.

So I went hunting. I wanted a unified API where I could just point at different models without rewriting everything. Honestly, I gotta say, this is one of those decisions that sounds small but saves you HOURS later when you want to swap models.

What I Actually Found (The Pricing Table That Made Me Stay)

After way too much time on pricing pages, I landed on Global API because they had basically everything in one place. Here's what I was comparing:

Model	Input ($/1M tokens)	Output ($/1M tokens)	Context (tokens)
DeepSeek V4 Flash	0.27	1.10	128K
DeepSeek V4 Pro	0.55	2.20	200K
Qwen3-32B	0.30	1.20	32K
GLM-4 Plus	0.20	0.80	128K
GPT-4o	2.50	10.00	128K

Look at GLM-4 Plus. Twenty cents in. Eighty cents out. For 128K context. That's literally less than a tenth of what GPT-4o charges for the same thing. Pretty much a no-brainer for my use case.

And DeepSeek V4 Flash at $0.27 input and $1.10 output? I ran some test prompts and honestly the quality was totally fine for answering support questions. Not everything needs to be a frontier model, you know?
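
If you want to actually "do the math", here's a rough sketch using the per-million-token prices from the table above. The traffic profile (300 conversations a day, 500 tokens in and 300 tokens out per conversation) is a made-up round number for illustration, not measured data, so plug in your own.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "DeepSeek V4 Flash": (0.27, 1.10),
    "GLM-4 Plus": (0.20, 0.80),
    "GPT-4o": (2.50, 10.00),
}


def daily_cost(model, conversations, in_tokens, out_tokens):
    """Estimated USD per day for a given traffic profile."""
    in_price, out_price = PRICES[model]
    per_conversation = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return conversations * per_conversation


# Hypothetical traffic: 300 conversations/day, 500 tokens in, 300 tokens out.
for name in PRICES:
    print(f"{name}: ${daily_cost(name, 300, 500, 300):.4f}/day")

# Ratio of GLM-4 Plus to GPT-4o for the same traffic.
ratio = daily_cost("GLM-4 Plus", 300, 500, 300) / daily_cost("GPT-4o", 300, 500, 300)
print(f"GLM-4 Plus costs {ratio:.0%} of GPT-4o")
```

Because both GLM-4 Plus prices are the same fraction of the GPT-4o prices, that ratio stays the same no matter what traffic numbers you feed in.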

The Actual Code (The Part You Actually Care About)

Okay, here's the part that made me actually smile. The setup is SO simple because the API is OpenAI-compatible. I didn't have to learn a new library, didn't have to deal with weird auth flows, nothing. Just pointed the official OpenAI Python client at a different base URL and boom, it worked.

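import os

import openai

# Point the official OpenAI client at the OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ["GLOBAL_API_KEY"],  # key comes from the environment
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Your prompt"}],
)

# The reply text lives on the first choice.
print(response.choices[0].message.content)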

That's literally it. I copied this from their docs, pasted it in, and got a response on the first try. When was the last time an AI integration actually worked on the first try? Like never. This was a first.

The whole Telegram side wasn't much harder. Just standard python-telegram-bot stuff: grabbing the message text, sending it to the API, and echoing back the response. I won't bore you with the entire bot framework because it's not the interesting part. The interesting part is that I was up and running in under 10 minutes. I timed it. I was genuinely shocked.
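
For anyone who does want to see the Telegram side, here's a minimal sketch of the "grab the text, call the API, reply" loop. It assumes python-telegram-bot v20 or newer (the async API), and the `TELEGRAM_BOT_TOKEN` environment variable and the `build_reply` helper are names I made up for this example. Treat it as a starting point, not the exact bot.

```python
import asyncio
import os


def build_reply(text, ask_model):
    """Turn an incoming message into a reply; ask_model is any str -> str callable."""
    text = (text or "").strip()
    if not text:
        return "Send me a question and I'll do my best to answer it."
    return ask_model(text)


def main(ask_model):
    # Imported here so build_reply can be reused and tested without the library.
    from telegram.ext import Application, MessageHandler, filters

    async def on_message(update, context):
        # Run the blocking API call in a thread so the event loop stays free.
        reply = await asyncio.to_thread(build_reply, update.message.text, ask_model)
        await update.message.reply_text(reply)

    # TELEGRAM_BOT_TOKEN is an assumed variable name; use whatever you set.
    app = Application.builder().token(os.environ["TELEGRAM_BOT_TOKEN"]).build()
    # Handle plain text messages, but not /commands.
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, on_message))
    app.run_polling()
```

To start the bot you'd call `main(...)` with whatever function talks to the model, for example the `get_response` function from the routing snippet further down.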

Stuff I Wish I Knew Before Starting

Here's where I get a little opinionated. There are things nobody tells you when you're building these kinds of bots and I learned them the hard way. Save yourself the trouble.

Cache everything you can. I added a simple in-memory cache for repeated questions and my hit rate is sitting around 40%. That means 40% of incoming questions never reach the API and cost me literally nothing. For support bots especially, people ask the same stuff CONSTANTLY. "How do I reset my password?" "Where do I find my API key?" Same questions, all day. Cache them.
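
A simple in-memory cache can be sketched like this. The `AnswerCache` class, the one-hour TTL, and the lowercase-and-collapse-whitespace normalization are all assumptions for the example. It only catches questions that match after normalization, so differently worded versions of the same question still go to the API.

```python
import time


class AnswerCache:
    """Tiny in-memory cache keyed on a normalized question, with a TTL."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # normalized question -> (expiry time, answer)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(question):
        # Lowercase and collapse whitespace so trivial variations share a key.
        return " ".join(question.lower().split())

    def get_or_fetch(self, question, fetch):
        """Return a cached answer, or call fetch(question) and cache the result."""
        key = self.normalize(question)
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        answer = fetch(question)
        self.store[key] = (now + self.ttl, answer)
        return answer
```

Usage would be something like `cache.get_or_fetch(user_message, get_response)`, and `hits / (hits + misses)` gives you the hit rate to keep an eye on.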

Stream your responses. Even if the model is fast, streaming makes it FEEL instant. Telegram can show the typing indicator while tokens come in (the bot has to request it with a chat action), and psychologically that's a huge difference. Plus you can start showing partial answers to the user way sooner. My average latency is around 1.2 seconds but with streaming it feels like the bot is typing in real time, which people LOVE.
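
One common way to do this in Telegram is to send a message once and then edit it as more text arrives. Editing on every single token would run into Telegram's rate limits pretty fast, so the sketch below batches the streamed pieces into bigger snapshots. The `snapshots` helper and the 40-character threshold are assumptions for illustration. With the OpenAI-compatible client, the pieces would come from a `stream=True` call, reading `chunk.choices[0].delta.content` from each chunk.

```python
def snapshots(deltas, min_chars=40):
    """Yield the text so far each time at least min_chars new characters arrive."""
    text = ""
    last_sent = 0
    for delta in deltas:
        if not delta:  # streamed chunks can carry empty or None content
            continue
        text += delta
        if len(text) - last_sent >= min_chars:
            last_sent = len(text)
            yield text
    if len(text) > last_sent:
        yield text  # flush whatever is left at the end


# Small demo with a fake stream of pieces.
for partial in snapshots(["Hel", "lo ", None, "wor", "ld"], min_chars=5):
    print(partial)
```

In the bot, each yielded snapshot would become one edit of the already-sent message (in python-telegram-bot that's `edit_text` on the message object).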

Don't use the most expensive model for everything. I route simple queries to cheaper models and only escalate when needed. Honestly this alone probably saved me more money than any other optimization. If someone asks "what time does support open?" you do NOT need a 200K context frontier model for that.

Have a fallback. Rate limits happen. Models go down. Networks do weird stuff. I built a simple try/except that switches to a different model if the primary one fails. Users never see an error, they just get a slightly slower response. Graceful degradation is the move.
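
The try/except idea can be sketched in a few lines. `ask_with_fallback` and the `call_model(model, message)` callable are hypothetical names for this example. The sketch catches every `Exception` to stay short; in a real bot you'd probably narrow that to the client's own error types (the OpenAI Python client has ones like `openai.RateLimitError` and `openai.APIConnectionError`).

```python
def ask_with_fallback(user_message, models, call_model):
    """Try each model in order and return the first successful answer."""
    last_error = None
    for model in models:
        try:
            return call_model(model, user_message)
        except Exception as error:  # rate limit, timeout, outage, ...
            last_error = error
    # Every model failed: surface the last error to the caller.
    raise RuntimeError("all models failed") from last_error
```

You'd pass the models in priority order, for example `["deepseek-ai/DeepSeek-V4-Flash", "deepseek-ai/DeepSeek-V4-Pro"]`, and only show the user an apology message if the final `RuntimeError` is raised.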

The Numbers (Because Indie Hackers Love Numbers)

Let me be real with you. The reason I'm writing this is because I want other solo devs and small teams to know what's possible here. I was paying basically nothing for this bot after I set it up. We're talking cents per day, not dollars.

Here's what the docs said and what my experience matched:

40-65% cheaper than going direct to the big-name providers
1.2s average latency
320 tokens/sec throughput on the models I tested
84.6% average benchmark score across the lineup

That cost reduction number is the real kicker. When you're a one-person operation, every dollar matters. Every. Single. Dollar. And the quality is comparable? I literally cannot think of a reason not to do this.

The Code That Made It All Work (Take Two)

Okay, here's the second snippet because I want to show you how I actually use it in production. This is the routing logic that picks the model based on query complexity, with message length as a rough proxy:

import os

import openai

client = openai.OpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ["GLOBAL_API_KEY"],
)

def get_response(user_message: str) -> str:
    # Short messages go to the cheaper model; longer ones escalate to Pro.
    if len(user_message) < 100:
        model = "deepseek-ai/DeepSeek-V4-Flash"
    else:
        model = "deepseek-ai/DeepSeek-V4-Pro"

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful support assistant."},
            {"role": "user", "content": user_message},
        ],
        temperature=0.7,
    )
    # content can be None, so fall back to an empty string to honor the return type.
    return response.choices[0].message.content or ""

This is not fancy. This is not clever. But it works, it's fast, and it costs me basically nothing to run. Sometimes that's the best kind of code.

Things I Got Wrong (So You Don't Have To)

Let me be honest about the stuff I fumbled:

First, I overengineered the first version. I was using GPT-4o for everything because "what if the user asks something hard?" Turns out most of what comes in is pretty simple stuff. Start cheap, upgrade later.

Second, I didn't set up logging early enough. I had no idea what people were actually asking the bot, which meant I couldn't optimize for real usage patterns. Now I log every query (anonymized of course) and it's a goldmine for product feedback.
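
A minimal sketch of anonymized query logging, assuming one JSON object per line in a local file. The `anonymize` and `log_query` helpers, the field names, and the salted-hash approach are all assumptions for the example. Note that hashing only hides the user ID; the question text itself can still contain personal details, so you may want to scrub that too.

```python
import hashlib
import json
import time


def anonymize(user_id, salt):
    """One-way salted hash so logs can group by user without storing the raw ID."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return digest[:16]


def log_query(path, user_id, question, model, salt):
    """Append one JSON line describing the query and return the record."""
    record = {
        "ts": int(time.time()),
        "user": anonymize(user_id, salt),
        "model": model,
        "question": question,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Keep the salt out of the log file (an environment variable works), otherwise the hashes are easy to reverse for small ID ranges.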

Third, I waited way too long to add a feedback button. Just a simple "was this helpful? 👍 👎" inline button. The signal I get from that is INVALUABLE. Like, I'm using that data to figure out which model performs best for my actual users, not just synthetic benchmarks.
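
Here's a rough sketch of the bookkeeping behind a feedback button. The `fb|<model>|<vote>` format is something I made up for this example; it rides along in the button's `callback_data`, which Telegram caps at 64 bytes, so keep it short. In python-telegram-bot the buttons themselves would be `InlineKeyboardButton` objects and the presses would arrive through a `CallbackQueryHandler`; that wiring is left out here.

```python
from collections import Counter


def feedback_data(model, vote):
    """Build the callback_data string attached to a thumbs up/down button."""
    if vote not in ("up", "down"):
        raise ValueError("vote must be 'up' or 'down'")
    return f"fb|{model}|{vote}"


def record_feedback(tally, callback_data):
    """Update per-model vote counts from a button press; ignore unknown data."""
    parts = callback_data.split("|")
    if len(parts) != 3 or parts[0] != "fb" or parts[2] not in ("up", "down"):
        return False
    tally[(parts[1], parts[2])] += 1
    return True


def helpful_rate(tally, model):
    """Share of thumbs-up votes for a model, or None if it has no votes yet."""
    up, down = tally[(model, "up")], tally[(model, "down")]
    return up / (up + down) if up + down else None
```

With a `Counter()` as the tally, comparing `helpful_rate` across models gives you the "which model works best for my actual users" signal.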

What I'd Tell Past Me

If I could go back to that Tuesday afternoon when I was manually typing responses for the 47th time, here's what I'd say:

Just build the bot. It's not that hard.
Use the cheap models first. You can always upgrade.
The 10-minute setup time is real, don't overthink it.
Add caching from day one. Seriously. Day. One.
Stream everything. Always.
Get user feedback data as early as possible.

That's pretty much it. Nothing revolutionary. Just the stuff I learned by doing it.

The Bottom Line

Look, I'm not gonna pretend I'm running some massive operation here. It's a Telegram bot. It answers questions. But it answers them 24/7, it costs me almost nothing, and it freed up probably 3-4 hours a week of my time. For a solo founder, that's HUGE. That's a half-day of building actual features instead of answering the same five questions over and over.

The whole thing runs through Global API, which honestly I just stumbled into but I'm kinda glad I did. Having 184 models available through one interface is actually wild when I think about it. I'm not even using a fraction of them yet but knowing I can swap to whatever's best for any new feature without redoing my whole stack... that's peace of mind.

If you're thinking about building something like this, my honest advice? Just do it. Don't over-research. Don't over-plan. Pick a model, write the code, ship it, iterate. The marginal cost is so low that the only real risk is wasting an afternoon, and you'll learn a ton in that afternoon regardless.

And hey, if you want to check out Global API for yourself, they've got a bunch of free credits to get started so you can actually test stuff before committing. I used those to run my benchmarks and figure out which models were worth it. Pretty cool that they let you kick the tires like that. Take a look if you want — no pressure, just sharing what worked for me.

Now if you'll excuse me, I have a bot to keep ignoring while it does my job for me. 😉

DEV Community