Sam Hartley

Posted on Jun 9

I Built an AI Assistant That Lives in My Telegram — Here's What 6 Months Taught Me

#ai #telegram #selfhosted #automation

Six months ago I got tired of switching between apps to talk to AI. ChatGPT in the browser. Claude in another tab. Local models in a terminal. It was like having five friends who all live in different cities and refuse to visit each other.

So I did what any developer with too many GPUs and too little patience would do: I built my own assistant and put it where I already spend my day — Telegram.

It's not a chatbot for customers. It's not a business automation tool. It's just... my assistant. It lives in a private chat on my phone and handles the stuff I used to do manually. Here's what six months of actually using it has looked like.

What I Actually Built (And Why Telegram)

I already had three machines running Ollama at home — a Mac Mini M4, a Windows PC with an RTX 3060, and an Ubuntu box. Three endpoints, eight models, and me constantly forgetting which model was good for what.

Telegram was the obvious choice because:

I'm already there all day (friends, family, a few dev groups)
It works on my phone, my Mac, and my watch
The Bot API is dead simple
I can send voice messages, photos, documents — and the bot can handle all of them

The setup: a Python bot running on the Mac Mini, connected to all three Ollama endpoints. When I message it, the bot classifies what I want, routes to the right model on the right machine, and replies in the same chat thread.

Sounds simple. Took three evenings to get right. Took six months to make actually useful.

The Things I Actually Use It For

Here's the honest list. Not the marketing pitch — the real daily usage:

1. Quick questions without context switching

"Summarize this article" (I paste a link). "Explain this error" (I paste a stack trace). "Rewrite this email less formally." These used to mean opening a browser tab, logging in, maybe hitting a rate limit. Now I just... send a message. The reply comes back in 2-8 seconds depending on which model handles it.

The routing is simple but effective: quick chat → small model on the Mac. Code → 30B coder on the GPU machine. Complex reasoning → 8B reasoning model. Vision (screenshots) → vision model on GPU. It's not fancy — just keyword matching — but it works 90% of the time.

2. Voice notes while walking

This was the surprise killer feature. I walk a lot (living near Sakarya, there's decent hiking). I send voice messages to the bot while walking. It transcribes them (Whisper via Ollama), processes the request, and replies with text I can read when I'm back.

"Remind me to refactor the database module when I'm home" → transcribed, understood, added to my notes. "What was that Python pattern for retry logic with exponential backoff?" → code snippet in my pocket before I finish the trail.

I probably send 5-10 voice messages a day now. Never would have predicted that.

3. Code review on my phone

Someone sends me a code snippet in a dev group. I forward it to the bot: "review this." It comes back with actual useful feedback — variable naming issues, potential edge cases, suggestions for simplification. Is it as good as a senior dev? No. Is it better than my phone-scrolling half-attention review? Absolutely.

4. Document Q&A

I dump PDFs, markdown files, or pasted text into the chat and ask questions. The bot uses a local RAG setup (Chroma + nomic-embed-text) that indexes my project docs, notes, and anything I feed it. "How does my Garmin watch face fetch stock data?" → actual answer from my own documentation, not a hallucinated guess.

5. The dumb stuff that adds up

"Convert this JSON to a Python dataclass"
"What's 847 * 16 / 3 in hex?"
"Translate this Turkish message to German"
"Generate a regex that matches these three examples"

None of these are hard. All of them are annoying to do manually. Having an always-on assistant in my most-used app removes the friction completely.

What Went Wrong (The Honest Part)

The "it's down and I don't know why" problem

For the first month, the bot crashed randomly. Out of memory on the Mac Mini (it's only got 16GB). Network hiccup to the Windows PC. Ubuntu box decided to update itself and reboot. I'd message the bot and... silence. Then I'd SSH in, check logs, restart services, and feel like I was maintaining infrastructure instead of having an assistant.

Fix: health checks, auto-restart via launchd, and a fallback chain. If the GPU machine is down, everything routes to the Mac's smaller model. Degraded but functional.

The "it answered confidently and was wrong" problem

Early on, I'd trust the bot's answers without verifying. It told me a Python function was valid. It wasn't. It gave me a Docker command with a subtle flag error. I spent 20 minutes debugging before I realized the bot hallucinated a flag that doesn't exist.

My rule now: if the answer matters, I verify it. The bot is my fastest junior developer. It's also my most confident one.

The "I talk to it more than some humans" problem

This is just a weird psychological thing. I realized after a few months that I was messaging the bot 20-30 times a day. More than some friends. There's something slightly dystopian about having your most responsive conversation partner be a Python script. I'm aware of it. I haven't fixed it. Just noting it.

The Architecture (If You Want to Build This)

Telegram Message
→ Python Bot (python-telegram-bot)
→ Classify intent (simple keyword router)
→ Route to Ollama endpoint
→ Mac Mini (qwen3:4b) for quick chat
→ Windows PC (qwen3-coder:30b) for code
→ Windows PC (granite3.2-vision:2b) for images
→ Ubuntu (minicpm-v) as fallback
→ Optional: RAG lookup in Chroma DB
→ Format reply (code blocks, markdown, etc.)
→ Send back to Telegram

The whole thing runs on a Mac Mini M4. Total cost: $0 for software, maybe $8/month in electricity if you count the always-on machines.

What I'd Do Differently

1. Build the router on day one. I started with "just use the big model for everything." It worked but was slow and kept my GPU busy. The router took an afternoon to write and improved response times by 3x.

2. Add voice support immediately. I added it as a "nice to have" afterthought. It became 30% of my usage. If you're building something similar, start with voice. People talk more than they type on phones.

3. Make it degrade gracefully. Machines go down. Networks hiccup. Your bot should always answer something, even if it's "I'm running slow today, but here's a basic answer." Silence is worse than a degraded response.

4. Log everything. I log every request, response time, and which model handled it. Not for analytics — for debugging. When something feels slow, the logs tell me if it's the model, the network, or my terrible code.

Is This Better Than ChatGPT Plus?

Depends on what you value.

	My Bot	ChatGPT Plus
Cost	$0/month	$20/month
Privacy	✅ Everything stays local	❌ Cloud
Speed	⚡ 0.3-12s depending on model	⚡ ~2-5s
Availability	🟢 24/7 (if I maintain it)	🟢 24/7 (they maintain it)
Model choice	8 models, I pick	4 models, they pick
Voice	✅ Native in Telegram	✅ Yes
Reliability	🟡 I fix it when it breaks	🟢 It just works

For me, the privacy and model flexibility win. For someone who doesn't want to maintain infrastructure, ChatGPT Plus is the obvious choice. This is a hobby project that became useful, not a product recommendation.

The Real Lesson

The best AI assistant isn't the most powerful one. It's the one that's actually there when you need it, in the app you already use, without making you think about models or endpoints or API keys.

I built this because I was annoyed. I kept using it because it removed friction from my day. That's the bar: not "can it do X?" but "is it easier than doing X myself?"

For 80% of what I ask, the answer is yes. For the remaining 20%, I still open a terminal or a browser. And that's fine.

I write about building things with local AI, self-hosting, and side projects that accidentally become useful. If you're running a home lab or experimenting with local models, I'd love to hear your setup — drop it in the comments.

DEV Community