I built a Telegram bot that reads 70 arXiv papers a day so I don't have to

landigf — Sat, 11 Apr 2026 22:18:52 +0000

the problem

I was drowning in arXiv. I had 30 tabs of papers saved to Zotero that I'd never opened, an inbox full of unread newsletter digests, and the creeping certainty that someone, somewhere, had already published the exact thing I was about to spend 3 weeks "discovering."

I tried everything:

arXiv RSS feeds → too noisy. 100+ papers a day, no signal.
Email newsletters → I never opened them. My subconscious classifies "newsletter" alongside "marketing email."
Twitter accounts that summarize papers → algorithmic, not personalized to my niche.
Just trying harder → did not work.

The real question I kept failing to answer was: what changed in my exact subfield in the last 24 hours, and is it worth my time?

So I built it.

what it is

Broletter is a Telegram bot. Every morning it sends me one short message with 4 sections:

🔬 Daily Science — Saturday, 12 Apr

Tap a section to read it. Skip the rest.

💡 Virtual Memory: The OS Magic You Use Daily
   Deep Curiosity — Ever wonder how your computer juggles so many apps?
   Dive into the surprisingly elegant system that makes it all possible.

📄 Benchmarking Science: Extracting Experiments from Papers
   Research Spotlight — Chong and Colindres introduce LitXBench, a new
   tool to automatically extract experimental details from scientific
   literature for materials science.

⚡ The Universe: Humanity's First Computer?
   Quick Bites — Could the entire universe have functioned as a giant
   computer running the laws of physics since the Big Bang?

🎯 Testing APIs Beyond Basic CRUD Operations
   Your Research Corner — Yang et al. propose a new log-based approach
   for API testing that accounts for complex business logic, going
   beyond simple OpenAPI specs.

   [📖 Curiosity] [📖 Research]
   [📖 Bites] [📖 Corner]
   [📖 Read all] [⏭ Skip today]

Each section is a one-line preview + a tap-to-expand button. I tap what looks interesting, full content arrives, the rest stays hidden. Reactions tune what I see tomorrow.
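
For the curious, here is a minimal sketch of how a preview card like this can be expressed as a Telegram Bot API sendMessage payload with an inline keyboard. The section keys, the "action:date:section" callback_data format, and the function names are my assumptions for illustration, not necessarily what Broletter does.

```python
from typing import Optional

# Hypothetical section keys and button labels, modeled on the sample card above.
SECTIONS = [
    ("curiosity", "📖 Curiosity"),
    ("research", "📖 Research"),
    ("bites", "📖 Bites"),
    ("corner", "📖 Corner"),
]


def build_preview_message(chat_id: int, preview_text: str, date: str) -> dict:
    """Build a sendMessage payload whose buttons expand one section each."""
    # Telegram limits callback_data to 64 bytes, so keep it compact.
    buttons = [
        {"text": label, "callback_data": f"open:{date}:{key}"}
        for key, label in SECTIONS
    ]
    keyboard = [
        buttons[0:2],
        buttons[2:4],
        [
            {"text": "📖 Read all", "callback_data": f"open:{date}:all"},
            {"text": "⏭ Skip today", "callback_data": f"skip:{date}"},
        ],
    ]
    return {
        "chat_id": chat_id,
        "text": preview_text,
        "reply_markup": {"inline_keyboard": keyboard},
    }


def parse_callback(data: str) -> tuple[str, str, Optional[str]]:
    """Split callback_data back into (action, date, section)."""
    parts = data.split(":")
    section = parts[2] if len(parts) > 2 else None
    return parts[0], parts[1], section
```

When a button is tapped, the webhook receives a callback_query carrying that same callback_data string, so parse_callback is enough to decide which full section to send.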

That's the entire UX. It works because it removes the only thing that ever broke my reading habit: the wall of text that makes me say "I'll read this later" and never return.

Useful mostly for STEM, but you can configure it for internships, startup news, funding rounds, lab updates, etc. It's a delivery mechanism, not a content silo.

how I got there (the embarrassing first version)

Version 1 was a wall of text. 5 sections, each ~300 words, sent as separate Telegram messages with reaction buttons. Looked like an actual newsletter. I shipped it to a Telegram group of friends.

A friend wrote back the next day:

"no bro troppo lungo non me lo leggerò mai. Cioè tipo meglio che mi fai un sunto ultra veloce e io dico subito cosa mi interessa e cosa no e tu mi mandi il messaggio completo"

(Italian for: "too long, I'll never read it. Better if you give me an ultra-quick summary and I tell you what interests me, then you send the full thing.")

Then I checked Firestore. Across 8 users:

7 votes for "shorter", 4 for "perfect", 0 for "longer"
Only 3 of 8 users had ever pressed a reaction button at all
The friend who wrote the feedback had zero reactions before giving up

That's not a tuning problem. That's a fundamental UX failure. I rebuilt it in 2 days as the preview-card flow you see above.

the architecture (this is the part you came for)

                    ┌─────────────────────┐
                    │  Cloud Scheduler    │
                    │  (cron, daily 8pm)  │
                    └──────────┬──────────┘
                               │
                    ┌──────────▼──────────┐
                    │  Cloud Run Job:     │
                    │  prefetch-papers    │  ← One arXiv fetch
                    │                     │     for ALL users
                    └──────────┬──────────┘
                               │ writes
                    ┌──────────▼──────────┐
                    │  Firestore          │
                    │  /papers_cache/     │
                    └──────────┬──────────┘
                               │ reads
                    ┌──────────▼──────────┐
                    │  Cloud Run Job:     │
                    │  generate-all       │  ← Generates per-user
                    │                     │     newsletters
                    └──────────┬──────────┘
                               │
                    ┌──────────▼──────────┐
                    │  Telegram Bot API   │
                    │  (preview cards)    │
                    └─────────────────────┘

Stack:

Python 3.13 + FastAPI (web container handling Telegram webhook)
Gemini 2.5 Flash + Flash-Lite (mixed — see cost section)
Firestore (multi-tenant, one doc per user)
Cloud Run + Cloud Run Jobs (web stays warm, batch jobs run on schedule)
Telegram Stars payments — users pay in Telegram's native currency, no credit card
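
To make the "one arXiv fetch for ALL users" box concrete, here is a rough sketch that builds a single arXiv API query from the union of every user's categories. I'm assuming users are matched by arXiv category (like cs.LG) and that 70 results per day is the cap; the function name and the input shape are hypothetical.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_prefetch_url(user_categories: list[list[str]], max_results: int = 70) -> str:
    """Build one arXiv API query covering the union of all users' categories."""
    # Deduplicate across users so the fetch does not grow with user count.
    categories = sorted({cat for cats in user_categories for cat in cats})
    if not categories:
        raise ValueError("no categories to fetch")
    params = {
        "search_query": " OR ".join(f"cat:{cat}" for cat in categories),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


# Two users share cs.LG, so it appears only once in the query.
print(build_prefetch_url([["cs.LG", "cs.SE"], ["cs.LG"]]))
```

The response is an Atom feed, which the prefetch job would parse and write to /papers_cache/ for the generation job to read.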

cost engineering (this is the part you really came for)

I started on Gemini 2.5 Flash, with everything default. Projected cost at 100 users: ~$1.20 per user per month. I wanted to charge less than $3/month and still profit, so that was way too high. Three changes dropped it to ~5 cents per user per month.

1. disable thinking tokens (the big one — 3-10x cost cut)

Gemini 2.5 Flash uses "thinking" tokens by default. They're billed at output rate but they're invisible to you — you don't see them in the response, but they multiply your bill. For creative writing tasks (which is what newsletter generation is), thinking adds nothing. Disable it:

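from google import genai
from google.genai import types

# Reads the API key from the environment (GEMINI_API_KEY).
client = genai.Client()

# `prompt` and `system` are the newsletter prompt and system instruction strings.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
    config=types.GenerateContentConfig(
        system_instruction=system,
        # A budget of 0 turns thinking off for 2.5 Flash.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)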

That single line cut my billed output tokens by 3-10x depending on the prompt. I verified by logging the actual usage_metadata.candidates_token_count before and after: the visible output tokens stayed the same, and the bill dropped.
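
If you want to check this on your own app, a small sketch: in the google-genai SDK, usage_metadata reports thinking tokens separately as thoughts_token_count, so the billed output is roughly the visible count plus the thinking count. The numbers below are made up, and SimpleNamespace stands in for a real response.usage_metadata object.

```python
from types import SimpleNamespace


def billed_output_tokens(usage) -> int:
    """Visible output tokens plus thinking tokens; either field may be None."""
    visible = usage.candidates_token_count or 0
    thinking = getattr(usage, "thoughts_token_count", None) or 0
    return visible + thinking


# Made-up numbers standing in for response.usage_metadata from two runs.
with_thinking = SimpleNamespace(candidates_token_count=400, thoughts_token_count=2000)
without_thinking = SimpleNamespace(candidates_token_count=400, thoughts_token_count=None)

ratio = billed_output_tokens(with_thinking) / billed_output_tokens(without_thinking)
print(f"billed output shrinks by {ratio:.1f}x")
```

Logging both fields per call makes the saving visible before the invoice arrives.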

Lesson: thinking is for math and code. For creative writing, it's burning money without improving quality.

2. two-tier model split

Not all sections deserve the same model:

Curiosity, Research Spotlight, Quick Bites: these are the same for any user with overlapping interests, so they run on Flash-Lite ($0.10 in / $0.40 out per 1M tokens, roughly 6x cheaper than standard Flash on output).
Personal research section — this is hyper-personalized to each user's exact research keywords and feedback history. It needs the better model. I use Flash standard here.

class PoolGenerator:
    """Cheap shared sections — Flash-Lite, generic system prompt"""
    def __init__(self):
        self.model = "gemini-2.5-flash-lite"

class NewsletterGenerator:
    """Per-user personal section + Sunday recap — Flash, personalized prompt"""
    def __init__(self, config):
        self.config = config  # per-user keywords and feedback history
        self.model = "gemini-2.5-flash"
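
To put rough numbers on the split, here is a tiny per-call cost estimator. The Flash-Lite prices come from the text above; the standard Flash prices ($0.30 in / $2.50 out per 1M tokens) are my assumption from public list prices and may have changed, and the token counts in the example are invented.

```python
# USD per 1M tokens. Flash-Lite figures are from the text above; the Flash
# figures are an assumption based on public list prices and may be outdated.
PRICES = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one call, ignoring thinking tokens."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Invented token counts for a ~300-word section.
for model in PRICES:
    print(model, f"${call_cost(model, 2000, 500):.5f}")
```

Under those assumed prices the gap is about 3x on input and about 6x on output, so the saving depends on how output-heavy each section is.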

3. content pool architecture

Sections that don't need to be unique per user shouldn't be generated per user. I generate a daily "pool" of curiosity articles (one per theme), research spotlights (one per top paper), and quick bites (3 sets) — once per day, globally. Then for each user, I assemble their newsletter from the pool by matching their interests.

def generate_daily_pool(date: str):
    """Runs ONCE per day, regardless of user count."""
    themes = collect_unique_themes_across_all_users()
    for theme in themes:
        save_pool_item(date, "curiosity", theme,
                       pool_gen.curiosity(theme, words=300))
    # Same for research spotlights and quick bites

def assemble_for_user(user, date):
    """Runs per user. Picks from pool + generates only personal sections."""
    sections = {}
    sections["curiosity"] = load_pool_item(date, "curiosity", user.theme)
    sections["research"] = load_pool_item(date, "research", best_match(user))
    sections["quick_bites"] = load_pool_item(date, "quick_bites", rotating)
    sections["personal"] = personal_gen.generate(user)
    return sections
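
The snippet above leans on helpers from the real codebase. As a self-contained illustration of the same idea, here is a toy version with an in-memory pool and a fake generator that counts calls instead of hitting an LLM. Every name in it is hypothetical; the point is only that pool calls scale with unique themes and personal calls scale with users.

```python
class CountingGenerator:
    """Stand-in for an LLM client; counts calls instead of calling an API."""

    def __init__(self):
        self.calls = 0

    def generate(self, kind: str, key: str) -> str:
        self.calls += 1
        return f"[{kind}] article about {key}"


def build_pool(users: list[dict], gen: CountingGenerator) -> dict:
    """Generate one shared item per unique theme, regardless of user count."""
    pool = {}
    for theme in sorted({user["theme"] for user in users}):
        pool[("curiosity", theme)] = gen.generate("curiosity", theme)
    return pool


def assemble(user: dict, pool: dict, gen: CountingGenerator) -> dict:
    """Reuse the shared item and generate only the personal section."""
    return {
        "curiosity": pool[("curiosity", user["theme"])],
        "personal": gen.generate("personal", user["name"]),
    }


users = [
    {"name": "a", "theme": "os"},
    {"name": "b", "theme": "os"},
    {"name": "c", "theme": "physics"},
]
pool_gen, personal_gen = CountingGenerator(), CountingGenerator()
pool = build_pool(users, pool_gen)
letters = [assemble(user, pool, personal_gen) for user in users]
print(pool_gen.calls, personal_gen.calls)
```

Adding a fourth user on the "os" theme would add one personal call and zero pool calls.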

The math at 100 users:

Pool generation: ~16 LLM calls/day (8 themes + 5 papers + 3 quick bite sets), once. Fixed cost: ~$0.12/month.
Per-user generation: 1 personal call + 1 Sunday recap call. Per-user cost: ~$0.045/month.
Total at 100 users: ~$4.62/month.

At $1/month per user revenue: 95% margin. Even at $0.50/month: 89%.
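
Here is that arithmetic as a few lines of Python, using only the figures quoted in this post. One observation, which is my reading and not something stated above: the 89% figure at $0.50/month lines up when the $0.0066 per user for preview calls (from the bonus section) is included; without it, the margin rounds to 91%.

```python
USERS = 100
POOL_FIXED = 0.12          # USD/month, pool generation (fixed)
PER_USER = 0.045           # USD/month, personal call + Sunday recap
PREVIEW_PER_USER = 0.0066  # USD/month, preview hooks (bonus section)


def monthly_cost(users: int, include_previews: bool = False) -> float:
    """Total monthly LLM cost in USD for a given user count."""
    per_user = PER_USER + (PREVIEW_PER_USER if include_previews else 0.0)
    return POOL_FIXED + users * per_user


def margin(price: float, users: int, include_previews: bool = False) -> float:
    """Gross margin on LLM cost alone, as a fraction of revenue."""
    revenue = price * users
    return (revenue - monthly_cost(users, include_previews)) / revenue


print(f"total: ${monthly_cost(USERS):.2f}")
for price in (1.00, 0.50):
    print(price, f"{margin(price, USERS):.1%}", f"{margin(price, USERS, True):.1%}")
```

This covers LLM spend only; Cloud Run, Firestore, and payment fees are not part of the figures above.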

bonus: LLM-generated preview hooks (the part where I learned to not be lazy)

The preview card needs a one-line hook per section. My first attempt: parse the first bold phrase or first sentence of each generated section.

The user (same friend, different feedback) responded:

"How do you think this could be a good solution just putting the first sentence? You need to write a short summary, use an API call idk, but need more details to understand if I like it."

He was right. The first sentence is a hook for the section author, not for the section reader. I added a separate Flash-Lite call after generation that takes all sections as input and returns structured JSON {title, teaser} for each. Cost per call: ~$0.0002. So $0.0066 per user per month for genuinely useful previews.

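import json

from google import genai

# PREVIEW_PROMPT is a prompt template with a {sections_text} placeholder,
# defined elsewhere.

def _generate_previews(client, sections):
    # Send only the first 400 characters of each section to keep input small.
    parts = [f"--- {key} ---\n{content[:400]}\n"
             for key, content in sections.items() if content]
    prompt = PREVIEW_PROMPT.format(sections_text="\n".join(parts))
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=prompt,
        config=genai.types.GenerateContentConfig(
            thinking_config=genai.types.ThinkingConfig(thinking_budget=0),
            # Ask for raw JSON so the response parses directly.
            response_mime_type="application/json",
        ),
    )
    return json.loads(response.text)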
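
Model-produced JSON can still come back malformed or missing a key, so a defensive parser is a cheap safety net. This is a sketch under an assumption: that the JSON is an object keyed by section name, each value holding title and teaser strings. The fallback reuses the old first-sentence approach only when the model output is unusable.

```python
import json


def parse_previews(raw: str, sections: dict[str, str]) -> dict[str, dict[str, str]]:
    """Validate preview JSON; fall back to the first sentence per section."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    previews = {}
    for key, content in sections.items():
        if not content:
            continue
        item = data.get(key)
        valid = (
            isinstance(item, dict)
            and isinstance(item.get("title"), str)
            and isinstance(item.get("teaser"), str)
        )
        if valid:
            previews[key] = {"title": item["title"], "teaser": item["teaser"]}
        else:
            # Fallback: section name as title, first sentence as teaser.
            first_sentence = content.split(". ")[0].strip()
            previews[key] = {
                "title": key.replace("_", " ").title(),
                "teaser": first_sentence[:200],
            }
    return previews
```

With this in place, a bad preview response degrades to a weaker card instead of a failed morning send.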

what I learned

Default to "off" for thinking tokens unless your task is reasoning. The default in the SDK is "auto", which means "on" for Flash.
The first user feedback that hurts is the most valuable. I was proud of v1. The "troppo lungo" text was the best thing that happened to the product.
Architecture is for the cost sheet, not the org chart. A multi-tenant content pool sounds enterprise-y. It's actually 80 lines of Python and saves 90% of your LLM bill.
A 1-line thinking_budget=0 is worth more than any prompt engineering you'll do this week. Try it on your existing app right now.
Telegram Stars are underrated for indie devs. No Stripe setup, no chargebacks, no PCI compliance, no credit card forms. You get paid in Stars → withdraw to TON → swap to fiat. Telegram takes 0% on the withdrawal.
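
For anyone who hasn't used Stars, here is roughly what the invoice side looks like as a Bot API sendInvoice payload: the currency is "XTR", the provider token is an empty string, and there is exactly one price item whose amount is the number of Stars. The 50/100/150 tiers are the ones Broletter charges; the title, description, payload format, and the one-off monthly charge model are my assumptions.

```python
TIERS = (50, 100, 150)  # Stars per month


def build_stars_invoice(chat_id: int, tier_stars: int) -> dict:
    """Build a sendInvoice payload for a one-month Telegram Stars charge."""
    if tier_stars not in TIERS:
        raise ValueError(f"unknown tier: {tier_stars}")
    return {
        "chat_id": chat_id,
        "title": "Broletter subscription",                  # hypothetical copy
        "description": "One month of daily preview cards",  # hypothetical copy
        "payload": f"sub:{tier_stars}",   # internal reference, not shown to the user
        "provider_token": "",             # empty for Stars payments
        "currency": "XTR",                # Telegram Stars
        "prices": [{"label": "1 month", "amount": tier_stars}],
    }
```

The bot then has to answer the pre_checkout_query and handle the successful_payment update before unlocking the paid tier.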

try it

It's open source: github.com/landigf/Broletter — MIT, you can self-host.

Or try the hosted version: @BroletterBot on Telegram. 7-day free trial, no credit card. Then 50/100/150 Stars per month (~$0.75–$2.25). If you find a bug, /feedback in the bot goes straight to my Telegram.

Brutal feedback wanted. The product is ~2 weeks old.

DEV Community: landigf
