DEV Community

Perufitlife

Posted on • Originally published at apify.com

How to Scrape Public Telegram Channels Without the API, Login, or MTProto

Most tutorials on scraping Telegram start the same way: register an app at my.telegram.org, get an api_id and api_hash, install a giant MTProto client like Telethon or GramJS, and authenticate with your own phone number. That works, but it has a nasty cost: you are putting a real account on the line. Telegram bans MTProto sessions that look automated, and tying your personal number to a scraper is a great way to lose it.

For public channels, you don't need any of that. There's a much simpler door, and it's been hiding in plain sight: t.me/s/.

The trick: t.me/s/<channel>

When you open a public channel link in a browser, Telegram normally serves a landing page that nudges you into the app. But there's a special preview route, the /s/ path, that returns server-rendered HTML of the channel feed. No JavaScript execution, no login wall, no token.

Try it yourself:

https://t.me/s/durov

That page contains the last ~20 messages as plain HTML, with view counts, timestamps, media URLs, link previews, and forwarded-from info baked right into the markup. You can paginate backwards in time with a ?before=<messageId> query parameter.

Parsing it

Each message lives in a .tgme_widget_message block. Here's a minimal Node example using cheerio:

import got from 'got';
import * as cheerio from 'cheerio';

// Fetch one page of a public channel's web preview and parse its messages.
// Pass `before` (a message id) to get the page that precedes that message.
async function fetchChannel(channel, before = null) {
  const url = new URL(`https://t.me/s/${encodeURIComponent(channel)}`);
  if (before != null) url.searchParams.set('before', String(before));

  const html = await got(url).text();
  const $ = cheerio.load(html);
  const messages = [];

  $('.tgme_widget_message').each((_, el) => {
    const $el = $(el);
    const dataPost = $el.attr('data-post'); // "durov/123"
    const messageId = dataPost ? Number(dataPost.split('/')[1]) : null;
    if (!Number.isInteger(messageId)) return; // skip blocks without a usable id

    messages.push({
      messageId,
      text: $el.find('.tgme_widget_message_text').text().trim(),
      // time[datetime] skips any <time> element that carries no timestamp
      date: $el.find('time[datetime]').first().attr('datetime') ?? null,
      views: $el.find('.tgme_widget_message_views').text().trim(), // raw display text
      hasMedia: $el.find('.tgme_widget_message_photo_wrap, video').length > 0,
    });
  });

  // oldest message id on this page -> next ?before value
  const oldest = messages.length ? Math.min(...messages.map((m) => m.messageId)) : null;
  return { messages, nextBefore: oldest };
}
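One thing the parser above leaves alone is the views field, which comes back as display text rather than a number. Here's a rough sketch of a normalizer, assuming the counter is either a plain integer or abbreviated with a K/M/B suffix (that format is my assumption about the markup, not something Telegram documents). parseCount is a hypothetical helper name.

```javascript
// Convert a display counter such as "532", "1.2K" or "3.45M" into a number.
// Returns null when the text is empty or not in a recognized shape.
// Assumption: suffixes are K (thousand), M (million), B (billion).
function parseCount(text) {
  const match = /^([\d.,]+)\s*([KMB])?$/i.exec(String(text ?? '').trim());
  if (!match) return null;

  // Drop thousands separators before converting.
  const value = Number(match[1].replace(/,/g, ''));
  if (!Number.isFinite(value)) return null;

  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const suffix = match[2] ? match[2].toUpperCase() : null;

  // Round to absorb floating-point noise from the multiplication.
  return Math.round(value * (suffix ? multipliers[suffix] : 1));
}
```

Keep the raw string next to the parsed value if you care about precision: an abbreviated counter has already been rounded by the time you see it.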

Loop on nextBefore and you have full backward pagination through the channel's history. Stop when a page comes back empty or nextBefore stops decreasing, dedupe by messageId, and you're done.
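Here's roughly what that loop can look like. It's a sketch, not the one true implementation: crawlChannel, maxPages, and delayMs are names I made up for illustration, and the page fetcher is passed in as an argument so the loop only assumes the { messages, nextBefore } shape that fetchChannel returns.

```javascript
// Walk a channel backwards, one page at a time.
// `fetchPage(channel, before)` must resolve to
// { messages: [{ messageId, ... }], nextBefore }, like fetchChannel does.
async function crawlChannel(fetchPage, channel, { maxPages = 50, delayMs = 1000 } = {}) {
  const seen = new Map(); // messageId -> message, used for dedupe
  let before = null;

  for (let page = 0; page < maxPages; page++) {
    const { messages, nextBefore } = await fetchPage(channel, before);

    let added = 0;
    for (const msg of messages) {
      if (msg.messageId != null && !seen.has(msg.messageId)) {
        seen.set(msg.messageId, msg);
        added++;
      }
    }

    // Stop when the page brought nothing new or the cursor stopped moving back.
    const cursorStuck = before !== null && nextBefore >= before;
    if (added === 0 || nextBefore == null || cursorStuck) break;

    before = nextBefore;
    // Pause between pages so the crawl stays polite.
    if (delayMs > 0) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }

  // Newest first.
  return [...seen.values()].sort((a, b) => b.messageId - a.messageId);
}
```

Usage would be something like crawlChannel(fetchChannel, 'durov', { maxPages: 5 }). The maxPages cap is there so a surprise in the markup can't turn into an endless loop.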

Why this beats MTProto for public data

  • No account at risk. You never log in, so there's nothing to ban.
  • No rate-limit dance with FLOOD_WAIT. It's just HTTP, so MTProto flood errors don't apply; pace your requests anyway, and rotate IPs if you go heavy.
  • Stateless and parallelizable. No session files, no auth key persistence.
  • It's literally the data Telegram chose to make public. The /s/ preview exists so links unfurl nicely on the web — you're reading the same thing a Twitter card preview reads.
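"Just HTTP" still means a request can fail or get throttled, so it's worth wrapping fetches in a small retry. The sketch below is generic exponential backoff with jitter; withBackoff and its options are hypothetical names, and which status codes t.me actually uses for throttling is something I'm assuming, not something documented.

```javascript
// Retry an async operation with exponential backoff and a little jitter.
// `shouldRetry(error)` decides whether a failure deserves another attempt.
async function withBackoff(
  operation,
  { retries = 4, baseDelayMs = 500, shouldRetry = () => true } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up once the retry budget is spent or the error is not retryable.
      if (attempt >= retries || !shouldRetry(error)) throw error;

      // 500ms, 1s, 2s, 4s ... plus up to 25% random jitter.
      const delay = baseDelayMs * 2 ** attempt;
      const jitter = Math.random() * delay * 0.25;
      await new Promise((resolve) => setTimeout(resolve, delay + jitter));
    }
  }
}
```

With got you could call it as withBackoff(() => fetchChannel('durov'), { shouldRetry: (e) => [429, 503].includes(e.response?.statusCode) }). got also ships its own retry option, so treat this as a way to make the pacing explicit rather than something you strictly need.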

The one limitation: this only works for public channels (the ones with a t.me/<name> handle). Private channels and DMs genuinely require MTProto + auth, and that's a good thing.

A reality check on the ecosystem

If you look at existing Telegram scrapers on the Apify Store, the top result has a rating around 1.4 stars — and the reviews all say the same thing: it forces you to hand over your phone number and api credentials, then gets your session limited. People hate it because credential-based scraping of public data is the wrong tool for the job.

That's exactly why I built a Telegram Channel Scraper around the t.me/s/ approach instead. You pass channel handles, it returns structured channel metadata (subscriber count, photo/video/link counters) plus message records with viewCount, date, hasMedia, parsed links, hashtags, linkPreview, and forwardedFrom — no API key, no login, no phone number. It handles the backward pagination, dedupe, and the edge cases (private/nonexistent channels) for you.

But even if you never touch the hosted version, the takeaway stands: for public Telegram data, scrape the web preview, not the API. It's simpler, safer, and it's the data Telegram already decided to publish.

If you build something with the t.me/s/ trick, I'd love to hear what edge cases you hit — the media array parsing (albums vs single photos vs video round messages) is the fun part.
