DEV Community

Tracepilot
Tracepilot

Posted on

Your Trending Repos Script Broke. Again.

Your Trending Repos Script Broke. Again.

You set up a GitHub Action to scrape trending TypeScript repos. It ran at 9 AM. You got an empty JSON array. No errors. Just [].

Sound familiar?

Here's what happened: GitHub's trending page changed its HTML structure. Your cheerio selector stopped matching. The cron job ran happily, found zero repos, and overwrote your daily digest with nothing. Nobody noticed until someone asked "why wasn't X trending yesterday?"

This sucks. I know. I've debugged this exact thing three times in the last year.

Why It Breaks

GitHub's trending page doesn't have an API. You're scraping HTML. The issue isn't your code — it's that you're parsing a document designed for humans, not machines.

// Your current approach — fragile as hell
const $ = cheerio.load(html);
const repos = $('.Box article.Box-row').map((i, el) => {
  const name = $(el).find('h2 a').text().trim();
  return { name };
}).get();
Enter fullscreen mode Exit fullscreen mode

That selector worked in January. In February, GitHub added a wrapper div. In March, they changed h2 to h3. Your script silently failed every time.

The real problem: You have no feedback loop. The script runs, produces nothing, and you only find out when someone complains. No alert. No trace. Just silence.

Manual Fix — Make It Resilient

Let's fix this properly. First, add validation. Fail loudly.

async function fetchTrending() {
  const res = await fetch('https://github.com/trending/typescript?since=daily');
  const html = await res.text();

  // Validate we got real content
  if (!html.includes('trending')) {
    throw new Error('HTML structure changed — trending section missing');
  }

  const $ = cheerio.load(html);
  const repos = [];

  // Try multiple selectors — be defensive
  const articles = $('article.Box-row').length 
    ? $('article.Box-row') 
    : $('div[class*="Box-row"]');

  if (articles.length === 0) {
    throw new Error('No repos found — selector likely broken');
  }

  articles.each((i, el) => {
    repos.push({
      name: $(el).find('h2 a, h3 a').first().text().trim(),
      stars: $(el).find('.octicon-star').parent().text().trim(),
      description: $(el).find('p').text().trim()
    });
  });

  return repos;
}
Enter fullscreen mode Exit fullscreen mode

Now you get errors instead of silence. But you still don't know why it broke. You'll SSH into the box, run it manually, stare at the HTML, and curse.

The Better Way — TracePilot

Add one import. Wrap your fetch. Now every run is recorded, including the raw HTML that broke your parser.

import { TracePilot } from 'tracepilot-sdk';

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY);

async function fetchTrending() {
  await tp.startTrace('trending-repos-scraper');

  const res = await fetch('https://github.com/trending/typescript?since=daily');
  const html = await res.text();

  const { result, spanId } = await tp.wrapToolCall(
    'parse-trending-page',
    () => {
      const $ = cheerio.load(html);
      const repos = [];
      const articles = $('article.Box-row');

      if (articles.length === 0) {
        throw new Error('No repos found');
      }

      articles.each((i, el) => {
        repos.push({ name: $(el).find('h2 a').text().trim() });
      });

      return repos;
    },
    null,
    1
  );

  return result;
}
Enter fullscreen mode Exit fullscreen mode

When it fails, open your dashboard. You'll see the exact HTML that was returned. Fork the trace. Edit the selector. Replay. No redeployment. No guessing.

Guess what happens next? You can even set up alerts: "If zero repos returned, notify the team." Real feedback. Real debugging.

The fix isn't better selectors. It's knowing exactly what happened the moment it broke.


Debugging AI agents shouldn't feel like reading The Matrix.
Join other engineers who are building reliable autonomous workflows in our community: TracePilot Discord

Top comments (0)