I write my resume in LaTeX on Overleaf. Every time I edited it, I had to manually compile, download the PDF, and push it to my GitHub repo so my portfolio website could link to it. After doing this one too many times, I decided to automate the whole thing.
Here's the full story - the scraper, the GitHub Actions workflow, the bugs I hit, and how I eventually wired it to my portfolio site.
The Problem
My portfolio at nakuldev.vercel.app links directly to my resume PDF. For that link to always point to the latest version, I'd have to:
- Open Overleaf, compile, download
- Replace the old PDF in my repo
- Commit and push
Boring. Repetitive. Easy to forget. So I automated it.
The Plan
- Write a Node.js script that opens my Overleaf share link in a headless browser
- Find the PDF download link in the DOM
- Download the PDF and save it locally
- Run this on a schedule via GitHub Actions
- Auto-commit and push the new PDF back to the repo
Step 1 - Scraping Overleaf with Playwright
Overleaf is a React app - you can't just fetch() it and parse the HTML. The download link only appears after the project compiles, which happens client-side. So I needed a real browser.
I went with Playwright for this.
npm install playwright
npx playwright install chromium
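The snippets below assume a launched browser, context, and page. Here's a minimal setup sketch to fill that gap (openOverleaf is a name I'm using for illustration; scraper.js may structure this differently):

```javascript
// Sketch of the Playwright boilerplate the later snippets build on.
// chromium.launch() is headless by default, which is what CI needs.
async function openOverleaf() {
  const { chromium } = require("playwright"); // required at call time
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  return { browser, context, page };
}
```

Destructure the result (`const { context, page } = await openOverleaf();`) and you have the `page` and `context` objects used in the rest of the post.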
The download button in Overleaf's DOM looks like this:
<a href="/download/project/69b65297.../output/output.pdf?..."
   aria-label="Download PDF">
So the selector I needed was:
a[href^="/download/project"]
Here's the core scraping logic:
await page.goto("https://www.overleaf.com/read/YOUR_SHARE_LINK", {
  waitUntil: "networkidle",
  timeout: 60_000,
});

await page.waitForSelector('a[href^="/download/project"]', {
  state: "attached", // important - element lives inside a closed dropdown
  timeout: 90_000,
});

const relativeHref = await page.evaluate(() => {
  const links = Array.from(
    document.querySelectorAll('a[href^="/download/project"]')
  );
  const pdfLink = links.find((el) => el.href.includes("output.pdf"));
  return pdfLink
    ? new URL(pdfLink.href).pathname + new URL(pdfLink.href).search
    : links[0]?.getAttribute("href") ?? null;
});
const fullUrl = `https://www.overleaf.com${relativeHref}`;
Then download it using the browser's session cookies (so Overleaf accepts the request):
const cookies = await context.cookies();
await downloadFile(fullUrl, destPath, cookies);
Bug I hit: waitForSelector timing out
My first attempt used the default visible state:
await page.waitForSelector('a[href^="/download/project"]'); // ❌ timed out
The error log showed:
180 × locator resolved to 2 elements. Proceeding with the first one
The element existed but was hidden inside a closed dropdown menu. It was never going to become visible. The fix was switching to state: "attached" - which only requires the element to be in the DOM, not visible on screen.
Step 2 - Saving to Two Places
I wanted two things from each run:
- scrapped/resume01.pdf - a numbered archive (auto-deleted after 7 days)
- Nakul_Dev_M_V_Resume.pdf in the root - always the latest, fixed filename, safe to link to
// Numbered archive
const pdfName = nextPdfName(); // resume01.pdf, resume02.pdf...
await downloadFile(fullUrl, path.join(SCRAPPED_DIR, pdfName), cookies);
// Fixed root copy
const rootPdfPath = path.join(ROOT_DIR, "Nakul_Dev_M_V_Resume.pdf");
if (fs.existsSync(rootPdfPath)) fs.unlinkSync(rootPdfPath);
fs.copyFileSync(path.join(SCRAPPED_DIR, pdfName), rootPdfPath);
Cleanup for files older than 7 days:
function cleanOldScrapped() {
  const now = Date.now();
  const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;
  fs.readdirSync(SCRAPPED_DIR)
    .filter((f) => /^resume\d+\.pdf$/i.test(f))
    .forEach((f) => {
      const filePath = path.join(SCRAPPED_DIR, f);
      if (now - fs.statSync(filePath).mtimeMs > ONE_WEEK_MS) {
        fs.unlinkSync(filePath);
      }
    });
}
Step 3 - GitHub Actions Workflow
The local script worked great. Now I needed to run it automatically in CI.
name: Overleaf PDF Scraper

on:
  schedule:
    - cron: "0 6 * * *" # every day at 06:00 UTC
  workflow_dispatch: # manual trigger via Actions tab
  repository_dispatch:
    types: [visitor-trigger] # triggered via API (more on this below)

permissions:
  contents: write

jobs:
  scrape-and-push:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4.2.2
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4.4.0
        with:
          node-version: "20"
          cache: "npm"
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: node scraper.js
      - name: Commit and push
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add Nakul_Dev_M_V_Resume.pdf scrapped/ History.json
          if git diff --cached --quiet; then
            echo "Nothing new to commit."
          else
            git commit -m "feat: update resume [$(date -u '+%Y-%m-%d %H:%M UTC')]"
            git push
          fi
Important: enable write permissions
By default GitHub Actions can't push to your repo. Go to:
Settings → Actions → General → Workflow permissions → Read and write permissions
Step 4 - Triggering From My Portfolio
The daily cron covers most cases, but I also wanted to trigger a fresh scrape whenever someone visits my portfolio - so they always get the absolute latest version.
The challenge: you can't call the GitHub API directly from frontend JavaScript because you'd have to expose your token. The solution: a Next.js API route that acts as a secure middleman.
Visitor loads portfolio
→ calls /api/trigger (server-side)
→ API route calls GitHub with the secret token
→ GitHub Actions fires
The API route at app/api/trigger/route.js:
const COOLDOWN_HOURS = 6;
let lastTriggeredAt = null;

export async function POST() {
  const now = Date.now();
  const cooldownMs = COOLDOWN_HOURS * 60 * 60 * 1000;

  // Don't trigger more than once every 6 hours
  if (lastTriggeredAt && now - lastTriggeredAt < cooldownMs) {
    return Response.json({ ok: false, reason: "cooldown" }, { status: 429 });
  }

  await fetch("https://api.github.com/repos/nakuldevmv/Resume/dispatches", {
    method: "POST",
    headers: {
      Authorization: `token ${process.env.GITHUB_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ event_type: "visitor-trigger" }),
  });

  lastTriggeredAt = now;
  return Response.json({ ok: true });
}
Called from the portfolio page:
useEffect(() => {
  fetch("/api/trigger", { method: "POST" });
}, []);
The token lives in .env.local and never touches the browser:
GITHUB_TOKEN=ghp_xxxxxxxxxxxx
The repository_dispatch trigger in the workflow listens for visitor-trigger, which must exactly match the event_type sent by the API route.
Step 5 - Failure Notifications
If Overleaf goes down or the compile fails and the download link isn't found, I wanted to know immediately. The simplest option: GitHub's built-in email notifications.
Go to: GitHub profile → Settings → Notifications → Actions → Failed workflows
That's it. Zero config, instant email on any workflow failure.
For something more immediate (like Telegram), you can add this step at the end of the workflow:
- name: Notify on failure
  if: failure()
  run: |
    curl -s -X POST "https://api.telegram.org/bot${{ secrets.TELEGRAM_TOKEN }}/sendMessage" \
      -d chat_id="${{ secrets.TELEGRAM_CHAT_ID }}" \
      -d text="❌ Resume scraper failed - check the Actions tab."
The Final Repo Structure
Resume/
├── .github/workflows/scrape.yml # the workflow
├── scrapped/ # daily archive (7-day TTL)
│ ├── resume01.pdf
│ └── resume02.pdf
├── scraper.js # the scraper
├── package.json
├── History.json # download log
└── Nakul_Dev_M_V_Resume.pdf # ← always the latest
History.json logs every run:
{
  "timestamp": "2026-03-15T06:02:41.000Z",
  "scrappedFile": "scrapped/resume01.pdf",
  "rootFile": "Nakul_Dev_M_V_Resume.pdf",
  "link": "https://www.overleaf.com/download/project/..."
}
Result
- ✅ Resume auto-updates every day at 06:00 UTC
- ✅ Portfolio visitors can trigger a fresh scrape via the API route (with cooldown)
- ✅ Root PDF always has a fixed, linkable filename
- ✅ Daily archive kept for 7 days, then cleaned up
- ✅ Email alert if anything breaks
Direct link to my latest resume:
👉 nakuldevmv.github.io/Resume/Nakul_Dev_M_V_Resume.pdf
Full source code:
👉 github.com/nakuldevmv/Resume
Portfolio:
👉 nakuldev.vercel.app
If you also write your resume in LaTeX on Overleaf, you can fork the repo, swap out the share URL and PDF filename in scraper.js, and have this running for yourself in under 10 minutes.