DEV Community

Siddhant Sharma

How to automatically monitor new ML research papers on arXiv by keyword

Staying on Top of ML Research

With ~10,000 new papers on arXiv every month, staying current in your specific niche through manual browsing is nearly impossible.

The Automation

I built an arXiv scraper on Apify that offers:

  1. Keyword search: Define the topics you care about (e.g., "diffusion models", "LLM alignment", "RLHF")
  2. Scheduled runs: Set it to check daily or hourly
  3. Structured output: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories
  4. Easy integration: JSON output works with any webhook, Slack bot, or Notion database
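
To make the structured output concrete, here is a minimal sketch of how one record could be modeled in Python. The field names (`title`, `authors`, `abstract`, `url`, `pdf_url`, `categories`) are assumptions based on the list above, not the scraper's documented schema, so check the actor's actual output before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """One scraped paper. Field names are assumed, not a confirmed schema."""
    title: str
    url: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    pdf_url: str = ""
    categories: list[str] = field(default_factory=list)


def parse_paper(record: dict) -> Paper:
    """Build a Paper from one JSON record, tolerating missing optional keys."""
    return Paper(
        title=record["title"].strip(),
        url=record["url"],
        authors=list(record.get("authors", [])),
        abstract=record.get("abstract", ""),
        pdf_url=record.get("pdf_url", ""),
        categories=list(record.get("categories", [])),
    )
```

Only `title` and `url` are treated as required here, since those are the two keys the Slack example below relies on.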

Example use: Slack Bot

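import requests

# Run the scraper and wait for it to finish.
# Apify API calls need an API token; "YOUR_APIFY_TOKEN" is a placeholder.
result = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync",
    params={"token": "YOUR_APIFY_TOKEN"},
    json={"keywords": ["diffusion models"], "maxResults": 10},
    timeout=300,
)
result.raise_for_status()  # Stop early if the run failed

# Post each paper to Slack (assumes the response body is a JSON list of papers)
for paper in result.json():
    requests.post("YOUR_SLACK_WEBHOOK", json={
        "text": f"*New paper*: {paper['title']}\n{paper['url']}"
    }, timeout=10)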
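
Because scheduled runs can return the same papers more than once, a bot like this usually needs to remember what it has already posted. The following is one possible sketch, not part of the scraper itself: it assumes each record has a `url` key (as in the example above) and stores seen URLs in a local JSON file. The helper name `filter_new` and the file name are hypothetical.

```python
import json
from pathlib import Path


def filter_new(papers: list[dict], seen_file: Path) -> list[dict]:
    """Return only papers whose URL is not yet recorded, then record them."""
    # Load previously seen URLs; start empty if the file does not exist yet.
    seen = set(json.loads(seen_file.read_text())) if seen_file.exists() else set()

    new_papers = []
    for paper in papers:
        if paper["url"] not in seen:
            seen.add(paper["url"])
            new_papers.append(paper)

    # Persist the updated set so the next scheduled run skips these papers.
    seen_file.write_text(json.dumps(sorted(seen)))
    return new_papers
```

In the Slack loop, iterating over `filter_new(result.json(), Path("seen_papers.json"))` instead of `result.json()` would post each paper at most once. A local file only works if every run happens on the same machine; a shared store would be needed otherwise.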

Why This Matters

Researchers and engineers waste hours browsing arXiv. An automated pipeline means:

  • Fewer missed papers in your niche (coverage depends on how well your keywords match)
  • Daily digest delivered to your preferred platform
  • Easy collaboration with teams (shared paper feeds)
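
A daily digest can be as simple as joining the day's papers into one message instead of sending one message per paper. Here is a minimal sketch, assuming each record has `title` and `url` keys as in the Slack example. The function name `build_digest` and the `max_items` cap are illustrative choices, not part of the scraper.

```python
def build_digest(papers: list[dict], max_items: int = 20) -> str:
    """Format papers as one Slack-style message with a numbered list."""
    if not papers:
        return "No new papers today."

    lines = [f"*{len(papers)} new paper(s)*"]
    for i, paper in enumerate(papers[:max_items], start=1):
        lines.append(f"{i}. {paper['title']}\n   {paper['url']}")

    # Note how many were left out if the list was truncated.
    hidden = len(papers) - max_items
    if hidden > 0:
        lines.append(f"...and {hidden} more")
    return "\n".join(lines)
```

The returned string can be sent as the `text` field of a single webhook POST, which keeps a shared channel readable on busy days.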

Try it on the Apify Store (free tier available).
