How to automatically monitor new ML research papers on arXiv by keyword

#machinelearning #ai #research #automation

Staying on Top of ML Research

With ~10,000 new papers appearing on arXiv every month, keeping up with your specific niche through manual browsing is nearly impossible.

The Automation

I built an arXiv scraper on Apify with these features:

Keyword search: Define the topics you care about (e.g., "diffusion models", "LLM alignment", "RLHF")
Scheduled runs: Set it to check daily or hourly
Structured output: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories
Easy integration: JSON output works with any webhook, Slack bot, or Notion database
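
To make the structured output concrete, here is a minimal sketch of how one record could be modeled in Python. Only `title` and `url` appear in the Slack example; the other key names (`authors`, `abstract`, `pdfUrl`, `categories`) are assumptions for illustration, and the scraper's real keys may differ. The sample values are placeholders, not a real paper.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """One scraped paper. Key names other than title/url are assumed."""
    title: str
    url: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    pdf_url: str = ""
    categories: list[str] = field(default_factory=list)


def paper_from_item(item: dict) -> Paper:
    """Build a Paper from one JSON item, tolerating missing optional keys."""
    return Paper(
        title=item["title"],
        url=item["url"],
        authors=list(item.get("authors", [])),
        abstract=item.get("abstract", ""),
        pdf_url=item.get("pdfUrl", ""),  # hypothetical key name
        categories=list(item.get("categories", [])),
    )


# Placeholder record, for illustration only
sample = {
    "title": "An Example Paper",
    "url": "https://arxiv.org/abs/0000.00000",
    "categories": ["cs.LG"],
}
paper = paper_from_item(sample)
print(paper.title, paper.categories)
```

Wrapping each item this way keeps the rest of a pipeline independent of the raw key names, so only `paper_from_item` needs to change if the output format does.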

Example use: Slack Bot

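import requests

# Placeholders: replace with your own Apify API token and Slack webhook URL
APIFY_TOKEN = "YOUR_APIFY_TOKEN"
SLACK_WEBHOOK = "YOUR_SLACK_WEBHOOK"

# Run the scraper and wait for its results.
# run-sync-get-dataset-items returns the dataset items as a JSON list
# (assuming the scraper stores its papers in the default dataset).
result = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync-get-dataset-items",
    params={"token": APIFY_TOKEN},
    json={"keywords": ["diffusion models"], "maxResults": 10},
    timeout=300,
)
result.raise_for_status()  # stop early on an API error

# Post each paper to Slack
for paper in result.json():
    requests.post(SLACK_WEBHOOK, json={
        "text": f"*New paper*: {paper['title']}\n{paper['url']}"
    }, timeout=30)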
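
Because the scraper runs on a schedule, consecutive runs can return the same papers. Below is a minimal sketch of one way to avoid reposting them. It assumes each item carries the `url` key used above and that a local JSON file is an acceptable place to remember what was already sent; the helper names and the file are hypothetical, not part of the scraper.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Return the set of URLs already posted, or an empty set on first run."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def select_new(papers: list[dict], seen: set[str]) -> list[dict]:
    """Keep only papers whose URL has not been posted, preserving order."""
    fresh = []
    for paper in papers:
        if paper["url"] not in seen:
            fresh.append(paper)
            seen.add(paper["url"])  # also drops duplicates within one batch
    return fresh


def save_seen(path: Path, seen: set[str]) -> None:
    """Persist the posted URLs so the next scheduled run can skip them."""
    path.write_text(json.dumps(sorted(seen)))
```

In the Slack example, this could mean calling `load_seen` first, looping over `select_new(result.json(), seen)` instead of the full result, and calling `save_seen` after posting.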