Staying on Top of ML Research
With roughly 10,000 new papers on arXiv every month, staying current in your specific niche through manual browsing is nearly impossible.
The Automation
I built an arXiv scraper on Apify that offers:
- Keyword search: Define the topics you care about (e.g., "diffusion models", "LLM alignment", "RLHF")
- Scheduled runs: Set it to check daily or hourly
- Structured output: Returns paper title, authors, abstract, arXiv URL, PDF link, and categories
- Easy integration: JSON output works with any webhook, Slack bot, or Notion database
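To make the structured output concrete, here is a minimal sketch of how one record could be typed and filtered on the consuming side. Only the `title` and `url` keys appear in the Slack example in this post; the other key names (`authors`, `abstract`, `pdfUrl`, `categories`) and the helpers `parse_paper` and `matches_keywords` are assumptions for illustration, so check the scraper's actual output before relying on them.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Paper:
    """One scraped record. Field names other than title/url are assumed."""
    title: str
    url: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    pdf_url: str = ""
    categories: list[str] = field(default_factory=list)


def parse_paper(record: dict[str, Any]) -> Paper:
    """Build a Paper from a raw JSON record, tolerating missing optional keys."""
    return Paper(
        title=record["title"],
        url=record["url"],
        authors=list(record.get("authors", [])),
        abstract=record.get("abstract", ""),
        pdf_url=record.get("pdfUrl", ""),  # hypothetical key name
        categories=list(record.get("categories", [])),
    )


def matches_keywords(paper: Paper, keywords: list[str]) -> bool:
    """Case-insensitive substring check of title and abstract against keywords."""
    text = f"{paper.title} {paper.abstract}".lower()
    return any(keyword.lower() in text for keyword in keywords)
```

A second client-side filter like `matches_keywords` can be handy when one scheduled run feeds several channels, each with its own topic list.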
Example use: Slack Bot
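import requests

# Run the scraper and wait for the result.
# Apify API calls need a token; YOUR_APIFY_TOKEN is a placeholder.
result = requests.post(
    "https://api.apify.com/v2/acts/technicaldost~arxiv-paper-scraper/run-sync",
    params={"token": "YOUR_APIFY_TOKEN"},
    json={"keywords": ["diffusion models"], "maxResults": 10},
    timeout=300,
)
result.raise_for_status()

# Post each paper to Slack
for paper in result.json():
    requests.post("YOUR_SLACK_WEBHOOK", json={
        "text": f"*New paper*: {paper['title']}\n{paper['url']}"
    })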
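The loop above sends one Slack message per paper and would repeat the same papers on every scheduled run. A rough sketch of one way around that is to drop URLs already seen and fold the rest into a single digest message. The helper names `select_new` and `format_digest` are made up for this example, and it assumes each record has the `title` and `url` keys used above.

```python
def select_new(papers: list[dict], seen: set[str]) -> tuple[list[dict], set[str]]:
    """Return papers whose URL is not in `seen`, plus the updated seen set."""
    updated = set(seen)  # copy so the caller's set is not mutated
    fresh = []
    for paper in papers:
        if paper["url"] not in updated:
            updated.add(paper["url"])
            fresh.append(paper)
    return fresh, updated


def format_digest(papers: list[dict]) -> str:
    """Format papers as one Slack message; empty string if there is nothing new."""
    if not papers:
        return ""
    lines = [f"*{len(papers)} new paper(s)*"]
    for paper in papers:
        lines.append(f"- {paper['title']}\n  {paper['url']}")
    return "\n".join(lines)
```

In a scheduled job, the `seen` set would need to be persisted between runs (a small JSON file is enough), and the digest text would be posted to the webhook only when it is non-empty.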
Why This Matters
Researchers and engineers waste hours browsing arXiv. An automated pipeline means:
- Fewer missed papers in your niche, as long as your keywords cover it
- Daily digest delivered to your preferred platform
- Easy collaboration with teams (shared paper feeds)
Try it on the Apify Store; a free tier is available.