arXiv has a free Atom XML API for searching 2M+ research papers.
Search Papers
http://export.arxiv.org/api/query?search_query=all:large+language+models&sortBy=submittedDate&sortOrder=descending&max_results=10
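The same query string can be assembled in code instead of by hand. This is a minimal sketch, assuming the parameters shown in the URL above plus `start`, the API's paging offset. The `buildSearchUrl` helper and its option names are made up for illustration.

```typescript
// Hypothetical helper: builds an arXiv API query URL from its parts.
interface SearchOptions {
  query: string; // e.g. "all:large language models" or "cat:cs.AI"
  start?: number; // offset of the first result, used for paging
  maxResults?: number;
  sortBy?: "relevance" | "lastUpdatedDate" | "submittedDate";
  sortOrder?: "ascending" | "descending";
}

function buildSearchUrl(opts: SearchOptions): string {
  const params = new URLSearchParams({ search_query: opts.query });
  if (opts.start !== undefined) params.set("start", String(opts.start));
  if (opts.maxResults !== undefined) params.set("max_results", String(opts.maxResults));
  if (opts.sortBy) params.set("sortBy", opts.sortBy);
  if (opts.sortOrder) params.set("sortOrder", opts.sortOrder);
  // URLSearchParams writes spaces as "+" and ":" as "%3A".
  return `http://export.arxiv.org/api/query?${params.toString()}`;
}

const latestLlmPapers = buildSearchUrl({
  query: "all:large language models",
  sortBy: "submittedDate",
  sortOrder: "descending",
  maxResults: 10,
});
console.log(latestLlmPapers);
```

Building the URL through `URLSearchParams` keeps the escaping in one place, which helps once queries contain quotes or boolean operators.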
Browse by Category
http://export.arxiv.org/api/query?search_query=cat:cs.AI&max_results=25
Categories include cs.AI, cs.LG, cs.CL, and cs.CV, as well as physics and math subcategories such as math.CO, and 150+ more.
What You Get
- Title and abstract
- Authors list
- Categories
- Published and updated dates
- PDF and HTML links
- DOI and journal references
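In typed code, the fields above could map onto a shape like the one below. This is only a sketch: the `ArxivEntry` name and its property names are made up, and the comments show the Atom element each field is assumed to come from, so check them against a real response.

```typescript
// Hypothetical shape for one parsed <entry>; property names are illustrative.
interface ArxivEntry {
  title: string; // <title>
  abstract: string; // <summary>
  authors: string[]; // one <author><name> per author
  categories: string[]; // term attribute of each <category>
  published: string; // <published>, ISO 8601 timestamp
  updated: string; // <updated>
  pdfUrl?: string; // <link title="pdf" href="...">
  htmlUrl?: string; // <link rel="alternate" href="...">
  doi?: string; // <arxiv:doi>, present only when one was supplied
  journalRef?: string; // <arxiv:journal_ref>, also optional
}

// Placeholder values to show the shape; this is not a real paper.
const exampleEntry: ArxivEntry = {
  title: "An Example Title",
  abstract: "An example abstract.",
  authors: ["A. Author", "B. Author"],
  categories: ["cs.AI", "cs.LG"],
  published: "2024-01-01T00:00:00Z",
  updated: "2024-01-02T00:00:00Z",
};
console.log(exampleEntry.categories.join(", "));
```

DOI, journal reference, and the links are marked optional because not every entry is guaranteed to carry them.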
Parsing the XML
The API returns Atom XML. Use cheerio with xmlMode:
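import * as cheerio from "cheerio";

// `xml` is the response body from the API, as a string.
const $ = cheerio.load(xml, { xmlMode: true });
const papers = [];
$("entry").each((i, el) => {
  // Each <entry> is one paper; <summary> holds the abstract.
  const title = $(el).find("title").text().trim();
  const abstract = $(el).find("summary").text().trim();
  papers.push({ title, abstract });
});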
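Note that `.trim()` only strips the ends of a string. Titles and abstracts in the feed can be hard-wrapped, which leaves newlines and runs of spaces in the middle of the text. A small helper like this one flattens them (the name `collapseWhitespace` is just for illustration):

```typescript
// Collapse every run of whitespace, including newlines, into a single space.
function collapseWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

console.log(collapseWhitespace("  Attention Is\n  All You Need \n"));
// prints "Attention Is All You Need"
```

It can be applied to the result of `.text()` in place of the bare `.trim()` call.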
Rate Limits
arXiv asks for 3-second delays between requests. Be respectful.
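One way to honor that delay while paging through results is to sleep between requests. The sketch below assumes paging works by stepping the `start` offset. The `pageStarts` and `fetchAllPages` names are hypothetical, and the function that actually performs the request is passed in, so the loop can be tried without hitting the network.

```typescript
// Offsets for the `start` parameter: 0, pageSize, 2 * pageSize, ...
function pageStarts(total: number, pageSize: number): number[] {
  if (pageSize <= 0) throw new RangeError("pageSize must be positive");
  const starts: number[] = [];
  for (let start = 0; start < total; start += pageSize) starts.push(start);
  return starts;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// `fetchPage` is supplied by the caller and returns the raw XML for one page.
async function fetchAllPages(
  fetchPage: (start: number, pageSize: number) => Promise<string>,
  total: number,
  pageSize: number,
  delayMs = 3000, // the 3-second gap arXiv asks for
): Promise<string[]> {
  const pages: string[] = [];
  let first = true;
  for (const start of pageStarts(total, pageSize)) {
    if (!first) await sleep(delayMs); // pause between requests, not before the first
    first = false;
    pages.push(await fetchPage(start, pageSize));
  }
  return pages;
}
```

In real use, `fetchPage` could call `fetch` on a query URL with `start` and `max_results` set, then return the response text.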
I built an arXiv Paper Scraper with pagination — free on Apify (search knotless_cadence arxiv).