DEV Community

Алексей Спинов

Posted on Mar 18

arXiv API: How to Search Research Papers Programmatically

#research #api #machinelearning #science

arXiv has a free Atom XML API for searching 2M+ research papers.

Search Papers

http://export.arxiv.org/api/query?search_query=all:large+language+models&sortBy=submittedDate&sortOrder=descending&max_results=10
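If you would rather not assemble that query string by hand, here is a minimal sketch that builds it in code. The helper name `buildSearchUrl` and its option names are hypothetical; only the query parameters (`search_query`, `sortBy`, `sortOrder`, `max_results`) come from the URL above. Note that `URLSearchParams` percent-encodes the colon and turns spaces into `+`, which should decode to the same query on the server.

```javascript
// Hypothetical helper: builds an arXiv API search URL from free-text terms.
const ARXIV_ENDPOINT = "http://export.arxiv.org/api/query";

function buildSearchUrl(terms, { maxResults = 10, sortBy = "submittedDate", sortOrder = "descending" } = {}) {
  const params = new URLSearchParams({
    search_query: `all:${terms}`, // "all:" searches every field, as in the URL above
    sortBy,
    sortOrder,
    max_results: String(maxResults),
  });
  return `${ARXIV_ENDPOINT}?${params}`;
}

console.log(buildSearchUrl("large language models"));
```

Keeping the defaults in one place makes it easy to change the sort order or page size without editing URLs scattered through your code.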

Browse by Category

http://export.arxiv.org/api/query?search_query=cat:cs.AI&max_results=25

Categories include cs.AI, cs.LG, cs.CL, and cs.CV, along with 150+ more across other fields such as physics and math.
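A category query can also be narrowed by a keyword. The sketch below assumes the `AND` boolean operator from arXiv's query syntax, which the URLs above do not use; the helper name `buildCategoryUrl` and its options are hypothetical.

```javascript
// Hypothetical helper: builds a category-browse URL, optionally narrowed by a keyword.
function buildCategoryUrl(category, { keyword, maxResults = 25 } = {}) {
  // "cat:" restricts results to one category; AND adds a second clause (assumed syntax).
  const clauses = [`cat:${category}`];
  if (keyword) clauses.push(`all:${keyword}`);
  const params = new URLSearchParams({
    search_query: clauses.join(" AND "),
    max_results: String(maxResults),
  });
  return `http://export.arxiv.org/api/query?${params}`;
}

console.log(buildCategoryUrl("cs.AI"));
console.log(buildCategoryUrl("cs.LG", { keyword: "diffusion", maxResults: 10 }));
```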

What You Get

Title and abstract
Authors list
Categories
Published and updated dates
PDF and HTML links
DOI and journal references (when the authors provide them)
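To show roughly where those fields live in the feed, here is a dependency-free sketch. The sample entry is made up (placeholder values, not a real paper) and the helper names are hypothetical. It uses regular expressions instead of a real XML parser, which is fine for a quick look at the structure but not robust enough for production parsing.

```javascript
// Made-up sample (placeholder values, not a real paper), shaped like an arXiv Atom <entry>.
const sampleEntry = `
<entry>
  <id>http://arxiv.org/abs/0000.00000v1</id>
  <updated>2024-01-02T00:00:00Z</updated>
  <published>2024-01-01T00:00:00Z</published>
  <title>Example Title</title>
  <summary>Example abstract text.</summary>
  <author><name>First Author</name></author>
  <author><name>Second Author</name></author>
  <link href="http://arxiv.org/abs/0000.00000v1" rel="alternate" type="text/html"/>
  <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
  <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
  <category term="cs.LG" scheme="http://arxiv.org/schemas/atom"/>
</entry>`;

// Text content of the first <tag>...</tag> element, or null if the element is absent.
const first = (xml, tag) => {
  const match = xml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : null;
};

// Every capture of group 1 for a global regex.
const all = (xml, re) => [...xml.matchAll(re)].map((m) => m[1]);

function describeEntry(xml) {
  return {
    title: first(xml, "title"),
    abstract: first(xml, "summary"),
    authors: all(xml, /<name>([^<]*)<\/name>/g),
    categories: all(xml, /<category[^>]*\bterm="([^"]*)"/g),
    published: first(xml, "published"),
    updated: first(xml, "updated"),
    // Assumes the title attribute precedes href, as in the sample above.
    pdfUrl: (xml.match(/<link[^>]*title="pdf"[^>]*href="([^"]*)"/) || [])[1] ?? null,
    doi: first(xml, "arxiv:doi"), // optional element, null when missing
    journalRef: first(xml, "arxiv:journal_ref"), // optional element, null when missing
  };
}

console.log(describeEntry(sampleEntry));
```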

Parsing the XML

The API returns Atom XML. Use cheerio with xmlMode:

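import * as cheerio from "cheerio";

// Fetch one page of results, then parse the Atom feed in XML mode.
const url = "http://export.arxiv.org/api/query?search_query=cat:cs.AI&max_results=25";
const xml = await (await fetch(url)).text();
const $ = cheerio.load(xml, { xmlMode: true });
const papers = [];
$("entry").each((i, el) => {
  const title = $(el).find("title").text().trim();
  const abstract = $(el).find("summary").text().trim();
  papers.push({ title, abstract }); // keep the parsed fields for later use
});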

Rate Limits

arXiv asks clients to wait 3 seconds between requests. Respect that limit, especially when you fetch several pages in a row.
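One simple way to honor that delay is to fetch URLs strictly one at a time and sleep between them. This is a minimal sketch, not an official client: `fetchSequentially`, `delayMs`, and `fetchFn` are hypothetical names, and `fetchFn` is injectable only so the loop can be exercised without hitting the network.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetches URLs in order, pausing delayMs between requests.
async function fetchSequentially(urls, { delayMs = 3000, fetchFn = fetch } = {}) {
  const bodies = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs); // pause between requests, not before the first
    const response = await fetchFn(urls[i]);
    if (!response.ok) {
      throw new Error(`Request failed (${response.status}): ${urls[i]}`);
    }
    bodies.push(await response.text());
  }
  return bodies;
}
```

Call it with a list of query URLs, for example `await fetchSequentially([urlA, urlB])`, and each response body comes back in the same order as the input.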

I built an arXiv Paper Scraper with pagination. It is free on Apify (search for knotless_cadence arxiv).
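If you want to paginate by hand, a rough sketch looks like this. It assumes the API's `start` offset parameter, which does not appear in the URLs above, alongside `max_results`. The helper name `pageUrls` and its options are hypothetical, and this is not the scraper's actual implementation.

```javascript
// Hypothetical helper: returns one query URL per page using start/max_results offsets.
function pageUrls(searchQuery, { pageSize = 25, pages = 3 } = {}) {
  const urls = [];
  for (let page = 0; page < pages; page++) {
    const params = new URLSearchParams({
      search_query: searchQuery,
      start: String(page * pageSize), // zero-based offset of the first result on this page
      max_results: String(pageSize),
    });
    urls.push(`http://export.arxiv.org/api/query?${params}`);
  }
  return urls;
}

console.log(pageUrls("cat:cs.AI", { pageSize: 25, pages: 2 }));
```

Fetch the resulting URLs one at a time, with the 3-second pause from the Rate Limits section between requests.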
