DEV Community

Алексей Спинов

arXiv API: How to Search Research Papers Programmatically

arXiv has a free Atom XML API for searching 2M+ research papers.

Search Papers

http://export.arxiv.org/api/query?search_query=all:large+language+models&sortBy=submittedDate&sortOrder=descending&max_results=10
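Hand-writing the query string gets error-prone once the search terms contain spaces or punctuation. The sketch below builds the same kind of URL with the standard URLSearchParams class. The helper name buildArxivUrl and its option names are made up for illustration; search_query, start, max_results, sortBy and sortOrder are the API's query parameters. URLSearchParams percent-encodes the colon as %3A, which the server should decode like any other URL.

```javascript
// Base endpoint, taken from the example URL in this article.
const ARXIV_API = "http://export.arxiv.org/api/query";

// Hypothetical helper (not part of any arXiv library): builds a query URL
// from an options object.
function buildArxivUrl({ searchQuery, start = 0, maxResults = 10, sortBy, sortOrder }) {
  const params = new URLSearchParams({
    search_query: searchQuery,
    start: String(start),
    max_results: String(maxResults),
  });
  // Sorting is optional, so only add it when the caller asks for it.
  if (sortBy) params.set("sortBy", sortBy);
  if (sortOrder) params.set("sortOrder", sortOrder);
  return `${ARXIV_API}?${params.toString()}`;
}

const url = buildArxivUrl({
  searchQuery: "all:large language models",
  sortBy: "submittedDate",
  sortOrder: "descending",
});
console.log(url);
```

Keeping start as an option means the same helper can be reused later for paging through results.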

Browse by Category

http://export.arxiv.org/api/query?search_query=cat:cs.AI&max_results=25

Categories include cs.AI, cs.LG, cs.CL, and cs.CV, plus 150+ more across archives such as physics and math.
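To browse more than one category in a single request, the cat: terms can be joined with the API's OR operator. This is a minimal sketch; the helper name categoryQuery is hypothetical, and it assumes the category codes are passed in exactly as arXiv spells them.

```javascript
// Hypothetical helper: joins category codes into one search_query value,
// e.g. ["cs.AI", "cs.LG"] becomes "cat:cs.AI OR cat:cs.LG".
function categoryQuery(categories) {
  if (categories.length === 0) {
    throw new Error("at least one category is required");
  }
  return categories.map((c) => `cat:${c}`).join(" OR ");
}

// URLSearchParams takes care of encoding the spaces and colons.
const params = new URLSearchParams({
  search_query: categoryQuery(["cs.AI", "cs.LG"]),
  max_results: "25",
});
console.log(`http://export.arxiv.org/api/query?${params}`);
```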

What You Get

  • Title and abstract
  • Authors list
  • Categories
  • Published and updated dates
  • Links to the abstract page (HTML) and the PDF
  • DOI and journal reference, when the authors have supplied them

Parsing the XML

The API returns Atom XML. Use cheerio with xmlMode:

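import * as cheerio from "cheerio";
// Fetch the feed first (Node 18+ has a global fetch; top-level await needs an ES module).
const xml = await (await fetch("http://export.arxiv.org/api/query?search_query=cat:cs.AI&max_results=25")).text();
const $ = cheerio.load(xml, { xmlMode: true }); // xmlMode parses the feed as XML, not HTML
$("entry").each((i, el) => {
  // Titles and abstracts wrap across lines, so collapse runs of whitespace.
  const title = $(el).find("title").text().replace(/\s+/g, " ").trim();
  const abstract = $(el).find("summary").text().replace(/\s+/g, " ").trim();
  console.log(`${i + 1}. ${title}\n${abstract}\n`);
});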

Rate Limits

arXiv asks for 3-second delays between requests. Be respectful.
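One way to honor that delay when fetching several pages is to run the requests strictly one after another and sleep between them. The sketch below is an assumption-laden example rather than an official client: fetchPages, its option names, and the injectable fetchText function are all hypothetical, and it assumes Node 18+ for the global fetch. Paging relies on the API's start and max_results parameters.

```javascript
// Pause helper built on setTimeout.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetches `pages` pages of results sequentially,
// waiting `delayMs` between requests. `fetchText` can be swapped out,
// which makes the function easy to test without hitting the network.
async function fetchPages(searchQuery, {
  pages = 3,
  pageSize = 25,
  delayMs = 3000, // the 3-second gap arXiv asks for
  fetchText = async (url) => (await fetch(url)).text(),
} = {}) {
  const feeds = [];
  for (let page = 0; page < pages; page++) {
    // Wait before every request except the first one.
    if (page > 0) await sleep(delayMs);
    const params = new URLSearchParams({
      search_query: searchQuery,
      start: String(page * pageSize),
      max_results: String(pageSize),
    });
    feeds.push(await fetchText(`http://export.arxiv.org/api/query?${params}`));
  }
  return feeds; // one Atom XML string per page
}
```

Calling fetchPages("cat:cs.AI", { pages: 2 }) from async code would return two XML strings, each of which can be handed to the cheerio parsing step.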

I built an arXiv Paper Scraper with pagination. It is free on Apify (search for knotless_cadence arxiv).
