DEV Community

Sahil Kumar

Originally published at thelatexlab.com

PMID to BibTeX - fixes the page-range and preprint-ID bugs

Paste one or more PMIDs (numeric only, up to 50 per batch; PMCIDs go through a different path). The tool queries Europe PMC's REST API rather than NCBI, specifically because NCBI's E-utilities don't send CORS headers, so a truly browser-only tool can't call them directly. Europe PMC mirrors the same MEDLINE data and supports CORS.
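
Here is a minimal Python sketch of the input handling and a single batched request. It assumes Europe PMC's public search endpoint and its EXT_ID and SRC:MED query fields. The names parse_pmids and europe_pmc_url are hypothetical and are not the tool's own code. A browser version would hand the same URL to fetch().

```python
import re
from typing import List
from urllib.parse import urlencode

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
MAX_BATCH = 50  # per-paste limit stated in the post


def parse_pmids(text: str) -> List[str]:
    """Split pasted text into unique numeric PMIDs, rejecting anything else."""
    pmids: List[str] = []
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        if not re.fullmatch(r"[0-9]+", token):
            # PMC-prefixed IDs belong to the separate PMCID path.
            raise ValueError(f"not a numeric PMID: {token!r}")
        if token not in pmids:
            pmids.append(token)
    if len(pmids) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} PMIDs per batch")
    return pmids


def europe_pmc_url(pmids: List[str]) -> str:
    """One search request for a non-empty batch, restricted to MEDLINE (SRC:MED)."""
    ids = " OR ".join(f"EXT_ID:{p}" for p in pmids)
    params = {
        "query": f"SRC:MED AND ({ids})",
        "format": "json",
        "resultType": "core",    # the fuller record, including journal details
        "pageSize": len(pmids),  # results are paged, so ask for the whole batch at once
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"
```

Sending one OR-joined query instead of one request per PMID keeps a 50-ID paste to a single round trip.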

Two real bugs are fixed. First, NLM compresses page ranges, so "436-44" means pages 436 through 444, not a backwards range from 436 to 44. Most tools emit the literal string. That is technically valid BibTeX, but it is wrong under any bibstyle that doesn't expand it. This tool expands it to 436--444.

Second, COVID-era papers that started on medRxiv or bioRxiv sometimes have the preprint ID itself sitting in the page field, like 2020.03.27.20044925. Passed through as pages, that produces a nonsense citation that still compiles. The tool detects this multi-dot junk and drops the field instead.
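
Both fixes fit in one small normalizer. The sketch below is one way to do it, not the tool's actual code. normalize_pages is a hypothetical name, and "two or more dots" is an assumed reading of "multi-dot". To restore an NLM range, it borrows the missing leading digits of the end page from the start page.

```python
import re
from typing import Optional

_NUMERIC_RANGE = re.compile(r"(\d+)-(\d+)")


def normalize_pages(raw: Optional[str]) -> Optional[str]:
    """Return a BibTeX-ready pages value, or None to drop the field."""
    if raw is None:
        return None
    pages = raw.strip()
    if not pages:
        return None
    # Preprint IDs such as 2020.03.27.20044925 contain several dots;
    # treat two or more as junk and drop the field (assumed threshold).
    if pages.count(".") >= 2:
        return None
    m = _NUMERIC_RANGE.fullmatch(pages)
    if m is None:
        # Single page, e-locator (e1234), etc.: pass through unchanged.
        return pages
    start, end = m.groups()
    if len(end) < len(start):
        # NLM compression: "436-44" -> end page "444".
        end = start[: len(start) - len(end)] + end
    if int(end) < int(start):
        return pages  # malformed range; don't guess
    return f"{start}--{end}"  # BibTeX's en-dash for ranges
```

With this, "436-44" becomes 436--444, "1199-203" becomes 1199--1203, and 2020.03.27.20044925 returns None, so the field is omitted.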

The full journal title comes through (New England Journal of Medicine) rather than the NLM abbreviation (N Engl J Med), since most bibstyles expect the full form and handle abbreviation themselves. When Europe PMC also has a DOI, both the PMID and the DOI are included, so the entry resolves under either identifier. Acronyms in titles get the same brace protection as in the DOI tool. Europe PMC tags every record as a plain journal article (reviews, letters and editorials all come through the same way), so each row has an entry-type override.
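
Here is a rough sketch of how these choices could combine into one entry builder. It assumes the Europe PMC "core" JSON exposes pmid, title, pubYear, doi and journalInfo.journal.title / medlineAbbreviation, so check those names against a live response. The acronym rule (any word with two or more capitals) is a hypothetical heuristic, not necessarily what the DOI tool does. The entry_type parameter stands in for the per-row override.

```python
import re
from typing import Any, Dict

# Hypothetical heuristic: brace any word with two or more capitals (DNA, COVID, mRNA).
_ACRONYM = re.compile(r"\b(?=\w*[A-Z]\w*[A-Z])\w+\b")


def protect_acronyms(title: str) -> str:
    """Wrap acronym-like words in braces so case-changing bibstyles keep them."""
    return _ACRONYM.sub(lambda m: "{" + m.group(0) + "}", title)


def to_bibtex(record: Dict[str, Any], entry_type: str = "article") -> str:
    """Build one entry from a Europe PMC 'core' result (field names assumed)."""
    journal = record.get("journalInfo", {}).get("journal", {})
    pmid = record.get("pmid", "")
    fields = [
        ("title", protect_acronyms(record.get("title", ""))),
        # Prefer the full journal title; fall back to the NLM abbreviation.
        ("journal", journal.get("title") or journal.get("medlineAbbreviation", "")),
        ("year", record.get("pubYear", "")),
        # Keep both identifiers when present; BibTeX styles ignore fields they don't use.
        ("pmid", pmid),
        ("doi", record.get("doi", "")),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields if value)
    # entry_type is the per-row override, e.g. "article" or "misc".
    return f"@{entry_type}{{pmid{pmid},\n{body}\n}}"
```

For example, a title like "COVID-19 and mRNA vaccines" comes out as {COVID}-19 and {mRNA} vaccines, and empty fields are simply left out.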

One real limitation worth knowing: non-Latin author names (Chinese, Cyrillic, Arabic, Greek) pass through as UTF-8 in both dialects, because there's no LaTeX accent-macro equivalent for those scripts. You need a Unicode-aware bibstyle or biber regardless of which converter you use.
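
This sketch shows why accented Latin letters and non-Latin scripts end up on different paths. It assumes the converter turns accented Latin letters into accent macros through Unicode decomposition. The post doesn't describe its mapping, and latexify_name and its small accent table are hypothetical. An ASCII base letter plus one known combining mark becomes a macro. Cyrillic, Greek, Arabic and CJK characters have no ASCII base letter to attach a macro to, so they stay UTF-8.

```python
import unicodedata

# A few combining marks and their LaTeX accent commands (deliberately not exhaustive).
_ACCENTS = {
    "\u0301": "'",   # acute
    "\u0300": "`",   # grave
    "\u0308": '"',   # diaeresis / umlaut
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0327": "c",   # cedilla
}


def latexify_name(name: str) -> str:
    """Turn accented Latin letters into macros; leave everything else as UTF-8."""
    out = []
    for ch in unicodedata.normalize("NFC", name):  # merge any pre-split accents first
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if base.isascii() and len(marks) == 1 and marks in _ACCENTS:
            cmd = _ACCENTS[marks]
            sep = " " if cmd.isalpha() else ""  # \c c needs a space, \'e does not
            out.append("{\\" + cmd + sep + base + "}")
        else:
            out.append(ch)  # Cyrillic, Greek, Arabic, CJK...: no macro, stays UTF-8
    return "".join(out)
```

"José" becomes Jos{\'e}, but "Иванов" and "王伟" come back unchanged. Those names are what need the Unicode-aware toolchain.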

Link: thelatexlab.com/pubmed-to-bibtex/
