DEV Community

Adolfo Reyna


Using DOI tags as References with Pandoc

Pandoc is a powerful document conversion tool that lets you write scientific documents entirely in Markdown and convert them into properly formatted PDFs, web pages, LaTeX, or even Docx files.
With filters, Pandoc extends Markdown so that you can cite previously published works in the text, number figures, equations, and tables automatically, and cross-reference them within the document.

I have personally found two filters very useful: pandoc-citeproc and pandoc-crossref.
Citeproc is a filter that looks for citations in the text of the form `@referencetag` and formats them in the indicated style (such as APA or IEEE), both in the text and in the reference list at the end of the document.
The crossref filter, on the other hand, gives us a proper way of inserting equations, figures, tables, and listings (code blocks) so that they are automatically numbered and can be referenced through custom tags.

Pandoc-citeproc needs to be pointed to a biblatex[^1] file that contains the information about the cited works and, optionally, to a CSL file that determines the citation style. Both can be specified directly in the Markdown file, in the YAML block at the beginning of the document, as follows:

paper.md

~~~markdown
---
title: Pandoc doi2bib filter
bibliography: library.bib
csl: csl/apa.csl
---

The author of [@Fausto2019] has mentioned this issue before.

# Refereces:

~~~

The .bib file should contain the entry for each reference, for example:

~~~biblatex
@article{Fausto2019,
    doi = {10.1007/s10462-018-09676-2},
    url = {https://doi.org/10.1007%2Fs10462-018-09676-2},
    year = 2019,
    month = {jan},
    publisher = {Springer Nature},
    author = {Fernando Fausto and Adolfo Reyna-Orta and Erik Cuevas and {\'{A}}ngel G. Andrade and Marco Perez-Cisneros},
    title = {From ants to whales: metaheuristics for all tastes},
    journal = {Artificial Intelligence Review}
}
~~~

Then the command `pandoc -s paper.md --filter pandoc-citeproc -t html` returns the Markdown text converted to HTML with the references included:

~~~html
<!--Partial Result:-->
<body>
<header id="title-block-header">
<h1 class="title">Pandoc doi2bib filter</h1>
<p class="date">2019-07-01 10:35:49</p>
</header>
<p>The author of <span class="citation" data-cites="Fausto2019">(Fausto et al. 2019)</span> has mentioned this issue before.</p>
<h1 id="refereces" class="unnumbered">Refereces:</h1>
<div id="refs" class="references" role="doc-bibliography">
<div id="ref-Fausto2019">
<p>Fausto, Fernando, Adolfo Reyna-Orta, Erik Cuevas, Ángel G. Andrade, and Marco Perez-Cisneros. 2019. “From Ants to Whales: Metaheuristics for All Tastes.” <em>Artificial Intelligence Review</em>, January. Springer Nature. <a href="https://doi.org/10.1007/s10462-018-09676-2">https://doi.org/10.1007/s10462-018-09676-2</a>.</p>
</div>
</div>
</body>
~~~

In this way, it is fairly simple to write and manage the document and to produce it in the format required for colleagues to collaborate or for submission to a publisher (most likely LaTeX).

However, with this workflow, creating and maintaining the references file (.bib) and the citation tags of the cited works must be done manually or with third-party tools, such as the reference managers Zotero or Mendeley.

Because most recent publications have a digital object identifier (DOI)[^2], it is possible to use this identifier as the citation tag in our documents.
Doing so guarantees that every citation refers to a unique document, unlike the usual author-year tags, which become ambiguous when an author has several publications in the same year.
This also opens the door to further automation, as there are reliable web services that provide the citation information for any given DOI, such as [https://dx.doi.org/](https://dx.doi.org/).
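
As a rough sketch of what such a lookup can look like (this is not the code of any particular filter), the snippet below asks doi.org for a BibTeX record through HTTP content negotiation. The function names `build_bibtex_request` and `fetch_bibtex` are made up for this illustration, and it assumes the resolver can answer with the `application/x-bibtex` media type for the DOI in question.

```python
import urllib.parse
import urllib.request


def build_bibtex_request(doi):
    """Build a request that asks the DOI resolver for a BibTeX record."""
    # Keep the slash between the DOI prefix and suffix; escape anything unsafe.
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    # The Accept header selects the response format (content negotiation).
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Download the BibTeX entry for a DOI as text (needs network access)."""
    request = build_bibtex_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_bibtex("10.1007/s10462-018-09676-2")` should return an entry comparable to the .bib example shown earlier, although the exact fields depend on what the registration agency provides.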

This concept gave birth to a new pandoc filter called [doi2bib](https://github.com/aeroreyna/pandoc-doi2bib).
The filter uses the bibliography file specified in the YAML configuration (.bib only), searches the document for all references of the form `@DOI:XXX.XXX/XXXXXXX`, and updates this file accordingly.
This means that any new reference is added automatically, using the reliable information that _doi.org_ provides in the correct format.
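
The search step could be sketched as follows. This is only an illustration, not the filter's actual code: the regular expression and the name `find_doi_citations` are assumptions, and the pattern assumes that every DOI starts with `10.` and ends at whitespace, `]`, `;`, or `,`.

```python
import re

# Assumed citation form: @DOI:<prefix>/<suffix>, e.g. @DOI:10.1007/s10462-018-09676-2
DOI_CITATION = re.compile(r"@DOI:(10\.\d{4,9}/[^\s\];,]+)")


def find_doi_citations(markdown_text):
    """Return the cited DOIs, without duplicates, in order of first citation."""
    found = []
    for match in DOI_CITATION.finditer(markdown_text):
        # Drop a sentence-ending period that may follow an in-text citation.
        doi = match.group(1).rstrip(".")
        if doi not in found:
            found.append(doi)
    return found
```

Keeping the order of first citation is what later makes it possible to write out a bibliography sorted by appearance in the text.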

This tool offers the following benefits:

 - The specified file can be empty, an existing .bib file, or not exist at all.

 - Only new references need to be downloaded, so the filter does not add significantly to compilation time.

 - Several documents can share the same .bib file, or you can use a global one for all your documents.

 - If all your references use this format, a new file containing only the current citations, in order of citation, can be generated simply by changing the bibliography file specified in the document.
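
The first two points can be illustrated with a small sketch that decides which cited DOIs still have to be downloaded. It assumes, purely for illustration, that each stored entry uses the key `DOI:<doi>`; the names `dois_in_bib` and `missing_dois` are hypothetical and do not come from the filter itself.

```python
import re
from pathlib import Path

# Assumed key convention inside the .bib file: @article{DOI:10.1000/182, ...
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*DOI:([^,\s]+)\s*,")


def dois_in_bib(bib_text):
    """Collect the DOIs that already have an entry in the bibliography text."""
    return {match.group(1) for match in ENTRY_KEY.finditer(bib_text)}


def missing_dois(cited, bib_path):
    """Return the cited DOIs with no entry yet, keeping the citation order."""
    path = Path(bib_path)
    # A missing file is treated like an empty one: everything must be fetched.
    bib_text = path.read_text(encoding="utf-8") if path.exists() else ""
    known = dois_in_bib(bib_text)
    return [doi for doi in cited if doi not in known]
```

Pointing `bib_path` at a new, not yet existing file makes every cited DOI count as missing, which is how a fresh bibliography holding only the current citations would come about.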

To use this filter, download the latest build from [GitHub](https://github.com/aeroreyna/pandoc-doi2bib) and place it in the same directory as your pandoc executable.
The filter can then be applied to a document such as the one below using the command `pandoc -s paper.md --filter pandoc-doi2bib --filter pandoc-citeproc -o paper.pdf`

paper.md

~~~markdown
---
title: Pandoc doi2bib filter
bibliography: library.bib
csl: csl/apa.csl
---

The author of [@DOI:10.1007/s10462-018-09676-2] has mentioned this issue before.

# Refereces:

~~~

which results in:

![](https://thepracticaldev.s3.amazonaws.com/i/otd6iy7ftb5dkr0sjdaj.png)
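
If the build is scripted, the order of the filters matters: pandoc applies them in the order given on the command line, so doi2bib has to come before citeproc in order to fill the .bib file before the citations are resolved. A minimal sketch, in which the helper names `pandoc_command` and `build_document` are invented for this example and both filters are assumed to be installed:

```python
import subprocess


def pandoc_command(source, output, filters=("pandoc-doi2bib", "pandoc-citeproc")):
    """Build the pandoc argument list, keeping the filters in the given order."""
    command = ["pandoc", "-s", source]
    for name in filters:
        command += ["--filter", name]
    command += ["-o", output]
    return command


def build_document(source, output):
    """Run pandoc and raise an error if the conversion fails."""
    subprocess.run(pandoc_command(source, output), check=True)
```

`build_document("paper.md", "paper.pdf")` runs the same command as the one shown above.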

I'm using this workflow for my thesis and prospective publications, so I hope it helps others as well.



[^1]: Other file types, such as BibTeX, JSON, or YAML, can be used as well.

Latest comments (1)

Ayuha

Thanks for the useful post.
A filter based on a similar concept (doi2cite) is now available in the official Pandoc Lua filters repository (github.com/pandoc/lua-filters).
Lua filters can be used without any additional environment setup.