DEV Community

Lars Willighagen for Citation.js

Posted on • Originally published at larsgw.blogspot.com on

Creating a RIS mapping from the "original" specification

So a while ago I was looking around for the RIS specification again. I had not found it earlier, only a reference implementation from Zotero, a surprisingly complete list of tags and types on Wikipedia and some examples from various websites and programs exporting RIS files. They did not seem to mesh, however. There were some slight differences in tags here and there, and a bunch of useful tags listed by Wikipedia were labelled “degenerate” in the Zotero codebase, and only used for imports — implying some sort of problem.

What could be going on? Well, I checked out the references on the Wikipedia page again, to see if there really was no official specification or some other more reliable source where it got its information from. And, suddenly, there was an actual source this time. I do not know how I missed it earlier, but there was a page (archived) that linked to a zip file containing a PDF file with general specifications and an Excel file with sheets with property lists for all different types.

That sounded useful, so I spent waaayy to much time automating a script to turn those sheets — with a bunch of user input — into usable mappings for Citation.js. I just finished that today, apart from some… questionable mappings, but I wanted to at least test the final script with an example. As for the results, well, see for yourself. The example, from the Wikipedia page (CC-BY-SA 3.0 Unported) was

TY - JOUR
T1 - On computable numbers, with an application to the Entscheidungsproblem
A1 - Turing, Alan Mathison
JO - Proc. of London Mathematical Society
VL - 47
IS - 1
SP - 230
EP - 265
Y1 - 1937
ER -

and my results were

{
  "issue": 1,
  "page": 230,
  "type": "article-journal",
  "volume": 47
}

That looked really weird and disappointing. Again, what could possibly be going on here? Well, the example on Wikipedia is using T1, A1, JO and Y1 while the specs say to use TI, AU, T2 and PY here. Were are these differences coming from?

After some digging around on Wikipedia I found a comment saying that there are in fact to specifications: one from 2011 and one from before. The archived spec I checked out was from 2012 (as linked by Wikipedia!), while they use the version from before 2011; which luckily is still available. To be continued.

Top comments (0)