Parsing Bible References with Elm

By monty5811

This was originally posted at deanmontgomery.com.

This post gives a brief introduction to writing a simple parser in Elm.
We will be parsing Bible references: a Bible reference is a shorthand that lets one quickly look up a specific verse or range of verses in the text of the Bible.

Sounds simple...

What does a reference look like?

A Bible reference can be broken down into a start location and an end location. Each location consists of a book, a chapter and a verse.

So a reference might look like Genesis 1:1 - Exodus 2:1 which tells us to start at the first verse of the first chapter of Genesis and end at the first verse of the second chapter of Exodus.

But the reference might also look like any of these:

  • Genesis 1 - A whole chapter
  • Genesis 1:1 - A single verse
  • Genesis 1:1-20
  • Genesis 1:20-2:24
  • Genesis 1-5 - Multiple whole chapters
  • Genesis 1 - Exodus 5
  • Genesis 1:1 - Exodus 5:20
  • Genesis 1:1 - Exodus 5
  • Genesis 1 - Exodus 5:20

Additionally, some books of the Bible only have a single chapter (e.g. Jude) and, by convention, the chapter number is dropped from the reference. So Jude 2 is the second verse of the first (and only) chapter of Jude, not all of Jude chapter 2.

We'll aim to handle all of these cases when we write our parser.

How do we do parsing in Elm?

elm/parser is a super nice parsing library written by the creator and maintainer of Elm. I won't go into details on it here - there is a nice tutorial and conference talk if you want to dig deeper.

We will parse the Bible reference in two steps:

  1. We parse the string into a list of statements
  2. Then we validate the list of statements to check it is a valid reference

Getting a list of statements

A Bible reference can have a space, a colon, a hyphen, a book name and a number, so we define:

type Statement
    = BookName Book
    | Num Int
    | Dash
    | Colon

and a parser to turn a string into a list of statements:

{-| A `List Statement` parser. We use `P.loop` to consume the whole string. -}
parser : P.Parser (List Statement)
parser =
    P.loop [] statementsHelp

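statementsHelp : List Statement -> P.Parser (P.Step (List Statement) (List Statement))
statementsHelp revStmts =
    P.oneOf
        -- Keep looping while another statement can be parsed
        [ P.succeed (\stmt -> P.Loop (stmt :: revStmts))
            |. P.spaces
            |= statement
            |. P.spaces

        -- Otherwise stop and return the statements in their original order
        , P.succeed ()
            |> P.map (\_ -> P.Done (List.reverse revStmts))
        ]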

{-| A `Statement` parser -}
statement : P.Parser Statement
statement =
    P.oneOf
        [ P.map BookName (P.oneOf bookTokensList)
        , P.map (\_ -> Dash) (P.symbol "-")
        , P.map (\_ -> Colon) (P.symbol ":")
        , P.map Num P.int
        ]

With this parser we can now turn a string into a List Statement:

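parse : String -> Result String (List Statement)
parse str =
    -- P.run reports failures as a List DeadEnd, so convert them to a String
    P.run parser str
        |> Result.mapError P.deadEndsToString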
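To see the pieces working together, here is a minimal, self-contained sketch of the tokenizing step. The two-constructor `Book` type and the contents of `bookTokensList` are assumptions made only to keep the example small; the real list (left out of this post) covers every book and more spellings.

```elm
module Tokenizer exposing (Book(..), Statement(..), parse)

import Parser as P exposing ((|.), (|=))


-- Hypothetical two-book stand-in for the real Book type.
type Book
    = Genesis
    | Exodus


type Statement
    = BookName Book
    | Num Int
    | Dash
    | Colon


-- Assumed shape of bookTokensList: one parser per accepted spelling.
-- Longer spellings come first so "Genesis" is not read as "Gen" plus "esis".
bookTokensList : List (P.Parser Book)
bookTokensList =
    [ P.map (\_ -> Genesis) (P.token "Genesis")
    , P.map (\_ -> Genesis) (P.token "Gen")
    , P.map (\_ -> Exodus) (P.token "Exodus")
    , P.map (\_ -> Exodus) (P.token "Ex")
    ]


statement : P.Parser Statement
statement =
    P.oneOf
        [ P.map BookName (P.oneOf bookTokensList)
        , P.map (\_ -> Dash) (P.symbol "-")
        , P.map (\_ -> Colon) (P.symbol ":")
        , P.map Num P.int
        ]


statementsHelp : List Statement -> P.Parser (P.Step (List Statement) (List Statement))
statementsHelp revStmts =
    P.oneOf
        [ P.succeed (\stmt -> P.Loop (stmt :: revStmts))
            |. P.spaces
            |= statement
            |. P.spaces
        , P.succeed ()
            |> P.map (\_ -> P.Done (List.reverse revStmts))
        ]


parse : String -> Result String (List Statement)
parse str =
    P.run (P.loop [] statementsHelp) str
        |> Result.mapError P.deadEndsToString
```

With these definitions, `parse "Gen 1:1-20"` evaluates to `Ok [ BookName Genesis, Num 1, Colon, Num 1, Dash, Num 20 ]`. Note that the tokenizer only splits the string up; it knows nothing about whether the statements form a sensible reference.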

Validating the list of statements

Now we will have a list of statements, like [BookName Genesis, Colon, Num 1] or [BookName John, Colon, Num 2, Dash, Num 2], etc. But there is nothing to guarantee that we have a valid collection of statements. For example, we could have [Colon, Colon, Colon] which is obviously not valid, or [BookName Genesis, Num 52] which appears to be valid, but Genesis only has 50 chapters, so it is invalid.

First we will define a Reference type:

type alias Reference =
    { startBook : Book
    , startChapter : Int
    , startVerse : Int
    , endBook : Book
    , endChapter : Int
    , endVerse : Int
    }

And a function processStatements : List Statement -> Result String Reference that will validate our list of statements. This function is rather large because it has to account for every supported format and handle single-chapter books, but it is essentially one case expression:

processStatementsHelp : List Statement -> Result String Reference
processStatementsHelp stmts =
    case stmts of
        -- Gen (a whole book: 1:1 through the last verse of the last chapter)
        [ BookName bk ] ->
            Ok <| Reference bk 1 1 bk
                (numChapters bk)
                (numVerses bk (numChapters bk))

        -- Gen 1
        [ BookName bk, Num ch ] ->
            if numChapters bk == 1 then

                    (numVerses bk 1)

    -- truncated for brevity (full function can be seen: https://github.com/monty5811/elm-bible/blob/2.0.0/src/Internal/Parser.elm#L38-L243)

        -- Genesis - Revelation (start of the first book through the end of the last)
        [ BookName startBk, Dash, BookName endBk ] ->
            Ok <| Reference startBk 1 1 endBk
                (numChapters endBk)
                (numVerses endBk (numChapters endBk))

        [] ->
            Err "No reference found"

        _ ->
            Err <| "No valid reference found"

Now we have a Reference that contains a start book, start chapter, start verse, end book, end chapter and end verse, but we haven't checked that these are in order (e.g. the reference cannot end before it starts) or that they fall within the bounds of each book, so we use one last function to validate the reference:

validateRef : Reference -> Result String Reference
validateRef ref =
    validateBookOrder ref
        |> Result.andThen validateChapterOrder
        |> Result.andThen validateVerseOrder
        |> Result.andThen validateChapterBounds
        |> Result.andThen validateVerseBounds

-- see each validate function here: https://github.com/monty5811/elm-bible/blob/2.0.0/src/Internal/Parser.elm#L363
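Each step in that chain has the same shape: take a `Reference` and return either `Ok` with the unchanged reference or `Err` with a message. As a rough sketch of that shape (not the library's actual implementation, which is linked above), two of the ordering checks might look like this. The two-constructor `Book` type, the `validateOrder` name and the error messages are all made up for illustration, and the book-order and bounds checks are omitted.

```elm
module ValidateSketch exposing (Book(..), Reference, validateOrder)

-- Hypothetical two-book stand-in for the real Book type.
type Book
    = Genesis
    | Exodus


type alias Reference =
    { startBook : Book
    , startChapter : Int
    , startVerse : Int
    , endBook : Book
    , endChapter : Int
    , endVerse : Int
    }


-- Within a single book, the end chapter must not come before the start chapter.
validateChapterOrder : Reference -> Result String Reference
validateChapterOrder ref =
    if ref.startBook == ref.endBook && ref.startChapter > ref.endChapter then
        Err "End chapter is before start chapter"

    else
        Ok ref


-- Within a single chapter, the end verse must not come before the start verse.
validateVerseOrder : Reference -> Result String Reference
validateVerseOrder ref =
    if
        (ref.startBook == ref.endBook)
            && (ref.startChapter == ref.endChapter)
            && (ref.startVerse > ref.endVerse)
    then
        Err "End verse is before start verse"

    else
        Ok ref


-- Result.andThen stops at the first Err, so later checks only run on
-- references that passed the earlier ones.
validateOrder : Reference -> Result String Reference
validateOrder ref =
    validateChapterOrder ref
        |> Result.andThen validateVerseOrder
```

Because every check has the type `Reference -> Result String Reference`, adding another rule is just one more `Result.andThen` line in the pipeline.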

Finally! We have a validated Bible reference!

Note: I think it should be possible to move all of this validation inside the parser and do everything in one step, but I think the two-step approach is cleaner.


This post has shown you how to create a parser in Elm so we can validate Bible references. Hopefully this will help you get started building a parser.

If you don't care about building a parser and just want an Elm package to do this for you, then check out monty5811/elm-bible, which provides a parser, nice formatting and a compact encoder/decoder.



Parse and format Bible references in Elm.


  • Parse a reference from a string
  • Nicely format a reference to a string
  • Convert a reference to an encoded representation for sorting/comparing/storage

The following reference formats can be parsed:

  • Genesis 1
  • Genesis 1:1
  • Genesis 1:1-20
  • Genesis 1:20-2:24
  • Genesis 1-5
  • Genesis 1 - Exodus 5
  • Genesis 1:1 - Exodus 5:20
  • Genesis 1:1 - Exodus 5
  • Genesis 1 - Exodus 5:20


(fromString "Gen 1:1" |> Result.map format)
    == Ok "Genesis 1:1"

(fromString "Gen 1:1 - Rev 5" |> Result.map format)
    == Ok "Genesis 1:1 - Revelation 5:14"

(fromString "Gen 1:1 - Rev 5" |> Result.map encode)
    == Ok { start = 1001001, end = 66005014 }
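The encoded values in the last example appear to pack each location into a single integer as book number × 1,000,000 + chapter × 1,000 + verse, with Genesis as book 1 and Revelation as book 66. That layout is inferred from the example output alone, not taken from the package source, so treat the following as a sketch; `encodeLocation` is a hypothetical name.

```elm
module EncodeSketch exposing (encodeLocation)

-- Pack a location into one Int: three digits each for chapter and verse,
-- with the book number in front. Assumes books are numbered in canonical
-- order starting from 1.
encodeLocation : Int -> Int -> Int -> Int
encodeLocation bookNumber chapter verse =
    bookNumber * 1000000 + chapter * 1000 + verse
```

Under that assumption, `encodeLocation 1 1 1` gives `1001001` and `encodeLocation 66 5 14` gives `66005014`, matching the example. Since earlier locations always encode to smaller numbers, plain integer comparison is enough for sorting.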


Contributions welcome, please open an issue to get started.



Nicely done, next steps might be to add all the book abbreviations: logos.com/bible-book-abbreviations




The bookTokensList value that I left out for brevity already has some of those (github.com/monty5811/elm-bible/blo...).

But I'm definitely missing some - thanks for the link!