DEV Community

Alex Sokol

JSON Parsing from Scratch in Kotlin

Introduction

JSON has emerged as one of the most widely adopted file formats for data storage and transmission on the internet today. Originally linked to JavaScript, it has evolved into a language-independent format, finding support across various programming languages.

With a straightforward syntax specification comprising just 4 scalar and 2 composite data types, JSON serves as an excellent platform to delve into the fundamentals of parsing.

In this article, we will embark on a journey to build a JSON parser from scratch using Kotlin, exploring the core concepts along the way. Let's get started!

AST

In order to parse JSON in Kotlin, we need to understand and implement an Abstract Syntax Tree (AST) for JSON.

Abstract Syntax Tree
An Abstract Syntax Tree, or AST, is a tree representation of the source code of a computer program that conveys the structure of the source code. Each node in the tree represents a construct occurring in the source code.

First, we need to identify the different types of JSON values. JSON supports six types:

  1. Null
  2. Boolean (true/false)
  3. Number (integers and floating-point numbers)
  4. String
  5. Array (an ordered collection of JSON values)
  6. Object (a collection of key-value pairs)

For simplicity, this tutorial supports only integers, not floating-point numbers, and it does not handle string escape sequences. You are free to implement both as an exercise, since all the code is published on GitHub.

Next, we define classes to represent each of these JSON types:

sealed interface JsonNode

That is the root type for all JSON values. Let's define primitives first:

sealed interface JsonPrimitive : JsonNode

object JsonNull : JsonPrimitive {
    override fun toString() = "JsonNull"
}

sealed interface JsonBoolean : JsonPrimitive {
    val boolean: Boolean
}

object JsonTrue : JsonBoolean {
    override val boolean = true
    override fun toString(): String = "JsonTrue"
}

object JsonFalse : JsonBoolean {
    override val boolean = false
    override fun toString(): String = "JsonFalse"
}

data class JsonInteger(val string: String) : JsonPrimitive {
    val byte: Byte get() = string.toByte()
    val short: Short get() = string.toShort()
    val int: Int get() = string.toInt()
    val long: Long get() = string.toLong()
}

data class JsonString(val string: String) : JsonPrimitive

And finally, the composite data types:

data class JsonArray(val list: List<JsonNode>) : JsonNode

data class JsonObject(val map: Map<String, JsonNode>) : JsonNode

Kotlin features used: sealed interfaces

We also override toString() in the singleton objects so that they print readably: data classes generate a readable toString() automatically, but plain objects do not.
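Because JsonNode is sealed, a `when` over it can be exhaustive without an `else` branch, and the compiler flags any case you forget. As a minimal sketch, the AST for `{"tags": ["a", null], "count": 2}` can be built by hand and walked like this. `kind` and `countNodes` are hypothetical helpers for illustration, and the types are trimmed copies of the ones above:

```kotlin
sealed interface JsonNode
sealed interface JsonPrimitive : JsonNode

object JsonNull : JsonPrimitive {
    override fun toString() = "JsonNull"
}

sealed interface JsonBoolean : JsonPrimitive {
    val boolean: Boolean
}

object JsonTrue : JsonBoolean {
    override val boolean = true
    override fun toString(): String = "JsonTrue"
}

object JsonFalse : JsonBoolean {
    override val boolean = false
    override fun toString(): String = "JsonFalse"
}

// Trimmed: only the Int accessor is kept
data class JsonInteger(val string: String) : JsonPrimitive {
    val int: Int get() = string.toInt()
}

data class JsonString(val string: String) : JsonPrimitive
data class JsonArray(val list: List<JsonNode>) : JsonNode
data class JsonObject(val map: Map<String, JsonNode>) : JsonNode

// Hypothetical helper: no `else` branch is needed because every subtype is covered
fun kind(node: JsonNode): String = when (node) {
    JsonNull -> "null"
    is JsonBoolean -> "boolean"
    is JsonInteger -> "integer"
    is JsonString -> "string"
    is JsonArray -> "array"
    is JsonObject -> "object"
}

// Hypothetical helper: counts every node in the tree, including the root
fun countNodes(node: JsonNode): Int = when (node) {
    is JsonArray -> 1 + node.list.sumOf { countNodes(it) }
    is JsonObject -> 1 + node.map.values.sumOf { countNodes(it) }
    is JsonPrimitive -> 1
}

// {"tags": ["a", null], "count": 2}
val example = JsonObject(
    mapOf(
        "tags" to JsonArray(listOf(JsonString("a"), JsonNull)),
        "count" to JsonInteger("2"),
    )
)
```

Adding a new JsonNode subtype later would turn both `when` expressions into compile errors until the new case is handled.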

To see the resulting type hierarchy at a glance, you can use the "Hierarchical View" in IntelliJ IDEA.

With the AST implementation complete, we can now proceed to define our basic parser type.

Parser

The simplest parser may look like this:

(String) -> Result<Pair<T, String>>

It is a function that accepts a String and returns either a Pair of T and String, or a failure. T is the result of parsing, and the String is the remaining input, which we can pass on to the next parser. That is how parsers are combined.
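To see why returning the remaining string is what makes composition possible, here is a minimal sketch using the plain function type from above. `rawChar` and `andThen` are hypothetical names for illustration, and `kotlin.Result` stands in for the success-or-failure value:

```kotlin
// Hypothetical example in the "(String) -> Result<Pair<T, String>>" shape:
// matches one expected character and returns it together with the rest of the input.
fun rawChar(expected: Char): (String) -> Result<Pair<Char, String>> = { input ->
    if (input.startsWith(expected)) {
        Result.success(expected to input.drop(1))
    } else {
        Result.failure(IllegalArgumentException("expected '$expected'"))
    }
}

// Hypothetical combinator: runs `first`, then feeds whatever it left over into `second`.
fun <A, B> andThen(
    first: (String) -> Result<Pair<A, String>>,
    second: (String) -> Result<Pair<B, String>>,
): (String) -> Result<Pair<Pair<A, B>, String>> = { input ->
    first(input).mapCatching { (a, rest) ->
        // getOrThrow() turns a failure of `second` into a failure of the whole chain
        val (b, remaining) = second(rest).getOrThrow()
        (a to b) to remaining
    }
}

val ab = andThen(rawChar('a'), rawChar('b'))
```

Threading the leftover string through by hand like this quickly gets tedious, which is the plumbing the following abstractions take over.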

It is more convenient to express such a function as an interface:

fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

Kotlin features used: Functional (SAM) interfaces, Variance (in, out)

We also need to define the ParserResult class, which is just as simple:

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    // No detailed error-reporting for this simple parser
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}
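Since Parser is a functional interface, a parser can also be written directly as a lambda, and the `out` modifier lets a more specific parser stand in where a more general one is expected. A minimal sketch, where `digitParser`, `widened` and `describe` are hypothetical examples rather than part of the tutorial's code:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

// Hypothetical example: a single-digit parser created from a lambda via SAM conversion
val digitParser: Parser<Int> = Parser { source ->
    val first = source.firstOrNull()
    if (first != null && first.isDigit()) {
        ParserResult.Success(value = first.digitToInt(), remaining = source.drop(1))
    } else {
        ParserResult.Failure
    }
}

// `out T` makes Parser covariant, so a Parser<Int> is also a Parser<Any>
val widened: Parser<Any> = digitParser

// Failure is a ParserResult<Nothing>, which is a subtype of every ParserResult<T>
fun describe(result: ParserResult<Any>): String = when (result) {
    is ParserResult.Success -> "value=${result.value}, remaining='${result.remaining}'"
    ParserResult.Failure -> "failure"
}
```

Declaring Failure as a single `ParserResult<Nothing>` object is what lets one instance serve parsers of every result type.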

Char Parser. First attempt

The easiest parser we can implement is one that requires a specific character at the start of the input.

But first, we introduce a Consumer type, which is simply a Parser<Unit>:

typealias Consumer = Parser<Unit>

And here is a charConsumer implementation that uses the previously defined types:

fun charConsumer(char: Char): Consumer = Consumer { source ->
    if (source.startsWith(char)) {
        ParserResult.Success(
            value = Unit, 
            remaining = source.drop(n = 1)
        )
    } else {
        ParserResult.Failure
    }
}

That is a lot of code for such a small parser. What can we do about it?

Parser DSL

Kotlin features used: DSLs

Even though the types we defined are immutable and stateless, their implementations can still hold state internally.

So, to get rid of the repetitive ParserResult.Success(value, remaining = ...) boilerplate, we will move it into a separate builder.

First, we introduce a function that automatically creates a parser from the result of a lambda:

fun <T> parser(block: (String) -> T): Parser<T> = Parser { source ->
    ParserResult.Success(block(source))
}

Now we need a way to exit a parser with ParserResult.Failure. Let's add a special exception for this:

private class ParserFailure : Throwable()

class ParserState {
    fun fail(): Nothing = throw ParserFailure()
}

Now the builder function looks like this:

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState()
        ParserResult.Success(state.block(source))
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

Kotlin features used: Function literals with receiver

That's better: we can now call fail() inside the builder. The only problem is that the code won't compile yet, because ParserResult.Success requires two arguments, the second being the remaining string.

We can keep that string in ParserState:

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()
}

And then make the final improvements to the builder function:

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(
            value = state.block(source), 
            remaining = state.source
        )
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}
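As a quick check of the builder, here is a self-contained sketch with a hypothetical `twoLetters` parser. It reads directly from `source`, calls `fail()` when the input does not match, and lets the builder report whatever input is left:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()
}

// Builder: runs `block` against a fresh state and reports what is left of the input
fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

// Hypothetical example: reads exactly two letters, failing otherwise
val twoLetters: Parser<String> = parser {
    val taken = source.take(2)
    if (taken.length < 2 || !taken.all(Char::isLetter)) fail()
    source = source.drop(2)
    taken
}
```

Because fail() throws, nothing after it runs, and the builder turns the exception into ParserResult.Failure.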

Char Parser. Second attempt

Let's redefine charConsumer using the new builder:

fun charConsumer(char: Char): Consumer = parser {
    if (!source.startsWith(char)) fail()
    source = source.substring(1)
}

The assignment source = source.substring(1) drops the first character, so the matched char is marked as consumed and won't be passed to the next parsers.

This line can also be extracted into a more convenient function:

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

Kotlin features used: Extension functions

And here is the parser:

fun ParserState.char(char: Char) {
    if (!source.startsWith(char)) fail()
    discard(n = 1)
}

fun charConsumer(char: Char): Consumer = parser { char(char) }

Note that I intentionally wrote two functions, so the char parser can be used both as a standalone parser and inside a parser block.
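Here is a self-contained sketch showing both uses: `charConsumer` on its own, and the `char` extension composed inside a hypothetical `emptyParens` parser. The parameter is renamed to `expected` only to keep it visually distinct from the function name:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.char(expected: Char) {
    if (!source.startsWith(expected)) fail()
    discard(n = 1)
}

// Standalone form
fun charConsumer(expected: Char): Parser<Unit> = parser { char(expected) }

// Hypothetical example: the extension form composed inside a parser block
val emptyParens: Parser<String> = parser {
    char('(')
    char(')')
    "()"
}
```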

Null Parser

Now it's easy to implement a null parser, but first let's write the same kind of parser for a String. This time it takes much less code:

fun ParserState.string(string: String) {
    if (!source.startsWith(string)) fail()
    discard(string.length)
}

fun stringConsumer(string: String): Consumer = parser { string(string) }

Finally, our first parser for a real JSON value:

fun jsonNullParser() = parser {
    string("null")
    JsonNull
}

Let's run that simple parser:

fun main() {
    val parser = jsonNullParser()
    println(parser.parse("null"))  // JsonNull
    println(parser.parse("nul"))   // Failure
}

Boolean Parser

Parsers for true and false are pretty easy:

fun trueParser() = parser {
    string("true")
    JsonTrue
}

fun falseParser() = parser {
    string("false")
    JsonFalse
}

To make a boolean parser, we need a way to combine those two parsers. We'll write an extension function that tries each parser in order and returns the first successful result:

fun <T> ParserState.any(parsers: List<Parser<T>>): T {
    for (parser in parsers) {
        when (val result = parser.parse(source)) {
            is ParserResult.Success -> {
                source = result.remaining
                return result.value
            }
            ParserResult.Failure -> continue
        }
    }
    fail()
}

Let's simplify that function by moving the Parser.parse functionality into ParserState:

class ParserState(...) {
    ...
    fun <T> Parser<T>.parse(): T = 
        when (val result = parse(source)) {
            is ParserResult.Success -> {
                source = result.remaining
                result.value
            }
            is ParserResult.Failure -> fail()
        }

    // I intentionally used kotlin.Result here
    // because it has lots of useful methods
    // out of the box
    fun <T> Parser<T>.tryParse(): Result<T> = 
        runCatching { parse() }
}
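A useful property of this design is that a failed `tryParse()` consumes nothing: the inner parser works on its own ParserState, and `source` is reassigned only after a success. A minimal sketch using the `string` and `stringConsumer` helpers, with a hypothetical `backtracking` demo parser:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    // Runs a parser on the current input and advances past what it consumed
    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    // Like parse(), but reports failure as a Result instead of aborting the enclosing parser
    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.string(expected: String) {
    if (!source.startsWith(expected)) fail()
    discard(expected.length)
}

fun stringConsumer(expected: String): Parser<Unit> = parser { string(expected) }

// Hypothetical demo: records whether the first attempt failed and what the input
// looked like afterwards, then parses "yes" from that same position
val backtracking: Parser<Pair<Boolean, String>> = parser {
    val failed = stringConsumer("nope").tryParse().isFailure
    val sourceAfterFailure = source
    stringConsumer("yes").parse()
    failed to sourceAfterFailure
}
```

This is what allows alternatives to be tried one after another from the same position.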

And here is what our brand-new any function looks like:

fun <T> ParserState.any(parsers: List<Parser<T>>): T {
    for (parser in parsers) {
        // tryParse() has already advanced `source` on success
        parser.tryParse().onSuccess { value ->
            return value
        }
    }
    fail()
}

fun <T> anyParser(parsers: List<Parser<T>>) = parser { any(parsers) }

Now, some helper overloads:

fun <T> anyParser(vararg parsers: Parser<T>) = 
    anyParser(parsers.toList())
fun <T> ParserState.any(vararg parsers: Parser<T>) = 
    any(parsers.toList())
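Note that `any` is an ordered choice: it commits to the first alternative that succeeds and never revisits that decision. When two alternatives share a prefix, the longer one therefore has to come first. A minimal sketch with a self-contained copy of `any` (where `tryParse()` has already advanced `source`) and a hypothetical `keyword` parser:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.string(expected: String) {
    if (!source.startsWith(expected)) fail()
    discard(expected.length)
}

fun <T> ParserState.any(parsers: List<Parser<T>>): T {
    for (candidate in parsers) {
        // Return the first successful value; later candidates are never tried
        candidate.tryParse().onSuccess { value -> return value }
    }
    fail()
}

fun <T> anyParser(vararg parsers: Parser<T>): Parser<T> = parser { any(parsers.toList()) }

// Hypothetical helper: matches a fixed word and returns it
fun keyword(word: String): Parser<String> = parser {
    string(word)
    word
}

val shortFirst = anyParser(keyword("tr"), keyword("true"))
val longFirst = anyParser(keyword("true"), keyword("tr"))
```

With `shortFirst`, the input `true` is split into `tr` plus a leftover `ue`, which would make an enclosing parser fail later.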

And we can finish our boolean parser:

fun jsonBooleanParser() = anyParser(
    trueParser(),
    falseParser()
)

fun trueParser() = parser {
    string("true")
    JsonTrue
}

fun falseParser() = parser {
    string("false")
    JsonFalse
}

Test it like this:

fun main() {
    val parser = jsonBooleanParser()
    println(parser.parse("true"))  // JsonTrue
    println(parser.parse("false"))   // JsonFalse
}

Moreover, we can now combine our boolean and null parsers like this:

fun main() {
    val parser = anyParser(
        jsonBooleanParser(),
        jsonNullParser()
    )
    println(parser.parse("true"))  // JsonTrue
    println(parser.parse("false"))   // JsonFalse
    println(parser.parse("null"))   // JsonNull
    println(parser.parse("nul"))   // Failure
}

Integer Parser

For this parser, we need to implement a takeWhile function:

inline fun ParserState.takeWhile(
    predicate: (Char) -> Boolean
): String {
    val string = source.takeWhile(predicate)
    discard(string.length)
    return string
}

fun takeWhileParser(predicate: (Char) -> Boolean) = 
    parser { takeWhile(predicate) }

Now, to the parser:

fun jsonIntegerParser() = parser {
    val minus = runCatching {
        char('-')
        "-" // return "-" string
    }.getOrElse { "" }

    val digits = takeWhile { it.isDigit() }
    // Fail when there are no digits, so "-" or "" is not accepted
    if (digits.isEmpty()) fail()

    JsonInteger("$minus$digits")
}
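Below is a self-contained sketch of the integer parser with a guard against input that contains no digits. Without such a guard, an empty match would succeed and produce JsonInteger(""), for example on input like `nul`. JsonInteger is trimmed here to its `int` accessor and detached from JsonNode:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.char(expected: Char) {
    if (!source.startsWith(expected)) fail()
    discard(1)
}

inline fun ParserState.takeWhile(predicate: (Char) -> Boolean): String {
    val taken = source.takeWhile(predicate)
    discard(taken.length)
    return taken
}

// Trimmed copy of the AST type
data class JsonInteger(val string: String) {
    val int: Int get() = string.toInt()
}

fun jsonIntegerParser(): Parser<JsonInteger> = parser {
    // Optional leading minus: a failed char('-') consumes nothing
    val minus = runCatching {
        char('-')
        "-"
    }.getOrElse { "" }
    val digits = takeWhile(Char::isDigit)
    // Reject input with no digits, such as "-" on its own or "nul"
    if (digits.isEmpty()) fail()
    JsonInteger("$minus$digits")
}
```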

Let's test it:

fun main() {
    val parser = anyParser(
        jsonBooleanParser(),
        jsonNullParser(),
        jsonIntegerParser()
    )
    println(parser.parse("-12345"))   // JsonInteger(-12345)
    println(parser.parse("true"))  // JsonTrue
    println(parser.parse("false"))   // JsonFalse
    println(parser.parse("null"))   // JsonNull
    println(parser.parse("nul"))   // Failure
}

Works like a charm!

String Parser

This is so easy:

fun jsonStringParser() = parser {
    char('"')
    val string = takeWhile { it != '"' }
    char('"')
    JsonString(string)
}
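Because escaping is not supported, the first `"` always ends the string. This self-contained sketch illustrates that limitation: an escaped quote ends the string early and leaves the rest of the input unconsumed. JsonString is a trimmed copy:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.char(expected: Char) {
    if (!source.startsWith(expected)) fail()
    discard(1)
}

inline fun ParserState.takeWhile(predicate: (Char) -> Boolean): String {
    val taken = source.takeWhile(predicate)
    discard(taken.length)
    return taken
}

// Trimmed copy of the AST type
data class JsonString(val string: String)

fun jsonStringParser(): Parser<JsonString> = parser {
    char('"')
    // No escape handling: everything up to the first '"' belongs to the string
    val content = takeWhile { c -> c != '"' }
    char('"')
    JsonString(content)
}
```

For the JSON text `"a\"b"`, this parser returns the string `a\` and leaves `b"` in the input, which an enclosing array or object parser would then reject.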

And of course the test:

fun main() {
    val parser = anyParser(
        jsonBooleanParser(),
        jsonNullParser(),
        jsonIntegerParser(),
        jsonStringParser()
    )
    println(parser.parse("\"String\"")) // JsonString(String)
    println(parser.parse("-12345"))   // JsonInteger(-12345)
    println(parser.parse("true"))  // JsonTrue
    println(parser.parse("false"))   // JsonFalse
    println(parser.parse("null"))   // JsonNull
    println(parser.parse("nul"))   // Failure
}

Primitive Parser

Before moving on to the remaining JSON types, let's combine all the parsers we have so far into a single one:

fun jsonPrimitiveParser() = anyParser(
    jsonBooleanParser(),
    jsonNullParser(),
    jsonIntegerParser(),
    jsonStringParser()
)

Now we can parse any JSON primitive in our simplified subset (integers and unescaped strings). Let's move on!

Notice how much effort even the simple JsonNull parser took, yet after that we moved quickly. That's because we built reusable functions, and the later parsers simply compose them.

Array Parser

This one is a bit trickier. To start with, we can at least parse the array's brackets:

fun jsonArrayParser() = parser {
    char('[')
    takeWhile { it.isWhitespace() }
    // ...
    takeWhile { it.isWhitespace() }
    char(']')
}

As usual, we extract whitespace handling into a separate function:

fun ParserState.whitespace() {
    takeWhile { it.isWhitespace() }
}

fun whitespaceConsumer(): Consumer = parser { whitespace() }

Then the skeleton of the array parser looks like this:

fun jsonArrayParser() = parser {
    char('[')
    whitespace()
    // ...
    whitespace()
    char(']')
}

What goes inside? We don't know the number of items in advance, so we need a function that applies a parser as many times as it succeeds:

fun <T> ParserState.many(elementParser: Parser<T>): List<T> {
    val results = mutableListOf<T>()
    while (true) {
        elementParser.tryParse().onSuccess { value ->
            results += value
        }.onFailure {
            return results
        }
    }
}

fun <T> manyParser(elementParser: Parser<T>) = 
    parser { many(elementParser) }
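A self-contained sketch of `many` in action: it never fails, it simply returns an empty list when the first attempt does not match. The hypothetical `manyA` parser counts leading `a` characters:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.char(expected: Char) {
    if (!source.startsWith(expected)) fail()
    discard(1)
}

fun charConsumer(expected: Char): Parser<Unit> = parser { char(expected) }

fun <T> ParserState.many(elementParser: Parser<T>): List<T> {
    val results = mutableListOf<T>()
    while (true) {
        elementParser.tryParse()
            .onSuccess { value -> results += value }
            .onFailure { return results }
    }
}

// Hypothetical demo: counts how many 'a' characters the input starts with
val manyA: Parser<Int> = parser { many(charConsumer('a')).size }
```

One caveat: if the element parser can succeed without consuming any input, this loop never terminates, so element parsers passed to many should always consume at least one character on success.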

But we can't parse array items just yet: the items are not simply repeated, they are also separated by commas. So we also need a parser for separated values:

fun <T> ParserState.manySeparated(
    elementParser: Parser<T>,
    separatorConsumer: Consumer
): List<T> {
    // first, let's check if we have
    // any element present
    val first = elementParser.tryParse()
        .getOrElse { return emptyList() }

    // [ item|parsed|, item2|, item3|, item4 ]
    // we parsed the first item, so now we can
    // just parse many separator + parser
    val remaining = many(
        parser {
            separatorConsumer.parse()
            elementParser.parse()
        }
    )

    return listOf(first) + remaining
}

fun <T> manySeparatedParser(
    elementParser: Parser<T>,
    separatorConsumer: Consumer
) = parser { manySeparated(elementParser, separatorConsumer) }
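A self-contained sketch of manySeparated with a hypothetical single-digit element parser. Note what happens with a trailing separator: the final separator-plus-element attempt fails as a whole, so the comma stays in the input, and an enclosing parser that expects `]` next will fail. This is how trailing commas end up rejected:

```kotlin
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing> {
        override fun toString(): String = "Failure"
    }
}

class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.discard(n: Int) {
    source = source.substring(n)
}

fun ParserState.char(expected: Char) {
    if (!source.startsWith(expected)) fail()
    discard(1)
}

fun charConsumer(expected: Char): Parser<Unit> = parser { char(expected) }

fun <T> ParserState.many(elementParser: Parser<T>): List<T> {
    val results = mutableListOf<T>()
    while (true) {
        elementParser.tryParse()
            .onSuccess { value -> results += value }
            .onFailure { return results }
    }
}

fun <T> ParserState.manySeparated(elementParser: Parser<T>, separatorConsumer: Parser<Unit>): List<T> {
    // No first element means an empty list, with nothing consumed
    val first = elementParser.tryParse().getOrElse { return emptyList() }
    // Separator and element succeed or fail together, as one inner parser
    val rest = many(
        parser {
            separatorConsumer.parse()
            elementParser.parse()
        }
    )
    return listOf(first) + rest
}

// Hypothetical element parser: one decimal digit
val digit: Parser<Int> = parser {
    val c = source.firstOrNull() ?: fail()
    if (!c.isDigit()) fail()
    discard(1)
    c.digitToInt()
}

val digitList: Parser<List<Int>> = parser { manySeparated(digit, charConsumer(',')) }
```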

That's it, now we can finish the array parser:

fun jsonArrayParser() = parser {
    char('[')
    whitespace()
    val nodes = manySeparated(
        elementParser = jsonNodeParser(),
        separatorConsumer = commaConsumer()
    )
    whitespace()
    char(']')
    JsonArray(nodes)
}

Let's define the missing functions:

fun jsonNodeParser() = anyParser(
    jsonPrimitiveParser(),
    jsonArrayParser(),
    // jsonObjectParser() – future parser
)

And another one:

fun ParserState.comma() {
    whitespace()
    char(',')
    whitespace()
}

fun commaConsumer(): Consumer = parser { comma() }

Let's test the array parser:

fun main() {
    val parser = jsonNodeParser()
    // JsonArray(list=[JsonInteger(1), JsonNull, JsonArray(list=[]), JsonString(test)])
    println(parser.parse("[1  ,  null,[],   \"test\"]"))
    println(parser.parse("\"String\"")) // JsonString(String)
    println(parser.parse("-12345"))   // JsonInteger(-12345)
    println(parser.parse("true"))  // JsonTrue
    println(parser.parse("false"))   // JsonFalse
    println(parser.parse("null"))   // JsonNull
    println(parser.parse("nul"))   // Failure
}

Notice that it handles a nested array and irregular whitespace. That's amazing! All that's left is the object parser.

Object Parser

Without further ado, let's write the parser:

fun jsonObjectParser() = parser {
    char('{')
    whitespace()
    // ...
    whitespace()
    char('}')
}

A JSON object contains many key-value pairs, so let's first write a parser for a single pair:

fun jsonPairParser() = parser {
    val key = jsonStringParser().parse()
    whitespace()
    char(':')
    whitespace()
    val value = jsonNodeParser().parse()
    key.string to value
}

And finishing up the object parser:

fun jsonObjectParser() = parser {
    char('{')
    whitespace()
    val pairs = manySeparated(
        elementParser = jsonPairParser(),
        separatorConsumer = commaConsumer()
    )
    whitespace()
    char('}')
    JsonObject(pairs.toMap())
}
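One consequence of `pairs.toMap()`: if a key appears more than once, the last value wins, while the key keeps the position of its first occurrence. RFC 8259 only says that object member names SHOULD be unique, so this is a reasonable choice, but it is worth knowing. A minimal sketch with hand-written pairs standing in for what manySeparated(jsonPairParser(), ...) would return, using trimmed AST types:

```kotlin
sealed interface JsonNode
data class JsonInteger(val string: String) : JsonNode
data class JsonObject(val map: Map<String, JsonNode>) : JsonNode

// Stand-in for the parsed pairs of {"a": 1, "b": 2, "a": 3}
val pairs: List<Pair<String, JsonNode>> = listOf(
    "a" to JsonInteger("1"),
    "b" to JsonInteger("2"),
    "a" to JsonInteger("3"),
)

// toMap(): the last pair wins for a repeated key; iteration order follows first insertion
val obj = JsonObject(pairs.toMap())
```

If you would rather reject duplicates, one option is to compare `pairs.size` with the resulting map's size and call fail() when they differ.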

Final Parser

We need to uncomment jsonObjectParser() in the jsonNodeParser function, so the final parser looks like this:

fun jsonNodeParser() = anyParser(
    jsonPrimitiveParser(),
    jsonArrayParser(),
    jsonObjectParser()
)

That's it! Let's try it on a real-world example and look at the output:

fun main() {
    val parser: Parser<JsonNode> = jsonNodeParser()

    // language=json
    val exampleJson = """
        {
            "glossary": {
                "title": "example glossary",
                "pages": 1,
                "description": null,
                "GlossDiv": {
                    "id": -1,
                    "title": "S",
                    "GlossList": {
                        "GlossEntry": {
                            "ID": "SGML",
                            "SortAs": "SGML",
                            "GlossTerm": "Standard Generalized Markup Language",
                            "Acronym": "SGML",
                            "Abbrev": "ISO 8879:1986",
                            "GlossDef": {
                                "para": "A meta-markup language, used to create markup languages such as DocBook.",
                                "GlossSeeAlso": [
                                    "GML",
                                    "XML"
                                ]
                            },
                            "GlossSee": "markup"
                        }
                    }
                }
            }
        }
    """.trimIndent()

    println(parser.parse(exampleJson))
}

Output:

Success(
  value=JsonObject(map={
    glossary=JsonObject(map={
      title=JsonString(string=example glossary), 
      pages=JsonInteger(string=1), 
      description=JsonNull, 
      GlossDiv=JsonObject(map={
        id=JsonInteger(string=-1), 
        title=JsonString(string=S), 
        GlossList=JsonObject(map={
          GlossEntry=JsonObject(map={
            ID=JsonString(string=SGML), 
            SortAs=JsonString(string=SGML), 
            GlossTerm=JsonString(string=Standard Generalized Markup Language), 
            Acronym=JsonString(string=SGML), 
            Abbrev=JsonString(string=ISO 8879:1986), 
            GlossDef=JsonObject(map={
              para=JsonString(string=A meta-markup language, used to create markup languages such as DocBook.), 
              GlossSeeAlso=JsonArray(list=[
                JsonString(string=GML), 
                JsonString(string=XML)
              ])
            }), 
            GlossSee=JsonString(string=markup)
          })
        })
      })
    })
  }), 
  remaining=
)

Conclusion

In this article we covered some parsing basics and built a JSON parser in Kotlin completely from scratch. Share your feedback in the comments; all the code is available on GitHub: https://github.com/y9san9/kotlin-simple-json/

By the way, this is my first article, so if you have any suggestions on how I can improve it, please leave them in the comments as well.

Happy coding!
