Introduction
JSON has emerged as one of the most widely adopted file formats for data storage and transmission on the internet today. Originally linked to JavaScript, it has evolved into a language-independent format, finding support across various programming languages.
With a straightforward syntax specification comprising just 4 scalar and 2 composite data types, JSON serves as an excellent platform to delve into the fundamentals of parsing.
In this article, we will embark on a journey to build a JSON parser from scratch using Kotlin, exploring the core concepts along the way. Let's get started!
AST
In order to parse JSON in Kotlin, we need to understand and implement an Abstract Syntax Tree (AST) for JSON.
Abstract Syntax Tree
An Abstract Syntax Tree, or AST, is a tree representation of the source code of a computer program that conveys the structure of the source code. Each node in the tree represents a construct occurring in the source code.
First, we need to identify the different types of JSON values. JSON supports six types:
- Null
- Boolean (true/false)
- Number (integers and floating-point numbers)
- String
- Array (an ordered collection of JSON values)
- Object (a collection of key-value pairs)
To keep the tutorial simple, we will support only integers, not floating-point numbers, and we will not support string escaping. You are free to implement both as practice, since all the code is published on GitHub.
Next, we define classes to represent each of these JSON types:
sealed interface JsonNode
That is the root type for all JSON values. Let's define primitives first:
sealed interface JsonPrimitive : JsonNode
object JsonNull : JsonPrimitive {
override fun toString() = "JsonNull"
}
sealed interface JsonBoolean : JsonPrimitive {
val boolean: Boolean
}
object JsonTrue : JsonBoolean {
override val boolean = true
override fun toString(): String = "JsonTrue"
}
object JsonFalse : JsonBoolean {
override val boolean = false
override fun toString(): String = "JsonFalse"
}
data class JsonInteger(val string: String) : JsonPrimitive {
val byte: Byte get() = string.toByte()
val short: Short get() = string.toShort()
val int: Int get() = string.toInt()
val long: Long get() = string.toLong()
}
data class JsonString(val string: String) : JsonPrimitive
And finally, the composite data types:
data class JsonArray(val list: List<JsonNode>) : JsonNode
data class JsonObject(val map: Map<String, JsonNode>) : JsonNode
Kotlin features used: sealed interfaces
We also overrode the toString() method for the objects, so they will be pretty-printed.
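To see what the sealed hierarchy buys us, here is a minimal sketch that turns a tree back into JSON text. The render function is a hypothetical helper, not part of the article's code. Because every type is sealed, the compiler can check that the when expression covers all cases without an else branch:

```kotlin
sealed interface JsonNode
sealed interface JsonPrimitive : JsonNode
object JsonNull : JsonPrimitive
sealed interface JsonBoolean : JsonPrimitive { val boolean: Boolean }
object JsonTrue : JsonBoolean { override val boolean = true }
object JsonFalse : JsonBoolean { override val boolean = false }
data class JsonInteger(val string: String) : JsonPrimitive
data class JsonString(val string: String) : JsonPrimitive
data class JsonArray(val list: List<JsonNode>) : JsonNode
data class JsonObject(val map: Map<String, JsonNode>) : JsonNode

// Hypothetical helper (not from the article): renders a tree as compact JSON.
// Like the parser in this tutorial, it does not escape strings.
fun render(node: JsonNode): String = when (node) {
    is JsonNull -> "null"
    is JsonBoolean -> node.boolean.toString()
    is JsonInteger -> node.string
    is JsonString -> "\"${node.string}\""
    is JsonArray -> node.list.joinToString(separator = ",", prefix = "[", postfix = "]") {
        render(it)
    }
    is JsonObject -> node.map.entries.joinToString(separator = ",", prefix = "{", postfix = "}") {
        (key, value) -> "\"$key\":${render(value)}"
    }
}
```

If a new node type is ever added to the hierarchy, render stops compiling until the new case is handled, which is the main reason to prefer sealed interfaces here.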
To make the code more visually appealing, let's use the "Hierarchical View" in IntelliJ IDEA:
With the AST implementation complete, we can now proceed to define our basic parser type.
Parser
The simplest parser may look like this:
(String) -> Result<Pair<T, String>>
It's a function that accepts a String and returns either a Pair of T and String, or a failure. T is the result of parsing, and the String is the remaining input, which we can then pass to the next parser. That is how parsers are combined.
It is more convenient to express such a function as an interface:
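To make the idea of passing the remaining string to the next parser concrete, here is a minimal sketch built directly on that function type. The names SimpleParser, digit and andThen are made up for this illustration and are not used in the rest of the article:

```kotlin
// The simplest shape: take a String, return the parsed value
// together with the unconsumed rest, or a failure.
typealias SimpleParser<T> = (String) -> Result<Pair<T, String>>

// Hypothetical example parser: accepts exactly one decimal digit.
val digit: SimpleParser<Int> = { source ->
    val first = source.firstOrNull()
    if (first != null && first in '0'..'9') {
        Result.success((first - '0') to source.drop(1))
    } else {
        Result.failure(IllegalArgumentException("expected a digit"))
    }
}

// Hypothetical combinator: runs `first`, then feeds its remaining
// input to `second`. Fails if either of them fails.
fun <A, B> andThen(first: SimpleParser<A>, second: SimpleParser<B>): SimpleParser<Pair<A, B>> =
    { source ->
        first(source).mapCatching { (a, rest) ->
            val (b, rest2) = second(rest).getOrThrow()
            (a to b) to rest2
        }
    }
```

Calling andThen(digit, digit) on "42!" yields the pair (4, 2) with "!" left over, while "4x" fails because the second digit is missing.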
fun interface Parser<out T> {
fun parse(source: String): ParserResult<T>
}
Kotlin features used: Functional (SAM) interfaces, Variance (in, out)
We now also need to define the ParserResult type, which is quite simple:
sealed interface ParserResult<out T> {
data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
// No detailed error-reporting for this simple parser
object Failure : ParserResult<Nothing> {
override fun toString(): String = "Failure"
}
}
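Since ParserResult is sealed, callers can unpack it with an exhaustive when. Here is a minimal sketch; completeValueOrNull is a hypothetical helper (not part of the article's code), and it assumes that leftover whitespace after a value is acceptable:

```kotlin
sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing>
}

// Hypothetical helper: returns the parsed value only when the parser
// succeeded AND nothing but whitespace is left in the input.
fun <T> ParserResult<T>.completeValueOrNull(): T? = when (this) {
    is ParserResult.Success -> if (remaining.isBlank()) value else null
    is ParserResult.Failure -> null
}
```

A check like this is useful at the top level, because a parser that succeeds on a prefix of the input still reports Success with a non-empty remaining string.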
Char Parser. First attempt
The easiest parser we can implement is a parser that requires a specific char to be present.
But first we need to introduce a Consumer type, which is just a Parser<Unit>:
typealias Consumer = Parser<Unit>
And here is a charConsumer implementation that uses the previously defined types:
fun charConsumer(char: Char): Consumer = Consumer { source ->
    if (source.startsWith(char)) {
        ParserResult.Success(
            value = Unit,
            remaining = source.drop(n = 1)
        )
    } else {
        ParserResult.Failure
    }
}
It's amazing how such a small parser takes up so much code. What can we do with it?
Parser DSL
Kotlin features used: DSLs
Even though the types we defined are immutable and hold no state, their implementations can be stateful.
So, to get rid of the repeated ParserResult.Success(value, remaining = ...) boilerplate, we will extract it into a separate builder.
First, we introduce a function that will automatically create a parser based on the result of a lambda:
fun <T> parser(block: (String) -> T): Parser<T> = Parser { source ->
    ParserResult.Success(block(source))
}
Now we need a way to exit from a parser with ParserResult.Failure. Let's add a special exception for this:
private class ParserFailure : Throwable()
class ParserState {
fun fail(): Nothing = throw ParserFailure()
}
Now the builder function will look like this:
fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState()
        ParserResult.Success(state.block(source))
    } catch (_: ParserFailure) {
        ParserResult.Failure
    }
}
Kotlin features used: Function literals with receiver
That's better: we can now call fail() inside the builder. The only problem is that the code won't compile, since ParserResult.Success requires two arguments, and the second one is the remaining string.
We can add it to ParserState:
class ParserState(var source: String) {
fun fail(): Nothing = throw ParserFailure()
}
And then make the final improvements to the builder function:
fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        // the state starts with the full input and is advanced by the block
        val state = ParserState(source)
        ParserResult.Success(
            value = state.block(source),
            remaining = state.source
        )
    } catch (_: ParserFailure) {
        ParserResult.Failure
    }
}
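Putting the pieces together, here is a minimal, self-contained sketch of the builder in action. The hexPrefixParser function is a made-up example and not part of the article's code. It shows both outcomes: a block that advances source, and a block that bails out through fail():

```kotlin
sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing>
}

fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

private class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        // the block runs first, so state.source already reflects what it consumed
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

// Hypothetical parser for illustration: consumes a leading "0x" prefix
// and returns the number of characters it consumed.
fun hexPrefixParser(): Parser<Int> = parser {
    if (!source.startsWith("0x")) fail()
    source = source.substring(2)
    2
}
```

Parsing "0xFF" with it gives a Success holding 2 with "FF" remaining, and parsing "FF" gives Failure, with no ParserResult constructed by hand inside the block.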
Char Parser. Second attempt
Let's define a charConsumer function with the new builder:
fun charConsumer(char: Char): Consumer = parser { // this: ParserState
    if (!source.startsWith(char)) fail()
    source = source.substring(1)
}
By calling source = source.substring(1) we drop the first character of the string, so the matched char is marked as handled and won't be passed to the next parsers.
However, this line can also be extracted into a more convenient function:
fun ParserState.discard(n: Int) {
source = source.substring(n)
}
Kotlin features used: Extension functions
And here is the parser:
fun ParserState.char(char: Char) {
if (!source.startsWith(char)) fail()
discard(n = 1)
}
fun charConsumer(char: Char): Consumer = parser { char(char) }
Note that I intentionally made two functions, so our char parser can be used both as a standalone parser and inside a parser block.
Null Parser
Now it's very easy to implement a null parser, but first I will create the same parser as above for a String. This time it goes a lot faster:
fun ParserState.string(string: String) {
if (!source.startsWith(string)) fail()
discard(string.length)
}
fun stringConsumer(string: String): Consumer = parser { string(string) }
Finally, our first parser for a real JSON value:
fun jsonNullParser() = parser {
string("null")
JsonNull
}
Let's run that simple parser:
fun main() {
val parser = jsonNullParser()
println(parser.parse("null")) // JsonNull
println(parser.parse("nul")) // Failure
}
Boolean Parser
Parsers for true and false are pretty easy:
fun trueParser() = parser {
string("true")
JsonTrue
}
fun falseParser() = parser {
string("false")
JsonFalse
}
To finally make a boolean parser, we need to combine those two parsers somehow. We will write an extension function for this:
fun <T> ParserState.any(parsers: List<Parser<T>>): T {
    for (parser in parsers) {
        when (val result = parser.parse(source)) {
            is ParserResult.Success -> {
                source = result.remaining
                return result.value
            }
            is ParserResult.Failure -> continue
        }
    }
    fail()
}
Let's simplify that function by moving the Parser.parse functionality into ParserState:
class ParserState(...) {
...
fun <T> Parser<T>.parse(): T =
when (val result = parse(source)) {
is ParserResult.Success -> {
source = result.remaining
result.value
}
is ParserResult.Failure -> fail()
}
// I intentionally used kotlin.Result here
// because it has lots of useful methods
// out of the box
fun <T> Parser<T>.tryParse(): Result<T> =
runCatching { parse() }
}
And here is what our brand new any method will look like:
fun <T> ParserState.any(parsers: List<Parser<T>>): T {
    for (parser in parsers) {
        // tryParse() already advances source on success,
        // so we only need to return the parsed value
        parser.tryParse().onSuccess { value -> return value }
    }
    fail()
}
fun <T> anyParser(parsers: List<Parser<T>>) = parser { any(parsers) }
Now some helper overload functions:
fun <T> anyParser(vararg parsers: Parser<T>) =
anyParser(parsers.toList())
fun <T> ParserState.any(vararg parsers: Parser<T>) =
any(parsers.toList())
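One property of any that is easy to miss is backtracking: each alternative runs on its own ParserState, so a branch that fails halfway leaves the caller's source untouched. Here is a minimal sketch of that behaviour. The abOrAParser function and its inputs are made up for this illustration:

```kotlin
sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing>
}

fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

private class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()

    // Runs a nested parser on the current source; advances only on success.
    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }

    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.string(expected: String) {
    if (!source.startsWith(expected)) fail()
    source = source.substring(expected.length)
}

fun <T> ParserState.any(vararg parsers: Parser<T>): T {
    for (candidate in parsers) {
        candidate.tryParse().onSuccess { value -> return value }
    }
    fail()
}

// Hypothetical example: the first alternative needs "a" followed by "b",
// the second one is happy with just "a".
fun abOrAParser(): Parser<String> = parser {
    any(
        parser { string("a"); string("b"); "ab" },
        parser { string("a"); "a" }
    )
}
```

On the input "ac", the first alternative consumes "a" and then fails on "b". The second alternative still sees the full "ac", so the overall result is a Success holding "a" with "c" remaining.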
And we can finish our boolean parser:
fun jsonBooleanParser() = anyParser(
trueParser(),
falseParser()
)
fun trueParser() = parser {
string("true")
JsonTrue
}
fun falseParser() = parser {
string("false")
JsonFalse
}
Test it like this:
fun main() {
val parser = jsonBooleanParser()
println(parser.parse("true")) // JsonTrue
println(parser.parse("false")) // JsonFalse
}
Moreover, we can now combine our boolean and null parsers like this:
fun main() {
val parser = anyParser(
jsonBooleanParser(),
jsonNullParser()
)
println(parser.parse("true")) // JsonTrue
println(parser.parse("false")) // JsonFalse
println(parser.parse("null")) // JsonNull
println(parser.parse("nul")) // Failure
}
Integer Parser
For that parser, we need to implement a takeWhile function:
inline fun ParserState.takeWhile(
    predicate: (Char) -> Boolean
): String {
    val string = source.takeWhile(predicate)
    discard(string.length)
    return string
}
fun takeWhileParser(predicate: (Char) -> Boolean) =
    parser { takeWhile(predicate) }
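One detail worth keeping in mind is that takeWhile never fails: when the first character does not match, it simply returns an empty string and consumes nothing. Here is a minimal sketch of that behaviour, together with a hypothetical takeWhile1 variant (not part of the article's code) that requires at least one character:

```kotlin
sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing>
}

fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}

private class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

inline fun ParserState.takeWhile(predicate: (Char) -> Boolean): String {
    val taken = source.takeWhile(predicate)
    source = source.substring(taken.length)
    return taken
}

// Hypothetical variant: same as takeWhile, but fails on an empty match.
inline fun ParserState.takeWhile1(predicate: (Char) -> Boolean): String {
    val taken = takeWhile(predicate)
    if (taken.isEmpty()) fail()
    return taken
}

fun digitsParser(): Parser<String> = parser { takeWhile { it.isDigit() } }
fun digits1Parser(): Parser<String> = parser { takeWhile1 { it.isDigit() } }
```

digitsParser succeeds on "abc" with an empty string, while digits1Parser fails on the same input. This matters as soon as such a parser is one of several alternatives, because an alternative that always succeeds hides every alternative listed after it.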
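Now, to the parser:
fun jsonIntegerParser() = parser {
    val minus = runCatching {
        char('-')
        "-" // return "-" string
    }.getOrElse { "" }
    val digits = takeWhile { it.isDigit() }
    // takeWhile also succeeds with zero characters,
    // so reject input that has no digits at all
    if (digits.isEmpty()) fail()
    JsonInteger("$minus$digits")
}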
Let's test it:
fun main() {
val parser = anyParser(
jsonBooleanParser(),
jsonNullParser(),
jsonIntegerParser()
)
println(parser.parse("-12345")) // JsonInteger(-12345)
println(parser.parse("true")) // JsonTrue
println(parser.parse("false")) // JsonFalse
println(parser.parse("null")) // JsonNull
println(parser.parse("nul")) // Failure
}
Works like a charm!
String Parser
This is so easy:
fun jsonStringParser() = parser {
char('"')
val string = takeWhile { it != '"' }
char('"')
JsonString(string)
}
And of course the test:
fun main() {
val parser = anyParser(
jsonBooleanParser(),
jsonNullParser(),
jsonIntegerParser(),
jsonStringParser()
)
println(parser.parse("\"String\"")) // JsonString(String)
println(parser.parse("-12345")) // JsonInteger(-12345)
println(parser.parse("true")) // JsonTrue
println(parser.parse("false")) // JsonFalse
println(parser.parse("null")) // JsonNull
println(parser.parse("nul")) // Failure
}
Primitive Parser
Before moving on to the next JSON data types, I think it's better to combine all the parsers we have so far into a single one:
fun jsonPrimitiveParser() = anyParser(
jsonBooleanParser(),
jsonNullParser(),
jsonIntegerParser(),
jsonStringParser()
)
Now we can parse any JSON primitive we support with it. Let's move on!
Notice how we struggled with implementing even a simple JsonNull parser, but then we started moving fast. That's because we created reusable functions, and the later parsers simply compose them.
Array Parser
This will be a bit tricky. But to start with, we can at least parse bounds of the array:
fun jsonArrayParser() = parser {
char('[')
takeWhile { it.isWhitespace() }
// ...
takeWhile { it.isWhitespace() }
char(']')
}
Of course, we extract whitespace handling into a separate function:
fun ParserState.whitespace() {
takeWhile { it.isWhitespace() }
}
fun whitespaceConsumer(): Consumer = parser { whitespace() }
Then the start of array parser will look like this:
fun jsonArrayParser() = parser {
char('[')
whitespace()
// ...
whitespace()
char(']')
}
What should we place inside? We don't know the exact amount of inner items, therefore we need a special function to parse many occurrences of one parser:
fun <T> ParserState.many(elementParser: Parser<T>): List<T> {
val results = mutableListOf<T>()
while (true) {
elementParser.tryParse().onSuccess { value ->
results += value
}.onFailure {
return results
}
}
}
fun <T> manyParser(elementParser: Parser<T>) =
parser { many(elementParser) }
But we can't parse array items just yet, because the items are not merely repeated; they are also separated by commas. So we also need a parser for separated values:
fun <T> ParserState.manySeparated(
elementParser: Parser<T>,
separatorConsumer: Consumer
): List<T> {
// first, let's check if we have
// any element present
val first = elementParser.tryParse()
.getOrElse { return emptyList() }
// [ item|parsed|, item2|, item3|, item4 ]
// we parsed the first item, so now we can
// just parse many separator + parser
val remaining = many(
parser {
separatorConsumer.parse()
elementParser.parse()
}
)
return listOf(first) + remaining
}
fun <T> manySeparatedParser(
elementParser: Parser<T>,
separatorConsumer: Consumer
) = parser { manySeparated(elementParser, separatorConsumer) }
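Here is a minimal, self-contained sketch of manySeparated at work, using a made-up letterParser as the element parser instead of a JSON node. It also shows what happens with a trailing separator: the separator-plus-element step fails as a whole, so the dangling comma is left in the remaining input:

```kotlin
sealed interface ParserResult<out T> {
    data class Success<out T>(val value: T, val remaining: String) : ParserResult<T>
    object Failure : ParserResult<Nothing>
}
fun interface Parser<out T> {
    fun parse(source: String): ParserResult<T>
}
typealias Consumer = Parser<Unit>
private class ParserFailure : Throwable()

class ParserState(var source: String) {
    fun fail(): Nothing = throw ParserFailure()
    fun <T> Parser<T>.parse(): T = when (val result = parse(source)) {
        is ParserResult.Success -> {
            source = result.remaining
            result.value
        }
        is ParserResult.Failure -> fail()
    }
    fun <T> Parser<T>.tryParse(): Result<T> = runCatching { parse() }
}

fun <T> parser(block: ParserState.(String) -> T): Parser<T> = Parser { source ->
    try {
        val state = ParserState(source)
        ParserResult.Success(value = state.block(source), remaining = state.source)
    } catch (e: ParserFailure) {
        ParserResult.Failure
    }
}

fun ParserState.char(expected: Char) {
    if (!source.startsWith(expected)) fail()
    source = source.substring(1)
}

fun <T> ParserState.many(elementParser: Parser<T>): List<T> {
    val results = mutableListOf<T>()
    while (true) {
        elementParser.tryParse()
            .onSuccess { value -> results += value }
            .onFailure { return results }
    }
}

fun <T> ParserState.manySeparated(elementParser: Parser<T>, separatorConsumer: Consumer): List<T> {
    val first = elementParser.tryParse().getOrElse { return emptyList() }
    val rest = many(parser {
        separatorConsumer.parse()
        elementParser.parse()
    })
    return listOf(first) + rest
}

// Hypothetical element parser for this example: one lowercase ASCII letter.
fun letterParser(): Parser<Char> = parser {
    val first = source.firstOrNull() ?: fail()
    if (first !in 'a'..'z') fail()
    source = source.substring(1)
    first
}

fun lettersParser(): Parser<List<Char>> = parser {
    manySeparated(letterParser(), parser { char(',') })
}
```

Parsing "a,b,c!" yields the list [a, b, c] with "!" remaining, and parsing "a," yields [a] with "," remaining. In the array parser this means an input such as "[1,]" is rejected, because the closing bracket is expected where the leftover comma sits.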
That's it, now we can finish the array parser:
fun jsonArrayParser() = parser {
    char('[')
    whitespace()
    val nodes = manySeparated(
        elementParser = jsonNodeParser(),
        separatorConsumer = commaConsumer()
    )
    whitespace()
    char(']')
    JsonArray(nodes)
}
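Let's define the missing functions:
// The return type is explicit because jsonNodeParser and jsonArrayParser
// refer to each other, so the compiler cannot infer both types on its own
fun jsonNodeParser(): Parser<JsonNode> = anyParser(
    jsonPrimitiveParser(),
    jsonArrayParser(),
    // jsonObjectParser() – future parser
)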
And another one:
fun ParserState.comma() {
whitespace()
char(',')
whitespace()
}
fun commaConsumer(): Consumer = parser { comma() }
Let's test the Array:
fun main() {
val parser = jsonNodeParser()
// JsonArray(list=[JsonInteger(1), JsonNull, JsonArray(list=[]), JsonString(test)])
println(parser.parse("[1 , null,[], \"test\"]"))
println(parser.parse("\"String\"")) // JsonString(String)
println(parser.parse("-12345")) // JsonInteger(-12345)
println(parser.parse("true")) // JsonTrue
println(parser.parse("false")) // JsonFalse
println(parser.parse("null")) // JsonNull
println(parser.parse("nul")) // Failure
}
Look how it works even with an inner array, and even with that messy spacing. That's amazing! There is only a little left to do: implementing the object parser.
Object Parser
Without further ado, let's write the parser:
fun jsonObjectParser() = parser {
char('{')
whitespace()
// ...
whitespace()
char('}')
}
A JSON object contains many pairs, so let's first write a parser for a pair:
fun jsonPairParser() = parser {
val key = jsonStringParser().parse()
whitespace()
char(':')
whitespace()
val value = jsonNodeParser().parse()
key.string to value
}
And now we can finish the object parser:
fun jsonObjectParser() = parser {
char('{')
whitespace()
val pairs = manySeparated(
elementParser = jsonPairParser(),
separatorConsumer = commaConsumer()
)
whitespace()
char('}')
JsonObject(pairs.toMap())
}
Final Parser
We need to uncomment jsonObjectParser() in the jsonNodeParser function, so the final parser will look like this:
fun jsonNodeParser(): Parser<JsonNode> = anyParser(
    jsonPrimitiveParser(),
    jsonArrayParser(),
    jsonObjectParser()
)
That's it! Let's add a real-world test and see what it outputs:
fun main() {
val parser: Parser<JsonNode> = jsonNodeParser()
// language=json
val exampleJson = """
{
"glossary": {
"title": "example glossary",
"pages": 1,
"description": null,
"GlossDiv": {
"id": -1,
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": [
"GML",
"XML"
]
},
"GlossSee": "markup"
}
}
}
}
}
""".trimIndent()
println(parser.parse(exampleJson))
}
Output:
Success(
value=JsonObject(map={
glossary=JsonObject(map={
title=JsonString(string=example glossary),
pages=JsonInteger(string=1),
description=JsonNull,
GlossDiv=JsonObject(map={
id=JsonInteger(string=-1),
title=JsonString(string=S),
GlossList=JsonObject(map={
GlossEntry=JsonObject(map={
ID=JsonString(string=SGML),
SortAs=JsonString(string=SGML),
GlossTerm=JsonString(string=Standard Generalized Markup Language),
Acronym=JsonString(string=SGML),
Abbrev=JsonString(string=ISO 8879:1986),
GlossDef=JsonObject(map={
para=JsonString(string=A meta-markup language, used to create markup languages such as DocBook.),
GlossSeeAlso=JsonArray(list=[
JsonString(string=GML),
JsonString(string=XML)
])
}),
GlossSee=JsonString(string=markup)
})
})
})
})
}),
remaining=
)
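Once we have the parsed tree, reading values out of it is plain Kotlin. Here is a minimal sketch, assuming a hypothetical at helper that is not part of the article's repository. To stay short, it declares only the subset of the AST types it needs and builds a fragment of the glossary document by hand:

```kotlin
sealed interface JsonNode
data class JsonInteger(val string: String) : JsonNode {
    val int: Int get() = string.toInt()
}
data class JsonString(val string: String) : JsonNode
data class JsonObject(val map: Map<String, JsonNode>) : JsonNode

// Hypothetical helper: follows a chain of object keys and returns
// null as soon as a key is missing or a non-object node is reached.
fun JsonNode.at(vararg keys: String): JsonNode? =
    keys.fold<String, JsonNode?>(this) { node, key ->
        (node as? JsonObject)?.map?.get(key)
    }

// A hand-built fragment of the glossary document from the test above.
fun sampleTree(): JsonNode = JsonObject(
    mapOf(
        "glossary" to JsonObject(
            mapOf(
                "title" to JsonString("example glossary"),
                "pages" to JsonInteger("1")
            )
        )
    )
)
```

With it, sampleTree().at("glossary", "pages") returns the JsonInteger node, whose int property converts the stored string to a number only when it is actually requested.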
Conclusion
In this article we covered some parsing basics and learned how to create a JSON parser in Kotlin completely from scratch. Share your feedback in the comments. All the code is available on GitHub: https://github.com/y9san9/kotlin-simple-json/
By the way, this is my first article, so if you have any suggestions on how I can improve it, please write about them in the comments as well.
Happy coding!