loading...
Cover image for A Recursive Descent: Recreating JSON.parse

A Recursive Descent: Recreating JSON.parse

wpreble1 profile image Will Preble ・3 min read

My White Whale

Building an equivalent JSON.parse function has haunted me since I first encountered the problem alongside recreating JSON.strigify. With stringify, you're given a value which could be almost any data type, and it's your job to turn that value into a string of a specific format. That format, of course, is JSON. Using little more than typeof and isArray, you can sort values by data type, apply a format with string interpolation, deploy recursion for the nested structures, and soon enough you'll be returning a certified JSON string.

Starting JSON.parse felt like driving the wrong way down a one-way street. typeof won’t help you here. It will be our job to take the unwieldy beast that is a stringified JSON object and parse the appropriate JavaScript data contained within:

terrifying json

JSON Grammar & Mutual Recursion

The data contained within a JSON object can accommodate most data types in Javascript including objects, arrays, strings, numbers, true, false, and our favorite oddball, null. Functions and undefined data are excluded from the JSON format. We can rely on the fact that these data types will always be written in a predictable way because JSON follows a specific grammar.

array grammar

Looking at the grammar of an array, courtesy of json.org, we will see that it starts and ends with a square bracket and contains either ‘ws’ (white space) or ‘elements’. What are elements? Well, it could be a single ‘element’ or an ‘element’ separated from other ‘elements’ with a comma. By definition, this is semantic but illustrates an important point. Elements are separated from other elements with a comma. If there is a single element inside of an array, no comma. But wait, what is an element??

value grammar json

An element is a value with white space before and after. Since white space can be represented in JavaScript as an empty string, we can effectively ignore it’s presence here. So what is a value? A value can be an object, array, string… starting to sound familiar? That’s because an array allows nesting of other complex data. See the visual below representing the grammar of an array, also from json.org.

array grammar visual

Once we are able to parse our elements, we will need to call our primary parseJSON function to parse the values of the elements themselves. Here is how I structured my primary value parsing function:

parseJSON code

I'll cover regular expressions another time. The important thing to notice here is that simple data types can be returned as is, but arrays, objects, strings, and numbers all need to call another function. This act of “descending” into functions that will eventually call the function we started in is called “mutual recursion”. Hence the challenge of this exercise, coding a recursive descent parser.

The dirty work of parsing key-value pairs or escape characters in strings is done deep inside the parseObject or parseString functions, respectively. Here’s my code for parsing an array, all the way down to the bottom:

parseElements code

Helper Functions

I’m sure there is an elegant solution that traverses one character at a time through the entire JSON string, tracking the state of how many nested arrays or objects you’re inside of for a given index. This could operate at something close to a linear time complexity. I chose a different path, instead relying on helper functions to determine the state of a given index whenever I needed it. This adds some computing time but shortened the time it took me to code to a minimum viable product.

Here is a simple one that I named insideString:

insideString code

When do we need to know if we’re inside of a string? Let’s go back to our array example. Since elements inside of an array are separated by commas, we could parse our elements if we could locate the commas within our array. But what if it was an array of strings that contained commas? We only care about the commas separating the elements. The insideString helper function helps us locate only the commas we care about, i.e. those that are not within a string.

Thanks for reading! I found this project to be very challenging, and it is not for the faint of heart. If you decide to embark on this journey, beware the escape characters!

Posted on by:

wpreble1 profile

Will Preble

@wpreble1

Always learning...

Discussion

pic
Editor guide