loading...

JSON Parser pt2: Arrays

tttaaannnggg profile image tang Updated on ・9 min read

Where we last left off, we had a JSON parser that worked for primitives

function JSONParser(jstring){
  if(jstring[0] === '"') return jstring.slice(1, jstring.length-1);
  if(jstring[0] === 't') return true;
  if(jstring[0] === 'f') return false;
  if(jstring[0] === 'u') return undefined;
  if(jstring[0] === 'n') return null;
  if(jstring.charCodeAt() >= 48 && jstring.charCodeAt() <= 57) return Number(jstring); 
}

That's pretty simple, right? The thing is, our process gets a lot hairier when we start to consider composite datatypes. Arrays and objects can be nested arbitrarily deep in one another, such that we could have a JSON string that looks something like this:

{
  "a":12,
  "b":[1,2,3,{"c": false},[[[]]]],
  "d":{"hi":undefined},
  "e":{"f":[{},{},{}]},
  "g":"g"
}

It's a bit harder to tell where a string starts and ends, or how to distinguish between keys and values, or where arrays and objects start or end. We'll have some tricks to handle this.

Empty Arrays

First, we can identify an array by the [. Our functionality here will end up being a little more complicated, so we'll write a separate method and call it on our JSON string to resolve it.

function JSONParser(jstring){
  if(jstring[0] === '"') return jstring.slice(1, jstring.length-1);
  if(jstring[0] === 't') return true;
  if(jstring[0] === 'f') return false;
  if(jstring[0] === 'u') return undefined;
  if(jstring[0] === 'n') return null;
  if(jstring.charCodeAt() >= 48 && jstring.charCodeAt() <= 57) return Number(jstring); 
  if(jstring[0] === '[') return parseArray(jstring)
}

function parseArray(jstring){
  const output = [];
  return output;
}

Single-item Arrays

we'll focus on parseArray now. One key feature of arrays is that they can hold items in them. Let's look at a situation where we'll have one item in our array.

JSON.stringify([1]) // returns '[1]'

So, the first and last character are left square bracket and right square bracket, and the value in the middle is just a primitive. We've already written a function that can handle primitives, so why not just invoke that?

//...cont from JSONParser above
function parseArray(jstring){
  const output = [];
  if(jstring.length > 2){
    const valueStr = jstring.slice(1, jstring.length-1)
    const value = JSONParser(valueStr)
    output.push(value)
  }
  return output;
}

So, this will handle an array with a single item in it. We're taking our string, removing the first and last chars (which will be [ and ] respectively), then sending the resulting value back through the JSON parser, so that we can identify it as whatever datatype and push its value to the array.

One exciting side-effect of this is that, now that we can expect to return 0 item and single-item arrays in our JSONParser, it will actually work for nested arrays, such as [[[]]], or [[[[1]]]]! Congrats- you just learned why recursion is cool.

Not bad, but not good enough. We'll need to deal with multi-item arrays. Let's keep building and thinking through our approach.

Multi-item arrays

Now that we know how to deal with single item arrays, we just need to know how to identify and delimit each item in our array, such that we can apply our JSONParser to each of them.

The question is, what separates items in an array? The answer is commas. We can find the characters that sit between commas, slice them off, and feed them back into the JSONParser function in order to return data, then push them to our output array.

//...cont from JSONParser above
function parseArray(jstring){
  const output = [];
  if(jstring.length > 2){
    const valueStr = jstring.slice(1, jstring.length-1)
    let start = 0;
    for(let i = 0; i <= valuesStr.length; i++){
      if(valueStr[i] === ',' || i === valuesStr.length){
        const curVal = JSONParser(valuesStr.slice(start, i));
        output.push(curVal);
        start = i+1;
      }
    }  
  }
  return output;
}

The strategy here has some flaws, but here's the breakdown of how it works so far. In order to parse a stringified value, we need to know where the value starts and where it ends. In this case, we use a variable start to track the start of a value, and we figure that we're at its end when we hit a , or when we hit the end of the string. It'll work for JSON strings like the following:

'["abc","def","ghi"]'

'[123,[],null]'

Can you guess what they have in common? Can you guess what I'm excluding from these examples? The truth is that we only have part of the answer.

Now, consider a string such as this: "Mary waved, then said hello". We've got a comma in there! '["Mary waved, then said hello", "John waved, then said goodbye"]' would parse out to ["Mary wave", undefined, "John wave", undefined], which is not what we want. We'd run into similar problems with nested arrays, or nested objects. Thus, the question of where a stringified value ends is a very important one, and a much more complicated one than we may have expected. There are certain datatypes where we'll need to treat characters simply as characters, rather than as special markers.

Sometimes a character is just a character

Let's talk about what situations we'll have where we're likely to run into commas that do not delimit a separate item in our array.

  • inside of strings
  • inside of nested arrays
  • inside of nested objects

All of these situations have something in common: They're marked by special characters. A string starts with ", an array starts with [, and an object starts with {. So, let's create something to figure out if we're entering one. We can use an object to keep track of what's what.

const openings = {
  '"': true,
  '[': true,
  '{': true
}

This syntax can be a little bit unclear, so I'll tell you why I'm doing it. If I want to check if a character is an "opening" character- that is to say, that it starts something and that I should treat its contents a little differently, I can simply use the conditional, if (openings[char]) to do so. If I access openings with any other keys, it'll evaluate as undefined, which will be falsey, and thus avoid triggering my conditional.

So, if (openings[char]), we know that a thing has started. But how do we know where it ends? I took inspiration from Javascript's call stack in devising a solution. In other words, I figured I'd build a stack.

Stack it Up

As we hit opening characters, we'll push them to the stack, and as we hit closing characters, we'll pop items off of the stack. Once we've completely cleared the stack, we'll know that we've hit the last character of an item, and can parse that string as a value. This is a bit complicated, but I"ll walk you through the process.

We'll use an array as a stack.

const stack = []

Let's have another look at our problem array, for instance.

'["Mary waved, then said hello", "John waved, then said goodbye"]'

The first thing we've done is that we shaved off [ and ]. That leaves us with '"Mary waved, then said hello","John waved, then said goodbye"'.

The first character we see in our JSON string is ", so we'll push it to our stack.

['"'] //value of stack

Now that our stack has a double quote in it, we know that we have to disregard commas until we run into a matching double quote, since they'll all be part of the same string. Eventually, we reach the end of Hello", and see the matching double quote. At this point, we can pop our value from the stack, and JSONParse("Mary waved, then said hello"), which is a substring starting at the opening of our quote, and closing at the end of our quote.

Quotes are easy because they use the same character for opening and closing, so we can just check if (stack[stack.length-1] === '"'). With square and curly brackets, though, we'll have to check for a matching pair. Now, we can modify openings to be key/value pairs corresponding to opening/closing braces, and push the corresponding brace/quote to our stack, so as to make comparisons easier.

//...cont from JSONParser above

const openings = {
  '"': '"',
  '[': ']',
  '{': '}'
};

const stack = [];

function parseArray(jstring){
  const output = [];
  if(jstring.length < 3) return output; //small refactor to reduce nesting conditionals
  const valueStr = jstring.slice(1, jstring.length-1)
  let start = 0;
  for(let i = 0; i <= valueStr.length; i++){
    if(stack[stack.length-1] === valueStr[i]){
      stack.pop(); //pop needs to come first to ensure that we're not pushing our '"' to the stack if we've already got a '"' sitting there.
    } else if(openings[valueStr[i]]){
      stack.push(openings[valueStr[i]]);
    }
    if (!stack.length && valueStr[i] === ',' || i === valueStr.length) {
      const curVal = JSONParser(valueStr.slice(start, i));
      start = i+1;
    }
  }
  return output;
}

Now, our parseArray is contingent on a few things:

  • are we opening a string/object/array?
  • are we in the middle of a string/object/array?
  • have we closed out our string/object/array and hit a comma?

if we hit all of those conditions, we'll just parse our value, then push it to our array, and finally return our array. We haven't written any functionality in JSONParser to handle objects yet, so those will return undefined.

There's one last bit of functionality to add to it, though. Escape characters may exist inside of strings. For instance, '\"' is a valid string, and should not cause us to push '"' to the stack, or pop it if it's already there. We could get some unpleasant behavior with unbalanced brackets or quotes if we don't account for arrays like this: ["\"", "\]"]

There are two pieces of logic that comprise our functionality here. Since we're using stack as our gatekeeper to determine if we should be looking at characters as their own values, or part of a larger value, we'll simply avail ourselves of the stack to skip a character.

The first piece of logic is that we'll push "\" to the stack if that's our current character. The second piece of logic is that we'll pop it if it's the last thing that was pushed to our stack and skip to the next character. We actually need to do this in reverse order, because we can escape a backslash. If we have a string of "\\a", we want to skip the second \, and not skip the a.

All in all, our function looks like this now:

function JSONParser(jstring){
  if(jstring[0] === '"') return jstring.slice(1, jstring.length-1);
  if(jstring[0] === 't') return true;
  if(jstring[0] === 'f') return false;
  if(jstring[0] === 'u') return undefined;
  if(jstring[0] === 'n') return null;
  if(jstring.charCodeAt() >= 48 && jstring.charCodeAt() <= 57) return Number(jstring);
  if(jstring[0] === '[') return parseArray(jstring);
}

const openings = {
  '"': '"',
  '[': ']',
  '{': '}'
};

const stack = [];

function parseArray(jstring){
  const output = [];
  if(jstring.length < 3) return output;
  const valueStr = jstring.slice(1, jstring.length-1)
  let start = 0;
  for(let i = 0; i <= valueStr.length; i++){
// PLEASE NOTE: all instances of '\\ ' should actually be '\\'
// Dev.to's syntax highlighting doesn't appropriately account for the fact that the second backslash is escaped by the first.
    if(stack[stack.length-1] === '\\ '){ 
      stack.pop();
      continue;
    } else if(valueStr[i] === '\\ '){
      stack.push('\\ ');
    }
    if(stack[stack.length-1] === valueStr[i]){
      stack.pop();
    } else if(openings[valueStr[i]]){
      stack.push(openings[valueStr[i]]);
    }
    if (!stack.length && valueStr[i] === ',' || i === valueStr.length) {
      const curVal = JSONParser(valueStr.slice(start, i));
      output.push(curVal);
      start = i+1;
    }
  }
  return output;
}

As mentioned before, we're recursively calling JSONParser and parseArray, which allows our function to handle arbitrary depths of nesting (at least until we hit a stack overflow). Our final task is to add a method to handle objects, and, if we design it effectively, it'll cover whatever gaps are left.

Addendum

There is actually an issue with the array parser. I tried a test case of [["a","b"],["["],[]] and got back [["a","b"],['["],']] instead.

What was happening was that we weren't correctly avoiding situations where we had brackets nested inside of a string.

The way that I fixed this is a bit ugly, but it works. Essentially, before pushing or popping anything from our stack, we should check if the last thing on our stack was " so that we can make sure not to push or pop anything from our stack, unless we've found our matching " and know that we're out of the string.

function parseArray(jstring){
  const output = [];
  if(jstring.length < 3) return output;
  const valueStr = jstring.slice(1, jstring.length-1)
  let start = 0;
  for(let i = 0; i <= valueStr.length; i++){
// PLEASE NOTE: all instances of '\\ ' should actually be '\\'
// Dev.to's syntax highlighting doesn't appropriately account for the fact that the second backslash is escaped by the first.
    if(stack[stack.length-1] === '\\ '){ 
      stack.pop();
      continue;
    } else if(valueStr[i] === '\\ '){
      stack.push('\\ ');
    }
    if(stack[stack.length-1] === valueStr[i] && stack[stack.length-1] !== '"' || 
      stack[stack.length-1] === valueStr[i] && valueStr[i] === '"'){
      stack.pop();
    } else if(openings[valueStr[i]] && stack[stack.length-1] !== '"'){
      stack.push(openings[valueStr[i]]);
    }
    if (!stack.length && valueStr[i] === ',' || i === valueStr.length) {
      const curVal = JSONParser(valueStr.slice(start, i));
      output.push(curVal);
      start = i+1;
    }
  }
  return output;
}

finally, we'll deal with objects

Posted on by:

tttaaannnggg profile

tang

@tttaaannnggg

"Poets do not go mad; but chess-players do. Mathematicians go mad, and cashiers; but creative artists very seldom." -GK Chesterton

Discussion

pic
Editor guide