DEV Community

aohibbard

Manual Implementation of JSONPath

A few weeks ago, I was given a live coding challenge: write a function that follows a path through a JSON object and returns the value at the end of it. The question was effectively a manual implementation of JSONPath. For the purposes of this post, we will work with this object:

json_obj = {
    "hello": [
        "world",
        {"hello": True},
        [3, 4, 5]
    ],
    "world": {
        "from": "Your Friend"
    }
}
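In a Python dict literal, the boolean has to be spelled True. Lowercase true is JSON syntax, and Python raises a NameError on it. If you would rather paste the data exactly as JSON, one option (a sketch, with the variable name raw chosen for illustration) is to keep it in a string and parse it with json.loads, which converts JSON's true to Python's True:

```python
import json

# The same data kept as raw JSON text, where lowercase true is valid
raw = """
{
    "hello": ["world", {"hello": true}, [3, 4, 5]],
    "world": {"from": "Your Friend"}
}
"""

# json.loads maps JSON objects to dicts, arrays to lists and true to True
json_obj = json.loads(raw)
print(json_obj["hello"][1]["hello"])  # True
```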

Given json_obj and a path, the function should return the following results:

get_json_value_by_path("$.hello[1].hello", json_obj)  # True
get_json_value_by_path("$.world.from", json_obj)  # "Your Friend"
get_json_value_by_path("$.hello[2][0]", json_obj)  # 3
get_json_value_by_path("$.does.not[3].exist", json_obj)  # "Key does not exist"

I stumbled when doing this in JavaScript, coming up with a clumsy answer that relied on a lot of Googled regex. As it turns out, it is much easier to do in Python.

To make sure we are working with plain JSON data in Python, we use Python’s built-in json module, imported by writing import json (the module name is lowercase) at the top of the file. We then call json.dumps to serialize the object into a JSON string and json.loads to parse that string back into Python dictionaries and lists (similar to JSON.stringify and JSON.parse in JavaScript). The round trip gives us a fresh copy made only of JSON-compatible types; if json_obj is already a plain dict, it could also be traversed directly.
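A quick standalone illustration of that round trip, using a small made-up dict named original: json.dumps returns a str, and json.loads rebuilds an object that is equal to the input but separate from it.

```python
import json

original = {"hello": [1, {"flag": True}], "world": {"from": "Your Friend"}}

# json.dumps serializes the dict into a JSON-formatted string
text = json.dumps(original)
print(type(text).__name__)  # str
print(text)  # {"hello": [1, {"flag": true}], "world": {"from": "Your Friend"}}

# json.loads parses the string back into Python dicts and lists
data = json.loads(text)
print(data == original)  # True: same contents
print(data is original)  # False: a new, independent copy
```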

For me, the challenge here is handling the index values in brackets. The second example, which should return “Your Friend”, is just a matter of splitting the path string and looking up each piece in turn inside a for loop. To do this, we round-trip the object through the json module as above, split the path string at each period, and pull out the keys we need. Because every path begins with a dollar sign (the root), we can skip the value at index zero and look up the remaining keys one at a time. This would look like:

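import json

def get_json_value_by_path(path, obj):
    # Round-trip through JSON to get a plain copy of the input
    data = json.loads(json.dumps(obj))
    # "$.world.from" -> ["$", "world", "from"]
    paths = path.split('.')
    # Skip index 0 (the "$" root) and look up each key in turn
    for key in paths[1:]:
        data = data[key]
    return data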

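To see where this simple version breaks down, here is a small standalone check (json_obj is redefined so the snippet runs on its own): a dot-only path splits into usable dictionary keys, but a bracketed segment does not.

```python
# A copy of the example object so this snippet runs on its own
json_obj = {"hello": ["world", {"hello": True}, [3, 4, 5]], "world": {"from": "Your Friend"}}

# A dot-only path splits cleanly: ['$', 'world', 'from']
print("$.world.from".split('.'))

# Skipping "$", each remaining piece is a real dictionary key
data = json_obj
for key in "$.world.from".split('.')[1:]:
    data = data[key]
print(data)  # Your Friend

# With brackets, the middle piece is "hello[1]", which is not a key
segments = "$.hello[1].hello".split('.')
print(segments)  # ['$', 'hello[1]', 'hello']
print("hello[1]" in json_obj)  # False, so json_obj["hello[1]"] would raise KeyError
```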

Unfortunately, the index values get in the way: after splitting, a segment such as hello[1] is not a key in the dictionary. I’m fairly certain this is not the best solution, but my approach was to inspect every path segment in the loop to see if it contains numbers. To do this, we import Python’s regex module, re, and use its findall function to collect every run of digits in the segment into a list called res. The contents of res determine how we proceed: if it is empty, we look up the key as above; if not, we iterate over each of the indices in res. Note that I have also introduced a variable called key in this loop: we have to look up the key name at the start of the segment (hello in hello[1]) before we apply the index values.
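Here is a minimal standalone look at what re.findall and the split on "[" produce for the segments in the example paths (the parsed dict is only there to collect the results):

```python
import re

# Segments produced by splitting the example paths on "."
segments = ["hello[1]", "hello", "hello[2][0]", "world"]

parsed = {}
for segment in segments:
    # Every run of digits becomes a string index: "hello[2][0]" -> ["2", "0"]
    res = re.findall(r'\d+', segment)
    # The key is the text before the first "[" (the whole segment if there is none)
    key = segment.split('[')[0]
    parsed[segment] = (key, res)
    print(segment, "->", key, res)
```

This prints hello ['1'] for hello[1], hello [] for hello, hello ['2', '0'] for hello[2][0], and world [] for world. Note that findall returns strings, which is why the indices still need int() before they can be used on a list.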

import json
import re

def get_json_value_by_path(path, obj):
    data = json.loads(json.dumps(obj))
    paths = path.split('.')
    for segment in paths[1:]:
        # Collect every run of digits, e.g. "hello[2][0]" -> ["2", "0"]
        res = re.findall(r'\d+', segment)
        # If there are index values in the segment
        if res:
            # Look up the key before the first bracket first, e.g. "hello"
            key = segment.split('[')[0]
            data = data[key]
            # Then apply each index in order
            for idx in res:
                data = data[int(idx)]
        else:
            data = data[segment]
    return data


This solution does not scale well, because the regex treats every run of digits in a segment as an index: it would misread a key name that contains digits (such as item2), and it does not understand other JSONPath syntax such as slices. We could easily modify this if needed by matching only numbers between brackets, but it works for our test cases.
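A sketch of that bracket-only variant, using a hypothetical segment whose key name, item2, contains a digit. The pattern r'\[(\d+)\]' is one way to write it; because it has a single capture group, re.findall returns just the digits inside each pair of brackets.

```python
import re

segment = "item2[0][11]"  # hypothetical key "item2" that contains a digit

# The original pattern also picks up the "2" inside the key name
loose = re.findall(r'\d+', segment)
print(loose)  # ['2', '0', '11']

# Only digits wrapped in brackets; with one capture group, findall returns just the group
strict = re.findall(r'\[(\d+)\]', segment)
print(strict)  # ['0', '11']
```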

Finally, we need to handle a path that points to something that does not exist. This requires a simple if statement. Because the JSON data is parsed into a dictionary, we can check whether each key exists before looking it up. If it does not, we return the string “Key does not exist” immediately, which also ends the loop.
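One catch with an existence check written as not data.get(key): it also fires when the key is present but holds a falsy value such as false, 0, or an empty string, so a path to {"hello": false} would be reported as missing. A small standalone comparison with a membership test (key in data), using made-up sample data:

```python
# Hypothetical sample data with falsy but present values
data = {"flag": False, "count": 0, "name": ""}

results = {}
for key in ["flag", "count", "name", "missing"]:
    # get() returns None for a missing key, and `not` also treats False, 0 and "" as missing
    missing_by_get = not data.get(key)
    # A membership test only asks whether the key is present
    missing_by_in = key not in data
    results[key] = (missing_by_get, missing_by_in)
    print(key, missing_by_get, missing_by_in)
```

Only the membership test reports flag, count and name as present. It still needs an isinstance check before use, because applying get() or in to a list does not look up dictionary keys.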

In total, our function looks like this. Not the prettiest, but functional:

import json
import re

def get_json_value_by_path(path, obj):
    # Work on a plain-JSON copy of the input
    data = json.loads(json.dumps(obj))
    # Skip index 0 (the "$" root) and handle each segment in turn
    for segment in path.split('.')[1:]:
        res = re.findall(r'\d+', segment)
        # The key is the text before the first "[" (the whole segment if none)
        key = segment.split('[')[0]
        # A membership test, unlike get(), still finds keys whose value is false, 0 or ""
        if not isinstance(data, dict) or key not in data:
            return "Key does not exist"
        data = data[key]
        # Apply each index in order, guarding against out-of-range positions
        for idx in map(int, res):
            if not isinstance(data, list) or idx >= len(data):
                return "Key does not exist"
            data = data[idx]
    return data
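For comparison, here is one alternative sketch, not the approach from the interview above: instead of splitting on periods and searching for digits, it walks the path with a single compiled regex that recognizes either a .key step or an [index] step. The name get_value_by_tokens and the TOKEN pattern are hypothetical. Like the function above, it returns the string "Key does not exist" on failure, and because keys are matched separately from [n] steps, a key such as item2 is no longer misread as an index. It skips the JSON round trip and does not handle quoted keys, keys containing periods, wildcards, or slices.

```python
import re

# Hypothetical pattern: either ".key" (anything up to the next "." or "[") or "[digits]"
TOKEN = re.compile(r'\.([^.\[\]]+)|\[(\d+)\]')

def get_value_by_tokens(path, obj):
    """Walk obj following a path like "$.hello[2][0]"; returns "Key does not exist" on failure."""
    if not path.startswith('$'):
        return "Key does not exist"
    data = obj
    pos = 1  # skip the "$" root
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            # The remaining text is neither a ".key" nor an "[index]" step
            return "Key does not exist"
        key, index = match.groups()
        if key is not None:
            if not isinstance(data, dict) or key not in data:
                return "Key does not exist"
            data = data[key]
        else:
            i = int(index)
            if not isinstance(data, list) or i >= len(data):
                return "Key does not exist"
            data = data[i]
        pos = match.end()
    return data


json_obj = {
    "hello": ["world", {"hello": True}, [3, 4, 5]],
    "world": {"from": "Your Friend"},
}

print(get_value_by_tokens("$.hello[1].hello", json_obj))     # True
print(get_value_by_tokens("$.world.from", json_obj))         # Your Friend
print(get_value_by_tokens("$.hello[2][0]", json_obj))        # 3
print(get_value_by_tokens("$.does.not[3].exist", json_obj))  # Key does not exist
```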

Top comments (1)

Yogananda Muthaiah

Nice but we get Undefined variable 'true' from json_obj.. do you know why ??