DEV Community

UponTheSky
UponTheSky

Posted on

[C++] Making a Simple JSON Parser from Scratch

Introduction

As I was working on my personal C++ exercise(it’s intended to be shown as a half-portfolio), I had to make a fixture file for the hyper-parameters that the program will use. However, not like those languages such as Node.js or Python that have own STLs for the JSON format, C++ has none of them, and it is somewhat tricky to add a third party library to C++.

Therefore, I decided to make my own C++ JSON parser, albeit simpler than the usual parsers that are used for production code. I thought it would be another very good exercise for my C++ skills.

General Designs and Strategies

Remember that I don’t want to make this into another big project, so I would like to keep it as simple as possible. In details,

  • I won’t use any arrays.
  • I will assume that the file contains a valid JSON value, as handling invalid values introduces too many edge cases.
  • I will also assume that the JSON file starts and ends with curly brackets({, })
  • There should be only three data types for my need: strings(which I will parse using std::string), integers, and doubles. There will be no negative values for these numeric types.

So what should our parser strategies be like?

  • We use ParseJson() as a main parsing function. The function must be recursive for handling nested json values.
  • We recognize each key using double quote marks(), and semicolon(:). Values either start with {(nested values) or consist of plain numeric characters and dot(.)(either integer or double).
  • There will be only two nonnegative primitive types: double and int.
  • Due to the nested values, defining the type of the return values is a little difficult. Here, our approach is to adopt union type in C++:


// json_parser.h

union JsonValue {
  int i;
  double d;
  std::map<std::string, JsonValue>* json;  
};


Enter fullscreen mode Exit fullscreen mode

Note that because we use pointers for the recursiveness, we have the type std::map<std::string, JsonValue>* as a part of the union type here. Also, since the value is determined on runtime, we can’t use templates here and instead resort to union.

Now, let’s dive into the code.

Implementation

The main function ParseJson() has its logical flow as follows, which has only two steps.

  1. Read the fixture JSON file given the path of the file.
  2. Parse the JSON text data into a std::map object .
  3. We will use additional helper functions that are called recursively for handling nested values.
  4. For the base cases of the recursion, we only need to parse either int or double type value.

1. Read files

First, our parser requires an I/O system: we read text from the JSON file as an input, and as an output we return JsonValue data to the caller function. To achieve this, we define ReadFile() that reads a text file and returns a std::string object.



// json_parser.cpp

void JsonParser::ReadFile(const std::string& filepath, std::string& output) {
  std::ifstream file(filepath);
  std::string line;

  while (std::getline(file, line)) {
    output.append(line); // append() copies the argument passed as a reference(std::string&)
  }
}


Enter fullscreen mode Exit fullscreen mode

2. Parse values

Next, we define ParsePrimitive() which will parse only the primitive(either double or int) values.



// json_parser.cpp

// Remark: test_it is defined in json_parser.h, and it is an
// alias of std::string::iterator

JsonValue JsonParser::ParsePrimitive(const std::string& text, text_it start, text_it end) {
  std::string substr = text.substr(start - text.begin(), end - start);
  size_t float_point_index = substr.find(".");

  if (float_point_index >= (end - start)) { // integer
    return {.i = std::stoi(substr)};
  } else { // float(double)
    return {.d = std::stod(substr) };
  }
}


Enter fullscreen mode Exit fullscreen mode

Then we implement the main logic, ParseJsonHelper() that takes care of the recursiveness of this parsing process.



// json_parser.cpp

JsonValue JsonParser::ParseJsonHelper(const std::string& text, text_it& it) {
  assert(*it == '{'); // must start with the left curly bracket
  it++;

  std::map<std::string, JsonValue>* json_map = new std::map<std::string, JsonValue>;

  do {
    const auto [key, value] = RetriveKeyValuePair(text, it);
    (*json_map)[key] = value;

    while (*it == ' ' || *it == '\n') {
      it++;
    }
  } while (*it != '}');

  it++; // after '}'

  return { .json = json_map };
}


Enter fullscreen mode Exit fullscreen mode

But to retrieve the key-value pairs inside ParseJsonHelper(), we will separate the retrieval logic into another helper function RetrieveKeyValuePair(), which returns a (key, value) pair.



// json_parser.cpp

std::pair<std::string, JsonValue> JsonParser::RetriveKeyValuePair(
  const std::string& text,
  text_it& it
) {
  assert(it != text.end());

  // ignore white spaces & line breaks
  while (*it == ' ' || *it == '\n') {
    it++;
  }

  text_it curr_it;
  std::string key;
  JsonValue value;
  // if hit a double quote for the first time, it is a key
  if (*it == '\"') {
    curr_it = ++it;
    while (*it != '\"') {
      it++;
    }

    key = text.substr(curr_it - text.begin(), it - curr_it);
    assert(*(++it) == ':'); // assert that we are parsing the key string
    it++;
  }

  // now we need to have its corresponding value
  while (*it == ' ' || *it == '\n') {
    it++;
  }

  if (*it == '{') {
    // another json format
    value = ParseJsonHelper(text, it);
  } else {
    // primitive value(double or int)
    curr_it = it;
    while (isdigit(*it) || *it == '.') {
      it++;
    }
    value = ParsePrimitive(text, curr_it, it);
  }

  // after parsing the value, check whether the current iterator points to a comma
  if (*it == ',') {
    it++;
  }

  return std::make_pair(key, value);
}


Enter fullscreen mode Exit fullscreen mode

(So note that these two helper functions ParseJsonHelper() and RetrieveKeyValuePair() call each other recursively.)

Since our algorithm is basically the DFS algorithm, we pass the iterator of the text value(which is from the raw JSON text file) as a reference to both ParseJsonHelper() and RetrieveKeyValuePair(). This allows each call stack can track at which the function is called for the given text value.

3. The main parsing function

Finally, ParseJson() brings the functions implemented so far all together to parse a single JSON file.



// json_parser.cpp

JsonValue JsonParser::ParseJson(const std::string& filepath) {
  // 1. read the text data from the given file
  std::string text;
  ReadFile(filepath, text);

  // 2. parse the text with the helper function and return
  text_it start = text.begin();
  return ParseJsonHelper(text, start);
}


Enter fullscreen mode Exit fullscreen mode

4. Final code

Since it would take up too much space to show the entire code, I would like to leave links for the final code:

Testing

It would be great if we do some unit testings for this function. We use GoogleTest here.

For example, if we test a simple JSON text like this:



{
  "one": 1,
  "two": {
    "three": 3,
    "four": {
      "five": 5
    }
  },
  "six": {
    "seven": 7
  }
}


Enter fullscreen mode Exit fullscreen mode

The test code will be like this:



TEST(JsonParserTest, TestJsonWithNests) {
  std::string json_text = "{\n \"one\": 1,\n \"two\": {\n\"three\": 3, \n \"four\": { \n\"five\": 5 \n } \n }, \n \"six\": {\n\"seven\": 7\n } \n}";
  text_it start = json_text.begin();
  JsonValue parsed = ParseJsonHelper(json_text, start);

  JsonValue two = (*parsed.json)["two"];
  JsonValue four = (*two.json)["four"];
  JsonValue six = (*parsed.json)["six"];

  EXPECT_EQ((*parsed.json)["one"].i, 1);
  EXPECT_EQ((*two.json)["three"].i, 3);
  EXPECT_EQ((*four.json)["five"].i, 5);
  EXPECT_EQ((*six.json)["seven"].i, 7);
};


Enter fullscreen mode Exit fullscreen mode

Now see the result:

Image description

Conclusion

This is actually the first time I ever write a tool for my own use(never done anything like this before with other languages like Python or Node.js which I have substantially more experiences with). Still, I am not sure if my design decision or the way I implemented was right(or even close to the best practices in C++), however, I have learned a lot from this very simple project.

  • I made a program from scratch, without any help from external resources like tutorials or any third-party libraries and frameworks(except for the GoogleTest).
  • This program is for my actual use, not as a part of a homework assignment.
  • I used a few C++ STLs and implemented a fairly reasonable(at least it looks like to me!) recursive algorithm.

I must say this project is not a good one. I should have had more experience in C++, read more books and open source code, and practiced more algorithms, to make my project as perfect as I can.

Nevertheless, just sitting down and writing code is more important, in my humble opinion. You learn much more from your hands-on experiences than simply reading books and do some Leetcode questions.

Top comments (0)