Ahmed Hany Gamal

Posted on Feb 13

Implementing a JSON Schema Validator from Scratch - Week 4

#jsonschema #showdev #typescript #learning

This week was... challenging, but in a good way.
I didn't really implement any keywords (unless you count the makeshift type implementation from last week), but I did build the majority of the foundation for this project. Even though some parts took way more time than I'd expected, I feel like most of my issues from last week have been fixed.

Issue Resolution

If you want a more detailed explanation of each of the upcoming issues, you can check out last week's blog.

TypeScript Strictness

This mostly bothered me when I was working on the types for each of the system components. It took a lot of time and effort to finish, but it's done now, and it's actually made working on other parts of the system easier. To be clear, when I say that this part is "done", I don't mean that I'll never go near it again. I probably will, but it should only need minor tweaks, nothing drastic.

The difficulty with this part was writing the types in a way that is flexible enough for what they're needed for, while still being strict enough to actually support the project structure.

Building the Foundation

At this point, I've finished most of the project's foundation, giving me something to work with. I'm no longer writing code in an empty file, but rather, I have something solid that I can attach new code to.

Also, most of the foundational/architectural parts of the code are finished, so I don't need to keep thinking "how might this make me miserable in the future" as much while writing code, which was pretty exhausting. Most of that exhaustion comes from uncertainty: you reach a point where you're unsure if you're future-proofing your code or just being paranoid.

What I Did This Week

Most of what I did this week was architecture related, so I didn't really implement any keywords, but I did lay the foundation that should help me implement keywords more efficiently, with fewer potential bugs and less difficulty.

JSON Pointers

Initially, I planned on just using an npm package for JSON Pointers, but most of the packages that I found had JSON Pointer classes with many more features and details than I needed. I just wanted a class to track JSON Pointers, with no need for JSON construction, JSON navigation, and all of the other things that I found in the json-pointer, jsonpointer, and @hyperjump/json-pointer packages. So, I decided to just implement it myself.

This class took way more time than expected. It took over a full day of work just to finish the base class (which I then spent even more time adding some extra methods to, but more on that in a bit).

One of the reasons this took more time than expected was that it turned out I didn't fully understand JSON Pointers.
I understood how to escape and unescape, how a JSON Pointer worked, and how it navigated a schema/instance, but I was a bit confused about the purpose of escaping and unescaping.
Initially, I escaped any input to the JSON Pointer object and unescaped any output from it. Then I realized that it was the other way around, and after that I realized that I shouldn't do it for every single input/output method, but only for the ones that deal with JSON Pointer strings.
The reason is that escaping and unescaping exist purely for the string representation of a JSON Pointer, where ~0 and ~1 stand for ~ and / respectively. You unescape when creating a JSON Pointer object from a JSON Pointer string, so that the object can structure the pointer correctly. You escape when you output the string representation of that object, so that you end up with a correctly formatted JSON Pointer string. If you directly push or pop a segment, you shouldn't escape or unescape anything.
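To make the rule above concrete, here is a minimal sketch of a pointer class that only escapes and unescapes at the string boundary. The class name, method names, and helper functions are hypothetical and chosen for illustration; they aren't taken from the project's actual code.

```typescript
// Hypothetical sketch: names are illustrative, not the project's actual API.

// Escape one raw segment for a pointer string: "~" -> "~0", then "/" -> "~1".
function escapeSegment(segment: string): string {
  return segment.replace(/~/g, "~0").replace(/\//g, "~1");
}

// Unescape one segment taken from a pointer string: "~1" -> "/", then "~0" -> "~".
// The order matters: "~01" has to become "~1", not "/".
function unescapeSegment(segment: string): string {
  return segment.replace(/~1/g, "/").replace(/~0/g, "~");
}

class JsonPointer {
  // Segments are always stored raw (unescaped).
  private readonly segments: string[] = [];

  // String input: split on "/" and unescape each segment.
  static fromString(pointer: string): JsonPointer {
    const result = new JsonPointer();
    if (pointer === "") return result; // "" refers to the whole document
    if (pointer.charAt(0) !== "/") {
      throw new Error(`Invalid JSON Pointer: ${pointer}`);
    }
    for (const part of pointer.slice(1).split("/")) {
      result.segments.push(unescapeSegment(part));
    }
    return result;
  }

  // Direct segment input: no escaping or unescaping.
  push(segment: string): void {
    this.segments.push(segment);
  }

  // Direct segment output: returned raw as well.
  pop(): string | undefined {
    return this.segments.pop();
  }

  // String output: escape each segment and prefix it with "/".
  toString(): string {
    return this.segments.map((s) => `/${escapeSegment(s)}`).join("");
  }
}

const pointer = JsonPointer.fromString("/properties/a~1b");
pointer.push("m~n"); // raw segment, stored as-is
console.log(pointer.toString()); // "/properties/a~1b/m~0n"
console.log(pointer.pop()); // "m~n"
```

Storing raw segments internally means push and pop never have to care about the string format at all.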

Also, after I fully understood how JSON Pointers worked and finished working on the classes, I realized that I needed to add some extra methods (the fork and reconstruct methods). Those methods would make the rest of my code easier to write and make me less likely to add bugs that could take days of debugging to fix (e.g. a hidden bug caused by forgetting to pop a segment at the end of a random recursive run in a random keyword implementation).

Even though this took way more time than I expected, it deepened my understanding of how JSON Pointers work and gave me a clear mental model of what I need to think about and do when working on foundational parts of a system. That's the main purpose of this project: learning.
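As a rough illustration of how a fork-style method can remove the "forgot to pop" class of bug, here is a sketch in which fork returns a new pointer with one extra segment and leaves the original untouched. This is an assumption about what such a method could look like, not a description of the project's real fork; ForkablePointer and collectPointers are hypothetical names.

```typescript
// Hypothetical sketch: the semantics of fork() here are an assumption.
class ForkablePointer {
  constructor(private readonly segments: readonly string[] = []) {}

  // Return a new pointer with one extra raw segment; the original is unchanged.
  fork(segment: string): ForkablePointer {
    return new ForkablePointer([...this.segments, segment]);
  }

  // Escape only when producing the string form.
  toString(): string {
    return this.segments
      .map((s) => "/" + s.replace(/~/g, "~0").replace(/\//g, "~1"))
      .join("");
  }
}

// Walk a JSON value and collect the pointer string of every node.
// Each recursive call receives its own forked pointer, so there is nothing to pop.
function collectPointers(value: unknown, at: ForkablePointer, out: string[]): string[] {
  out.push(at.toString());
  if (Array.isArray(value)) {
    value.forEach((item, index) => collectPointers(item, at.fork(String(index)), out));
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    Object.keys(record).forEach((key) => collectPointers(record[key], at.fork(key), out));
  }
  return out;
}

console.log(collectPointers({ a: [1, 2], "b/c": true }, new ForkablePointer(), []));
// [ "", "/a", "/a/0", "/a/1", "/b~1c" ]
```

The trade-off in this version is an array copy per fork, which costs some allocations but keeps every recursive branch isolated from the others.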

Output Format

This is an interesting one.
After reading the specs and trying to implement the logic and structure for the Output result, I got stuck, since some parts of the specs were unclear. So I asked the JSON Schema maintainers on Slack, and I was pointed to a blog post by Greg Dennis, who is one of the maintainers at JSON Schema and the main person who worked on the Output Formatting section in the specs.
In that blog post, he said that there were problems with how Output results were structured, and that he had made an updated version that fixes those problems.

The new Output Format is just amazing. It's super simple, clear, and pragmatic. I'd advise anyone interested in JSON Schema to take a look at it.

Also, after some back and forth with one of the maintainers, I decided to ditch the detailed output format in favor of the basic one. The detailed output format probably wouldn't be that hard to do, but the basic one is just trivial, and it's better to save as much time and energy as I can, since I still have a lot of things to do.

Visitor Architecture

I explained how the Visitor Architecture worked in last week's blog, so I won't re-explain it here, but I will talk about what I did and what I plan on doing.

The Visitor Architecture has two primary components that interact with one another: the ValidationContext and the individual keyword implementations.

The plan is to keep the individual keyword implementations "dumb" and give the ValidationContext pretty much all of the power. The reason for this is separation of concerns, so that I'm less likely to mess up when working on the keyword implementations. For example, to avoid making a mistake when saving an error object inside a keyword implementation, the code for saving error objects would live in a method on ValidationContext, and keyword implementations would only ever save errors through that method.
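A minimal sketch of that split might look like the following. Everything here is an assumption made for illustration: the ErrorUnit field names, the enter, addError, and result methods, and the Keyword function signature are hypothetical, and the real ValidationContext may look quite different.

```typescript
// Hypothetical sketch: all names and shapes here are assumptions.
interface ErrorUnit {
  keywordLocation: string;
  instanceLocation: string;
  error: string;
}

// A "dumb" keyword: it only checks a value and reports through the context.
type Keyword = (schemaValue: unknown, instance: unknown, ctx: ValidationContext) => void;

class ValidationContext {
  private readonly errors: ErrorUnit[] = [];
  private keywordLocation = "";
  private instanceLocation = "";

  // Called by the traversal code, not by keywords, to set the current locations.
  enter(keywordLocation: string, instanceLocation: string): void {
    this.keywordLocation = keywordLocation;
    this.instanceLocation = instanceLocation;
  }

  // The only way a keyword can record a failure.
  addError(message: string): void {
    this.errors.push({
      keywordLocation: this.keywordLocation,
      instanceLocation: this.instanceLocation,
      error: message,
    });
  }

  // Flat result: valid only if no keyword reported an error.
  result(): { valid: boolean; errors: ErrorUnit[] } {
    return { valid: this.errors.length === 0, errors: [...this.errors] };
  }
}

// Example keyword: "minimum" only applies to numeric instances.
const minimum: Keyword = (schemaValue, instance, ctx) => {
  if (typeof schemaValue !== "number" || typeof instance !== "number") return;
  if (instance < schemaValue) {
    ctx.addError(`${instance} is less than the minimum of ${schemaValue}`);
  }
};

const ctx = new ValidationContext();
ctx.enter("/properties/age/minimum", "/age");
minimum(18, 12, ctx);
console.log(ctx.result().valid); // false
console.log(ctx.result().errors.length); // 1
```

Because the keyword never builds an error object itself, the shape of the output only has to be right in one place.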

The ValidationContext is the last part of this project's foundation that still needs a good amount of work.

Future Thoughts

These are some things that are on my mind relating to upcoming tasks in this project.

Testing

I'm definitely planning on using the official JSON Schema Test Suite for this project. There's no use having a bunch of code if it doesn't actually do what it's supposed to, and I can't be sure it does unless I test it.
But I'm planning on doing this once I actually have something tangible to test, so I'll probably wait until I've implemented a good couple of keywords before I start testing my code.
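For reference, the test suite's JSON files are arrays of test groups, where each group has a description, a schema, and a list of tests with a description, some data, and the expected valid flag. A small runner over that shape could be sketched like this. The runGroups function and the Validate signature are hypothetical, the sample data is hand-written for this example rather than copied from the suite, and the stand-in validator only understands boolean schemas.

```typescript
// Hypothetical sketch: runGroups and Validate are illustrative names.
interface TestCase {
  description: string;
  data: unknown;
  valid: boolean;
}

interface TestGroup {
  description: string;
  schema: unknown;
  tests: TestCase[];
}

type Validate = (schema: unknown, instance: unknown) => boolean;

// Run every test in every group and return the names of the failing ones.
function runGroups(groups: TestGroup[], validate: Validate): string[] {
  const failures: string[] = [];
  for (const group of groups) {
    for (const test of group.tests) {
      let actual: boolean | undefined;
      try {
        actual = validate(group.schema, test.data);
      } catch {
        actual = undefined; // a thrown error counts as a failed test
      }
      if (actual !== test.valid) {
        failures.push(`${group.description} > ${test.description}`);
      }
    }
  }
  return failures;
}

// Hand-written sample data in the same shape as a test suite file.
const sample: TestGroup[] = [
  {
    description: "boolean schema 'true'",
    schema: true,
    tests: [{ description: "number is valid", data: 1, valid: true }],
  },
  {
    description: "boolean schema 'false'",
    schema: false,
    tests: [{ description: "number is invalid", data: 1, valid: false }],
  },
];

// Stand-in validator: handles boolean schemas only.
const booleanOnly: Validate = (schema) => {
  if (typeof schema === "boolean") return schema;
  throw new Error("schema type not implemented in this sketch");
};

console.log(runGroups(sample, booleanOnly)); // []
```

Returning a list of failing test names, rather than stopping at the first failure, makes it easier to see progress as more keywords get implemented.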

Final Thoughts

When I started writing the code for this project, I was seriously overwhelmed, mostly because I just felt completely lost, but right now, that feeling is completely gone.
If I had to guess, I'd say things are only going to get more exciting from now on.

The code can be found on GitHub
