Ahmed Hany Gamal

Posted on Apr 24

Implementing a JSON Schema Validator from Scratch - Week 9

#jsonschema #showdev #typescript #learning

This was a pretty interesting week: it was more about what I didn't do than what I did.

What I Did

I started this week with the goal of implementing the reference keywords, more specifically $id and $anchor.
Unlike most of the keywords I've worked on lately, this required some serious foundational changes.

Schema Registration

In order to reference schemas, the schemas need to be saved somewhere, so a schema registry was added, as well as a couple of functions for interacting with that schema registry (registerSchema, unregisterSchema, etc.).
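
As a rough illustration, a registry like this can be little more than a map from URIs to schemas. This is only a sketch, not the project's actual code: the class name, the getSchema helper, and the signatures of registerSchema and unregisterSchema are assumptions.

```typescript
// Hypothetical sketch of a schema registry keyed by absolute URI.
type JsonSchema = boolean | { [keyword: string]: unknown };

class SchemaRegistry {
  private readonly schemas = new Map<string, JsonSchema>();

  // Store a schema under a URI; refuse to silently overwrite an entry.
  registerSchema(uri: string, schema: JsonSchema): void {
    if (this.schemas.has(uri)) {
      throw new Error(`A schema is already registered under ${uri}`);
    }
    this.schemas.set(uri, schema);
  }

  // Returns true if an entry existed and was removed.
  unregisterSchema(uri: string): boolean {
    return this.schemas.delete(uri);
  }

  // Assumed lookup helper (not named in the post).
  getSchema(uri: string): JsonSchema | undefined {
    return this.schemas.get(uri);
  }
}

const registry = new SchemaRegistry();
registry.registerSchema("https://example.com/person", { type: "object" });
```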

Schema Compilation

Keywords like $id and $anchor don't take effect during evaluation; they're supposed to be processed before evaluation ever starts. As a result, I introduced a schema compilation process (also known as schema loading or schema pre-processing), in which the schema is traversed, keywords like $id and $anchor are searched for, and the schema registry is updated based on their values.

This schema compilation process is done each time a new schema is registered.
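
The traversal could look roughly like the sketch below. It is deliberately naive and makes several assumptions: the function names are made up, the registry is a plain Map, the built-in URL class stands in for a proper RFC 3986 library, and every nested object is treated as a possible subschema (a real implementation would only descend into keywords that actually hold schemas).

```typescript
// Hypothetical sketch of a compilation pass that records $id and $anchor.
type JsonSchema = boolean | { [keyword: string]: unknown };
type Registry = Map<string, JsonSchema>;

function compileSchema(schema: JsonSchema, baseUri: string, registry: Registry): void {
  if (typeof schema === "boolean") return;

  // $id changes the base URI for this schema and everything nested in it.
  let currentBase = baseUri;
  const id = schema["$id"];
  if (typeof id === "string") {
    currentBase = new URL(id, baseUri).href;
    registry.set(currentBase, schema);
  }

  // $anchor registers a fragment on top of the current base URI.
  const anchor = schema["$anchor"];
  if (typeof anchor === "string") {
    registry.set(`${currentBase}#${anchor}`, schema);
  }

  // Naive descent: visit every value, whether or not it is really a schema.
  for (const keyword of Object.keys(schema)) {
    visit(schema[keyword], currentBase, registry);
  }
}

function visit(value: unknown, baseUri: string, registry: Registry): void {
  if (Array.isArray(value)) {
    for (const item of value) visit(item, baseUri, registry);
  } else if (typeof value === "object" && value !== null) {
    compileSchema(value as JsonSchema, baseUri, registry);
  }
}

const registry: Registry = new Map();
compileSchema(
  {
    $id: "https://example.com/schemas/root",
    properties: {
      name: { $anchor: "name", type: "string" },
      address: { $id: "address", $anchor: "street" },
    },
  },
  "https://example.com/",
  registry,
);
// Registered keys:
//   https://example.com/schemas/root
//   https://example.com/schemas/root#name
//   https://example.com/schemas/address
//   https://example.com/schemas/address#street
```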

New Dependency

Implementing $id and $anchor requires some URI operations that must follow RFC 3986, so instead of implementing them by hand, I used the @hyperjump/uri library, which saved me a lot of time, energy, and potential bugs.
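
To give a feel for the kind of operation involved, here is reference resolution against a base URI. Note that this sketch uses the built-in URL class as a stand-in, not @hyperjump/uri, and resolveReference is a made-up helper name; URL follows the WHATWG URL Standard, which agrees with RFC 3986 for simple cases like these but is not the same specification.

```typescript
// Stand-in for RFC 3986 reference resolution using the built-in URL class.
// resolveReference is a hypothetical helper, not part of any library.
function resolveReference(reference: string, base: string): string {
  return new URL(reference, base).href;
}

const base = "https://example.com/schemas/root.json";

const sibling = resolveReference("address.json", base);
// "https://example.com/schemas/address.json"

const parent = resolveReference("../common/defs.json", base);
// "https://example.com/common/defs.json"

const fragment = resolveReference("#street", base);
// "https://example.com/schemas/root.json#street"
```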

What I Didn't Do

At the beginning of this week, I attempted to write my code in a highly maintainable and extendable way, since architecture has been a key focus since the very beginning, but after some struggling, I decided it's better to not do that here.

What I Tried To Do

I tried to write the system in a way where each draft not only has its keyword handlers and phases, but also its compilation keywords and their handlers.
So draft 2020-12 for example would have two keywords here, with $id and $anchor as the keyword names, and their actual implementation mapped to those names.

Whichever draft is being used would have its own keyword names and implementations for its compilation keywords, since drafts differ both in keyword names (earlier drafts had id while newer drafts have $id) and in behavior (there are subtle behavioral differences between the id/$id keywords in each draft).
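
The shape I had in mind could be sketched like this. All of the names here (DraftDefinition, compilationKeywords, runCompilationKeywords) are hypothetical, and the handlers only record that they were called, because the real work is exactly the part that turned out to be hard.

```typescript
// Hypothetical sketch: each draft maps its compilation keyword names to handlers.
type Schema = Record<string, unknown>;
type CompilationHandler = (value: unknown, schema: Schema) => void;

interface DraftDefinition {
  name: string;
  compilationKeywords: Record<string, CompilationHandler>;
}

// Placeholder log so the example has something observable.
const seen: string[] = [];

const draft2020_12: DraftDefinition = {
  name: "2020-12",
  compilationKeywords: {
    $id: (value) => { seen.push(`$id=${String(value)}`); },
    $anchor: (value) => { seen.push(`$anchor=${String(value)}`); },
  },
};

// Draft-04 spells the identifier keyword "id" and has no $anchor.
const draft04: DraftDefinition = {
  name: "draft-04",
  compilationKeywords: {
    id: (value) => { seen.push(`id=${String(value)}`); },
  },
};

// Run whichever of the draft's compilation keywords appear in the schema.
function runCompilationKeywords(schema: Schema, draft: DraftDefinition): void {
  for (const keyword of Object.keys(draft.compilationKeywords)) {
    if (keyword in schema) {
      draft.compilationKeywords[keyword](schema[keyword], schema);
    }
  }
}

runCompilationKeywords({ $id: "https://example.com/a", $anchor: "top" }, draft2020_12);
runCompilationKeywords({ id: "https://example.com/b" }, draft04);
```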

The Problem

Implementing this in a decoupled way is surprisingly complex, since unlike the validation keywords, compilation keywords update the state of the schema registry, which is a part of the Validator object, so I'd need to pass to the handler function everything needed to do the required processing and also update the schema registry.

Also, different keywords update the state in different ways. So for example, $anchor just adds new entries to the schema registry, while $id adds to the schema registry, but also affects the behavior of any $anchor or $id that follows it (as its value is used to resolve relative URIs).
And since keyword handlers are separate functions, there would have to be some shared state between them so that keywords like $id and $anchor can work correctly.
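
A small sketch makes the order dependence concrete. The CompilationContext type, the handler names, and the compile function are all assumptions for illustration, and the built-in URL class again stands in for a real RFC 3986 library. The same schema produces different registry keys depending on whether $id or $anchor is handled first.

```typescript
// Hypothetical sketch: handlers share a mutable context, so order matters.
type Schema = Record<string, unknown>;

interface CompilationContext {
  baseUri: string;
  registry: Map<string, Schema>;
}

type Handler = (value: string, schema: Schema, ctx: CompilationContext) => void;

// $id registers the schema and changes the base URI for later handlers.
const handleId: Handler = (value, schema, ctx) => {
  ctx.baseUri = new URL(value, ctx.baseUri).href;
  ctx.registry.set(ctx.baseUri, schema);
};

// $anchor only reads the base URI and adds an entry.
const handleAnchor: Handler = (value, schema, ctx) => {
  ctx.registry.set(`${ctx.baseUri}#${value}`, schema);
};

function compile(schema: Schema, order: Array<[string, Handler]>): string[] {
  const ctx: CompilationContext = {
    baseUri: "https://example.com/schemas/root",
    registry: new Map(),
  };
  for (const [keyword, handler] of order) {
    const value = schema[keyword];
    if (typeof value === "string") handler(value, schema, ctx);
  }
  return Array.from(ctx.registry.keys());
}

const schema: Schema = { $id: "child", $anchor: "top" };

const idFirst = compile(schema, [["$id", handleId], ["$anchor", handleAnchor]]);
// ["https://example.com/schemas/child", "https://example.com/schemas/child#top"]

const anchorFirst = compile(schema, [["$anchor", handleAnchor], ["$id", handleId]]);
// ["https://example.com/schemas/root#top", "https://example.com/schemas/child"]
```

Only the first ordering attaches the anchor to the schema's own $id, which is why the handlers can't be treated as independent.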

Another issue is that different drafts have different names for keywords. I can't just tell the function that traverses the schema "look for a keyword named $id and use the draft's handler for that keyword", since it's called id in earlier drafts. The draft would also need to declare which keyword is responsible for schema identification.
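
One way this could be expressed, purely as a sketch with made-up names (identifierKeyword, findIdentifier), is to have each draft declare the keyword that plays the "identifier" role, so the traversal asks the draft instead of hard-coding $id.

```typescript
// Hypothetical sketch: the draft declares which keyword identifies a schema.
type Schema = Record<string, unknown>;

interface DraftDefinition {
  name: string;
  identifierKeyword: string;
}

const draft2020_12: DraftDefinition = { name: "2020-12", identifierKeyword: "$id" };
const draft04: DraftDefinition = { name: "draft-04", identifierKeyword: "id" };

// Look up the identifier using whatever name the draft assigns to that role.
function findIdentifier(schema: Schema, draft: DraftDefinition): string | undefined {
  const value = schema[draft.identifierKeyword];
  return typeof value === "string" ? value : undefined;
}

const newer = findIdentifier({ $id: "https://example.com/a" }, draft2020_12);
// "https://example.com/a"

const older = findIdentifier({ id: "https://example.com/b" }, draft04);
// "https://example.com/b"

const mismatch = findIdentifier({ id: "https://example.com/b" }, draft2020_12);
// undefined: 2020-12 does not treat "id" as the identifier
```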

To put it simply, compilation keywords are stateful and order-dependent, which increases the difficulty and complexity of abstraction.

My Rationale

Even if I can do all that, I don't believe it's worth the trouble.
It adds too much complexity for something that I just won't need. Extensibility and the ability to add drafts easily were important goals from the start of this project, but I feel like this is too much effort spent fixing a problem that I'll never really deal with.

So in the end, I chose to just couple the compilation process with the Validator object. If I do find myself facing problems in the future because of this, I'll fix it then, but I doubt that'll happen.

Conclusion

Not a lot of code was written this week, but I've learned a lot, and I've gained more experience in avoiding "tunnel vision", where I try too hard to fix problems that should just be ignored (something I've struggled with a lot over the years).
I'm quite satisfied with this outcome.

As always, the code can be found on GitHub.
