Implementing a JSON Schema Validator from Scratch - Week 2

#jsonschema #showdev #typescript #learning

After two weeks, I've finally finished reading the JSON Schema specifications (specifically the Core and Validation specs).

At this point, I have a pretty good idea of what a JSON Schema Validator should look like, and I have a couple of things to say about the specs.

My thoughts on the specs

Overall, I think the people who wrote the specs did a great job. Pretty much every minute detail of the system is covered: how it should work, how an implementation should deal with certain cases, what the received schema should look like, and what to do if it doesn't.

I did, however, face some difficulties reading the specs. This could be the result of a problem with the specs or a skill issue on my part, since this is my first time going through a specification.

The difficulties I faced in certain parts of the specs were mostly a result of ambiguity of the designated audience and lack of clarity.

ambiguity of the designated audience

Basically, the specs address three different audiences:

Schema Authors: The people that write schemas to use the validator
Validator Implementers: The people that implement/write the code for the validator
Specification Extenders: The people that write their own custom keywords and/or vocabularies

My problem is that the specs don't explicitly tell you which one they're addressing. Not only that, but the same paragraph, or even the same sentence, could be talking to one of them at the start and another at the end.

The only way to actually know who is being addressed is to fully understand what is being said, and that's really hard to do if you don't understand anything and it's your first time reading a JSON Schema specification.

I imagine this won't be as big a problem when I read another JSON Schema specification (e.g. draft 2019-09), but it was certainly a pain to deal with this time.

That being said, I honestly can't blame the people that wrote the specification. It would be really unpleasant to be reading a sentence, only to be interrupted twice just so that the specs could explicitly tell you that, yes, this sentence that you're reading is in fact talking to you, especially if you're already familiar with the specs and know that from the start.

lack of clarity

This was a much bigger problem for me.

The majority of the specs were incredibly detailed about everything, but certain parts were a bit unclear to me, which resulted in me either being unable to understand a topic (e.g. lexical vs dynamic scopes), or worse, misunderstanding the topic (e.g. meta-schemas - which I went into in detail in last week's post - or the anyOf example from chapter 11). I just wish the people writing the specs would write a bit more to clarify what they really mean, and add one or two examples. Reading the specs would have been a much easier endeavor had that been the case.

Again, I want to clarify that all of the issues and problems that I've mentioned could very well be a skill issue on my part, and in any case, the specs were overall great and the authors did an amazing job.

Also, I'd like to shout out Google's NotebookLM. I tried using multiple tools to help me understand the specs, and NotebookLM was the absolute best.

Implementation scope

The specifications state that some keywords and features are mandatory to implement, while others are optional. I'll state here what I'm planning to do.

initial implementation

I will only support draft 2020-12
I will NOT support the $vocabulary keyword
I will only support the detailed output format
I will NOT support short-circuiting
I will only support the annotation functionality of the format keyword
I will NOT support dereferencing of JSON pointers that use the schema's parent or ancestor base URI (as per section 9.2.1 of the Core specs)
I will NOT support remote schema fetching
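
To make the list above a bit more concrete, here is a minimal sketch of how that scope could be written down in TypeScript, together with what "annotation only" means for the format keyword. Every name here (ValidatorScope, initialScope, KeywordResult, evaluateFormat) is hypothetical and made up for illustration; none of them come from the specs or from the actual code.

```typescript
// Hypothetical sketch: all names below are illustrative assumptions.

// The four output formats named by the 2020-12 Core spec.
type OutputFormat = "flag" | "basic" | "detailed" | "verbose";

// The initial scope, encoded as literal types so that unsupported
// options cannot be switched on by accident.
interface ValidatorScope {
  draft: "2020-12";
  outputFormat: Extract<OutputFormat, "detailed">;
  vocabularyKeyword: false; // $vocabulary is not supported
  shortCircuit: false; // every subschema is always evaluated
  formatAssertion: false; // format only produces an annotation
  ancestorBaseUriPointers: false; // see section 9.2.1 of the Core spec
  remoteFetching: false; // schemas are never fetched over the network
}

const initialScope: ValidatorScope = {
  draft: "2020-12",
  outputFormat: "detailed",
  vocabularyKeyword: false,
  shortCircuit: false,
  formatAssertion: false,
  ancestorBaseUriPointers: false,
  remoteFetching: false,
};

// Result of evaluating a single keyword (assumed shape).
interface KeywordResult {
  valid: boolean;
  annotation?: unknown;
}

// Annotation-only format: the keyword's value is recorded as an
// annotation and the instance is never rejected because of it.
function evaluateFormat(formatValue: string, _instance: unknown): KeywordResult {
  return { valid: true, annotation: formatValue };
}
```

With this sketch, evaluateFormat("email", "definitely not an email") still reports valid: true and simply carries "email" along as an annotation. Rejecting the instance would be the job of format assertions, which are out of scope for now.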

The idea here is that I want to lower the difficulty as much as possible in the initial implementation, since this is my first time implementing a fully spec-compliant software system, which I imagine will take a good amount of time and effort to finish.

implementation extensions

When I do have a basic fully functional validator, I may choose to add any of the following:

another draft: if the validator's architecture is resilient, adding a new draft to the already existing system shouldn't be incredibly difficult
short-circuiting: though I'd need to make sure I'd do it in a way that complies with the specs
format keyword assertions: if I see any learning value from doing this or find that it could be fun, I could do it
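
As a rough illustration of why short-circuiting needs care, here is a small sketch of an anyOf evaluation with and without it. The types and names (Result, Subschema, evaluateAnyOf, isString, isNonEmpty) are hypothetical, and subschemas are modelled as plain functions to keep the example short. As I understand the Core spec, when annotations are being collected, all anyOf subschemas have to be examined, so stopping at the first match is only safe when nobody needs the annotations.

```typescript
// Hypothetical sketch: a subschema is modelled as a function that
// returns a validity flag plus the annotations it produced.
interface Result {
  valid: boolean;
  annotations: string[];
}

type Subschema = (instance: unknown) => Result;

interface AnyOfResult extends Result {
  evaluated: number; // how many subschemas were actually run
}

function evaluateAnyOf(
  subschemas: Subschema[],
  instance: unknown,
  shortCircuit: boolean,
): AnyOfResult {
  const annotations: string[] = [];
  let valid = false;
  let evaluated = 0;

  for (const subschema of subschemas) {
    const result = subschema(instance);
    evaluated++;
    if (result.valid) {
      valid = true;
      // Only annotations from successful subschemas are kept.
      annotations.push(...result.annotations);
      // Stopping here saves work but drops later annotations.
      if (shortCircuit) break;
    }
  }

  return { valid, annotations, evaluated };
}

// Two toy subschemas that both accept the string "abc".
const isString: Subschema = (instance) => ({
  valid: typeof instance === "string",
  annotations: ["isString"],
});

const isNonEmpty: Subschema = (instance) => ({
  valid: typeof instance === "string" && instance.length > 0,
  annotations: ["isNonEmpty"],
});

const full = evaluateAnyOf([isString, isNonEmpty], "abc", false);
// full.annotations is ["isString", "isNonEmpty"], full.evaluated is 2

const short = evaluateAnyOf([isString, isNonEmpty], "abc", true);
// short.annotations is ["isString"], short.evaluated is 1
```

Both calls agree that the instance is valid, but the short-circuited one loses the second annotation. That difference is the part that would need to be squared with the specs before turning short-circuiting on.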

I'll be posting weekly updates on my journey here.
The code can be found on GitHub