Ahmed Hany Gamal

Implementing a JSON Schema Validator from Scratch - Week 3

I was pretty busy this week, so I didn't have time to do much, but I did make some foundational decisions and faced some minor issues which I'll talk about.

Difficulties I Faced

The issues I faced weren't that big, but they were pretty annoying in the moment.

TypeScript Strictness

This is my first time using TypeScript, and going into it, I expected it to be pretty simple: just JavaScript with static types. I was surprised to find that I also needed to define the structure of my objects up front, as opposed to just declaring a variable as an object and treating it the same way I would in JavaScript.
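A minimal sketch of what this looks like in practice. The ValidationError shape and the describeError function are made up for illustration and are not taken from the project:

```typescript
// With the bare `object` type, the compiler knows nothing about the properties,
// so only generic operations are allowed on the value.
const loose: object = { keyword: "type", message: "expected a string" };
// loose.keyword; // compile error: `keyword` is not a known property of `object`
const keyCount = Object.keys(loose).length; // generic operations still work

// Describing the structure up front is what makes the red squiggly lines go away.
// `ValidationError` is a hypothetical shape used only for this illustration.
interface ValidationError {
  keyword: string;
  message: string;
  instanceLocation?: string; // optional properties are marked with `?`
}

const strict: ValidationError = { keyword: "type", message: "expected a string" };

function describeError(error: ValidationError): string {
  // Property access is now checked (and auto-completed) by the compiler.
  return `${error.keyword}: ${error.message}`;
}
```

The upfront cost is the interface; the payoff is that a typo such as error.mesage is caught at compile time instead of at runtime.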

This was pretty annoying since I was trying to write the foundational part of my project, and that's pretty hard to do when every variable you write has red squiggly lines under it. That forces you to spend time and energy writing the types for each variable, all while you're in the middle of laying the project's foundation.

But apart from that, I actually think this will make the development experience more enjoyable later on. I was just surprised by how much time and effort I had to pay for the types up front.

Building The Foundation

As with any project, the beginning is relatively hard compared to the rest, because:

  1. There's nothing to attach your code to or extend. It's just an empty file, and it's hard to determine where and how to start
  2. It's pretty daunting, since you have a broad image of what the system should look like and of all the things you need to do, which makes the project look very intimidating
  3. You need to constantly shift your perspective between the broad view of the architect and the narrow view of the implementer. Later on, most of the architectural decisions are already made, so you just view things as an implementer
  4. Your decisions now will have consequences later, and changing them at that point would be incredibly difficult

I was a bit lost at the beginning, but everything clicked once I did some more research (I'll talk about this in detail in the next section).

Foundational Decisions

From very early on, before I even started reading the specs, I was told by the maintainers at JSON Schema to focus on the architecture of the system, and to design the system in a way that would facilitate adding new JSON Schema drafts.

I thought about it for a bit, and came up with the idea of having objects (let's call them keyword handler objects) that map the names of the keywords to functions containing the implementation for those keywords, with one such object for each supported draft.
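To make the idea concrete, here is a minimal sketch of what such keyword handler objects could look like. The two-argument handler signature is a placeholder (the post settles on a different one further down), and the handlersByDraft registry and the lookup and getHandler helpers are assumptions made for illustration, not the project's actual code:

```typescript
// Placeholder signature: receives the keyword's value from the schema and the
// instance, and reports whether the instance passes.
type KeywordHandler = (keywordValue: unknown, instance: unknown) => boolean;
type KeywordHandlers = Record<string, KeywordHandler>;

// One keyword handler object per supported draft.
// Both keywords only apply to numbers, so other instances pass.
const draft2020_12: KeywordHandlers = {
  minimum: (value, instance) =>
    typeof value !== "number" || typeof instance !== "number" || instance >= value,
  maximum: (value, instance) =>
    typeof value !== "number" || typeof instance !== "number" || instance <= value,
};

// Supporting another draft would mean adding another entry here.
const handlersByDraft: Record<string, KeywordHandlers> = {
  "2020-12": draft2020_12,
};

// Own-property lookup, so inherited names like "toString" are never mistaken for entries.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

function getHandler(draft: string, keyword: string): KeywordHandler | undefined {
  const handlers = lookup(handlersByDraft, draft);
  return handlers === undefined ? undefined : lookup(handlers, keyword);
}
```

With this shape, the rest of the validator only needs to know the draft name and the keyword name. What it does not answer yet is what the handlers should receive and how they fit into the overall flow.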

I started implementing, and soon realized that I didn't know what to do in some parts. Sure, I had a general idea of how some things would be done, and I wrote the code for a draft 2020-12 keyword handler object, but then I froze: I didn't know what to write in the functions, or how this keyword handler object would be integrated with the rest of the system. So I did some research, and found out that there are two main architectures for JSON Schema validators: the recursive architecture and the visitor architecture.

The visitor architecture contained the missing pieces for me. It was basically what I was trying to do, but unlike me, it actually knew what to do in the keyword handler functions (I'll talk about it in more depth in a bit). I honestly didn't bother to fully read or understand the recursive architecture, but from what I understand, it's different from what I'm trying to do.

The Visitor Architecture

To be clear, both the recursive and visitor architectures are recursive, but in different ways.

The visitor architecture basically goes as follows:

  1. You have the main public function, which is what users of the library call; let's call it validate. It delegates to an internal function that we'll call validateSchema
  2. The validateSchema function controls the flow of the validator: it determines which part of the schema to visit next, and it uses the keyword handler object to get the handler function for each keyword
  3. The handler functions take three arguments: the schema, the instance, and the validation context. The schema and instance should be pretty obvious, but the validation context deserves an explanation
  4. The validation context is an object that the validator uses to keep track of its state. It takes notes on what's happening (adding error objects whenever an error occurs, keeping track of the instance location and schema location, keeping track of the evaluated items and properties, etc.), and this information is used later in the validation and when building the output. It can also contain references to functions that the keyword handler functions need
  5. When validateSchema calls a keyword handler function, the handler executes its logic, and if the keyword contains sub-schemas, it calls validateSchema again to hand control of the flow back (the arguments should be children of those in the parent validateSchema call, so the schema argument should be a sub-schema, and so on)
  6. Steps 2-5 are repeated until the validation is complete, and the validate function then uses the validation context to create the validation output
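The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's code: every type and helper name besides validate and validateSchema is made up, only the type and properties keywords are handled, and boolean schemas, the "integer" type, and JSON Pointer escaping are left out.

```typescript
type Schema = Record<string, unknown>;

interface ValidationError {
  keyword: string;
  instanceLocation: string;
  message: string;
}

interface ValidationContext {
  errors: ValidationError[]; // shared by every call, so all errors end up in one place
  instanceLocation: string; // where we currently are in the instance
  // Reference to the flow controller, so handlers can hand control back to it.
  validateSchema: (schema: Schema, instance: unknown, context: ValidationContext) => void;
}

type KeywordHandler = (schema: Schema, instance: unknown, context: ValidationContext) => void;

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function hasOwn(target: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(target, key);
}

function typeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

const handlers: Record<string, KeywordHandler> = {
  // A keyword without sub-schemas: it only records an error in the context.
  type(schema, instance, context) {
    const expected = schema["type"];
    const actual = typeOf(instance);
    if (typeof expected === "string" && expected !== actual) {
      context.errors.push({
        keyword: "type",
        instanceLocation: context.instanceLocation,
        message: `expected ${expected}, got ${actual}`,
      });
    }
  },
  // A keyword with sub-schemas: it hands control back for each child (step 5).
  properties(schema, instance, context) {
    const subSchemas = schema["properties"];
    if (!isObject(subSchemas) || !isObject(instance)) return;
    for (const name of Object.keys(subSchemas)) {
      const subSchema = subSchemas[name];
      if (!hasOwn(instance, name) || !isObject(subSchema)) continue;
      const childContext = { ...context, instanceLocation: `${context.instanceLocation}/${name}` };
      context.validateSchema(subSchema, instance[name], childContext);
    }
  },
};

// Flow control (step 2): look up a handler for each keyword; unknown keywords are skipped.
function validateSchema(schema: Schema, instance: unknown, context: ValidationContext): void {
  for (const keyword of Object.keys(schema)) {
    const handler = hasOwn(handlers, keyword) ? handlers[keyword] : undefined;
    if (handler) handler(schema, instance, context);
  }
}

// Public entry point (steps 1 and 6): create the context, run, and build the output.
function validate(schema: Schema, instance: unknown): { valid: boolean; errors: ValidationError[] } {
  const context: ValidationContext = { errors: [], instanceLocation: "", validateSchema };
  validateSchema(schema, instance, context);
  return { valid: context.errors.length === 0, errors: context.errors };
}

const result = validate(
  { type: "object", properties: { name: { type: "string" } } },
  { name: 42 },
);
```

Here result.valid is false, and the single error points at the instance location /name. Note that validateSchema never mentions a specific keyword, and the handlers never decide which keyword runs next.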

This architectural design separates the flow of the validation from the implementation of each keyword handler, which makes it easier to add, update, or remove keywords.

Tooling & Implementation Notes

I feel like it's worth mentioning that I've used and will probably continue to use LLMs to help me with my implementation.
That being said, I am NOT vibe coding. If something is written in my file, then I've either written it myself or thoroughly reviewed it beforehand.
I've mainly used LLMs either to help me with syntax issues (since I'm new to TypeScript), or to help me find solutions when I'm stuck (like with the visitor architecture).
I'll probably use LLMs less often later on in the project, since my two main reasons for using LLMs should lessen over time.

The code can be found on GitHub
