Implementing a JSON Schema Validator from Scratch - Week 5

#jsonschema #showdev #typescript #learning

This was a very insightful week, even though I didn't complete many visible tasks.

I properly implemented three keywords, though they're very simple ones: type, required, and properties.
Most of this week was spent either building the system's foundation or refactoring parts of it after realizing that changes were needed for a more stable system.

The JSON Schema maintainers weren't joking when they talked about the "'just one more thing and I'll be done!' 100x over" effect.
That being said, I just have one more thing to do, and then I'll be done with the foundation (lol).

Now, into the specifics.

What I Changed

These are the things that I've updated in the system's foundation.

The `validateHelper` Function Signature

The validateHelper function (formerly the validateSchema function) now returns a boolean that indicates whether the instance is valid against the schema/sub-schema.

This will make a big difference when implementing keywords that deal with multiple sub-schemas, such as allOf and anyOf, because they need to combine the results of their sub-schemas.
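Here is a minimal sketch of why a boolean return value helps. It assumes a heavily simplified, hypothetical signature that takes only a schema and an instance, with no context or location. With that signature, allOf and anyOf reduce to combining booleans.

```typescript
// Hypothetical, simplified schema type. The real validator also threads
// a ValidationContext and location information through every call.
type Schema = boolean | { [keyword: string]: unknown };

// Sketch: returns true when `instance` is valid against `schema`.
function validateHelper(schema: Schema, instance: unknown): boolean {
  // Boolean schemas: `true` accepts everything, `false` rejects everything.
  if (typeof schema === "boolean") return schema;

  let valid = true;

  // Only single "string"/"number" types are handled, for brevity.
  if (schema.type === "string" && typeof instance !== "string") valid = false;
  if (schema.type === "number" && typeof instance !== "number") valid = false;

  if (Array.isArray(schema.allOf)) {
    // Evaluate every sub-schema first (no short-circuit), then combine.
    const results = (schema.allOf as Schema[]).map((sub) => validateHelper(sub, instance));
    if (!results.every(Boolean)) valid = false;
  }

  if (Array.isArray(schema.anyOf)) {
    // Same here: every sub-schema still runs, so none is skipped once
    // annotations are being collected.
    const results = (schema.anyOf as Schema[]).map((sub) => validateHelper(sub, instance));
    if (!results.some(Boolean)) valid = false;
  }

  return valid;
}
```

Each call reports its own result. Because of that, the combinators never have to inspect output units to find out whether a sub-schema passed.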

The `PendingUnit`

I've created a new interface called BasicPendingUnit, which represents an output unit that is still being constructed. Success and failure units are finalized objects. The pending unit is used while the keywords are still being processed, and it gets finalized at the end.
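The sketch below shows the pending-to-final step. The field names loosely follow the new output format (evaluationPath, schemaLocation, instanceLocation, errors, annotations). The exact shapes, the SuccessUnit/FailureUnit types, and the finalizeUnit helper are hypothetical.

```typescript
// Hypothetical shapes, loosely based on the proposed output format.
interface BasicPendingUnit {
  evaluationPath: string;
  schemaLocation: string;
  instanceLocation: string;
  errors: Record<string, string>;       // keyword -> error message
  annotations: Record<string, unknown>; // keyword -> annotation value
}

interface SuccessUnit {
  valid: true;
  evaluationPath: string;
  schemaLocation: string;
  instanceLocation: string;
  annotations?: Record<string, unknown>;
}

interface FailureUnit {
  valid: false;
  evaluationPath: string;
  schemaLocation: string;
  instanceLocation: string;
  errors: Record<string, string>;
}

type OutputUnit = SuccessUnit | FailureUnit;

// Hypothetical helper, called once after every keyword has run.
function finalizeUnit(pending: BasicPendingUnit): OutputUnit {
  const { evaluationPath, schemaLocation, instanceLocation } = pending;
  if (Object.keys(pending.errors).length > 0) {
    // Any error makes the unit a failure. Annotations from a failing
    // schema are not reported, so they are dropped here.
    return { valid: false, evaluationPath, schemaLocation, instanceLocation, errors: { ...pending.errors } };
  }
  const unit: SuccessUnit = { valid: true, evaluationPath, schemaLocation, instanceLocation };
  if (Object.keys(pending.annotations).length > 0) unit.annotations = { ...pending.annotations };
  return unit;
}
```

Keywords only ever touch the mutable pending object. The finalized units, copied out at the end, can then be treated as read-only.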

Implementing The New Output Format

As I mentioned last week, there's a new output format that addresses the issues of the original output format from the spec, and that's the one I'll be working with.

The validateHelper function creates a PendingUnit (described above) for each schema/sub-schema. All the keywords then use that unit to populate its errors and annotations properties. As a result, exactly one output unit exists for each schema/sub-schema, and it holds all of that schema's relevant data.
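Here's a rough sketch of that flow under simplified, hypothetical assumptions: string locations, no context object, object-only sub-schemas, and only the required and properties keywords. Each call creates one pending unit, and the keywords fill in its errors and annotations. The unit is then finalized and pushed to a shared output list before the boolean is returned.

```typescript
// Hypothetical, simplified shapes for illustration only.
type SchemaObject = { [keyword: string]: unknown };

interface PendingUnit {
  instanceLocation: string;
  errors: Record<string, string>;       // keyword -> error message
  annotations: Record<string, unknown>; // keyword -> annotation value
}

interface OutputUnit extends PendingUnit {
  valid: boolean;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function validateHelper(
  schema: SchemaObject,
  instance: unknown,
  instanceLocation: string,
  output: OutputUnit[],
): boolean {
  // Exactly one pending unit per schema/sub-schema; every keyword writes into it.
  const unit: PendingUnit = { instanceLocation, errors: {}, annotations: {} };
  const obj = isPlainObject(instance) ? instance : undefined;

  if (obj && Array.isArray(schema.required)) {
    const missing = (schema.required as string[]).filter((key) => !hasOwn(obj, key));
    if (missing.length > 0) unit.errors.required = `Missing required properties: ${missing.join(", ")}`;
  }

  const props = schema.properties;
  if (obj && isPlainObject(props)) {
    const evaluated: string[] = [];
    for (const [key, subschema] of Object.entries(props)) {
      if (!hasOwn(obj, key)) continue;
      evaluated.push(key);
      // The recursive call creates (and finalizes) the sub-schema's own unit.
      // Locations are concatenated without JSON Pointer escaping, for brevity.
      const childValid = validateHelper(subschema as SchemaObject, obj[key], `${instanceLocation}/${key}`, output);
      if (!childValid) unit.errors.properties = "One or more properties are invalid";
    }
    unit.annotations.properties = evaluated;
  }

  // Finalize: the unit is valid only if no keyword recorded an error.
  const valid = Object.keys(unit.errors).length === 0;
  output.push({ ...unit, valid });
  return valid;
}
```

Child units are pushed before their parent here, simply because the recursion finishes first. The real ordering is a separate design decision.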

Location Handling

The ValidationContext class no longer tracks the location (evaluationPath, schemaLocation, and instanceLocation); it now only contains the output units. Instead, the location is passed separately to the validateHelper function (the function that orchestrates the validation process).

The reason for this design choice is to minimize mutability.
Originally, the ValidationContext class contained a single shared, mutable location instance, which was mutated every time the validateHelper function was (recursively) called. That meant every keyword implementation that called validateHelper had to undo those changes at the end, which is very error-prone.

With the new design, each validateHelper call, and by extension each output unit, has its own local location instance. It can mutate that instance as much as needed without causing side effects in other parts of the system.
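Here is a minimal sketch of the idea. It assumes a hypothetical ValidationLocation made of plain segment arrays, where the project uses its own JSON Pointer classes, plus a hypothetical descend helper. Rather than mutating a location in place, it builds a fresh object for every child call. That is a slightly stricter variant of "each call owns a local instance": the parent's location is never touched, so there's nothing to undo.

```typescript
// Hypothetical location shape: plain segment arrays instead of JSON Pointer classes.
interface ValidationLocation {
  evaluationPath: string[];
  schemaLocation: string[];
  instanceLocation: string[];
}

// Builds a brand-new location for a child call; the parent's object is never modified.
// (evaluationPath and schemaLocation only diverge across $ref, which this sketch ignores.)
function descend(parent: ValidationLocation, schemaSegments: string[], instanceSegments: string[]): ValidationLocation {
  return {
    evaluationPath: [...parent.evaluationPath, ...schemaSegments],
    schemaLocation: [...parent.schemaLocation, ...schemaSegments],
    instanceLocation: [...parent.instanceLocation, ...instanceSegments],
  };
}

// A properties-style loop: each child call receives its own location,
// so no "undo" step is needed after the recursive call returns.
function forEachProperty(
  properties: Record<string, unknown>,
  instance: Record<string, unknown>,
  location: ValidationLocation,
  validateChild: (subschema: unknown, value: unknown, location: ValidationLocation) => boolean,
): boolean {
  let valid = true;
  for (const [key, subschema] of Object.entries(properties)) {
    if (!Object.prototype.hasOwnProperty.call(instance, key)) continue;
    const childLocation = descend(location, ["properties", key], [key]);
    if (!validateChild(subschema, instance[key], childLocation)) valid = false;
  }
  return valid;
}
```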

The Things I Plan on Adding

Here are some of the things I'm actively working on in the project's foundation.

Updating The JSON Pointer Classes

I think I've implemented these classes pretty well, but the fork methods have one issue: they can only take a single segment as an argument. A keyword like properties adds two segments at once (something like "/properties/property_name"), so I'd have to pass an ugly, error-prone argument like `/properties/${property_name}`. One reason this is error-prone: per RFC 6901, a "~" or "/" inside a property name has to be escaped as "~0" or "~1". So this has to be fixed.
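One possible fix is a variadic fork, sketched below. The JsonPointer class here is hypothetical, and the project's classes may look quite different. It stores raw segments and applies the RFC 6901 escaping only when serializing, so callers never build "/"-joined strings by hand.

```typescript
// Hypothetical JsonPointer class for illustration only.
class JsonPointer {
  // Segments are stored unescaped; escaping happens only in toString().
  private readonly segments: readonly string[];

  private constructor(segments: readonly string[]) {
    this.segments = segments;
  }

  static root(): JsonPointer {
    return new JsonPointer([]);
  }

  // Variadic fork: fork("properties", name) instead of `/properties/${name}`.
  // Returns a new pointer; the original is left unchanged.
  fork(...segments: string[]): JsonPointer {
    return new JsonPointer([...this.segments, ...segments]);
  }

  // RFC 6901 escaping: "~" -> "~0" first, then "/" -> "~1".
  // (Doing "/" first would turn "~1" into "~01".) The root pointer is "".
  toString(): string {
    return this.segments
      .map((segment) => "/" + segment.replace(/~/g, "~0").replace(/\//g, "~1"))
      .join("");
  }
}
```

Usage: `JsonPointer.root().fork("properties", "a/b").toString()` yields `/properties/a~1b`. The "/" inside the property name is escaped, while the separator between the two segments is not.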

Integrate Keyword Phases

Not all keywords are created equal: some must run at the very beginning, some must run at the very end, and some don't really care. As a result, keywords will be split into phases. Keywords in the earliest phases run first (think $schema and $id), keywords in the final phases run last (like unevaluatedProperties and unevaluatedItems), and so on.
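Here's a small sketch of how phase ordering could work. It assumes three hypothetical phases and a lookup table that puts any unlisted keyword in a middle phase. It relies on Array.prototype.sort being stable (guaranteed since ES2019), so keywords within the same phase keep their original order.

```typescript
// Hypothetical phase values and assignments, for illustration only.
const Phase = { Setup: 0, Default: 1, Final: 2 } as const;
type Phase = (typeof Phase)[keyof typeof Phase];

const keywordPhases: Partial<Record<string, Phase>> = {
  $schema: Phase.Setup, // must run before everything else
  $id: Phase.Setup,
  unevaluatedProperties: Phase.Final, // needs the results of the other keywords
  unevaluatedItems: Phase.Final,
};

// Returns a new array ordered by phase; unlisted keywords fall into Phase.Default.
function orderKeywords(keywords: string[]): string[] {
  const phaseOf = (keyword: string): Phase => keywordPhases[keyword] ?? Phase.Default;
  return [...keywords].sort((a, b) => phaseOf(a) - phaseOf(b));
}
```

For example, `orderKeywords(["unevaluatedProperties", "type", "$schema", "properties", "$id"])` yields `["$schema", "$id", "type", "properties", "unevaluatedProperties"]`.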

Lessons From My Implementation

I meant it when I said this was an insightful week, but most of that insight was broad and mindset-related rather than specific to the spec.

Don't Freeze! Code and Correct Later

Imagining the problems you'll face a month into coding is very hard, especially when it's your first time implementing a system like this. It's a necessary exercise when you first start, but once that's done, you don't need to do it nearly as much.

Once you have something concrete to build on, you can just write code without worrying as much about future consequences. That lets you face those consequences first-hand, so you see the actual problems instead of guessing at or imagining them. You end up with a clear problem to solve, rather than spending a lot of time and effort simulating future ones.

All that being said, you do need to stop and actually solve those problems; otherwise, you're just sacrificing the system's design and settling for spaghetti.

The code can be found on GitHub.

DEV Community