Ahmed Hany Gamal


Implementing a JSON Schema Validator from Scratch - Week 6

This was a good week.
I implemented three new keywords (allOf, anyOf, and oneOf), and I changed the system's foundation for the really real final last time :^)
Let's get into the specifics.

The System's Foundation

After I initially implemented the allOf, anyOf, and oneOf keywords, I started to visualize how I'd implement the unevaluated keywords (unevaluatedProperties and unevaluatedItems). I did some thinking, then some research, and then I realized something important: my understanding of the expected behavior of the unevaluated keywords was wrong.

The Unevaluated Keywords

I thought the unevaluated keywords caught all the properties/items that hadn't been evaluated in the same schema that the unevaluated keyword was being used in, as well as in every sub-schema inside it.
The first half of that sentence is correct; the second half is not.

The unevaluated keywords track the properties/items evaluated in the same schema as them, as well as the properties/items evaluated by the applicators adjacent to them (including the sub-schemas of in-place applicators such as allOf and anyOf), but that's it. Whatever sits inside a nested child object or array is not their concern.

I know this might seem unclear, so let me give you a few examples for illustration:

{
  "type": "object",
  "properties": {
    "foo": { "type": "string" }
  },
  "unevaluatedProperties": false
}

This is an incredibly basic example. The allowed instances are either an object with a property called foo of type string (something like {"foo": "this is a string"}) or an empty object ({}).
Here the unevaluatedProperties keyword is interchangeable with the additionalProperties keyword.

{
  "type": "object",
  "allOf": [
    {
      "properties": {
        "foo": { "type": "string" }
      }
    }
  ],
  "unevaluatedProperties": false
}

This example shows the difference between the unevaluated keywords and something like additionalProperties, as well as the reason the unevaluated keywords exist.
This schema behaves identically to the previous one, but the same couldn't be said if additionalProperties were used instead of unevaluatedProperties. The only allowed instance would then be an empty object ({}), since additionalProperties only sees/tracks the properties matched by the properties and patternProperties keywords in the same schema object, and does not see or track anything inside any other applicator keyword.
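To make the difference concrete, here is a minimal sketch of which property names each keyword can "see" in a schema. It is a static simplification (the real keywords work on the properties actually present in the instance), it only models properties and allOf, and the Schema type and both function names are made up for illustration rather than taken from the validator's code.

```typescript
// Hypothetical, simplified schema shape: only the keywords needed here.
type Schema = {
  properties?: Record<string, Schema>;
  allOf?: Schema[];
};

// additionalProperties only looks at `properties` in the same schema object
// (patternProperties is left out of this sketch).
function namesSeenByAdditionalProperties(schema: Schema): Set<string> {
  return new Set(Object.keys(schema.properties ?? {}));
}

// unevaluatedProperties also sees names evaluated through in-place
// applicators; only allOf is modelled here.
function namesSeenByUnevaluatedProperties(schema: Schema): Set<string> {
  const seen = namesSeenByAdditionalProperties(schema);
  for (const sub of schema.allOf ?? []) {
    namesSeenByUnevaluatedProperties(sub).forEach((name) => seen.add(name));
  }
  return seen;
}

// The allOf schema from the example above, reduced to the relevant keywords.
const allOfSchema: Schema = {
  allOf: [{ properties: { foo: {} } }],
};

const viaAdditional = namesSeenByAdditionalProperties(allOfSchema);
const viaUnevaluated = namesSeenByUnevaluatedProperties(allOfSchema);
```

With this schema, viaAdditional is empty while viaUnevaluated contains foo, which is why swapping the keywords changes which instances are allowed.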

{
  "type": "object",
  "properties": {
    "foo": {
      "type": "object", 
      "properties": {
        "bar": {"type": "integer"}
      }
    }
  },
  "unevaluatedProperties": false
}

This is the case where I misunderstood things. I thought the unevaluated keywords would track all of the properties inside all of the sub-schemas of the schema containing the unevaluated keyword. If that were the case, the only possible instance values would have been {}, { "foo": {} }, and { "foo": { "bar": 123 } } (or any other integer). But that's not the case, since an unevaluated keyword only checks its adjacent applicator keywords.
As a result, the schema forbids any root-level property with a name other than "foo", and this "foo" property has to be an object, but the unevaluated keyword does not track whatever is inside that inner object, so { "foo": { "baz": "untracked prop" } } would be a valid instance.
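The nested case can be sketched with a tiny evaluator that keeps a fresh evaluated set per instance location. This is not the validator's actual code: MiniSchema, validate, and the helpers are hypothetical names, and only type, properties, and unevaluatedProperties: false are supported.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Hypothetical minimal schema type: only the keywords used in the example.
interface MiniSchema {
  type?: "object" | "string" | "integer";
  properties?: Record<string, MiniSchema>;
  unevaluatedProperties?: false;
}

function isPlainObject(value: JsonValue): value is { [key: string]: JsonValue } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function matchesType(type: MiniSchema["type"], value: JsonValue): boolean {
  if (type === undefined) return true;
  if (type === "object") return isPlainObject(value);
  if (type === "string") return typeof value === "string";
  return typeof value === "number" && Number.isInteger(value);
}

function validate(schema: MiniSchema, instance: JsonValue): boolean {
  if (!matchesType(schema.type, instance)) return false;
  if (!isPlainObject(instance)) return true;

  // A fresh set on every call: evaluated names belong to this instance
  // location only and are never shared with parent or child objects.
  const evaluated = new Set<string>();
  const properties = schema.properties ?? {};
  for (const name of Object.keys(properties)) {
    if (Object.prototype.hasOwnProperty.call(instance, name)) {
      if (!validate(properties[name], instance[name])) return false;
      evaluated.add(name);
    }
  }

  if (schema.unevaluatedProperties === false) {
    return Object.keys(instance).every((name) => evaluated.has(name));
  }
  return true;
}

// The nested schema from the example above.
const nestedSchema: MiniSchema = {
  type: "object",
  properties: {
    foo: { type: "object", properties: { bar: { type: "integer" } } },
  },
  unevaluatedProperties: false,
};

const innerUntracked = validate(nestedSchema, { foo: { baz: "untracked prop" } });
const rootUnknown = validate(nestedSchema, { other: 1 });
```

Here innerUntracked is true because baz lives at a different instance location than the unevaluatedProperties keyword, while rootUnknown is false because other sits at the root and was never evaluated.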

I'll talk more about how this affected the code in the next section.

The Code

These are the additions and changes made to the code base last week.

Evaluation Tracking

To support the unevaluated keywords later, some changes had to be made to the system's foundation: primarily, the addition of evaluatedProperties and evaluatedItems to the BasicPendingUnit interface, and the creation of the EvaluationResult interface.

The BasicPendingUnit now has two additional mandatory properties, evaluatedProperties and evaluatedItems, which track the evaluated properties and evaluated items respectively. Keywords are now responsible not only for updating the annotations in the BasicPendingUnit, but also for updating these evaluation properties. I expect this could cause hidden bugs in the future if I forget to add some elements to the evaluated list in some random keyword, but this is the cleanest idea that came to mind.
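As a rough illustration of that extra responsibility, here is how a handler for the properties keyword might update both the annotations and the evaluated set. Only the names BasicPendingUnit, evaluatedProperties, and evaluatedItems come from the project; the field types, the annotations shape, createUnit, and recordProperties are assumptions made for this sketch.

```typescript
// Assumed shape: the real interface likely has more fields and may use
// different types for these ones.
interface BasicPendingUnit {
  annotations: Record<string, unknown>;
  evaluatedProperties: Set<string>;
  evaluatedItems: Set<number>;
}

// Hypothetical helper that creates an empty unit.
function createUnit(): BasicPendingUnit {
  return {
    annotations: {},
    evaluatedProperties: new Set<string>(),
    evaluatedItems: new Set<number>(),
  };
}

// Hypothetical handler step for `properties`: every declared name that is
// actually present in the instance is both annotated and marked evaluated.
function recordProperties(
  unit: BasicPendingUnit,
  declaredNames: string[],
  instance: Record<string, unknown>
): void {
  const matched = declaredNames.filter((name) =>
    Object.prototype.hasOwnProperty.call(instance, name)
  );
  unit.annotations["properties"] = matched;
  // Forgetting this loop is exactly the kind of hidden bug described above.
  matched.forEach((name) => unit.evaluatedProperties.add(name));
}

const unit = createUnit();
recordProperties(unit, ["foo", "bar"], { foo: "text", baz: 1 });
```

After the call, only foo is in unit.evaluatedProperties: bar is declared but absent from the instance, and baz is present but not declared.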

Also, the ValidationContext.evaluate method (formerly known as validateHelper) now returns an EvaluationResult object. It contains a boolean property called valid, denoting whether the evaluation succeeded, and a BasicPendingUnit called unit, which holds the pending unit after the schema/sub-schema evaluation is done, so that it carries all of the evaluation details, most importantly the evaluation properties.
The reason for this is that keywords like anyOf or allOf get a copy of all the evaluated properties/items from each of their sub-schemas, which they can then aggregate based on the logic of the individual keyword.
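A minimal sketch of that aggregation step, assuming the evaluated properties/items are stored as sets. The valid and unit fields come from the description above; the trimmed-down BasicPendingUnit and the mergeEvaluated helper are hypothetical. The sketch skips failed sub-schemas, since annotations from a sub-schema that fails validation are dropped.

```typescript
// Trimmed-down, assumed shapes for illustration.
interface BasicPendingUnit {
  evaluatedProperties: Set<string>;
  evaluatedItems: Set<number>;
}

interface EvaluationResult {
  valid: boolean;
  unit: BasicPendingUnit;
}

// Hypothetical helper: copy the evaluated sets of every successful
// sub-schema result into the parent unit.
function mergeEvaluated(parent: BasicPendingUnit, results: EvaluationResult[]): void {
  for (const result of results) {
    if (!result.valid) continue; // failed sub-schemas contribute nothing
    result.unit.evaluatedProperties.forEach((name) => parent.evaluatedProperties.add(name));
    result.unit.evaluatedItems.forEach((index) => parent.evaluatedItems.add(index));
  }
}

const parentUnit: BasicPendingUnit = {
  evaluatedProperties: new Set(["id"]),
  evaluatedItems: new Set<number>(),
};

const subResults: EvaluationResult[] = [
  { valid: true, unit: { evaluatedProperties: new Set(["foo"]), evaluatedItems: new Set([0]) } },
  { valid: false, unit: { evaluatedProperties: new Set(["bar"]), evaluatedItems: new Set([1]) } },
];

mergeEvaluated(parentUnit, subResults);
```

For anyOf this kind of merge would run over every passing branch, while for oneOf there is at most one passing branch to merge from.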

Keyword Handlers

As mentioned, I've implemented the keyword handlers for the allOf, anyOf, and oneOf keywords.
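The pass/fail rule of the three keywords can be sketched in a few lines. This is only the validity logic over already-computed sub-schema results, with a hypothetical combine function, and not the project's actual handlers.

```typescript
type Applicator = "allOf" | "anyOf" | "oneOf";

// Combine the validity of each sub-schema according to the keyword's rule.
function combine(keyword: Applicator, subResults: boolean[]): boolean {
  const passed = subResults.filter(Boolean).length;
  switch (keyword) {
    case "allOf":
      return passed === subResults.length; // every sub-schema must pass
    case "anyOf":
      return passed >= 1; // at least one must pass
    case "oneOf":
      return passed === 1; // exactly one must pass
  }
}

const allOfResult = combine("allOf", [true, true]);
const anyOfResult = combine("anyOf", [false, true]);
const oneOfResult = combine("oneOf", [true, true]);
```

Taking the full list of results, rather than stopping at the first passing sub-schema, matters for anyOf once evaluation tracking is involved: as far as I can tell from the spec, all sub-schemas have to be examined when annotations are being collected.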

Not much can be said here: I saw how they were supposed to behave and implemented them. There is one thing that I think is worth mentioning, though.

When working on a keyword, I believe the section of the specs covering that keyword MUST be read. Even if you understand the keyword's behavior, there are often subtle details that you might miss, such as annotation behavior and edge cases.
In all honesty, sometimes I'm just too lazy to read the specs, so I ask NotebookLM instead, and since it only has the specs as its sources, I believe its answers are pretty accurate. Even so, I'll probably refrain from doing this, or at least do it less, for two reasons:

  1. As good as it is, I can't fully trust it. I actually went through the specs while writing this blog post, because I realized that the specs are the definitive source of truth and anything else could be wrong.
  2. The main reason I even use things like NotebookLM is to save time and energy, but honestly, reading the specs would probably have taken way less time and energy.

Conclusion

The addition of evaluation tracking has added a lot of complexity to the code base, and I feel like there are twice as many ways to create hard-to-find bugs now. Other than that, everything is great: I have a pretty solid foundation (that definitely won't see any changes in the future), I'm not expecting any major architectural changes, I've implemented a bunch of the more difficult keywords, and I expect most of the remaining work to be fun.

I'll be dedicating some time over the next couple of weeks to qualification tasks and proposal work for Google Summer of Code, so validator progress will probably be smaller during that period.

The code can be found on GitHub
