Hercules Lemke Merscher

Posted on Oct 28 • Originally published at bitmaybewise.substack.com

Tsonnet #25 - Chain me maybe, part 1

#tsonnet #jsonnet #compiler

Welcome to the Tsonnet series!

If you're not following the series so far, you can check out how it all started in the first post of the series.

In the previous post, I added support for top-level object access using the $ operator:

Tsonnet #24 - Getting to the root of things: top-level object access


Now it's time to tackle something that's been nagging at me -- chained field access. We need that to complete one more part of the Jsonnet tutorial: references.

The problem: one hop is never enough

Right now, Tsonnet can handle simple field access like self.answer or $.root_value. But what if you want to drill down further?

{
    answer: {
        value: 42
    },
    answer_to_the_ultimate_question: self.answer.value
}

And it gets more interesting with arrays:

{
    arr: [1, 2, 3],
    first: self.arr[0]
}

Or even mixing field chains with array indexing:

{
  'Tom Collins': {
    ingredients: [
      { kind: "Farmer's Gin", qty: 1.5 },
      { kind: 'Lemon', qty: 1 },
      { kind: 'Simple Syrup', qty: 0.5 },
    ],
  },
  Martini: {
    ingredients: [
      {
        kind: $['Tom Collins'].ingredients[0].kind,
        qty: 2,
      },
      { kind: 'Dry White Vermouth', qty: 1 },
    ],
  },
}

That last one is from Jsonnet's official tutorial on references! Time to catch up.

It all starts with the parser

To support chained field access, the parser needs to understand that after accessing a field, you might want to access another field, or index into an array, and then maybe access yet another field.

The tricky part? Making the grammar unambiguous. If we naively add .identifier or [expr] as repeatable patterns, Menhir gets confused -- should it parse .identifier as a complete field access, or wait to see if there's a [expr] coming next?

The solution is to explicitly tie the first bracketed expression to the scope (self or $), and then allow chaining from there. Here's what changed:

obj_field_access:
  | scope = obj_scope; chain = obj_field_chain { ObjectFieldAccess (with_pos $startpos $endpos, scope, chain) }
  (* The first bracketed expr when accessing an object field
     must be explicitly declared here, instead of being part
     of `object_field_expr`.

     Adding the bracketed expr there will make the grammar unclear
     since Menhir will need to decide between parsing one of the options:
     1) .identifier
     2) .identifier[expr]

     By tying to the scope, such as $[expr], the grammar is now clear
     and Menhir doesn't need to decide on its own.
  *)
  | scope = obj_scope;
    LEFT_SQR_BRACKET; e = assignable_expr; RIGHT_SQR_BRACKET;
    chain = obj_field_chain
    { ObjectFieldAccess (with_pos $startpos $endpos, scope, e :: chain) }
  ;

I added three new rules to support this:

obj_field_expr:
  | DOT; e = indexed_expr { e }
  | DOT; id = identifier { id }
  ;

obj_field_chain:
  | { [] }
  | id = obj_field_expr; ids = obj_field_chain { id :: ids }
  ;

obj_scope:
  | SELF { Self }
  | TOP_LEVEL_OBJ { TopLevel }
  ;

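The obj_field_chain rule is the key -- it recursively builds a list of field expressions. The empty production ends the recursion, so a simple access like self.field is a chain with a single element, and $['field'] on its own has an empty chain after the bracketed expression. Each obj_field_expr can be either a dot-identifier or a dot-indexed expression.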
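
To make the shape concrete, here's a minimal sketch of the values these rules should produce for two chains. The types are simplified stand-ins (no positions, only a handful of variants), and chain_length is a hypothetical helper for illustration only, not part of Tsonnet.

```ocaml
(* Simplified stand-ins for the AST in the post: positions are dropped
   and only the variants needed for chains are kept. *)
type object_scope = Self | TopLevel

type expr =
  | Number of int
  | String of string
  | Ident of string
  | IndexedExpr of string * expr (* name[expr] *)
  | ObjectFieldAccess of object_scope * expr list

(* $['Tom Collins'].ingredients[0].kind
   The leading bracketed expr is consed onto the chain (e :: chain). *)
let tom_collins_gin =
  ObjectFieldAccess
    ( TopLevel,
      [ String "Tom Collins";
        IndexedExpr ("ingredients", Number 0);
        Ident "kind" ] )

(* self.answer.value: two dot-identifiers, no leading bracket. *)
let answer_value =
  ObjectFieldAccess (Self, [ Ident "answer"; Ident "value" ])

(* Hypothetical helper: number of steps in a chain. *)
let chain_length = function
  | ObjectFieldAccess (_, chain) -> List.length chain
  | _ -> 0
```

Each element of the list is one hop, which is what lets the interpreter and type checker fold over the chain later on.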

I also extracted a few things to make the grammar cleaner:

indexed_expr:
  | varname = ID; LEFT_SQR_BRACKET; e = assignable_expr; RIGHT_SQR_BRACKET { IndexedExpr (with_pos $startpos $endpos, varname, e) }
  ;

identifier:
  | id = ID { Ident (with_pos $startpos $endpos, id) }
  ;

These helpers make it easier to reuse indexed expressions and identifiers in different contexts.

AST changes

Now the AST needs to reflect these changes. The big one: ObjectFieldAccess now takes an expr list instead of a single string:

type expr =
   | Unit
   | Null of position
@@ -37,9 +44,10 @@ type expr =
   | String of position * string
   | Ident of position * string
   | Array of position * expr list
-  | Object of position * object_entry list
-  | ObjectSelf of Env.env_id
-  | ObjectFieldAccess of position * object_scope * string
+  | ParsedObject of position * object_entry list
+  | RuntimeObject of position * Env.env_id * ObjectFields.t
+  | ObjectPtr of Env.env_id * object_scope
+  | ObjectFieldAccess of position * object_scope * expr list

There's more going on here! I also refactored how objects work:

Object became ParsedObject -- the static definition as it appears in the source
RuntimeObject is the new representation after interpretation, holding an environment ID and a set of field names
ObjectSelf became ObjectPtr because it now handles both self and $ references

The RuntimeObject is a crucial change. Instead of storing evaluated field values inside the object, it only tracks which fields exist; the field expressions live in the environment and are looked up when requested. This is essential for proper lazy evaluation semantics and cycle detection.

I also introduced ObjectFields, a set type to track unique field names:

module StringSet = struct
  type t = string
  let compare = String.compare
end

module ObjectFields = Set.Make(StringSet)

Using a set means fields are automatically kept in String.compare order (effectively alphabetical, with uppercase letters sorting before lowercase ones), which has an interesting side effect on JSON output that we'll see later.
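
Here's a small standalone sketch of that ordering. It uses Set.Make (String) directly, which should behave the same as the StringSet wrapper above since both compare with String.compare. The field names are just sample data.

```ocaml
module ObjectFields = Set.Make (String)

(* Add field names in "source order". *)
let fields =
  List.fold_left
    (fun acc name -> ObjectFields.add name acc)
    ObjectFields.empty
    [ "int_attr"; "float_attr"; "string_attr"; "null_attr"; "array_attr" ]

(* elements returns the names in increasing String.compare order,
   regardless of the order they were added in. *)
let ordered = ObjectFields.elements fields

(* Adding an existing name leaves the set unchanged. *)
let with_dup = ObjectFields.add "int_attr" fields

(* Comparison is byte-wise, so uppercase sorts before lowercase. *)
let mixed_case = ObjectFields.elements (ObjectFields.of_list [ "apple"; "Zed" ])
```

Here ordered comes back as array_attr, float_attr, int_attr, null_attr, string_attr, and mixed_case puts "Zed" before "apple".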

The string_of_type function got beefed up to handle the new types with better debugging info:

-let string_of_type = function
+let rec string_of_type = function
   | Null _ -> "Null"
   | Number (_, number) ->
     (match number with
     | Int _ -> "Int"
     | Float _ -> "Float")
   | Bool _ -> "Bool"
-  | String _ -> "String"
-  | Ident _ -> "Identity"
-  | Array _ -> "Array"
-  | Object _ -> "Object"
+  | String (_, s) -> "\"" ^ s ^ "\""
+  | Ident (_, id) -> Printf.sprintf "Ident(%s)" id
+  | Array (_, items) ->
+    Printf.sprintf "[%s]"
+      (String.concat ", " (List.map string_of_type items))
+  | ParsedObject (_, fields) ->
+    Printf.sprintf "PlainObject{%s}"
+      (String.concat ", " (List.map string_of_object_entry fields))
+  | RuntimeObject (_, (Env.EnvId id), fields) ->
+    Printf.sprintf "obj<%d>{%s}" id
+      (String.concat ", " (ObjectFields.to_list fields))

Now when debugging, I can actually see what's inside arrays and objects, which has been invaluable.

Scope and test updates

The variant renames required small fixes in a couple of places. The scope validator needed to know about ParsedObject:

-  | Object (_, entries) ->
+  | ParsedObject (_, entries) ->
     (* Object validation - this is where scope context changes *)
     validate_object entries context

And the property-based test generator:

         (2, QCheck.Gen.map2
-          (fun pos entries -> Object (pos, entries))
+          (fun pos entries -> ParsedObject (pos, entries))
           pos_gen

Nothing exciting, just keeping things consistent.

Interpreting chains: where the magic happens

The interpreter is where all this comes together. First, it needs to distinguish between ParsedObject (what we parse) and RuntimeObject (what we execute):

   match expr with
   | Null _ | Bool _ | String _ | Number _ -> ok (env, expr)
   | Array (pos, exprs) -> interpret_array env (pos, exprs)
-  | Object (pos, entries) -> interpret_object env (pos, entries)
-  | ObjectFieldAccess (pos, scope, field) -> interpret_object_field_access env (pos, scope, field)
+  | ParsedObject (pos, entries) -> interpret_object env (pos, entries)
+  | RuntimeObject _ as runtime_obj -> ok (env, runtime_obj)
+  | ObjectFieldAccess (pos, scope, chain) -> interpret_object_field_access env (pos, scope, chain)

When we encounter a RuntimeObject during interpretation, we just return it as-is -- it's already been interpreted.

Refactoring interpret_object

The interpret_object function needed significant changes. It now builds a RuntimeObject instead of eagerly evaluating everything:

and interpret_object env (pos, entries) =
  let* obj_id = Env.Id.generate () in
  let self_expr = ObjectPtr (obj_id, Self) in
  let env' = Env.add_local "self" self_expr env in
  let env', toplevel_expr = Env.add_local_when_not_present "$" (ObjectPtr (obj_id, TopLevel)) env' in

First, we create pointer expressions for self and $. The add_local_when_not_present is crucial -- it only adds $ if it doesn't exist, which preserves the top-level reference in nested objects.

I also made a small change to add_local_when_not_present so it returns the value that ended up in the environment:

diff --git a/lib/env.ml b/lib/env.ml
index 01dbf14..2b4077a 100644
--- a/lib/env.ml
+++ b/lib/env.ml
@@ -41,8 +41,8 @@ let add_local = Map.add

let add_local_when_not_present name value env =
   match find_opt name env with
-  | Some _ -> env
-  | None -> add_local name value env
+  | Some expr -> (env, expr)
+  | None -> let env = add_local name value env in (env, value)

This lets us capture the actual $ reference that will be used, whether it's the new one we just added or one that was already there.
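
A minimal sketch of why this matters for nested objects, assuming a plain string map in place of the real environment. The "obj<1>" and "obj<2>" strings are made-up placeholders for object pointers.

```ocaml
module Env = Map.Make (String)

(* Same shape as the function in the post, on a plain string map. *)
let add_local_when_not_present name value env =
  match Env.find_opt name env with
  | Some existing -> (env, existing)
  | None -> (Env.add name value env, value)

(* Outer object: "$" is not bound yet, so it points at object 1. *)
let env1, top1 = add_local_when_not_present "$" "obj<1>" Env.empty

(* Nested object: "$" already exists, so object 2 does not replace it,
   and the caller gets back the binding that is actually in effect. *)
let env2, top2 = add_local_when_not_present "$" "obj<2>" env1
```

Both top1 and top2 end up as "obj<1>", so a nested object still resolves $ to the outermost one.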

Next, we populate the environment with object fields, but we don't evaluate them yet:

  (* First add locals and object fields to env *)
  let* (env', fields) = List.fold_left
    (fun result entry ->
      let* (env', fields) = result in
      match entry with
      | ObjectExpr expr ->
        (* ObjectExpr holds a single local. Interpreting
          it will add the expr to the environment *)
        let* (env', _) = interpret env' expr in ok (env', fields)
      | ObjectField (name, expr) ->
        let env' = Env.add_obj_field name expr obj_id env' in
        ok (env', ObjectFields.add name fields)
    )
    (ok (env', ObjectFields.empty))
    entries
  in

Notice how we're building up an ObjectFields set while we add fields to the environment? This tracks which fields exist without evaluating them.

Now comes the tricky part -- interpreting each field through the environment while keeping the right bindings in scope:

  (* Then interpret object fields after env is populated *)
  let* env' = ObjectFields.fold
    (fun field acc ->
      let* env' = acc in
      let* (env', _expr) =
        Env.get_obj_field field obj_id env'
          ~succ:(interpret)
          ~err:(Error.error_at pos)
      in
      (* self is removed by object evaluation, for this reason
         we re-add self and $ to env' on each iteration here *)
      let env' = Env.add_local "self" self_expr env' in
      let env' = Env.add_local "$" toplevel_expr env' in
      ok env'
    )
    fields
    (ok env')
  in

This is subtle but important. When we interpret a field that is itself an object, it removes self and $ from the environment (because that's what happens when an object finishes interpreting). So after each field evaluation, we re-add those bindings. This ensures that the next field can reference self or $ of the current object if it needs to.

Finally, we clean up and return the RuntimeObject:

  (* Remove self and $ from the resulting environment.
     Subsequent interpretations shouldn't have references to them. *)
  let env' = Env.Map.remove "self" env' in
  let env' = Env.Map.remove "$" env' in

  ok (env', RuntimeObject (pos, obj_id, fields))

The environment we return shouldn't contain object-specific bindings -- those are only valid while interpreting the object itself.

Walking the chain

The interpret_object_field_access function is where we actually walk through the chain of field accesses:

and interpret_object_field_access env (pos, scope, chain_exprs) =
  let* obj =
    match Env.find_opt (string_of_object_scope scope) env with
    | Some (ObjectPtr _ as obj) -> ok obj
    | _ ->
      Error.error_at pos
        (match scope with
        | Self -> Scope.self_out_of_scope
        | TopLevel -> Scope.no_toplevel_object)
  in

First, we retrieve the object reference (self or $) from the environment. Then we fold over the chain:

  List.fold_left
    (fun acc field_expr ->
      let* (env', prev_expr) = acc in
      let get_obj_id =
        match prev_expr with
        | ObjectPtr (obj_id, _) -> ok obj_id
        | RuntimeObject (_, obj_id, _) -> ok obj_id
        | _ -> Error.error_at pos "Must be an object"
      in

Each iteration takes the previous result (which should be an object or object pointer) and extracts its environment ID. Then we handle three cases:

      match field_expr with
      | String (pos, field) | Ident (pos, field) ->
        let* obj_id = get_obj_id in
        Env.get_obj_field field obj_id env'
          ~succ:(interpret)
          ~err:(Error.error_at pos)

Simple field access -- just look up the field and interpret it.

      | IndexedExpr (pos, field, index_expr) ->
        let* obj_id = get_obj_id in
        let* (env', index_expr') = interpret env' index_expr in
        let* (env', indexable_expr) =
          Env.get_obj_field field obj_id env'
            ~succ:(interpret)
            ~err:(Error.error_at pos)
        in
          Result.fold
            (Indexable.get index_expr' indexable_expr)
            ~ok:(fun e -> interpret env' e)
            ~error:(Error.error_at pos)

Indexed access (like arr[0]) requires more work: interpret the index expression, get the field, then index into it. This is where I initially made a mistake -- I was calling Indexable.get inside the get_obj_field callback, which broke memoization. The fix was to interpret the field first, then do the indexing outside the callback.

      | _e ->
        Error.error_at pos "Invalid object lookup"
    )
    (ok (env, obj))
    chain_exprs

And that's it! Each iteration returns a result that becomes the input for the next iteration, building up the chain step by step.
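
The fold itself is easier to see without the environment plumbing. Below is a minimal sketch on a toy value type, assuming fields are already evaluated and stored inline. The value and step types and the walk, get_field, and get_index functions are all hypothetical names for illustration, not Tsonnet's actual code.

```ocaml
type value =
  | Num of int
  | Str of string
  | Arr of value list
  | Obj of (string * value) list

type step =
  | Field of string (* .name or ['name'] *)
  | Indexed of string * int (* .name[i] *)

let ( let* ) = Result.bind

let get_field name = function
  | Obj fields ->
    (match List.assoc_opt name fields with
     | Some v -> Ok v
     | None -> Error ("Field not found: " ^ name))
  | _ -> Error "Must be an object"

let get_index i = function
  | Arr items when i >= 0 ->
    (match List.nth_opt items i with
     | Some v -> Ok v
     | None -> Error "Index out of bounds")
  | Arr _ -> Error "Index out of bounds"
  | _ -> Error "Not indexable"

(* Each step consumes the previous result; the first error stops the walk. *)
let walk root chain =
  List.fold_left
    (fun acc step ->
      let* prev = acc in
      match step with
      | Field name -> get_field name prev
      | Indexed (name, i) ->
        let* target = get_field name prev in
        get_index i target)
    (Ok root)
    chain

let cocktails =
  Obj
    [ ( "Tom Collins",
        Obj [ ("ingredients", Arr [ Obj [ ("kind", Str "Farmer's Gin") ] ]) ] ) ]

(* $['Tom Collins'].ingredients[0].kind *)
let gin =
  walk cocktails [ Field "Tom Collins"; Indexed ("ingredients", 0); Field "kind" ]
```

The accumulator plays the role of prev_expr in the real function: an error at any hop short-circuits the rest of the chain.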

String concatenation now needs the environment too

Since RuntimeObject now needs the environment to retrieve fields, the Json module can't just take an expression anymore -- it needs the environment too. This triggered a small refactoring in how string concatenation works:

@@ -21,14 +21,14 @@ let interpret_arith_op (op: bin_op) (n1: number) (n2: number) =
   | Divide, (Int a), (Float b) -> Float ((float_of_int a) /. b)
   | Divide, (Float a), (Float b) -> Float (a /. b)

-let interpret_concat_op (e1 : expr) (e2 : expr) : (expr, string) result =
+let interpret_concat_op env (e1 : expr) (e2 : expr) : (expr, string) result =
   match e1, e2 with
   | String (_, s1), String (_, s2) ->
     ok (String (dummy_pos, s1^s2))
   | String (_, s1), val2 ->
-    let* s2 = Json.expr_to_string val2 in ok (String (dummy_pos, s1^s2))
+    let* s2 = Json.expr_to_string (env, val2) in ok (String (dummy_pos, s1^s2))
   | val1, String (_, s2) ->
-    let* s1 = Json.expr_to_string val1 in ok (String (dummy_pos, s1^s2))
+    let* s1 = Json.expr_to_string (env, val1) in ok (String (dummy_pos, s1^s2))
   | _ ->
     error "Invalid string concatenation operation"


@@ -57,7 +58,7 @@ let rec interpret env expr =
     let* (env2, e2') = interpret env1 e2 in
     match op, e1', e2' with
     | Add, (String _ as v1), (_ as v2) | Add, (_ as v1), (String _ as v2) ->
-      let* expr' = interpret_concat_op v1 v2 in
+      let* expr' = interpret_concat_op env2 v1 v2 in
       ok (env, expr')
     | _, Number (pos, v1), Number (_, v2) ->
       ok (env2, Number (pos, interpret_arith_op op v1 v2))

Rendering RuntimeObjects

The Json module needed significant changes to handle RuntimeObject. The signature of value_to_yojson changed to accept both the environment and the expression:

-let rec value_to_yojson : Ast.expr -> (Yojson.t, string) result = function
+let rec value_to_yojson (env : expr Env.Map.t) (expr : Ast.expr) : (Yojson.t, string) result =
+  match expr with
   | Number (_, n) ->
     ok (match n with
     | Int i -> `Int i
@@ -10,22 +12,29 @@ let rec value_to_yojson : Ast.expr -> (Yojson.t, string) result = function
   | Bool (_, b) -> ok (`Bool b)
   | String (_, s) -> ok (`String s)
   | Array (_, values) ->
-    let expr_to_list expr' = to_list (value_to_yojson expr') in
+    let expr_to_list expr' = to_list (value_to_yojson env expr') in
     let results = values |> List.map expr_to_list |> List.concat in
     ok (`List results)
-  | Object (_, entries) ->
-    let eval' = fun entry ->
-      match entry with
-      | ObjectField (k, v) ->
-        let result = value_to_yojson v
-        in Result.map (fun val' -> (k, val')) result
-      | _ ->
-        error "Object expression(s) not representable as JSON"
-    in
-    let results = entries |> List.map eval' |> List.map to_list |> List.concat
-    in ok (`Assoc results)
-  | _ -> error "value type not representable as JSON"
+  | RuntimeObject (pos, context, fieldset) -> obj_to_yojson env (pos, context, fieldset)
+  | expr -> error ("value type not representable as JSON: " ^ string_of_type expr)

The key is obj_to_yojson, which retrieves fields from the environment:

and obj_to_yojson env (pos, obj_id, fieldset) =
  let* fields =
    ObjectFields.fold
      (fun field acc ->
        let* (_, expr) =
          Env.get_obj_field field obj_id env
            ~succ:(fun _ expr -> ok (env, expr))
            ~err:(Error.error_at pos)
        in
        let* yo_value = value_to_yojson env expr in
        let* fields = acc in
        ok ((field, yo_value) :: fields)
      )
      fieldset
      (ok [])
  in ok (`Assoc (List.rev fields))

We fold over the field set, retrieve each field from the environment, convert it to JSON, and build up a list. The fold visits fields in ascending order, but since we're prepending as we go the list comes out reversed, so the List.rev at the end restores alphabetical order.
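
A tiny standalone sketch of that ordering detail, using Set.Make (String) as a stand-in for ObjectFields and made-up field names:

```ocaml
module ObjectFields = Set.Make (String)

let fieldset = ObjectFields.of_list [ "b"; "c"; "a" ]

(* fold visits "a", then "b", then "c"; consing builds the list backwards. *)
let prepended = ObjectFields.fold (fun field acc -> field :: acc) fieldset []

(* Reversing once at the end puts the fields back in ascending order. *)
let restored = List.rev prepended
```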

And the entry point change:

-let expr_to_string expr =
-  let yojson = value_to_yojson expr
+let expr_to_string (env, expr) =
+  let yojson = value_to_yojson env expr
   in Result.map Yojson.pretty_to_string yojson

Type checking: mirroring the interpreter

The type checker follows almost the same pattern as the interpreter. First, new type variants:

 type tsonnet_type =
   | Tunit
   | Tnumber
@@ -11,11 +11,15 @@ type tsonnet_type =
   | Tany
   | Tarray of tsonnet_type
   | Tobject of t_object_entry list
-  | TobjectSelf of Env.env_id
+  | TruntimeObject of Env.env_id * t_object_entry list
+  | TobjectPtr of Env.env_id * t_object_scope
   | Lazy of expr
 and t_object_entry =
   | TobjectField of string * tsonnet_type
   | TobjectExpr of tsonnet_type
+and t_object_scope =
+  | TobjectSelf
+  | TobjectTopLevel

The to_string function also got beefed up with better representations:

-  | TobjectSelf (Env.EnvId id) -> Printf.sprintf "self (%d)" id
+  | TruntimeObject (_, fields) ->
+    let field_to_string = function
+      | TobjectField (field, ty) -> field ^ " : " ^ to_string ty
+      | TobjectExpr ty -> to_string ty
+    in
+    "{" ^ (
+      String.concat ", " (List.map field_to_string fields)
+    ) ^ "}"
+  | TobjectPtr (Env.EnvId id, scope) ->
+    let s =
+      match scope with
+      | TobjectSelf -> "self"
+      | TobjectTopLevel -> "$"
+    in Printf.sprintf "%s (%d)" s id

The translate function handles the new variants:

-  | Object (pos, entries) -> translate_object venv pos entries
-  | ObjectFieldAccess (pos, scope, field) -> translate_object_field_access venv pos scope field
+  | ParsedObject (pos, entries) -> translate_object venv pos entries
+  | ObjectFieldAccess (pos, scope, chain) -> translate_object_field_access venv pos scope chain

Checking for cycles

Cycle detection got more sophisticated. It now checks not just the field itself, but also any indexed expressions:

and check_object_field_for_cycles venv (pos, scope, field_expr) seen =
  (match Env.find_opt (string_of_object_scope scope) venv with
  | Some (TobjectPtr (obj_id, _)) ->
    (match field_expr with
    | String (_, field) | Ident (_, field) ->
      let obj_field = Env.uniq_field_ident obj_id field in
      check_cyclic_refs venv obj_field seen pos
    | IndexedExpr (_, field, index_expr) ->
      let obj_field = Env.uniq_field_ident obj_id field in
      let* () = check_cyclic_refs venv obj_field seen pos in
      check_expr_for_cycles venv index_expr seen
    | _ -> ok ()
    )
  | _ -> ok ()
  )

For indexed expressions, we check both the field and the index expression itself. This catches cycles like:

{
    arr: [self.first],
    first: self.arr[0]
}

We also need to check chains:

and check_object_field_chain_for_cycles venv (pos, scope, exprs) seen =
  List.fold_left
    (fun result expr ->
      let* () = result in
      check_object_field_for_cycles venv (pos, scope, expr) seen
    )
    (ok ())
    exprs

This iterates through each expression in the chain, checking for cycles at each step.
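
The general idea behind the seen list can be sketched in isolation. This is a simplified model, assuming each field is reduced to the list of field names it references. The real check_cyclic_refs walks expressions in the type environment, so treat the Deps map and this function body as hypothetical.

```ocaml
module Deps = Map.Make (String)

(* Follow references depth-first, carrying the path walked so far.
   Meeting a field that is already on the path means a cycle. *)
let rec check_cyclic_refs deps field seen =
  if List.mem field seen then
    Error ("Cyclic reference found for " ^ field)
  else
    let refs = Option.value (Deps.find_opt field deps) ~default:[] in
    List.fold_left
      (fun result dep ->
        match result with
        | Error _ -> result
        | Ok () -> check_cyclic_refs deps dep (field :: seen))
      (Ok ())
      refs

(* { arr: [self.first], first: self.arr[0] } *)
let cyclic =
  Deps.of_seq (List.to_seq [ ("arr", [ "first" ]); ("first", [ "arr" ]) ])

(* { arr: [1, 2, 3], first: self.arr[0] } *)
let acyclic = Deps.of_seq (List.to_seq [ ("arr", []); ("first", [ "arr" ]) ])
```

Because seen is the current path rather than a global visited set, two fields that both reference a third one are not flagged as a cycle.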

Translating objects

The translate_object function mirrors interpret_object:

and translate_object venv pos entries =
  let* obj_id = Env.Id.generate () in
  let venv = Env.add_local "self" (TobjectPtr (obj_id, TobjectSelf)) venv in
  let venv, _ =
    Env.add_local_when_not_present "$" (TobjectPtr (obj_id, TobjectTopLevel)) venv
  in

  (* Translate locals *)
  let* venv = List.fold_left
    (fun result entry ->
      let* venv = result in
      match entry with
      | ObjectExpr expr ->
        let* (venv', _) = translate expr venv in (ok venv')
      | ObjectField (attr, expr) ->
        ok (Env.add_obj_field attr (Lazy expr) obj_id venv)
    )
    (ok venv)
    entries
  in

Then check for cycles:

  (* Check for cyclical references among object fields *)
  let* () = List.fold_left
      (fun ok' entry -> ok' >>= fun _ ->
        match entry with
        | ObjectField (attr, _) ->
          check_cyclic_refs venv (Env.uniq_field_ident obj_id attr) [] pos
        | _ -> ok'
      )
      (ok ())
      entries
  in

And translate all the fields:

  (* Then translate object fields *)
  let* entry_types = List.fold_left
    (fun result entry ->
      let* entries' = result in
      match entry with
      | ObjectField (attr, _) ->
        let* (_, entry_ty) = Env.get_obj_field attr obj_id venv
          ~succ:translate_lazy
          ~err:(Error.error_at pos)
        in ok (entries' @ [TobjectField (attr, entry_ty)])
      | _ ->
        result
    )
    (ok [])
    entries
  in
  ok (venv, TruntimeObject (obj_id, entry_types))

Translating field chains

The translate_object_field_access function got significantly more complex to handle chains:

and translate_object_field_access venv pos scope chain_exprs =
  let* obj =
    match Env.find_opt (string_of_object_scope scope) venv with
    | Some (TobjectPtr _ as obj) -> ok obj
    | _ ->
      Error.error_at pos
        (match scope with
        | Self -> Scope.self_out_of_scope
        | TopLevel -> Scope.no_toplevel_object)
  in

Then we fold through the chain, just like in the interpreter:

  List.fold_left
    (fun acc field_expr ->
      let* (venv, prev_ty) = acc in

      let get_obj_id =
        match prev_ty with
        | TobjectPtr (obj_id, _) -> ok obj_id
        | TruntimeObject (obj_id, _) -> ok obj_id
        | _ -> Error.error_at pos "Must be an object"
      in

      match field_expr with
      | String (_, field) | Ident (_, field) ->
        let* obj_id = get_obj_id in
        Env.get_obj_field field obj_id venv
          ~succ:translate_lazy
          ~err:(Error.error_at pos)

Simple field access is straightforward. Indexed access requires more checking:

      | IndexedExpr (pos, field, index_expr) ->
        let* (venv', index_expr_ty) = translate index_expr venv in
        let* () =
          match index_expr_ty with
          | Tnumber | Tstring -> ok ()
          | ty -> Error.error_at pos (to_string ty ^ " is a non-indexable type")
        in
        let* obj_id = get_obj_id in
        let* (venv', ty) =
          Env.get_obj_field field obj_id venv'
            ~succ:translate_lazy
            ~err:(Error.error_at pos)
        in
        (match ty with
        | (Tarray _) as array_ty -> ok (venv', array_ty)
        | Tstring as ty -> ok (venv', ty)
        | _ -> Error.error_at pos (field ^ " is a non-indexable value")
        )

We verify the index expression is a number or string, get the field, and verify the field is indexable (array or string).

      | _ ->
        Error.error_at pos ("Invalid object lookup key: " ^ string_of_type field_expr)
    )
    (ok (venv, obj))
    chain_exprs
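
Stripped of the environment, the two checks for indexed access boil down to something like the sketch below. The ty type is a cut-down stand-in for tsonnet_type, and check_indexed is a hypothetical helper. Like the code above, it returns the type of the indexed field itself rather than the element type.

```ocaml
type ty = Tnumber | Tstring | Tbool | Tarray of ty

let rec to_string = function
  | Tnumber -> "Number"
  | Tstring -> "String"
  | Tbool -> "Bool"
  | Tarray t -> "Array of " ^ to_string t

(* First the index must be a number or string,
   then the field being indexed must be an array or string. *)
let check_indexed ~field ~index_ty ~field_ty =
  match index_ty with
  | Tnumber | Tstring ->
    (match field_ty with
     | Tarray _ | Tstring -> Ok field_ty
     | _ -> Error (field ^ " is a non-indexable value"))
  | ty -> Error (to_string ty ^ " is a non-indexable type")
```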

Does it actually work?

Let's verify! Simple chained field access:

{
    answer: {
        value: 42
    },
    answer_to_the_ultimate_question: self.answer.value
}

$ tsonnet samples/objects/self_field_lookup_chain.jsonnet
{ "answer": { "value": 42 }, "answer_to_the_ultimate_question": 42 }

Array indexing:

{
    arr: [1, 2, 3],
    first: self.arr[0]
}

$ tsonnet samples/objects/self_field_indexed_access.jsonnet
{ "arr": [ 1, 2, 3 ], "first": 1 }

Chained access with bracket notation:

{
    answer: {
        value: 42
    },
    answer_to_the_ultimate_question: $['answer'].value
}

$ tsonnet samples/objects/toplevel_field_lookup_chain.jsonnet
{ "answer": { "value": 42 }, "answer_to_the_ultimate_question": 42 }

And the full cocktail example from Jsonnet's references tutorial:

{
  'Tom Collins': {
    ingredients: [
      { kind: "Farmer's Gin", qty: 1.5 },
      { kind: 'Lemon', qty: 1 },
      { kind: 'Simple Syrup', qty: 0.5 },
      { kind: 'Soda', qty: 2 },
      { kind: 'Angostura', qty: 'dash' },
    ],
    garnish: 'Maraschino Cherry',
    served: 'Tall',
  },
  Martini: {
    ingredients: [
      {
        kind: $['Tom Collins'].ingredients[0].kind,
        qty: 2,
      },
      { kind: 'Dry White Vermouth', qty: 1 },
    ],
    garnish: 'Olive',
    served: 'Straight Up',
  },
  'Gin Martini': self.Martini,
}

$ tsonnet samples/tutorials/references.jsonnet
{
  "Gin Martini": {
    "garnish": "Olive",
    "ingredients": [
      { "kind": "Farmer's Gin", "qty": 2 },
      { "kind": "Dry White Vermouth", "qty": 1 }
    ],
    "served": "Straight Up"
  },
  "Martini": {
    "garnish": "Olive",
    "ingredients": [
      { "kind": "Farmer's Gin", "qty": 2 },
      { "kind": "Dry White Vermouth", "qty": 1 }
    ],
    "served": "Straight Up"
  },
  "Tom Collins": {
    "garnish": "Maraschino Cherry",
    "ingredients": [
      { "kind": "Farmer's Gin", "qty": 1.5 },
      { "kind": "Lemon", "qty": 1 },
      { "kind": "Simple Syrup", "qty": 0.5 },
      { "kind": "Soda", "qty": 2 },
      { "kind": "Angostura", "qty": "dash" }
    ],
    "served": "Tall"
  }
}

Perfect! The Martini successfully references the Tom Collins' gin through the chain $['Tom Collins'].ingredients[0].kind.

Catching cycles in chains

The cycle detection now catches more complex scenarios. Simple indexed cycles:

{
    arr: [self.first],
    first: self.arr[0]
}

$ tsonnet samples/semantics/invalid_binding_cycle_indexed_field.jsonnet
samples/semantics/invalid_binding_cycle_indexed_field.jsonnet:3:11 Cyclic reference found for 1->arr

3:     first: self.arr[0]
   ^^^^^^^^^^^^^^^^^^^^^^

And nested field cycles:

{
    a: {
        value: $.b
    },
    b: self.a.value,
}

$ tsonnet samples/semantics/invalid_binding_cycle_object_nested_field.jsonnet
samples/semantics/invalid_binding_cycle_object_nested_field.jsonnet:5:7 Cyclic reference found for 1->a

5:     b: self.a.value,
   ^^^^^^^^^^^^^^^^^^^^

Beautiful! The type checker catches these before the interpreter ever runs.

The cram tests

All the new functionality is locked in with cram tests:

  $ tsonnet ../../samples/objects/toplevel_bracket_lookup.jsonnet
  { "answer": 42, "answer_to_the_ultimate_question": 42 }

+  $ tsonnet ../../samples/objects/self_field_lookup_chain.jsonnet
+  { "answer": { "value": 42 }, "answer_to_the_ultimate_question": 42 }
+
+  $ tsonnet ../../samples/objects/self_field_indexed_access.jsonnet
+  { "arr": [ 1, 2, 3 ], "first": 1 }
+
+  $ tsonnet ../../samples/objects/toplevel_field_lookup_chain.jsonnet
+  { "answer": { "value": 42 }, "answer_to_the_ultimate_question": 42 }

And the cycle detection tests:

+  $ tsonnet ../../samples/semantics/invalid_binding_cycle_object_nested_field.jsonnet
+  ../../samples/semantics/invalid_binding_cycle_object_nested_field.jsonnet:5:7 Cyclic reference found for 1->a
+  
+  5:     b: self.a.value,
+     ^^^^^^^^^^^^^^^^^^^^
+  [1]
+
+  $ tsonnet ../../samples/semantics/invalid_binding_cycle_indexed_field.jsonnet
+  ../../samples/semantics/invalid_binding_cycle_indexed_field.jsonnet:3:11 Cyclic reference found for 1->arr
+  
+  3:     first: self.arr[0]
+     ^^^^^^^^^^^^^^^^^^^^^^
+  [1]

The alphabetical side effect

Remember that ObjectFields set I mentioned? Using a set means fields are automatically sorted, which changes the JSON output ordering:

   $ tsonnet ../../samples/literals/object.jsonnet
   {
-    "int_attr": 1,
+    "array_attr": [ 1, false, {} ],
     "float_attr": 4.2,
-    "string_attr": "Hello, world!",
+    "int_attr": 1,
     "null_attr": null,
-    "array_attr": [ 1, false, {} ],
-    "obj_attr": { "a": true, "b": false, "c": { "d": [ 42 ] } }
+    "obj_attr": { "a": true, "b": false, "c": { "d": [ 42 ] } },
+    "string_attr": "Hello, world!"
   }

Is this a problem? Not really. JSON objects are unordered by specification, so this is technically correct. It might surprise users who expect source order to be preserved, but probably not, since this matches Jsonnet's behaviour. If it becomes an issue, I could switch to a data structure that preserves insertion order while still detecting duplicates. For now, alphabetical ordering is a nice side effect.
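
If source order ever becomes a requirement, one possible shape for such a structure is a set for membership paired with a list for order. This is only a sketch of the idea, with hypothetical names (ordered_fields, add, build), not something Tsonnet implements today.

```ocaml
module Names = Set.Make (String)

type ordered_fields = {
  seen : Names.t; (* membership, for duplicate detection *)
  rev_order : string list; (* names in reverse insertion order *)
}

let empty = { seen = Names.empty; rev_order = [] }

(* Reject a name that was already added, otherwise record it. *)
let add name t =
  if Names.mem name t.seen then Error ("Duplicate field: " ^ name)
  else Ok { seen = Names.add name t.seen; rev_order = name :: t.rev_order }

(* Names in the order they were inserted. *)
let to_list t = List.rev t.rev_order

(* Build from a list of names, stopping at the first duplicate. *)
let build names =
  List.fold_left (fun acc name -> Result.bind acc (add name)) (Ok empty) names
```

The trade-off is a bit of extra memory for the list in exchange for keeping both the duplicate check and the original ordering.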

Conclusion

Tsonnet can now handle arbitrarily deep field chains with array and object indexing, just like Jsonnet. This was one of the most challenging features to implement so far because of how many pieces had to work together:

The parser needed an unambiguous grammar for chains
The AST needed to change to represent object field chains
Objects needed to become dynamic (RuntimeObject) for proper lazy evaluation
The interpreter needed to walk chains step by step
The type checker needed to mirror all of this
Cycle detection needed to check both fields and index expressions
The JSON renderer needed access to the environment

But it all came together. The refactoring from Object to ParsedObject/RuntimeObject was a bit painful but necessary -- it sets us up well for future features and makes the semantics much clearer.

A few thoughts on what could be improved:

Perhaps adding a context to RuntimeObject where it holds the environment might simplify things. Right now, we're threading the environment through everything.
Or even better, not interpreting fields at all during interpret_object -- this is how lazy evaluation is supposed to work anyway. We could defer all field interpretation until a field is actually accessed. Right now we evaluate eagerly behind the scenes, which is a bit misleading, but the outcome, which is what matters most, still complies with lazy evaluation semantics. Refactoring this could improve performance and simplify the code.
Error messages are getting out of hand -- I guess it's time to group them for consistency. Having error strings scattered throughout the codebase is becoming unmaintainable.
Debugging the AST is becoming a must with the increased complexity. Maybe using the ppxlib package to auto-derive string representations would be better than implementing a stringified version of each variant type myself. I'll consider it in the upcoming changes.
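
On the second idea, deferring field interpretation until access, here's a rough sketch of the behaviour I'd be aiming for, using OCaml's built-in Lazy module on a toy object. The fields list, the get function, and the evaluations counter are all made up for the example and say nothing about how Tsonnet's environment would actually store things.

```ocaml
(* Counts how many times a field body actually runs. *)
let evaluations = ref 0

(* Hypothetical object: each field is a suspended computation. *)
let fields : (string * int Lazy.t) list =
  [ ("answer", lazy (incr evaluations; 42));
    ("unused", lazy (incr evaluations; failwith "never forced")) ]

(* Forcing happens only when a field is looked up. *)
let get name = Lazy.force (List.assoc name fields)

let first = get "answer" (* runs the body of "answer" *)
let second = get "answer" (* reuses the memoized result *)
```

Only "answer" is ever evaluated, and only once; the "unused" field never runs, so its failure is never triggered.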

There's still more to explore about the changes made here. I haven't touched on a good chunk of the testing, but that will have to wait for the next post. See you there!

Here is the entire diff.

Thanks for reading Bit Maybe Wise! Subscribe to follow along as I chain together compiler features faster than you can say $['Tom Collins'].ingredients[0].kind three times fast.

Photo by Devin Berko on Unsplash
