Hercules Lemke Merscher

Posted on Oct 19 • Originally published at bitmaybewise.substack.com

Tsonnet #24 - Getting to the root of things: top-level object access

#tsonnet #jsonnet #compiler

Welcome to the Tsonnet series!

If you haven't been following the series, you can see how it all started in the first post.

In the previous post, I implemented object self-referencing using the self keyword:

Tsonnet #23 - Mirror, mirror on the wall, who's the most self-referential of them all?

Now it's time to tackle another reference mechanism -- reaching the outermost object from anywhere in your nested structure.

When self isn't enough

The self keyword is great when you want to reference fields in the current object:

{
    answer: 42,
    answer_to_the_ultimate_question: self.answer
}

But what happens when you're nested several levels deep and need to reference something at the root?

{
    answer: {
        value: 42
    },
    answer_to_the_ultimate_question: {
      value: self.answer
    }
}

Wait, that won't work! Inside answer_to_the_ultimate_question, self refers to that inner object, which has no answer field, so self.answer is not valid there.

This is where Jsonnet's $ operator comes in -- it always refers to the outermost object, no matter how deep you are in the nesting:

{
    answer: {
        value: 42
    },
    answer_to_the_ultimate_question: $.answer.value
}

Let's teach Tsonnet about the dollar sign.

A quick detour: skipping type checks

Before diving into the main feature, I added a small improvement -- a flag to skip type checking during development:

diff --git a/bin/main.ml b/bin/main.ml
index 6d17333..e51d36b 100644
--- a/bin/main.ml
+++ b/bin/main.ml
@@ -1,10 +1,13 @@
 let usage_msg = "tsonnet <file1> [<file2>] ..."
 let input_files = ref []
+let skip_typecheck = ref false
 let anonymous_fun filename = input_files := filename :: !input_files
-let spec_list = []
+let spec_list = [
+  ("--skip-typecheck", Arg.Set skip_typecheck, "Skip type checking step");
+]

 let run_parser filename =
-  match Tsonnet.run filename with
+  match Tsonnet.run ~skip_typecheck:!skip_typecheck filename with
   | Ok stringified_json -> print_endline stringified_json
   | Error err -> prerr_endline err; exit 1

The implementation is straightforward -- when skip_typecheck is true, we bypass the type checking step entirely:

diff --git a/lib/tsonnet.ml b/lib/tsonnet.ml
index 05c764f..614b5ed 100644
--- a/lib/tsonnet.ml
+++ b/lib/tsonnet.ml
@@ -16,8 +16,10 @@ let parse (filename: string) =
   close_in input;
   result

-let run (filename: string) : (string, string) result =
+let run ?(skip_typecheck = false) (filename: string) : (string, string) result =
+  if skip_typecheck then
+    prerr_endline "Warning: Type checking is skipped. This is not recommended as it may lead to runtime errors.\n";
   parse filename
-    >>= Type.check
+    >>= (if skip_typecheck then ok else Type.check)
     >>= Interpreter.eval
     >>= Json.expr_to_string

This is handy when you're iterating quickly and just want to see what the interpreter does, though you'll get a warning about potential runtime errors. I intentionally didn't add a short alias for this flag to discourage misuse, since it is intended for development only.
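
To make the conditional stage easier to see in isolation, here is a minimal sketch of the same idea. The parse, check, and eval functions below are hypothetical stand-ins working on plain strings, not Tsonnet's real stages; only the shape of the pipeline mirrors the diff above.

```ocaml
(* Result-based pipeline with an optional stage.
   parse, check and eval are toy stand-ins, not Tsonnet's real functions. *)
let ( >>= ) = Result.bind
let ok x = Ok x

let parse src = if src = "" then Error "empty input" else Ok src
let check src = if String.contains src '!' then Error "type error" else Ok src
let eval src = Ok (String.uppercase_ascii src)

let run ?(skip_typecheck = false) src =
  parse src
  (* When skipping, the stage becomes the identity on successful results *)
  >>= (if skip_typecheck then ok else check)
  >>= eval

let () =
  match run ~skip_typecheck:true "a!" with
  | Ok out -> print_endline out
  | Error err -> prerr_endline err
```

Swapping the stage for ok keeps the rest of the chain untouched, which is why the flag needs so little code.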

Now back to the main event!

Teaching $ to the compiler

First, we need to recognize the $ token:

diff --git a/lib/lexer.mll b/lib/lexer.mll
index 0ca99bd..9803a55 100644
--- a/lib/lexer.mll
+++ b/lib/lexer.mll
@@ -64,6 +64,7 @@ rule read =
   | '.' { DOT }
   | "local" { LOCAL }
   | "self" { SELF }
+  | "$" { TOP_LEVEL_OBJ }
   | id { ID (Lexing.lexeme lexbuf) }
   | _ { raise (SyntaxError ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }
   | eof { EOF }

Then we extend the parser to handle $ in field access expressions. Notice how we support both dot notation and bracket notation:

diff --git a/lib/parser.mly b/lib/parser.mly
index 3a35c36..959d029 100644
--- a/lib/parser.mly
+++ b/lib/parser.mly
@@ -19,7 +19,7 @@
 %token LEFT_CURLY_BRACKET RIGHT_CURLY_BRACKET
 %token COLON
 %token DOT
-%token SELF
+%token SELF TOP_LEVEL_OBJ
 %token PLUS MINUS MULTIPLY DIVIDE
 %left PLUS MINUS
 %left MULTIPLY DIVIDE
@@ -54,7 +54,7 @@ assignable_expr:
   | e1 = assignable_expr; op = bin_op; e2 = assignable_expr { BinOp (with_pos $startpos $endpos, op, e1, e2) }
   | op = unary_op; e = assignable_expr { UnaryOp (with_pos $startpos $endpos, op, e) }
   | varname = ID; LEFT_SQR_BRACKET; e = assignable_expr; RIGHT_SQR_BRACKET { IndexedExpr (with_pos $startpos $endpos, varname, e) }
-  | SELF; DOT; field = ID { ObjectFieldAccess (with_pos $startpos $endpos, field) }
+  | e = obj_field_access { e }
   ;

 scoped_expr:
@@ -94,6 +94,13 @@ obj_field_list:
   | f = obj_field; COMMA; fields = obj_field_list { f :: fields }
   ;

+obj_field_access:
+  | SELF; LEFT_SQR_BRACKET; field = STRING; RIGHT_SQR_BRACKET { ObjectFieldAccess (with_pos $startpos $endpos, Self, field) }
+  | SELF; DOT; field = ID { ObjectFieldAccess (with_pos $startpos $endpos, Self, field) }
+  | TOP_LEVEL_OBJ; LEFT_SQR_BRACKET; field = STRING; RIGHT_SQR_BRACKET { ObjectFieldAccess (with_pos $startpos $endpos, TopLevel, field) }
+  | TOP_LEVEL_OBJ; DOT; field = ID { ObjectFieldAccess (with_pos $startpos $endpos, TopLevel, field) }
+  ;
+
 %inline number:
   | i = INT { Int i }
   | f = FLOAT { Float f }

The new obj_field_access rule captures all four combinations: self.field, self['field'], $.field, and $['field'].

Distinguishing object scopes

Now we need to distinguish between self-references and top-level references in the AST. Enter the object_scope type:

diff --git a/lib/ast.ml b/lib/ast.ml
index 5f4ba20..7e4639d 100644
--- a/lib/ast.ml
+++ b/lib/ast.ml
@@ -39,7 +39,7 @@ type expr =
   | Array of position * expr list
   | Object of position * object_entry list
   | ObjectSelf of Env.env_id
-  | ObjectFieldAccess of position * string
+  | ObjectFieldAccess of position * object_scope * string
   | BinOp of position * bin_op * expr * expr
   | UnaryOp of position * unary_op * expr
   | Local of position * (string * expr) list
@@ -48,6 +48,9 @@ type expr =
 and object_entry =
   | ObjectField of string * expr
   | ObjectExpr of expr
+and object_scope =
+  | Self
+  | TopLevel

The ObjectFieldAccess now carries an object_scope to tell us whether we're looking at self.field or $.field.

We also need a helper to convert the scope to a string for error messages:

@@ -56,6 +59,10 @@ let pos_from_lexbuf (lexbuf : Lexing.lexbuf) : position =
     endpos = lexbuf.lex_curr_p;
   }

+let string_of_object_scope = function
+  | Self -> "self"
+  | TopLevel -> "$"
+
 let string_of_type = function
   | Null _ -> "Null"
   | Number (_, number) ->
@@ -88,7 +95,7 @@ let string_of_type = function
   | Seq _ -> "Sequence"
   | IndexedExpr _ -> "Indexed Expression"
   | ObjectSelf _ -> "self"
-  | ObjectFieldAccess (_, field) -> Printf.sprintf "Object field=%s" field
+  | ObjectFieldAccess (_, scope, field) -> Printf.sprintf "Object %s.%s" (string_of_object_scope scope) field
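
As a rough, standalone illustration of how the scope travels with the node, here is a simplified sketch. The expr type below is cut down to a single constructor and drops the position, and describe is a hypothetical name standing in for the relevant branch of string_of_type.

```ocaml
type object_scope = Self | TopLevel

(* Simplified node: the real ObjectFieldAccess also carries a position. *)
type expr = ObjectFieldAccess of object_scope * string

let string_of_object_scope = function
  | Self -> "self"
  | TopLevel -> "$"

(* Hypothetical helper mirroring the string_of_type branch above *)
let describe (ObjectFieldAccess (scope, field)) =
  Printf.sprintf "Object %s.%s" (string_of_object_scope scope) field

let () =
  (* Dot and bracket notation both produce this same node shape *)
  print_endline (describe (ObjectFieldAccess (TopLevel, "answer")));
  print_endline (describe (ObjectFieldAccess (Self, "answer")))
```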

Keeping $ in its place

Just like with self, we need to ensure $ isn't used outside of objects. We extend the scope validator to handle both scopes:

diff --git a/lib/scope.ml b/lib/scope.ml
index 1a25d04..938d03f 100644
--- a/lib/scope.ml
+++ b/lib/scope.ml
@@ -12,6 +12,9 @@ type context = {
   current_locals: string list;
 }

+let self_out_of_scope = "Can't use self outside of an object"
+let no_toplevel_object = "No top-level object found"
+
 let empty_context = {
   in_object = false;
   object_depth = 0;
@@ -40,8 +43,8 @@ let rec _validate expr context =
   | Object (_, entries) ->
     (* Object validation - this is where scope context changes *)
     validate_object entries context
-  | ObjectFieldAccess (pos, _) ->
-    validate_object_field_access pos context
+  | ObjectFieldAccess (pos, scope, _) ->
+    validate_object_field_access pos scope context
   | Local (_, vars) ->
     validate_locals vars context
   | Seq exprs ->
@@ -57,9 +60,10 @@ let rec _validate expr context =
     ok ()

 and validate_ident pos varname context =
-  if varname = "self" && not context.in_object
-  then Error.trace ("Can't use self outside of an object") pos >>= error
-  else ok ()
+  match (varname, context.in_object) with
+  | ("self", false) -> Error.trace self_out_of_scope pos >>= error
+  | ("$", false) -> Error.trace no_toplevel_object pos >>= error
+  | _ -> ok ()

The key change is in validate_object_field_access -- we now check the scope and provide the appropriate error message:

-and validate_object_field_access pos context =
-  (* This catches cases like: local x = self.field; outside of objects *)
+and validate_object_field_access pos scope context =
+  (* This catches cases like:
+    local x = self.field;
+    local x = $.field;
+    outside of objects *)
   if not context.in_object
-  then Error.trace ("Can't use self outside of an object") pos >>= error
+  then
+    let with_error_msg = match scope with
+                        | Self -> self_out_of_scope
+                        | TopLevel -> no_toplevel_object
+    in
+    Error.trace with_error_msg pos >>= error
   else ok ()

Let's see it in action:

$ dune exec -- tsonnet samples/errors/object_outer_most_ref_out_of_scope.jsonnet
samples/errors/object_outer_most_ref_out_of_scope.jsonnet:2:13 No top-level object found

2: local _two = $.one + 1;
   ^^^^^^^^^^^^^^^^^^^^^^^

So far, so good!
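
Stripped of positions and error tracing, the check boils down to something like the sketch below. The context record is reduced to a single field and validate_access is a hypothetical name; the two messages are the ones from the diff.

```ocaml
type object_scope = Self | TopLevel

(* Reduced context: the real one also tracks depth and locals *)
type context = { in_object : bool }

let self_out_of_scope = "Can't use self outside of an object"
let no_toplevel_object = "No top-level object found"

(* Hypothetical, position-free version of validate_object_field_access *)
let validate_access scope context =
  if context.in_object then Ok ()
  else
    Error
      (match scope with
       | Self -> self_out_of_scope
       | TopLevel -> no_toplevel_object)

let () =
  match validate_access TopLevel { in_object = false } with
  | Ok () -> print_endline "valid"
  | Error msg -> print_endline msg
```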

Environment: the subtle magic of add_local_when_not_present

Here's where things get interesting. We need to add $ to the environment, but only once -- at the outermost object. Nested objects should keep the same $ reference pointing to the root.

First, we add a helper function to the environment module:

diff --git a/lib/env.ml b/lib/env.ml
index 1fe567b..01dbf14 100644
--- a/lib/env.ml
+++ b/lib/env.ml
@@ -39,6 +39,11 @@ let find_var varname env ~succ ~err =

 let add_local = Map.add

+let add_local_when_not_present name value env =
+  match find_opt name env with
+  | Some _ -> env
+  | None -> add_local name value env

This function only adds the binding if it doesn't already exist -- crucial for preserving the top-level $ reference in nested objects.

Why is this helper the key to making $ work correctly?

Consider this nested structure:

{
    root_value: 1,
    nested: {
        self_value: 2,
        uses_root: $.root_value,
        uses_self: self.self_value
    }
}

When we enter the outer object, we add both "self" and "$" to the environment, both pointing to the outer object's ID (let's say EnvId 1).

When we enter the nested object, we want:

"self" to point to the nested object's ID (EnvId 2)
"$" to still point to the outer object's ID (EnvId 1)

By using add_local for "self", we shadow the outer self. But by using add_local_when_not_present for "$", we preserve the original top-level reference. It's only added to the environment if it doesn't exist yet -- which means only the outermost object sets it.

Neat, right?
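
Here is a small sketch that replays the walkthrough above with a plain string map. It assumes integers as object IDs and a hypothetical enter_object helper; the real environment stores richer values, but the shadowing behaviour should be the same.

```ocaml
module Env = Map.Make (String)

let add_local = Env.add

let add_local_when_not_present name value env =
  match Env.find_opt name env with
  | Some _ -> env
  | None -> add_local name value env

(* Hypothetical helper: what happens to the env when we enter an object *)
let enter_object obj_id env =
  env
  |> add_local "self" obj_id                 (* always shadows the outer self *)
  |> add_local_when_not_present "$" obj_id   (* only the outermost object wins *)

let outer = enter_object 1 Env.empty
let nested = enter_object 2 outer

let () =
  Printf.printf "nested self -> EnvId %d\n" (Env.find "self" nested);
  Printf.printf "nested $ -> EnvId %d\n" (Env.find "$" nested)
```

Because the map is immutable, leaving the nested object simply means going back to the outer binding; nothing has to be undone.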

Type checking: translating $ references

The type checker needs to handle both self and $. When translating objects, we add $ using add_local_when_not_present:

 and translate_object venv pos entries =
   let* obj_id = Env.Id.generate () in
-  let venv' = Env.add_local "self" (TobjectSelf obj_id) venv in
+  let obj = TobjectSelf obj_id in
+  let venv' = Env.add_local "self" obj venv in
+  let venv' = Env.add_local_when_not_present "$" obj venv' in
   (* Translate locals *)

See what happened there? We add self normally (which shadows any outer self), but we only add $ if it's not already present. This means the outermost object's ID gets locked in as the $ reference for all nested objects.

For field access translation, we use the scope to look up the right reference:

-and translate_object_field_access venv pos field =
-  match Env.find_opt "self" venv with
+and translate_object_field_access venv pos scope field =
+  match Env.find_opt (string_of_object_scope scope) venv with
   | Some (TobjectSelf obj_id) ->
     Env.get_obj_field field obj_id venv
       ~succ:translate_lazy
       ~err:(Error.error_at pos)
   | _ ->
-    Error.error_at pos "Can't use self outside of an object"
+    Error.error_at pos
+      (if scope = Self
+      then Scope.self_out_of_scope
+      else Scope.no_toplevel_object)
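
The trick of using the scope's string form as the environment key can be sketched on its own. This is a simplified version under a few assumptions: the map holds plain integer IDs, resolve is a hypothetical name, and field lookup and positions are left out.

```ocaml
module Env = Map.Make (String)

type object_scope = Self | TopLevel

let string_of_object_scope = function
  | Self -> "self"
  | TopLevel -> "$"

(* Hypothetical helper: find the object a scope refers to, or explain why not *)
let resolve scope env =
  match Env.find_opt (string_of_object_scope scope) env with
  | Some obj_id -> Ok obj_id
  | None ->
    Error
      (match scope with
       | Self -> "Can't use self outside of an object"
       | TopLevel -> "No top-level object found")

let () =
  let env = Env.empty |> Env.add "self" 2 |> Env.add "$" 1 in
  match resolve TopLevel env with
  | Ok id -> Printf.printf "$ resolves to object %d\n" id
  | Error msg -> print_endline msg
```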

We also need to update cycle detection to handle the scope:

diff --git a/lib/type.ml b/lib/type.ml
index ef63dfc..b0d30ab 100644
--- a/lib/type.ml
+++ b/lib/type.ml
@@ -49,7 +49,7 @@ and check_expr_for_cycles venv expr seen =
   | Unit | Null _ | Number _ | String _ | Bool _ -> ok ()
   | Array (_, exprs) -> iter_for_cycles venv seen exprs
   | Object (_, entries) -> check_object_for_cycles venv entries seen
-  | ObjectFieldAccess (pos, field) -> check_object_field_for_cycles venv field pos seen
+  | ObjectFieldAccess (pos, scope, field) -> check_object_field_for_cycles venv (pos, scope, field) seen
   | Ident (pos, varname) -> check_cyclic_refs venv varname seen pos
   | BinOp (_, _, e1, e2) -> iter_for_cycles venv seen [e1; e2]
   | UnaryOp (_, _, e) -> check_expr_for_cycles venv e seen
@@ -69,8 +69,8 @@ and check_object_for_cycles venv entries seen =
     )
     (ok ())
     entries
-and check_object_field_for_cycles venv field pos seen =
-  (match Env.find_opt "self" venv with
+and check_object_field_for_cycles venv (pos, scope, field) seen =
+  (match Env.find_opt (string_of_object_scope scope) venv with
   | Some (TobjectSelf obj_id) ->
     let obj_field = Env.uniq_field_ident obj_id field in
     check_cyclic_refs venv obj_field seen pos

Interpretation: making $ work at runtime

The interpreter follows the same pattern as the type checker. When we enter an object, we add both self and $:

@@ -104,7 +104,9 @@ and interpret_array env (pos, exprs) =

 and interpret_object env (pos, entries) =
   let* obj_id = Env.Id.generate () in
-  let env' = Env.add_local "self" (ObjectSelf obj_id) env in
+  let obj = ObjectSelf obj_id in
+  let env' = Env.add_local "self" obj env in
+  let env' = Env.add_local_when_not_present "$" obj env' in
   (* First add locals and object fields to env *)
   let* env'' = List.fold_left
     (fun result entry ->

And when accessing fields, we look up the appropriate scope reference:

diff --git a/lib/interpreter.ml b/lib/interpreter.ml
index c2ce44d..5a8e6f8 100644
--- a/lib/interpreter.ml
+++ b/lib/interpreter.ml
@@ -47,7 +47,7 @@ let rec interpret env expr =
   | Null _ | Bool _ | String _ | Number _ -> ok (env, expr)
   | Array (pos, exprs) -> interpret_array env (pos, exprs)
   | Object (pos, entries) -> interpret_object env (pos, entries)
-  | ObjectFieldAccess (pos, field) -> interpret_object_field_access env (pos, field)
+  | ObjectFieldAccess (pos, scope, field) -> interpret_object_field_access env (pos, scope, field)
   | Ident (pos, varname) ->
     Env.find_var varname env
       ~succ:(fun env' expr -> interpret env' expr)
@@ -140,8 +142,8 @@ and interpret_object env (pos, entries) =
   in
   ok (env, Object (pos, evaluated_entries))

-and interpret_object_field_access env (pos, field) =
-  let* (_, evaluated_expr) = Env.find_var "self" env
+and interpret_object_field_access env (pos, scope, field) =
+  let* (_, evaluated_expr) = Env.find_var (string_of_object_scope scope) env
     ~succ:(fun env' expr ->
       match expr with
       | ObjectSelf obj_id ->
@@ -149,7 +151,10 @@ and interpret_object_field_access env (pos, field) =
           ~succ:interpret
           ~err:(Error.error_at pos)
       | _ ->
-        Error.error_at pos "Can't use self outside of an object"
+        Error.error_at pos
+          (match scope with
+          | Self -> Scope.self_out_of_scope
+          | TopLevel -> Scope.no_toplevel_object)
     )
     ~err:(Error.error_at pos)
   in ok (env, evaluated_expr)
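
To see the whole mechanism end to end, here is a toy evaluator. It is not Tsonnet's interpreter: the AST, the value type, and the error messages are made up for this sketch, the environment stores each object's unevaluated fields directly instead of an env ID, and there is no cycle detection, so a cyclic input would loop forever.

```ocaml
module Env = Map.Make (String)

type scope = Self | TopLevel

type expr =
  | Num of int
  | Add of expr * expr
  | Obj of (string * expr) list
  | Access of scope * string

type value = VNum of int | VObj of (string * value) list

let string_of_scope = function Self -> "self" | TopLevel -> "$"

let add_when_not_present name v env =
  if Env.mem name env then env else Env.add name v env

(* env maps "self" and "$" to the unevaluated fields of an object *)
let rec eval env = function
  | Num n -> Ok (VNum n)
  | Add (a, b) ->
    (match eval env a, eval env b with
     | Ok (VNum x), Ok (VNum y) -> Ok (VNum (x + y))
     | Error e, _ | _, Error e -> Error e
     | _ -> Error "Expected numbers")
  | Obj fields ->
    let env' =
      env |> Env.add "self" fields |> add_when_not_present "$" fields
    in
    List.fold_right
      (fun (name, e) acc ->
         match eval env' e, acc with
         | Ok v, Ok vs -> Ok ((name, v) :: vs)
         | Error err, _ | _, Error err -> Error err)
      fields (Ok [])
    |> Result.map (fun vs -> VObj vs)
  | Access (scope, field) ->
    (match Env.find_opt (string_of_scope scope) env with
     | None -> Error "No object in scope"
     | Some target ->
       (match List.assoc_opt field target with
        | None -> Error ("Unknown field " ^ field)
        (* Evaluate the field with self pointing at the object that owns it *)
        | Some e -> eval (Env.add "self" target env) e))

(* The nested example from earlier, written as an AST *)
let program =
  Obj [ ("root_value", Num 1);
        ("nested", Obj [ ("self_value", Num 2);
                         ("uses_root", Access (TopLevel, "root_value"));
                         ("uses_self", Access (Self, "self_value")) ]) ]

let result = eval Env.empty program
```

With this input, uses_root should evaluate to 1 and uses_self to 2, matching the walkthrough in the environment section.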

Does it actually work?

Let's verify our implementation with some tests. First, the dot notation:

// samples/objects/toplevel_reference.jsonnet
{
    one: 1,
    two: $.one + 1
}

$ dune exec -- tsonnet samples/objects/toplevel_reference.jsonnet
{ "one": 1, "two": 2 }

Then, the bracket notation:

// samples/objects/toplevel_bracket_lookup.jsonnet
{
    answer: 42,
    answer_to_the_ultimate_question: $['answer']
}

$ dune exec -- tsonnet samples/objects/toplevel_bracket_lookup.jsonnet
{ "answer": 42, "answer_to_the_ultimate_question": 42 }

Beautiful! And what about using $ outside of objects?

// samples/errors/object_outer_most_ref_out_of_scope.jsonnet
local _one = 1;
local _two = $.one + 1;
{
    one: _one,
    two: _two
}

$ dune exec -- tsonnet samples/errors/object_outer_most_ref_out_of_scope.jsonnet
samples/errors/object_outer_most_ref_out_of_scope.jsonnet:2:13 No top-level object found

2: local _two = $.one + 1;
   ^^^^^^^^^^^^^^^^^^^^^^^

Exactly what we want!

Catching infinite loops and testing

Just like with self, we need to catch cycles involving $:

// samples/semantics/invalid_binding_cycle_outer_object_fields.jsonnet
{
    a: $.b,
    b: $.a,
}

$ dune exec -- tsonnet samples/semantics/invalid_binding_cycle_outer_object_fields.jsonnet
samples/semantics/invalid_binding_cycle_outer_object_fields.jsonnet:3:7 Cyclic reference found for 1->a

3:     b: $.a,
   ^^^^^^^^^^^

The cycle detection works perfectly -- it catches the circular dependency at type-check time, before the interpreter ever runs.
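
For intuition, cycle detection over field references can be sketched as a depth-first walk that remembers what it has already seen. This is only an approximation of the idea: the dependency table is written by hand, check_cycles is a simplified stand-in for the real functions, and the "1->a" naming is inferred from the error message above.

```ocaml
(* deps maps a unique field identifier to the identifiers it references *)
let rec check_cycles deps seen name =
  if List.mem name seen then Error ("Cyclic reference found for " ^ name)
  else
    match List.assoc_opt name deps with
    | None -> Ok ()
    | Some refs ->
      (* Visit every reference, stopping at the first error *)
      List.fold_left
        (fun acc r ->
           Result.bind acc (fun () -> check_cycles deps (name :: seen) r))
        (Ok ()) refs

(* { a: $.b, b: $.a } with both fields owned by object 1 *)
let cyclic = [ ("1->a", [ "1->b" ]); ("1->b", [ "1->a" ]) ]

(* { one: 1, two: $.one + 1 } *)
let acyclic = [ ("1->one", []); ("1->two", [ "1->one" ]) ]

let () =
  match check_cycles cyclic [] "1->a" with
  | Ok () -> print_endline "no cycles"
  | Error msg -> print_endline msg
```

Since $ and self resolve to the same object ID at the top level, a cycle written with either keyword ends up on the same unique field identifiers.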

The cram tests capture all of these scenarios:

diff --git a/test/cram/errors.t b/test/cram/errors.t
index 3c28663..b70b266 100644
--- a/test/cram/errors.t
+++ b/test/cram/errors.t
@@ -92,3 +92,10 @@
   2: local _two = self.one + 1;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
   [1]
+
+  $ tsonnet ../../samples/errors/object_outer_most_ref_out_of_scope.jsonnet
+  ../../samples/errors/object_outer_most_ref_out_of_scope.jsonnet:2:13 No top-level object found
+  
+  2: local _two = $.one + 1;
+     ^^^^^^^^^^^^^^^^^^^^^^^
+  [1]
diff --git a/test/cram/objects.t b/test/cram/objects.t
index 97755c9..2af394c 100644
--- a/test/cram/objects.t
+++ b/test/cram/objects.t
@@ -1,2 +1,11 @@
   $ tsonnet ../../samples/objects/self_reference.jsonnet
   { "one": 1, "two": 2 }
+
+  $ tsonnet ../../samples/objects/self_bracket_lookup.jsonnet
+  { "answer": 42, "answer_to_the_ultimate_question": 42 }
+
+  $ tsonnet ../../samples/objects/toplevel_reference.jsonnet
+  { "one": 1, "two": 2 }
+
+  $ tsonnet ../../samples/objects/toplevel_bracket_lookup.jsonnet
+  { "answer": 42, "answer_to_the_ultimate_question": 42 }
diff --git a/test/cram/semantics.t b/test/cram/semantics.t
index 56d4cec..07e2837 100644
--- a/test/cram/semantics.t
+++ b/test/cram/semantics.t
@@ -64,6 +64,13 @@
      ^^^^^^^^^^^^^^
   [1]

+  $ tsonnet ../../samples/semantics/invalid_binding_cycle_outer_object_fields.jsonnet
+  ../../samples/semantics/invalid_binding_cycle_outer_object_fields.jsonnet:3:7 Cyclic reference found for 1->a
+  
+  3:     b: $.a,
+     ^^^^^^^^^^^
+  [1]
+
   $ tsonnet ../../samples/semantics/invalid_binding_cycle_object_field_and_local.jsonnet
   ../../samples/semantics/invalid_binding_cycle_object_field_and_local.jsonnet:2:14 Cyclic reference found for 1->b

Conclusion

We can now reference root values from anywhere in our object tree -- essential for real-world configuration files where you want to define common values once at the top level and reuse them throughout your structure.

Here is the entire diff.

Implementing $ was a great exercise. The add_local_when_not_present helper lets us maintain different scoping rules for self and $ without complicating the core environment logic.

Objects are still too static, and we don't have chained field access yet. We'll tackle that next.

Thanks for reading Bit Maybe Wise! Subscribe for more tales of nested structures, scope resolution puzzles, and the occasional moment where you realize that one helper function makes everything click into place!

Photo by Reka Illyes on Unsplash