Hercules Lemke Merscher

Posted on • Originally published at bitmaybewise.substack.com

Tsonnet #22 - Fixing a "lazy" bug

Welcome to the Tsonnet series!

If you haven't been following the series so far, you can check out how it all started in the first post of the series.

In the previous post, I implemented local variables scoped to objects.

While writing it, I realized that Tsonnet wasn't evaluating local variables within objects lazily. This is inconsistent with the properties of the language, so let's fix it!

The inconsistency

Let's add a slightly smaller version of the Jsonnet tutorial file introduced in the previous post:

// samples/variables/obj_variable.jsonnet
local house_rum = 'Banks Rum';
{
  local pour = 1.5,

  Daiquiri: {
    ingredients: [
      { kind: house_rum, qty: pour },
      { kind: 'Lime', qty: 1 },
      { kind: 'Simple Syrup', qty: 0.5 },
    ],
    served: 'Straight Up',
  },
}

Both Jsonnet and Tsonnet parse and interpret this without complaint:

dune exec -- tsonnet samples/variables/obj_variable.jsonnet
{
  "Daiquiri": {
    "ingredients": [
      { "kind": "Banks Rum", "qty": 1.5 },
      { "kind": "Lime", "qty": 1 },
      { "kind": "Simple Syrup", "qty": 0.5 }
    ],
    "served": "Straight Up"
  }
}

jsonnet samples/variables/obj_variable.jsonnet
{
   "Daiquiri": {
      "ingredients": [
         {
            "kind": "Banks Rum",
            "qty": 1.5
         },
         {
            "kind": "Lime",
            "qty": 1
         },
         {
            "kind": "Simple Syrup",
            "qty": 0.5
         }
      ],
      "served": "Straight Up"
   }
}

Now let's move the pour local variable declaration after its usage:

// samples/variables/obj_variable_late_binding_access.jsonnet
local house_rum = 'Banks Rum';
{
  Daiquiri: {
    ingredients: [
      { kind: house_rum, qty: pour },
      { kind: 'Lime', qty: 1 },
      { kind: 'Simple Syrup', qty: 0.5 },
    ],
    served: 'Straight Up',
  },

  local pour = 1.5,
}

This currently fails in Tsonnet:

dune exec -- tsonnet samples/variables/obj_variable_late_binding_access.jsonnet
samples/variables/obj_variable_late_binding_access.jsonnet:5:30 Undefined variable: pour

5:       { kind: house_rum, qty: pour },
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I wrote earlier in Tsonnet #16 - Late binding and Jsonnet inconsistency about how Jsonnet evaluates local variables top to bottom. Funnily enough, Jsonnet has no problem whatsoever interpreting this file:

jsonnet samples/variables/obj_variable_late_binding_access.jsonnet
{
   "Daiquiri": {
      "ingredients": [
         {
            "kind": "Banks Rum",
            "qty": 1.5
         },
         {
            "kind": "Lime",
            "qty": 1
         },
         {
            "kind": "Simple Syrup",
            "qty": 0.5
         }
      ],
      "served": "Straight Up"
   }
}


I was definitely not expecting this, but I'll take it -- this is what we would expect in an object-oriented language anyway.

Let's update the variables.t cram tests to capture this with the expected results:

diff --git a/test/cram/variables.t b/test/cram/variables.t
index 2889c7f..abbb767 100644
--- a/test/cram/variables.t
+++ b/test/cram/variables.t
@@ -16,3 +16,26 @@
   $ tsonnet ../../samples/variables/late_binding_array.jsonnet
   [ "apple", "banana" ]

+  $ tsonnet ../../samples/variables/obj_variable.jsonnet
+  {
+    "Daiquiri": {
+      "ingredients": [
+        { "kind": "Banks Rum", "qty": 1.5 },
+        { "kind": "Lime", "qty": 1 },
+        { "kind": "Simple Syrup", "qty": 0.5 }
+      ],
+      "served": "Straight Up"
+    }
+  }
+
+  $ tsonnet ../../samples/variables/obj_variable_late_binding_access.jsonnet
+  {
+    "Daiquiri": {
+      "ingredients": [
+        { "kind": "Banks Rum", "qty": 1.5 },
+        { "kind": "Lime", "qty": 1 },
+        { "kind": "Simple Syrup", "qty": 0.5 }
+      ],
+      "served": "Straight Up"
+    }
+  }

Now to the implementation.

Interpreting late binding within objects

We must first interpret the ObjectExpr entries. These hold the Local expressions that will add the variables to the environment. The ObjectField entries will be added to the environment for later interpretation:

and interpret_object env (pos, entries) =
  let* obj_id = Env.Id.generate () in
  (* First add locals and object fields to env *)
  let* env' = List.fold_left
    (fun result entry ->
      let* env' = result in
      match entry with
      | ObjectExpr expr ->
        (* ObjectExpr holds a single local. Interpreting
          it will add the expr to the environment *)
        let* (env'', _) = interpret env' expr in (ok env'')
      | ObjectField (attr, expr) ->
        ok (Env.add_obj_field attr expr obj_id env')
    )
    (ok env)
    entries
  in

Adding a local variable whose name already exists in the environment overrides the binding from the outer scope -- a concept called shadowing in programming language design. Because the current Env implementation is purely functional, it never destroys or updates an environment in place: it always produces a new one for the local scope, while the outer scope remains untouched.
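
To make the shadowing point concrete, here is a minimal sketch. It assumes the environment behaves like a persistent map keyed by name; the Env module below is a hypothetical stand-in built on the standard library's Map, not Tsonnet's actual lib/env.ml.

```ocaml
(* Hypothetical stand-in for the environment: a persistent string-keyed map. *)
module Env = Map.Make (String)

(* Outer scope binds pour to 1.5. *)
let outer = Env.add "pour" 1.5 Env.empty

(* Adding the same name again returns a new map; [outer] is not modified. *)
let inner = Env.add "pour" 2.0 outer

let () =
  Printf.printf "outer pour = %.1f\n" (Env.find "pour" outer);
  Printf.printf "inner pour = %.1f\n" (Env.find "pour" inner)
```

The inner map sees the shadowing value while the outer map still holds the original one, which is what makes leaving a scope free: we simply keep using the older environment.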

For object fields, however, we must be careful not to override locally scoped variables when adding fields to the environment. Here's where the obj_id comes into play. Local bindings are identified only by their names in the local scope, and an object field can share its name with a local variable, yet adding the field to the environment must not override the local binding. To prevent this, every object gets a unique identifier, and its fields are stored under keys scoped to that ID.
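
A small sketch of why the ID prefix avoids clashes. It reuses the key format of the uniq_field_ident helper shown later in this post; the map, the string values, and the concrete IDs are assumptions made up for illustration.

```ocaml
module Env = Map.Make (String)

type env_id = EnvId of int

(* Same key scheme as the helper in lib/env.ml: "<id>-><name>". *)
let uniq_field_ident (EnvId id) name = Printf.sprintf "%d->%s" id name

(* One local and two fields, all named "pour", living side by side. *)
let env =
  Env.empty
  |> Env.add "pour" "local pour"
  |> Env.add (uniq_field_ident (EnvId 1) "pour") "field of object 1"
  |> Env.add (uniq_field_ident (EnvId 2) "pour") "field of object 2"

let () =
  print_endline (Env.find "pour" env);
  print_endline (Env.find (uniq_field_ident (EnvId 1) "pour") env);
  print_endline (Env.find (uniq_field_ident (EnvId 2) "pour") env)
```

All three entries survive because their keys differ, so looking up the local never returns a field and vice versa.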

Only after adding local bindings and fields to the environment can we interpret all ObjectField entries in the object:

  (* Then interpret after env is populated. This allows locals
    and object fields to be accessed in a lazy evaluated manner. *)
  let* evaluated_entries = List.fold_left
    (fun result entry ->
      let* entries' = result in
      match entry with
      | ObjectField (attr, _) ->
        let* (_, entry) = Env.get_obj_field attr obj_id env'
          ~succ:(fun env'' expr -> interpret env'' expr)
          ~err:(Error.error_at pos)
        in ok (entries' @ [ObjectField (attr, entry)])
      | _ ->
        (* Ignore previously evaluated expressions *)
        result
    )
    (ok [])
    entries
  in
  ok (env, Object (pos, evaluated_entries))
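
Stripped of Tsonnet's details, the two-pass idea can be sketched on a toy AST. Everything below (the expr and entry types, eval, eval_object) is hypothetical and far smaller than the real interpreter; it only shows why registering bindings first makes declaration order irrelevant. Note that this toy has no cycle check, so a cyclic local would loop forever.

```ocaml
(* Toy AST, hypothetical and much smaller than Tsonnet's. *)
type expr = Num of float | Var of string

type entry =
  | Local of string * expr   (* local x = e *)
  | Field of string * expr   (* x: e *)

module Env = Map.Make (String)

(* Evaluate on demand: a variable is looked up only when it is needed. *)
let rec eval env = function
  | Num n -> Ok n
  | Var x ->
    (match Env.find_opt x env with
     | Some e -> eval env e
     | None -> Error ("Undefined variable: " ^ x))

let eval_object entries =
  (* Pass 1: register every local, unevaluated. *)
  let env =
    List.fold_left
      (fun env -> function Local (x, e) -> Env.add x e env | Field _ -> env)
      Env.empty entries
  in
  (* Pass 2: evaluate fields against the fully populated environment. *)
  List.filter_map
    (function Field (k, e) -> Some (k, eval env e) | Local _ -> None)
    entries

(* { qty: pour, local pour = 1.5 }: the local is declared after its use. *)
let result = eval_object [ Field ("qty", Var "pour"); Local ("pour", Num 1.5) ]
```

Here result holds the qty field evaluated to 1.5 even though pour appears after its use. A single top-to-bottom pass would have reported an undefined variable instead.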

This strategy is similar to the type-checking method I used for checking invalid cyclical references, where we fill the environment with references and then dig in until we reach the bottom of the evaluation chain. The translation of objects by the type checker does exactly that too. It has been extracted to the function translate_object:

and translate_object venv pos entries =
  let* obj_id = Env.Id.generate () in
  (* Translate locals *)
  let* venv' = List.fold_left
    (fun result entry ->
      let* venv = result in
      match entry with
      | ObjectExpr expr ->
        let* (venv', _) = translate expr venv in (ok venv')
      | ObjectField (attr, expr) ->
        ok (Env.add_obj_field attr (Lazy expr) obj_id venv)
    )
    (ok venv)
    entries
  in
  (* Then translate object fields *)
  let* entry_types = List.fold_left
    (fun result entry ->
      let* entries' = result in
      match entry with
      | ObjectField (attr, _) ->
        let* (_, entry_ty) = Env.get_obj_field attr obj_id venv'
          ~succ:(fun venv'' texpr ->
            match texpr with
            | Lazy expr -> translate expr venv''
            | ty -> Error.error_at pos ("Invalid type " ^ to_string ty)
          )
          ~err:(Error.error_at pos)
        in ok (entries' @ [TobjectField (attr, entry_ty)])
      | _ ->
        result
    )
    (ok [])
    entries
  in
  ok (venv, Tobject entry_types)

After type-checking, we reset the environment, so the interpreter gets a clean slate to work with:

let check expr =
  translate expr Env.empty >>= fun _ -> Env.Id.reset (); ok expr

Now to the lib/env.ml file changes, starting with the new env_id type and the Id module:

type env_id = EnvId of int

module Id = struct
  let counter = ref 0

  let generate () =
    if !counter < max_int
    then (incr counter; Result.ok (EnvId !counter))
    else Result.error "Too many uniquely identifiable expressions added to the environment!"

  let reset () = counter := 0
end

  1. We don't want to confuse regular ints with unique IDs generated by the environment, so we type our newly generated IDs as EnvId.
  2. The new Id module encapsulates the ID generation and reset through the ref counter.
  3. It will fail if it hits the maximum integer ceiling. This can be worked around in multiple ways, but I want to keep it as simple as possible for now. It's unlikely that we'll ever max out OCaml's max_int, to say the least: on 64-bit platforms it is 2^62 - 1, because native ints are 63 bits wide.
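
A quick usage sketch of the counter. The type and module are restated from above so the snippet runs on its own; the bindings first, second, and after_reset are names made up for the example.

```ocaml
type env_id = EnvId of int

module Id = struct
  let counter = ref 0

  let generate () =
    if !counter < max_int
    then (incr counter; Result.ok (EnvId !counter))
    else Result.error "Too many uniquely identifiable expressions added to the environment!"

  let reset () = counter := 0
end

(* Two consecutive calls yield distinct, increasing IDs. *)
let first = Id.generate ()
let second = Id.generate ()

(* After a reset, numbering starts over from 1. *)
let () = Id.reset ()
let after_reset = Id.generate ()
```

Since the counter is global mutable state, the reset is what keeps the IDs produced during type checking from leaking into the interpreter's run.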

And the helper functions:

let add_local = Map.add

let uniq_field_ident (EnvId id) name =
  Printf.sprintf "%d->%s" id name

let add_obj_field name expr obj_id env =
  add_local (uniq_field_ident obj_id name) expr env

let get_obj_field name obj_id env ~succ ~err =
  find_var (uniq_field_ident obj_id name) env ~succ ~err

  1. add_local is just a helper function providing more semantic meaning than the previous Env.Map.add. This makes it clear that we are dealing with local variables only.
  2. add_obj_field uses another helper function, uniq_field_ident, which combines the obj_id with the field name into a single key. That key is then used to add the field to the environment.
  3. get_obj_field reuses find_var to retrieve the value for the ObjectField given its obj_id.

Invalid local binding cycles

We should expect invalid local binding cycles within objects to be treated as usual "invalid binding cycles":

// samples/semantics/invalid_binding_cycle_object_locals.jsonnet
{
    local a = b,
    local b = c,
    local c = b,
    c: a,
}

The cram test captures it:

diff --git a/test/cram/semantics.t b/test/cram/semantics.t
index f3fe070..b26b131 100644
--- a/test/cram/semantics.t
+++ b/test/cram/semantics.t
@@ -29,6 +29,13 @@
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   [1]

+  $ tsonnet ../../samples/semantics/invalid_binding_cycle_object_locals.jsonnet
+  ../../samples/semantics/invalid_binding_cycle_object_locals.jsonnet:3:14 Cyclic reference found for c
+  
+  3:     local b = c,
+     ^^^^^^^^^^^^^^^^
+  [1]
+
   $ tsonnet ../../samples/semantics/invalid_binding_cycle_binop.jsonnet
   ../../samples/semantics/invalid_binding_cycle_binop.jsonnet:1:10 Cyclic reference found for b
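
The cycle check itself isn't shown in this post. As a rough illustration only, here is one common way to detect such a cycle: carry a list of the names currently being resolved and fail when one of them comes up again. The resolve function and the toy expr type are hypothetical and are not Tsonnet's type checker.

```ocaml
module Env = Map.Make (String)

type expr = Num of float | Var of string

(* [seen] holds the names currently being resolved. *)
let rec resolve env seen = function
  | Num n -> Ok n
  | Var x when List.mem x seen -> Error ("Cyclic reference found for " ^ x)
  | Var x ->
    (match Env.find_opt x env with
     | Some e -> resolve env (x :: seen) e
     | None -> Error ("Undefined variable: " ^ x))

(* local a = b, local b = c, local c = b *)
let env =
  Env.empty
  |> Env.add "a" (Var "b")
  |> Env.add "b" (Var "c")
  |> Env.add "c" (Var "b")

(* Resolving a walks a -> b -> c -> b and stops at the repeated b. *)
let result = resolve env [] (Var "a")
```

This sketch reports the name at which the loop closes (b here), whereas the cram test above shows Tsonnet pointing at c. Which name gets reported depends on how the traversal is structured, so the two need not agree.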

Conclusion

With this implementation, Tsonnet now properly handles lazy evaluation of local variables within objects, bringing it in line with Jsonnet's behavior. This fix ensures that local variables can be referenced before they're declared within an object, making Tsonnet's evaluation model truly consistent with its lazy evaluation principles.

The entire diff can be seen here.


Thanks for reading Bit Maybe Wise! If you enjoyed diving into the lazy depths of Tsonnet's evaluation mysteries, subscribe to catch more tales of late binding, scoping shenanigans, and configuration wizardry!

Photo by Luis Villasmil on Unsplash
