Welcome to the Tsonnet series!
If you're just joining, you can check out how it all started in the first post of the series.
In the previous post, Tsonnet has been augmented with property-based tests:

Tsonnet #18 - Property-based testing saves the day (and my sanity)
Hercules Lemke Merscher ・ Jun 6
This allows us to add new features to the language while being relatively certain that things will continue working in a myriad of scenarios.
Now it's time to take the first step towards type safety in Tsonnet.
Before starting
Before continuing, I want to clean up lib/tsonnet.ml
. There's too much code in there and I prefer that it stays as a clean orchestrator, therefore, I'm moving the interpreter related functions to its own module.
I won't bother copying the change here, but you can see the diff here.
The code now lives in a new file: lib/interpreter.ml
. This makes lib/tsonnet.ml
much more comprehensible:
open Result
open Syntax_sugar
(** [parse s] parses [s] into an AST. *)
let parse (filename: string) =
let input = open_in filename in
let lexbuf = Lexing.from_channel input in
Lexing.set_filename lexbuf filename;
let result =
try ok (Parser.prog Lexer.read lexbuf)
with
| Lexer.SyntaxError err -> (Error.trace err (Ast.pos_from_lexbuf lexbuf)) >>= error
| Parser.Error -> (Error.trace "Invalid syntax" (Ast.pos_from_lexbuf lexbuf)) >>= error
in
close_in input;
result
let run (filename: string) : (string, string) result =
parse filename
>>= Interpreter.eval
>>= Json.expr_to_string
Semantic analysis
While Tsonnet's feature set is still modest -- primitive types, locals, and binary/unary operations -- there are semantic issues we need to address that go beyond simple type verification.
The most pressing issue is detecting invalid binding cycles. Currently, Tsonnet can get stuck in infinite loops during evaluation -- something a proper semantic analysis phase should catch. Consider this example:
// samples/semantics/valid_binding_cycle.jsonnet
local a = b;
local b = d;
local c = a;
local d = 1;
a
Jsonnet evaluates top to bottom, and if a variable is not found predefined earlier, it fails immediately:
$ jsonnet samples/semantics/valid_binding_cycle.jsonnet
samples/semantics/valid_binding_cycle.jsonnet:1:11-12 Unknown variable: b
local a = b;
As we explored in Tsonnet 16 - Late binding and Jsonnet inconsistency, Jsonnet's evaluation model prevents it from resolving forward references, despite claiming lazy evaluation.
Tsonnet supports late binding, and this binding cycle, although seemingly confusing, is valid:
dune exec -- tsonnet samples/semantics/valid_binding_cycle.jsonnet
1
Here's an actually invalid case:
// samples/semantics/invalid_binding_cycle.jsonnet
local a = c;
local b = d;
local c = a;
local d = 1;
a
Lazy evaluation cannot help here, because there's an infinite cycle.
Jsonnet does behave correctly here, due to its eagerly evaluated binding nature:
jsonnet samples/semantics/invalid_binding_cycle.jsonnet
samples/semantics/invalid_binding_cycle.jsonnet:1:11-12 Unknown variable: c
local a = c;
In Tsonnet though, we have an infinite loop -- this should be fixed by semantic analysis performed by the type checker.
A minimal type checker
Before evaluating our code, we'll type-check our program:
diff --git a/lib/tsonnet.ml b/lib/tsonnet.ml
index 10a860b..6a55f4b 100644
--- a/lib/tsonnet.ml
+++ b/lib/tsonnet.ml
@@ -17,5 +17,6 @@ let parse (filename: string) =
let run (filename: string) : (string, string) result =
parse filename
+ >>= Type.check
>>= Interpreter.eval
>>= Json.expr_to_string
See how clean it is to add intermediate steps to the compiler now?!
The refactored architecture makes adding compiler phases remarkably clean -- this is why separating concerns matters.
I'm also adding a new helper function to Error
module to reuse across different modules. This is a pretty common pattern we've encountered so far in the interpreter and now will be shared with the type checker:
let error_at pos = fun msg -> trace msg pos >>= error
The Type.check
function serves two purposes: it translates expressions to their corresponding types and performs semantic validation. Here's the core type system:
(* lib/type.ml *)
open Ast
open Result
open Syntax_sugar
type tsonnet_type =
| Tunit
| Tnull
| Tbool
| Tnumber
| Tstring
| Tobject of (string * tsonnet_type) list
| Tarray of tsonnet_type
| Lazy of expr
let translate_late_binding translate_fun = fun venv expr ->
match expr with
| Lazy expr -> translate_fun expr venv
| _ -> error "Expected a lazy evaluated expression"
let rec translate expr venv =
match expr with
| Null _ -> ok (venv, Tnull)
| Bool _ -> ok (venv, Tbool)
| Number _ -> ok (venv, Tnumber)
| String _ -> ok (venv, Tstring)
| Ident (pos, varname) ->
Env.find_var varname venv
~succ:(translate_late_binding translate)
~err:(Error.error_at pos)
| Local (_, vars) ->
let venv' =
(List.fold_left
(fun venv (varname, var_expr) ->
(* Adds an expr to the env to be evaluated at a later point in time (when required) *)
Env.Map.add varname (Lazy var_expr) venv
)
venv
vars
)
in ok (venv', Tunit)
| Seq exprs ->
List.fold_left
(fun acc expr -> acc >>= fun (venv, _) -> translate expr venv)
(ok (venv, Tunit))
exprs
| _ -> error "Not yet implemented"
let check expr =
translate expr Env.empty >>= fun _ -> ok expr
These types mirror Tsonnet's values but exist purely for compile-time analysis. The Lazy
constructor is particularly important -- it represents unevaluated expressions that we'll resolve when needed, preserving our lazy evaluation semantics during type checking.
We need a bucket to store our types, our environment. Here we are re-using the Env
module, but instead of having Env of string * expr
, we'll have instead Env of string * tsonnet_type
.
The translate
function recursively descents expressions until it reaches the bottom -- the concrete (primitive) types.
This code is straightforward to understand. A crucial detail: we can't immediately type-check local bindings because of lazy evaluation. Instead, we store them as Lazy
expressions in the environment. Only when an Ident
forces evaluation do we recursively type-check the referenced expression.
This is enough to make the valid binding case to pass:
dune exec -- tsonnet samples/semantics/valid_binding_cycle.jsonnet
1
But this actually breaks existing code, because we aren't type-checking some literals like arrays and operations yet:
dune exec -- tsonnet samples/literals/array.jsonnet
Not yet implemented
dune exec -- tsonnet samples/binary_operations.jsonnet
Not yet implemented
Recursing on the rest of types
We need a Tany
type for cases where precise type inference isn't possible or necessary. This commonly occurs with heterogeneous arrays -- while JSON allows mixed-type arrays, we'll track them as Tany
rather than creating complex union types for now:
diff --git a/lib/type.ml b/lib/type.ml
index e6f4bf3..ecd152a 100644
--- a/lib/type.ml
+++ b/lib/type.ml
@@ -8,10 +8,25 @@ type tsonnet_type =
| Tbool
| Tnumber
| Tstring
| Tobject of (string * tsonnet_type) list
+ | Tany
| Tarray of tsonnet_type
| Lazy of expr
And a to_string
function to make error messages more user-friendly:
let rec to_string = function
| Tunit -> "()"
| Tnull -> "Null"
| Tbool -> "Bool"
| Tnumber -> "Number"
| Tstring -> "String"
| Tany -> "Any"
| Tarray ty -> "Array of " ^ to_string ty
| Tobject fields ->
"{" ^ (
String.concat ", " (List.map (fun (field, ty) -> field ^ " : " ^ to_string ty) fields)
) ^ "}"
| Lazy ty -> string_of_type ty
Most of the types are direct translations. Product types such as array and object are the ones carrying more complexity:
let rec translate expr venv =
match expr with
| Unit -> ok (venv, Tunit)
| Null _ -> ok (venv, Tnull)
| Bool _ -> ok (venv, Tbool)
| Number _ -> ok (venv, Tnumber)
| String _ -> ok (venv, Tstring)
| Ident (pos, varname) ->
Env.find_var varname venv
~succ:(translate_late_binding translate)
~err:(Error.error_at pos)
| Array (_pos, elems) ->
(* As of now, we compare each element and if all have the same type,
it is an array of this type, otherwise it will be an array of any.
Since JSON doesn't care about types, this is the simpler way of
handling such case in compatibility with Jsonnet. However, when we
actually use the items, given the flexibility of JSON and Jsonnet
regarding types, we may need to support polymorphic arrays in a more
flexible way, but I'm leaving this problem unsolved for now until I
finally need to.
*)
(match elems with
| [] -> ok (venv, Tany)
| elem :: rest ->
let ty = Lazy elem in
let* (venv, ty) =
List.fold_left
(fun acc elem -> acc >>= fun (venv, ty) ->
let elem_ty = Lazy elem in
if ty = elem_ty
then ok (venv, elem_ty)
else ok (venv, Tany)
)
(ok (venv, ty))
rest
in ok (venv, Tarray ty)
)
| Object (_pos, elems) ->
let* fields =
List.fold_left
(fun acc (attr, expr) ->
let* attrs = acc in
let* (_, ty) = translate expr venv in
ok ((attr, ty) :: attrs)
)
(ok [])
elems
in ok (venv, Tobject fields)
| Local (_, vars) ->
let venv' =
(List.fold_left
(fun venv (varname, var_expr) ->
(* Adds an expr to the env to be evaluated at a later point in time (when required) *)
Env.Map.add varname (Lazy var_expr) venv
)
venv
vars
)
in ok (venv', Tunit)
| Seq exprs ->
List.fold_left
(fun acc expr -> acc >>= fun (venv, _) -> translate expr venv)
(ok (venv, Tunit))
exprs
| BinOp (pos, op, e1, e2) ->
(let* (venv', e1') = translate e1 venv in
let* (venv'', e2') = translate e2 venv' in
match op, e1', e2' with
| _, Tnumber, Tnumber -> ok (venv'', Tnumber)
| Add, _, Tstring | Add, Tstring, _ -> ok (venv'', Tstring)
| _ -> Error.trace "Invalid binary operation" pos >>= error
)
| UnaryOp (pos, op, expr) ->
(let* (venv', expr') = translate expr venv in
match op, expr' with
| Plus, Tnumber | Minus, Tnumber | BitwiseNot, Tnumber -> ok (venv', Tnumber)
| Not, Tbool | BitwiseNot, Tbool -> ok (venv', Tbool)
| _ -> Error.trace "Invalid unary operation" pos >>= error
)
| IndexedExpr (pos, varname, index_expr) ->
(let* (venv', index_expr') = translate index_expr venv in
match index_expr' with
| Tnumber ->
Env.find_var varname venv'
~succ:(fun venv' expr' ->
match expr' with
| (Tarray _) as ty -> ok (venv', ty)
| Tstring as ty -> ok (venv', ty)
| Lazy expr -> translate expr venv
| ty -> error (to_string ty ^ " is a non indexable value")
)
~err:(Error.error_at pos)
| ty -> Error.trace ("Expected Integer index, got " ^ to_string ty) pos >>= error
)
All primitive types are now covered, and invalid binary and unary operations can now be checked by the type checker too.
We take a pragmatic approach to array typing: if all elements have the same type, we infer a homogeneous array type. Otherwise, we fall back to Tany
.
This works great for all tests but the invalid cyclic reference.
Checking cyclic references
Now for the main event: detecting those infinite cycles that could crash our interpreter.
After adding all local bindings to the environment, we need to validate that none create circular dependencies. The algorithm walks the dependency graph, maintaining a "seen" list to detect cycles:
| Local (pos, vars) ->
let venv' = List.fold_left
(* Adds an expr to the env to be evaluated at a later point in time (when required) *)
(fun venv (varname, var_expr) -> Env.Map.add varname (Lazy var_expr) venv)
venv
vars
in
let* () = List.fold_left
(fun ok' (varname, _) -> ok' >>= fun _ -> check_cyclic_refs venv' varname [] pos)
(ok ())
vars
in ok (venv', Tunit)
The check_cyclic_refs
function will retrieve the variables from the environment and check if they've been seen before, if yes it means we have a cyclic reference:
let rec check_cyclic_refs venv varname seen pos =
if List.mem varname seen
then
Error.trace ("Cyclic reference found for " ^ varname) pos >>= error
else
match Env.Map.find_opt varname venv with
| Some (Lazy expr) -> check_expr_for_cycles venv expr (varname :: seen)
| _ -> ok ()
and check_expr_for_cycles venv expr seen =
match expr with
| Unit | Null _ | Number _ | String _ | Bool _ -> ok ()
| Array (_, exprs) -> iter_for_cycles venv seen exprs
| Object (_, fields) -> iter_for_cycles venv seen (List.map snd fields)
| Ident (pos, varname) -> check_cyclic_refs venv varname seen pos
| BinOp (_, _, e1, e2) -> iter_for_cycles venv seen [e1; e2]
| UnaryOp (_, _, e) -> check_expr_for_cycles venv e seen
| Seq exprs -> iter_for_cycles venv seen exprs
| _ -> ok ()
and iter_for_cycles venv seen exprs =
List.fold_left
(fun ok' expr -> ok' >>= fun _ -> (check_expr_for_cycles venv expr seen))
(ok ())
exprs
The cycle detection must recursively examine every expression because lazy evaluation means cycles can hide deep within complex expressions. A variable might reference another variable buried inside an array or object literal, creating indirect cycles.
Now the invalid binding is caught by the compiler:
dune exec -- tsonnet samples/semantics/invalid_binding_cycle.jsonnet
samples/semantics/invalid_binding_cycle.jsonnet:1:10 Cyclic reference found for c
1: local a = c;
^^^^^^^^^^^^
And many other invalid cases. Such as:
- Binding to itself:
// samples/semantics/invalid_binding_itself.jsonnet
local a = a;
a
dune exec -- tsonnet samples/semantics/invalid_binding_itself.jsonnet
samples/semantics/invalid_binding_itself.jsonnet:1:10 Cyclic reference found for a
1: local a = a;
^^^^^^^^^^^^
- Invalid cyclic binding in array:
// samples/semantics/invalid_binding_cycle_array.jsonnet
local a = b,
b = c,
c = (local d = 1, e = 2; [a, b, c, d, e]);
b
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_array.jsonnet
samples/semantics/invalid_binding_cycle_array.jsonnet:3:30 Cyclic reference found for a
3: c = (local d = 1, e = 2; [a, b, c, d, e]);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Invalid cyclic binding in object:
// samples/semantics/invalid_binding_cycle_object.jsonnet
local obj = { a: 1, b: 2, c: obj };
obj
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_object.jsonnet
samples/semantics/invalid_binding_cycle_object.jsonnet:1:29 Cyclic reference found for obj
1: local obj = { a: 1, b: 2, c: obj };
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Invalid cyclic binding in binary operation:
// samples/semantics/invalid_binding_cycle_binop.jsonnet
local a = b;
local b = a + 1;
a
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_binop.jsonnet
samples/semantics/invalid_binding_cycle_binop.jsonnet:1:10 Cyclic reference found for b
1: local a = b;
^^^^^^^^^^^^
- Invalid cyclic binding in unary operation:
// samples/semantics/invalid_binding_cycle_unaryop.jsonnet
local a = b;
local b = ~a;
a
dune exec -- tsonnet samples/semantics/invalid_binding_cycle_unaryop.jsonnet
samples/semantics/invalid_binding_cycle_unaryop.jsonnet:1:10 Cyclic reference found for b
1: local a = b;
^^^^^^^^^^^^
Conclusion
As of the time of writing this, Tsonnet remains a relatively small language, featuring primitive types, local bindings, binary and unary operations. The type checker represents a significant improvement that will prove even more valuable as we introduce upcoming features.
So far, when it comes to type-checking, we need only one environment -- the value environment -- to translate inferred types from expressions. At some point, type annotations will be added, thus requiring a second environment: the type environment. I'm deferring this complexity until we actually need it -- it will become relevant for functions, typed objects, and likely any disambiguation scenarios when the compiler can't infer the type.
I've been enjoying OCaml since I started Tsonnet; however, introducing the type checker made me realize the language was truly an excellent choice for writing a compiler. The pattern matching and algebraic data types make expressing type relationships and transformations remarkably clean and intuitive.
Thanks for reading Bit Maybe Wise! Subscribe now -- my newsletter passes all semantic analysis checks!
Top comments (0)