Welcome to the Tsonnet series!
If you're not following along, check out how it all started in the first post of the series.
In the previous post, we added warnings for unused variables and untouched bindings.
Now let's get to the most exciting feature in any programming language: functions!
What are we even working with here?
A simple inline function with a positional parameter:
// samples/functions/positional_params.jsonnet
local my_function(x) = x * 2;
my_function(3)
And a function with a multiline body:
// samples/functions/multiline.jsonnet
local simple_function(x, y) = x + y;
local multiline_function(x) =
local temp = x * 2;
[temp, temp + 1];
multiline_function(
simple_function(1, 2)
)
Parsing (a.k.a. making sense of text)
We need two new AST variants -- one for function definition, one for function call:
diff --git a/lib/ast.ml b/lib/ast.ml
index ace6241..106f37c 100644
--- a/lib/ast.ml
+++ b/lib/ast.ml
@@ -90,6 +90,8 @@ type expr =
| Local of position * (string * expr) list
| Seq of expr list
| IndexedExpr of position * string * expr
+ | FunctionDef of position * (string * string list * expr)
+ | FunctionCall of position * string * expr list
and object_entry =
| ObjectField of string * expr
And the parser rules to match:
diff --git a/lib/parser.mly b/lib/parser.mly
index 7adb1ee..102d202 100644
--- a/lib/parser.mly
+++ b/lib/parser.mly
@@ -59,6 +59,7 @@ assignable_expr:
| op = unary_op; e = assignable_expr { UnaryOp (with_pos $startpos $endpos, op, e) }
| e = indexed_expr { e }
| e = obj_field_access { e }
+ | e = funcall { e }
;
indexed_expr:
@@ -173,7 +174,28 @@ var:
varname = ID; ASSIGN; e = assignable_expr { (varname, e) };
vars:
- LOCAL; vars = separated_nonempty_list(COMMA, var) { Local (with_pos $startpos $endpos, vars) };
+ | LOCAL; vars = separated_nonempty_list(COMMA, var) { Local (with_pos $startpos $endpos, vars) }
+ | LOCAL; def = fundef { FunctionDef (with_pos $startpos $endpos, def) }
+ ;
single_var:
- LOCAL; var_expr = var { Local (with_pos $startpos $endpos, [var_expr]) };
+ | LOCAL; var_expr = var { Local (with_pos $startpos $endpos, [var_expr]) }
+ ;
+
+fundef:
+ | fname = ID;
+ LEFT_PAREN; params = separated_nonempty_list(COMMA, ID); RIGHT_PAREN;
+ ASSIGN;
+ body = fundef_body { (fname, params, body) }
+ ;
+
+fundef_body:
+ | e = assignable_expr { e }
+ | local_bindings = vars; SEMICOLON; body = fundef_body { Seq [local_bindings; body] }
+ ;
+
+funcall:
+ | fname = ID;
+ LEFT_PAREN; params = separated_nonempty_list(COMMA, assignable_expr); RIGHT_PAREN
+ { FunctionCall (with_pos $startpos $endpos, fname, params) }
+ ;
fundef_body deserves a note: it lets us nest local bindings inside the function body, which is what makes multiline_function work. A function body is either a plain expression or a local binding followed by a semicolon and the rest of the body, recursively.
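To make the recursion concrete, here is a minimal sketch of the tree that fundef_body would build for multiline_function. The expr type below is a simplified stand-in for the real AST: positions are dropped, and Number, Ident, BinOp, and Array are assumed shapes for illustration, not the actual constructors from lib/ast.ml. The count_locals helper is hypothetical too.

```ocaml
(* Simplified stand-in for the AST: no positions, only what the example needs. *)
type expr =
  | Number of float
  | Ident of string
  | BinOp of string * expr * expr
  | Array of expr list
  | Local of (string * expr) list
  | Seq of expr list

(* local multiline_function(x) =
     local temp = x * 2;
     [temp, temp + 1];
   The body matches the second fundef_body alternative once (the local
   binding), then the first alternative (the array expression). *)
let multiline_body : expr =
  Seq [
    Local [ ("temp", BinOp ("*", Ident "x", Number 2.)) ];
    Array [ Ident "temp"; BinOp ("+", Ident "temp", Number 1.) ];
  ]

(* Count how many local-binding groups wrap the final expression. *)
let rec count_locals = function
  | Seq [ Local _; rest ] -> 1 + count_locals rest
  | _ -> 0
```

A body with two local bindings would nest one level deeper: a Seq whose second element is itself a Seq.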
Interpreting: where things actually happen
While working on the interpreter, I noticed that evaluating_fields was a misleading name — interpret_ident handles not just object fields, but regular local bindings too. Renamed it to evaluating_bindings:
diff --git a/lib/interpreter.ml b/lib/interpreter.ml
index c8755bc..d512c0f 100644
--- a/lib/interpreter.ml
+++ b/lib/interpreter.ml
@@ -2,7 +2,14 @@ open Ast
open Result
open Syntax_sugar
-let evaluating_fields = ref ObjectFields.empty
+let evaluating_bindings = ref ObjectFields.empty
I also needed a helper to run a function in a fresh evaluation environment, then restore the previous state afterwards:
let with_fresh_evaluating_bindings fn =
let saved_evaluating_bindings = !evaluating_bindings in
evaluating_bindings := ObjectFields.empty;
let result = fn () in
evaluating_bindings := saved_evaluating_bindings;
result
This matters for function calls: we don't want the caller's evaluation state leaking into the function body.
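One caveat: as written, the helper restores the saved state only when fn returns normally. Since the interpreter reports failures through result values, that is probably fine in practice. If fn could ever raise, a variant built on the standard library's Fun.protect would restore the state either way. The sketch below uses a plain string set in place of ObjectFields, which is an assumption made to keep it self-contained.

```ocaml
module StringSet = Set.Make (String)

(* Stand-in for the interpreter's global; the real one uses ObjectFields. *)
let evaluating_bindings = ref StringSet.empty

(* Run [fn] with an empty set of in-progress bindings and restore the
   caller's set afterwards, even if [fn] raises. *)
let with_fresh_evaluating_bindings fn =
  let saved = !evaluating_bindings in
  evaluating_bindings := StringSet.empty;
  Fun.protect
    ~finally:(fun () -> evaluating_bindings := saved)
    fn
```

Fun.protect is available from OCaml 4.08 onwards.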
The new pattern-matching cases route to their respective handlers:
(** [interpret expr] interprets and reduces the intermediate AST [expr] into a result AST. *)
let rec interpret env expr =
@@ -19,6 +26,8 @@ let rec interpret env expr =
| Local (_, vars) -> interpret_local env vars
| Seq exprs -> interpret_seq env exprs
| IndexedExpr (pos, varname, index_expr) -> interpret_indexed_expr env (pos, varname, index_expr)
+ | FunctionDef (pos, def) -> interpret_function_def env (pos, def)
+ | FunctionCall (pos, fname, params) -> interpret_function_call env (pos, fname, params)
and interpret_indexed_expr env (pos, varname, index_expr) =
let* (env', index_expr') = interpret env index_expr in
Function definition just registers the function in the environment:
and interpret_function_def env (pos, (fname, params, body)) =
let env' = Env.add_local fname (FunctionDef (pos, (fname, params, body))) env in
ok (env', Unit)
Function call is where things get interesting:
and interpret_function_call env (pos, fname, call_params) =
match Env.find_opt fname env with
| Some (FunctionDef (_, (_, def_params, body))) ->
if List.compare_lengths call_params def_params <> 0
then Error.error_at pos "wrong number of param(s)"
else
let bindings =
List.mapi
(fun index value ->
let param_name = List.nth def_params index in
(param_name, value)
)
call_params
in
let env' = List.fold_left
(fun env (k, v) -> Env.add_local k v env)
env
bindings
in
let* (_, result) = with_fresh_evaluating_bindings
(fun () -> interpret env' body)
in
ok (env, result)
| _ ->
Error.error_at pos (Error.Msg.var_not_found fname)
Step by step:
- Check that the number of arguments matches the definition. I could skip this in the interpreter and rely on the type checker alone, but I'm keeping it in both for now -- type checking can be bypassed during development for faster iteration.
- Pair up each call argument with its corresponding parameter name.
- Add those bindings to the environment.
- Interpret the function body in that environment, being careful to return the caller's environment, not the one modified inside the body.
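The pairing and binding steps can also be sketched in isolation with List.combine, which zips two lists of equal length. This is an alternative shape, not the code from the repository: bind_params is a hypothetical helper, and a StringMap stands in for the real Env module.

```ocaml
module StringMap = Map.Make (String)

(* Pair each parameter name with its argument, then add the pairs to [env].
   Returns an error instead of raising when the lengths differ. *)
let bind_params def_params call_params env =
  if List.compare_lengths def_params call_params <> 0 then
    Error "wrong number of param(s)"
  else
    Ok
      (List.fold_left
         (fun acc (name, value) -> StringMap.add name value acc)
         env
         (List.combine def_params call_params))
```

The arity check comes first because List.combine raises Invalid_argument on lists of different lengths. Compared with List.mapi plus List.nth, this walks each list once, though for typical parameter counts the difference is negligible.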
Making sure this isn’t completely illegal
We don't have type annotations yet, so the type checker needs to infer parameter types and the return type. Here's the motivating example -- the first call is fine, the second should error:
// samples/semantics/invalid_function_call_type.jsonnet
local my_function(x) = x * 2;
my_function(3) && my_function("oops")
We need three new type variants:
diff --git a/lib/type.ml b/lib/type.ml
index cad9ad1..bdbf357 100644
--- a/lib/type.ml
+++ b/lib/type.ml
@@ -16,6 +16,9 @@ type tsonnet_type =
| TruntimeObject of Env.env_id * t_object_entry list
| TobjectPtr of Env.env_id * t_object_scope
| Lazy of expr
+ | Tunresolved
+ | TfunctionDef of (string * tsonnet_type) list (* params: name * type *) * expr (* body *) * tsonnet_type (* return *)
+ | TfunctionCall of tsonnet_type list * tsonnet_type
and t_object_entry =
| TobjectField of string * tsonnet_type
| TobjectExpr of tsonnet_type
@@ -54,6 +57,17 @@ let rec to_string = function
| TobjectTopLevel -> "$"
in Printf.sprintf "%s (%d)" s id
| Lazy ty -> string_of_type ty
+ | TfunctionDef (params, _, return) ->
+ Printf.sprintf "function(%s) -> %s"
+ (List.map (fun (name, ty) -> name ^ ": " ^ to_string ty) params
+ |> String.concat ", "
+ )
+ (to_string return)
+ | TfunctionCall (params_type, return) ->
+ Printf.sprintf "function(%s) -> %s"
+ (List.map to_string params_type |> String.concat ", ")
+ (to_string return)
+ | Tunresolved -> "<unresolved>"
let rec collect_free_idents = function
| Unit | Null _ | Number _ | String _ | Bool _ -> []
Tunresolved is the key one. At declaration time, we don't know what types the parameters will have -- that only becomes clear at the call site. So we use Tunresolved as a placeholder and fill it in later.
Two new error messages to go with it:
diff --git a/lib/error.ml b/lib/error.ml
index e287af9..e08c670 100644
--- a/lib/error.ml
+++ b/lib/error.ml
@@ -28,6 +28,10 @@ module Msg = struct
let type_non_indexable_type ty = ty ^ " is a non-indexable type"
let type_non_indexable_field field = field ^ " is a non-indexable value"
let type_invalid_lookup_key expr = "Invalid object lookup key: " ^ expr
+ let type_wrong_number_of_params expected got =
+ Printf.sprintf "Expected %d argument(s), got %d" expected got
+ let type_mismatch ~expected ~got =
+ Printf.sprintf "Expected type %s, got %s" expected got
(* Interpreter messages *)
let interp_division_by_zero = "Division by zero"
The new cases in translate forward to specialised functions, same pattern as always:
@@ -162,6 +176,8 @@ let rec translate venv expr =
| BinOp (pos, op, e1, e2) -> translate_bin_op venv pos op e1 e2
| UnaryOp (pos, op, expr) -> translate_unary_op venv (pos, op, expr)
| IndexedExpr (pos, varname, index_expr) -> translate_indexed_expr venv (pos, varname, index_expr)
+ | FunctionDef (pos, def) -> translate_function_def venv (pos, def)
+ | FunctionCall (pos, fname, params) -> translate_function_call venv (pos, fname, params)
| expr' ->
error (Error.Msg.type_invalid_expr (string_of_type expr'))
At declaration time, all types are Tunresolved:
and translate_function_def venv (pos, (fun_name, params, body)) =
(* As of now, we don't know the input types at declaration *)
let params_typed = List.map (fun name -> (name, Tunresolved)) params in
(* We also don't know the result type *)
let fun_def = TfunctionDef (params_typed, body, Tunresolved) in
(* So a function declaration starts with an unresolved type, which is
only resolved when the first call is translated. After that first call,
the concrete types are set and subsequent calls are type checked
against them. *)
let venv' = Env.add_local fun_name fun_def venv in
ok (venv', fun_def)
At call time, Tunresolved gets replaced with concrete types -- and subsequent calls are checked against those:
and translate_function_call venv (pos, fname, call_params) =
(* 1. retrieve TfunctionDef from venv *)
match Env.find_opt fname venv with
| Some (TfunctionDef (def_params, body_expr, return_type)) ->
(* check arity *)
if List.compare_lengths call_params def_params <> 0
then
Error.error_at pos
(Error.Msg.type_wrong_number_of_params
(List.length def_params) (List.length call_params))
else
(* 2. type check each positional parameter passed in the function call *)
let* (venv', resolved_params) =
List.fold_left2
(fun acc call_param (param_name, def_param_type) ->
let* (venv', params') = acc in
let* (venv'', call_param_type) = translate venv' call_param in
match def_param_type with
| Tunresolved ->
(* 2a. unresolved: accept and record the concrete type *)
ok (venv'', params' @ [(param_name, call_param_type)])
| expected ->
(* 2b. resolved: type check against the concrete type *)
if call_param_type = expected
then ok (venv'', params' @ [(param_name, expected)])
else Error.error_at pos
(Error.Msg.type_mismatch
~expected:(to_string expected)
~got:(to_string call_param_type))
)
(ok (venv, []))
call_params
def_params
in
(* 3. type check return *)
let body_venv = List.fold_left
(fun env (name, ty) -> Env.add_local name ty env)
venv'
resolved_params
in
(* translate the body with resolved param types in scope *)
let* (_, body_type) = translate body_venv body_expr in
let* resolved_return = match return_type with
| Tunresolved ->
(* 3a. first call: infer return type from body *)
ok body_type
| expected ->
(* 3b. subsequent calls: check body type matches *)
if body_type = expected
then ok expected
else Error.error_at pos
(Error.Msg.type_mismatch
~expected:(to_string expected)
~got:(to_string body_type))
in
(* 4. update env with the now-resolved function type *)
let resolved_fun = TfunctionDef (resolved_params, body_expr, resolved_return) in
let venv_with_resolved_fun = Env.add_local fname resolved_fun venv' in
ok (venv_with_resolved_fun, resolved_return)
| _ ->
Error.error_at pos (Error.Msg.var_not_found fname)
One limitation worth mentioning: the first call wins. Whatever argument types the first call passes become the function's signature, provided the body type checks with them, so if that first call uses a type you didn't intend, the checker accepts it and holds every later call to it -- that's what type annotations are for. They're coming once Tsonnet reaches a reasonable level of Jsonnet compliance.
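To see the order dependence in isolation, here is a toy model of the resolution strategy for a one-parameter function. It is a sketch under simplifying assumptions: the ty type has only two concrete cases, the function body is ignored entirely, and resolve_param and check_calls are hypothetical names, not functions from the Tsonnet code base.

```ocaml
(* Toy model of the resolution strategy; not the real tsonnet_type. *)
type ty = Tnumber | Tstring | Tunresolved

let to_string = function
  | Tnumber -> "Number"
  | Tstring -> "String"
  | Tunresolved -> "<unresolved>"

(* Resolve one parameter: an unresolved slot takes the argument's type,
   a resolved slot must match it exactly. *)
let resolve_param def_ty arg_ty =
  match def_ty with
  | Tunresolved -> Ok arg_ty
  | expected when expected = arg_ty -> Ok expected
  | expected ->
    Error
      (Printf.sprintf "Expected type %s, got %s"
         (to_string expected) (to_string arg_ty))

(* Check a sequence of calls, threading the parameter type from one call
   to the next and stopping at the first mismatch. *)
let check_calls arg_types =
  List.fold_left
    (fun acc arg_ty ->
      match acc with
      | Error _ as e -> e
      | Ok def_ty -> resolve_param def_ty arg_ty)
    (Ok Tunresolved)
    arg_types
```

In this model, check_calls [Tnumber; Tstring] reports "Expected type Number, got String", while check_calls [Tstring; Tnumber] reports "Expected type String, got Number": the same two calls, blamed differently depending on which came first. The real checker also translates the body with the resolved types, which catches some of these cases earlier.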
Proof that it (mostly) works
Ta-da!
$ dune exec -- tsonnet samples/functions/positional_params.jsonnet
6
$ dune exec -- tsonnet samples/functions/multiline.jsonnet
[ 6, 7 ]
$ dune exec -- tsonnet samples/semantics/invalid_function_call_type.jsonnet
ERROR: samples/semantics/invalid_function_call_type.jsonnet:2:18 Expected type Number, got String
2: my_function(3) && my_function("oops")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The error could be more precise -- ideally it would point at the offending argument rather than the entire expression. That's a detail I'll get to later; not something that adds much right now.
The cram tests:
diff --git a/test/cram/functions.t b/test/cram/functions.t
new file mode 100644
index 0000000..0ad7e43
--- /dev/null
+++ b/test/cram/functions.t
@@ -0,0 +1,5 @@
+ $ tsonnet ../../samples/functions/positional_params.jsonnet
+ 6
+
+ $ tsonnet ../../samples/functions/multiline.jsonnet
+ [ 6, 7 ]
diff --git a/test/cram/semantics.t b/test/cram/semantics.t
index de60a1b..c9a790e 100644
--- a/test/cram/semantics.t
+++ b/test/cram/semantics.t
@@ -289,3 +322,16 @@
^^
---
1
+
+ $ tsonnet ../../samples/semantics/invalid_function_call_type.jsonnet
+ ERROR: ../../samples/semantics/invalid_function_call_type.jsonnet:2:18 Expected type Number, got String
+
+ 2: my_function(3) && my_function("oops")
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [1]
+
Conclusion
Basic functions are in: positional parameters, multiline bodies, arity checks, and type inference that resolves on the first call. Not bad for a first round.
Here is the entire diff.
Next up, we make function calls a little more forgiving — default arguments are coming.