DEV Community

Haile Lagi
Haile Lagi

Posted on • Originally published at hailelagi.com

It's legos all the way down

Introduction

Often as folks who create useful software things we tend to think of ourselves as people who write software for the mythical "user". A "user" clicks a button and
something magical happens. This is commonly reffered to as an abstraction.
Abstractions are all around us in software and clever programmers create good abstractions for other programmers often to manage complexity.

A really common example of this is an Application Programming Interface which allows two "applications" to share useful data with each other over some transport while being platform-agnostic as to how this data is used. Like an API, there are other interesting kinds of abstractions -- let's peel back the abstraction between the language creator and language user by inventing syntax!

This involves a subtle shift in the paradigm used to understand computation, at the core is the idea of viewing computation as data. I would guess for most people,
the typical mental model when reading at first is mostly procedural, a top-down scan with familiarity of syntax and semantics, then another important shift occurs in
understanding runtime execution with the introduction of concurrency and parallelism, here we'll be peeling back at the layer between compile time and runtime.

Compile time occurs when program text is being "parsed and transformed" into many forms all the way to bits and runtime
is when the program is actually executing ie "running", in this paradigm of viewing programs as textual input to other programs and to the program itself while running, is known as metaprogramming.

 This distinction between what is "compile" and "runtime" is
 a useful mental model illustrated here for simplicity, odds
 are what's happening in your favorite language is probably 
 more interesting![11]
Enter fullscreen mode Exit fullscreen mode

Before we begin, a caveat. Although this technique applies broadly to most modern languages -- implementations vary in feature parity, I'll try to primarily include alternate examples with go's reflection and rust's macro system while providing nods to Cpython[1], Ruby MRI[2] and some javascript [3]) but not typescript[4]

Computation is data

Consider for example the humble eval() function:

// javascript
console.log(eval('2 + 2'));
Enter fullscreen mode Exit fullscreen mode
# python
print(eval('2 + 2'))
Enter fullscreen mode Exit fullscreen mode
# ruby
puts eval('2 + 2')
Enter fullscreen mode Exit fullscreen mode

The computation 2 + 2 is represented as data, in this case a string. That's kind of neat isn't it? however we can take this much futher.
If you're interested in further details of what's happening here, checkout crafting interpreters.

Since programming languages contain all sorts of elements like expressions and statements, we need some
way to hold information about what the program or computation is trying to do, this internal representation is most commonly known as an Abstract Syntax Tree.

At the risk of oversimplification, think of an AST as a way to meaningfully represent the textual source of a program that sometimes allows you to do something resembling string interpolation operations on your program's text source.

Prelude, why meta?

To illustrate this concept, lets see how one might add syntax to create a constructor for a dynamic array in elixir.

First, some background. Elixir is a (mostly) functional language with (mostly) immutable datastructures, it doesn't encourage the use of
or provide a dynamic array out of the box like most functional languages, as the implementation of one
requires random access read/write via mutable state. Nor does it have "constructors", a typical pattern is creating structured data returned from
a function and "piping" it through several other functions:

defmodule MyApp.Array do
  alias MyApp.Array

  defstruct field: nil

  def new(options \\ []) do
    %Array{field: options}
  end
end

iex(1)> MyApp.Array.new() |> do_stuff() |> do_other_stuff()
Enter fullscreen mode Exit fullscreen mode

For this example, we're going to piggyback off the rust standard library's Vector by
creating a foreign function interface in elixir and alternatively a data structure implemented in the erlang stdlib in order to re-create something like vec!

As we'll see the "backend" implementation of the data structure is not important, the fact that it's in rust or erlang doesn't matter, what we're focused on is providing an easy to use syntactic abstraction
of a common datastructure.

Here's a simplified version pulled straight from the rust book of vec!:

#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}
Enter fullscreen mode Exit fullscreen mode

Here we see a pattern match ( $( $x:expr ),* ) like our humble eval('2 + 2') instead of representing the computation as a string, it's a tree like data-structure
where we can assert at compile time, if some code looks like what we think it looks like, replace it with what's in the match arm,
a process known as macro expansion.

In elixir, we can write something similar, the pattern match is a three-element style tuple[5]:

{node, execution_context or meta_data, arguments}

Go and ruby share some superficial similarities as their metaprogramming api doesn't give you direct access to the AST. In ruby libraries like RSpec,rails router and erb html templates often use metaprogramming techniques via "monkey patching" -- modifying at runtime various
properties of an object[6] and since in ruby's extremely dynamically typed[7] interpreted world there is no notion of "compile time expansion" instead you have powerful introspection and malleability at runtime giving rise to patterns like hooks[8] to alter nearly almost anything about the language via object properties, syntax or not.

Take this small excerpt[9] from rspec-core of the describe public api:

# @private
def self.expose_example_group_alias(name)
  return if example_group_aliases.include?(name)

  example_group_aliases << name

  (class << RSpec; self; end).__send__(:define_method, name) do |*args, &example_group_block|
    group = RSpec::Core::ExampleGroup.__send__(name, *args, &example_group_block)
    RSpec.world.record(group)
    group
  end

  expose_example_group_alias_globally(name) if exposed_globally?
end
Enter fullscreen mode Exit fullscreen mode

There's alot happening but the important thing to note is RSpec::Core::ExampleGroup is an object that is being modified at the test-runner's runtime which specifies the linguistic structure of the domain's specific language of what describe means.

In go like ruby we have reflection that allows runtime introspection, unlike ruby it is statically typed and compiled. Reflection gives a temporary "escape hatch" out of the rigid type system and allows modification based on dynamic interfaces the most idiomatic example of this I can find are the printing family[10] functions like fmt.Sprintf.

func (p *pp) doPrint(a []any) {
 prevString := false
 for argNum, arg := range a {
  isString := arg != nil && reflect.TypeOf(arg).Kind() == reflect.String
  // Add a space between two non-string arguments.
  if argNum > 0 && !isString && !prevString {
   p.buf.writeByte(' ')
  }
  p.printArg(arg, 'v')
  prevString = isString
 }
}
Enter fullscreen mode Exit fullscreen mode

Building a (Dynamic) Array "constructor" in Elixir

Now, let's get hands on. Everything here lives in a mix project called ExVec and defining the macro's public api:

defmodule ExVec do
  alias ExVec.{Array, Vector}

  defmacro __using__(implementation: impl) do
    quote do
      import unquote(__MODULE__)
      Module.put_attribute(__MODULE__, :implementation, unquote(impl))
    end
  end

  defmacro vec!([_h | _t] = args) do
    quote bind_quoted: [args: args] do
      dispatch_constructor(@implementation, args)
    end
  end

  defmacro vec!({:.., _, [first, last]}) do
    args = Range.new(first, last) |> Enum.to_list()

    quote bind_quoted: [args: args] do
      dispatch_constructor(@implementation, args)
    end
  end

  def dispatch_constructor(impl, args) do
    case impl do
      :rust -> Vector.new(args)
      :erlang -> Array.new(args)
      _ -> raise "invalid constructor type, did you mean :rust?"
    end
  end
end
Enter fullscreen mode Exit fullscreen mode

The ex_vec library has two backends ExVec.Array which is a thin wrapper around :array and ExVec.Vector which is a NIF wrapper that leverages rustler's Encoder and Decoder to encode an elixir List as a Vec then implementing interfaces for what an array might look like in elixir by defining:

  1. The Access behaviour
  2. A protocol implementation of Enumerable

By implementing these specifications we can safely use things from stdlib like Enum and even Stream and just like that in any other elixir project
and letting the client choose the backend while keep the macro's syntax:

defmodule MyApp.DoStuff do
  alias ExVec.Vector
  use ExVec, implementation: :rust


  def len do
    # serialised as a rust Vec<i32>
    vec!(1..4) |> Enum.count()
    vec!([1, 2, 3, 4]) |> Enum.count()

    # plain old linked-list
    [1, 2, 3, 4] |> Enum.count()
  end

  def random_access do
    # O(1) read
    my_array = vec!(0..10)
    my_array[5]

     # serialised random write access
    Vector.get_and_update(my_array, 0, fn n -> {n, 42} end)
  end
end

defmodule MyApp.DoOtherStuff do
  use ExVec, implementation: :erlang

  def len do
    # this is an erlang :array!
    vec!([1, 2, 3, 4]) |> Enum.count()
  end
end
Enter fullscreen mode Exit fullscreen mode

unfortunately as of the time of this writing, rustler does not support generic type intefaces so I
guess this is impossible?

#[derive(Debug, NifStruct)]
#[module = "ExVec.Vector"]
pub struct Vector<T> {
   fields: Vec<T>,o
   size: isize
}
Enter fullscreen mode Exit fullscreen mode

Therefore a serious limitation of this toy example is it only works for i32 integers :) I also glossed over some behaviours and protocols with defaults.

You can find the full source for this example here, please let me know if you have a comment, found a bug or typo. Thanks for reading!

References

[1] Python3's excellent ast library: https://docs.python.org/3/library/ast.html

[2] RubyVM::AST : https://ruby-doc.org/core-trunk/RubyVM/AST.html

[3] Javascript(since ECMAScript6): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Reflect

[4] Typescript: https://basarat.gitbook.io/typescript/overview

[4] Go's AST : https://pkg.go.dev/go/ast

[5] Elixir's AST: https://github.com/elixir-lang/elixir/blob/d8f1a5d6b653c14ae44c6eacdbc8e9df7006d284/lib/elixir/pages/syntax-reference.md#the-elixir-ast

[6] The one true (useful) object to rule them all: https://ruby-doc.org/3.2.1/Object.html

[7] Ruby Extensions: https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Basic+Knowledge

[8] Awesome example of the hook pattern into ruby's object lifecyle: https://github.com/rspec/rspec-core/blob/main/lib/rspec/core/hooks.rb

[9] RSpec public DSL module: https://github.com/rspec/rspec-core/blob/main/lib/rspec/core/dsl.rb

[10] doPrint: https://cs.opensource.google/go/go/+/refs/tags/go1.20:src/fmt/print.go;drc=261fe25c83a94fc3defe064baed3944cd3d16959;l=1204

[11] Just in Time compilation: https://en.wikipedia.org/wiki/Just-in-time_compilation

Top comments (0)