Renato de Pontes Pereira

Posted on Jul 15, 2023 • Originally published at r2p.dev

Designing a SHT Language

#programming #language #functional #python

Introduction

I am a programmer by trade and by hobby, and as such, most of my time is spent in the front of the computer, immersed in code. On a daily basis, I have to work with numerous tools, technologies, techniques and processes, which often demand a lot of studying and catching up.

However, I never gave too much attention to the main tool of every programmer - the language itself. I mean, I read about theory now and then, and I surely have some strong opinions about them, but I never learned the why's and how's of their internals.

This is becoming crucial to me because I want to be more effective in work, but with the rise of the modern compiled programming languages, such as Go and Rust, I began feeling like a relic in the field.

So, how does one gain deeper knowledge in programming language theory? Reading documentations? Technical blogs? Asking ChatGPT? Nope, not for me. The best way to learn is by designing my own language, hit myself in the face with all the problems people have been solving in the past few decades, regret my decisions, and then reading documentations, technical blogs and appealing to chat bots.

Thats why I am working on my own language: to learn in first hand all the difficulties we got when designing and implementing a language.

So, SHT is a dynamic procedural scripting language that I’m designing to be concise, expressive, intuitive and intentionally direct. Just like every other person that design their own language want.

Should you use this SHT language? If want a well-designed, well-maintained, well-tested, well-documented, fast and safe tool, my answer is NO - if the name isn't clear enough. But if you are a big nerd and want to have some fun learning some tool that will never be useful to you, give it a try. Also feel free to contact me to discuss design and implementation choices.

Through remainder of this article I will present some high-level decisions and overall guidelines and then give an overview of some key features of the language. I'm not intending to go deep into technical details neither bringing too much of design discussion, which I hope to address in future articles.

General Decisions and Directions

Ok, so I want to create my own language, now what? The possibilities are infinite, specially for someone clueless like me.

To reduce the number of possible choices, I decide to create a language that would be simple syntactically, inspired by Go and Python, that is expressive inspired by functional languages, in particular Haskell and F#, and with some few ideas I would like to try it out, mainly inspired by Rust.

Given this general direction, the first thing that comes to my mind is Iterators. I love how iterators bring clarity - abstracting the internal mess of their data structure and providing a simple thing: a list of elements. Since this is the main intersection point of my references, I decided to give them the protagonism of the language.

Since this is my first language (it's actually the second, but the first turn out to be a real mess too quickly), I judged that would be easier to have a minimal type control and only during runtime. I don't know yet if that is entirely true, but I hope it alleviate the analysis I would have to perform in strong typed language.

Similarly, I decided to create a dynamic language because I believe it would be easier to execute the statements than translating them into byte codes. I am also avoiding transpilation to intermediate representation or other languages because I want to learn the difficulties that is the evaluation step.

Notice that I am not concerned about speed, I am using a tree walk approach for its simplicity without any kind of optimization, field of study that is not of my interest. To reduce complexity and scope, I won't think about asynchronous execution either.

The syntax family is closer to Go than Python, because I believe there are some inherent problems with identation-based scopes that I wouldn't like to address, but Python syntax would fit in the language as well. Additionally, I would like to add some functional touches that I believe are elegant syntactically

At last, I will follow a procedural paradigm because object orientations is crap ;)

Features

Types and Variables

The primitive types are simple and generic, there are numbers, strings, functions, errors etc. Variables can be defined using Go’s expression, and can be reassigned to any other value.

# comments are like python
# declarations are like go
number := 1 + 1.5 - 5e-1
string := 'Hello, World'
boolean := true or false
tuples := 1, 2, 3
lists := List { 1, 2, 3 }
hashs := Dict { a:1, b:2 }

# variables can be reassigned
number = 'well, not a number'
a,b := 1,2
lists[0], lists[1] = lists[1], lists[0]

I thought about blocking assignment to other types, but that would add an annoying constraint, specially because there is no null value in the language and some situations would require a lot of additional work, for example, how to handle empty blocks and functions? My solution is quite ugly, but is simple: every situation that would result in null or undefined, like empty blocks, the default value will be false.

I removed null purposefully because I want to see the impact of not having the major offender of runtime errors in any language in history. This will impact the error handling and custom data types, as described later, but I doubt it will make the language safer since the types are dynamics.

Functions are naturally first class and there are two types of functions: scoped and no scoped (arrow functions), similar to javascript:

fn add(a, b) {
  return a + b
}

fn scaler(x) {
  return (a, b) => add(a, b)*x
}

scaler(2)(3, 1) # 8

The returns are optional. Functions also accepts spread arguments that are converted to lists automatically, and default arguments.

fn headTail(h=false, ...b, t=false) {
  return h, t
}

headTail(1, 2, 3, 4) # (1, 4)

Operators are simple:

# Arithmetic
1 + 2
2 - 3
4 * 5
6 / 7
8 // 9 # int division
10**2
3 % 4

# Boolean
a or b
b and c
!d

# Concat (forcing string convertion)
a .. b .. c

# Meta function dependent
a in b
a is b

I planned - and implemented - xor, nand, nor and nxor too, then I realized that and and or operator usually have an special property: they only evaluate the next value if that value will change the expression result (e.g. false and f() won't evaluate f() because the first value is false), and they may not return booleans, but the elements of expression (for example, 1 or 3 would return 1).

Error Handling

This is a topic that annoys me a lot in most languages. I believe the “try catch” constructs are archaic and only make code harder to read. Go does a better job using the tuple (response, err) , but the amount of ifs err == nil is ridiculous, the same annoyment for functions that return an ok value for conditions.

I would like to test a mechanism that I'm calling “wrapping” and “unwrapping”:

file := fs.open('invalid')?
if file! as err return err

Every time you need to check for errors, you can wrap the expression by adding a “?” symbol in the end. The result - error or value - will be wrapped into a Maybe object, which can only be retrieved by unwrapping it.

The unwrapping is performed by the postfix symbol “!”, which, returns the error or false when used, and converts the maybe object to its real content - the error or the value.

This is the only way to keep errors from propagating, and together with other syntactically fancy constructs, I believe the code will be much cleaner to read.

fn readConfig(filename) {
  file := fs.open(filename)?
  if file! return false

  data := json.load(file)?
  if data! return false

  res := validate(data)?
  if res! return false

  return data
}

I thought that I could force some functions return the Maybe object by default, for example this fs.open would be defined as fn open(name)? { ... }, but then it would hide this behavior from the user.

I would love to see how this would work out in a typed language that could validate this expression in compiler time, but I believe statical analysis in the dynamic language could do the job too.

Iterators

Let's talk about the star of the language: Iterators.

Iterator is an interface to access and traverse the elements of a collection of objects, which is commonly associated to lists and strings. In SHT, you can iterate over most primitive types such as numbers and booleans, which will generate a single value: themselves.

SHT iterators have a next function that could be called manually and will return an Iteration type, containing the value of the iterations or the flags for error or done. However, this behavior is transparent most of the time. The easiest way to create iterators is by declaring generator functions:

fn oneTwoThree {
  yield 1
  yield 2
  yield 3
}

iter := oneTwoThree()
iter.next() # Iteration<(1,)>
iter.next() # Iteration<(2,)>
iter.next() # Iteration<(3,)>
iter.next() # Iteration<Done>

Functions with yield statements returns an iterator by default, without executing its body at first, then, for each next call, the body will run until it finds a yield or the end of the function. Notice that by this behavior, iterators are processed element-wise and are lazy by default.

The iterator is scoped to the function call alone, so multiple call to a generator will result in multiple iterators that are independent.

Iterators can be “spreaded out” by using "...", resulting, by default, in a tuple:

a := 'sht'
a... # ('s', 'h', 't')

Notice that the spread out operator forces the iteration of the target object.

Now the fun part: iterators can be piped into pipe expressions:

data LinkedNode {
  val = 0
  prev = false
  next = false
}

head, tail := range(100)
| filter num: num % 2 == 0
| map num: LinkedNode { value = num }
| window(2) prv, nxt: { prv.next = nxt; nxt.prev = prv; }
| firstLast head, tail: head[0], tail[1]

Let's break it down:

The first line is a function that returns an iterator, which generates number from 0 to 100.
The second line starts with a pipe, forcing the previous expression to return an iterator. Internally it means that the eval will call a meta function “iter” on the previous object. Since the previous object is already an iterator, it will simply return itself.
The range iterator is then passed as the first argument to the filter function.
Following the filter function is a new function declaration in the form param[, param]*: <expression>, notice that the function declaration itself is optional.
The filter itself returns an iterator that will be used by the next pipe.
The window function receives additional parameter, internally it would be equivalent to window(prev_iterator, 2, (prv, nxt) => { ... }).
By the end there is a firstLast function, that will get the first element of the previous iterator and the last, forcing the previous iterator to complete. Since the elements are a tuple (prv, nxt) generated by the window function, we return the first element of head and last element of tail to get the first and last node, respectively.

My idea is that, by using iterators and pipe expressions, the user is forced to show intentions clearly and directly, avoiding the per-application level of abstractions - thus an indirection of intent - often caused by design patterns.

That's the reason I decided to use a clean syntax for pipes, without inserting too many symbols.

Custom Data Types

I've experimented a lot of options beforehand to see what would fit in the language and, to be honest, nothing stepped up. I believe the most interesting ideas in recent languages is coming from static languages but nothing really apply to dynamic languages properly. So I went to a somewhat comfortable position.

SHT custom data types simply defines abstract data types (ADTs), i.e., a bunch of variables and operations over these variables. To define a new data type you can use:

data MyADT {
  property = 'default value'

  fn say(this) {
    print(this.property)
  }
}

This is a definition of properties and functions, so you cannot execute statements inside a data definition. The definition of properties should come with a default value. Since we don’t have null, I judged it would be important to have an explicit default value for them.

Also notice that functions that will work over instances should contain a special keyword this as the first parameter, which tells the interpreter that this function needs a reference of the object, and can only be called on instances of that type. Naturally, functions that does not contains “this” as first parameter is considered a static function and can only be called in the type itself.

Initialization

The initialization of the instance is a bit fancy:

data Vector {
  x = 0
  y = 0
  id = randomName()

  on new(this, x=0, y=0) {
    this.x = x
    this.y = y
  }
}

v1 := Vector(3, 2) {
    id: "My Vector"
}
v2 := Vector(3, 2)
v3 := Vector { id: 'Other vector' }

A data type can be initialized by using parenthesis and/or braces. They have a slightly different behavior. Whe SHT initializes the Vector object, the first thing it does its to check the values in braces, if they are available. This forces the data to use these values without evaluation the default values provided in the property definition, thus, randomName() function will never be executed in v1 and v3. However, if the property is not declared inside the braces, the default expression will be used. Parenthesis call the new meta function after the instance have be created and initialized. So you can add other initialization logic inside, such as database connections.

Meta programming

Meta programming is similar to python, you just declare a special function inside the data definition, and the interpreter handles everything else. Instead of using special names, I decided to change the function keyword, so, to declare a meta function, you must use on instead of fn. This will help checking if there is any invalid meta function declaration.

A data type can implement the meta functions described below, following the expression they are hooked:

call: v1()
set: v1.x = 3
get: v1.x
new: Vector()
index: v1[0]
iter: v1...
pipe: iterator | v1
bool: if v1 { ... }
string: 'Vector is:' .. v1
repr: — used in the REPL
eq: v1 == v2
neq: v1 != v2
add: v1 + v2
… all other arithmetic operators

Most of this can be implemented in the Type itself (just omit the this parameter), so they give us opportunities to be very flexible in how the type system works. However, these are specially useful when you think about Iterators and how they can interact inside pipe expressions. An additional usage for these meta functions is to create proxies, mocks, and stubs.

Reuse

This topic is my least concern about data types, so I’m not giving to much thoughts about it. The general idea is copying another data type definition inside its own, without child-parent relation of OO languages.

data Vector3 like Vector2 {
    z = 0
}

Control Flows

Nothing really new here, but I believe it worth noting some points that I’m adding to keep language less verbose:

if <cond> return <exp>
if <cond> raise <exp>
if <cond> yield <exp>
if <cond> {
  <exp>
}

match <value> {
  <exp>: <exp>
}

for { <exp> }
for <cond> { <exp> }

pipe <iterator> as <var> { <exp> }

Notice that pipe loops is the iterator version of for’s, similar to how Python’s for works. I decided to create a new keyword because I couldn't find a good way to avoid ambiguity in for's, for example, for var in iterator {} wouldn't work because in is already a meta operator keyword and would fit the for <cond> {} statement.

Examples

FizzBuzz

for i in range(100) {
  match (i%3, i%5) {
    (0, 0): print('FizzBuzz')
    (0, _): print('Fizz')
    (_, 0): print('Bizz')
    (_, _): print(i)
  }
}

Fibonacci

# Recursive
fn fib(n) {
  if n < 2 return n
  return fib(n-1) + fib(n-2)
}

# Generator
fn fib(n) {
  i := 0
  i, a, b := 0, 0, 1
  yield a
  yield b
  for i < n {
    a, b = b, a+b
    yield (i, b)
  }
}

# or use the default generator
math.fibonacci()
| takeWhile x: x < 10000

Sum of multiple of 3 and 5

v := range(3, 1000)
| filter n: n%3 == 0 or n%5 == 0 
| sum

print(v)

Conclusion?

This SHT language is still under development, after finishing the basic features and builtin modules, I would like the use a bit in my daily automation tasks to have a better feel of its state. I also will try to solve coding challenges, such as Euler Challenges, to check it out too.

If you are interested, check the repository in github. Feel free to contact me for any question or discussion :)

DEV Community