DEV Community

Cover image for Tail Call Optimization by Base Jumping off the Stack
Justin Ethier
Justin Ethier

Posted on

Tail Call Optimization by Base Jumping off the Stack

Tail calls are the only means of iteration in most functional languages. So implementation support is absolutely critical.

For example, Scheme requires tail call optimization as part of its language specification (PDF):

3.5 Proper tail recursion

Implementations of Scheme are required to be properly tail-recursive. A Scheme implementation is properly tail-recursive if it supports an unbounded number of active tail calls. A call is active if the called procedure might still return.

Over the years various solutions have been developed to meet this requirement such as trampolines, goto's, compiler optimizations, etc. One of the most unusual was proposed by Henry Baker in his paper CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A. (PDF).

Cheney on the M.T.A.

Cheney on the M.T.A. requires writing a compiler and runtime. Code is converted into continuation passing style and compiled into a series of C functions. When the program runs these functions are called like normal C functions but they never return. The stack keeps growing and growing...

The program periodically checks to see if a stack size limit is reached. When that happens a Cheney copying garbage collector is used to copy live data from the stack to the heap. Once all the live objects on the stack are copied the program calls longjmp to go back to the bottom of the stack, calls into the next function, and keep going.

One way to think of this is that instead of using a trampoline we are going to BASE jump off the stack every so often:

Image description

On the one hand this is kind of insane 😮. Programs are not supposed to work this way!

Then again, recursive calls are guaranteed to never cause a stack overflow because everything is periodically cleared from the stack. Thus Cheney on the M.T.A guarantees tail call optimization.

In fact this approach provides all the "hard" stuff that a functional programming language runtime needs - tail call optimization, generational garbage collection, and continuations.

Implementation

Here are code examples from Cyclone Scheme demonstrating how Cheney on the M.T.A works in practice.

Cyclone's runtime uses the following function to set up a new trampoline. The bottom of the stack is explicitly marked using setjmp. After a garbage collection code will resume execution on the next line after setjmp. At that point the code finds the next function - gc->cont, our continuation - and calls into it:

void Cyc_start_trampoline(gc_thread_data * thd)
{
  // Tank, load the jump program
  setjmp(*(thd->jmp_start));

  if (obj_is_not_closure(thd->gc_cont)) {
    Cyc_apply_from_buf(thd, thd->gc_num_args, thd->gc_cont, thd->gc_args);
  } else {
    closure clo = thd->gc_cont;
   (clo->fn)(thd, clo, thd->gc_num_args, thd->gc_args);
  }

  fprintf(stderr, "Internal error: should never have reached this line\n");
  exit(1);
}
Enter fullscreen mode Exit fullscreen mode

The snippet below demonstrates how C functions may be written using Baker's approach. All compiler-generated and primitive functions that call into other functions must be written in this style:

object Cyc_make_vector(object cont, object len, object fill) {
  object v = NULL;
  int i;
  Cyc_check_int(len);

  // Memory for vector can be allocated directly on the stack
  v = alloca(sizeof(vector_type));

  // Populate vector object
  ((vector)v)->tag = vector_tag;
  ... 

  // Check if GC is needed, then call into
  // continuation with the new vector
  return_closcall1(cont, v);
}
Enter fullscreen mode Exit fullscreen mode

A macro return_closcall1 is used to check if the stack limit has been reached - if so a minor collection is triggered via GC(). Otherwise another macro closcall1 is used to call into the next function:

#define return_closcall1(td, clo, a1) {
 char top;
 object buf[1]; buf[0] = a1;
 if (stack_overflow(&top, (((gc_thread_data *)data)->stack_limit))) {
     GC(td, clo, buf, 1);
     return; // Never reached
 } else {
     closcall1(td, (closure) (clo), buf);                                      
     return; // Never reached
 }
}
Enter fullscreen mode Exit fullscreen mode

And here is GC, the wrapper for Cyclone's minor garbage collector. The code retrieves locations of the top/bottom of the stack, calls into gc_minor to collect objects using Cheney's algorithm, and the calls longjmp to base jump back to the start of the trampoline:

void GC(void *data, closure cont, object * args, int num_args)
{
  char tmp;
  object low_limit = &tmp;      // This is one end of the stack...
  object high_limit = ((gc_thread_data *) data)->stack_start;

  int alloci = gc_minor(data, low_limit, high_limit, cont, args, num_args);
  ...

  // Let it all go, Neo...
  longjmp(*(((gc_thread_data *) data)->jmp_start), 1);
}
Enter fullscreen mode Exit fullscreen mode

There is enough going on it gc_minor that it is outside the scope of this article. But the code is available on GitHub if you are curious.

Wrapping Up

That's all for now. Thanks for reading! 🎉

Top comments (0)