Brian Neville-O'Neill

Posted on Jul 8, 2019 • Originally published at blog.logrocket.com on Jan 2, 2019

Using TypeScript transforms to enrich runtime code

#ast #typescript #rtti #type

When I started developing no one was talking about “compiling” their JavaScript sources. Everyone just wrote a couple of scripts and referenced them directly.

Over time, minification and further enhancements became the de-facto standard in web development, so doing some processing after having written the code became a fairly normal practice. These days, everyone accepts that front-end web development requires some processing, mostly using a bundler that helps us with taming the web beast.

As far as JavaScript “processors” are concerned there are two major streams. We either use TypeScript or Babel with some plugins. Some people may actually use TypeScript and Babel (or these days exclusively Babel for their TypeScript code). However, usually, you only see one or the other depending on if the project is using TypeScript or something else that can be transpiled using Babel. Of course, there are other languages and tools available but these two are the dominant ones.

Babel is a success story. It started with a very simple mission; opening a JavaScript parser for extensibility. Over time, the ecosystem grew and became the standard tool for providing compatible output to our target browsers without any restriction on what features to use during development.

Great! Even better, it also fostered some metaprogramming and syntactic sugar on top of standard JavaScript, such as JSX, which enabled writing elegant frontend code in React or similar UI libraries and frameworks.

So far it seems that this greatness is exclusive to Babel, however, this was changed by the introduction of TypeScript transforms. Now, TypeScript can be enhanced externally. In this post, we’ll see why (and when) extending TypeScript makes sense and how to do it.

Motivation

Even though TypeScript took a different approach than Babel it also became quite the success story. Initially, many people doubted the aspirations of Microsoft in the JavaScript world. However, due to its open source nature and its problem-solving approach (“starts with JavaScript and ends with JavaScript”) people started transitioning pretty quickly for many projects.

Personally, I’ve been using TypeScript for any JavaScript related project, however, I can understand why many people would only use it for larger projects.

The opponents of TypeScript usually give us the following two arguments against using it:

The guarantees that are made by TypeScript are superficial at best — they can be easily circumvented and are too limited to work with
All of the work you put into creating your typings is lost at compile time

While the former is a side effect of TypeScripts very strong (100%) compatibility with “ordinary” JavaScript (i.e., here a trade-off has been picked and the trade-off has been very well chosen), the latter is definitely valid.

Currently, there is no way to bring in some metadata (type system) information from compile time to runtime. There are many reasons why we would be interested in such a runtime type information ( RTTI ) mechanism despite the dynamic nature of JavaScript. Primarily, it would allow us to check types from foreign code at runtime (e.g., incoming data in APIs of service endpoints, the JSON response from a request).

Complaining about TypeScript’s lack of RTTI won’t help us much. Instead, we’ll try to use TypeScript transforms for providing RTTI on demand. Ideally, code can then look something like this:

import { generateRtti } from 'typescript-rtti';

interface MyType {
  id: string;
}

generateRtti<MyType>();

where the additional information can then be retrieved like this:

import { obtainRtti } from 'typescript-rtti';

// get type information as JSON schema
const schema = obtainRtti('MyType').schema;

// get validation results for an instance
const errors = obtainRtti('MyType').validate({ id: 7 });

Just to be clear — in this post, we will only focus on the first part, the second part (exposing the RTTI) will not be discussed.

So let’s try to approach a solution for generating the RTTI by introducing TypeScript transforms.

TypeScript transforms

A TypeScript transform is just a simple function that is defined as follows:

import { TransformationContext, Transformer, SourceFile } from 'typescript'

interface TypeScriptTransform {
  (ctx: TransformationContext): Transformer<SourceFile>;
}

Essentially, what we get is some context information to return a function that could be used for any kind of transformation on a source file. Here is an example of such a function:

function sampleTransformer(ctx: TransformationContext) {
  return (sf: SourceFile) => visitNode(sf, visitor(ctx, sf, opts))
}

Before we move on, we need to look at what we transform. Since TypeScript is a compiler (or transpiler) we are dealing with data structures handled by a compiler. A compiler pipeline on the high-level consists of a frontend (making sense of source code and transforming it to the core data structures) and a backend (applying transformations such as optimizations to the data structures and serializing them in an output format).

To reach the core data structures (usually available in the form of a tree, called the abstract syntax tree or AST ), some transformations have already been applied:

Transform a linear stream of characters into a linear stream of well-defined tokens
Transform the linear stream of tokens into a tree structure obeying the language’s grammar

This is also illustrated here:

Data transformations (pipeline) for parsing source code

Using TypeScript transforms we can only have a backend transformer allowing us to perform modifications on the existing AST.

AST basics

The AST is a tree that consists of nodes representing groups of syntax information and edges representing their relationships. For instance, a trivial example code like this:

const a = 3 + 4;
console.log(a);

transforms into an AST like this:

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "kind": "const",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "a"
          },
          "init": {
            "type": "BinaryExpression",
            "operator": "+",
            "left": {
              "type": "Literal",
              "value": 3,
            },
            "right": {
              "type": "Literal",
              "value": 4,
            }
          }
        }
      ]
    },
    {
      "type": "ExpressionStatement",
      "expression": {
        "type": "CallExpression",
        "callee": {
          "type": "MemberExpression",
          "computed": false,
          "object": {
            "type": "Identifier",
            "name": "console"
          },
          "property": {
            "type": "Identifier",
            "name": "log"
          }
        },
        "arguments": [
          {
            "type": "Identifier",
            "name": "a"
          }
        ]
      }
    }
  ]
}

Here we’ve chosen the so-called ES Tree specification for representing the output in JSON form. There are two things, which are common among all AST implementations:

Since there are many different nodes we need a type discriminator to distinguish between the different types of nodes. Usually, the related property is called type
The root node is the program itself, which has a body consisting of all statements. A statement is composed out of expressions. Expressions do compute things and can be composed, statements are instruction only and do not allow composition (only aggregation, e.g., in a program, a block statement, or function body)

The most effective way to work with an AST is to go back to the classic software design patterns and use the visitor pattern. This pattern allows us to traverse a tree without having to implement/know all of the information in the tree.

Consider the example tree above. In order to traverse it without the visitor pattern, we would need to know that:

A binary operation has two child nodes, one sitting in left and one in right
A variable declarator has two child nodes, one in id and one in init, the latter being optional
A variable declaration has an array of declarations inside exposed by the declarations property
There is an expression statement (consisting of only an expression without any other instruction), which contains the expression in its expression property
A call expression consists of two properties callee (expression to define the called value) and arguments (array of expressions to compute the arguments)
A member expression consists of two expressions contained in object and property

This is just a simple example, but it already highlights that a lot of information would be required to traverse it. Generalizing this would require a huge effort which is mitigated by using the visitor pattern.

For instance, the following code would be all that is necessary to change every identifier related to a into b:

tree.visit({
  Identifier(node) {
    if (node.name === 'a') {
      node.name = 'b';
    }
  },
})

This example code illustrates that having a visitor function being applied on the tree will reduce the code on our side to only the part that we are interested in. Instead of doing all of the foundational work of traversing the tree we can focus on what should happen to identifier nodes only.

Real-world implementations may look different. For instance, the one exposed in TypeScript is used as follows:

import * as ts from 'typescript';

function visit(ctx: ts.TransformationContext, sf: ts.SourceFile) {
  const visitor: ts.Visitor = (node: ts.Node): ts.VisitResult => {
    // here we can check each node and potentially return 
    // new nodes if we want to leave the node as is, and 
    // continue searching through child nodes:
    return ts.visitEachChild(node, visitor, ctx);
  };
  return visitor;
}

export default function() {
  return (ctx: ts.TransformationContext): ts.Transformer => {
    return (sf: ts.SourceFile) => ts.visitNode(sf, visit(ctx, sf))
  }
}

For performance reasons, a TypeScript visitor requires helper methods to trigger the traversal.

A simple transformer

The first step for us is to create the transformer that is being exported. Since we work with types it makes sense to obtain a type checker instance from the evaluated source code. The code looks something like this:

import * as ts from 'typescript';

export interface TransformerOptions {}

export function transformer(program: ts.Program, opts?: TransformerOptions) {
  function visitor(ctx: ts.TransformationContext, sf: ts.SourceFile) {
    const typeChecker = program.getTypeChecker();

    const visitor: ts.Visitor = (node: ts.Node) => {
      // Implementation here
      return ts.visitEachChild(node, visitor, ctx);
    };

    return visitor;
  }

  return (ctx: ts.TransformationContext) => {
    return (sf: ts.SourceFile) => ts.visitNode(sf, visitor(ctx, sf));
  };
}

So we create a simple function that takes the whole program as transformation input. Furthermore, we accept some options (which we do not use at this point, but it's nice to know the location where a user-defined configuration may be passed in). Finally, we return the previously mentioned function accepting the transformation context and returning the actual source file transformer.

The implementation of the source file transformer uses the visitNode function given from TypeScript as well as our own implementation. The very specific visitor that is applied to each node still needs to be implemented. Looking at our sample code, we are interested in calls to a generateRtti function.

The following code represents a sound implementation for what we want:

if (ts.isCallExpression(node) && node.typeArguments && node.expression.getText(sf) == 'generateRtti') {
  const [type] = node.typeArguments;
  const [argument] = node.arguments;
  const fn = ts.createIdentifier('__rtti__generate');
  const typeName = type.getText();
  const typeSource = getDescriptor(type, typeChecker);
  return ts.createCall(fn, undefined, [argument || ts.createStringLiteral(typeName), typeSource]);
}

If we see a node that is a call expression to generateRtti with some type arguments we transform the node to become:

__rtti__generate(name_of_the_type, descriptor_of_the_type);

At this point two questions should arise:

What does the implementation of the getDescriptor look like?
Where does the __rtti__generate (or whatever name we give this function) come from?

Let’s try to answer these questions one by one. First, the implementation of getDescriptor needs to work against all the types we expect. Most importantly, it needs to deal with type references and interfaces. In our example above we got a type reference, which is bound to an interface.

A simple implementation may look something like this:

function getDescriptor(type: ts.Node, typeChecker: ts.TypeChecker): ts.Expression {
  switch (type.kind) {
    case ts.SyntaxKind.PropertySignature:
      return getDescriptor((type as ts.PropertySignature).type, typeChecker);
    case ts.SyntaxKind.TypeLiteral:
    case ts.SyntaxKind.InterfaceDeclaration:
      return ts.createObjectLiteral(
        (type as ts.InterfaceDeclaration).members.map(
          (m): ts.ObjectLiteralElementLike => ts.createPropertyAssignment(m.name || '', getDescriptor(m, typeChecker)),
        ),
      );
    case ts.SyntaxKind.TypeReference:
      const symbol = typeChecker.getSymbolAtLocation((type as ts.TypeReferenceNode).typeName);
      const declaration = ((symbol && symbol.declarations) || [])[0];
      return getDescriptor(declaration, typeChecker);
    case ts.SyntaxKind.NumberKeyword:
    case ts.SyntaxKind.BooleanKeyword:
    case ts.SyntaxKind.AnyKeyword:
    case ts.SyntaxKind.StringKeyword:
      return ts.createLiteral('string');
    case ts.SyntaxKind.ArrayType:
    default:
      throw new Error('Unknown type ' + ts.SyntaxKind[type.kind]);
  }
}

For simplicity, we omitted many cases (e.g., yielding boolean) and only kept what is really necessary to make our example work. We see that primitives are output directly (e.g., a 'string' literal is given when we see a string type being used), while combined types such as an interface declaration are recursively using the getDescriptor function.

As far as the second question is concerned — there are multiple ways. We could either:

define the function in our library and add a reference to it within the import statements, or
define the function within the new module (if the function has been used/referenced)

The first one may be simple to change or play around with, however, it comes with some great problems. For once our transform is after the binding phase, i.e., we would need to take care of any module binding ourselves. Usually, this is not a big deal if we know the target (especially for ES6 modules), however, supporting all possible module systems this may be rather cumbersome. Furthermore, this will leave the runtime dependency in there, which may still bring our transformer code to the bundle (depending on the power of the used tree shaker).

Therefore, we should opt-in for the second option, which is also the one used by TypeScript. As we know TypeScript likes to generate code for introduced functions as well. We can do the same. A simple modification to our visitor function allows us to do some changes to the source file after we applied all transformations:

return (ctx: ts.TransformationContext) => {
  return (sf: ts.SourceFile) => {
    const result = { seen: false };
    const newSf = ts.visitNode(sf, visitor(ctx, sf, result));

    if (result.seen) {
      const fn = createGenerateFunction();
      return ts.updateSourceFileNode(newSf, [fn, ...newSf.statements]);
    }

    return newSf;
  };
};

The value of the seen property should be changed within our visitor if we need to generate the function later. The function itself will (in this example) be just placed on the top of the file.

Due to hoisting, we could also place it at the bottom without any negative impact. The function declaration itself is pretty unspectacular — it is only lengthy and essentially compiles to the following JavaScript code:

function __rtti__generate(typeName, typeDefinition) {
  var ctx = typeof window === "undefined" ? global : window;
  var types = ctx.__rtti__types || (ctx.__rtti__types = {});
  types[typeName] = typeDefinition;
}

More details can be seen in the sample repository on GitHub.

What our simple transformer does not cover at the moment are cyclic references. So an interface that self-references(potentially indirectly via some other referenced interface) will definitely not be caught and leads to the traditional stack overflow. Circumventing this would be possible by putting the contained interfaces at top-level as well.

Using the transformer

Right now the usage of transformers in TypeScript is unfortunately limited. There are two issues:

The tsc application does not consume/consider the transformers, e.g., via tsconfig.json
The emit pipeline for declaration files (d.ts) is independent of the transformer

Both issues are on the radar of the TypeScript team, however, at the moment do not have the highest priority. While issue #2 is not easy to tackle efficiently, #1 can be circumvented in multiple ways.

The most popular way is to provide an additional runner script. This can also be generalized with a small wrapper script being the result. Now instead of calling tsc we call ttsc. Hopefully, in the future, TypeScript supports the transforms/plugins directly letting us drop the extra dependency.

If we are mainly interested in using our transforms within a bundler such as webpack then there is some good news. Standard loaders such as ts-loader or the awesome-typescript-loader support transformers out of the box - no need for ttsc. Nevertheless, since these loaders can also work with "custom versions of TypeScript" we could also use them in conjunction with ttsc, giving us the freedom to either bundle or directly transpile.

Conclusion

Now we’ve seen how TypeScript transforms can be used and what they could bring to the table. In our example, we explored the ability to annotate the code with additional function calls to declare for which types we want to generate additional information. We stopped at the generation without looking at the creation of some API to access the contained RTTI.

The possibilities that come from such plugins are rather limitless. We’ve seen many creative and smart folks coming up with great enhancements already in the Babel ecosystem (and also in TypeScript). While using type system information may be one way, another could be a simple extension using some introspection on the used code.

As an example, there exists a plugin(typescript-plugin-inner-jsx) to put the components that have been used within a component on the component, enabling an easier (implicit) way of styling the component. Many other great TypeScript transformers exist, even though the TypeScript ecosystem will not be dependent on these like the Babel eco-system is.

What ideas do you have for TypeScript transformers? Do you think these are useful? What is currently missing that would make them much more useful? Let us know in the comments!

Plug: LogRocket, a DVR for web apps

LogRocket is a frontend logging tool that lets you replay problems as if they happened in your own browser. Instead of guessing why errors happen, or asking users for screenshots and log dumps, LogRocket lets you replay the session to quickly understand what went wrong. It works perfectly with any app, regardless of framework, and has plugins to log additional context from Redux, Vuex, and @ngrx/store.

In addition to logging Redux actions and state, LogRocket records console logs, JavaScript errors, stacktraces, network requests/responses with headers + bodies, browser metadata, and custom logs. It also instruments the DOM to record the HTML and CSS on the page, recreating pixel-perfect videos of even the most complex single page apps.

Try it for free.

The post Using TypeScript transforms to enrich runtime code appeared first on LogRocket Blog.