DEV Community

Mark Lennox
Mark Lennox

Posted on

Abstract Syntax Trees for fun and profit

Part One - an overview

This article is a cross-post from my blog

This is part one of a series of articles about abstract syntax trees and their use in javascript. The scope of this article is a quick introduction to ASTs, babel plugins and some simple 'toy' examples.

I will present the information and instruction using description, code/json and a diagram wherever possible in a bid to make the subject matter easier to comprehend for a wider range of types of learners.

Scope of this article

This is a very light introduction to abstract syntax trees and the transformation of very simple code. Future articles will deal with real-world code, investigate static analysis, and look at approaches to creating babel plugins that could be useful in your own codebase, also - codemods.

Code

A github repo accompanies this article https://github.com/mlennox/abstractsyntaxforfunandprofit

what are abstract syntax trees

These useful data structures represent the abstract structure of source code regardless of the language. This is possible because despite the syntactic differences, all languages have a very large overlap in terms of the code structure they express: variable assignment, conditions, logic branching etc.

An abstract syntax tree can be used to facilitate static analysis of code, rewriting code, compiling code (transform from one language to another), or very commonly in web development - transpiling code (transform from one language to another with similar level of abstraction ie. typescript to javascript, or es6+ to es5).

In this article I'll show some examples of simple source code presented as abstract syntax trees, and also give a working example (see the repo) by building simple babel plugins to transform basic code

Purity of abstraction

The AST examples I'll be showing are not pure abstractions as they contain metadata relating to the source code and the elements are named to reflect javascript syntax. In all respects they are abstract syntax trees and closely follow the EStree spec

The JSON representations in this article were generated by the AST explorer listed in the useful resources section below.

Useful references

Javascript AST viewer - https://astexplorer.net/

Developer docs for babel plugin development - babel plugin handbook

Babel type reference - https://babeljs.io/docs/en/next/babel-types.html

AST examples

I'll provide some examples here to help visualise the resulting structure when code is parsed into an abstract syntax tree.

The first will change instances of var to const and the second will transform an array into an object.

Simple variable assignment

If we take the simple javascript code snippet below and process it with an AST parser.

const willIt = true;

The resulting AST can be expressed in a number of ways, most usefully as JSON . The snippet of code above transformed to an AST is represented by the following JSON.

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 20,
      "loc": {
        "start": {
          "line": 1,
          "column": 0
        },
        "end": {
          "line": 1,
          "column": 20
        }
      },
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 19,
          "loc": {
            "start": {
              "line": 1,
              "column": 6
            },
            "end": {
              "line": 1,
              "column": 19
            }
          },
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 12,
            "loc": {
              "start": {
                "line": 1,
                "column": 6
              },
              "end": {
                "line": 1,
                "column": 12
              },
              "identifierName": "willIt"
            },
            "name": "willIt"
          },
          "init": {
            "type": "BooleanLiteral",
            "start": 15,
            "end": 19,
            "loc": {
              "start": {
                "line": 1,
                "column": 15
              },
              "end": {
                "line": 1,
                "column": 19
              }
            },
            "value": true
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "module"
}

The JSON is composed of a series of nodes each with a type property. The JSON below strips all but the type properties from the JSON above.

{
  "type": "Program"
  "body": {
    "type": "VariableDeclaration"
    "declarations": [
      {
        "type": "VariableDeclarator",
        "id": {
          "type": "Identifier"
        },
        "init": {
          "type": "BooleanLiteral"
        }
      },
    ]
  }
}

You'll also notice each node contains location data that refers to the position of the associated expression in the source code.

{
  "type": "VariableDeclaration",
  "start": 0,
  "end": 20,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
      "line": 1,
      "column": 20
    }
  },
}

Also note, and this is the key point of interest for us, the variable declaration signifies the kind of variable - in this case a const.

{
  "type": "VariableDeclaration",
  "kind": "const"
}

The graphical representation of the hierarchical nature of the tree is much easier to grasp.

Assigning an object

let favouriteBelt = {
  material: "leather",
  length: 40
};

The JSON in this case is much more complex. I've omitted some of the properties for clarity.

{
  "type": "VariableDeclaration",
  "declarations": [
    {
      "type": "VariableDeclarator",
      "id": {
        "type": "Identifier",
        "name": "favouriteBelt"
      },
      "init": {
        "type": "ObjectExpression",
        "properties": [
          {
            "type": "ObjectProperty",
            "key": {
              "type": "Identifier",
              "name": "material"
            },
            "value": {
              "type": "StringLiteral",
              "value": "leather"
            }
          },
          {
            "type": "ObjectProperty",
            "key": {
              "type": "Identifier",
              "name": "length"
            },
            "value": {
              "type": "NumericLiteral",
              "value": 40
            }
          }
        ]
      }
    }
  ],
  "kind": "let"
}

And the graphical representation.

You can see that the hierarchy breaks down into familiar arrangements of nodes despite the relative increase in complexity compared to the simple value assignment.

Transforming code

Hopefully, now you have some idea of what an AST looks like and how it relates to the source code. Next, I'll show how you can transform the source code using the AST. The familiar babel library provides all the tools necessary to parse, transform and re-generate source code so for simplicity the examples provided will be babel plugins.

One caveat, due to the way babel references plugins, these particular plugins can't easily be integrated into your codebase - ideally you would need a publish npm package. The alternative would be to write scripts to move the 'plugin' to a location accessible to babel.

Babel plugins and traversing the AST

Babel plugins use the visitor pattern, an abstraction which facilitates adding extra functionality to objects without requiring a refactor of the original object. The idea is that the object can 'accept' a visitor function that can alter the properties and, as we'll see, the structure of the object.

As the AST is traversed each node gets passed to the babel plugin, a simplified version of which is shown below - an object literal implementing a visitor property which consists of an object of methods named to match the node it should process. The example here has implemented a visitor that will act on all ArrayExpression nodes.

const ourCustomVisitor = {
  visitor: {
    ArrayExpression(path, state) {
      // do stuff
    },
  },
};

When the AST is traversed, data about all corresponding nodes will be passed into the corresponding handler method - the order in which they are passed in, how at what point in the hierarchy and how previous transformations might affect the code are concepts that need to be addressed with real-world code, but the simple, 'flat' examples in this article are chosen to focus on basic concepts.

It is not true to say that the each matching node itself is passed to the handler, each node handler is passed two parameters, path (which does contain the node) and state,which are detailed below.

Path

The path is an object that represents the link between nodes. As you alter the AST babel will update the paths between all nodes.

If we take the following example of an ObjectProperty and the child StringLiteral value

{
  type: "ObjectProperty",
  value: {
    type: "StringLiteral",
    value: "gummi bears"
  }
}

The path that represents the relationship between the nodes would be:

{
  "parent": {
    "type": "ObjectProperty",
      :
  },
  "node": {
    "type": "StringLiteral",
    "value": "gummi bears"
      :
  },
    :
}

In this case node is the current element being handled in a StringLiteral handler in the plugin's visitor:

{
  visitor: {
    StringLiteral(path) {
      // path is:
      // {
      //   "parent": {
      //     "type": "ObjectProperty",
      //       :
      //   },
      //   "node": {
      //     "type": "StringLiteral",
      //     "value": "gummi bears"
      //        :
      //   }
      //    :
      // }
    }
  }
}

Metadata and methods

The path also contains metadata and methods to allow deleting, adding, or updating nodes within the tree.

In the arrayToObject example in the accompanying repo we use path.replaceWith inside an ArrayExpression handler to replace the node defining an array with a node defining an object.

ArrayExpression(path) {
  // ArrayExpression has a property 'elements' that contains the array elements
  const objectProps = path.node.elements.map((element, index) => {
    return new t.objectProperty(new t.stringLiteral(`${index}`), element);
  });

  path.replaceWith(new t.objectExpression(objectProps));
}

State

This contains details of the plugin - including the visitor declaration, pre and post methods. It also contains details of the file being parsed, the state of the AST etc. These can all be accessed within the plugin. The most commonly used state property is opts.

Options

If you are running the plugin as part of your babel stack, rather than through the 'runner' in the associated repo, you can provide options to the babel plugins using your .babelrc file

{
  plugins: [
    ["customPlugin", {
      "doIt": true,
      "decorate": "paint"
    }]
  ]
}

These options will be available in the plugin from state.opts.

state.opts === {
  doIt: true,
  decorate: "paint"
}

Plugin Examples

Keep in mind these are very simple examples that use a single variable assignment meaning we don't need to worry about scope, depth of code blocks etc. Future examples in other articles will use more complex code.

A good starting template for babel plugins is shown below

module.exports = function({ types: t }) {
  return {
    visitor: {
      // add handlers here
    },
  };
};

Convert var to const

In this example, I want to build a simple babel plugin to replace any instance of var with const in the example code - only var should be affected.

// this 'var' should be replaced with a 'const'
var itemOne = ['items', 'things', 'gizmos', 'widgets'];

// this will stay a 'let'
let itemTwo = ['tchotchke', 'stuff', 'yokes'];

The AST for the itemOne variable assignment is presented below. The AST below has all the location information removed for clarity.

{
  "type": "VariableDeclaration",
  "kind": "var"
  "declarations": [
    {
      "type": "VariableDeclarator",
      "id": {
        "type": "Identifier",
        "name": "itemOne"
      },
      "init": {
        "type": "ArrayExpression",
        "elements": [
          {
            "type": "StringLiteral",
            "value": "items"
          },
          {
            "type": "StringLiteral",
            "value": "things"
          },
          {
            "type": "StringLiteral",
            "value": "gizmos"
          },
          {
            "type": "StringLiteral",
            "value": "widgets"
          }
        ]
      }
    }
  ],
  "leadingComments": [
    {
      "type": "CommentLine",
      "value": " this 'var' should be replaced with a 'const'",
    }
  ]
}

The node we are interested in is the top-level node VariableDeclaration, so lets add a handler for that in the babel plugin

module.exports = function({ types: t }) {
  return {
    visitor: {
      VariableDeclaration(path) {
      },
    },
  };
};

We need to recall that the path is not the node, but the relationship between nodes and metadata etc. To get at the VariableDeclaration node we reference path.node.

Let's have a quick look at the AST again, focussing on the point of interest for us

{
  "type": "VariableDeclaration",
  "kind": "var",
    :
}

We want to update the kind of variable declaration from a var to const. The only other valid option is of course let. Babel will let you update that to anything you like, which seems like an oversight, I'm actually not sure why they don't throw an error, or limit the values in some way.

The updated plugin that updates the variable declaration to const and ensures that only var will be affected. I've removed the types destructuring as I don't use it in this plugin.

module.exports = function() {
  return {
    visitor: {
      VariableDeclaration(path) {
        if (path.node.kind === 'var') {
          path.node.kind = 'const';
        }
      },
    },
  };
};

You can run this example yourself from the accompanying repo. Assuming you have installed the dependencies with npm install the command to run the transformation is

node compile.js varToConst vars.source.js

Try messing with the code, adding console.log to see the structure of the path, change the code in vars.source.js to see how the result is affected.

Object from Array

While this is slightly more complex than the 'var to const' example, it is still fairly simple. I'll include some diagrams to be sure the transformation is clear.

First, the source code that we will transform.

// we'll convert this from an array to an object literal
// that uses the position in the list as the key
const coins = ['thrupenny', { name: 'penny', value: 'a penny, ya dope' }, 2];

Once the transformation is complete we want to end up with the following.

const coins = {
  "0": 'thrupenny',
  "1": { name: 'penny', value: 'a penny, ya dope' },
  "2": 2
};

This means that we will need to replace the ArrayExpression with an ObjectExpression and convert each of the elements of the ArrayExpression into an ObjectProperty.

The AST of the source code is below, with some properties removed for clarity.


{
  "type": "VariableDeclaration",
  "declarations": [
    {
      "type": "VariableDeclarator",
      "id": {
        "type": "Identifier",
        "name": "coins"
      },
      "init": {
        "type": "ArrayExpression",
        "elements": [
          {
            "type": "StringLiteral",
            "value": "thrupenny"
          },
          {
            "type": "ObjectExpression",
            "properties": [
              {
                "type": "ObjectProperty",
                "key": {
                  "type": "Identifier",
                  "name": "name"
                },
                "value": {
                  "type": "StringLiteral",
                  "value": "penny"
                }
              },
              {
                "type": "ObjectProperty",
                "key": {
                  "type": "Identifier",
                  "name": "value"
                },
                "value": {
                  "type": "StringLiteral",
                  "value": "a penny, ya dope"
                }
              }
            ]
          },
          {
            "type": "NumericLiteral",
            "value": 2
          }
        ]
      }
    }
  ],
  "kind": "const"
}

Also, a simplified diagram of the AST showing each element - the ObjectExpression in the second element has also been simplified for clarity.

I'm interested in the elements of the ArrayExpression. I will take each element and construct an ObjectProperty that uses a StringLiteral of the array index of the element as the Identifier and uses the element itself as the value. Focussing on the first element in the array

// const coins = ['thrupenny', { name: 'penny', value: 'a penny, ya dope' }, 2];

  {
    "type": "StringLiteral",
    "value": "thrupenny"
  },

The index is zero, so the ObjectProperty - constructed here using babel.types - looks like

const firstArrayElement = path.node.elements[0];
const firstObjectProperty = new t.objectProperty(new t.stringLiteral(`0`), firstArrayElement);

Although the other elements are different types, the approach is the same. The elements don't need any extra processing to convert them to a different type so we can convert the Array elements into Object properties in one step, using Array.map

const objectProps = path.node.elements.map((element, index) => {
  return new t.objectProperty(new t.stringLiteral(`${index}`), element);
});

A simplified diagram of the resulting AST is shown below. The blue elements have all been created by the code outlined above:

The last step is to replace the ArrayExpression node with an ObjectExpression constructed using the new array of ObjectProperty. Luckily the path includes a number of methods to help in transforming the AST, including replaceWith(replacementNode) which swaps the current node for the node provided as a param.

Constructing the ObjectExpression is simple

const objectExpression = new t.objectExpression(objectProps);

Then I can use the replaceWith method to swap out the ArrayExpression for the new ObjectExpression

path.replaceWith(objectExpression);

Which will generate the expected result

const coins = {
  "0": 'thrupenny',
  "1": { name: 'penny', value: 'a penny, ya dope' },
  "2": 2
};

You can run this example yourself from the accompanying repo. Assuming you have installed the dependencies with npm install the command to run the transformation is

node compile.js arrayToObject array.source.js

Top comments (2)

Collapse
 
simonhaisz profile image
simonhaisz

Interesting...my exposure to ASTs has been either custom parsers or using ANTLR to analyze source code. Though now that I think about it, that's likely because when I'm doing this sort of stuff it's either a custom language or something less popular than JS so I haven't had babel or tsc to lean on.

Since this is 'Part One' and the first example replaced var with const is this series going to go in a linting/static-analysis direction?

Collapse
 
mlennox profile image
Mark Lennox

Hi Simon, thanks for the comment. I plan on looking at static analysis alright, probably by showing how cyclomatic complexity might be measured. I'm also going to look at codemods with relation to changing individual files and also refactoring a codebase. I'd also like look at how codemods can help reduce the amount of boilerplate that sometimes needs to be written. Finally, I'll eventually look at aspect oriented programming. I'm going to working on part two from this week - the others might be a while! :)