Jude Agboola

Posted on Oct 21, 2023

Abstract Syntax Trees and Practical Applications in JavaScript

#javascript #computerscience #datastructures #babel

Abstract Syntax Tree (AST) sounds like one of those daunting computer science terms at first but it becomes more approachable once you grasp the basics. The goal of this post is to give you a gentle introduction to AST while exploring practical applications in JavaScript.

If you are trying to understand the basics of AST and its practical applications, then this article is for you. No prior knowledge of AST is assumed, and we'll take a straightforward approach to explaining the concepts.

Instead of delving into the various stages that a program goes through before execution, this article is dedicated to enhancing your grasp of ASTs and demonstrating their practical applications in your JavaScript development journey. We'll achieve this by delving into tools that heavily rely on ASTs.

To effectively follow along, a foundational understanding of JavaScript is required. We will explore various JavaScript tools and engage in hands-on coding in the later sections of this post.

Disclaimer: If you've already developed Babel or ESLint plugins, this article may not be as beneficial for you, as you're likely already familiar with the majority of the content covered here.

What is an Abstract Syntax Tree (AST)

An abstract syntax tree (AST) is a hierarchical data structure used in computer science and programming language theory to represent the syntactic structure of source code or expressions in a programming language. It is often used as an intermediate representation during the compilation or interpretation of code.

That's a lot of words, right? Let's make it simple.

Every piece of source code you write, whether it's intended for interpretation or compilation, undergoes a process known as parsing. During this process, the code is transformed into an Abstract Syntax Tree (AST), which serves as a structured, hierarchical representation of the code's underlying structure.

Having established that, let's look at some code and its corresponding AST:

Head over to astexplorer to get a clearer view

From the AST on the right, you'll notice the tree-like structure. We start out with the root node of type Module, which represents the whole file. Inside it, the body holds other nodes of type ImportDeclaration, VariableDeclaration, and VariableDeclarator, each describing a part of the code.

Here, I'm using the swc parser to turn my JavaScript code into an AST.

Please note that the AST may look a little different when you use a different parser, but the idea is the same: a tree-like structure that represents the source code.
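Since the exact output depends on the parser, here is a hand-written, heavily simplified sketch of what an AST for `const answer = 42;` could look like. The node names loosely follow the ESTree convention used by parsers such as acorn and espree (swc names the root node Module instead of Program), and real parser output carries extra fields such as source positions, which are left out here.

```javascript
// Hand-written, simplified AST for: const answer = 42;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: { type: "Literal", value: 42 },
        },
      ],
    },
  ],
};

// Every node has a `type`; child nodes hang off named properties.
const declarator = ast.body[0].declarations[0];
console.log(declarator.type); // VariableDeclarator
console.log(declarator.id.name, declarator.init.value); // answer 42
```

Reading the tree is plain property access, which is what makes ASTs so convenient for tools to work with.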

Remember that we established earlier that source code gets parsed into an AST at some point before it is compiled or interpreted. For example, platforms like Node.js and Chromium-based browsers use Google's V8 engine behind the scenes to run JavaScript, and parsing into an AST always happens before the interpreter kicks in. I looked through V8's source and found that it uses its own internal parser for this.

Why, then, do we have other JavaScript parsers like the Babel parser, the swc parser, acorn, espree and the like, when JavaScript engines have their own internal parsers?

They exist to provide a baseline for other tools to work with. For example, transpilers, minifiers, linters, codemods, language processors, and obfuscators all use a parser behind the scenes to turn your code into an AST before applying transformations or performing any analysis.

When it comes to the practical usage of Abstract Syntax Trees (ASTs), our primary emphasis in this article will be on two widely used applications: Transpilers and Linters, particularly within the context of JavaScript.

Here, we will see how ASTs play a crucial role in these applications, enabling developers to transform and analyze code effectively.

Code Transpilation

A transpiler, short for "source-to-source compiler", is a software tool that translates source code into equivalent source code, either in another language at a similar level of abstraction or in a different version of the same language. Transpilers are commonly used for various purposes, such as language compatibility, syntax conversion, and code optimization.

A frequent scenario involves syntax conversion, especially when dealing with compatibility issues. Imagine we have an application, and some of our users are on older web browsers. If we've adopted new syntax features like the Nullish coalescing operator (??), this could render our app unusable for those users. To address this, we must transform our code into an older, compatible syntax before deploying it to production. This ensures that our app remains accessible and functional for users with older browsers.

As depicted in the image above, the Transpiler begins by parsing the code into an Abstract Syntax Tree (AST). Following this, it proceeds to transform the AST as needed before finally generating code based on the modified AST.
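The transform and generate steps can be sketched without any tooling. The example below is a toy, not how Babel is implemented: it starts from a hand-written AST for `a ?? b` (the parse step is skipped), rewrites the nullish coalescing node into a conditional expression, and prints the result. The helper names `transform` and `generate` are made up for this illustration, only the top-level node is rewritten, and the rewrite is only safe for plain identifiers because the left operand ends up being evaluated more than once.

```javascript
// Hand-written AST for: a ?? b  (a real parser would produce this for us)
const input = {
  type: "LogicalExpression",
  operator: "??",
  left: { type: "Identifier", name: "a" },
  right: { type: "Identifier", name: "b" },
};

// Small helpers for building nodes.
const binary = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });
const logical = (operator, left, right) => ({ type: "LogicalExpression", operator, left, right });

// Transform step: rewrite `x ?? y` as `x !== null && x !== undefined ? x : y`.
function transform(node) {
  if (node.type === "LogicalExpression" && node.operator === "??") {
    return {
      type: "ConditionalExpression",
      test: logical(
        "&&",
        binary("!==", node.left, { type: "NullLiteral" }),
        binary("!==", node.left, { type: "Identifier", name: "undefined" })
      ),
      consequent: node.left,
      alternate: node.right,
    };
  }
  return node; // anything else is left untouched
}

// Generate step: turn the (modified) AST back into source text.
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "NullLiteral":
      return "null";
    case "BinaryExpression":
    case "LogicalExpression":
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case "ConditionalExpression":
      return `${generate(node.test)} ? ${generate(node.consequent)} : ${generate(node.alternate)}`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(generate(input)); // a ?? b
console.log(generate(transform(input))); // a !== null && a !== undefined ? a : b
```

Real transpilers handle far more node types and typically introduce a temporary variable so that the left operand is evaluated only once.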

Babel is a very common JavaScript Transpiler in the ecosystem and you may have used it directly or indirectly. We'll talk about how it uses AST in detail as we proceed.

Code Linting

Linters are another kind of tool that relies heavily on ASTs. A linter automatically analyzes source code for potential errors, style violations, and departures from best practices, helping developers identify and correct issues in their code during development.

Before a linter can perform static analysis on your source code, it begins by parsing the code into an Abstract Syntax Tree (AST). Once this parsing is complete, the linter then proceeds to traverse the AST to identify and address potential issues within the code.

ESLint is a widely adopted linter within the JavaScript community. It boasts a robust plugin system, a comprehensive library of plugins, editor extensions, and presets (which are groups of plugins) that you can easily integrate into your project. We'll be talking about ESLint in detail later in this post.

AST in Transpilers (Babel)

Now that you've seen a couple of use cases for AST, we'll talk about transpilers in detail. Specifically, we'll be using Babel and building a plugin.

Babel provides us with the toolchain to transpile our code. It has a CLI, a parser, and a plugin system, which means that you can write a plugin that applies some transformation to your code. You can also publish that plugin to npm so that anyone can install and use it.

Code transpilation isn't specific to JavaScript. You can also add a level of transformation to your CSS source using tools like PostCSS. Most languages with a fairly mature ecosystem will probably have some tools to help with code transformation.

Babel takes each of your files, generates an Abstract Syntax Tree (AST) based on your code, and passes this AST along with additional information to a Plugin. The Plugin can then apply the required transformations to the AST. After the transformations are complete, the resulting AST is converted back into code. It is important to note that without a plugin, Babel does absolutely nothing. You simply get the same code as output.

To put our knowledge of ASTs into practice, we'll write a simple Babel plugin that removes console logs from our code. Most JavaScript developers are guilty of littering their code with console.log calls while debugging. Our plugin will remove console.log calls from our source code completely.

Most of the time, you'd want to use a linter to catch console logs before you commit changes to your repository instead of just removing them at build time.

Cloning Template

As you may have noticed, this isn't a comprehensive "How to Create a Babel Plugin" tutorial, so we won't spend too much time talking about how to create a plugin. Instead, we'll begin with a template that I have designed exclusively for this article. This template is hosted in a monolithic repository, which simplifies the management of multiple packages within a single project. The template has two plugins in its plugins directory: a Babel plugin and an ESLint plugin. You can access the repository on GitHub.

Let's start off by cloning the repository:

git clone https://github.com/marvinjude/ast-and-practical-js-applications.git
cd ast-and-practical-js-applications

Check out the starter branch:

git checkout starter

This repository comprises two branches: main and starter. The starter branch serves as an empty template that we'll progressively build upon throughout this article. The main branch, on the other hand, contains every modification we make to the starter branch. Feel free to cross-reference the main branch with your ongoing updates as necessary.

Installing Dependencies

Before installing dependencies and eventually running the project, you must have Node.js and pnpm installed.

To install dependencies, run:

pnpm install

Template and Files

As mentioned earlier, our template is a monolithic repository managed by pnpm. The plugins directory contains the two packages that we'll be working on: babel-plugin-remove-console and eslint-plugin-emojify-array. I've also installed both packages in our project's root using pnpm's workspace protocol, as you can see in package.json.

We have a handful of files and folders in our project, but we'll be focusing on a few of them in this section:

src - source files to be transpiled

plugins - contains the plugins that we'll mostly be working on

.babelrc.js - Babel configuration file where we specify the plugin(s) to be used (more details below)

pnpm-workspace.yaml - pnpm workspace configuration file where we specify which directories contain our packages

Configuring Babel

Babel relies on a configuration file that allows us to customize the plugins, presets, and other settings used during transpilation. In this project the configuration file is named .babelrc.js, although other file names and formats are also supported (you can learn more about configuring Babel here).

There's a .babelrc.js file in the root of our project where we've correctly configured Babel to use our plugin.

📂 .babelrc.js

module.exports = {
  plugins: ["babel-plugin-remove-console"],
};

Writing a Babel plugin

Before we dive into writing the plugin, let's examine a code snippet that utilizes console.log and take a closer look at its corresponding Abstract Syntax Tree (AST). This should provide us with valuable insights on how to approach the development of our plugin.

Check it out on astexplorer

One great feature of astexplorer is its ability to allow you to interactively explore the Tree by clicking on or selecting code, which then automatically focuses on the corresponding AST node. For instance, when you work with a function call, like console.log, you'll notice that it's represented as a CallExpression. In our task, we aim to eliminate CallExpression nodes specifically for those that involve console.log.
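For reference, a simplified, hand-written version of the tree for `console.log(name)` is sketched below, using the node names that Babel's parser reports; position data and other fields are omitted. The `isConsoleLog` helper is hypothetical and only spells out, on plain objects, what we will be looking for.

```javascript
// Simplified AST for the statement: console.log(name)
const statement = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "console" },
      property: { type: "Identifier", name: "log" },
      computed: false,
    },
    arguments: [{ type: "Identifier", name: "name" }],
  },
};

// Hypothetical helper: is this node a call to console.log?
function isConsoleLog(node) {
  return (
    node.type === "CallExpression" &&
    node.callee.type === "MemberExpression" &&
    node.callee.object.type === "Identifier" &&
    node.callee.object.name === "console" &&
    node.callee.property.type === "Identifier" &&
    node.callee.property.name === "log"
  );
}

console.log(isConsoleLog(statement.expression)); // true
console.log(isConsoleLog(statement)); // false, this is the wrapping statement
```

The call itself is the CallExpression; its callee is a MemberExpression whose object and property are the identifiers console and log.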

Our Babel plugin, named babel-plugin-remove-console is housed within the plugins directory. It's a standard JavaScript package, and its entry point can be found at lib/index.js which is the core component of our plugin. When Babel processes your code, it invokes the function exported from this entry point and applies the specified transformations. It's time to write our plugin function! (make sure to update the file below on your local branch)

📂 plugins/babel-plugin-remove-console/lib/index.js

module.exports = function (api) {
  const { types: t } = api;

  return {
    name: "remove-console",
    visitor: {
      CallExpression(path) {
        const { callee } = path.node;
        if (
          t.isMemberExpression(callee) &&
          t.isIdentifier(callee.object, { name: "console" }) &&
          t.isIdentifier(callee.property, { name: "log" })
        ) {
          path.remove();
        }
      },
    },
  };
};

What do we have going on here?

First, Babel calls the plugin function with api and options. Let's see what those are:

api: This is the primary object provided to the plugin, and it grants the plugin access to various Babel methods, utilities, and information about the code being transformed. It contains properties like types which has a bunch of utility methods like isMemberExpression on it.

options: This is an optional parameter representing the configuration options passed to the plugin. The structure and content of the options object depend on how the plugin is configured in your Babel setup. These options allow you to customize the behaviour of the plugin based on your specific requirements.
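As a sketch of how options could be used, the variant below reads a `methods` option that decides which console methods get removed. The `methods` option and the `removeConsolePlugin` name are hypothetical and not part of the template; the tuple syntax in the commented config (plugin name followed by an options object) is how Babel passes options to a plugin.

```javascript
// .babelrc.js, passing the hypothetical `methods` option to the plugin:
// module.exports = {
//   plugins: [["babel-plugin-remove-console", { methods: ["log", "warn"] }]],
// };

// Hypothetical variant of the plugin function that reads its options.
function removeConsolePlugin(api, options = {}) {
  const { types: t } = api;

  // Fall back to removing only console.log when no option is given.
  const methods = options.methods || ["log"];

  return {
    name: "remove-console",
    visitor: {
      CallExpression(path) {
        const { callee } = path.node;
        if (
          t.isMemberExpression(callee) &&
          t.isIdentifier(callee.object, { name: "console" }) &&
          t.isIdentifier(callee.property) &&
          methods.includes(callee.property.name)
        ) {
          path.remove();
        }
      },
    },
  };
}

// In lib/index.js this function would be exported with:
// module.exports = removeConsolePlugin;
```

With this configuration both console.log and console.warn calls would be removed, while console.error calls would be left in place.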

Notice that our plugin function returns an object. The object is expected to match this shape. visitor is the most important property in the returned object; its name comes from the Visitor Pattern, a software design pattern. Babel uses visitor to determine which parts of the AST to target for modification. With this pattern, we don't have to write a tree traversal by hand to walk through the generated AST; we simply specify the node type and the transformation to apply.

visitor can consist of keys that correspond to specific node types in the AST, and the values associated with these keys are functions that define the behaviour of the plugin when it encounters nodes of those types. These functions are called with a path argument representing the current node in the AST, and they determine what modifications, if any, should be made to the code.

For example, if we could apply the same idea to a house so we can close the door to all the rooms, it would look something like this:

  visitor: {
    Room(path){
      path.node.doorMode = "closed"
    }
  }

In our case, we're visiting every CallExpression and removing it if it meets some conditions.
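To make the pattern concrete, here is a minimal sketch of a traversal that drives a visitor object. It is not Babel's implementation: the `traverse` function is hypothetical, works on plain-object ASTs, and passes the raw node to the visitor function instead of a Babel `path`.

```javascript
// Walk every node in a plain-object AST and call visitor[node.type] if present.
function traverse(node, visitor) {
  if (node === null || typeof node !== "object") return;

  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitor));
    return;
  }

  if (typeof node.type === "string" && typeof visitor[node.type] === "function") {
    visitor[node.type](node);
  }

  // Recurse into every property; child nodes are objects or arrays of objects.
  Object.values(node).forEach((child) => traverse(child, visitor));
}

// Hand-written AST for: console.log(add(1, 2))
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "console" },
      property: { type: "Identifier", name: "log" },
    },
    arguments: [
      {
        type: "CallExpression",
        callee: { type: "Identifier", name: "add" },
        arguments: [
          { type: "NumericLiteral", value: 1 },
          { type: "NumericLiteral", value: 2 },
        ],
      },
    ],
  },
};

// The visitor only names the node type it cares about.
const seen = [];
traverse(ast, {
  CallExpression(node) {
    seen.push(node.callee.type);
  },
});

console.log(seen); // [ 'MemberExpression', 'Identifier' ]
```

The visitor never mentions how the tree is walked; it is called once for the outer console.log call and once for the nested add call.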

Running Babel

Now that we've written a plugin, let's run Babel to see the effect of the plugin.

First, let us populate the files in our src directory with some code that contains console.log. Here's an example:

const name = "John W. Smith"

console.log(name)

Babel is installed in the root of our project, so we can run Babel using the command babel src --out-dir dist. For simplicity, I've added it to the build script in package.json:

pnpm build

Now, we should have all files in the src directory in dist with all console.log calls removed. yay!

Keep in mind that this is an example plugin and may not handle some edge cases correctly, so you may not want to use it on your codebase. You should use babel-plugin-transform-remove-console instead.

What other plugins can you write?

Babel has all you need to move from writing a simple plugin that removes console.log to writing much more complex plugins. Most times you may not need to write your own plugin since Babel has a huge plugin library, both official and unofficial.

Babel plugins are everywhere, from removing unwanted exports from files in Gatsby to disallowing re-exports in Next.js.

For more information about building Babel plugins, check out Kent's Babel Handbook or this awesome Babel handbook by Jamie.

AST in Linters - ESLint

Linters are indispensable tools for upholding coding standards across your codebase. Whether you aim to eradicate semicolons, champion tabs over spaces, or delve into more intricate scenarios, Linters have got you covered.

They empower you to maintain code quality, adhere to best practices, and ensure consistency throughout your projects. From simple conventions to much more intricate ones, Linters play a crucial role in enhancing your codebase's integrity.

Fun fact! There has always been a debate about whether to use tabs or spaces online. While Linters won't necessarily settle the debate, they can help teams enforce the standard they eventually agree on — hopefully, they get to agree :)

ESLint is a great linter! It has a plugin system where each plugin can define a set of rules. Behind the scenes, each rule operates on your code's AST to flag possible violations. ESLint also lets you configure, in an ESLint config, whether violating a rule results in an error or a warning.

ESLint config

ESLint relies on a configuration file that allows us to define plugins and rules to be used and their configuration. We can also use different file formats like YAML and JSON. Here, we'll use the .js format. We have a .eslintrc.js file at the root of our project and it looks like this:

📂 .eslintrc.js

module.exports = {
  env: {
    node: true,
    browser: true,
  },
  parserOptions: {
    ecmaVersion: "latest",
    sourceType: "module",
  },
  plugins: ["emojify-array"],
  rules: {
    "emojify-array/padded-emoji-array": [
      "error",
      {
        emoji: "🔥🔥",
      },
    ],
  },
};

Here, we have a minimal configuration, just enough to get things working. Each key serves a specific purpose. Let's see what each one does:

env

The env key specifies the environments where your JavaScript code will run. In this configuration:

"node: true" indicates that Node.js specific global variables are enabled.
"browser: true" indicates that browser-specific global variables are enabled.

parserOptions

The parserOptions key is used to configure options related to JavaScript parsing and ECMAScript version. In this configuration:

ecmaVersion: "latest" specifies that the latest ECMAScript version should be used, allowing you to use the most recent JavaScript features.
sourceType: "module" indicates that the code is in ECMAScript modules (ES6 modules).

plugins

The plugins key lists the ESLint plugins you want to use. Plugins provide additional rules and features. Here, we're using the plugin "emojify-array", which we'll write in the next section.

rules

The rules key defines ESLint rules and their configurations (severity level and options). ESLint plugins can have multiple rules, so we're picking the padded-emoji-array rule from our plugin and passing a severity level and some options.
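As a sketch of the accepted forms: a rule entry can be a bare severity, or an array whose first item is the severity followed by the rule's options. Severities can be written as "off", "warn" and "error" or as 0, 1 and 2. The variable names below are made up, and the entries are illustrative alternatives rather than part of the template's config.

```javascript
// Illustrative alternatives for the `rules` section of .eslintrc.js.
const warnWithoutOptions = {
  // Bare severity: report violations as warnings; no options are passed.
  "emojify-array/padded-emoji-array": "warn",
};

const errorWithOptions = {
  // Array form: severity first, then the options object passed to the rule.
  "emojify-array/padded-emoji-array": ["error", { emoji: "🔥🔥" }],
};

const disabled = {
  // 0 is the numeric equivalent of "off" (1 is "warn", 2 is "error").
  "emojify-array/padded-emoji-array": 0,
};
```

Whatever follows the severity in the array form is what the rule receives as its options.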

Writing an ESLint plugin

Let's take our knowledge about Abstract Syntax Trees one step higher by writing an ESLint plugin. This time, we're writing something really fun:)

Our plugin will define a rule that forces arrays to start and end with an emoji. We'll also make the emoji configurable so that anyone using our plugin can configure the emoji to be used.

The plugin is in the plugins/eslint-plugin-emojify-array directory. In the plugin's entry point (/lib/index.js), we define all the rules that our plugin exposes in a rules object.

module.exports = {
  rules: {
    "padded-emoji-array": require("./rules/padded-emoji-array"),
  },
};

Next, we'll create the rule module referenced above in rules/padded-emoji-array.js. This rule is responsible for ensuring that arrays start and end with an emoji, and it provides an optional configuration to customize the emoji used.

The rule module must export an object with a create function. You can also define the rule's metadata with meta and describe its options with a schema:

create function: defines the rule's behavior.
meta object: provides metadata, including the description and whether the rule is fixable.
schema: describes and validates the options the rule accepts.

Our rule module is defined below:

📂 plugins/eslint-plugin-emojify-array/lib/rules/padded-emoji-array.js

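module.exports = {
  meta: {
    type: "suggestion",
    docs: {
      description: "Make sure arrays start and end with an emoji",
      recommended: false,
      url: null,
    },
    fixable: "code",
    // Options schema: an optional object with a string `emoji` property.
    // ESLint's rule format expects the schema under `meta`.
    schema: [
      {
        type: "object",
        properties: {
          emoji: {
            type: "string",
          },
        },
        additionalProperties: false,
      },
    ],
  },
  create(context) {
    // Fall back to an empty object when the rule is enabled without options.
    const [optionsObject = {}] = context.options;

    const emoji = optionsObject.emoji || "🔥";

    function containsEmoji(value) {
      // Only string literals can contain an emoji. \p{Extended_Pictographic}
      // is used because \p{Emoji} also matches digits, "#" and "*".
      return typeof value === "string" && /\p{Extended_Pictographic}/u.test(value);
    }

    return {
      ArrayExpression(node) {
        const firstElement = node.elements[0];
        const lastElement = node.elements[node.elements.length - 1];

        // Skip empty arrays and arrays with a hole at either end: there is
        // no element to inspect or to anchor the fix on.
        if (!firstElement || !lastElement) return;

        const startsWithEmoji = containsEmoji(firstElement.value);
        const endsWithEmoji = containsEmoji(lastElement.value);

        if (!startsWithEmoji || !endsWithEmoji) {
          context.report({
            node,
            message: "Array should start and end with an emoji",
            fix(fixer) {
              // Only pad the side(s) that are missing an emoji.
              const fixes = [];
              if (!startsWithEmoji) {
                fixes.push(fixer.insertTextBefore(firstElement, `"${emoji}", `));
              }
              if (!endsWithEmoji) {
                fixes.push(fixer.insertTextAfter(lastElement, `, "${emoji}"`));
              }
              return fixes;
            },
          });
        }
      },
    };
  },
};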

Let's go straight to the key part of this module, the create function! The following steps are performed:

We access the provided options to configure the emoji that should be used (or use a default emoji, "🔥" if none is provided).
A function named containsEmoji checks whether a given value contains an emoji, using a regular expression pattern.
The ArrayExpression node type is targeted in the code's AST. We check whether the first and last elements of the array contain emojis. If they don't, ESLint reports an issue.
We have a fix function so that ESLint can automatically fix the issue by adding the emojis to the array when ESLint is run with the --fix flag.
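One detail worth knowing when writing an emoji check like this: Unicode property escapes differ in what they consider an emoji. The sketch below compares `\p{Emoji}` with `\p{Extended_Pictographic}`; the helper names are made up for the comparison.

```javascript
// \p{Emoji} includes characters such as digits, "#" and "*", because they
// can form keycap emoji sequences. \p{Extended_Pictographic} does not.
const matchesEmoji = (value) => /\p{Emoji}/u.test(value);
const matchesPictographic = (value) => /\p{Extended_Pictographic}/u.test(value);

console.log(matchesEmoji("🔥")); // true
console.log(matchesEmoji("1")); // true
console.log(matchesPictographic("🔥")); // true
console.log(matchesPictographic("1")); // false
console.log(matchesPictographic("banana")); // false
```

A rule that relies on `\p{Emoji}` alone would therefore treat a string such as "1" as if it contained an emoji.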

Harnessing the power of Abstract Syntax Trees, ESLint can serve as your watchful guardian to help detect potential issues within your codebase. It caters to a spectrum of use cases, ranging from straightforward checks like the one we've just explored to more practical and intricate scenarios, such as prohibiting client components from using asynchronous functions in Next.js or enforcing the Rules of Hooks in a React project using eslint-plugin-react-hooks.

Running ESLint

In this section, we'll run ESLint against our code in two ways. First, we want to list potential errors and warnings; next, we want to fix them. I've defined two scripts in package.json, lint and lint:fix. You should check package.json to see the actual commands behind the scripts.

lint - Runs ESLint on our src directory and shows all warnings and errors in our files.

lint:fix - Fixes errors in our src directory using the fix function defined in our ESLint rule.

In one of the files in the src directory, I'll add an array without an emoji at the start and end, then run the lint script. The command should exit with an exit code of 1 after listing all errors and warnings.

Since our ESLint rule defines a fix function, we can run lint:fix to fix the error automatically.
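As a sketch of what the fix does, assume a hypothetical src file containing the array shown in the first comment, with the "🔥🔥" emoji from our config. The fixer inserts the emoji before the first element and after the last one:

```javascript
// Before the fix (reported by the rule):
// const fruits = ["apple", "banana"];

// After running the lint:fix script, assuming the configured emoji is "🔥🔥":
const fruits = ["🔥🔥", "apple", "banana", "🔥🔥"];

console.log(fruits.length); // 4
```

The original elements are left untouched; only the two emoji strings are added.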

ESLint Editor Extensions

Aside from the standard output you get when you run ESLint, you can take the experience further by installing the ESLint extension in your editor. With the extension installed, you get errors, warnings and suggestions right in the editor. In our case, the editor should flag the offending array with our rule's message.

Conclusion

In conclusion, Abstract Syntax Trees (ASTs) may initially seem daunting, but they become more approachable once you grasp the basics. This post aimed to provide a gentle introduction to ASTs while exploring their practical applications in JavaScript.

Whether you are a newcomer to the concept of AST or already have some familiarity with it, I hope this article has shed light on its significance and use cases. While our examples mainly revolved around JavaScript and related tooling, it's worth noting that AST concepts can be applied to various programming languages.

Amongst many other practical utilizations of ASTs, we've focused on two common applications: Transpilers and Linters.

The world of ASTs is vast, and while we touched on a few applications, there's much more to explore. You can expand your knowledge of what we've learned so far by figuring out what problems you can solve using these tools.

Thanks for reading!

Other Resources

babel plugin handbook by Kent C. Dodds
The super tiny compiler by Jamie

Top comments (4)

Yeom suyun • Oct 21 '23

I just worked on an ESLint plugin, but I think espree's AST is not very suitable for linting.
I was able to complete the work using a few tricks, but it took me much longer than necessary.
I also implemented AST directly to create a VSCode extension, and AST specialized for needs is definitely more convenient.

Jude Agboola • Oct 21 '23

Curious to hear about the areas where Espree falls short for you. As far as I know, it's been ESLint's default parser and I've not seen lots of complaints about it in the wild.

Yeom suyun • Oct 21 '23

In JavaScript runtime, meaningless parentheses are not included in the AST. However, this was a significant inconvenience when creating a rule that manages parentheses in pairs.
Additionally, functions inside parentheses do not support features such as beforeComments, which is quite strange considering the structure of the AST.

Jude Agboola • Oct 23 '23

The issue with beforeComments sounds like an ESLint-specific issue. It's most likely not directly related to Espree. Opening an issue on their repo may be a way to go.

Seems like most parsers ignore the extra parentheses anyway. I tried @babel/parser and espree on this code block:

(((function f(){})))

And it went from ExpressionStatement to FunctionExpression, ignoring the extras, so I'd assume that most parsers do the same.
