DEV Community

Shrijith Venkatramana


Tinkering with Tree-Sitter Using Go

Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback for improving the product.

Tree-Sitter is a powerful parsing library that generates abstract syntax trees (ASTs) for code in various languages. It’s fast, incremental, and widely used in tools like Neovim and GitHub’s code search. If you’re a Go developer curious about code analysis, Tree-Sitter is a great tool to experiment with. In this post, I’ll walk you through setting up Tree-Sitter in Go, parsing JavaScript code, and exploring the resulting AST. We’ll dive into practical examples, break down the output, and uncover patterns to make sense of it all.

This isn’t a formal lecture—just me sharing my experiments, complete with code you can run and outputs you can expect. Let’s get started.

Setting Up Tree-Sitter in Your Go Project

To tinker with Tree-Sitter, you need a Go project and the Tree-Sitter library. Let’s set up a minimal project to parse some JavaScript code.

First, create a new directory and initialize it as a Go module:

mkdir gg && cd gg
go mod init github.com/yourusername/gg

Next, you’ll need the Tree-Sitter Go bindings and a language grammar (we’ll use JavaScript for this example). Install them with:

go get github.com/tree-sitter/go-tree-sitter
go get github.com/tree-sitter/tree-sitter-javascript/bindings/go

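These packages give you the core Tree-Sitter library and the JavaScript grammar. The Go bindings wrap Tree-Sitter’s C library through cgo, so you also need a C compiler (such as gcc or clang) installed. Beyond that, the bindings are lightweight and easy to use, making setup a breeze.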

Now, create a file called main.go with the following code to test the setup:

package main

import (
    "fmt"

    tree_sitter "github.com/tree-sitter/go-tree-sitter"
    tree_sitter_javascript "github.com/tree-sitter/tree-sitter-javascript/bindings/go"
)

func main() {
    code := []byte("const foo = 1 + 2")

    parser := tree_sitter.NewParser()
    defer parser.Close()
    parser.SetLanguage(tree_sitter.NewLanguage(tree_sitter_javascript.Language()))

    tree := parser.Parse(code, nil)
    defer tree.Close()

    root := tree.RootNode()
    fmt.Println(root.ToSexp())
}

// Output:
// (program (lexical_declaration (variable_declarator name: (identifier) value: (binary_expression left: (number) right: (number)))))

Run it with:

go run main.go

This code parses a simple JavaScript snippet (const foo = 1 + 2) and prints the AST in S-expression format. The output is a nested structure representing the code’s syntax. Don’t worry about the S-expression yet—we’ll decode it soon.

Key points:

  • Initialize a Go module to manage dependencies.
  • Install Tree-Sitter and a language grammar (like JavaScript).
  • Use defer to clean up parser and tree resources.

Parsing Code and Understanding the AST

Now that we have a working setup, let’s dig into what the parser does. The code above takes a JavaScript snippet and turns it into an AST. The root.ToSexp() method gives us a string representation of the tree, but it’s dense. Let’s format it for clarity:

(
  program (
    lexical_declaration (
      variable_declarator
        name: (identifier)
        value: (
          binary_expression
            left: (number)
            right: (number)
        )
    )
  )
)

This structure represents the JavaScript code const foo = 1 + 2. Here’s how it breaks down:

| Node Type | Description | Example in Code |
|---|---|---|
| program | Root of the AST | Entire snippet |
| lexical_declaration | A const or let declaration | const foo = ... |
| variable_declarator | A variable and its value | foo = 1 + 2 |
| identifier | A variable name | foo |
| binary_expression | An operation like addition | 1 + 2 |
| number | A numeric literal | 1 or 2 |

The AST tells us the hierarchical structure of the code. For example, variable_declarator has two children: name (the identifier foo) and value (the expression 1 + 2). The binary_expression node breaks down 1 + 2 into left (1) and right (2).
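The hand-formatted layout earlier in this section can also be produced mechanically. Here is a minimal sketch, assuming the S-expression has the plain shape shown above (node types, field labels ending in a colon, and single spaces between items). The formatSexp helper is my own illustration, not part of the Tree-Sitter API, and it uses a conventional Lisp-style indentation rather than the exact layout above.

```go
package main

import (
	"fmt"
	"strings"
)

// formatSexp re-indents a one-line S-expression such as the string returned
// by root.ToSexp(). A space that follows a field label ("name:") is kept;
// every other space becomes a line break indented by the nesting depth.
// Hypothetical helper for illustration only.
func formatSexp(sexp string) string {
	var b strings.Builder
	depth := 0
	var prev rune
	for _, r := range sexp {
		switch {
		case r == '(':
			depth++
			b.WriteRune(r)
		case r == ')':
			if depth > 0 { // guard against unbalanced input
				depth--
			}
			b.WriteRune(r)
		case r == ' ' && prev != ':':
			b.WriteString("\n" + strings.Repeat("  ", depth))
		default:
			b.WriteRune(r)
		}
		prev = r
	}
	return b.String()
}

func main() {
	sexp := "(program (lexical_declaration (variable_declarator name: (identifier) value: (binary_expression left: (number) right: (number)))))"
	fmt.Println(formatSexp(sexp))
}

// Output:
// (program
//   (lexical_declaration
//     (variable_declarator
//       name: (identifier)
//       value: (binary_expression
//         left: (number)
//         right: (number)))))
```

Each level of indentation corresponds to one level of nesting in the tree, which makes the parent-child relationships easier to see than in the single-line form.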

Key points:

  • ASTs are hierarchical: Each node represents a syntax construct.
  • S-expressions are compact but need formatting to read easily.
  • Nodes have types like identifier or number that map to code elements.

Decoding S-Expression Patterns

The S-expression output can feel cryptic, but it follows clear patterns. Let’s analyze the output to make it less intimidating.

From the formatted S-expression, I noticed two recurring patterns:

  1. Parameter-Definition Pattern (parameter_name: definition):

    • When you see a colon (:), the left side is a parameter name (Tree-Sitter calls these field names), and the right side is the node that fills it.
    • Example: name: (identifier) means the name field holds an identifier node (in our case, foo).
  2. Node-Children Pattern (name (...)):

    • A node name followed by parentheses contains its children. In the raw, unformatted output the opening parenthesis comes first, as in (binary_expression left: ... right: ...).
    • Example: binary_expression (...) means binary_expression is a node with children like left and right.

To verify these patterns, let’s parse a slightly more complex JavaScript snippet. Update main.go to:

package main

import (
    "fmt"

    tree_sitter "github.com/tree-sitter/go-tree-sitter"
    tree_sitter_javascript "github.com/tree-sitter/tree-sitter-javascript/bindings/go"
)

func main() {
    code := []byte("function add(a, b) { return a + b; }")

    parser := tree_sitter.NewParser()
    defer parser.Close()
    parser.SetLanguage(tree_sitter.NewLanguage(tree_sitter_javascript.Language()))

    tree := parser.Parse(code, nil)
    defer tree.Close()

    root := tree.RootNode()
    fmt.Println(root.ToSexp())
}

// Output:
// (program (function_declaration name: (identifier) parameters: (formal_parameters (identifier) (identifier)) body: (statement_block (return_statement (binary_expression left: (identifier) right: (identifier))))))

Run it with go run main.go and format the output:

(
  program (
    function_declaration
      name: (identifier)
      parameters: (
        formal_parameters
          (identifier)
          (identifier)
      )
      body: (
        statement_block
          (
            return_statement
              (
                binary_expression
                  left: (identifier)
                  right: (identifier)
              )
          )
      )
  )
)

This represents function add(a, b) { return a + b; }. Notice the patterns:

  • name: (identifier) for the function name (add).
  • parameters: (formal_parameters ...) for the parameter list (a, b).
  • body: (statement_block ...) for the function body.

These patterns make S-expressions predictable once you get the hang of them.
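Because the two patterns are so regular, a couple of regular expressions are enough to pull them out of the raw string. This is only a rough sketch: fieldNames and nodeTypes are hypothetical helpers of mine, and they assume lowercase snake_case names like the ones in the outputs above. For real analysis you would work with the tree nodes directly rather than the printed string.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// A field label is a word immediately followed by a colon, e.g. "name:".
	fieldRe = regexp.MustCompile(`([a-z_]+):`)
	// A node type is the word right after an opening parenthesis.
	nodeRe = regexp.MustCompile(`\(([a-z_]+)`)
)

// fieldNames returns every field label in the S-expression, in order.
func fieldNames(sexp string) []string { return firstGroups(fieldRe, sexp) }

// nodeTypes returns every node type in the S-expression, in order.
func nodeTypes(sexp string) []string { return firstGroups(nodeRe, sexp) }

// firstGroups collects the first capture group of every match.
func firstGroups(re *regexp.Regexp, s string) []string {
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	sexp := "(program (function_declaration name: (identifier) parameters: (formal_parameters (identifier) (identifier)) body: (statement_block (return_statement (binary_expression left: (identifier) right: (identifier))))))"
	fmt.Println(fieldNames(sexp))
	fmt.Println(nodeTypes(sexp))
}

// Output:
// [name parameters body left right]
// [program function_declaration identifier formal_parameters identifier identifier statement_block return_statement binary_expression identifier identifier]
```

The first line lists the colon pattern (field labels) and the second lists the parenthesis pattern (node types), in the order they appear in the tree.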

Key points:

  • Colons separate parameters from definitions.
  • Parentheses group a node’s children.
  • Practice with different code snippets to spot patterns.

Walking the AST for Fun and Profit

Parsing is cool, but the real power comes from traversing the AST to extract information. Let’s modify our code to walk the tree and print node types and their content. This is useful for tasks like code analysis or linting.

Here’s an example that traverses the AST and prints each node’s type and text:

package main

import (
    "fmt"
    "strings" // needed for strings.Repeat in traverse

    tree_sitter "github.com/tree-sitter/go-tree-sitter"
    tree_sitter_javascript "github.com/tree-sitter/tree-sitter-javascript/bindings/go"
)

func main() {
    code := []byte("const foo = 1 + 2")

    parser := tree_sitter.NewParser()
    defer parser.Close()
    parser.SetLanguage(tree_sitter.NewLanguage(tree_sitter_javascript.Language()))

    tree := parser.Parse(code, nil)
    defer tree.Close()

    root := tree.RootNode()
    traverse(root, code, 0)
}

// traverse prints each named node's type and source text, indented by depth.
func traverse(node *tree_sitter.Node, code []byte, depth int) {
    indent := strings.Repeat("  ", depth)
    // The official go-tree-sitter bindings expose the node type as Kind().
    nodeType := node.Kind()
    nodeText := string(code[node.StartByte():node.EndByte()])
    fmt.Printf("%s%s: %s\n", indent, nodeType, nodeText)

    // NamedChildCount and NamedChild use uint indexes in these bindings.
    for i := uint(0); i < node.NamedChildCount(); i++ {
        traverse(node.NamedChild(i), code, depth+1)
    }
}

// Output:
// program: const foo = 1 + 2
//   lexical_declaration: const foo = 1 + 2
//     variable_declarator: foo = 1 + 2
//       identifier: foo
//       binary_expression: 1 + 2
//         number: 1
//         number: 2

Run it with go run main.go. This code uses a recursive traverse function to visit each named node, printing its type and the corresponding code snippet. The depth parameter adds indentation for readability.

This traversal is handy for understanding the AST’s structure or extracting specific nodes (e.g., finding all identifier nodes for variable names).

Key points:

  • Use NamedChild to iterate over significant nodes (ignoring punctuation like =).
  • Access node text with StartByte and EndByte.
  • Traversal is recursive, so very deeply nested code means very deep recursion. For huge trees, consider an iterative walk with an explicit stack.
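One way to sidestep deep recursion is to keep your own stack. The sketch below shows the idea on a hypothetical toyNode type that stands in for a Tree-Sitter node, so it runs with only the standard library. With the real bindings, the same loop would read children from the node instead of a slice. Go's goroutine stacks grow on demand, so this mostly matters for extremely deep nesting.

```go
package main

import (
	"fmt"
	"strings"
)

// toyNode is a hypothetical stand-in for a syntax-tree node.
type toyNode struct {
	kind     string
	children []*toyNode
}

// frame pairs a node with its depth so indentation survives without recursion.
type frame struct {
	node  *toyNode
	depth int
}

// walk visits nodes in pre-order (parent first, children left to right)
// using an explicit stack, and returns one indented line per node.
func walk(root *toyNode) []string {
	var lines []string
	if root == nil {
		return lines
	}
	stack := []frame{{root, 0}}
	for len(stack) > 0 {
		top := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		lines = append(lines, strings.Repeat("  ", top.depth)+top.node.kind)
		// Push children in reverse so the leftmost child is popped first.
		for i := len(top.node.children) - 1; i >= 0; i-- {
			stack = append(stack, frame{top.node.children[i], top.depth + 1})
		}
	}
	return lines
}

func main() {
	// Mirrors the named nodes of: const foo = 1 + 2
	tree := &toyNode{"program", []*toyNode{
		{"lexical_declaration", []*toyNode{
			{"variable_declarator", []*toyNode{
				{"identifier", nil},
				{"binary_expression", []*toyNode{{"number", nil}, {"number", nil}}},
			}},
		}},
	}}
	fmt.Println(strings.Join(walk(tree), "\n"))
}

// Output:
// program
//   lexical_declaration
//     variable_declarator
//       identifier
//       binary_expression
//         number
//         number
```

The visiting order matches the recursive version, but the pending work lives in a slice on the heap instead of on the call stack.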

Where to Go Next with Tree-Sitter and Go

Tree-Sitter’s power lies in its flexibility. Here are some ideas to keep tinkering:

  • Try other languages: Tree-Sitter supports many grammars (e.g., Python, Rust). Install their bindings (like tree-sitter-python) and swap the language in SetLanguage.
  • Build a linter: Traverse the AST to enforce coding rules, like checking for unused variables.
  • Integrate with tools: Use Tree-Sitter in a CLI tool or editor plugin for real-time code analysis.
  • Explore queries: Tree-Sitter’s query language lets you search ASTs for patterns (e.g., find all function declarations). Check the Tree-Sitter docs for details.

To experiment further, try parsing larger codebases or combining Tree-Sitter with other Go libraries. For example, you could pair it with go-git to analyze code in repositories.

Key points:

  • Tree-Sitter is versatile for parsing any language with a grammar.
  • AST traversal enables linting, refactoring, or code metrics.
  • Keep experimenting with small projects to master Tree-Sitter’s API.

This journey into Tree-Sitter with Go has been a fun way to understand code parsing. The examples here are just the start—play with different snippets, explore the AST, and build something cool. If you hit snags, the Tree-Sitter community and docs are great resources. Happy coding!

git-lrc
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub: HexmosTech / git-lrc
