DEV Community

Shrijith Venkatramana


Making Sense of tree-sitter's C API

Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback for improving the product.

Tree-sitter is a powerful parsing library that generates syntax trees for code, making it a go-to for tools like code editors and linters. Its C API is the backbone for integrating Tree-sitter into projects, but it can feel daunting with its many types and functions. This guide breaks down the Tree-sitter C API, focusing on practical usage with clear examples. We'll explore how to set up a parser, parse code, navigate syntax trees, and query them, all while keeping things developer-friendly.

Why Tree-sitter's C API Matters

The C API is the core interface for Tree-sitter, offering fine-grained control over parsing and syntax tree manipulation. It's used by language bindings (like Rust or Python) and directly in C/C++ projects for performance-critical applications. Understanding it helps you:

  • Integrate Tree-sitter into custom tools.
  • Optimize parsing for specific use cases.
  • Debug issues when higher-level bindings fall short.

The API is defined in tree_sitter/api.h (available on GitHub). It revolves around a few key concepts: parsers, trees, nodes, and queries. Let’s dive into the essentials.

Setting Up a Parser

To use Tree-sitter, you first need a parser. The TSParser struct is your entry point, and setting it up involves creating it and assigning a language.

Key Functions

Function Description
ts_parser_new Creates a new parser.
ts_parser_set_language Assigns a language to the parser.
ts_parser_delete Frees the parser.

Example: Initializing a Parser

Here’s how to set up a parser for JavaScript. The tree_sitter_javascript function is exported by the tree-sitter-javascript grammar; you’d typically compile that grammar's generated parser.c (and its scanner.c) into your project and link it.

#include <tree_sitter/api.h>
#include <stdio.h>

// Assume tree_sitter_javascript is defined elsewhere
extern const TSLanguage *tree_sitter_javascript();

int main() {
    // Create parser
    TSParser *parser = ts_parser_new();

    // Set language
    const TSLanguage *lang = tree_sitter_javascript();
    if (!ts_parser_set_language(parser, lang)) {
        fprintf(stderr, "Language version mismatch\n");
        ts_parser_delete(parser);
        return 1;
    }

    // Clean up
    ts_parser_delete(parser);
    return 0;
}

Output: No output if successful; prints an error if the language version is incompatible.

Notes:

  • Language versioning is critical. The API supports languages with ABI versions between TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION (13) and TREE_SITTER_LANGUAGE_VERSION (15).
  • Always check the return value of ts_parser_set_language to catch version mismatches.

Parsing Code into a Syntax Tree

Once you have a parser, you can parse code to create a TSTree. The tree represents the code’s structure, with nodes for each syntactic element (e.g., functions, variables).

Key Functions

Function Description
ts_parser_parse_string Parses a string into a syntax tree.
ts_tree_root_node Gets the root node of the tree.
ts_tree_delete Frees the tree.

Example: Parsing JavaScript Code

This example parses a simple JavaScript function and prints the root node’s type.

#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript();

int main() {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(
        parser,
        NULL,  // No old tree for first parse
        code,
        strlen(code)
    );

    if (tree == NULL) {
        fprintf(stderr, "Parsing failed\n");
        ts_parser_delete(parser);
        return 1;
    }

    TSNode root = ts_tree_root_node(tree);
    printf("Root node type: %s\n", ts_node_type(root));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}

Output:

Root node type: program

Notes:

  • The NULL old_tree parameter is used for initial parses. For incremental parsing (e.g., after code edits), pass the previous tree.
  • The root node’s type (program) is language-specific, defined in the language’s grammar.

Navigating the Syntax Tree

The syntax tree is a hierarchy of TSNode objects, each representing a syntactic construct. You can traverse the tree to inspect nodes, their types, and their positions.

Key Functions

Function Description
ts_node_child Gets a child node by index.
ts_node_named_child_count Returns the number of named children.
ts_node_named_child Gets a named child by index (skips anonymous nodes such as punctuation and keywords).
ts_node_type Returns the node’s type as a string.
ts_node_start_point Gets the node’s start position (row, column).

Example: Traversing a Tree

This code parses a JavaScript function and prints the root node's named children along with their start positions.

#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript();

int main() {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    TSNode root = ts_tree_root_node(tree);
    uint32_t child_count = ts_node_named_child_count(root);

    printf("Named children of root (%s):\n", ts_node_type(root));
    for (uint32_t i = 0; i < child_count; i++) {
        TSNode child = ts_node_named_child(root, i);
        TSPoint start = ts_node_start_point(child);
        printf("  %u: %s at (%u, %u)\n", i, ts_node_type(child), start.row, start.column);
    }

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}

Output:

Named children of root (program):
  0: function_declaration at (0, 0)

Notes:

  • Named vs. anonymous nodes: Named nodes (e.g., function_declaration) correspond to named grammar rules, while anonymous nodes (e.g., "(") are literal tokens written directly in the grammar, such as punctuation and keywords.
  • Use ts_node_start_point and ts_node_end_point for precise code positions.

Using Tree Cursors for Efficient Traversal

For large trees, iterating with ts_node_child can be slow. The TSTreeCursor provides a more efficient way to traverse trees by maintaining state.

Key Functions

Function Description
ts_tree_cursor_new Creates a cursor starting at a node.
ts_tree_cursor_goto_first_child Moves to the first child.
ts_tree_cursor_goto_next_sibling Moves to the next sibling.
ts_tree_cursor_current_node Gets the current node.

Example: Using a Tree Cursor

This example traverses the tree to find all named nodes.

#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript();

void traverse(TSNode node) {
    TSTreeCursor cursor = ts_tree_cursor_new(node);
    if (ts_tree_cursor_goto_first_child(&cursor)) {
        do {
            TSNode current = ts_tree_cursor_current_node(&cursor);
            if (ts_node_is_named(current)) {
                printf("Node: %s\n", ts_node_type(current));
                traverse(current);  // Recurse
            }
        } while (ts_tree_cursor_goto_next_sibling(&cursor));
    }
    ts_tree_cursor_delete(&cursor);
}

int main() {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    TSNode root = ts_tree_root_node(tree);
    printf("Starting traversal:\n");
    traverse(root);

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}

Output:

Starting traversal:
Node: function_declaration
Node: identifier
Node: formal_parameters
Node: statement_block
Node: return_statement
Node: string
Node: string_fragment

Notes:

  • Cursors are faster than repeated ts_node_child calls because they cache traversal state.
  • Always call ts_tree_cursor_delete to avoid memory leaks.

Querying the Syntax Tree

Queries let you search for patterns in the syntax tree, like finding all function declarations. The TSQuery API uses S-expressions to define patterns.

Key Functions

Function Description
ts_query_new Creates a query from an S-expression.
ts_query_cursor_new Creates a cursor for executing queries.
ts_query_cursor_exec Runs the query on a node.
ts_query_cursor_next_match Gets the next match.

Example: Finding Function Declarations

This code queries for function declarations and prints their names.

#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript();

int main() {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }\nfunction bye() {}";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    // Create query
    const char *query_str = "(function_declaration name: (identifier) @func-name)";
    uint32_t error_offset;
    TSQueryError error_type;
    TSQuery *query = ts_query_new(
        tree_sitter_javascript(),
        query_str,
        strlen(query_str),
        &error_offset,
        &error_type
    );
    if (!query) {
        fprintf(stderr, "Query error (type %d) at offset %u\n", (int)error_type, error_offset);
        ts_tree_delete(tree);
        ts_parser_delete(parser);
        return 1;
    }

    // Execute query
    TSQueryCursor *cursor = ts_query_cursor_new();
    ts_query_cursor_exec(cursor, query, ts_tree_root_node(tree));

    TSQueryMatch match;
    while (ts_query_cursor_next_match(cursor, &match)) {
        for (uint16_t i = 0; i < match.capture_count; i++) {
            TSQueryCapture capture = match.captures[i];
            // ts_node_string prints only node types, so slice the source text by byte range
            uint32_t start = ts_node_start_byte(capture.node);
            uint32_t end = ts_node_end_byte(capture.node);
            printf("Found function: %.*s\n", (int)(end - start), code + start);
        }
    }

    ts_query_cursor_delete(cursor);
    ts_query_delete(query);
    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}

Output:

Found function: hello
Found function: bye

Notes:

  • The query (function_declaration name: (identifier) @func-name) captures the identifier node as func-name.
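  • Always check the result of ts_query_new: it returns NULL for an invalid query (bad syntax, unknown node types or fields) and reports the byte offset and a TSQueryError describing the problem.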
  • Learn more about query syntax in the Tree-sitter documentation.

Handling Code Edits

Tree-sitter supports incremental parsing, which is crucial for real-time applications like editors. You edit the tree to reflect code changes and reparse only the affected parts.

Key Functions

Function Description
ts_tree_edit Updates the tree for an edit.
ts_parser_parse / ts_parser_parse_string Reparse, passing the edited old tree so unchanged parts can be reused.

Example: Updating a Tree

This code edits a JavaScript function and re-parses it.

#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript();

int main() {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *old_code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, old_code, strlen(old_code));

    // Simulate edit: change "hello" to "greet" (same length, single line)
    TSInputEdit edit = {
        .start_byte = 9,  // Start of "hello"
        .old_end_byte = 14,  // End of "hello"
        .new_end_byte = 14,  // End of "greet"
        .start_point = {0, 9},
        .old_end_point = {0, 14},
        .new_end_point = {0, 14}
    };
    ts_tree_edit(tree, &edit);

    // Reparse, passing the edited old tree so unchanged subtrees can be reused
    const char *new_code = "function greet() { return 'world'; }";
    TSTree *new_tree = ts_parser_parse_string(parser, tree, new_code, strlen(new_code));

    TSNode root = ts_tree_root_node(new_tree);
    TSNode func = ts_node_named_child(root, 0);
    TSNode name = ts_node_named_child(func, 0);

    // ts_node_string would print only "(identifier)", so slice the text by byte range
    uint32_t start = ts_node_start_byte(name);
    uint32_t end = ts_node_end_byte(name);
    printf("Updated function name: %.*s\n", (int)(end - start), new_code + start);

    ts_tree_delete(tree);
    ts_tree_delete(new_tree);
    ts_parser_delete(parser);
    return 0;
}

Output:

Updated function name: greet

Notes:

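  • The TSInputEdit struct requires precise byte and point offsets (TSPoint columns are measured in bytes), which you’d typically compute from an editor’s change events.
  • Call ts_tree_edit on the old tree before passing it to the parser. Incremental parsing then reuses the unchanged subtrees, so reparsing after a small edit is typically much faster than parsing from scratch.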

Debugging and Logging

Tree-sitter provides tools to debug parsing, like logging and generating DOT graphs for visualization.

Key Functions

Function Description
ts_parser_set_logger Sets a callback for parse/lex logs.
ts_parser_print_dot_graphs Outputs DOT graphs to a file descriptor.

Example: Adding a Logger

This code logs parsing events to stderr.

#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript();

void log_callback(void *payload, TSLogType type, const char *msg) {
    fprintf(stderr, "[%s] %s\n", type == TSLogTypeParse ? "PARSE" : "LEX", msg);
}

int main() {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    TSLogger logger = { .payload = NULL, .log = log_callback };
    ts_parser_set_logger(parser, logger);

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}

Output (abridged; the exact messages vary by Tree-sitter version and grammar):

[PARSE] new_parse
...
[LEX] consume character:'f'
...

Notes:

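  • ts_parser_print_dot_graphs writes DOT graphs of the parser's progress to a file descriptor while it parses; to visualize the finished syntax tree, use ts_tree_print_dot_graph instead. Pipe either output to dot -Tsvg for SVG.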
  • Logging is verbose but invaluable for debugging grammar issues.

Practical Tips for Using the C API

To wrap up, here are actionable tips for working with Tree-sitter’s C API:

  • Start small: Begin with simple parsing and traversal before tackling queries or incremental parsing.
  • Check return values: Functions like ts_parser_set_language and ts_query_new report failure only through their return values, so unchecked failures go unnoticed.
  • Use cursors for traversal: They’re faster and cleaner than manual node iteration for large trees.
  • Leverage incremental parsing: For real-time applications, always edit and reuse trees to save time.
  • Debug with logs and graphs: Enable logging or DOT output to understand parsing issues.
  • Read the source: The api.h file is well-documented and the ultimate reference.

The C API is low-level but gives you total control over Tree-sitter’s capabilities. Whether you’re building a code editor, linter, or custom tool, mastering it unlocks powerful parsing features. Experiment with the examples, tweak them for your language, and you’ll be parsing like a pro in no time.

git-lrc
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Unlimited AI Code Reviews That Run on Commit





See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements


Why

  • 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
  • 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
  • 🔁 Build a
