Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a first-of-its-kind tool that automatically indexes API endpoints across all your repositories. LiveAPI helps you discover, understand, and use APIs in large tech infrastructures with ease.
Tree-sitter is a powerful parsing library that generates syntax trees for code, making it a go-to for tools like code editors and linters. Its C API is the backbone for integrating Tree-sitter into projects, but it can feel daunting with its many types and functions. This guide breaks down the Tree-sitter C API, focusing on practical usage with clear examples. We'll explore how to set up a parser, parse code, navigate syntax trees, and query them, all while keeping things developer-friendly.
Why Tree-sitter's C API Matters
The C API is the core interface for Tree-sitter, offering fine-grained control over parsing and syntax tree manipulation. It's used by language bindings (like Rust or Python) and directly in C/C++ projects for performance-critical applications. Understanding it helps you:
- Integrate Tree-sitter into custom tools.
- Optimize parsing for specific use cases.
- Debug issues when higher-level bindings fall short.
The API is defined in tree_sitter/api.h (available on GitHub). It revolves around a few key concepts: parsers, trees, nodes, and queries. Let’s dive into the essentials.
Setting Up a Parser
To use Tree-sitter, you first need a parser. The TSParser struct is your entry point, and setting it up involves creating it and assigning a language.
Key Functions
| Function | Description | 
|---|---|
| ts_parser_new | Creates a new parser. | 
| ts_parser_set_language | Assigns a language to the parser. | 
| ts_parser_delete | Frees the parser. | 
Example: Initializing a Parser
Here’s how to set up a parser for JavaScript using tree_sitter_javascript, the language function exported by the tree-sitter-javascript grammar (you typically get it by compiling and linking the grammar's generated C sources into your program).
```c
#include <tree_sitter/api.h>
#include <stdio.h>

// Assume tree_sitter_javascript is defined elsewhere (e.g., by the compiled grammar)
extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    // Create parser
    TSParser *parser = ts_parser_new();

    // Set language; this fails if the grammar's ABI version is incompatible
    const TSLanguage *lang = tree_sitter_javascript();
    if (!ts_parser_set_language(parser, lang)) {
        fprintf(stderr, "Language version mismatch\n");
        ts_parser_delete(parser);
        return 1;
    }

    // Clean up
    ts_parser_delete(parser);
    return 0;
}
```
Output: No output if successful; prints an error if the language version is incompatible.
Notes:
- Language versioning is critical. The API supports languages with ABI versions between TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION (13) and TREE_SITTER_LANGUAGE_VERSION (15).
- Always check the return value of ts_parser_set_language to catch version mismatches.
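When ts_parser_set_language rejects a grammar, it helps to say which way the mismatch goes. The sketch below is a hypothetical helper (check_abi, abi_advice, and AbiStatus are illustrative names, not part of Tree-sitter) that classifies a grammar's ABI version against the supported range. In real code the version would likely come from ts_language_abi_version (ts_language_version on older releases) and the bounds from TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION and TREE_SITTER_LANGUAGE_VERSION; here they are plain parameters so the logic compiles without the library.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical classification of a grammar's ABI version.
typedef enum { ABI_OK, ABI_TOO_OLD, ABI_TOO_NEW } AbiStatus;

// Compares a grammar's ABI version with the inclusive range the library accepts.
AbiStatus check_abi(uint32_t abi, uint32_t min_abi, uint32_t max_abi) {
    assert(min_abi <= max_abi);
    if (abi < min_abi) return ABI_TOO_OLD;
    if (abi > max_abi) return ABI_TOO_NEW;
    return ABI_OK;
}

// Suggests a fix for each status, for use next to an error message.
const char *abi_advice(AbiStatus status) {
    switch (status) {
    case ABI_TOO_OLD:
        return "grammar is too old: regenerate it with a newer tree-sitter CLI";
    case ABI_TOO_NEW:
        return "grammar is too new: link against a newer Tree-sitter library";
    default:
        return "grammar ABI version is supported";
    }
}
```

For example, check_abi(12, 13, 15) returns ABI_TOO_OLD, and you could print abi_advice of that result alongside the "Language version mismatch" message.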
Parsing Code into a Syntax Tree
Once you have a parser, you can parse code to create a TSTree. The tree represents the code’s structure, with nodes for each syntactic element (e.g., functions, variables).
Key Functions
| Function | Description | 
|---|---|
| ts_parser_parse_string | Parses a string into a syntax tree. | 
| ts_tree_root_node | Gets the root node of the tree. | 
| ts_tree_delete | Frees the tree. | 
Example: Parsing JavaScript Code
This example parses a simple JavaScript function and prints the root node’s type.
```c
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(
        parser,
        NULL,  // No old tree for first parse
        code,
        strlen(code)
    );
    if (tree == NULL) {
        fprintf(stderr, "Parsing failed\n");
        ts_parser_delete(parser);
        return 1;
    }

    TSNode root = ts_tree_root_node(tree);
    printf("Root node type: %s\n", ts_node_type(root));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
```
Output:
```
Root node type: program
```
Notes:
- The NULL old_tree argument is for an initial parse. For incremental parsing (e.g., after code edits), first call ts_tree_edit on the previous tree, then pass it here.
- The root node’s type (program) is language-specific, defined in the language’s grammar.
Navigating the Syntax Tree
The syntax tree is a hierarchy of TSNode objects, each representing a syntactic construct. You can traverse the tree to inspect nodes, their types, and their positions.
Key Functions
| Function | Description | 
|---|---|
| ts_node_child | Gets a child node by index. | 
| ts_node_named_child_count | Returns the number of named children. | 
| ts_node_named_child | Gets a named child by index (skips anonymous nodes such as punctuation and keywords). | 
| ts_node_type | Returns the node’s type as a string. | 
| ts_node_start_point | Gets the node’s start position (row, column). | 
Example: Traversing a Tree
This code parses a JavaScript function and prints the root node's named children along with their start positions.
```c
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());
    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));
    TSNode root = ts_tree_root_node(tree);

    // Visit named children only, printing each one's type and start point
    uint32_t child_count = ts_node_named_child_count(root);
    printf("Named children of root (%s):\n", ts_node_type(root));
    for (uint32_t i = 0; i < child_count; i++) {
        TSNode child = ts_node_named_child(root, i);
        TSPoint start = ts_node_start_point(child);
        printf("  %u: %s at (%u, %u)\n", i, ts_node_type(child), start.row, start.column);
    }

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
```
Output:
```
Named children of root (program):
  0: function_declaration at (0, 0)
```
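Notes:
- Named vs. anonymous nodes: named nodes (e.g., function_declaration) correspond to named rules in the grammar, while anonymous nodes (e.g., "(") come from string literals in the grammar, such as punctuation and keywords.
- Use ts_node_start_point and ts_node_end_point for precise code positions. Rows and columns are zero-based, and columns are counted in bytes.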
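ts_node_start_point already gives you a node's row and column, but editor integrations often start from a raw byte offset instead. Here is a minimal sketch of that conversion, assuming Tree-sitter's conventions (zero-based rows, columns counted in bytes since the last '\n'). SourcePoint and point_for_byte are hypothetical names; SourcePoint simply has the same fields as TSPoint so the helper compiles without the library.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in with the same fields as Tree-sitter's TSPoint.
typedef struct {
    uint32_t row;
    uint32_t column;
} SourcePoint;

// Converts a byte offset into a zero-based (row, column) point.
// Each '\n' starts a new row; every other byte advances the column by one,
// so columns are byte counts, not character counts.
SourcePoint point_for_byte(const char *source, uint32_t length, uint32_t byte) {
    assert(byte <= length);
    SourcePoint p = {0, 0};
    for (uint32_t i = 0; i < byte; i++) {
        if (source[i] == '\n') {
            p.row++;
            p.column = 0;
        } else {
            p.column++;
        }
    }
    return p;
}
```

For example, point_for_byte("ab\ncd", 5, 4) returns row 1, column 1: the 'd' sits one byte into the second row. Because columns are bytes, a line containing multi-byte UTF-8 characters reports columns larger than its character count.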
Using Tree Cursors for Efficient Traversal
For large trees, iterating with ts_node_child can be slow. The TSTreeCursor provides a more efficient way to traverse trees by maintaining state.
Key Functions
| Function | Description | 
|---|---|
| ts_tree_cursor_new | Creates a cursor starting at a node. | 
| ts_tree_cursor_goto_first_child | Moves to the first child. | 
| ts_tree_cursor_goto_next_sibling | Moves to the next sibling. | 
| ts_tree_cursor_current_node | Gets the current node. | 
| ts_tree_cursor_delete | Frees the cursor's internal storage. | 
Example: Using a Tree Cursor
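This example walks the tree with a cursor and prints every named node in depth-first order. For simplicity it creates a fresh cursor for each named node it recurses into; a longer-running walker would typically reuse a single cursor and call ts_tree_cursor_goto_parent to climb back up.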
```c
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

void traverse(TSNode node) {
    TSTreeCursor cursor = ts_tree_cursor_new(node);
    if (ts_tree_cursor_goto_first_child(&cursor)) {
        do {
            TSNode current = ts_tree_cursor_current_node(&cursor);
            if (ts_node_is_named(current)) {
                printf("Node: %s\n", ts_node_type(current));
                traverse(current);  // Recurse into named children
            }
        } while (ts_tree_cursor_goto_next_sibling(&cursor));
    }
    ts_tree_cursor_delete(&cursor);
}

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());
    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));
    TSNode root = ts_tree_root_node(tree);

    printf("Starting traversal:\n");
    traverse(root);

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
```
Output:
```
Starting traversal:
Node: function_declaration
Node: identifier
Node: formal_parameters
Node: statement_block
Node: return_statement
Node: string
Node: string_fragment
```
Notes:
- Cursors are faster than repeated ts_node_child calls because they cache traversal state.
- Always call ts_tree_cursor_delete to avoid memory leaks.
Querying the Syntax Tree
Queries let you search for patterns in the syntax tree, like finding all function declarations. The TSQuery API uses S-expressions to define patterns.
Key Functions
| Function | Description | 
|---|---|
| ts_query_new | Creates a query from an S-expression. | 
| ts_query_cursor_new | Creates a cursor for executing queries. | 
| ts_query_cursor_exec | Runs the query on a node. | 
| ts_query_cursor_next_match | Gets the next match. | 
Example: Finding Function Declarations
This code queries for function declarations and prints their names.
```c
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());
    const char *code = "function hello() { return 'world'; }\nfunction bye() {}";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    // Create query
    const char *query_str = "(function_declaration name: (identifier) @func-name)";
    uint32_t error_offset;
    TSQueryError error_type;
    TSQuery *query = ts_query_new(
        tree_sitter_javascript(),
        query_str,
        strlen(query_str),
        &error_offset,
        &error_type
    );
    if (!query) {
        fprintf(stderr, "Query error %d at offset %u\n", (int)error_type, error_offset);
        ts_tree_delete(tree);
        ts_parser_delete(parser);
        return 1;
    }

    // Execute query
    TSQueryCursor *cursor = ts_query_cursor_new();
    ts_query_cursor_exec(cursor, query, ts_tree_root_node(tree));
    TSQueryMatch match;
    while (ts_query_cursor_next_match(cursor, &match)) {
        for (uint16_t i = 0; i < match.capture_count; i++) {
            TSNode node = match.captures[i].node;
            // The tree stores byte ranges, not text, so slice the source buffer
            uint32_t start = ts_node_start_byte(node);
            uint32_t end = ts_node_end_byte(node);
            printf("Found function: %.*s\n", (int)(end - start), code + start);
        }
    }

    ts_query_cursor_delete(cursor);
    ts_query_delete(query);
    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
```
Output:
```
Found function: hello
Found function: bye
```
Notes:
- The query (function_declaration name: (identifier) @func-name) captures the identifier in each function's name field as func-name.
- Check the result of ts_query_new: an invalid query returns NULL and reports the problem through error_offset and error_type.
- Learn more about query syntax in the Tree-sitter documentation.
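Because a Tree-sitter tree records byte ranges rather than the source text, ts_node_string prints only a node's structure, such as (identifier). To keep a capture's actual text around (to store or compare it), copy its range out of the buffer you parsed. A minimal sketch, with copy_byte_range as a hypothetical helper name; with the library you would likely call it as copy_byte_range(code, ts_node_start_byte(node), ts_node_end_byte(node)).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Returns a malloc'd, NUL-terminated copy of source[start_byte, end_byte).
// The caller must free() the result. Returns NULL if allocation fails.
char *copy_byte_range(const char *source, uint32_t start_byte, uint32_t end_byte) {
    assert(start_byte <= end_byte);
    size_t len = (size_t)(end_byte - start_byte);
    char *text = malloc(len + 1);
    if (text == NULL) {
        return NULL;
    }
    memcpy(text, source + start_byte, len);
    text[len] = '\0';
    return text;
}
```

For the first function in the query example, bytes 9 to 14 of the source hold the name, so copy_byte_range(code, 9, 14) yields "hello".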
Handling Code Edits
Tree-sitter supports incremental parsing, which is crucial for real-time applications like editors. You edit the tree to reflect code changes and reparse only the affected parts.
Key Functions
| Function | Description | 
|---|---|
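| ts_tree_edit | Adjusts the old tree's byte and point positions to match an edit. | 
| ts_parser_parse_string | Reparses when given the edited old tree, reusing unchanged subtrees (ts_parser_parse does the same for input read through a TSInput callback). | 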
Example: Updating a Tree
This code edits a JavaScript function and re-parses it.
```c
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());
    const char *old_code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, old_code, strlen(old_code));

    // Simulate edit: replace "hello" with "greet" (same length)
    TSInputEdit edit = {
        .start_byte = 9,      // Start of "hello"
        .old_end_byte = 14,   // End of "hello"
        .new_end_byte = 14,   // End of "greet"
        .start_point = {0, 9},
        .old_end_point = {0, 14},
        .new_end_point = {0, 14}
    };
    ts_tree_edit(tree, &edit);

    // Reparse the new text, passing the edited old tree so it can be reused
    const char *new_code = "function greet() { return 'world'; }";
    TSTree *new_tree = ts_parser_parse_string(parser, tree, new_code, strlen(new_code));

    TSNode root = ts_tree_root_node(new_tree);
    TSNode func = ts_node_named_child(root, 0);
    TSNode name = ts_node_named_child(func, 0);  // The function's identifier
    // Slice the name out of the new source using the node's byte range
    uint32_t start = ts_node_start_byte(name);
    uint32_t end = ts_node_end_byte(name);
    printf("Updated function name: %.*s\n", (int)(end - start), new_code + start);

    ts_tree_delete(tree);
    ts_tree_delete(new_tree);
    ts_parser_delete(parser);
    return 0;
}
```
Output:
```
Updated function name: greet
```
Notes:
- The TSInputEdit struct requires precise byte and point offsets, which you’d typically compute from an editor’s change events.
- Incremental parsing reuses the unchanged parts of the old tree, so reparsing after a small edit is typically much faster than parsing from scratch.
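The offsets in the TSInputEdit example can be derived mechanically from the replaced range and the inserted text. Here is a minimal sketch, assuming a single replacement of old_len bytes at start_byte. EditPoint and EditDescription are hypothetical stand-ins that mirror the fields of TSPoint and TSInputEdit (so the code compiles without the library), and describe_replacement is an illustrative name; in real code you would copy these fields into a TSInputEdit and pass it to ts_tree_edit.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-ins mirroring the fields of TSPoint and TSInputEdit.
typedef struct { uint32_t row; uint32_t column; } EditPoint;

typedef struct {
    uint32_t start_byte, old_end_byte, new_end_byte;
    EditPoint start_point, old_end_point, new_end_point;
} EditDescription;

// Advances p over len bytes of text: '\n' starts a new row,
// any other byte adds one to the (byte-based) column.
static EditPoint advance(EditPoint p, const char *text, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            p.row++;
            p.column = 0;
        } else {
            p.column++;
        }
    }
    return p;
}

// Describes replacing old_len bytes at start_byte of old_source
// with new_len bytes of new_text.
EditDescription describe_replacement(const char *old_source, uint32_t old_source_len,
                                     uint32_t start_byte, uint32_t old_len,
                                     const char *new_text, uint32_t new_len) {
    assert(start_byte + old_len <= old_source_len);
    EditPoint origin = {0, 0};
    EditDescription e;
    e.start_byte = start_byte;
    e.old_end_byte = start_byte + old_len;
    e.new_end_byte = start_byte + new_len;
    e.start_point = advance(origin, old_source, start_byte);
    e.old_end_point = advance(e.start_point, old_source + start_byte, old_len);
    e.new_end_point = advance(e.start_point, new_text, new_len);
    return e;
}
```

Running describe_replacement on the "hello" to "greet" edit (start_byte 9, old_len 5) reproduces the hand-written values: start (0, 9), old end (0, 14), new end (0, 14). Inserting text that contains a newline moves new_end_point to a later row and resets its column.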
Debugging and Logging
Tree-sitter provides tools to debug parsing, like logging and generating DOT graphs for visualization.
Key Functions
| Function | Description | 
|---|---|
| ts_parser_set_logger | Sets a callback for parse/lex logs. | 
| ts_parser_print_dot_graphs | Outputs DOT graphs to a file descriptor. | 
Example: Adding a Logger
This code logs parsing events to stderr.
```c
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

// Called by Tree-sitter for every parse or lex event
void log_callback(void *payload, TSLogType type, const char *msg) {
    (void)payload;  // Unused in this example
    fprintf(stderr, "[%s] %s\n", type == TSLogTypeParse ? "PARSE" : "LEX", msg);
}

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    TSLogger logger = { .payload = NULL, .log = log_callback };
    ts_parser_set_logger(parser, logger);

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
```
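Output: a verbose stream of [PARSE] and [LEX] lines on stderr. The exact messages vary with the Tree-sitter version and the grammar; parse entries describe parser actions such as shifts and reductions, and lex entries describe the lexer consuming characters.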
Notes:
- Use ts_parser_print_dot_graphs with a file descriptor to write DOT graphs of the parser's progress (pipe them to dot -Tsvg for SVG output).
- Logging is verbose but invaluable for debugging grammar issues.
Practical Tips for Using the C API
To wrap up, here are actionable tips for working with Tree-sitter’s C API:
- Start small: Begin with simple parsing and traversal before tackling queries or incremental parsing.
- Check return values: functions like ts_parser_set_language and ts_query_new signal failure only through their return values, so an unchecked failure goes unnoticed.
- Use cursors for traversal: They’re faster and cleaner than manual node iteration for large trees.
- Leverage incremental parsing: For real-time applications, always edit and reuse trees to save time.
- Debug with logs and graphs: Enable logging or DOT output to understand parsing issues.
- Read the source: the api.h header is well documented and is the ultimate reference.
The C API is low-level but gives you total control over Tree-sitter’s capabilities. Whether you’re building a code editor, linter, or custom tool, mastering it unlocks powerful parsing features. Experiment with the examples, tweak them for your language, and you’ll be parsing like a pro in no time.
 

 
    