Hello, I'm Shrijith. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star us to help devs discover the project. Do give it a try and share your feedback for improving the product.
Tree-sitter is a powerful parsing library that generates syntax trees for code, making it a go-to for tools like code editors and linters. Its C API is the backbone for integrating Tree-sitter into projects, but it can feel daunting with its many types and functions. This guide breaks down the Tree-sitter C API, focusing on practical usage with clear examples. We'll explore how to set up a parser, parse code, navigate syntax trees, and query them, all while keeping things developer-friendly.
Why Tree-sitter's C API Matters
The C API is the core interface for Tree-sitter, offering fine-grained control over parsing and syntax tree manipulation. It's used by language bindings (like Rust or Python) and directly in C/C++ projects for performance-critical applications. Understanding it helps you:
- Integrate Tree-sitter into custom tools.
- Optimize parsing for specific use cases.
- Debug issues when higher-level bindings fall short.
The API is defined in tree_sitter/api.h (available on GitHub). It revolves around a few key concepts: parsers, trees, nodes, and queries. Let’s dive into the essentials.
Setting Up a Parser
To use Tree-sitter, you first need a parser. The TSParser struct is your entry point, and setting it up involves creating it and assigning a language.
Key Functions
| Function | Description |
|---|---|
| `ts_parser_new` | Creates a new parser. |
| `ts_parser_set_language` | Assigns a language to the parser. |
| `ts_parser_delete` | Frees the parser. |
Example: Initializing a Parser
Here’s how to set up a parser for JavaScript using a hypothetical tree_sitter_javascript language (you’d typically get this from a compiled language module).
#include <tree_sitter/api.h>
#include <stdio.h>

// Assume tree_sitter_javascript is defined elsewhere
extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    // Create parser
    TSParser *parser = ts_parser_new();

    // Set language
    const TSLanguage *lang = tree_sitter_javascript();
    if (!ts_parser_set_language(parser, lang)) {
        fprintf(stderr, "Language version mismatch\n");
        ts_parser_delete(parser);
        return 1;
    }

    // Clean up
    ts_parser_delete(parser);
    return 0;
}
Output: No output if successful; prints an error if the language version is incompatible.
Notes:
- Language versioning is critical. The API supports languages with ABI versions between `TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION` (13) and `TREE_SITTER_LANGUAGE_VERSION` (15).
- Always check the return value of `ts_parser_set_language` to catch version mismatches.
Parsing Code into a Syntax Tree
Once you have a parser, you can parse code to create a TSTree. The tree represents the code’s structure, with nodes for each syntactic element (e.g., functions, variables).
Key Functions
| Function | Description |
|---|---|
| `ts_parser_parse_string` | Parses a string into a syntax tree. |
| `ts_tree_root_node` | Gets the root node of the tree. |
| `ts_tree_delete` | Frees the tree. |
Example: Parsing JavaScript Code
This example parses a simple JavaScript function and prints the root node’s type.
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(
        parser,
        NULL, // No old tree for first parse
        code,
        strlen(code)
    );
    if (tree == NULL) {
        fprintf(stderr, "Parsing failed\n");
        ts_parser_delete(parser);
        return 1;
    }

    TSNode root = ts_tree_root_node(tree);
    printf("Root node type: %s\n", ts_node_type(root));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Root node type: program
Notes:
- The `NULL` old_tree parameter is used for initial parses. For incremental parsing (e.g., after code edits), pass the previous tree.
- The root node’s type (`program`) is language-specific, defined in the language’s grammar.
Navigating the Syntax Tree
The syntax tree is a hierarchy of TSNode objects, each representing a syntactic construct. You can traverse the tree to inspect nodes, their types, and their positions.
Key Functions
| Function | Description |
|---|---|
| `ts_node_child` | Gets a child node by index. |
| `ts_node_named_child` | Gets a named child by index (skips anonymous nodes such as punctuation and keyword tokens). |
| `ts_node_type` | Returns the node’s type as a string. |
| `ts_node_start_point` | Gets the node’s start position (row, column). |
Example: Traversing a Tree
This code parses a JavaScript function and prints its named children.
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));
    TSNode root = ts_tree_root_node(tree);

    uint32_t child_count = ts_node_named_child_count(root);
    printf("Named children of root (%s):\n", ts_node_type(root));
    for (uint32_t i = 0; i < child_count; i++) {
        TSNode child = ts_node_named_child(root, i);
        TSPoint start = ts_node_start_point(child);
        printf(" %u: %s at (%u, %u)\n", i, ts_node_type(child), start.row, start.column);
    }

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Named children of root (program):
0: function_declaration at (0, 0)
Notes:
- Named vs. anonymous nodes: Named nodes (e.g., `function_declaration`) correspond to grammar rules, while anonymous nodes (e.g., `"("`) are literal tokens written directly in the grammar.
- Use `ts_node_start_point` and `ts_node_end_point` for precise code positions. Rows and columns are zero-based.
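Positions tell you where a node is, but `ts_node_type` only tells you what kind of node it is, not what text it covers. To get the text, you can pair the API's `ts_node_start_byte` and `ts_node_end_byte` with the source buffer you parsed. The sketch below is plain C and does not call Tree-sitter itself; `copy_source_range` is a hypothetical helper name, and the offsets in the usage comment are hard-coded stand-ins for what those two functions would return for the `hello` identifier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the bytes in [start_byte, end_byte) of `source`
   into a new NUL-terminated string. The caller frees the result.
   Returns NULL if the range is invalid or allocation fails. */
static char *copy_source_range(const char *source,
                               uint32_t start_byte,
                               uint32_t end_byte) {
    assert(source != NULL);
    size_t source_len = strlen(source);
    if (start_byte > end_byte || end_byte > source_len) {
        return NULL;
    }
    size_t len = (size_t)(end_byte - start_byte);
    char *text = malloc(len + 1);
    if (text == NULL) {
        return NULL;
    }
    memcpy(text, source + start_byte, len);
    text[len] = '\0';
    return text;
}

/* Usage, with offsets standing in for ts_node_start_byte(node) and
   ts_node_end_byte(node):
     char *name = copy_source_range(code, 9, 14);
     ... use name ...
     free(name);
*/
```

If you only need to print the text, `printf("%.*s", (int)(end - start), code + start)` avoids the allocation entirely.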
Using Tree Cursors for Efficient Traversal
For large trees, iterating with ts_node_child can be slow. The TSTreeCursor provides a more efficient way to traverse trees by maintaining state.
Key Functions
| Function | Description |
|---|---|
| `ts_tree_cursor_new` | Creates a cursor starting at a node. |
| `ts_tree_cursor_goto_first_child` | Moves to the first child. |
| `ts_tree_cursor_goto_next_sibling` | Moves to the next sibling. |
| `ts_tree_cursor_current_node` | Gets the current node. |
Example: Using a Tree Cursor
This example traverses the tree to find all named nodes.
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

void traverse(TSNode node) {
    TSTreeCursor cursor = ts_tree_cursor_new(node);
    if (ts_tree_cursor_goto_first_child(&cursor)) {
        do {
            TSNode current = ts_tree_cursor_current_node(&cursor);
            if (ts_node_is_named(current)) {
                printf("Node: %s\n", ts_node_type(current));
                traverse(current); // Recurse
            }
        } while (ts_tree_cursor_goto_next_sibling(&cursor));
    }
    ts_tree_cursor_delete(&cursor);
}

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));
    TSNode root = ts_tree_root_node(tree);

    printf("Starting traversal:\n");
    traverse(root);

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
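Output (the exact node list depends on the grammar version; recent JavaScript grammars also report a string_fragment node under string):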
Starting traversal:
Node: function_declaration
Node: identifier
Node: formal_parameters
Node: statement_block
Node: return_statement
Node: string
Notes:
- Cursors are faster than repeated `ts_node_child` calls because they cache traversal state.
- Always call `ts_tree_cursor_delete` to avoid memory leaks.
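A cursor pays off most when a single one walks the whole tree, stepping back up through parents instead of recursing with a fresh cursor per node. The sketch below shows that loop on a toy tree in plain C so it compiles without Tree-sitter. `ToyNode`, `ToyCursor`, `add_child`, and `walk` are hypothetical stand-ins; with the real API the three moves would be `ts_tree_cursor_goto_first_child`, `ts_tree_cursor_goto_next_sibling`, and `ts_tree_cursor_goto_parent`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node: a hypothetical stand-in for a syntax tree node. */
typedef struct ToyNode {
    const char *type;
    struct ToyNode *parent;
    struct ToyNode *first_child;
    struct ToyNode *next_sibling;
} ToyNode;

/* Toy cursor: remembers only the current node. */
typedef struct {
    ToyNode *current;
} ToyCursor;

/* Appends `child` to the end of `parent`'s child list. */
static void add_child(ToyNode *parent, ToyNode *child) {
    child->parent = parent;
    ToyNode **slot = &parent->first_child;
    while (*slot != NULL) {
        slot = &(*slot)->next_sibling;
    }
    *slot = child;
}

static bool goto_first_child(ToyCursor *c) {
    if (c->current->first_child == NULL) return false;
    c->current = c->current->first_child;
    return true;
}

static bool goto_next_sibling(ToyCursor *c) {
    if (c->current->next_sibling == NULL) return false;
    c->current = c->current->next_sibling;
    return true;
}

static bool goto_parent(ToyCursor *c) {
    if (c->current->parent == NULL) return false;
    c->current = c->current->parent;
    return true;
}

/* Visits `root` and everything below it in document order, using one cursor
   and no recursion. Stores up to `cap` node types in `out` and returns the
   total number of nodes visited. */
static size_t walk(ToyNode *root, const char **out, size_t cap) {
    assert(root != NULL);
    ToyCursor cursor = { root };
    size_t visited = 0;
    for (;;) {
        if (visited < cap) out[visited] = cursor.current->type;
        visited++;
        if (goto_first_child(&cursor)) continue;
        /* Leaf: climb until some ancestor has a sibling, or we reach root. */
        for (;;) {
            if (cursor.current == root) return visited;
            if (goto_next_sibling(&cursor)) break;
            goto_parent(&cursor);
        }
    }
}
```

The explicit `cursor.current == root` check keeps the toy walk inside the subtree it started from; as far as I know, a real `TSTreeCursor` enforces that boundary itself by refusing to move outside the node it was created with.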
Querying the Syntax Tree
Queries let you search for patterns in the syntax tree, like finding all function declarations. The TSQuery API uses S-expressions to define patterns.
Key Functions
| Function | Description |
|---|---|
| `ts_query_new` | Creates a query from an S-expression. |
| `ts_query_cursor_new` | Creates a cursor for executing queries. |
| `ts_query_cursor_exec` | Runs the query on a node. |
| `ts_query_cursor_next_match` | Gets the next match. |
Example: Finding Function Declarations
This code queries for function declarations and prints their names.
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *code = "function hello() { return 'world'; }\nfunction bye() {}";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    // Create query
    const char *query_str = "(function_declaration name: (identifier) @func-name)";
    uint32_t error_offset;
    TSQueryError error_type;
    TSQuery *query = ts_query_new(
        tree_sitter_javascript(),
        query_str,
        strlen(query_str),
        &error_offset,
        &error_type
    );
    if (!query) {
        fprintf(stderr, "Query error at offset %u\n", error_offset);
        ts_tree_delete(tree);
        ts_parser_delete(parser);
        return 1;
    }

    // Execute query
    TSQueryCursor *cursor = ts_query_cursor_new();
    ts_query_cursor_exec(cursor, query, ts_tree_root_node(tree));

    TSQueryMatch match;
    while (ts_query_cursor_next_match(cursor, &match)) {
        for (uint16_t i = 0; i < match.capture_count; i++) {
            TSNode node = match.captures[i].node;
            // ts_node_string would print the S-expression "(identifier)",
            // so slice the name out of the source by byte range instead
            uint32_t start = ts_node_start_byte(node);
            uint32_t end = ts_node_end_byte(node);
            printf("Found function: %.*s\n", (int)(end - start), code + start);
        }
    }

    ts_query_cursor_delete(cursor);
    ts_query_delete(query);
    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Found function: hello
Found function: bye
Notes:
- The query `(function_declaration name: (identifier) @func-name)` captures the `identifier` node as `func-name`.
- Check `ts_query_new` for errors, as invalid S-expressions will return `NULL`.
- Learn more about query syntax in the Tree-sitter documentation.
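When `ts_query_new` returns `NULL`, the `error_offset` it fills in is a byte offset into the query string, which is hard to read as a bare number. One way to make it actionable is to print the query with a caret under the offending byte. The sketch below is plain C; `format_query_error` is a hypothetical helper, and it assumes a single-line query.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `query`, a newline, and then a caret placed
   under the byte at `error_offset`. Assumes the query has no newlines.
   Offsets past the end are clamped to the end of the query.
   Returns what snprintf returns (the length the full message needs). */
static int format_query_error(char *out, size_t out_size,
                              const char *query, uint32_t error_offset) {
    assert(out != NULL && query != NULL);
    size_t query_len = strlen(query);
    if (error_offset > query_len) {
        error_offset = (uint32_t)query_len;
    }
    /* "%*s" with an empty string pads with error_offset spaces. */
    return snprintf(out, out_size, "%s\n%*s^", query, (int)error_offset, "");
}
```

In the error branch of the example above, you could call this with `query_str` and `error_offset` and print the resulting buffer to `stderr`.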
Handling Code Edits
Tree-sitter supports incremental parsing, which is crucial for real-time applications like editors. You edit the tree to reflect code changes and reparse only the affected parts.
Key Functions
| Function | Description |
|---|---|
| `ts_tree_edit` | Updates the tree for an edit. |
| `ts_parser_parse` / `ts_parser_parse_string` | Reparses, reusing the edited old tree for efficiency. |
Example: Updating a Tree
This code edits a JavaScript function and re-parses it.
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    const char *old_code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, old_code, strlen(old_code));

    // Simulate edit: change "hello" to "greet"
    TSInputEdit edit = {
        .start_byte = 9,    // Start of "hello"
        .old_end_byte = 14, // End of "hello"
        .new_end_byte = 14, // End of "greet" (same length)
        .start_point = {0, 9},
        .old_end_point = {0, 14},
        .new_end_point = {0, 14}
    };
    ts_tree_edit(tree, &edit);

    const char *new_code = "function greet() { return 'world'; }";
    TSTree *new_tree = ts_parser_parse_string(parser, tree, new_code, strlen(new_code));

    TSNode root = ts_tree_root_node(new_tree);
    TSNode func = ts_node_named_child(root, 0);
    TSNode name = ts_node_named_child(func, 0);
    // Slice the identifier's text out of the new source by byte range
    uint32_t start = ts_node_start_byte(name);
    uint32_t end = ts_node_end_byte(name);
    printf("Updated function name: %.*s\n", (int)(end - start), new_code + start);

    ts_tree_delete(tree);
    ts_tree_delete(new_tree);
    ts_parser_delete(parser);
    return 0;
}
Output:
Updated function name: greet
Notes:
- The `TSInputEdit` struct requires precise byte and point offsets, which you’d typically compute from an editor’s change events.
- Incremental parsing reuses the unchanged parts of the old tree, so it is typically much faster than re-parsing from scratch.
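If your editor only reports byte offsets, the point fields can be derived by counting newlines in the text. The sketch below does that in plain C. `Point` and `InputEdit` are local stand-ins that mirror the field names used in the example above, so the code compiles without `tree_sitter/api.h`; `point_at` and `make_edit` are hypothetical helpers. It assumes columns are counted in bytes and that a line ends at `\n`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring the fields of TSPoint and TSInputEdit. */
typedef struct {
    uint32_t row;
    uint32_t column;
} Point;

typedef struct {
    uint32_t start_byte, old_end_byte, new_end_byte;
    Point start_point, old_end_point, new_end_point;
} InputEdit;

/* Zero-based row and byte column of offset `byte` within `text`. */
static Point point_at(const char *text, uint32_t byte) {
    assert(text != NULL && byte <= strlen(text));
    Point p = { 0, 0 };
    for (uint32_t i = 0; i < byte; i++) {
        if (text[i] == '\n') {
            p.row++;
            p.column = 0;
        } else {
            p.column++;
        }
    }
    return p;
}

/* Describes replacing old_text[start, old_end) with `inserted_len` bytes.
   `new_text` is the full buffer after the edit. */
static InputEdit make_edit(const char *old_text, const char *new_text,
                           uint32_t start, uint32_t old_end,
                           uint32_t inserted_len) {
    assert(start <= old_end);
    InputEdit e;
    e.start_byte = start;
    e.old_end_byte = old_end;
    e.new_end_byte = start + inserted_len;
    e.start_point = point_at(old_text, start);
    e.old_end_point = point_at(old_text, old_end);
    e.new_end_point = point_at(new_text, e.new_end_byte);
    return e;
}
```

For the rename above, `make_edit(old_code, new_code, 9, 14, 5)` yields the same six values that the example hard-codes. Scanning from the start of the buffer is linear in the offset, so an editor with large files would more likely keep a line index, but the arithmetic is the same.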
Debugging and Logging
Tree-sitter provides tools to debug parsing, like logging and generating DOT graphs for visualization.
Key Functions
| Function | Description |
|---|---|
| `ts_parser_set_logger` | Sets a callback for parse/lex logs. |
| `ts_parser_print_dot_graphs` | Outputs DOT graphs to a file descriptor. |
Example: Adding a Logger
This code logs parsing events to stderr.
#include <tree_sitter/api.h>
#include <stdio.h>
#include <string.h>

extern const TSLanguage *tree_sitter_javascript(void);

void log_callback(void *payload, TSLogType type, const char *msg) {
    fprintf(stderr, "[%s] %s\n", type == TSLogTypeParse ? "PARSE" : "LEX", msg);
}

int main(void) {
    TSParser *parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_javascript());

    TSLogger logger = { .payload = NULL, .log = log_callback };
    ts_parser_set_logger(parser, logger);

    const char *code = "function hello() { return 'world'; }";
    TSTree *tree = ts_parser_parse_string(parser, NULL, code, strlen(code));

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
Output (illustrative format only; the actual messages are the parser’s internal trace and vary by Tree-sitter version and language):
[PARSE] parsing rule: program
[LEX] token: function
...
Notes:
- Use `ts_parser_print_dot_graphs` with a file descriptor to visualize trees (pipe to `dot -Tsvg` for SVG output).
- Logging is verbose but invaluable for debugging grammar issues.
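Because logging is so verbose, it can help to summarize it rather than print every line. The `payload` field, left as `NULL` in the example above, is how a callback receives its own state without globals. The sketch below counts messages per type in plain C. `LogType` is a local stand-in for `TSLogType` so the block compiles without the Tree-sitter header, and `LogCounts` and `count_log` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for TSLogType. */
typedef enum { LogTypeParse, LogTypeLex } LogType;

/* State handed to the callback through the payload pointer. */
typedef struct {
    size_t parse_messages;
    size_t lex_messages;
} LogCounts;

/* Same parameter shape as the log callback above: payload, type, message.
   Counts messages per type instead of printing them. */
static void count_log(void *payload, LogType type, const char *msg) {
    (void)msg; /* message text is not needed for counting */
    LogCounts *counts = payload;
    assert(counts != NULL);
    if (type == LogTypeParse) {
        counts->parse_messages++;
    } else {
        counts->lex_messages++;
    }
}
```

With the real API you would declare the callback with `TSLogType`, set `.payload = &counts` in the `TSLogger`, and read the totals after parsing.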
Practical Tips for Using the C API
To wrap up, here are actionable tips for working with Tree-sitter’s C API:
- **Start small**: Begin with simple parsing and traversal before tackling queries or incremental parsing.
- **Check return values**: Functions like `ts_parser_set_language` and `ts_query_new` report failure only through their return value, so an unchecked call fails silently.
- **Use cursors for traversal**: They’re faster and cleaner than manual node iteration for large trees.
- **Leverage incremental parsing**: For real-time applications, always edit and reuse trees to save time.
- **Debug with logs and graphs**: Enable logging or DOT output to understand parsing issues.
- **Read the source**: The `api.h` file is well-documented and the ultimate reference.
The C API is low-level but gives you total control over Tree-sitter’s capabilities. Whether you’re building a code editor, linter, or custom tool, mastering it unlocks powerful parsing features. Experiment with the examples, tweak them for your language, and you’ll be parsing like a pro in no time.