Kenneth Tubman

Originally published at kentubman5.xyz

Writing a Static Documentation Generator using Tree-sitter [Part 1]

Introduction

Lately I’ve been trying to generate documentation from several repositories written in different languages, and I’ve found that the existing tools are mostly inadequate for what I need. I can usually find a documentation generator for one language, or maybe two if they are similar (JavaScript/TypeScript), but none that I’ve found handle a mix of languages well. So instead of using conversion tools to make my code fit the existing generators, I thought I’d take a stab at writing my own. I want to use Tree-sitter to parse the different languages, since there are a lot of parsers available and the output format is convenient and easy to use. Neovim now uses it internally, where it has allowed for fast and accurate syntax highlighting, so I know it works well.

Testing the waters

I started out by making sure I could parse a string of code into an AST (Abstract Syntax Tree). I used TypeScript for the test since it’s what I use, but the same approach should work for any language that has a tree-sitter grammar.

const Parser = require('tree-sitter');
const TypeScript = require('tree-sitter-typescript');

const tsParser = new Parser();
tsParser.setLanguage(TypeScript.typescript);

const tree = tsParser.parse(`
/**
 * @returns string Hello world
 */
const hello = () => {
  return "Hello world";
}
`);

console.log(tree.rootNode.toString());

It worked well and produced a nice AST that looks like this:

(program
  (comment) 
  (lexical_declaration 
    (variable_declarator 
      name: (identifier)
      value: (arrow_function
        parameters: (formal_parameters)
        body: (statement_block (return_statement (string)))))))

For my purposes I want the comment nodes in the AST. I then need to check whether those comments follow the documentation standard for the specific language. The standard I want to use here is JSDoc. Luckily there is a JSDoc grammar written for tree-sitter, so I can just install tree-sitter-jsdoc and use it.

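const JSDoc = require('tree-sitter-jsdoc');

// tree-sitter-jsdoc exports a grammar, so it needs its own Parser instance
const jsdocParser = new Parser();
jsdocParser.setLanguage(JSDoc);

// The first child of the program node is the doc comment
const commentText = tree.rootNode.child(0).text;
console.log(commentText);

const jsDocTree = jsdocParser.parse(commentText);
console.log(jsDocTree.rootNode.toString());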

This turns the comment string:

/**
 * @returns string Hello world
 */

into this:

(document (tag (tag_name) (description)))

So next I need to figure out how to traverse the AST to find these comment nodes and then parse what is below them to document them.
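
One way this traversal might look is a recursive walk that pairs each comment node with the named node that follows it, on the assumption that this is the declaration the comment documents. This is only a sketch: `collectDocumented` and `makeNode` are hypothetical helpers, and the walk relies only on the `type` and `namedChildren` properties of a syntax node. To keep it runnable without the native modules, it is exercised on a hand-built stand-in for the tree shown earlier.

```javascript
// Hypothetical helper: pair each comment node with the named node that follows it.
// Relies only on `type` and `namedChildren` being present on every node.
function collectDocumented(node, found = []) {
  const children = node.namedChildren;
  for (let i = 0; i < children.length; i++) {
    const child = children[i];
    if (child.type === 'comment') {
      // Assumption: the next named sibling (if any) is what the comment documents.
      found.push({ comment: child, target: children[i + 1] || null });
    } else {
      // Not a comment, so look for comments nested further down.
      collectDocumented(child, found);
    }
  }
  return found;
}

// Hand-built stand-in for the parsed tree, so the sketch runs without tree-sitter.
const makeNode = (type, namedChildren = []) => ({ type, namedChildren });
const mockRoot = makeNode('program', [
  makeNode('comment'),
  makeNode('lexical_declaration', [makeNode('variable_declarator')]),
]);

const documented = collectDocumented(mockRoot);
console.log(documented.map((d) => [d.comment.type, d.target && d.target.type]));
```

With the real parser, passing `tree.rootNode` instead of `mockRoot` should behave the same way, though I have not checked that against every grammar.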

Another thing to figure out is how to preserve structure. When there is a class in TypeScript or another language, we need to keep track of it so that we know its members belong in the same documentation file.
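
A rough idea for keeping that structure is to carry the name of the enclosing class down through the walk and group every documented node under it. Everything here is an assumption for illustration: `groupByClass`, `makeNamedNode`, and the `nameOf` callback are hypothetical, the `'(module)'` bucket is just a placeholder for top-level items, and node type names such as `class_declaration` and `method_definition` will differ between grammars. The tree is again a hand-built stand-in rather than real parser output.

```javascript
// Hypothetical sketch: group documented nodes by the class that contains them.
// `nameOf` is an assumed callback returning a node's display name; how to read
// that from a real syntax node depends on the grammar and binding in use.
function groupByClass(node, nameOf, currentClass = '(module)', groups = {}) {
  // Entering a class changes the owner for everything beneath it.
  const owner = node.type === 'class_declaration' ? nameOf(node) : currentClass;
  const children = node.namedChildren;
  for (let i = 0; i < children.length; i++) {
    const child = children[i];
    if (child.type === 'comment' && children[i + 1]) {
      // Record the documented sibling under the current owner.
      (groups[owner] = groups[owner] || []).push(nameOf(children[i + 1]));
    } else {
      groupByClass(child, nameOf, owner, groups);
    }
  }
  return groups;
}

// Hand-built stand-in tree: one documented class with one documented method,
// plus one documented top-level declaration.
const makeNamedNode = (type, name, namedChildren = []) => ({ type, name, namedChildren });
const classRoot = makeNamedNode('program', null, [
  makeNamedNode('comment'),
  makeNamedNode('class_declaration', 'Greeter', [
    makeNamedNode('class_body', null, [
      makeNamedNode('comment'),
      makeNamedNode('method_definition', 'greet'),
    ]),
  ]),
  makeNamedNode('comment'),
  makeNamedNode('lexical_declaration', 'hello'),
]);

const groups = groupByClass(classRoot, (n) => n.name);
console.log(groups);
```

Each key in `groups` could then map to one documentation file, which is roughly the structure I am after.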
