Introduction
Lately I’ve been trying to generate documentation from several repositories written in different languages, and I’ve found that the existing tools are mostly inadequate for what I need. I can usually find a documentation generator for one language, or maybe two if they are similar (JavaScript/TypeScript), but nothing that works well across many languages. So instead of using conversion tools to force my code into the existing documentation generators, I thought I’d take a stab at building my own. I want to use Tree-sitter to parse the different languages, since parsers are available for many of them and the output format is convenient and easy to work with. Neovim now uses it internally, where it has enabled fast and accurate syntax highlighting, so I know it works well.
Testing the waters
I started out by making sure I could parse a string of code into an AST (Abstract Syntax Tree). I used TypeScript for the test since it’s what I normally use, but the same approach works with any language that has a tree-sitter grammar.
const Parser = require('tree-sitter');
const TypeScript = require('tree-sitter-typescript');
const tsParser = new Parser();
tsParser.setLanguage(TypeScript.typescript);
const tree = tsParser.parse(`
/**
* @returns string Hello world
*/
const hello = () => {
return "Hello world";
}
`);
console.log(tree.rootNode.toString());
It worked well and produced a nice AST that looks like this:
(program
  (comment)
  (lexical_declaration
    (variable_declarator
      name: (identifier)
      value: (arrow_function
        parameters: (formal_parameters)
        body: (statement_block (return_statement (string)))))))
For my purposes I want the comment nodes in the AST. I then need to check whether those comments follow the documentation standard for the language in question. The standard I want to use for these comments is JSDoc. Luckily there is a JSDoc grammar written for tree-sitter, so I can just install tree-sitter-jsdoc and use it.
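const JSDoc = require('tree-sitter-jsdoc');
// tree-sitter-jsdoc exports a grammar, so it needs its own Parser instance.
const jsdocParser = new Parser();
jsdocParser.setLanguage(JSDoc);
// The first child of the root is the comment node; grab its source text.
const commentText = tree.rootNode.child(0).text;
console.log(commentText);
const jsDocTree = jsdocParser.parse(commentText);
console.log(jsDocTree.rootNode.toString());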
This turns the comment string:
/**
* @returns string Hello world
*/
into this:
(document (tag (tag_name) (description)))
So next I need to figure out how to traverse the AST to find these comment nodes, and then parse the node that follows each comment so I know what it documents.
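Here is a rough sketch of what that traversal could look like. It is not a finished design: the function name collectDocumentedNodes is made up for illustration, and it assumes the documented declaration is always the next named sibling of the comment. It only reads the type, text, and namedChildren properties of each node, so the example runs against a hand-built stand-in for tree.rootNode instead of a real parse.

```javascript
// Collect every JSDoc-style comment together with the node that follows it.
// Assumption: the documented declaration is the comment's next named sibling.
function collectDocumentedNodes(node, found = []) {
  const children = node.namedChildren || [];
  children.forEach((child, i) => {
    if (child.type === 'comment' && child.text.startsWith('/**')) {
      found.push({ comment: child, target: children[i + 1] || null });
    }
    // Recurse so comments nested inside other nodes are picked up too.
    collectDocumentedNodes(child, found);
  });
  return found;
}

// Hand-built stand-in for tree.rootNode (hypothetical data, not parser output).
const fakeRoot = {
  type: 'program',
  namedChildren: [
    { type: 'comment', text: '/**\n * @returns string Hello world\n */', namedChildren: [] },
    { type: 'lexical_declaration', text: 'const hello = () => {}', namedChildren: [] },
    { type: 'comment', text: '// not a doc comment', namedChildren: [] },
  ],
};

const documented = collectDocumentedNodes(fakeRoot);
console.log(documented.map((d) => [d.comment.type, d.target && d.target.type]));
```

Plain line comments are skipped on purpose, since only blocks starting with /** are candidates for the JSDoc parser.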
Another thing I need to figure out is how to keep structure. When the code contains a class, in TypeScript or any other language, I need to keep track of it so I know that its members belong in the same documentation file.
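One possible way to keep that structure is to carry the name of the enclosing class down the tree while walking it. The sketch below is only an idea under a few assumptions: groupByContainer, the '<module>' label, and the nameOf callback are all hypothetical names, and the list of node types that count as containers (here just class_declaration) would differ per language. Again it runs on a hand-built stand-in tree instead of real parser output.

```javascript
// Group doc comments by the class that encloses them ('<module>' at top level).
// `containerTypes` and `nameOf` are illustrative assumptions, not tree-sitter APIs.
function groupByContainer(node, nameOf, containerTypes, current = '<module>', groups = {}) {
  for (const child of node.namedChildren || []) {
    if (child.type === 'comment' && child.text.startsWith('/**')) {
      (groups[current] = groups[current] || []).push(child.text);
    }
    // Entering a container changes where the comments below it are filed.
    const next = containerTypes.includes(child.type) ? nameOf(child) : current;
    groupByContainer(child, nameOf, containerTypes, next, groups);
  }
  return groups;
}

// Hand-built stand-in tree (hypothetical data); `name` is a shortcut for the mock.
const fakeTree = {
  type: 'program',
  namedChildren: [
    { type: 'comment', text: '/** Top-level helper */', namedChildren: [] },
    {
      type: 'class_declaration',
      name: 'Greeter',
      namedChildren: [
        {
          type: 'class_body',
          namedChildren: [
            { type: 'comment', text: '/** Says hello */', namedChildren: [] },
            { type: 'method_definition', namedChildren: [] },
          ],
        },
      ],
    },
  ],
};

const groups = groupByContainer(fakeTree, (n) => n.name, ['class_declaration']);
console.log(groups);
```

One caveat with this approach: a comment placed directly above a class is filed under the surrounding scope, not under the class itself, so that case would need extra handling.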