A few weeks ago, I was talking with a developer in our Community Slack who was interested in adding their own TODO linter. At face value, this is a trivial problem. There are several linters that already support this to varying degrees, and many of them offer decently extensible configuration and their own plugin ecosystems. But the more I thought about it, the more the question piqued my interest. Trunk supports 100+ linters out of the box (OOTB), but which one would solve this problem best? So I set out to evaluate them all. Here are my findings...
To simplify this experiment, we should clarify what makes for a good TODO linter. Depending on your team’s culture, you may want to prevent any TODOs from making it to main, or you may just want to keep tabs on them. But at a minimum, a TODO linter should satisfy the following:
- Easily and quickly report what files have “TODO” strings and where
- Support multiple languages/file types
- Don’t generate additional noise (“mastodon” isn’t a todo)
As a bonus, some TODO linters might:
- Require specific syntax for TODO comments (e.g. clang-tidy)
- Support other keywords and cases (e.g. FIXME)
- Be able to ignore false positives as appropriate (automatically handled with trunk-ignore)
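To make the "no additional noise" criterion concrete, here is a minimal shell sketch. It is not one of the linters evaluated below, and the `find_todos` name and sample text are made up for illustration. It assumes a grep that supports `-w` (GNU and BSD grep do, although the flag is not part of POSIX).

```shell
# find_todos is a hypothetical helper: print "line:content" for every line
# that contains TODO as a whole word, in any letter case.
find_todos() {
  grep -n -i -w 'TODO' "$1"
}

sample="$(mktemp)"
cat > "$sample" <<'EOF'
# TODO: Make this better
<!-- MASTODON is not a fixme -->
// todo(Tyler): Optimize this
EOF

find_todos "$sample"
# 1:# TODO: Make this better
# 3:// todo(Tyler): Optimize this
rm -f "$sample"
```

The whole-word match is what keeps "MASTODON" out of the results, while still catching the lowercase `todo(Tyler)` comment.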
Now that we have our criteria, let’s dive in. All examples (both with and without Trunk) can be found in this sample repo, so feel free to follow along! If you haven’t used Trunk before, you can follow our setup instructions in our docs.
The Sample File
We'll lint this file with all the tools we test in this blog. This file has some real TODO comments and some fake TODOs meant to confuse linters.
# Test Data
A collection of different ways that TODO might show up.
```yaml
# TODO: Make this better
version: 0.1
```
```typescript
// TODO(Tyler): Optimize this
const a = !!!false;
```
<!-- MASTODON is not a fixme -->
## Another Heading
Look at all the ways to check for todo!
<!-- trunk-ignore-begin(todo-grep-wrapped,codespell,cspell,vale,semgrep,trunk-toolbox) -->
Let's ignore this TODO though
<!-- trunk-ignore-end(todo-grep-wrapped,codespell,cspell,vale,semgrep,trunk-toolbox) -->
Per-Language Rules
Let’s try a naive approach. Several linters have built-in rules to check for TODOs (e.g. ruff, ESLint). Many others support plugin ecosystems to add your own rules. Let’s take a look at markdownlint’s approach to this, using the markdownlint-rule-search-replace package. Run trunk check enable markdownlint to get started.
In order to configure the rule, we must modify .markdownlint.json:
{
  "default": true,
  "extends": "markdownlint/style/prettier",
  "search-replace": {
    "rules": [
      {
        "name": "found-todo",
        "message": "Don't use todo",
        "searchPattern": "/TODO/gi"
      }
    ]
  }
}
Then, we can run it and inspect the output:
Note that we have a trunk-ignore to suppress the TODO on line 24.
Markdownlint gets the job done here, but of course it only works on Markdown files. As soon as you add other file types, even YAML or JS, this approach stops scaling: you lose coverage and consistency, and chasing down the right incantation for every linter becomes intractable. Let’s look at some more sustainable options.
CSpell
CSpell is a relatively extensible code spellchecker. It’s easy to use OOTB, and it runs on all file types. However, it has a high false positive rate and requires that you manually tune it by importing and defining new dictionaries. Let’s see what it takes to turn it into a TODO linter. First, run trunk check enable cspell.
We can define our own dictionary or simply add a list of forbidden words to cspell.yaml:
version: "0.2"
# Suggestions can sometimes take longer on CI machines,
# leading to inconsistent results.
suggestionsTimeout: 5000 # ms
words:
- "!todo"
- "!TODO"
We end up with a quick case-insensitive search for TODOs, albeit with some messy suggestions. It gets the job done, but getting it production-ready for the rest of our codebase will usually require curating additional dictionaries. Running it on the sample repo flags 22 additional false positive issues.
codespell
codespell is a code spellchecker that takes a different approach. Much like CSpell, it is prone to false positives, but rather than defining dictionaries of allowlists, it looks for specific common misspellings and provides suggestions. This reduces its false positive rate, but it usually still requires some tuning. Run trunk check enable codespell to get started.
To teach codespell to flag TODOs, we need to define our own dictionary, todo_dict.txt:
todo->,encountered todo
Then we reference it from the codespell config:
[codespell]
dictionary = todo_dict.txt
Still a bit cumbersome, but we can fine-tune the replacements if desired. Let’s examine some other options.
Vale
Vale is a code prose checker. It takes a more opinionated approach to editorial style, and thus can require lots of tuning, but it is very extensible. Let’s have it check for TODOs. Run trunk check enable vale to get started.
Vale has an opinionated, nested structure to define its configuration. For now, we will only do the minimum to check for TODOs. First, the top-level config (.vale.ini):
StylesPath = "styles"
MinAlertLevel = suggestion
Packages = base
[*]
BasedOnStyles = Vale, base
Then a rule file inside the base style (under styles/base/):
extends: existence
message: Don't use TODO
level: warning
scope: [raw, text]
tokens:
  - TODO
If you’re already using Vale, and you’re willing to eat the cost of configuration, it can work quite well! Additionally, you can easily customize which file types and scopes it applies to. Let’s try a few more.
Semgrep
Semgrep is a static analysis tool that offers semantic-aware grep. It catches a number of vulnerabilities out of the box, and it’s fairly extensible. It handles most file types, although anecdotally it struggles in some edge cases (e.g. C++ macros, networkless settings). Run trunk check enable semgrep to get started.
Thankfully, Semgrep is configured pretty easily and lets us just specify words or patterns to check for. We can add a config file like so:
rules:
  - id: check-for-todo
    languages:
      - generic
    severity: ERROR
    message: Don't use TODO
    pattern-either:
      - pattern: TODO
      - pattern: todo
It works pretty well!! And we can customize it however we want in their playground, even modifying our pattern to require specific TODO styling. Semgrep seems like a decent contender for a best-effort solution, but let’s give a couple more a try.
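To illustrate what "requiring specific TODO styling" could look like, here is a small sketch written with plain grep rather than Semgrep syntax. It assumes a team convention of `TODO(owner): message`; that convention, the `unstyled_todos` name, and the sample lines are assumptions made for this example.

```shell
# unstyled_todos is a hypothetical helper: print TODO lines that do not
# follow the assumed "TODO(owner): message" convention.
unstyled_todos() {
  # Keep lines with a whole-word TODO, then drop the ones already in the required style.
  grep -n -w 'TODO' "$1" | grep -v -E 'TODO\([^)]+\): '
}

sample="$(mktemp)"
cat > "$sample" <<'EOF'
# TODO: Make this better
// TODO(Tyler): Optimize this
EOF

unstyled_todos "$sample"
# 1:# TODO: Make this better
rm -f "$sample"
```

The same idea carries over to any tool that accepts a regular expression: match every TODO first, then report only the ones that miss the agreed format.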
trunk-toolbox
trunk-toolbox is our open-source homegrown linter Swiss Army knife. It supports a few different rules, including searching for TODO and FIXME. It works on all file types and is available just by running trunk check enable trunk-toolbox.
Enable TODO checking in toolbox.toml:
[todo]
enabled = true
This immediately accomplishes the stated goal of a TODO linter. If you just want to find TODOs, just use trunk-toolbox, but note that it isn’t configurable beyond that.
Grep Linter
Let’s take this one step further. How difficult is it to prototype a solution from scratch? Building a wrapper around grep is the no-brainer solution for this, so let’s start with that.
At its simplest, we can build something like:
lint:
  definitions:
    - name: todo-grep-linter
      description: Uses grep to look for TODOs
      files: [ALL]
      commands:
        - name: lint
          run: bash -c "grep -E -i 'TODO\W' --line-number --with-filename ${target}"
          output: pass_fail
          success_codes: [0, 1]
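The two success codes line up with how grep reports its result: it exits 0 when at least one line matches, 1 when nothing matches, and a value greater than 1 when something goes wrong. The sketch below shows this. The `grep_status` helper and the file names are made up for the example, and `\W` is a GNU-style regex extension that your grep may not support.

```shell
# grep_status is a hypothetical helper: run the same search quietly and
# print grep's exit status instead of the matching lines.
grep_status() {
  status=0
  grep -q -E -i 'TODO\W' "$1" 2>/dev/null || status=$?
  echo "$status"
}

dir="$(mktemp -d)"
printf '# TODO: Make this better\n' > "$dir/with_todo.txt"
printf 'MASTODON is not a fixme\n' > "$dir/without_todo.txt"

grep_status "$dir/with_todo.txt"     # prints 0: at least one match
grep_status "$dir/without_todo.txt"  # prints 1: no match, but grep ran cleanly
grep_status "$dir/missing.txt"       # prints 2 with GNU grep: the file could not be read
rm -rf "$dir"
```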
This pass_fail linter will just report when we have TODOs. In order to get line numbers, we can wrap this in a script and make it a regex linter with an output that Trunk Check understands:
#!/bin/bash
# todo_grep.sh: print one "path:line:0: [error] message (code)" line per TODO found.
set -euo pipefail
LINT_TARGET="${1}"
TODO_REGEX="TODO\W"
GREP_FORMAT="([^:]*):([0-9]+):(.*)"
PARSER_FORMAT="\1:\2:0: [error] Found TODO in line (TODO)"
grep -o -E "${TODO_REGEX}" --line-number --with-filename "${LINT_TARGET}" | sed -E "s/${GREP_FORMAT}/${PARSER_FORMAT}/"

And the linter definition that calls the script:
lint:
  definitions:
    - name: todo-grep-wrapped
      description: Uses grep to look for TODOs
      files: [ALL]
      commands:
        - name: lint
          # Invoke with bash, since the script relies on "set -o pipefail".
          run: bash ${cwd}/todo_grep.sh ${target}
          output: regex
          parse_regex: "((?P<path>.*):(?P<line>-?\\d+):(?P<col>-?\\d+): \\[(?P<severity>.*)\\] (?P<message>.*) \\((?P<code>.*)\\))"
          success_codes: [0, 1]
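To see what the parser actually receives, the sketch below runs the same grep and sed pipeline as todo_grep.sh on a throwaway file. The `emit_todo_issues` function name and the sample contents are made up for the example, and it assumes a grep that supports `\W` and the long option names, as GNU grep does.

```shell
# emit_todo_issues mirrors the grep | sed pipeline from todo_grep.sh
# (the function name is hypothetical).
emit_todo_issues() {
  grep -o -E 'TODO\W' --line-number --with-filename "$1" \
    | sed -E 's/([^:]*):([0-9]+):(.*)/\1:\2:0: [error] Found TODO in line (TODO)/'
}

workdir="$(mktemp -d)"
printf '%s\n' 'version: 0.1' '# TODO: Make this better' 'MASTODON is not a fixme' > "$workdir/sample.yaml"

# Run from inside the directory so the reported path stays short.
( cd "$workdir" && emit_todo_issues sample.yaml )
# sample.yaml:2:0: [error] Found TODO in line (TODO)
rm -rf "$workdir"
```

Each output line has the shape path:line:col: [severity] message (code), which is what the named groups in parse_regex pick apart.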
It’s a bit messy, but it gets the job done. It’s another thing to maintain, but you can tune it as much as you want. We’ll definitely be using one of the pre-built solutions, though.
What did we learn?
There are more than a couple of reasonable options, and depending on your appetite for configuration vs. plug-and-play, some make more sense than others. But overall, an existing language-agnostic tool works much better than stitching together per-language rules.
And regardless of your preference, all of these options can be super-charged by Trunk. Using githooks and CI gating, you can prevent TODOs from ever landing if that’s your taste. Or, you can burn them down incrementally, only tackling new issues with Hold the Line. You can always make TODOs a non-blocking threshold if need be, or turn them on for yourself without blocking your team.
We all end up with more TODOs than we’d like, but it’s important to build processes that track them (and if necessary gate them) so they don’t get out of hand, just like any other linting issue. There are lots of reasonable options to choose from, but it’s important to make an informed decision when adopting a generalizable approach to linting.
If this post interests you, come check out our other linter definitions in our open-source plugins repo or come chat with us on Slack!