DEV Community

NITISH Sharma


I Built a Tool That Turns TODO Comments Into Actual Documentation

Most developers already “document” their code — just not in a structured way.

We write things like:

```
// TODO: optimize this
// BUG: fix crash here
// NOTE: handle edge case
```

Over time, these comments pile up across files, and eventually… they become invisible.

I ran into this problem while working on multiple small projects. I knew I had pending work scattered across the codebase, but there was no simple way to track it without introducing another tool or workflow.

So I built DocTrack.


💡 The Idea

Instead of forcing developers to adopt a new system, I wanted to reuse what already exists inside the code.

DocTrack scans a project and extracts structured information from inline comments:

  • What needs to be done (TODO, BUG, etc.)
  • Where it exists (file + line)
  • What context it belongs to (code block)

The goal is simple:

Turn implicit developer notes into explicit, usable documentation.


⚙️ How It Works

At a high level, the tool:

  1. Recursively scans a directory using C++17 filesystem APIs
  2. Reads files line by line
  3. Uses regex to detect tagged comments:
    • TODO
    • BUG
    • NOTE
    • FIXME
    • CODENOTE
  4. Extracts:
    • file name
    • line number
    • message
  5. Captures surrounding code context using a brace-tracking approach
  6. Generates output inside a docs/ folder:
    • doc.md
    • report.html (via Pandoc)
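The scanning and extraction steps can be sketched in a few dozen lines of C++17. This is a minimal sketch, not DocTrack's actual code: it assumes comments start with `//` or `#`, that the tag may be followed by an optional colon, and that the generated `docs/` folder should be skipped so copied context in doc.md is not picked up again on the next run. The names `Annotation`, `kTagPattern`, `scan_file`, and `scan_directory` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <regex>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical record for one tagged comment (not DocTrack's actual type).
struct Annotation {
    std::string file;     // path of the source file
    std::size_t line;     // 1-based line number
    std::string tag;      // TODO, BUG, NOTE, FIXME or CODENOTE
    std::string message;  // text after the tag
};

// Assumed pattern: a "//" or "#" comment marker, optional spaces, one of the
// tags as a whole word, an optional colon, then the message.
const std::regex kTagPattern(
    R"((?://|#)\s*(TODO|BUG|NOTE|FIXME|CODENOTE)\b\s*:?\s*(.*))");

// Reads one file line by line and records every tagged comment it finds.
std::vector<Annotation> scan_file(const fs::path& path) {
    std::vector<Annotation> out;
    std::ifstream in(path);
    std::string text;
    std::size_t line_no = 0;
    while (std::getline(in, text)) {
        ++line_no;
        std::smatch m;
        if (std::regex_search(text, m, kTagPattern)) {
            out.push_back({path.string(), line_no, m[1].str(), m[2].str()});
        }
    }
    return out;
}

// Walks `root` recursively. Skipping a folder named "docs" is an assumption,
// made so that generated output is not re-scanned as if it were source.
std::vector<Annotation> scan_directory(const fs::path& root) {
    std::vector<Annotation> all;
    for (auto it = fs::recursive_directory_iterator(root);
         it != fs::recursive_directory_iterator(); ++it) {
        if (it->is_directory() && it->path().filename() == "docs") {
            it.disable_recursion_pending();  // do not descend into docs/
            continue;
        }
        if (!it->is_regular_file()) continue;
        std::vector<Annotation> found = scan_file(it->path());
        all.insert(all.end(), found.begin(), found.end());
    }
    return all;
}
```

For a file containing `// TODO: optimize this` on line 2, `scan_directory` yields an entry with line `2`, tag `TODO`, and message `optimize this`. Context capture and report generation are deliberately left out of this sketch.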


🧠 The Interesting Part: Context Extraction

Extracting a single line is easy. Extracting meaningful context is not.

I initially tried using regex to capture entire functions, but that quickly breaks due to:

  • nested {} blocks
  • different coding styles
  • multi-line structures
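A small demonstration of the nested-brace problem, using two naive patterns with `std::regex`. The snippet and the helper `first_match` are made up for illustration; they are not what DocTrack tried.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern A: '{', then any run of non-'}' characters, then '}'.
// It ends at the FIRST closing brace, even if that brace closes a nested block.
const std::regex kUpToFirstClose(R"(\{[^}]*\})");

// Pattern B: '{', then anything including newlines, then '}' (greedy).
// It ends at the LAST closing brace in the input, swallowing later functions.
const std::regex kUpToLastClose(R"(\{[\s\S]*\})");

// Returns the first match of `re` in `code`, or "" if there is none.
std::string first_match(const std::string& code, const std::regex& re) {
    std::smatch m;
    return std::regex_search(code, m, re) ? m.str() : std::string();
}
```

Given `int f(int x) { if (x) { x = 0; } return x; }` followed by `int g() { return 1; }` on the next line, pattern A returns `{ if (x) { x = 0; }` (cut off at the inner block) and pattern B returns everything from the first `{` of `f` through the closing `}` of `g`. Neither is the body of `f`: matching balanced braces needs a counter or a recursive grammar, which ECMAScript-style regex does not provide.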

Instead, I implemented a brace counting strategy:

  • When a relevant tag (like BUG or FIXME) is detected:
    • Start capturing lines immediately
    • Increment a counter on {
    • Decrement it on }
    • Stop when the braces balance

This provides a reasonable approximation of the surrounding code block without needing a full AST parser.

It’s not perfect, but it works reliably for most real-world cases.
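A sketch of the brace-counting loop, under two assumptions the post does not spell out: capture only stops after at least one `{` has been seen (otherwise a tag line with no braces would count as already balanced), and a `}` that pushes the count below zero means the tag sat inside an enclosing block, so capture ends there. `capture_block` and its `max_lines` cap are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Captures lines starting at `start` (the line holding the tag).
// Stops when a block has been opened and closed again, when a '}' closes a
// block that began before `start`, or after `max_lines` lines (assumed cap).
// Naive: braces inside string literals or comments are counted too.
std::vector<std::string> capture_block(const std::vector<std::string>& lines,
                                       std::size_t start,
                                       std::size_t max_lines = 50) {
    std::vector<std::string> block;
    int depth = 0;
    bool opened = false;  // has any '{' been seen since `start`?
    for (std::size_t i = start; i < lines.size() && block.size() < max_lines; ++i) {
        block.push_back(lines[i]);
        for (char c : lines[i]) {
            if (c == '{') {
                ++depth;
                opened = true;
            } else if (c == '}') {
                --depth;
            }
        }
        // depth < 0: we just left the block that encloses the tag.
        // opened && depth == 0: the block that followed the tag is complete.
        if (depth < 0 || (opened && depth == 0)) break;
    }
    return block;
}
```

For a tag placed just above `int clamp(int v) { ... }`, this captures the comment line through the function's final `}`; for a tag inside a function body, it captures up to that body's closing brace. Because it scans raw characters, a brace inside a string literal such as `"{"` will skew the count, which is one of the edge cases that can break block extraction.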


📄 Example Output

```markdown
# FILE: calculator.cpp

### Line 12 [BUG] → fix division logic

int divide(int a, int b) {
    // BUG: division by zero
    return a / b;
}
```
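One way to render that layout from extracted entries, as a sketch: `Entry` and `render_file_section` are hypothetical names, and the context is written as plain lines exactly as in the example, although the real doc.md may wrap it in a code fence.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical input: one tagged comment plus its captured context lines.
struct Entry {
    std::size_t line;
    std::string tag;
    std::string message;
    std::vector<std::string> context;
};

// Renders one file's entries: a "# FILE:" heading, then for each entry a
// "### Line N [TAG] → message" heading followed by its context lines.
std::string render_file_section(const std::string& file,
                                const std::vector<Entry>& entries) {
    std::ostringstream out;
    out << "# FILE: " << file << "\n\n";
    for (const Entry& e : entries) {
        // \u2192 is the rightwards arrow used in the example headings.
        out << "### Line " << e.line << " [" << e.tag << "] \u2192 "
            << e.message << "\n\n";
        for (const std::string& l : e.context) out << l << "\n";
        out << "\n";
    }
    return out.str();
}
```

The post says report.html is produced via Pandoc; a typical invocation would be `pandoc -s docs/doc.md -o docs/report.html`, though the exact flags DocTrack passes are not stated.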

⚖️ Trade-offs

This approach intentionally avoids:

  • AST parsing (too complex for a first version)
  • language-specific parsing logic

Instead, it favors:

  • speed
  • simplicity
  • language-agnostic behavior

The downside:

  • context detection is heuristic-based
  • edge cases can break block extraction

🚀 Why This Approach

There are already tools for task tracking, but most require:

  • manual input
  • separate interfaces
  • extra discipline

DocTrack works differently:

  • no new workflow
  • no extra effort
  • just leverage what developers already write

🔗 Project

If you're curious or want to try it:

https://github.com/monkonthehill/doctrack


🤔 Open Questions

I’m still exploring a few directions:

  • Should this move toward AST-based parsing for accuracy?
  • Would a VS Code extension be more useful than CLI?
  • How should it handle large codebases efficiently?

Would love to hear thoughts from others building developer tools.


📌 Final Thought

Developers already leave a trail of intent inside their code.

The real opportunity is not adding more tools —
but extracting value from what’s already there.

Top comments (2)

Apex Stack

The brace-counting heuristic for context extraction is a pragmatic call — I've run into the exact same tradeoff on a different problem. I manage a content generation pipeline that processes thousands of pages, and we use a similar "good enough" heuristic approach for extracting structured data from financial reports rather than building full parsers for every source format.

To your open question about AST vs heuristic: I'd stay with the heuristic for v1 and let real usage guide whether AST is worth the complexity. In my experience, 80% of TODO/BUG comments live in straightforward function bodies where brace counting works perfectly. The edge cases (nested lambdas, template metaprogramming) are rare enough that handling them probably isn't worth blocking the first release.

One direction worth exploring: tracking TODO/BUG comment velocity over time. If you run DocTrack on each commit, you could generate a trend of how technical debt accumulates or gets resolved. That's the kind of insight that turns a documentation tool into a project health dashboard. Would be a compelling reason to integrate it into CI.

NITISH Sharma

This is a really solid perspective — especially the point about not overengineering v1.

I went through the same dilemma with AST vs heuristic, and your framing matches what I’ve been seeing: most real-world cases don’t justify the complexity upfront. Brace counting isn’t perfect, but it hits that “good enough for 80%” sweet spot, which is exactly what a first version needs.

Also interesting to hear you’ve applied a similar approach in a content pipeline at scale — that’s reassuring because it validates the idea beyond just code parsing.

The TODO/BUG velocity idea is 🔥

That’s actually a direction I hadn’t fully explored yet, but it makes a lot of sense:

  • Track additions vs resolutions over time
  • Surface trends (growing debt vs cleanup phases)
  • Potential CI integration → project health signal

That shifts DocTrack from:
“documentation extractor”
→ to something closer to a development insight tool

I think that’s where this can become genuinely valuable beyond just convenience.

Appreciate the insight — this is exactly the kind of direction that helps shape what v2 should look like.