Most developers already “document” their code — just not in a structured way.
We write things like:
// TODO: optimize this
// BUG: fix crash here
// NOTE: handle edge case
Over time, these comments pile up across files, and eventually… they become invisible.
I ran into this problem while working on multiple small projects. I knew I had pending work scattered across the codebase, but there was no simple way to track it without introducing another tool or workflow.
So I built DocTrack.
💡 The Idea
Instead of forcing developers to adopt a new system, I wanted to reuse what already exists inside the code.
DocTrack scans a project and extracts structured information from inline comments:
- What needs to be done (TODO, BUG, etc.)
- Where it exists (file + line)
- What context it belongs to (code block)
The goal is simple:
Turn implicit developer notes into explicit, usable documentation.
⚙️ How It Works
At a high level, the tool:
- Recursively scans a directory using C++17 filesystem APIs
- Reads files line-by-line
- Uses regex to detect tagged comments:
  - TODO
  - BUG
  - NOTE
  - FIXME
  - CODENOTE
- Extracts:
  - file name
  - line number
  - message
- Captures surrounding code context using a brace-tracking approach
- Generates output inside a docs/ folder:
  - doc.md
  - report.html (via Pandoc)
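The scan-and-detect steps above can be sketched roughly as follows. This is a minimal illustration, not DocTrack's actual source: the `TaggedComment` struct, the `parse_line` and `scan_directory` names, and the exact regex are all assumptions made for the example, and it only recognizes `//`-style comments with a colon after the tag.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// One extracted comment. Struct and field names are illustrative only.
struct TaggedComment {
    std::string file;
    std::size_t line;     // 1-based line number
    std::string tag;      // e.g. "TODO", "BUG"
    std::string message;  // text after the colon
};

// Returns a TaggedComment if the line contains "// TAG: message".
// The pattern is an assumed one; the real tool's regex may differ.
std::optional<TaggedComment> parse_line(const std::string& file,
                                        std::size_t line_no,
                                        const std::string& text) {
    static const std::regex pattern(
        R"(//\s*(TODO|BUG|NOTE|FIXME|CODENOTE)\s*:\s*(.*))");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) return std::nullopt;
    return TaggedComment{file, line_no, m[1].str(), m[2].str()};
}

// Walks the directory tree and reads every regular file line by line.
std::vector<TaggedComment> scan_directory(const fs::path& root) {
    std::vector<TaggedComment> found;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path());
        std::string text;
        std::size_t line_no = 0;
        while (std::getline(in, text)) {
            ++line_no;
            if (auto c = parse_line(entry.path().string(), line_no, text)) {
                found.push_back(std::move(*c));
            }
        }
    }
    return found;
}
```

Calling `scan_directory(".")` would return one entry per tagged comment under the current directory. A real tool would probably also filter by file extension and skip binary files and directories such as `.git`.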
🧠 The Interesting Part: Context Extraction
Extracting a single line is easy. Extracting meaningful context is not.
I initially tried using regex to capture entire functions, but that quickly breaks due to:
- nested {} blocks
- different coding styles
- multi-line structures
Instead, I implemented a brace counting strategy:
- When a relevant tag (like BUG or FIXME) is detected:
  - Start capturing lines immediately
  - Increment a counter on {
  - Decrement it on }
- Stop when the braces balance
This provides a reasonable approximation of the surrounding code block without needing a full AST parser.
It’s not perfect, but it works reliably for most real-world cases.
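Here is a minimal sketch of the brace-counting idea. The article does not spell out the exact stopping rule, so this version makes two assumptions: capture ends when a block opened after the tag closes again, or when the counter goes negative because the enclosing block closed. The `extract_context` name and the `max_lines` safety cap are also hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects lines starting at the tagged line until the braces balance.
// Deliberately naive: braces inside strings or comments are counted too.
std::vector<std::string> extract_context(const std::vector<std::string>& lines,
                                         std::size_t tag_index,
                                         std::size_t max_lines = 40) {
    std::vector<std::string> context;
    int depth = 0;        // running count: +1 per '{', -1 per '}'
    bool opened = false;  // true once at least one '{' has been seen
    for (std::size_t i = tag_index;
         i < lines.size() && context.size() < max_lines; ++i) {
        context.push_back(lines[i]);
        for (char ch : lines[i]) {
            if (ch == '{') {
                ++depth;
                opened = true;
            } else if (ch == '}') {
                --depth;
            }
        }
        if (depth < 0) break;             // the enclosing block just closed
        if (opened && depth == 0) break;  // a block opened after the tag closed
    }
    return context;
}
```

Because capture starts at the tagged line, a function header that sits above the tag is not included. Picking that up would need an extra backward scan, which this sketch leaves out.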
📄 Example Output
# FILE: calculator.cpp
### Line 12 [BUG] → division by zero
int divide(int a, int b) {
// BUG: division by zero
return a / b;
}
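A report section in the shape shown above could be produced by something like the following. This is only a sketch: `ReportEntry` and `render_file_section` are made-up names, and the choice to emit the context as a four-space-indented Markdown code block is an assumption, since the article does not show the raw Markdown.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One tagged comment plus its captured context. Names are illustrative.
struct ReportEntry {
    std::size_t line;
    std::string tag;
    std::string message;
    std::vector<std::string> context;
};

// Renders the Markdown section for a single source file.
std::string render_file_section(const std::string& file,
                                const std::vector<ReportEntry>& entries) {
    std::ostringstream out;
    out << "# FILE: " << file << "\n";
    for (const auto& e : entries) {
        out << "### Line " << e.line << " [" << e.tag << "] → "
            << e.message << "\n\n";
        // Four leading spaces mark an indented code block in Markdown.
        for (const auto& code_line : e.context) {
            out << "    " << code_line << "\n";
        }
        out << "\n";
    }
    return out.str();
}
```

Writing the returned string to `docs/doc.md` and running `pandoc docs/doc.md -o docs/report.html` would then give the HTML version. How DocTrack itself invokes Pandoc is not covered in the article.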
⚖️ Trade-offs
This approach intentionally avoids:
- AST parsing (too complex for a first version)
- language-specific parsing logic
Instead, it favors:
- speed
- simplicity
- language-agnostic behavior
The downside:
- context detection is heuristic-based
- edge cases can break block extraction
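One concrete edge case is a brace inside a string or character literal, which a plain character count treats as real structure. The sketch below contrasts a naive per-line counter with a slightly more careful one. Both function names are hypothetical, and nothing in the article says DocTrack does the second. It also costs some language-agnosticism, since quoting and comment rules differ between languages.

```cpp
#include <cstddef>
#include <string>

// Naive: every brace character on the line counts.
int naive_brace_delta(const std::string& line) {
    int delta = 0;
    for (char ch : line) {
        if (ch == '{') ++delta;
        else if (ch == '}') --delta;
    }
    return delta;
}

// Ignores braces inside "..." and '...' literals and after a // comment.
// Still a heuristic: block comments, raw strings, and literals that span
// several lines are not handled.
int literal_aware_brace_delta(const std::string& line) {
    int delta = 0;
    char quote = 0;  // 0 outside a literal, otherwise the opening quote
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char ch = line[i];
        if (quote != 0) {
            if (ch == '\\') ++i;                // skip the escaped character
            else if (ch == quote) quote = 0;    // literal ends here
            continue;
        }
        if (ch == '"' || ch == '\'') {
            quote = ch;
            continue;
        }
        if (ch == '/' && i + 1 < line.size() && line[i + 1] == '/') break;
        if (ch == '{') ++delta;
        else if (ch == '}') --delta;
    }
    return delta;
}
```

For a line such as `printf("{");` the naive counter reports one unclosed brace, so a capture that depends on it would run past the end of the function. The literal-aware version reports zero.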
🚀 Why This Approach
There are already tools for task tracking, but most require:
- manual input
- separate interfaces
- extra discipline
DocTrack works differently:
- no new workflow
- no extra effort
- just leverage what developers already write
🔗 Project
If you're curious or want to try it:
https://github.com/monkonthehill/doctrack
🤔 Open Questions
I’m still exploring a few directions:
- Should this move toward AST-based parsing for accuracy?
- Would a VS Code extension be more useful than a CLI?
- How should large codebases be handled efficiently?
Would love to hear thoughts from others building developer tools.
📌 Final Thought
Developers already leave a trail of intent inside their code.
The real opportunity is not adding more tools —
but extracting value from what’s already there.
Top comments (2)
The brace-counting heuristic for context extraction is a pragmatic call — I've run into the exact same tradeoff on a different problem. I manage a content generation pipeline that processes thousands of pages, and we use a similar "good enough" heuristic approach for extracting structured data from financial reports rather than building full parsers for every source format.
To your open question about AST vs heuristic: I'd stay with the heuristic for v1 and let real usage guide whether AST is worth the complexity. In my experience, 80% of TODO/BUG comments live in straightforward function bodies where brace counting works perfectly. The edge cases (nested lambdas, template metaprogramming) are rare enough that handling them probably isn't worth blocking the first release.
One direction worth exploring: tracking TODO/BUG comment velocity over time. If you run DocTrack on each commit, you could generate a trend of how technical debt accumulates or gets resolved. That's the kind of insight that turns a documentation tool into a project health dashboard. Would be a compelling reason to integrate it into CI.
This is a really solid perspective — especially the point about not overengineering v1.
I went through the same dilemma with AST vs heuristic, and your framing matches what I’ve been seeing: most real-world cases don’t justify the complexity upfront. Brace counting isn’t perfect, but it hits that “good enough for 80%” sweet spot, which is exactly what a first version needs.
Also interesting to hear you’ve applied a similar approach in a content pipeline at scale — that’s reassuring because it validates the idea beyond just code parsing.
The TODO/BUG velocity idea is 🔥
That’s actually a direction I hadn’t fully explored yet, but it makes a lot of sense.
That shifts DocTrack from:
“documentation extractor”
→ to something closer to a development insight tool
I think that’s where this can become genuinely valuable beyond just convenience.
Appreciate the insight — this is exactly the kind of direction that helps shape what v2 should look like.