samuelfatzinger

Posted on Mar 24

Turning Documentation Instinct Into a CLI Tool for Finding README Gaps

#technicalwriting #opensource #documentation #github

Search and Scroll

As mentioned in my previous post, contributing documentation to open source can be daunting for a technical writer trying to get started. Something I didn’t mention was the physical slog that can also be discouraging, even for writers who are already comfortable contributing. By slog, I don’t mean the work of editing, reorganizing, or pushing commits. I’m talking about the search and scroll for suitable repositories.

You open repo after repo, scan the README, decide it’s not a good fit, and move on. It can be monotonous, tedious, and frustratingly mind-numbing. It’s even harder for non-coders, because the pool of suitable projects is narrower. GitHub is built around code, and finding projects that benefit from documentation work takes time.

There are several excellent sites designed to help new contributors get their foot in the door, such as:

firsttimersonly.com,
First Contributions, and
goodfirstissue.dev.

These types of guides also help contributors navigate searches and tags like “documentation” and “good first issue.”

As a new contributor, I used these tags and search parameters to find repositories in need of documentation updates, only to discover that the content didn’t always need repair or that the issue required more technical knowledge than expected. I know from experience how disheartening it can be to want to contribute but spend limited free time scrolling through search screens, wearing down a mouse wheel, and clicking “Next Page.”

There Has to Be a Better Way

One morning last week I sat staring at my screen, convincing myself that time spent sipping coffee wasn’t wasted as my vision unfocused and drifted toward the wall. I put myself to work, but while typing into GitHub’s search field, I thought: there has got to be a better way to do this.

Third-party sites are useful, but from my experience they are often geared toward first-time coders or people looking for specific projects. YMMV. I wanted something that would make my searches more efficient so I could spend more time writing and less time scrolling.

I started wondering if I could build a tool to automate the search.

So I did.

readme-radar

The result is readme-radar, a CLI tool that searches GitHub for repositories with weak or missing README files that could benefit from documentation work. The project is available here: readme-radar on GitHub

The premise is simple:

fetch repos → evaluate READMEs → rank candidates → show results

The goal is to reduce manual scanning and identify documentation opportunities quickly. It looks like this:

Input

python readme_radar.py "python cli" --show 1 --compact

Output

readme-radar
============
Query: python cli
Scanned: 30
Flagged: 5
Shown: 1
Strong candidates: 1
Good candidates: 0

Top issues:
2 - README under 100 words
1 - Missing README

1. STRONG CANDIDATE | user/repo | stars: 12 | score: 92 | README under 100 words
   https://github.com/user/repo
   other issues:
   - Missing Installation section
   - Missing Usage section
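For readers curious about the fetch step, here is a minimal sketch of how a query like "python cli" could be sent to GitHub's repository search endpoint. It is not readme-radar's actual code. The function names and the choice to keep only three fields are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, per_page: int = 30) -> str:
    """Return a GitHub repository-search URL for a free-text query."""
    params = {"q": query, "per_page": per_page}
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def fetch_repos(query: str, per_page: int = 30) -> list[dict]:
    """Fetch one page of search results (unauthenticated, so rate-limited)."""
    request = urllib.request.Request(
        build_search_url(query, per_page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Keep only the fields the later steps need (a hypothetical selection).
    return [
        {
            "name": item["full_name"],
            "stars": item["stargazers_count"],
            "url": item["html_url"],
        }
        for item in payload["items"]
    ]
```

Separating the URL builder from the network call keeps the query logic easy to check without hitting the API.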

Scoring and Ranking

The most important pieces here are the search and ranking metrics. I’m only looking for repository candidates I can realistically contribute to, which means building a scoring model that fits my needs and capabilities.

The ratings are based on factors like whether a README exists, how substantial it is, which standard sections are missing, and how brief it is overall. These are key indicators that a README is weak or in need of improvement.
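A scoring model along these lines could be sketched as follows. The issue labels come from the sample output above, but the weights, the 100-point cap, and the function names are hypothetical and not the values readme-radar actually uses.

```python
import re
from typing import Optional

# Hypothetical weights, chosen for illustration only.
WEIGHTS = {
    "Missing README": 100,
    "README under 100 words": 50,
    "Missing Installation section": 20,
    "Missing Usage section": 20,
}


def find_issues(readme: Optional[str]) -> list[str]:
    """Return the list of issue labels that apply to a README's text."""
    if readme is None or not readme.strip():
        return ["Missing README"]
    issues = []
    if len(readme.split()) < 100:
        issues.append("README under 100 words")
    # Collect the text of every Markdown heading, lowercased.
    headings = [
        h.lower()
        for h in re.findall(r"^#{1,6}\s+(.+)$", readme, flags=re.MULTILINE)
    ]
    for section in ("Installation", "Usage"):
        if not any(section.lower() in h for h in headings):
            issues.append(f"Missing {section} section")
    return issues


def score(issues: list[str]) -> int:
    """Sum the weights of the issues found, capped at 100."""
    return min(100, sum(WEIGHTS[issue] for issue in issues))
```

Higher scores mean a weaker README, and therefore a better candidate for documentation work.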

Often what I have found is that weak READMEs work more as placeholders that never get revisited. They are short and lack specificity, or are sometimes filled with repetitive content meant to take up space, sometimes with multiple, redundant, numerous, excessive repetitions for the sake of filling space. Both of these are great candidates for improved documentation.

To be clear, a short README is NOT automatically a weak one. Sometimes short fits the bill. I have contributed to repos that had short READMEs that needed a few language edits and nothing more. But brief is bad when it leaves out what the user needs.

Weak READMEs are usually missing headers and sections as well, either for the reasons above or because the README is a wall of text without any markdown at all. These can be flagged by readme-radar and often make excellent candidates. A lack of sections or proper markdown can be an organization issue rather than a content issue, and those feel great to contribute to because the author knew what they needed to include. You’re just there to help tidy up a little bit.
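One way to spot a wall of text is to check for content that has no recognizable Markdown structure. The sketch below is an assumption about how such a check might work, not the tool's real logic. The pattern list and the 50-word minimum are arbitrary choices.

```python
import re

# Patterns for common Markdown structure (an illustrative, incomplete list).
MARKDOWN_PATTERNS = [
    re.compile(r"^#{1,6}\s+\S", re.MULTILINE),            # ATX headings
    re.compile(r"^\s*([-*+]|\d+\.)\s+\S", re.MULTILINE),  # bullet or numbered lists
    re.compile(r"^```", re.MULTILINE),                    # fenced code blocks
    re.compile(r"\[[^\]]+\]\([^)]+\)"),                   # inline links
]


def is_wall_of_text(readme: str, min_words: int = 50) -> bool:
    """True when a README has real content but no Markdown structure."""
    has_content = len(readme.split()) >= min_words
    has_markup = any(p.search(readme) for p in MARKDOWN_PATTERNS)
    return has_content and not has_markup
```

A README that trips this check likely has the content already and mostly needs headings and lists.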

The tool ranks candidates by score so that repositories with the weakest READMEs, which are often the best fit for documentation work, appear at the top, with the rest following in descending order. This lets the user see the strongest candidates first with little to no scrolling.

The compact output prints key metrics on the top lines to improve scannability even more. Rather than reading dozens or even hundreds of READMEs, I may only need to look at a handful to decide what’s worth opening.
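The compact layout from the sample output can be reproduced with a small formatter. This is a sketch that mirrors the printed format only. The 80-point threshold for a "strong" label and the function names are assumptions.

```python
def label_for(score: int, strong_threshold: int = 80) -> str:
    """Map a score to a label; the threshold is a hypothetical value."""
    return "STRONG CANDIDATE" if score >= strong_threshold else "GOOD CANDIDATE"


def format_compact(rank: int, name: str, stars: int, score: int,
                   issues: list[str]) -> str:
    """Format one flagged repo; `issues` must contain at least one entry."""
    top_issue, *others = issues
    lines = [
        f"{rank}. {label_for(score)} | {name} | stars: {stars} "
        f"| score: {score} | {top_issue}",
        f"   https://github.com/{name}",
    ]
    if others:
        lines.append("   other issues:")
        lines.extend(f"   - {issue}" for issue in others)
    return "\n".join(lines)
```

Putting the label, score, and top issue on the first line is what makes the list quick to scan.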

The tool also supports JSON export. This makes it possible to save candidate lists from a strong search result or build workflows around specific contribution sessions. Instead of picking through repositories one at a time, the tool makes it easier to build a queue and track documentation opportunities.
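A JSON export of this kind could look something like the sketch below. The field names and function names are hypothetical, and the real tool's export format may differ.

```python
import json
from pathlib import Path


def export_candidates(candidates: list[dict], path: str) -> None:
    """Write the candidate list to a JSON file for a later session."""
    Path(path).write_text(json.dumps(candidates, indent=2), encoding="utf-8")


def load_queue(path: str) -> list[dict]:
    """Read a previously exported candidate list back in."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Saving a strong search result this way turns it into a work queue: load the file at the start of a session and work through the entries in order.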

Designed for Writers

Engineers start with a focus on code quality. Technical writers start with a focus on connectivity. It’s all about the entry point: What is this? Why is it useful to me? How do I use it? These are questions that echo in the ravine between engineer and end user. A technical writer’s job is to build a bridge across that gap. Sometimes there’s just a rope flung across that gap. It’s staked down at each end, sagging in the middle, and a bit frayed. It works, but it’s not inviting. It’s not exactly safe to walk across either. A technical writer sees that rope and thinks about adding a few more, braiding them together, and putting in better stakes.

When I first started contributing, I wanted to make every bridge I came across a marvel of civil engineering. But that’s not necessary in most cases. All I was doing was frustrating myself by wanting to do more. I was gilding the lily. Documentation, cliché or not, really is a “less is more” kind of work.

Part of the problem for me was that I spent so much time searching for something to work on that, when I finally found a candidate, I wanted to validate that time by doing too much. What readme-radar has helped me with is cutting that work anxiety, because I know there are other readily available projects to contribute to. It’s easy to find a project, do the work, and move on to the next one. The tool is built specifically with writers in mind: it streamlines the workflow and keeps contribution sessions efficient.

Finding Friction

I kept running into the same friction: finding something worth improving took longer than the improvement itself. It should be about making things easier for the user, and in this case, that user was me.

Building a tool like this changed how I started thinking about my own portfolio of work. I had always thought of myself as separate from coding, despite having some knowledge and experience. I imagined open source as a world populated by engineers, with technical writers working somewhere on the periphery.

This project shifted that perspective. It showed me that there isn’t just overlap, but an interweaving of skills. Documentation, tooling, and development all support each other.

readme-radar started as a way to reduce the search and scroll. It ended up reinforcing something broader: technical writers don’t just improve documentation. Sometimes they build the tools that make better documentation easier to find.

If you’ve found good ways to identify documentation gaps in open source, I’d be interested to hear your approach.

Feedback on readme-radar is welcome as well.
