Mike Stemle

Tackling a large volume of simple problems in an Open Source project

Hello again!

I've been contributing to a really helpful Rust crate by @xampprocky called "octocrab". I first started using this crate because it's hands-down the best GitHub library in the Rust ecosystem, and I'm doing a lot of automation against GitHub for my day job.

As with most emerging ecosystems, there are a number of functionality gaps even though the library is more-or-less fantastic. The primary maintainer has spent a lot of time shepherding the project, but as with anything as big as a GitHub library, stuff is bound to be missed. Enter my problem.


The Problem

As part of a project I have been working on to build out some tools for enterprise GitHub management, I have found that there's a pattern of functions which are more likely than not to be broken in this library. Here's the pattern:

  1. The API that the function calls must return a 204 (No Content) status
  2. The API that the function calls must not return a response body (this goes hand-in-hand with returning a 204)

That's it!
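
A likely failure mode for this pattern, stated here as an assumption rather than something confirmed for every affected function, is a client that tries to deserialize a response body that isn't there. The idea is not specific to Rust, so here is a minimal JavaScript sketch of it. Both `decodeBody` and `decodeBodyNaive` are hypothetical helpers made up for illustration, not part of octocrab.

```javascript
// Hypothetical helper: turn a raw HTTP response into a value.
// A 204 (No Content) response carries no body, so there is nothing to parse.
function decodeBody(status, bodyText) {
  if (status === 204 || bodyText.length === 0) {
    return null
  }
  return JSON.parse(bodyText)
}

// Naive version that always parses the body, which is the shape of the bug.
function decodeBodyNaive(status, bodyText) {
  return JSON.parse(bodyText)
}

console.log(decodeBody(200, '{"starred":true}')) // the parsed object
console.log(decodeBody(204, '')) // null

try {
  decodeBodyNaive(204, '')
} catch (err) {
  // Parsing an empty string as JSON throws a SyntaxError
  console.log(`naive decode failed: ${err.name}`)
}
```

The fix in a sketch like this is small: check for the empty-body case before handing anything to the parser.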

Finding functions which fit this pattern turned out to be really simple, but the set it returned is kinda daunting. Let me first walk you through how I produced the list.

A chicken, on a touchscreen keyboard, doing what needs to be done

Data Acquisition

As most organizations with APIs do, GitHub has released a great deal of documentation regarding their APIs. I don't really need much in the way of human-readable documentation right now, though; I need a list of endpoints, their HTTP methods, and the expected HTTP status codes. This is the sort of thing you can get from an OpenAPI spec, or a Swagger file, and as luck would have it, GitHub has produced exactly that!

This is an OpenAPI spec for GitHub's APIs, specifically the "Enterprise Cloud" version. This means you'll get all of the fun new stuff regarding GitHub advanced security and their other enterprise management features. Essentially, this is the most comprehensive set of APIs that GitHub offers.

All I've gotta do is

❯ wget https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/ghec/ghec.yaml

and now I've got my data. Yay!

Data Processing

My goal is to make a GitHub Project that I can use to track my progress toward fully covering the endpoints matching this pattern. Now that I have the OpenAPI spec, though, I first need to isolate the endpoints I want to consider. For that, we're going to need a little bit of code. Here's my flow:

  • Parse the YAML
  • Filter out every endpoint that doesn't have a 204 response defined.
  • Output a CSV file which has all of the fields we'll want to import into GitHub as Issues

The code is pretty quick, and looks something like this:

const _ = require('lodash')
const YAML = require('yaml')
const { readFileSync } = require('fs')

const fileContents = readFileSync('ghec.yaml', { encoding: 'utf-8' })

// Who do you want the issues to be assigned to? (empty string for none)
const assignee = 'YOUR USERNAME HERE'

// A comma-separated list of labels
const label = 'to investigate'

// All HTTP methods which could return a 204
const httpMethods = ['get', 'post', 'put', 'delete', 'patch']

// Your CSV header.
const lines = ['"title","body","labels","assignee"']

// Parse the file
const yamlContents = YAML.parse(fileContents)

// Get all of the API paths from inside the YAML structure
const paths = yamlContents.paths

// Iterate through each of the paths, looking for 204s.
_.forEach(paths, (value, key) => {
  // First find the methods for this which have a 204 response
  const methodsWith204 = httpMethods.filter(
    (method) => value?.[method]?.responses?.['204']
  )

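  // For all of the methods with a 204 available, add them to the list of records to output
  methodsWith204.forEach((method) => {
    const operation = value[method]
    const title = `${method.toUpperCase()} ${key}`
    const statuses = _.keys(operation.responses).join(',')
    const body = `${title} has statuses ${statuses}: ${operation.summary}`

    // Quote every field and double any embedded quotes so the CSV stays valid
    lines.push(
      [title, body, label, assignee]
        .map((field) => `"${String(field).replace(/"/g, '""')}"`)
        .join(',')
    )
  })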
})

// Now print out all of the lines as CSV data
console.log(lines.join('\n'))

To run it, you just use the command

node find-204s.cjs > 204-issues.csv
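
For a feel of what the filter keeps, the same 204 check can be run against a tiny stand-in for the spec's `paths` object. This is a sketch with no dependencies; `samplePaths` is hand-written and heavily simplified for illustration (it is not copied from ghec.yaml), and `find204s` is a hypothetical name.

```javascript
// Hand-written, simplified stand-in for `yamlContents.paths`.
const samplePaths = {
  '/user/starred/{owner}/{repo}': {
    get: {
      summary: 'Check if a repository is starred by the authenticated user',
      responses: { 204: {}, 404: {} },
    },
    put: {
      summary: 'Star a repository for the authenticated user',
      responses: { 204: {} },
    },
  },
  '/repos/{owner}/{repo}': {
    get: { summary: 'Get a repository', responses: { 200: {} } },
    // Non-method keys can sit beside the methods; the filter ignores them.
    parameters: [],
  },
}

const httpMethods = ['get', 'post', 'put', 'delete', 'patch']

// Return "METHOD path" for every operation that declares a 204 response.
function find204s(paths) {
  const found = []
  for (const [path, operations] of Object.entries(paths)) {
    for (const method of httpMethods) {
      if (operations?.[method]?.responses?.['204']) {
        found.push(`${method.toUpperCase()} ${path}`)
      }
    }
  }
  return found
}

console.log(find204s(samplePaths))
```

Only the two operations on the starred path are kept here, since the repository lookup declares no 204 response.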

Code quality is usually a top priority, but when I'm writing quick-and-dirty (QnD) one-off programs, my priorities are that the code is simple and that it's easy to come back to later when I need something similar. This script is unlikely to be terribly reusable, and that's OK with me, so we're just gonna let it be what it is. It works, it served its purpose, and now I will put it in my collection of misfit scripts.

More importantly, however, now we have usable CSV data which can be imported into GitHub!
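
Before importing anything, it can be worth counting the records in the CSV so there is a number to compare against later. The sketch below is one way to do that without a CSV library; `countCsvRecords` is a hypothetical helper, and it assumes the file has a single header row and uses doubled quotes for escaping.

```javascript
// Count the data records in CSV text, skipping the header row.
// Quoted fields may contain commas, doubled quotes, and newlines.
function countCsvRecords(csvText) {
  let records = 0
  let inQuotes = false
  let sawContent = false

  for (const char of csvText) {
    if (char === '"') {
      // A doubled quote toggles twice, so we stay inside the field
      inQuotes = !inQuotes
      sawContent = true
    } else if (char === '\n' && !inQuotes) {
      if (sawContent) records += 1
      sawContent = false
    } else if (char !== '\r') {
      sawContent = true
    }
  }
  if (sawContent) records += 1 // last row without a trailing newline

  return Math.max(records - 1, 0) // subtract the header row
}

const sample = [
  '"title","body","labels","assignee"',
  '"GET /a","has ""quotes"", and commas","to investigate",""',
  '"PUT /b","line one\nline two","to investigate",""',
].join('\n')

console.log(countCsvRecords(sample))
```

To use it on the real file, read 204-issues.csv with `readFileSync` and pass the contents in; the result should equal the number of issues you expect to import.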

A professional chicken wearing AR goggles, satisfied that the code works and that the data has been processed.

Importing GitHub Issues

The point of all of this so far has been to get a GitHub Project set up with all of these endpoints so that I can investigate whether or not they're broken. Let's turn these endpoints into issues so that they can easily be added to a project!

It is very important to note that I am not adding these issues to the upstream repository; I'm adding them to my own fork of that repository. I'm not trying to interrupt what the mainline project is doing. Instead, I'm running my own initiative to fix what I see as a problem. I promise you, maintainers love this as long as you follow their contributing guidelines (usually found in a CONTRIBUTING.md or README.md file at the top level of the repository).

GitHub's web UI does not yet have the ability to import issues from a CSV file, but GitHub does have APIs which let you add issues. Lucky for me, a friend who goes by @gavinr made a lovely little Node script that can do this for me, so I don't have to write another QnD!
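
For a sense of what adding an issue through the API involves, here is a rough sketch of the request for GitHub's REST "create an issue" endpoint (`POST /repos/{owner}/{repo}/issues`). It only builds the request and never sends it. `buildCreateIssueRequest` is a hypothetical helper, the token is a placeholder, and none of this is meant to describe how @gavinr's script works internally.

```javascript
// Build (but do not send) the request for GitHub's "create an issue" endpoint.
function buildCreateIssueRequest({ owner, repo, token }, row) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/issues`,
    options: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        title: row.title,
        body: row.body,
        // The CSV stores labels as one comma-separated string
        labels: row.labels
          .split(',')
          .map((name) => name.trim())
          .filter(Boolean),
        // An empty assignee column means "assign to nobody"
        assignees: row.assignee ? [row.assignee] : [],
      }),
    },
  }
}

const request = buildCreateIssueRequest(
  { owner: 'manchicken', repo: 'octocrab', token: 'PLACEHOLDER_TOKEN' },
  {
    title: 'DELETE /user/starred/{owner}/{repo}',
    body: 'DELETE /user/starred/{owner}/{repo} has statuses 204',
    labels: 'to investigate',
    assignee: '',
  }
)

console.log(request.url)
console.log(request.options.body)
```

Sending it from Node 18 or later would be a matter of `await fetch(request.url, request.options)`, once per CSV row, with pauses to stay within the rate limit.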

My CSV file contains 263 issues, so with rate-limiting it'll take a little bit of time...

githubCsvTools \
  --organization manchicken \
  --repository octocrab \
  --token "$GITHUB_TOKEN" \
  ./204-issues.csv

I've Got Issues

Look at all of these lovely issues!

A screenshot of GitHub Issues, with 263 issues listed

Notice how they've all got my "to investigate" label, and they're all assigned to me. Now I can start to turn them into a project!

I made a new project, and now I want to add issues in bulk. GitHub has some documentation for this, and as I do it, it's starting to look pretty good.

A screenshot of GitHub Projects, as issues are being added

I just search for the label "to investigate", and then I can add them in batches of 25. It takes a little bit of time, but I'm done pretty quickly. It's important when I finish to verify that the number of items in the Project Backlog is now 263, the same number of issues I have. When I finish, sure enough, my record counts match.

Now, the Hard Part

So, now I have a project filled with items to investigate. This isn't a sexy task; nobody grows up wanting to burn through tickets and fix bugs, but it is really valuable work. The maintainers of the project have a number of other things competing for their time, and especially with a project of this size, it's important for everybody to contribute where they can and where they have interest.

I don't necessarily have an interest in working tickets, but I do have an interest in this project being a high-quality library which meets the needs of its users. That requires tests, that requires making sure that things work, and sometimes that requires taking the tedium of a project like this on and getting it done.

Wrapping Up

I'm going to start working the tickets on this GitHub Project now. First, I'll skim the issues for the ones I've already fixed, and mark them as fixed. From there the process will be a pretty tedious one, but it'll look something like this:

  • Take an issue
  • Check to see if it has its semantic API function defined
  • If not, mark it as won't fix, since that's a different project
  • If there is a semantic API function...
    • See if there's a test for the function which covers the 204 scenario
    • If there isn't a test, write one on a new branch
    • If the test fails, fix the code so that the test passes
    • Submit the test and the fix as a PR to the upstream branch
  • Close the issue on my project
  • Repeat

It'll take a while to do this, and if anybody is eager to help I'm happy to work with them on this. I see a number of issues on the upstream repo which could be caused by the errors I'm seeing with this pattern. Even if I don't fix any existing issues, though, this project is worth it just to have the tests which prove that these use cases are working properly.

I hope to hear in the comments what you think of this process, and what sorts of stuff you do to help your favorite open source projects.

Quick Note on Participation

Maintainers like the ones I've mentioned here are working hard to make things that they hope will be useful to everybody. Please do me, and all of them, a huge favor and be patient with them. They're doing their best, and in many cases this doesn't pay much, if at all.

If you see something wrong with a project you use, and you have the ability to help fix it, please participate in fixing it. It probably won't take as much time as you think it will, and you will help everybody by keeping the project(s) healthy.
