There was a point some years ago when I realized I didn't quite know what state my repositories were in. Which ones had failing build badges, exactly? Was there a pull request I'd left unmerged somewhere? Hadn't I forgotten to set a description on one of the repos?
Hardly life-threatening issues, but they kept annoying me, and the lack of clarity had me constantly comparing repos to keep them consistent. And it wasn't just my own repositories; at work no one had a real overview of the state of our business-critical repositories either. Isn't there a better way?
Today I’d like to present two tools:
repo-lister — a web-based overview of repositories. It exports a simple static site that's easily hosted via e.g. GitHub Pages or S3.
repo-scraper — a small CLI to scrape repository information (only from GitHub at this point). It outputs a JSON file which repo-lister consumes.
Together they form a simple, secure way to display an overview of your repositories. I've set it up for my own repos at repos.jonlauridsen.com if you'd like to see the end product. And because the website never has direct access to the repositories, there are no security concerns; I just need the scraper to run regularly so the underlying data stays up to date.
The tale of how this came to be
I landed on this specific architecture after originally planning to build a server that would simply have API access to the repositories and render that information. But I felt that approach had three major drawbacks:
Servers are complex. They're like a constantly running engine; making one that's maintenance-free and doesn't break down is… well, it's doable, but it sure would be nice to avoid that whole concern. The fewer moving parts the better.
Servers are costly. Okay a small server isn’t that expensive, but it’s a small cost that could run for years. Do I really want that just to have an overview of my personal repositories?
But most critically, what about security? Can I convince my company to set up a server with direct API access to all our private repositories? Forget about them, can I even convince myself? No matter what, that server would represent a constantly running attack surface and that makes me nervous.
Hence the idea of a static site. That's about as simple as the web gets: it can be hosted for free on services like GitHub Pages or S3, and it doesn't expose any vulnerable surface. The CLI scraper tool would be completely decoupled and only run on demand, and its output is easy to inspect manually to ensure no unwanted information is leaked.
To get started I outlined an initial list of critical features:
Of course it should show basic repo information like description, number of open pull requests, etc. But I really wanted to show a list of badges (y'know, those little images that show e.g. CI status). I wanted to see which repos are failing their builds, or where I'd forgotten to add standardjs, or any of the many other kinds of badges that are popular to use.
Isolated components. One thing I knew about UI dev was that I wanted to avoid the mess I'd seen at work, where the UI was one big tangled codebase with no way to iterate on any one component. The entire app had to be started every time, and it was a giant pain.
Testable components. UIs seemed hard to test but I wasn’t about to sacrifice clean code principles. This project would be testable, come hell or high water.
One obvious immediate obstacle was that I'd never done front-end work before, so I really had no clue how to build a site or what to expect in terms of complexity. But hey, what better way to learn than to do a project, right?
I began by testing basic assumptions, like: is it even possible for a static site to render components based on the contents of a JSON file? At the time I didn't know! And how can a component be isolated and tested? It was during this that I came across the Next.js project, a React-based static site generator, and as I'd wanted to learn React it seemed like a great fit for my needs.
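In case you're curious what that concept boils down to, here's a minimal sketch of rendering components from JSON data. The names and fields are made up for illustration; this isn't the actual repo-lister code.

// RepoList.jsx: hypothetical sketch of rendering one component per repo.
// In a real setup the array would come from a data.json file (imported at
// build time or fetched from a URL); it's inlined here to keep things short.
import React from "react";

const repos = [
  { name: "repo-lister", description: "Overview site", openPrs: 1 },
  { name: "repo-scraper", description: "Scraper CLI", openPrs: 0 },
];

export default function RepoList() {
  return (
    <ul>
      {repos.map((repo) => (
        <li key={repo.name}>
          <strong>{repo.name}</strong>: {repo.description} ({repo.openPrs} open PRs)
        </li>
      ))}
    </ul>
  );
}

Once you know a static site can do that, the rest is "just" iterating on components and data.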
Fast forward one year (to the day! My first commit was 01/01/18 and I'm writing this post on 01/01/19) and the project is now ready enough to declare a success for my personal needs. Eh, it's still rough and there's a whole list of ideas I haven't gotten to, but the fundamentals are in place.
Actually it was a bit of a struggle to get this far. Around July everything kind of worked: the basics were there, but it just didn't feel very… controlled. I didn't really feel in control of how the components rendered, and the repository was weighed down by too many dependencies to keep track of; it was just too much noise having to support the static-site configuration, the tested components, and a deployment mechanism that both scraped and re-exported the site, all in one place. I didn't like maintaining it, and then what's the point of a pet project?
So even though everything kind of worked, I took a deep breath and chopped it into pieces. I extracted all the components into their own repository (repo-components), and put all the deployment stuff into repos.jonlauridsen.com. I then enforced a stricter separation between the site's concerns and the, in principle, reusable and product-agnostic React components. Now the components repo deploys a style guide, making it possible to iterate on the components without even thinking about the bigger site. A short 6 months later the rewrite was done (though that downtime was mostly due to moving and starting a new job 😄), and, well, here we are.
That’s probably enough storytime, let me explain how to actually use this.
The UI
Let's start with repo-lister. This is the GUI, and it's really little more than a preconfigured Next.js project. How about we make an overview site for your repositories? To follow along at home you'll need macOS or some flavor of Linux (I only know Ubuntu), with a modern version of node.js installed.
To get started, launch a terminal, go to a folder you like to keep projects in, and simply clone repo-lister and run its dev script, e.g.:
$ git clone https://github.com/gaggle/repo-lister && cd repo-lister && npm ci && npm run dev
This will spin up a locally hosted development version, available at http://localhost:3000 (it'll use my default sample data, but let's fix that in a minute).
That's nice, but I assume you're looking to deploy this somewhere so it's available from more than just your computer. To do that you need to run this command:
$ next build && next export
That creates an out folder, which you can put on a server. But wait, we have a problem: the site loads its repository data from a data.json file, which by default is fetched from this URL: http://localhost:3000/static/repos/data.json.
That's fine for the localhost version, but it won't work wherever you're planning to deploy. I'm guessing you want to serve this from an address of your own choosing, maybe an S3 bucket or some other static-site server you've made accessible. For this example let's say you've prepared a folder somewhere that maps to http://example. We need to change the site so it looks for the data.json file at that address, and that's configured via the DATA_URL environment variable, like so:
$ export DATA_URL=http://example/static/repos/data.json; next build && next export
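In case you're wondering how an environment variable can affect a site that's exported ahead of time: the value has to get baked in at build time, which is why DATA_URL is set before next build runs. Purely to illustrate the mechanism (this is a hypothetical sketch, not necessarily how repo-lister actually wires it up), a Next.js config can inline the variable like this:

// next.config.js: hypothetical sketch of inlining DATA_URL at build time
// (not necessarily how repo-lister does it)
module.exports = {
  env: {
    // Whatever DATA_URL is set to when `next build` runs gets baked into
    // the exported bundle; the fallback mirrors the localhost default.
    DATA_URL:
      process.env.DATA_URL || "http://localhost:3000/static/repos/data.json",
  },
};

Components can then read process.env.DATA_URL and fetch the JSON from there.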
NOW you can drop that out folder onto your server and it'll all work. Okay, yes, it still just serves my test data, but we're making progress!
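For example, if your server is an S3 bucket you've already configured for static website hosting (and you have the AWS CLI set up; the bucket name here is a placeholder), the upload can be a one-liner:
$ aws s3 sync out/ s3://<your-bucket>/ --delete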
The scraper
To fetch your repository data and generate your own data.json file you'll want to use the repo-scraper tool. If you're still in the repo-lister folder from before, just run this command to install it:
$ npm install https://github.com/gaggle/repo-scraper.git --save
You now have a CLI available called repo-scrape. To actually start a scrape you must provide a GitHub API token (with at least permissions to access repositories), passing it in via the GH_TOKEN environment variable. So the command to run is something like this:
$ export GH_TOKEN=<token>; repo-scrape github
(Theoretically repo-scrape supports other providers, but so far GitHub is the only available one. Leave a comment if you’d like other providers supported)
Once done you have a folder called static that contains your scraped repository data. On the server you can now replace the contents of static/repos/ with the contents of this static folder. And hey presto, there's all your repo data. From now on you can simply re-run repo-scrape regularly to keep the data up to date. Personally I've set it up to run daily.
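If you want the same, a plain cron job is one way to do it. Here's a hypothetical crontab entry that scrapes at 06:00 every day and syncs the result to the same placeholder S3 bucket as before; the paths are made up, cron needs repo-scrape on its PATH (or a full path to it), and the static/repos/ target just follows the folder layout described above:

0 6 * * * cd /path/to/scraper && GH_TOKEN=<token> repo-scrape github && aws s3 sync static/ s3://<your-bucket>/static/repos/ --delete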
If you actually followed these steps I'd love to hear whether it worked for you. If it didn't, definitely leave a comment or open an issue in the relevant repo; I'm happy to help. The repo-scrape CLI has a couple of debug options available, see repo-scrape --help for details.
In conclusion
This certainly isn't a polished project ready to take over the world, but eh, it works for my needs, and I think it's fun to share the private projects we work on, so thank you for reading this. It's also my way of participating in the dev.to platform; it's wonderfully inspiring to read the blogs you all write here, and my thanks to @benhalpern et al. for growing this community.
I'd love to hear from you, so leave a comment or catch me on Twitter. I wish you a wonderful day.
Comments
Hi, thanks! Yeah, as you found, the scraper outputs JSON, currently just a simple set of keys per repo that gets rendered in this repo-card component (the data is passed from the JSON file to the component via the spread operator in repo-lister's index page).
There is a slightly old WIP pull request to add support for GitLab; I'll get it dusted off tonight and I'm happy to see if it fits your use case. Thanks again for your interest.
Just to say I've merged (preliminary) support for GitLab.com and cut release 0.3.0.
I didn't manage to get the readme file as HTML via the GitLab API, so there's no badge scraping yet, but the basic information should show up. I'll see if I can figure out those badges next time.
Hosted GitLab and the other providers are great suggestions too; I'll mull those over 🤔 And if you end up contributing, that's of course always welcome 😄