Over the past few years, my software developer friends and I have constantly had to design systems in ways we hadn't before and keep up with an endless stream of new apps and tools. Don't get me wrong, all this progress and variety is great, and I love every bit of it. It's just that when we sit down to learn the new goodies and design the next app, we usually miss a certain kind of information. Of course, we have the docs, we have the hello worlds, and some time after a tool's release the 'net gets its fair share of medium.com posts, and that's great!
But we don't get to see these tools and libraries inside live production systems, large-scale apps, and mobile apps, or how they are used in the wild to solve real-world problems. It's one thing to see a simple proof-of-concept usage of a message broker. What we always wanted to see and explore is how the pros use these building blocks to coordinate flocks of microservices, and how that affects the system's architecture.
So how does one approach this?
I was lucky enough to stumble upon the withspectrum/spectrum repo and find out that there actually are great applications running on the internet, under load and dutifully maintained, while being open-source! Back then, Spectrum was one of the most eye-opening experiences for me as a junior developer. Although I never actually got to build a system like that, it taught me a great deal about how such an app operates and how various libraries can be used.
Looking for more
Since then, I have always been on the lookout for similar apps, but I could only find a handful more, and every time it was more luck than conscious discovery.
One night a few weeks ago, I decided to step up the search. It took a few hours to realize that searching online wouldn't be as effective as I had hoped. I could search by library name or by programming language, but a query like "find all JavaScript repositories with express, react and mongoose inside package.json" was not fruitful, as the vast majority of results were libraries instead of applications. I needed to query non-published node packages.
Perhaps I can make an index?
If there were a way to fetch all package.json files from all repositories, it would be possible to search all their dependencies and filter out the ones matching. Thankfully, the GitHub API is an excellent interface, and as luck would have it, it lets you do just that! I always wanted to make a site that others would benefit from, and with a little rxjs cooking, my crawler was born.
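import { BehaviorSubject } from "rxjs";
import { takeWhile, mergeMap, concatAll } from "rxjs/operators";
import { add } from "date-fns";
import { Octokit } from "octokit";

// Assumption: an authenticated client; where the token comes from is up to you.
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

const searchGithub = () => {
  // Emits the start date of the next one-month window to scan.
  const controller$ = new BehaviorSubject(new Date("2010-01-01T00:00:00Z"));
  return controller$
    .pipe(
      // Stop once the window start reaches the present.
      takeWhile((created) => created < new Date()),
      mergeMap((created) => {
        const end = add(created, { months: 1 });
        const q = Object.entries({
          created: `${created.toISOString()}..${end.toISOString()}`,
          language: "javascript",
          stars: ">=1000",
        })
          .map(([k, v]) => `${k}:${v}`)
          .join(" ");
        return octokit
          .paginate(octokit.rest.search.repos, { q })
          .then((results) => {
            // Request the next window only after this one has finished.
            controller$.next(end);
            return results || [];
          });
      }, 1),
      // Flatten each array of results into individual repositories.
      concatAll()
    )
    .subscribe({
      next: (repo) => {
        // store repository information into database
      },
      complete: () => {
        console.log("Scan complete");
      },
    });
};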
This observable finds all the repos with at least 1,000 stars that are written in JavaScript. The only hiccup was a limit on the GitHub side, where a single search query returns at most 1,000 results. This can be overcome by splitting the search request by repository creation date, one month at a time, so that each query is small enough to stay under the limit. The only thing remaining is to fetch the package.json and make the index.
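Before getting to that, here is the date-splitting trick in isolation, without rxjs or any network calls. This is only a rough sketch: `monthlyWindows` and `buildQuery` are names made up for illustration and are not part of the crawler, and the month arithmetic is done with plain UTC dates instead of date-fns.

```javascript
// Split [start, end) into consecutive one-month windows (UTC).
// Hypothetical helper, not part of the crawler above.
const monthlyWindows = (start, end) => {
  const windows = [];
  let from = new Date(start);
  while (from < end) {
    // Date.UTC normalizes month overflow, so December rolls over into January.
    const to = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth() + 1, 1));
    windows.push([from, to]);
    from = to;
  }
  return windows;
};

// Build the search query string for a single window.
const buildQuery = ([from, to], minStars = 1000) =>
  [
    `created:${from.toISOString()}..${to.toISOString()}`,
    "language:javascript",
    `stars:>=${minStars}`,
  ].join(" ");

const windows = monthlyWindows(
  new Date("2010-01-01T00:00:00Z"),
  new Date("2010-04-01T00:00:00Z")
);
const queries = windows.map((w) => buildQuery(w));
console.log(queries);
```

Each string in `queries` is what ends up in the `q` parameter of one search request. If a single month still returned too many results, the same idea should work with shorter windows. With the windows sorted out, on to fetching the package.json files.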
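import { of } from "rxjs";
import { mergeMap, filter, concatAll } from "rxjs/operators";

// Uses the same `octokit` client as searchGithub.
const fetchPkgJson = (repo) => {
  // of() emits the name once; from() on a string would emit it character by character.
  return of(repo.full_name).pipe(
    mergeMap((full_name) => {
      return octokit.paginate(octokit.rest.search.code, {
        q: `repo:${full_name} filename:package extension:json`,
      });
    }, 1),
    filter((items) => items.length > 0),
    concatAll(),
    // The query can also match files such as package-lock.json, so keep exact matches only.
    filter((item) => item.name === "package.json"),
    mergeMap(async (item) => {
      // The request returns a promise, so it must be awaited before destructuring.
      const { data } = await octokit.request(
        "GET /repositories/{id}/git/blobs/" + item.sha,
        {
          id: item.repository.id,
        }
      );
      // Blob content comes back base64-encoded.
      const stringPackageJson = Buffer.from(data.content, "base64").toString();
      return JSON.parse(stringPackageJson);
    }, 1)
  );
};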
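As for the index itself, one simple way to picture it is an inverted index from dependency name to the repositories that declare it. The sketch below is only an illustration under that assumption, not the code behind the site: `addToIndex`, `searchIndex` and the `acme/*` repository names are all made up, and the index lives in a plain in-memory `Map` instead of a database.

```javascript
// Hypothetical helper: add one parsed package.json to an inverted index
// that maps a dependency name to the set of repositories declaring it.
const addToIndex = (index, repoName, pkg) => {
  // Spreading undefined is a no-op, so missing sections are fine.
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  for (const dep of Object.keys(deps)) {
    if (!index.has(dep)) index.set(dep, new Set());
    index.get(dep).add(repoName);
  }
  return index;
};

// Return the repositories that declare every requested dependency.
const searchIndex = (index, wanted) => {
  if (wanted.length === 0) return [];
  const [first, ...rest] = wanted.map((dep) => index.get(dep) || new Set());
  return [...first].filter((repo) => rest.every((s) => s.has(repo))).sort();
};

// Made-up sample data standing in for fetched package.json files.
const index = new Map();
addToIndex(index, "acme/shop", {
  dependencies: { express: "^4", react: "^17", mongoose: "^6" },
});
addToIndex(index, "acme/api", {
  dependencies: { express: "^4", passport: "^0.6" },
});
addToIndex(index, "acme/lib", { devDependencies: { react: "^17" } });

console.log(searchIndex(index, ["express", "react", "mongoose"])); // [ 'acme/shop' ]
console.log(searchIndex(index, ["express"])); // [ 'acme/api', 'acme/shop' ]
```

This is the "find all repositories with express, react and mongoose inside package.json" query from earlier, answered by intersecting one set per dependency.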
A few more nights were then spent building a React UI and setting up the hosting, and https://codelib.club was born. I tried to make it as simple as possible, with no fuss or distraction. When you search for a library, you are presented with a list of repositories. Upon opening a repository, there is a list of its packages, with the ones that have your query as a dependency highlighted. You can also click a dependency to search for it, or be taken straight to GitHub search to find the exact locations where it is used.
Goal hit/miss ratio
In the end, I was surprised at the sheer number of open-source repositories, solutions and ideas that are out there if you know where to look. You can use codelib to search for apps that are built with whatever dependency you fancy:
- APIs made with express+passport
- Apps with react+stripe
- Services built with mongoose and so much more
Currently, JavaScript and TypeScript repositories with more than 200 stars from github.com and gitlab.com are scanned, and I'm planning to add more. There are still things missing (like turning on SSR, perhaps other languages, a better ranking system, etc.), but the primary use cases are covered. There are probably also a number of bugs 😁, so if you happen to stumble upon some, let me know. Also, if you have any kind of feedback, ideas, praise or hate, please don't hesitate to share it in the comments.
Thanks for reading