Motivation: Github is the largest source code host in the world with more than 254M repositories (at least 52M public repos in 2020, per x-lab report). Beyond source code, Github has become a broader community hub for over 73 million developers worldwide (16M new in 2021).
Methodology: I downloaded data on 247 of the top trending repositories on Github through December 27th by net star growth and methodically categorized each repository: first by whether it was a technology repo or educational repo, then by subcategories.
Six clear trending areas emerged, each explored in depth in the sections below:
Educational resources (~70/250): patterns, interview prep, build lists…
Web development technologies (~43/250): frameworks on frameworks
Developer tools (~35/250): IDEs, terminals, and really random stuff
Data, AI, ML (~21/250): mostly deep learning
New languages on the block: TypeScript, Go and Rust
MISC: Microsoft, Dev Counterculture, and Surveillance
The full list of repositories can be downloaded here (including descriptions and total contributor, fork, issue, and pull request counts for each repo), and I include a summary table below.
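As a quick sanity check on the category sizes listed above, here is a minimal sketch that turns the approximate counts into shares of the ~250 repos. The counts are the rounded figures from the list; the names `CATEGORY_COUNTS`, `TOTAL_REPOS`, and `category_shares` are made up for illustration and are not part of the downloadable dataset.

```python
# Approximate repo counts per category, copied from the list above.
CATEGORY_COUNTS = {
    "Educational resources": 70,
    "Web development technologies": 43,
    "Developer tools": 35,
    "Data, AI, ML": 21,
}
TOTAL_REPOS = 250  # rounded total used in the list above (assumption: ~247 rounded up)


def category_shares(counts, total):
    """Return each category's share of the total, as a percentage."""
    return {name: 100 * count / total for name, count in counts.items()}


shares = category_shares(CATEGORY_COUNTS, TOTAL_REPOS)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")

# Repos left over for the categories without explicit counts
# (new languages and MISC).
remaining = TOTAL_REPOS - sum(CATEGORY_COUNTS.values())
print(f"Not covered by the four counted categories: {remaining}")
```

Educational resources alone come out to a bit over a quarter of the list, which is the headline of the next section.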
While Github was designed to store code, it has become a major hub for crowdsourcing knowledge and educational resources. In fact, only 3/10 of the top Github repositories today are ‘technologies’ (Vue, React, & Tensorflow), and by far the most popular repository on Github is a free, non-profit, coding camp.
Notable subcategories & examples:
Interview prep: resources geared toward job interview prep (e.g. coding interview university, tech interview handbook). Half of these were geared toward the leetcode platform, most of which had Chinese translations as well.
There is a massive workforce reskilling underway. Millions of people globally are learning to program for the first time, often motivated by economic emancipation. This is powerful, but it is also driven by profound change in the labor market. The World Economic Forum has estimated technology will eliminate ~85M jobs, while creating ~97M new jobs by 2025 (link). Not everyone will successfully make this transition. We need to remember this, and we need mechanisms to support those who do not.
**Developers ❤️ to learn & build.** One of my favorite things about working in tech is the people who partake in it: nerds who love continuous learning & makers who love to build!
Content marketing is king: write good content by making it educational for continuous learners and/or applied for makers that like to build 😃
*Marking “Good First Issues” can be effective:* a couple of the ‘build project’ lists are actually mostly compilations of issues labelled “Good First Issue” in open source repos… Newer developers can hone their skills while becoming contributors to your projects if you guide them toward the easier opportunities. There is actually an entire section in Github’s 2021 Octoverse report on the effects of using this label!
**Give to get:** education can be used to create moats and unlock new TAMs. For example, the dbt Labs community has become the de facto source for an entire generation of data analysts looking to upskill and is helping bring thousands of companies to the Modern Data Stack. They have their own set of online courses. There are even independent programs, such as the Analytics Engineers Club, offering a 10 week, cohort-based program.
Another interesting example is Conduktor, a company that leveraged a massively popular online Udemy course and built a community around an enterprise Kafka platform that simplifies the management of pub sub systems.
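For anyone curious how those ‘build project’ lists might gather “Good First Issue” items, here is a minimal sketch that builds a GitHub REST API URL listing a repo’s open issues with that label. Assumptions: the repo uses the default label name `good first issue` (projects can rename it), and `octocat/hello-world` is only a placeholder. The helper `good_first_issues_url` is hypothetical, and the sketch only builds the URL rather than calling the API.

```python
from urllib.parse import quote, urlencode

GITHUB_API = "https://api.github.com"


def good_first_issues_url(owner, repo, per_page=10):
    """Build the REST URL for a repo's open issues labelled 'good first issue'."""
    query = urlencode(
        {"labels": "good first issue", "state": "open", "per_page": per_page},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{GITHUB_API}/repos/{owner}/{repo}/issues?{query}"


# Placeholder owner/repo, purely for illustration.
url = good_first_issues_url("octocat", "hello-world")
print(url)
```

Fetching that URL returns JSON. Note that this endpoint also returns pull requests alongside issues, so a real compilation would need to filter those out.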
As a fan of great visualizations, I want to highlight one of my favorite educational repositories: the developer roadmap by Kamran Ahmed. Note that there are interactive versions for Frontend, Backend, DevOps and specific languages. They are super cool and can be found at https://roadmap.sh/. They are also useful for folks trying to understand various technology landscapes at a higher level.
I include the top repos below. *Ant-design* leads the list, though I suspect there is a data quirk at hand.
Cross platform frameworks are trending: **Flutter**, Google’s multi-platform framework, continues to grow in popularity. **Tauri**, a framework for the development of desktop applications leveraging front-end technologies, was one of the fastest growing in relative terms on the list. It is still TBD how the evolution of WebAssembly will further enable this space.
There are **two open-source alternatives to Firebase** on the list: Supabase and Appwrite. I will be writing about open-source alternatives in an upcoming post.
Web development technologies evolve incredibly quickly, and web developer preferences are extremely ephemeral. See for example the two charts below:
Contributors to top Web Development repositories over time
Front-end frameworks are rather fickle:
Gatsby, later purple peak, skyrockets from June 2017 to June 2020, then equally quickly declines
Flutter, yellow, peaks in March 2021, then begins declining abruptly
Even Next.js begins slightly declining around March 2021
React-native, pink, peaks quickly around 2017, then begins declining
Backend frameworks are more enduring: Rails, Django, and Node in particular
*Contributors to top SRE¹ (think Cloud Native) repositories over time*
Notice how much slower the rise and more enduring the core technologies of the SRE¹ stack appear to be, particularly Kubernetes in purple with the largest number of active and qualified contributors of any project.
Note: I will write much more about this methodology, as well as more detailed benchmarks from this perspective, in a future post...
**Don’t be a one trick pony, especially not in front-end:** a single trending web dev technology is not likely to sustain a large and enduring company given the speed of change in the space. Latching on to a single core tech has been the downfall of companies even in spaces where dev preferences are much more sustained (think early data infra companies that latched on to Hadoop). Recognizing this, expanding beyond Next.js while it is still wildly popular, and hiring the creator of Svelte, is part of the brilliance of Guillermo Rauch. We may joke about everyone going to work for Vercel, but it is likely a rare viable enduring strategy within the space!
Know where your community lives: **at least on Github, the web developer population is still much larger than the data or other developer populations.** Out of the top ~250 repos in 2021, web development repositories gathered ~600K stars, while Data, AI and ML related repos gathered only ~162K, roughly 3.7x fewer. Note that, per the 2018 Slashdata estimates below, Github likely accurately indicates that there are far more web-centric than data-centric developers in the world today.
I guess there are two forces at play that will change this dynamic: 1) more and more ‘software engineers’ will begin primarily leveraging and interfacing with data infrastructure vs. web frameworks over time, 2) more and more non-technical data practitioners (any business analyst) will become increasingly technical, whether via learning Python or SQL/dbt… Not quite sure either will become hardcore Github users though!
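The star comparison above is simple arithmetic; a minimal sketch using the rounded totals quoted in this section (the variable names are my own):

```python
# Rounded 2021 star totals quoted above for the top ~250 repos.
web_dev_stars = 600_000
data_ai_ml_stars = 162_000

# How many times more stars web development repos gathered.
ratio = web_dev_stars / data_ai_ml_stars
print(f"Web dev repos gathered about {ratio:.1f}x the stars of Data/AI/ML repos")
```

With these rounded inputs the gap works out to a little under 4x.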
Honestly, I don’t have much to say here… Github is full of quirky and cool developer tools. A bunch of these top repos were shells, terminals, IDEs or Code Editors, perhaps the most important components of developer experience:
Microsoft’s VS Code topped the list with 20K stars and is probably one of the best code editors out there today, a difficult truth for many to accept 😅 Ofc, Powershell is also on the list with ~9K stars in 2021
Coder, VS Code in the browser, is one of the fastest growing repos by other activity metrics as well
There are also some interesting plug-ins, add-ons for the same set of tools:
Fig, started in late 2020 and growing by ~8K stars almost from scratch this year, adds advanced autocomplete to your terminal, regardless of which one you choose to use
Somewhat comically, one of the older utilities, and one of the more popular repos on Github overall (65K stars), is ‘thefuck’, which autocorrects your previous console command
Starship, on the other hand, helps customize the prompt of any shell that you may be using
No surprises here: deep learning is the most popular subcategory, with the Hugging Face transformers repo, YOLOv5, Tensorflow and Deepmind’s Alphafold all in the mix. The only proper infrastructure-ey repos on the list are Meilisearch and Clickhouse, which is a tad surprising given all the hype data infrastructure receives in VC-world. Again, this is probably just a question of the size of end-user populations and whether data scientists spend as much time on Github as web developers do…
I have tons more to write on Data, plan on publishing future posts… For now, I just will highlight two more resources that are cool:
- The AI Expert Roadmap (interactive web page) seems to have taken inspiration from the developer roadmap linked above and is awesome. I LOVE how they separate out different personas, from data scientist, to machine learning, to deep learning, to data engineering, etc. It’s really well done and fun to browse through! It is also kind of fun to juxtapose this with the aforementioned Developer Roadmap, as well as the Analytics Engineers Club, as they collectively cover so much of modern tech in slightly MECE² (#BCG) ways 😃
- The second repo I LOVE is Eugene Yan’s Applied ML repository. This is a brilliant idea, and actually something I was planning on sort of casually doing in my non-existent free time… Anyhow, it is a curated list of technical posts from top engineering teams (Netflix, Amazon, Pinterest, Linkedin, etc.) detailing how they built out different types of AI/ML systems (e.g. forecasting, recommenders, search and ranking, etc.). Ofc, it focuses on AI/ML, but something similar could be made for the traditional or BI-oriented analytics stack, as well as the streaming world. That would be super high value for practitioners! Btw, one of my favorite things at BCG used to be looking at our IT architecture team’s reference architecture diagrams… the best way to understand technologies is to look at how a ton of stuff is architected… and it’s fun!
The technology repositories probably reflect what hackers are excited about and where the future is heading, while the educational repos reflect which languages are still most popular today. A friend has suggested to me that it also just has to do with which languages have the ‘easiest barrier to entry’, which makes sense!
To this end, I compare three more data sources for programming language popularity below. It would seem that looking at top trending Github repos is actually the strongest forward looking indicator for the future!
- Github actually publishes their own ranking of programming languages used across the entire service: you can see how strongly Typescript trends below, though Rust and Go don’t even make it on that list yet!
- Perhaps the most general source of all is the TIOBE index, which aggregates searches for different programming languages across search engines. Surprisingly, Java and C are still super dominant on this list, with Python only overtaking them in the last month!
a. Microsoft has come a really really long way in terms of open-source, developer friendliness, and its broader company strategy. Senior Microsoft strategy executives once compared open-source companies to those that caused the dot-com bubble crisis — irresponsible and recklessly giving away services for free. Bill Gates even once claimed open-source is bad for jobs 😂
The significant change came with the appointment of Satya Nadella as CEO in 2014 and the open sourcing of the .NET stack (Wikipedia has a good recap of it all). Today, Microsoft finds itself with 8 of the most popular repos in 2021 on Github:
three educational courses: Web Dev, ML, and IoT for beginners. Note, re using educational resources as a marketing strategy: at least the ML course links to various Azure services. Google does this a bunch as well, with Colab notebooks often being used to demo educational materials.
b. Hackers embody a healthy dose of counter-culture **(in American terms at least haha)**. Anti-paying for things, anti-advertisements, and strongly pro-labor 💪
Bypass paywalls chrome is a plug-in that does exactly what it says: it lets users bypass website paywalls to access content. Note, if you can, please pay for quality journalism.
Block the spot helps block advertisements on the internet. I am not a fan of hidden, advertisement-driven business models. That’s actually a big part of why I like enterprise vs. consumer tech more broadly: much cleaner and more ethical business models
996.ICU is an amazing repository, basically a list of bad tech employers in China (perhaps broader now). It received significant media attention when it started trending in 2019. Their own description below:
The name 996.ICU refers to "Work by '996', sick in ICU", an ironic saying among Chinese developers, which means that by following the "996" work schedule, you are risking yourself getting into the ICU (Intensive Care Unit)
c. There are some sketchy things trending, not all tech does good (i.e. a less fun way to end things)
AI/ML is awesome and will bring a ton of good to the world, but there are also serious risks and safety considerations. Enhanced surveillance and State control is certainly one of them, and perhaps one of the ripest use cases for abuse is around facial recognition. One of the top trending repos in 2021 was Tencent’s GFPGAN, which ‘aims at developing Practical Algorithms for Real-world Face Restoration’. Another trending library was DeepFaceLab, for creating deep fakes. Note, famously in 2020, Huawei published about testing software for facial recognition of Uighurs. Earlier that year, IBM announced it would no longer develop facial recognition software. I come from a country where state surveillance is fairly normalized, albeit discreet. I’m talking about the type of surveillance where journalists have their homes broken into and their messenger texts intercepted, and the secret police tap your cell phone. So when our government bought 1000+ Huawei smart cameras a couple years back with facial recognition embedded, human rights activists were not thrilled.
btw: there is an excellent compilation of awful use cases of AI in this repository aptly named Awful AI
Another two trending repositories dealt with locating people across social media accounts: project Sherlock and social analyzer. It is kind of sketchy seeing tech like this floating around in the public, easily downloadable domain, and it is a good reminder of how public our lives are on the internet.
¹ SRE: Site Reliability Engineering, e.g. CNCF
² MECE: Mutually Exclusive, Collectively Exhaustive, uber consulting speak (link)