DEV Community

Gabor Szabo


The problem with open source: not enough contributors

It is clear to me that open source has many problems. The most important one, in my opinion, is that not enough people contribute to it.

At least that's what I thought. Then I ran some searches and ended up on GitHub's 2022 report, The state of open source, which seems to say something else.

Side rant about URLs

There is a clear URL for the main page of the 2021 report. The same goes for the 2020 report and probably all the earlier reports. However, the main page of the 2022 report is the front page of the whole site, and if you try to visit the URL that should be the home of the 2022 report, it gives you a 404 Not Found. I guess when they publish the 2023 report they will move the current front page to this sub-folder, and then the link I just included will start to work. It is really strange. It means that every article that refers to the 2022 report before the 2023 report comes out has to link to the main page of the site, and once the 2023 report comes out that link will be incorrect, as it will point to the newer report.

On the other hand, if you land on an older report, there is no link to switch to the newest report. Nor is there a clear indication (well, besides the date) that there is a newer report.

Numbers

Just a few numbers from the report to make it easy to quote.

  • 94M developers are on GitHub with 27% year-over-year growth
  • 413M open source contributions in 2022
  • 85.7M+ new repositories with 20% year-over-year growth
  • 3.5B+ total contributions to all projects on GitHub (Contributions include commits, issues, pull requests, discussions, gists, pushes, and pull requests reviews.)
  • 227M+ pull requests merged
  • 31M+ issues closed
  • 263M automated jobs run on GitHub Actions every month (With more than 41 million build minutes a day)

Number of open source contributors

Back to the numbers, GitHub reports that there are 94M developers on GitHub in 2022. That seems to totally contradict my feeling that not enough people contribute to open source.

I work as a contractor/consultant/trainer so I see a number of companies every year. Not a lot, usually 3-10 per year. There might be a strong selection bias, but almost none of their employees contribute to open source projects. Well, thinking about it again, they might use GitHub for some private project. They might even push some code to a public repository of their own, but most of that code is not used by anyone besides them. Some of them might open issues on projects, but even that seems to be limited.

So I wonder: is this just the limited and probably biased sample I have? How come I feel that there is not enough contribution to open source while GitHub reports 94M people on its platform?

Multiple and dormant accounts

I have at least 3 accounts on GitHub. One of them is my primary account, and the two others I use when I demo something, e.g. how a user without any extra rights can interact with one of my repositories. I bet I am not the only one with multiple accounts.

I am also quite sure that there are many people who signed up to GitHub but don't use it too frequently if at all. So that 94M is probably much bigger than the number of active users. And what does "active" mean, anyway?

Very few contributions per person

There are 94M developers on GitHub and there were 413M open source contributions in 2022. That is an average of about 4.4 contributions per person per year. I can only assume that by "open source contribution" they mean a contribution to one of the public repositories. I looked at my own GitHub profile, and according to that I make somewhere between 10 and 120 contributions per day, though when I was on vacation there were a few days without any. It is also true that in my case some of those commits are generated by scripts, so they don't reflect real work on my part, but they still count toward the 413M open source contributions.

On another page they say there were 3.5B total contributions to all projects on GitHub. Here "contributions" include commits, issues, pull requests, discussions, gists, pushes, and pull requests reviews. That figure probably includes the private repositories as well. This brings us to roughly 37 contributions per person per year. That is still not a very big number for open source if we take into account that some of these go to private repositories.
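The two per-person averages are plain divisions, so they are easy to re-check. Here is a minimal Python sketch using only the figures quoted from the report. The names are mine, and dividing by all 94M accounts is the same simplification used in the text (it ignores dormant and duplicate accounts).

```python
# Figures quoted from the 2022 report.
DEVELOPERS = 94_000_000
OPEN_SOURCE_CONTRIBUTIONS = 413_000_000
ALL_CONTRIBUTIONS = 3_500_000_000  # reported as "3.5B+", so treat it as a lower bound


def per_developer(contributions: int, developers: int = DEVELOPERS) -> float:
    """Average number of contributions per registered account."""
    return contributions / developers


if __name__ == "__main__":
    print(f"open source:  {per_developer(OPEN_SOURCE_CONTRIBUTIONS):.1f} per account per year")
    print(f"all projects: {per_developer(ALL_CONTRIBUTIONS):.1f} per account per year")
```

If the number of active accounts were known, passing it as `developers` would give a more meaningful average.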

Only 20% are public repositories

One number I found in the report is that only 20% of the GitHub repositories are public.

A few very popular projects, many unknown projects

In their report they show the 10 projects with the biggest number of contributors. The first one is microsoft/vscode with 19.8K contributors in 2022, and in 10th place is tensorflow/tensorflow with 4.4K contributors. That's really nice, but my guess is that most repositories have very few contributors.

The report says "85.7M+ new repositories". In the previous years there were 61M, 60M, and 44M new repositories, and in 2018 they reported a total of 96M repositories. Adding these up gives roughly 346M repositories. Given that there were 3.5B contributions, the average number of contributions to a repository is around 10 per year.
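The repository estimate can be reproduced the same way. This is a rough sketch under my own assumptions: the 2018 figure is taken as a running total, each later figure as newly created repositories, and no repository is ever deleted. The variable names are hypothetical.

```python
# All figures in millions, as quoted from the yearly reports.
TOTAL_REPOS_2018_M = 96.0
NEW_REPOS_PER_YEAR_M = [44.0, 60.0, 61.0, 85.7]  # the years after 2018, ending with 2022
ALL_CONTRIBUTIONS_M = 3_500.0  # "3.5B+" contributions in 2022

# Estimated total number of repositories at the end of 2022.
total_repos_m = TOTAL_REPOS_2018_M + sum(NEW_REPOS_PER_YEAR_M)

# Average contributions per repository in 2022.
avg_per_repo = ALL_CONTRIBUTIONS_M / total_repos_m

if __name__ == "__main__":
    print(f"estimated repositories: {total_repos_m:.1f}M")
    print(f"average contributions per repository: {avg_per_repo:.1f}")
```

An average hides how uneven the spread is, which is the point of the next paragraph.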

I think this is called a Pareto distribution. A very limited number of projects have a lot of contributions and a lot of contributors, but the numbers quickly drop to under 10 per year and then to under 1 per year, meaning the project has not seen any activity for a year.
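To get a feel for what such a distribution looks like, here is a small simulation using Python's standard `random.paretovariate`. It uses synthetic numbers only, not GitHub data, and the shape parameter `alpha=1.16` is an arbitrary assumption chosen for illustration.

```python
import random
import statistics


def simulate_contributions(repos: int, alpha: float, seed: int = 0) -> list[float]:
    """Draw one synthetic 'contributions per year' value per repository."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(repos)]


def top_share(values: list[float], fraction: float) -> float:
    """Share of the total held by the largest `fraction` of the values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    sample = simulate_contributions(repos=100_000, alpha=1.16)
    print(f"mean:   {statistics.mean(sample):.2f}")
    print(f"median: {statistics.median(sample):.2f}")
    print(f"share held by the top 1% of repositories: {top_share(sample, 0.01):.0%}")
```

In a heavy-tailed sample like this the mean sits well above the median, which is why an average of 10 contributions per repository says little about the typical repository.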

Conclusion

I am far from being done with this topic, but I think this is enough for now. While this report indicates that there are a lot of people registered on GitHub, I think it also shows that only a small subset of those people have contributed to open source projects, and even those contributions are concentrated among a few (probably a few thousand) popular projects.

So maybe I am not that far off from reality with my Open Source Developer Course, in which I try to teach people how to contribute to open source projects.

I will need to look for further reports and run some queries using the GitHub API to collect some data.
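As a starting point for such queries, here is a hedged sketch that asks the GitHub REST search endpoint how many public repositories were pushed to since a given date. The helper names are hypothetical. I am assuming the `is:public` and `pushed:` search qualifiers and the `total_count` response field behave as documented. Unauthenticated requests are heavily rate-limited, and for very broad queries the count may be approximate.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(pushed_since: str) -> str:
    """Build a search URL for public repositories pushed to since a date (YYYY-MM-DD)."""
    query = urllib.parse.urlencode(
        {"q": f"is:public pushed:>={pushed_since}", "per_page": 1}
    )
    return f"{SEARCH_URL}?{query}"


def parse_total_count(body: str) -> int:
    """Extract the number of matching repositories from a search response body."""
    return int(json.loads(body)["total_count"])


def fetch_total_count(pushed_since: str) -> int:
    """Perform the request; needs network access and is subject to rate limits."""
    request = urllib.request.Request(
        build_search_url(pushed_since),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_total_count(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_total_count("2022-01-01"))
```

Comparing the counts for a few different dates would give a first rough idea of how many public repositories are actually active.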

Top comments (8)

Alex M. Schapelle

I agree with your opinion, but I would add that not many people understand the true spirit of Open Source, and there is also the "attractiveness of the projects":

  • True spirit of Open Source: it has a somewhat complex meaning, and descriptions such as "free-speech not free beer" sometimes mislead end-users, who are possibly future developers. People tend not to understand why anyone in their right mind would work for 10-12 hours, drive 1-2 hours back and forth between work and home, and then commit more code to something that does not bring in any income.
  • Attractiveness of the projects: contributing to something that developers do not "connect with" usually does not work well. I have contributed to some projects in which I did not have a deep interest, and I eventually found myself drifting away from them.
 
Arnaud Dagnelies

People tend not to understand why anyone in their right mind would work for 10-12 hours, drive 1-2 hours back and forth between work and home, and then commit more code to something that does not bring in any income.

Indeed. Why would you?

Alex M. Schapelle

For the same reason the Linux kernel enables you to run the laptops/phones/servers that are the backbone of the economy/medicine/academia, the same reason VLC is used as the main streaming application, the same reason this platform is open source so that you can ask that question: to help each other succeed and lead to a better future in some manner.

Arnaud Dagnelies • Edited

I just wanted to say it wasn't meant to be rude; I was genuinely curious. I also think most open source people do it with some kind of idealism mixed with a bit of fun and sometimes expectations. But after weeks, months, years, decades, such contributions might become tiresome, since you get nothing in return. That's why I think that most open source projects are neglected, while the thriving projects are usually the ones backed by a business model or with big brand(s) behind them. In the end it's lots of hard work, and doing it "for free" only goes so far. Moreover, when speaking of "free-speech not free beer", I think what most people care about is actually the free beer. ;) I mean, isn't it ironic enough that GitHub itself is closed source? Nobody cares, it's free, yay! ;) ...now, everybody loves to use open source. It's free, it has longevity and you can try to fix something if really necessary. But as a maintainer, it's a lot of volunteer work, difficult and often unrewarding at that ...if your project gathers eyeballs at all. I'm sure there are plenty of undiscovered gems out there.

Alex M. Schapelle • Edited

My apologies, I know you were not rude, I just have a habit of over-dramatizing my answers sometimes... 😅
I disagree with regard to "get nothing in return": you get recognition, experience, knowledge... one might get paid for it, if one knows how to work with marketing.
You are right about most of the things that you've mentioned: the beer, the volunteering, maintainers, GitHub (use GitLab btw, way better and open source) and so on, but all in all, it boils down to culture and education:

  • Culture: in the sense of the tech culture that you were raised in. In my case, I had the honor of working with a lot of open source groups in my area: the Debian and Ubuntu communities when they were in their development stages, the Fedora community when they were already a giant, the Python community when we all used python2.7... all these taught me to share even if I do something small. (I remember waiting at least a year for Broadcom Linux kernel drivers developed by the Linux community, while using an external dongle, because Broadcom didn't want to share the code in the early 2000s.)
  • Education: most academic and private institutions should be emphasizing the "sharing is caring" policy, but due to misconceptions about, and disregard for, the main ideas of open source, they don't talk about it. I tried to get a software engineering degree around 2010, after I had been a sys-admin, and everything I was taught was based either on Microsoft or on badly maintained UNIX from the early 90's. You are also right when it comes to undiscovered gems, to which I can only suggest that everyone go out and seek those gems (GitLab is developed in Ruby, which has a package manager named gem - pun intended 😄). So in conclusion, we cannot force anyone to contribute to open source, but we can educate those around us about it, and suggest that they go discover projects that are attractive to each of them individually, in the hope that they all find their own code oasis... Sorry for over-dramatizing my answers 😅
 
Dominic R.

Interesting insights. Bravo on contributing so much yourself! 👍

Hassan Schroeder

I don't believe number of users or number of repos on GH has any relation to OSS contributions. Everywhere I've worked in the last 10+ years uses GH for private repos so all team members need accounts, and I'm sure the majority only use them for day-to-day work.

Gabor Szabo

Yes, that was my conclusion as well.
