It is clear to me that there are many problems with open source. The most important one, in my opinion, is that not enough people contribute to it.
At least that's what I thought. Then I ran some searches and ended up on GitHub's 2022 report, "The state of open source", which seems to say something else.
Side rant about URLs
There is a clear URL for the main page of the 2021 report. The same goes for the 2020 report and probably all the earlier ones. However, the main page of the 2022 report is the front page of the whole site, and if you try to visit the URL that should be the home of the 2022 report, you get a 404 Not Found. I guess that when they publish the 2023 report they will move the current front page to this sub-folder, and then the link I just included will start to work. It is really strange. It means that every article that refers to the 2022 report before the 2023 report comes out has to link to the main page of the site, and once the 2023 report is out, that link will be wrong, as it will point to the newer report.
On the other hand, if you land on an older report, there is no link to the newest one. Nor is there any clear indication (well, besides the date) that a newer report exists.
Numbers
Just a few numbers from the report, to make them easy to quote.
- 94M developers are on GitHub with 27% year-over-year growth
- 413M open source contributions in 2022
- 85.7M+ new repositories with 20% year-over-year growth
- 3.5B+ total contributions to all projects on GitHub (Contributions include commits, issues, pull requests, discussions, gists, pushes, and pull request reviews.)
- 227M+ pull requests merged
- 31M+ issues closed
- 263M automated jobs run on GitHub Actions every month (with more than 41M build minutes a day)
Number of open source contributors
Back to the numbers: GitHub reports 94M developers on GitHub in 2022. That seems to flatly contradict my feeling that not enough people contribute to open source.
I work as a contractor/consultant/trainer, so I see a number of companies every year. Not a lot, usually 3-10 per year. There might be a strong selection bias, but nearly none of their employees contribute to open source projects. Well, thinking about it again, they might use GitHub for some private projects. They might even push some code to public repositories of their own, but most of that code is not used by anyone besides them. Some of them might open issues on projects, but even that seems limited.
So I wonder: is this just my limited and probably biased sample? How come I feel that there is not enough contribution to open source, while GitHub says there are 94M people on its platform?
Multiple and dormant accounts
I have at least 3 accounts on GitHub. One of them is my primary account, and I use the other two for demos, e.g. to show how a user without any extra rights can interact with one of my repositories. I bet I am not the only one with multiple accounts.
I am also quite sure that many people signed up to GitHub but rarely use it, if at all. So that 94M is probably much bigger than the number of active users. And what does "active" mean, anyway?
Very few contributions per person
There are 94M developers on GitHub, and there were 413M open source contributions in 2022. That's an average of about 4.4 per developer per year. I can only assume that by "open source contribution" they mean a contribution to one of the public repositories. I looked at my own GitHub profile, and according to that I make somewhere between 10 and 120 contributions per day. Though when I was on vacation, there were a few days without any contribution. It is also true that in my case some of those commits are generated by scripts, so they don't reflect real work on my part, but they still count in that 413M open source contributions.
On another page they say there were 3.5B total contributions to all projects on GitHub. Here "contributions" include commits, issues, pull requests, discussions, gists, pushes, and pull request reviews. It probably includes all the private repositories as well. This brings us to roughly 37 contributions per person per year. Still not a very big number for open source, if we take into account that some of these go to private repositories.
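Both averages above are simple divisions of the report's headline figures by the 94M accounts. A quick sketch to reproduce them, assuming the quoted numbers are exact (the report says "3.5B+", so the second one is a lower bound):

```python
# Headline figures as quoted from GitHub's 2022 report in this post.
DEVELOPERS = 94_000_000
OPEN_SOURCE_CONTRIBUTIONS = 413_000_000
TOTAL_CONTRIBUTIONS = 3_500_000_000  # "3.5B+", probably includes private repositories


def per_developer(contributions: int, developers: int = DEVELOPERS) -> float:
    """Average yearly contributions per registered account (not per active person)."""
    return contributions / developers


print(f"open source: {per_developer(OPEN_SOURCE_CONTRIBUTIONS):.2f} per developer per year")
print(f"all:         {per_developer(TOTAL_CONTRIBUTIONS):.2f} per developer per year")
```

Since the 94M denominator includes dormant and duplicate accounts, the average per actually active person would be higher than these numbers.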
Only 20% are public repositories
One number I found in the report is that only 20% of the GitHub repositories are public.
Some very popular projects, many unknown projects
In their report they show the 10 projects with the biggest number of contributors. The first one is microsoft/vscode with 19.8K contributors in 2022 and the 10th place is tensorflow/tensorflow with 4.4K contributors. That's really nice, but my guess is that most repositories have very few contributors.
The report says "85.7M+ new repositories". In the previous years there were 61M, 60M, and 44M new repositories, and in 2018 they reported a total of 96M repositories. Based on this, there are roughly 347M repositories in total. Given 3.5B contributions, the average is around 10 contributions per repository per year.
I think this is called a Pareto distribution. A very small number of projects get a lot of contributions and a lot of contributors, but the count very quickly drops under 10 per year and then under 1 per year, meaning the project has not seen any activity for a year.
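The repository estimate is a sum followed by a division. Here is a sketch of that arithmetic, assuming the three yearly counts are for the three years right before 2022, that the 2018 figure is the cumulative total up to then, and that no repositories were deleted in between:

```python
# Figures in millions, as quoted in this post.
NEW_REPOS_M = [85.7, 61, 60, 44]  # new repositories: 2022 first, then the previous years
TOTAL_REPORTED_2018_M = 96        # cumulative total reported in 2018
TOTAL_CONTRIBUTIONS_M = 3_500     # "3.5B+" contributions in 2022

# Assumption: repositories are never deleted and nothing is double counted.
total_repos_m = TOTAL_REPORTED_2018_M + sum(NEW_REPOS_M)
avg_contributions_per_repo = TOTAL_CONTRIBUTIONS_M / total_repos_m

print(f"total repositories: ~{total_repos_m:.1f}M")
print(f"contributions per repository per year: ~{avg_contributions_per_repo:.1f}")
```

Deleted repositories would make the real total smaller and the per-repository average somewhat larger, so this is a rough estimate at best.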
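To get a feel for how concentrated such a distribution is, here is a sketch with purely synthetic numbers, not GitHub data. It assumes 10,000 hypothetical projects whose yearly activity follows a Pareto distribution with shape 1.16, the value that corresponds to the classic "80/20" split, and measures what share of all activity the busiest projects get:

```python
import random


def top_share(values: list[float], fraction: float) -> float:
    """Share of the total held by the largest `fraction` of the values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))  # at least one item
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical activity per project; paretovariate never returns values below 1,
# so this sketch does not model dormant projects with zero activity.
activity = [rng.paretovariate(1.16) for _ in range(10_000)]

print(f"top 1% of projects:  {top_share(activity, 0.01):.0%} of all activity")
print(f"top 20% of projects: {top_share(activity, 0.20):.0%} of all activity")
```

With a shape this small the distribution has a very heavy tail, so the exact shares vary noticeably between seeds; real GitHub data would need the API to measure.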
Conclusion
I am far from being done with this topic, but I think this is enough for now. While this report indicates that a lot of people are registered on GitHub, I think it also shows that only a small subset of them contribute to open source projects, and even those contributions are concentrated among a few (probably a few thousand) popular projects.
So maybe I am not that far off with my Open Source Developer Course, in which I try to teach people how to contribute to open source projects.
I will need to look for further reports and run some queries using the GitHub API to collect some data.
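As a starting point for such queries, here is a minimal sketch using only the Python standard library and GitHub's REST search endpoint for repositories, which returns a `total_count` field. The helper names `build_search_url` and `count_repositories` are hypothetical, and the query strings are just examples of GitHub's search syntax:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, per_page: int = 1) -> str:
    """Build a repository search URL; we only need the count, so 1 item per page."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    return f"{SEARCH_URL}?{params}"


def count_repositories(query: str, token: Optional[str] = None) -> int:
    """Return the `total_count` GitHub reports for a repository search query."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # optional personal access token for higher rate limits
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(build_search_url(query), headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["total_count"]
```

For example, `count_repositories("stars:>1000")` should return how many repositories with more than 1,000 stars the search finds. Unauthenticated search requests are heavily rate limited, so for more than a handful of queries it is worth passing a token.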
Comments
I agree with your opinion, but I would also add that not many people understand the true spirit of open source, and also the "attractiveness of the projects":
Indeed. Why would you?
For the same reason the Linux kernel lets you run the laptops, phones, and servers that are the backbone of the economy, medicine, and academia; the same reason VLC is used as the main streaming application; the same reason you have this platform as open source to ask that question: to help each other succeed and build a better future in some way.
I just wanted to say it wasn't meant to be rude; I was genuinely curious. I also think most open source people do it out of some kind of idealism mixed with a bit of fun and sometimes expectations. But after weeks, months, years, decades, such contributions can become tiresome, since you get nothing in return. That's why I think most open source projects are neglected, while the thriving ones are usually backed by a business model or big brand(s). In the end it's lots of hard work, and doing it "for free" only goes so far. Moreover, when speaking of "free as in speech, not free as in beer", I think what most people actually care about is the free beer. ;) I mean, isn't it ironic enough that GitHub itself is closed source? Nobody cares, it's free, yay! ;) ...Now, everybody loves to use open source. It's free, it has longevity, and you can try to fix something if really necessary. But as a maintainer, it's a lot of volunteer work, and a difficult and often unrewarding one at that ...if your projects gather eyeballs at all. I'm sure there are plenty of undiscovered gems out there.
My apologies, I know you were not rude; I just have a habit of overdramatizing my answers sometimes...
I disagree with regard to "get nothing in return": you get recognition, experience, knowledge... One might get paid for it, if one knows how to handle the marketing.
You are right about most of the things you've mentioned: the beer, the volunteering, maintainers, GitHub (use GitLab btw, way better and open source), and so on, but all in all, it boils down to culture and education:
Interesting insights. Bravo on contributing so much yourself!
I don't believe the number of users or number of repos on GH has any relation to OSS contributions. Everywhere I've worked in the last 10+ years uses GH for private repos, so all team members need accounts, and I'm sure the majority only use them for day-to-day work.
Yes, that was my conclusion as well.