Many factors affect the number of open and fixed bugs.
What I see in your graph is that the most used languages (C++/Java), with the biggest codebases and the most features in the software built with them, have the most bugs per repo, but that seems logical.
Given that, it is quite hard to draw any conclusion from that data alone.
What I do see is that static typing serves as mandatory documentation that helps the compiler, the IDE and the developer reason about the code. There is less information available in a typical dynamic language, which means one has to rely more on alternative solutions, but even with state-of-the-art tooling, the IDE/compiler typically never catches up. More checks are done at run time, and the IDE fails to provide the same quality of tooling and context (auto-completion, refactoring, code navigation).
Thanks for the comments.
The data comes from GitHub, which means open-source code from tens of thousands of repositories.
I stated that the approach is very naive, but I am still surprised by the results.
I agree that not all bugs are equal and you shouldn't use the same development practices in all projects.
I think that is the reason why monoliths need to be split into microservices at some point. My personal experience is that language expressiveness matters.
Have you looked at the comments by Blaine? They are really interesting.
The most visible correlation in your data is that the more stars a repo has, the more bugs it has. Also, the relative ranking of languages changes significantly with the number of stars: Java, for example, is quite good across all repos but quite bad by your metric on big repos.
There may be a correlation between language (or dynamic vs. static typing) and the number of bugs, but the data is really not refined enough to remove other variables, so it is impossible to conclude anything from it.
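One way to start removing a variable like popularity would be to compare languages only within the same star band. A minimal sketch, assuming each repo is reduced to a hypothetical (language, stars, bugs) record; the band thresholds and the sample data are arbitrary and made up for illustration:

```python
from collections import defaultdict
from statistics import mean


def star_bucket(stars: int) -> str:
    """Coarse popularity band; the thresholds are arbitrary assumptions."""
    if stars < 100:
        return "small"
    if stars < 1000:
        return "medium"
    return "large"


def mean_bugs_by_bucket(records):
    """Average bugs per repo for each (star band, language) pair.

    `records` is an iterable of hypothetical (language, stars, bugs) tuples.
    """
    groups = defaultdict(list)
    for language, stars, bugs in records:
        groups[(star_bucket(stars), language)].append(bugs)
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the result.
    sample = [
        ("Java", 50, 2), ("Java", 80, 4), ("Python", 60, 3),
        ("Java", 5000, 40), ("Python", 4000, 25),
    ]
    for (band, language), avg in sorted(mean_bugs_by_bucket(sample).items()):
        print(band, language, avg)
```

Comparing languages only within the same band keeps a big, feature-heavy repo from being ranked against a small side project, though it still leaves other variables (team size, project age, how bugs are labeled) uncontrolled.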
Sure, language expressiveness matters; it is enough to try to develop anything in assembly vs. Java or Lisp, and you will surely see that a higher-level language works better. But there are expressive languages on both sides, and different languages may suit different categories of problems too.
My impression is also that huge projects are not often done in dynamically typed languages. I feel that a dynamically typed language may leverage more individual productivity but, on the contrary, is not that great when the codebase scales (millions of lines of code).
The number of lines of code is not a good metric, but it is far better than thinking all repos are equal, so I would consider bugs per LOC. Once that is done, you could always apply a factor between high-level and lower-level languages (C typically needs more LOC than Java, for example).
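A minimal sketch of that metric, assuming we have bug and LOC counts per repo. The `Repo` type and the `VERBOSITY` factors are hypothetical, and the numbers are illustrative rather than measured: they just express how many lines a language needs relative to Java for the same functionality.

```python
from dataclasses import dataclass

# Hypothetical verbosity factors: lines of code a language typically needs
# relative to Java for the same functionality. Illustrative assumptions only.
VERBOSITY = {"Java": 1.0, "C": 1.5, "Python": 0.6}


@dataclass
class Repo:
    name: str
    language: str
    bugs: int
    loc: int  # lines of code


def bugs_per_kloc(repo: Repo) -> float:
    """Raw bug density: bugs per thousand lines of code."""
    return repo.bugs / (repo.loc / 1000)


def normalized_bugs_per_kloc(repo: Repo) -> float:
    """Bug density per thousand 'Java-equivalent' lines of code."""
    java_equivalent_loc = repo.loc / VERBOSITY[repo.language]
    return repo.bugs / (java_equivalent_loc / 1000)


if __name__ == "__main__":
    c_repo = Repo("some-c-project", "C", bugs=30, loc=15_000)
    print(bugs_per_kloc(c_repo))             # 2.0
    print(normalized_bugs_per_kloc(c_repo))  # 3.0
```

With these made-up factors, a C repo with 30 bugs in 15,000 lines has a raw density of 2.0 bugs per KLOC, but 3.0 per Java-equivalent KLOC, since the same functionality would take only 10,000 lines of Java.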
Thanks for the comments!
I don't think the data proves anything either; I hope I made that clear in the post. "Proving" is a big word that I rarely use for anything.
I don't know if you noticed, but I linked to the best source of studies on the matter that I found.
Reading your comments, something popped to my mind.
When we talk about huge projects, do you think that we plan for huge projects from the beginning, or do they start small and grow to be huge? In the second case, do you think it is common to switch languages?
About huge projects, do we know in advance? Well, I guess it is case by case.
Now, I have colleagues and friends working for French civil aviation, and long ago they decided to build a new version of one of their key components. They started thinking big from the start. And by the way, automatic memory management was a no-go because it is not real-time friendly, meaning many languages like Java/Clojure/Lisp were an instant no-go.
There is a saying that if you are a startup, you should go for instant productivity, and that you'll always have the time and money to rewrite everything if your company is successful; but if you are not successful, going more slowly to ensure better architecture, easier-to-maintain code or better performance doesn't make sense at all.
Others would say you should use what you master. I think that makes a lot of sense: it saves you time and lets you concentrate on more important aspects like finding clients, hiring the right people or creating a business plan...
These companies have technical policies: outside of proofs of concept, anything that may go to production has to use approved technology. For my current company, that's C/C++ for most legacy code, Java for most new things, Scala/Spark for big data analysis, and a bit of Python. That last one is restricted to scripting and small projects that do not need to scale.
I am not necessarily saying it is the right way to proceed, but the common practice is to use a statically typed language that has widespread adoption in the industry and a mature ecosystem that helps productivity.
That being said, I remember Paul Graham's arguments about Lisp well, and how it helped him with his startup.
But although he criticized the move, when Yahoo bought his company, one of the first things they did was migrate the code from Lisp to a statically typed language... The decision was criticized, maybe rightly so, but it shows that many people are not that fond of dynamically typed languages.
I am personally quite torn about standardisation.
I always wonder what I would do if I created my own company. Would I mandate some popular language, or would I allow every team to choose whatever they wanted?
I can see a lot of good arguments on both sides, and I have seen a lot of talks about the subject, and again, nobody agrees.
What you say about the best tool for the job is a little bit paradoxical. I have had a similar experience, and I see it as "the best tool within this limited, blessed toolset". When and how do you decide to add a new tool? It is really hard to quantify the value and cost when we keep saying things like "more maintainable" or "easier to use".
Paul Graham's essay is a classic; every developer should read it, not because of Lisp, but to be aware of the Blub Paradox. It applies to all of us.
Out of curiosity, what language did your friends choose?
I guess developers like you and me, who love our craft, want to get the most out of our time, tooling and libraries. As such, we like to have the best of the best, whatever it is.
That is the promise of languages like Lisp, where you can easily build new abstractions that best fit the problem at hand.
But many things require several or many people, either at the same time or over the years... For example, you probably won't want to devote the next 20 years to maintaining the project you built over the past 5-10 years. This is where standardization makes sense. If you get better productivity for yourself but overall productivity drops, that's a net loss.
So both aspects have to be taken into account. I would say that in a big company, small independent teams, each completely responsible for its area, even including production, make sense and help scale that productivity. In today's world, in many cases, just saying you are able to provide VMs in the cloud that respond to some kind of network queries should work and leave a lot of freedom in how things are done inside.
But even that doesn't solve everything. The interactions between teams will still dictate many things, like the protocol and the data exchanged, but also how you manage your database, what the overall architecture is, what tools you use for continuous integration and QA testing, which cloud you use, and how your application will shrink and scale dynamically...
There is not much we can do alone in a big company if we don't cooperate.
I am convinced the language impacts productivity somewhat, but many other things impact it more. The programming language is a tactical choice, while the bigger things are strategic choices. And while you'll want to delegate the details to great tacticians, you'll want great strategists when you are in a big company... If just switching teams means your employees need 6 months or a year before they become fluent in the technology stack, that's a real downside, because the language is only a small part of the job.
You are so right. As Gerald Weinberg said:
"The Second Law of Consulting: No matter how it looks at first, it's always a people problem."
Thanks a lot for your thoughts. It has been a pleasure to have a civilized discussion.
I have started following you on Twitter just in case you decide to start blogging.
Thanks for the link to Paul Graham’s Beating the Averages and The Blub Paradox. I run into that all the time.
Trying to convince the other developers (who are very bright people) that there are alternative languages that would be more powerful and better suited to our problem domain invariably meets with deer-in-the-headlights blank stares.
Even contemplating alternative languages is outside most developers’ comfort zone. Or, more so, even outside their capacity for consideration, even as a thought experiment.
When I look at the trends, I see object-oriented programming continuing for the foreseeable future. But I also think there will be two language idioms that overtake object-oriented programming languages: functional programming languages and domain-specific languages.
I consider Lisp to be a programmer's programming language. An "abstract syntax tree oriented" programming language. Paul Graham's secret super-weapon is safe.