How do you measure code quality / engineering team performance?

MikhailShel ・1 min read

I've been looking into this problem for a while now, and here are the few things that I came up with (however, none of them feel quite right):

  1. Test code coverage
  2. Number of bugs / number of features (%)
  3. Average page load time (assuming we are talking about some sort of website)
  4. Cyclomatic complexity (spaghettiness of the code), which should be small
  5. Essential complexity (unnecessary loops and ifs), which should be small
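To make item 4 concrete, here is a minimal sketch of how a cyclomatic complexity number could be approximated for Python source using the standard `ast` module. It assumes the simplified rule "1 + number of decision points"; the choice of which node types count as decision points, the function name `cyclomatic_complexity`, and the `SAMPLE` snippet are all illustrative assumptions, and dedicated tools apply more complete rules.

```python
import ast

# Node types counted as one decision point each (a simplifying assumption).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per operator.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical sample code to score.
SAMPLE = """
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
"""

print(cyclomatic_complexity(SAMPLE))
```

Running this over every file on each commit would give one number per file that can be tracked over time, though a low score alone says little about how readable the code is.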

So the question is: how do you guys do it? And at what point (every 2 weeks / at release time / every half year / every day / every commit / once in a blue moon)?



I think the whole reason the question seems hard to answer is that whatever metrics you collect, they will likely never correlate to "team performance". A team doesn't perform well because their code output takes on specific characteristics. They perform well when they work in unison.

And if by "team performance" you actually mean individual performance of each team member, then the act of measuring these things might make it impossible for them to work well together. Because then they become competitors trying to outdo each other (or avoid blame) on metrics; it is not in their best interest to help each other.

I was recently tasked to do a "performance review" of my software team. And because I'm old and I'd rather find another job than do something that I consider dehumanizing, I asked them how they wanted to be evaluated (in the context of avoiding a manager passing a subjective judgement). And we had an ad hoc design exercise. What we ultimately came up with was to identify the activities within our department. Then identify which activities (on which products) each team member participated in. And the level at which they participated (Leader, Contributor, Participant). In the end each "review" read almost like a resume. Since it also identified all department activities, it provided team members a look at the different areas they could choose to grow into. It was also pretty uncontroversial. "Hey, didn't you go solo to those customer meetings about the new features they want? Ok, so you were a Contributor in Business Analysis for that product." We even referred to our previous sprints to fill it out. As a hypothetical exercise, we did a retroactive review for last year, and you could clearly see the progress between years. You could also see who was doing different activities on the team and infer how people interacted. If you brought a new manager on board or had to justify some decision, these reviews would tell loads more about the team than any code metrics ever could. And the activities themselves are researchable as to market value.

Anyway, my 2 cents.


That is actually very helpful, with a lot of good food for thought in your experience! Now, the question is more about a very large codebase with 10+ different teams contributing to it, so the idea here is to look at the performance of the whole department and no one in particular.


Ah, I appreciate your response. :) So the primary question I would have if tasked with this is: why? And what kind of actions is this data informing? I'm having trouble imagining how aggregate performance across a dozen teams will be actionable. (Like, what kind of department-wide policies could result from it?) But admittedly, my imagination is not very broad.

Best wishes!

Oh, I don't see it as policies in any way. It's more of a tool to show people whether we are getting better or worse, and I do believe that engineers prefer to work in clean (i.e. better quality) code. So having this score should either reinforce the feeling of the project moving in the right direction or prompt us to reconsider coding practices / the PR review system.

My take on this is that the only way to ensure quality work is done is to hire people who actually care and can work well with others. These people are in short supply and/or are hard to find through current hiring practices. So, the next best option is to keep a pulse on how the product is doing.

  • Revenue
  • Support volume
  • Bugs filed
  • User errors (e.g. API request rejection)
  • Time spent on each screen

If a lot of people are calling support about specific processes, the natural instinct is to write documentation to explain. But probably a better strategy is to try to redesign it so it needs as little explanation as possible. A lot of managers fall into the trap of judging progress (or being evaluated) by amount of features produced. And they don't want to go back and change a feature that is already there. But sometimes it would make a bigger impact than adding something new.

But I digress. The other thing I wanted to mention is that even if these are aggregate metrics, if they are going to cause coding practice / PR review changes for example, people are going to game them. At the end of the day your ICs (even ones that genuinely care) will usually look at these as overhead (read: impediments) to doing their actual work. So their best interest is served by keeping code metrics looking good, regardless of whether they reflect any reality. And they will ultimately become a source of false confidence to management.

The only real way to know how the teams are doing is to have someone on the teams tell you. Traditional top-down mgmt structures create a separation that makes it hard for managers to actually be a part of the team and know how it's really doing. I think that's a big reason for the push for servant-leadership nowadays. Everybody appreciates the manager who is always there to lend a hand and keep them informed. But everybody would rather interact as little as possible with that manager who is pushing new initiatives on them.

I apologize, as I may have vented a little previously. Thank you for being a good sport and genuinely seeking answers and improvements -- for being someone who cares.


It is a difficult topic; maybe the answer is a mix of all of your points. But most of them cannot be taken as a single indicator of code quality or team performance.

You can write shitty test code which still covers most of your code. The number of bugs in features depends a lot on the complexity of the feature and on the existing code around it. Average page load time still does not tell you how well the code is written. Low complexity is a good indicator, but it also does not say a lot on its own.

For bugs, I find it very important to discuss each bug and its fix in the team, without making it a blame game. The questions should be: why did it happen (answers could actually point to 1, 4, 5 in your list)? What can we improve to avoid similar bugs next time?

A lot of times I just see bugfix after bugfix that could be traced back to the same root cause, like spaghetti code, if you took the time to analyze it. This is basically what should happen in each Sprint Retrospective, but it is rarely done (at least in my experience).


All good ideas, but what I want to get is some sort of number (or numbers) to be able to evaluate team performance over time.

And page load is the indicator that I actually dislike most, especially since it's an average: there are just too many possible fluctuations there (the number of pages going up and down, peak hours, promotions on TV, etc.).


I guess for all the things mentioned, you could record the numbers automatically, say every week, and then create graphs and check the trend.

This should give you quite a good indication of how a team performs over time and how it responds to changes, for example if a lot of new members join or if the workload is increasing.

If your company records overtime hours, you should add these too, so you might see that performance goes down if there are too many.
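As a rough sketch of "record the numbers and check the trend", the snippet below fits a least-squares line through a series of weekly snapshots and reports its slope. The function name `trend_slope` and the sample values are hypothetical, and a simple slope is only one possible way to summarize a trend.

```python
from statistics import mean
from typing import Sequence


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (units per period)."""
    if len(values) < 2:
        raise ValueError("need at least two samples to compute a trend")
    xs = range(len(values))
    x_mean = mean(xs)
    y_mean = mean(values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical weekly snapshots: bugs per feature, as a percentage.
weekly_bug_ratio = [12.0, 11.0, 10.0, 9.0, 8.0]
print(trend_slope(weekly_bug_ratio))
```

A negative slope here would mean the bug ratio is falling week over week. The same function could be applied to coverage, complexity, or overtime hours, as long as each series is sampled at regular intervals.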

  1. Test coverage never measures quality. Take a model class: what is the point of testing it when it is practically logic-less? Also, some methods need more than one unit test, and that is something test coverage does not measure.

  2. Easier said than done.

Good quality is measured by the clarity of the code and of course, the code must work.
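A small illustration of the point about methods needing more than one test, using a hypothetical `average` function: one test executes every line, so line coverage would report 100%, yet an obvious edge case still fails.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def test_average():
    # This single test executes every line of average(), so a line
    # coverage tool would report the function as fully covered.
    assert average([2, 4, 6]) == 4


test_average()

# The empty-list case was never exercised, and it crashes:
try:
    average([])
except ZeroDivisionError:
    print("average([]) raises ZeroDivisionError despite full line coverage")
```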


I'd say it does. Code tends to be left untested when it's untestable: improper dependency initialization, static vars, etc. However, you are right in a way that you can abuse this metric if you want to, but why would you?


Unit tests are fine for some cases, but let's say you are working for a big business or corporation. In that case, it is not rare, if not the norm, to do manual testing, so unit tests are optional, if there are any at all.

Now, we have some guys/girls testing our code (QA), from how it works to how it looks. Usually, they torture our code. Let's say it fails a single test, then they stop the test and they send it back to us. Rinse and repeat until the code passes all the tests.

Sometimes we even fight with the DBA (performance test).

I usually spend time writing unit tests for my own fancy projects, but I don't spend that time on my customers' projects because they do QA anyway.


I guess the question is how you can put a number on clarity to know how you are doing.


Usually, it's about following a standard.

Let's say we (as a team) decide to name all the classes as Peter, so we could have classes called PeterInvoice, PeterPurchase, PeterLog and so on. Is it clear? Yes, if the team understands it.

Now, let's say we want to use a database, then the team could use Oracle (or some specific database) because the team is seasoned with it. The same goes for libraries, components, and style of developing. Finally, writing the code becomes more of a routine than search & research, so it's easy to measure the progress and deadlines of the project.

It is not rare (nor elegant) to copy and paste a solution, but it is how most companies work.

However, it also perpetuates terrible practices, so the team needs to spend some quality time on search & research, but this evolution must be slow and controlled.

For example, let's say our team works in C# and Python and Java, at the same time. It is a mess, no matter if the code is clean.


Top of my mind here is "why?": what's the purpose of measuring stuff? Profitability for the company, team happiness, customer engagement?

The highly recommended Accelerate book by Nicole Forsgren et al (amazon.com/Accelerate-Software-Per...) suggests the following metrics are useful for end-to-end delivery performance, which is correlated with better return on investment for the company, happier people and happier customers:

  • Lead time for changes
  • Deployment frequency
  • Time to restore service
  • Change failure rate

Have a read - or the shorter State of DevOps Reports derived from the same research: puppet.com/resources/whitepaper/st... - the TL;DR version :)
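For anyone wanting numbers out of those four metrics, here is a minimal sketch of how they could be computed from a list of deployment records. The `Deployment` record, its field names, and the sample dates are all hypothetical, and measuring time to restore from the moment of deployment is a simplifying assumption (real incident data would be more accurate).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored


def avg_hours(deltas: List[timedelta]) -> Optional[float]:
    """Average of a list of durations in hours, or None if the list is empty."""
    if not deltas:
        return None
    return sum(deltas, timedelta()).total_seconds() / 3600 / len(deltas)


def delivery_metrics(deploys: List[Deployment], period_days: int) -> dict:
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time_hours": avg_hours(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        "time_to_restore_hours": avg_hours(restores),
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
    }


# Hypothetical two-day sample: one clean deploy, one that failed and was fixed.
t0 = datetime(2024, 1, 1, 9, 0)
day = timedelta(days=1)
sample = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0 + day, t0 + day + timedelta(hours=8), failed=True,
               restored_at=t0 + day + timedelta(hours=10)),
]
print(delivery_metrics(sample, period_days=2))
```

Since these are aggregates over deployments rather than over people, they fit the "whole department, no one in particular" framing from earlier in the thread.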


Nice, definitely will check it out!


And the main goal is definitely team happiness

  • Codacy with python and node standards (pylint, eslint). Code coverage.
  • User story points done in a sprint. The user story points are estimated beforehand for several sprints and at least five people in the team participate in the estimation.

Spend less time on metrics and more on helping your team to understand the values and principles of good software. I think about my favorite quote:

“If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.” ~ Antoine de Saint-Exupéry


0) Can we move as fast as we need to?
1) Bugs / hotfixes per release
2) Tech debt
3) Do we have simple solutions for complex problems


That is all good stuff, but how do you measure it? The idea is to have some sort of metrics so we could go back and see whether the team's performance has improved or degraded.