Misha Shelemetyev

Posted on Jun 7, 2019

How do you measure code quality / engineering team performance?

#help #codequality #teamwork #discuss

I've been looking into this problem for a while now, and here are the few things I came up with (however, none of them feel quite right):

Test code coverage
Amount of bugs / Amount of features (%)
AVG page load time (assuming we're talking about some sort of website)
Cyclomatic complexity (spaghettiness of the code), should be small
Essential complexity (unnecessary loops and ifs), should be small
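To make the cyclomatic complexity item concrete, here is a minimal sketch that approximates it per function using Python's standard `ast` module. The set of node types counted as branches and the `SAMPLE` snippet are assumptions picked for illustration; dedicated tools apply fuller rules, so treat the numbers as rough.

```python
import ast

# Node types that open an extra branch. This set is a simplification
# chosen for illustration, not an exhaustive definition.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Return an approximate complexity score per function in `source`."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # a function with no branches has exactly one path
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two extra decision points
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


# Hypothetical code under measurement
SAMPLE = '''
def flat(x):
    return x + 1

def branchy(x):
    if x > 0 and x < 10:
        for i in range(x):
            if i % 2:
                return i
    return None
'''

print(cyclomatic_complexity(SAMPLE))  # {'flat': 1, 'branchy': 5}
```

Run per commit or per release, a score like this only shows where branching piles up; it says nothing about whether the branches are necessary.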

So the question is: how do you guys do it, and at what point (every 2 weeks / at release time / every half year / every day / every commit / once in a blue moon)?

Top comments (16)

Kasey Speakman • Jun 7 '19 • Edited

I think the whole reason the question seems hard to answer is that whatever metrics you collect, they will likely never correlate to "team performance". A team doesn't perform well because their code output takes on specific characteristics. They perform well when they work in unison.

And if by "team performance" you actually mean individual performance of each team member, then the act of measuring these things might make it impossible for them to work well together. Because then they become competitors trying to outdo each other (or avoid blame) on metrics; it is not in their best interest to help each other.

I was recently tasked to do a "performance review" of my software team. And because I'm old and I'd rather find another job than do something that I consider dehumanizing, I asked them how they wanted to be evaluated (in the context of avoiding a manager passing a subjective judgement). And we had an ad hoc design exercise. What we ultimately came up with was to identify the activities within our department. Then identify which activities (on which products) each team member participated in. And the level at which they participated (Leader, Contributor, Participant). In the end each "review" read almost like a resume. Since it also identified all department activities, it provided team members a look at the different areas they could choose to grow into. It was also pretty uncontroversial. "Hey, didn't you go solo to those customer meetings about the new features they want? Ok, so you were a Contributor in Business Analysis for that product." We even referred to our previous sprints to fill it out. As a hypothetical exercise, we did a retroactive review for last year, and you could clearly see the progress between years. You could also see who was doing different activities on the team and infer how people interacted. If you brought a new manager on board or had to justify some decision, these reviews would tell loads more about the team than any code metrics ever could. And the activities themselves are researchable as to market value.

Anyway, my 2 cents.

Misha Shelemetyev • Jun 7 '19

That is actually very helpful; there's a lot of good food for thought in your experience! Now, the question is more about a very large codebase with 10+ different teams contributing to it, so the idea here is to look at the performance of the whole department and no one in particular.

Kasey Speakman • Jun 7 '19

Ah, I appreciate your response. :) So the primary question I would have if tasked with this is Why? and what kind of actions is this data informing? I'm having trouble imagining how aggregate performance across a dozen teams will be actionable. (Like, what kind of department-wide policies could result from it?) But admittedly, my imagination is not very broad.

Best wishes!

Misha Shelemetyev • Jun 7 '19

Oh, I don't see it as policies in any way. It's more of a tool to show people whether we are getting better or worse. I do believe that engineers prefer to work in clean (i.e. better-quality) code, so having this score should either reinforce the feeling that the project is moving in the right direction or prompt us to reconsider coding practices / the PR review system.

Kasey Speakman • Jun 8 '19

My take on this is that the only way to ensure quality work is done is to hire people who actually care and can work well with others. These people are in short supply and/or are hard to find through current hiring practices. So, the next best option is to keep a pulse on how the product is doing.

Revenue
Support volume
Bugs filed
User errors (e.g. API request rejection)
Time spent on each screen

If a lot of people are calling support about specific processes, the natural instinct is to write documentation to explain. But probably a better strategy is to try to redesign it so it needs as little explanation as possible. A lot of managers fall into the trap of judging progress (or being evaluated) by amount of features produced. And they don't want to go back and change a feature that is already there. But sometimes it would make a bigger impact than adding something new.

But I digress. The other thing I wanted to mention is that even being aggregate metrics, if they are going to cause coding practice / PR review changes for example, people are going to game them. At the end of the day your ICs (even ones that genuinely care) will usually look at these as overhead (read: impediments) to doing their actual work. So their best interest is served by keeping code metrics looking good, regardless of whether they reflect any reality. And they will ultimately become a source of false confidence to management.

The only real way to know how the teams are doing is to have someone on the teams tell you. Traditional top-down mgmt structures create a separation that makes it hard for managers to actually be a part of the team and know how it's really doing. I think that's a big reason for the push for servant-leadership nowadays. Everybody appreciates the manager who is always there to lend a hand and keep them informed. But everybody would rather interact as little as possible with that manager who is pushing new initiatives on them.

I apologize, as I may have vented a little previously. Thank you for being a good sport and genuinely seeking answers and improvements -- for being someone who cares.

Andreas • Jun 7 '19

It is a difficult topic; maybe the answer is a mix of all of your points. But most of them cannot be taken as a single indicator of code quality or a team's performance.

You can write shitty test code that still covers most of your code. The amount of bugs in a feature depends a lot on the complexity of the feature and the already existing code around it. AVG page load time still does not tell you how well the code is written. Low complexity is a good indicator but also does not say much on its own.

For bugs, I find it very important to discuss each bug and its fix in the team, without making it a blame game. The questions should be: Why did it happen? (Answers could actually point to 1, 4, 5 in your list.) What can we improve to avoid similar bugs next time?

A lot of times I just see Bugfixes after Bugfixes which could be traced back to the same root cause, like spaghetti code, if you would take the time to analyze it. This is basically what should happen in each Sprint Retrospective but rarely is done (at least in my experience).

Misha Shelemetyev • Jun 7 '19

All good ideas, but what I want to get is some sort of number(s) to be able to evaluate team performance over time.

And the page load is the indicator that I actually dislike most... especially since it's an average, there are just too many possible fluctuations there (number of pages going up and down, peak hours, promotions on TV, etc.).
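A small sketch of why the average is so jumpy: it compares the mean with the median and a nearest-rank 95th percentile on made-up load times. The sample values and the helper names `percentile` and `summarize` are hypothetical, chosen only to show how a single slow request during a spike moves the mean while the median stays put.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(load_times_ms):
    """Summarize page load times (milliseconds) with three different statistics."""
    return {
        "mean": statistics.mean(load_times_ms),
        "median": statistics.median(load_times_ms),
        "p95": percentile(load_times_ms, 95),
    }


# Made-up samples: a quiet day, and the same day with one very slow request
quiet = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]
spiky = quiet[:-1] + [5000]

print(summarize(quiet))  # mean 245, median 245.0, p95 290
print(summarize(spiky))  # mean 716, median 245.0, p95 5000
```

Tracking the median alongside a high percentile at least separates "the typical page got slower" from "a few requests were terrible", which a single average cannot do.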

Andreas • Jun 7 '19

I guess for all things mentioned, you could record the numbers automatically, like every week and then create graphs and check the trend.

This should give you quite a good indication of how a team performs over time and how it responds to changes, for example, if a lot of new members join or if the workload is increasing.

If your company records overtime hours, you should add these too, so you might see that performance goes down if there are too many.
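As a rough sketch of the "record weekly, then check the trend" idea, the snippet below fits a least-squares line through equally spaced snapshots and reports its slope. The function name `trend_slope` and the coverage figures are invented for illustration; any metric recorded at a regular interval would work the same way.

```python
def trend_slope(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two snapshots to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical weekly test coverage snapshots, in percent
weekly_coverage = [61.0, 62.5, 62.0, 64.0, 65.5]

slope = trend_slope(weekly_coverage)
print(f"coverage changes by about {slope:.2f} points per week")
```

A positive slope means the metric is rising on average. Whether rising is good depends on the metric, and a few noisy weeks can easily dominate a short series.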

Javier Aguirre • Jun 8 '19

Codacy with python and node standards (pylint, eslint). Code coverage.
User story points done in a sprint. The user story points are estimated beforehand for several sprints and at least five people in the team participate in the estimation.

Misha Shelemetyev • Jun 7 '19

I'd say it does: code tends to be left untested when it's untestable (improper dependency initialization, static vars, etc.). However, you're right in a way that you can abuse this metric if you want, but why would you?

Cubicle Buddha • Jun 8 '19

Spend less time on metrics and more on helping your team to understand the values and principles of good software. I think about my favorite quote:

“If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.” ~ Antoine de Saint-Exupéry

Arekusandr • Jun 7 '19

0) Can we move as fast as we need to?
1) Bugs / Hotfixes / per release
2) Tech debt
3) Do we have simple solutions for complex problems?

Misha Shelemetyev • Jun 7 '19

That is all good stuff, but how do you measure it? The idea is to have some sort of metrics so we could go back and see whether the team's performance has improved or degraded.