
On Measuring a Software Engineer’s Performance

Ana Ulin 😻 ・ Originally published at anaulin.org

I recently learned that a former employer has started evaluating how productive software engineers are by ranking them according to the number of pull requests they open. The person who shared this news with me snarkily commented: “Well, I guess we should be grateful that they aren’t measuring lines of code per week.”

After I got over my disbelief, I got to thinking: but what is a good way to measure the productivity or, as our corporate overlords like to call it, the “performance” of a software engineer?

Let’s take a step back and start by asking ourselves what we are really trying to measure here. As an engineering manager, when I’m assessing a team member’s performance, what I’m trying to do is get an objective picture of how well this person does their job.

Since this person is working as a software engineer, their job can be summarized as: help us build and ship good software that does what we want it to. This might involve writing code [1], but if we break this down further, writing code turns out to be only a small part of the work.

The job involves thinking through problems, proposing solutions, and designing how to build those solutions. It requires collaborating closely with other engineers, designers, and product managers; maybe even with customer support and salespeople. It might also involve interviewing potential new team members, helping onboard new hires, and mentoring less experienced engineers. Depending on the structure of the organization, it might also include things like performing release engineering tasks or taking on-call shifts.

The list goes on and on, and will vary substantially depending on the size and structure of the organization, the tenure and experience level of the individual, and even the phase of the project’s lifecycle (early on we might be doing no releases and having a lot of design discussions; later on the work might be mostly bug-fixing and customer support).

Once you break it down this way, it becomes clear that counting the number of PRs is not very useful. The same goes for counting code commits, lines of code added, bugs fixed, etc. These metrics can only capture one small slice of an engineer’s work. Besides, such metrics are hard to get right, because not all PRs (or lines of code, or bug fixes) are of equal impact or complexity: my change fixing some style errors is probably less valuable than your change laying down the scaffolding for a new service. (Relatedly, this is why making your coding workflow very efficient, whether through keyboard shortcuts, familiarity with your editor of choice, or code snippets, will only get you so far when you’re leveling up as a software engineer. The coding itself is only a small part of the battle.)

So back to the original question: how do we evaluate a software engineer’s performance?

For the reader who wants to be data-driven, I have bad news: there is no set of statistics that you can use to evaluate how good a job an engineer does. And it gets worse: if you insist on measuring something, your team will probably optimize over time for the metric you picked, which may well get you the opposite of what you wanted. Imagine a team where, over time, folks have optimized for sending at least a couple of PRs a day, no matter the content. Is that what you are shooting for?

If you want to truly assess how well an engineer does their job (all of it, in its complex and messy glory), there is no substitute for good judgement and a willingness to stay present, observe, and listen. You work with the team, you watch the dynamics, you get to know people, how they work and what their full contributions are, you ask for input from peers, and you apply your own experience and judgement. It’s as simple and as difficult as that. There is no magic bullet.

Modern corporations, with their career ladders and quarterly reviews, have taught us to see performance assessments as a normal part of work life. A good performance review might get you that raise or promotion.

But if you’re not in it to climb the ladder, or if you’re a manager who cares, the truly valuable part comes after you’ve done the assessment. That’s when you help folks level up. You give them kind but honest input about where they are weak, and work with them on ways to improve. That’s where the magic happens.


  1. Some software engineers might focus on driving architectural discussions, doing code reviews, and supporting others, and produce little or no code “of their own”. This is pretty normal for folks in technical leadership positions.

Discussion

Ced (ejames_c)

I'm curious to know if you use any quantifiable metrics at all, or if you could recommend some in addition to qualitative measurements. Sure, there might not be a 'good set of statistics' one could use, but surely the answer isn't "good judgment", since that opens the door to bias?

What's a good mix?

Joe Zack (thejoezack)

There are a bunch of studies that have tried to figure that out, specifically regarding software development.

The results are highly controversial though. I wrote an article about it a while back. The article references what they measured and has links to some of those studies at the bottom:
codingblocks.net/practice/four-rea...

For the most part, these studies looked at how programmers solved the same problems in isolation and the results are still highly controversial. As for pulling metrics from actual "real world" work where people are working on different problems, and have different specialties, and different responsibilities...well, good luck! :)

Ana Ulin 😻 (anaulin), Author

“Surely the answer isn't "good judgment", since that opens the door to bias?”

Even if you applied a statistic of some sort, you would still have to apply your own judgement to make sure that it is a statistic that does not introduce its own bias, and that you are applying it correctly.

It would be great if we could determine everything by seemingly objective and data-driven numbers, but the reality is that even interpreting numbers still requires judgement. You can see examples of this all around: in politics, for instance, folks use the same numbers to draw different conclusions and support opposing stances.

If you look at the article that Joe shared below (very interesting, by the way, Joe, thanks for sharing!), it is revealing that the metrics in that study looked at the programmers "over a handful of hours" and measured things like "program size" and "time to debug". Maybe these metrics can tell us something about an engineer's ability to write code, but they tell us very little about their ability to do actual software engineering. As Joe points out in the article, they didn't look at the things that, in my opinion, truly distinguish stronger engineers: maintainability of the resulting code, ability to deal with ambiguous requirements, etc. They also never once addressed anything relating to collaboration, which is essential in a modern software engineering team.

So yeah, the answer is, in fact, "good judgement". As for bias, if you have managers that are not working to counteract their prejudices, no set of statistics is going to save you.

Ced (ejames_c)

Thank you for this reply!

Kasey Speakman (kspeakman)

I think this is spot on. If management starts looking for a metric, employees will automatically (without even thinking) start gaming that metric. If you want an engaged employee you have to put them in charge of their advancement. "What are your goals? Where do you want to be next year? What does your ideal future in this company look like?" Then give them guidance to get there. Point out any obstacles they will face. For example, if they want to be a manager but don't like people... maybe work on those soft skills. But required assessments based on some generic/arbitrary criteria are simply dehumanizing.

van (vanvictorlim)

One idea I can think of is weighted tasks, wherein managers assess each deliverable based on difficulty. The weights need to be carefully deliberated, not only by the managers, but by everyone who has already been through a similar task. From there, every programmer involved is rewarded for every milestone achieved.

This has been the practice on some teams.