Note that everything written here is solely my own and does not represent the views of any of my past, present, or future employers.
This three-part series walks through my mini journey in search of a new holy grail for the latest software development/delivery techniques:
- Part 1 - The Pledge
- Part 2 - The Turn (this post)
- Part 3 - The Prestige
The Turn
As mentioned in Part 1, my search for a new holy grail of the latest software development/delivery techniques led me to two sources that inspired me about Quality and Time.
Quality
Let's talk about the first source, a four-part article written by the Lyft Engineering team about their evolution in scaling productivity.
At first, Lyft approached their testing process and environments the way everyone else would, as if we were all by-products of the same school of thought. The environments were set up as follows:
Devbox - An in-house local development VM that took care of everything from package installation and updates to configuration, all the way to database creation and seeding. The only thing left for a developer to do was run a single command to start it up. How convenient!
Onebox - A longer-lived environment that could be shared within the organization. It was basically Devbox on steroids, running on a high-tier EC2 instance (r3.4xlarge).
Staging - An environment nearly identical to production, except with a smaller footprint and no production data.
And this is what the testing process looked like:
Local test - The Lyft team relied on Devbox to conduct local testing across multiple services on their MacBook laptops.
Integration test - These tests ran on Onebox's cloud infrastructure. A manifest.yaml file defined the group of dependencies, and a temporary Onebox would be spun up to execute the tests for each pull request.
Load test - The staging environment was used for load testing, a decision driven by the surge in rideshare traffic during peak holiday events such as New Year's Eve or Halloween. Lyft built a simulation of ride traffic at scale to run against the staging environment, constantly generating up-to-date data for users, rides, payments, and so on. Deploy authors would see error logs and alarms immediately if anything went wrong.
The environments and testing process above seem perfect, straight out of a textbook. What could go wrong with a design coming from one of the best groups of software engineers in the world?! It looks like the setup you would wish for as a software engineer aiming high for a top-quality product. Let's find out below.
Scalability - Eventually, Lyft's services grew so large that Onebox could no longer be as useful as it used to be. People probably got frustrated before giving up and moving their testing to Staging instead, because it scaled so much better. However, as more and more engineers flocked to Staging, they did themselves a disservice: some changes broke Staging and blocked others from completing their work. And since effectively only one change per service could be tested at a time, it took far too long to build, deploy, and wait for your turn to test.
Maintainability - As the growth of Lyft's services nudged people to abandon Devbox and Onebox in favor of Staging for their testing (while Production and Staging remained well maintained), the boxes fell further and further behind. As a result, fewer teams cared about how their services behaved in Devbox and Onebox, triggering a downward spiral of maintainability.
Accountability - Given Onebox's status described above, there was no clear ownership or accountability for who should fix what when something broke.
Efficiency - If you're a seasoned software engineer, it is not hard to predict that, more often than not, the response to defects or production incidents was to add more test coverage in the environment trusted to catch the regression; yes... Staging. Over time this bloated the test suites with obsolete test cases because, you know it, we tend to add rather than remove; it generally feels safer, of course.
After reading up to this point in the first part of the series, it dawned on me, metaphorically speaking, that we often have the good intention of planting one tree at a time but forget to watch how the forest takes shape until it's a little too late.
Quality is never the only mandate; time, in whatever dimension it is measured, has to be accounted for as well. I will not go any deeper into the rest of Lyft's series, as you can follow the original articles and read them yourself. What I learned from Lyft's transformation, though, is powerful yet something people always forget to refer back to: Software Testing 101 - stick to the testing pyramid and make it work for the best of your software's quality and time.
Time
Alright, we now have most of the background about Quality and the Time spent during the development cycle. How about the other side of Time that we, professional software developers, worry about: time to release, or how much a team really delivers in a given period?
In Agile development, there is a guideline for velocity measurement which assumes that, over the long term, an Agile team can establish a baseline of story points it is capable of delivering. That baseline is then used to predict how much the team can deliver in a sprint, or even across an entire project, and this property of story points is often advised as a way to measure productivity. Please do not get me wrong: I believe I understand the common misconceptions about story points, such as the idea that it is bad when delivered points drop in a given sprint, which leads some teams to pressure themselves to bring the number back up. Leave me a comment below if you believe I've got this wrong.
Having worked in so-called Agile environments for almost a decade, I have yet to be satisfied by any of my past experience with how exactly I should measure whether a team is doing its best with the time spent building and releasing value. On many occasions, I have had serious doubts about the flakiness of story points, from how they are assigned and poker-pointed to how useful they really are over the long term.
This dissatisfaction pushed me to research practices that may or may not already be out there, in the hope of finding an answer I would feel better about than what I have experienced - as harsh as it sounds, the story point concept has been practically useless to me. The search brought me to the second source of my inspiration about Time: a podcast featuring a blogger named Lucas F. Costa. You can follow Lucas's blog here.
In one of Lucas's articles, he laid out groundwork that resonated with my experience: velocity is not an effective metric for engineers and managers because, in and of itself, it does not give enough insight to have an effective data-driven team conversation. Instead, Lucas proposed looking at the software delivery pipeline as a queue, where the following properties are more useful as metrics (see the sketch right after this list):
- Arrival Rate - The rate at which tasks arrive in the system
- Departure Rate or Throughput - The rate at which tasks leave the system
- Queue Size - Amount of work in progress at any given time
- Cycle Time - How long a task takes from entering the system to getting done
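To make those four properties concrete, here is a minimal sketch of how they could be computed from a list of task start/finish dates. This is my own illustration, not code from Lucas's article, and the task data, the two-week window, and the day-based units are all hypothetical:

```python
from datetime import date

# Hypothetical tasks: (started, finished); finished=None means still in progress.
tasks = [
    (date(2023, 1, 2), date(2023, 1, 5)),
    (date(2023, 1, 3), date(2023, 1, 10)),
    (date(2023, 1, 9), None),
    (date(2023, 1, 9), date(2023, 1, 11)),
]

window_start, window_end = date(2023, 1, 1), date(2023, 1, 15)
window_days = (window_end - window_start).days

# Arrival rate: tasks entering the system per day over the window.
arrival_rate = len([s for s, _ in tasks if window_start <= s <= window_end]) / window_days

# Throughput (departure rate): tasks leaving the system per day over the window.
done = [(s, f) for s, f in tasks if f is not None and window_start <= f <= window_end]
throughput = len(done) / window_days

# Queue size (work in progress) on a given day: started but not yet finished.
def wip(day: date) -> int:
    return sum(1 for s, f in tasks if s <= day and (f is None or f > day))

# Cycle time: how long each finished task spent in the system.
cycle_times = [(f - s).days for s, f in done]

print(f"arrival rate : {arrival_rate:.2f} tasks/day")
print(f"throughput   : {throughput:.2f} tasks/day")
print(f"WIP on Jan 9 : {wip(date(2023, 1, 9))} tasks")
print(f"cycle times  : {cycle_times} days")
```

A Cumulative Flow Diagram is essentially these numbers computed for every day and stacked on top of each other.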
In modern software development tools, those properties can be visualized in a "Cumulative Flow Diagram" (CFD), as shown in the image below (image reference: Lucas's article).
In theory, the textbook CFD looks like the one below, indicating that your engineering team has a constant value delivery pipeline, i.e. stable throughput. The images below are borrowed from the article.
However, one of the examples I love the most is the one below, where Lucas demonstrated a situation in which the engineering team might have an issue with tasks accumulating for too long, resulting in multiple periods of no throughput.
At this point, you may ask whether there is anything you can do to prevent such undesirable situations, and you will not be disappointed: Lucas outlined his rationale in another article. It comes down to avoiding a fundamental mistake that most of us seasoned software development professionals know about but forget from time to time: high capacity utilization. In short, watch and control the queue size, which is a leading indicator of an arrival-throughput imbalance and hence of the long cycle times we want to avoid. The concept of watching the queue makes total sense to me: if your team can only work on 5 units of work at any given time, then the more items sit in the queue, the more pressure your team feels, and everything starts to seem like a priority and a rush. What do you think about this so far? Leave me a comment below.
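If you want a more concrete feel for why high capacity utilization is the culprit, a classic queueing-theory result helps (this is my own illustration, not something taken from Lucas's article): for a simple single-server queue with random arrivals and service times (M/M/1), the expected time a task spends in the system is the service time divided by (1 - utilization), so both the queue and the cycle time explode as a team approaches 100% utilization. A tiny sketch, assuming a hypothetical team that can finish one task per day:

```python
# M/M/1 approximation: expected time in system = service_time / (1 - utilization).
# Hypothetical team finishing 1 task/day; utilization = arrival rate / capacity.
service_time_days = 1.0

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    time_in_system = service_time_days / (1 - utilization)
    avg_queue = utilization / (1 - utilization)  # average number of tasks in the system
    print(f"utilization {utilization:>4.0%}: "
          f"~{avg_queue:5.1f} tasks queued, "
          f"~{time_in_system:5.1f} days per task")
```

The exact numbers do not matter; the point is that the relationship is non-linear, which is why a steadily growing queue is worth reacting to long before cycle times visibly blow up.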
I am not sure how convinced you are by Lucas's work so far, but let me assume you believe, to some degree, that a large queue size is a leading indicator of long cycle time, which is a bad thing. But why exactly is it a bad thing?!
Hear me out: I cannot say I believe everything Lucas says, but for this question he has another article that explains what long cycle times tend to produce, here. In essence, long cycle times kill productivity through overheads such as context-switching costs, accumulated delta changes, difficulty finding defects, and so on.
After reading all of Lucas's work up to this point, I reached my own conclusion:
The best measurement of Time in software development is not about Speed but Predictability
This is because, to predict something, the following elements must exist:
- Stability - If something is stable or consistent, metrics measured against it can be used to make more accurate predictions
- Scope - Aiming for predictability encourages us to break our work down into small, meaningful units whose boundaries we understand
Still not convinced? Let me try one more time. In the same article, Lucas ran a Monte Carlo model against two hypothetical teams, one with consistent throughput (small, consistent batch deliveries) and the other the opposite (large batch deliveries). The simulation for the team with consistent throughput yields a narrower range of possible durations for the entire project (from around 20 days to 40 days).
On the other hand, the team with highly variable throughput ends up with a much wider range of possible project durations (from 0 days to 90 days).
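Lucas's exact model is not reproduced here, but the general technique is easy to sketch: sample from a team's historical daily throughput many times and see how long a fixed backlog takes to burn down in each simulated run. The backlog size and the throughput samples below are made up purely for illustration:

```python
import random

def forecast_days(backlog: int, throughput_samples: list[int], runs: int = 10_000) -> list[int]:
    """Monte Carlo forecast: repeatedly replay random historical daily throughput
    until the backlog is burned down, and record how many days each run took."""
    durations = []
    for _ in range(runs):
        remaining, days = backlog, 0
        while remaining > 0:
            remaining -= random.choice(throughput_samples)
            days += 1
        durations.append(days)
    return durations

random.seed(42)
backlog = 60  # hypothetical number of tasks left in the project

# Hypothetical historical daily throughput (tasks finished per day).
steady_team = [2, 2, 3, 2, 3, 2, 2, 3]    # small, consistent deliveries
bursty_team = [0, 0, 0, 9, 0, 0, 0, 11]   # long quiet spells, then big batches

for name, samples in [("steady", steady_team), ("bursty", bursty_team)]:
    durations = sorted(forecast_days(backlog, samples))
    p5, p95 = durations[len(durations) // 20], durations[-len(durations) // 20]
    print(f"{name:>6} team: 90% of simulations finish within {p5}-{p95} days")
```

Even though both hypothetical teams deliver roughly the same amount on average, the steady team's forecast range is far narrower, which is exactly the point about Predictability above.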
My last word: what customer would not love a company that keeps its promises on product specification and deadlines?
In the last part of this series, The Prestige, I will combine the lessons from the two great sources above into my thoughts on what I believe is one of the best ways in 2023 to help your team scale and thrive under the new economic expectations, through the definitions of Quality and Time we just discovered.