Development, test, performance, staging, canary, production…
I have a theory (and some experience) that having fewer environments in the software ecosystem is better than having more of them.
It’s a common practice in many software projects to have:
- The development environment — also known as the developer’s computer
- The development integrated environment — a remote environment used by developers to have their code integrated so that they can check if they broke the code of someone else, or if someone else broke their code
- The test environments — environments used for manual and/or automated tests
- The staging environment — used for acceptance testing before pushing new code to production
- The canary environment — a special kind of environment used to test new features with a small set of users (commonly used to test things internally before reaching the broader audience)
- The production environment — the environment that our users really use
Some might argue that having all these environments is a good thing. I say it’s not.
The more environments you have, the more problems you have to deal with.
Tell me. Has it never happened to you that a specific bug would only occur in a particular environment, but not in all others?
And have you never experienced an issue that was caused due to a misconfiguration in a specific environment, and you spent hours before you figure it out?
Or, has it never happened that certain feature of the application was behaving differently in different environments due to manual testing happening here and there, changing the state of the application in an uncontrolled way?
These are just a few examples of how bad things can go due to having too many environments.
In one of my previous jobs, we had only three different kinds of environments, and it was awesome! They were:
- The development environment — also known as the developer’s computer
- The continuous integration/delivery environment — created on-demand exclusively to execute automated tests and to run deployments, and they were automatically destroyed after that
- The production environment — the one used by our real users
What was good about having only these three environments?
We had fewer problems to deal with in terms of different environments being differently configured, bugs not being reproducible in some of them, and manual testing interfering with the development and delivery cycle.
I remember even arguing with our CEO that we should have a staging environment before production, to have more confidence before deploying new code that could be reaching the end-users, but luckily, he made me understand and convinced me that this wasn’t a good idea.
Of course, having fewer environments comes with challenges. Below I list a few of them.
- By having only three environments, sometimes we would introduce bugs in production because we were not able to catch them before;
- We had to find a way to “test” minimum viable features (MVFs) with smaller sets of customers or beta testers, in production;
- Performance testing would need to happen in production.
Actually, these are good problems to have, and they forced us to implement modern practices of software development that led us to innovation.
When you have the risk of introducing bugs in production because of the lack of environments, you’re forced to have excellent coverage of automated tests, all of the kind. They can be unit tests for the back-end, unit tests for front-end components, integration tests between services, API tests, functional end-to-end tests, screenshot comparison tests, performance tests, and the list goes, on and on.
Also, you must have a way to roll back the application to a known, stable state, very quickly.
In a team I worked, we were able to run thousands of unit tests, hundreds of integration and API tests, close to a hundred of end-to-end tests, and a few visual regression tests in a maximum time of ten minutes. And even more importantly, we were able to roll the application back to a known good state with two clicks. The whole process would also take a max of ten minutes. This was real continuous integration and continuous delivery.
In regards to testing MVFs with customers and beta testers, this can be solved with the use of feature toggles, and proper monitoring systems. You can have all kinds of strategies when working with feature toggles if you choose the right tool to implement it. Some tools allow you to create rules like:
- Turn on this specific feature (or set of features) to 1% (or n%) of the user base
- Turn on this particular feature (or set of features) only to this user (or set of users)
- Turn on this specific feature (or set of features) only to this particular range of IPs
- Turn on this particular feature (or set of features) only to this specific region in the globe
And the list goes, on and on, with as many different combinations as you can think of.
An excellent tool that allows you to do that is the Unleash open-source feature flag & toggle system.
Of course, by using such an approach, monitoring is essential. Still, there is beauty here. When you realize that a certain feature (that is behind a feature flag, and that’s turned on, let’s say, to 5% of your user base) is causing some stress to the system, and this is starting to cause problems in different parts of the app, you can toggle the feature off, and calmly investigate and fix the issue.
In terms of performance testing, there may be specific times where such tests can run in production in a controlled way, when the application is not receiving so much traffic, for example.
Of course, in some cases this might not be possible, and in such an exception, maybe it’s good to have a specific performance test environment, that is created and destroyed on-demand.
In conclusion, having less environments makes the system more resilient, and it helps you (and your company) to experiment, and really fail fast and learn and improve fast, which is essential to survive in this chaotic “environment” where new technologies are launched every day, new competitors come from all over the world, and users are more willing to change to a better experience.
That was on my mind for a while, and I’m happy that I was able to express it. I’ll be glad if you comment on your thoughts about this subject.
This content is also available in Portuguese on my WordPress blog, Talking About Testing and you can find it through the following URL: https://bit.ly/31i6JdW
My name is Walmyr, and I’m a software engineer passionate by software quality, test automation, teaching, learning, music, tattoos, and skateboarding. You can find me on Twitter as @walmyrlimaesilv, where I tweet about technology, both in English and in Portuguese.
Top comments (0)