Many development teams skip having a staging environment for their applications. They often submit a PR, potentially run tests in a CI system, merge to master, and then deploy to production. If they have a staging environment, it's often not a true reflection of the production environment due to lack of effort, complexity, or expense. This is a risky pipeline because there is no true integration environment or integration testing being performed. What’s worse is that if there is an issue, developers may engage in “cowboy coding” to try to resolve the issue on their live production environment.
In this article, we'll discuss some of the advantages of using a true staging environment in your software development lifecycle, and why they don't need to be complicated to set up.
A staging environment is another environment that you deploy to in your software development and deployment process. You deploy to a staging environment before you deploy to your production environment.
Staging environments are generally meant to be identical or nearly identical to production. This means that they have the same hardware, software, and configuration. The closer you can get to this, the more useful your staging environment will become.
That level of sameness between staging and production ensures that testing on your staging environment reflects what would happen in production under like circumstances.
Unlike development or limited integration test environments, staging environments utilize the same back-end and up-and-downstream services. They also have the same architecture, the same scale, and have highly similar or identical configurations to the production environment.
Depending on any regulatory factors (such as GDPR requirements) and your organization’s level of ability to anonymize data, a staging environment may even have anonymized or complete sets of production data in order to more closely mimic the real world production environment. That means that a staging environment is typically not released or made available to your production user base, but rather it is made available to an internal or pilot user base.
To control costs, you can deploy to your staging environment as part of a release cycle and then tear it down after the release has moved to production.
This methodology provides you with the ability to discover any code quality issues, a higher level of data quality issues, integration problems, and or other dependency issues which would simply not be present or obvious in an integration test environment or lower local or developer environments.
This methodology also gives you the ability to predict with a high degree of confidence, whether or not your production deployment will even be successful, allowing you to answer questions like “will this new service we wrote hang when it starts up in production?” If for example, the libraries we used worked on our computers but don’t work the same on the Linux VM we use to deploy.
Working with a staging environment forces you to validate all the assumptions you made during development and ensure that you have had the relevant thought exercises necessary to ensure deployment success.
Any way you cut it, testing locally or running unit tests is not a sufficient litmus test for your product quality and functionality. Unit tests are written by humans, and humans are fallible. If you only test for known issues, then you can’t cover the issues you don’t know about.
People often forget about changes that cross service boundaries or gloss over dependencies with upstream users and database migrations. Sometimes a library you’re utilizing may work on your local machine but may not work in the cloud, and the only time you figure out that dependency failure, is when you deploy to production.
Often the data set being tested in lower environments is an unrealistic simulacrum of what is in production. Some people may think a staging environment isn’t necessary when using canary or blue/green deployments since the problem will be caught early, but you are still exposing users to bugs and misconfigurations.
All in all, this exposes users to potentially breaking changes, and depending on the level of the change, it can impact production data or other dependent services and processes.
Relying on faith and hope as a policy to ensure successful production deployments will inevitably run the risk of creating a negative perception of the quality of your product and ultimately result in lost sales, customer attrition, and possible violations of your customer SLAs.
The costs associated with these types of deployment or code failures can include:
- Having to provide immediate hotfixes
- Rolling back releases
- Impacts to development timelines
- Potential data loss
- Negative user experience impact
- Missed SLAs
- Reputational/brand risk
- Lost transactions/sales
- Lost customers
The benefit you receive from using a staging environment is a higher degree of quality assurance and customer satisfaction. Also, by reducing the impact or number of errors in your product you can see a lot of indirect cost savings. For example, you can reduce the amount of time you have to spend potentially rolling changes back, or reduce the time spent providing timely hotfixes that can also impact your regular development cycles by pulling developers out to deal with the problem. You also save the cost of potentially troubleshooting issues in production, and time lost in responding to user inquiries or bug reports.
Let’s walk through some potential scenarios that could be easily prevented if you use staging environments. We’re working with an app called Bitcoin Price Index, by Mark Chavez. This a simple React-based application, which connects the user to the CoinDesk API to provide trend information for Bitcoin prices based on the currency selected.
In our first scenario, while performing development in our lower (development/local) environments, we point our app to a mock CoinDesk API service to reduce our API usage and control costs. This URL should point to the real CoinDesk API before deployment to production.
As you can see above, this mock URL somehow made it into our main service application code. (Bad practice, I know, but it happens more often than you’d think!)
This breaking change would have worked seamlessly in developer environments as the mock URL would have been presumably available in the developers’ network. In a staging environment which mimics production, that service dependency wouldn’t have been there, and this breaking change would have been caught before going directly to production.
This is the principle value of having a staging environment: Keeping your breaking changes from going directly to production by providing you a mirror environment to test and validate your changes in.
Let’s look at another example: Two developers commit new features that both have the same style names, but in different lines of the CSS file. In each developer’s individual feature branch, the styles and subsequent product look as expected.
However, when each developer issues pull requests to merge this back into development, the overlapping styles don’t get picked up during review since they are in different pull requests. They get merged incorrectly and deployed to production. The final result leaves the product in an undesirable state.
Finally, let’s dive into the old developer expression “it worked on my machine”.
Here a developer added imagemagick to the stack to handle the modification of profile images uploaded to the site. The associated NPM imagemagick library “imagemagick” was installed and saved to package.json, but the underlying imagemagick-cli was only installed on the developer’s machine.
So when testing locally, the functionality worked exactly as expected, but when pushed to production, the feature doesn’t work and the following errors resulted in the logs.
Error: Command failed: CreateProcessW: The system cannot find the file specified
Without a staging environment, it’s easier for this type of problem to make it all the way to production.
The end reality of all these examples is that they are all completely preventable mistakes. These mistakes invariably happen, and may not be caught before production if there isn’t a staging environment. As your application grows in complexity, the potential for these types of errors also grows exponentially.
Using a staging environment as part of your SDLC and deployment lifecycle can reduce the risk of these errors being public mistakes, or private ones.
One common excuse for not using staging environments is they are too complicated or costly to set up. There is some truth to this as they can be an added expense. Associated DevOps can be an expensive endeavor, and staging environments can be as difficult to set up as your production environment. However, it does not need to be.
Modern cloud platforms provide the ability to spin up staging environments as needed and allow you to automate their deployment to be part of your standard deployment pipelines. They can prevent mistakes that would break production or block it from being deployed.
The easiest way to get started is to embrace modern cloud providers and their DevOps toolchains such as Heroku Pipelines, which can make deploying staging environments easy, even more cost-effective, and relatively painless.
Heroku Pipelines is a product from Heroku that allows you to manage a group of Heroku apps which share the same code base. Each app in a Pipeline represents a state in your delivery lifecycle. Heroku automatically spins up a staging environment when you deploy to your master branch. Once you’ve verified the app, promoting to production is as simple as clicking a button.
Alternative methods to automatically deploy staging environments include infrastructure as code and container orchestration solutions such as Terraform or Kubernetes.
Without an automated process to make this process easier for us, we would have to provision machines or containers that were identical to our production environment and then deploy them manually.
Ultimately, using a staging environment helps you to embrace modern software development and release methodologies to improve your team’s productivity. More importantly, it helps improve the quality of the product you deliver to customers.