Continuous integration services like Jenkins, Bitbucket Pipelines, CircleCI, GitHub Actions, and GitLab CI/CD force extreme automation. They require your continuous integration flows to run non-interactively in remote environments. Most of these tools do not allow you to interact with the continuous integration environment.
All of these tools require users to define deployment workflows in custom definition languages. In my experience, it is very common that users of these services can only run their deployment workflows through the service in question.
When something goes wrong with a deployment, debugging takes the form of a deployment death loop:
- Make changes locally (mostly to continuous integration configuration files).
- Push to the appropriate branch on your git host (usually master).
- Wait for a job to start on your continuous integration service.
- Wait minutes or even tens of minutes following the logs to see if your changes fixed the issue.
- Repeat this procedure until you have solved the problem.
Things are even worse when you have to debug configuration that lives in a database or on a configuration management system such as Vault or Parameter Store. If you can’t manually trigger a workflow, you are forced to push superficial changes to your git repository (like adding punctuation or white space to your README) to see if your changes worked.
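One escape hatch for the README-whitespace hack (my suggestion, not something the original workflow requires) is git's built-in empty commit, which gives CI something to build without touching any files. A minimal demo in a throwaway repository:

```shell
# Demo in a throwaway repository; in a real project you would run only the
# `git commit` (followed by a `git push`) inside your existing checkout.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email ci@example.com   # placeholder identity for the demo
git config user.name "ci-demo"

# An empty commit retriggers CI without README punctuation/whitespace hacks.
git commit -q --allow-empty -m "ci: retrigger pipeline"
git log --oneline
```

In a real repository, a `git push` after the empty commit is what kicks off the CI job; the push is omitted here because the demo has no remote.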
How to break out of this death loop?
First, any developer should be able to deploy from their local environment (provided they have the required level of access). The continuous integration workflow definition should be secondary to this mechanism and, if possible, should invoke it directly.
For example, at Bugout, our services are deployed by a bash script which can be executed on our servers via SSH. Even in production, if a developer has access to the production servers, they can run a deployment by simply running a git pull and then the deployment script over SSH. This is useful not only when debugging deployments but also when our continuous integration service goes down, which has been known to happen.
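A minimal sketch of that kind of script (the paths, host name, and systemd unit name below are my assumptions for illustration, not Bugout's actual setup):

```shell
# Write a hypothetical deploy.sh to a temp location for illustration;
# in practice the script would live in the repository itself.
cat > /tmp/deploy.sh <<'EOF'
#!/usr/bin/env bash
# deploy.sh -- sketch: update the checkout, then restart the service.
set -euo pipefail
APP_DIR="${APP_DIR:-/opt/app}"      # assumed install path
SERVICE="${SERVICE:-app.service}"   # assumed systemd unit
git -C "$APP_DIR" pull --ff-only    # bring the checkout up to date
sudo systemctl restart "$SERVICE"   # restart with the new code
EOF
chmod +x /tmp/deploy.sh

# A developer -- or the CI job itself -- can then deploy over SSH:
#   ssh deploy@prod.example.com '/opt/app/deploy.sh'
```

Because CI would run the same script a human can run by hand, a broken pipeline never blocks a deployment: the SSH invocation is always available as a fallback.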
Second, embrace checklists. Have a checklist of external actions that must be taken before a change can be deployed. This checklist can include things like running database migrations, setting environment variables, or modifying a load balancer.
It’s okay to check items off such a checklist manually. Manual checklists may run counter to the ideal of fully automated deployments, and that is alright. Deployments that require manual steps should be performed with human oversight.
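Such a checklist can even live next to the deploy script. Here is a hypothetical sketch (the items and filename are illustrative, not from the original post) that insists on human confirmation when run interactively, and simply prints the list as a reminder when it is not attached to a terminal:

```shell
#!/usr/bin/env bash
# pre-deploy-checklist.sh -- sketch of a manual checklist with human oversight.
set -u

checklist=(
  "Database migrations have been run"
  "New environment variables are set (Vault / Parameter Store)"
  "Load balancer changes are in place"
)

if [ -t 0 ]; then
  # Interactive: require a human to confirm every item before deploying.
  for item in "${checklist[@]}"; do
    read -r -p "Done: ${item}? [y/N] " answer
    case "$answer" in
      y|Y) ;;
      *) echo "Aborting: '${item}' not confirmed." >&2; exit 1 ;;
    esac
  done
  echo "Checklist complete -- safe to deploy."
else
  # Non-interactive (e.g. invoked from CI): print the checklist as a reminder.
  printf 'Manual pre-deploy checklist:\n'
  printf ' - %s\n' "${checklist[@]}"
fi
```

Keeping the items in one array means the interactive gate and the printed reminder can never drift out of sync.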
The general principle is to favor interactivity over automation. Make your deployments interactive by default. You will not regret it.
Top comments (2)
Oh that's what it is called! And here I thought this issue was unique to me. Luckily, I figured out a while ago that it is better to create a detailed step-by-step checklist to prevent issues with deployment than to rely on my memory.
Haha it's so common that I'm surprised there wasn't already a name for it.
I'm curious about your checklists. Do you keep them in docs somewhere?