Don't deploy on Friday afternoons!
This expression is taken as programmer wisdom but I hate it. I'm going to try to kill it, with words and experience.
The motivation behind it is sound. I don't want to spend my Friday nights debugging a production problem either.
To me the expression smacks of unprofessionalism. Software development as an industry has a poor reputation and phrases like this do not help.
If you went to an ER on a Friday afternoon and were turned away because the doctors don't trust their own tools and know-how, how would you feel?
If we want people to take our craft seriously we need to own it, and not give the impression that we don't understand the systems we are making well enough to make changes at the end of the week.
Why a lot of people don't want to deploy on Fridays
- You don't understand your system. You're making changes but you're not 100% sure it'll be OK. Ask yourself why this is. Why are you scared of changing your software?
- Poor monitoring. If your users are the first to tell you something is wrong, that feedback arrives during your time at the pub rather than at the moment you deploy.
- Overly-complicated rollback/branching. How is your disaster recovery? How easy is it to push a fix once you solve a bug?
Continuous Delivery (CD)
I have worked on teams that have deployed new versions of various services in a distributed system multiple times at 4:30pm without breaking a sweat.
Why? Because deployment is fully automated and is a non-event. Here is the groundbreaking process.
- Write some code
- `git commit -am "made the button pop"`
- `git pull -r && ./build && git push`
- Make a cup of tea
- It's now live.
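The "only ship when the build is green" rule at the heart of this process can be sketched as a tiny shell function. This is a sketch, not a real tool: `deploy_if_green` takes your project's test and deploy commands as arguments, and both names are made up for illustration.

```shell
#!/bin/sh
# deploy_if_green TEST_CMD DEPLOY_CMD
# Runs the test command; runs the deploy command only if the tests pass.
deploy_if_green() {
  tests="$1"
  deploy="$2"
  if $tests; then
    $deploy && echo "live"
  else
    echo "build red, nothing shipped"
  fi
}
```

In a real pipeline the test and deploy steps would be your CI server's jobs; the point is only that no human sits between a green build and production.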
Not so long ago it was considered normal for there to be releases every 6 months, or even just one at the end of a project.
The forward thinkers of that age saw problems with this:
- Poor feedback loops
- Stressed development teams
- Catastrophic failures.
So the industry as a whole worked on lots of tooling, techniques and best practices to allow us to release software far quicker.
Recognising that releasing often reduces risk is generally accepted nowadays but teams still often settle on weekly or fortnightly releases; often matching the cadence of their sprints.
What are the problems with weekly/fortnightly releases?
- The feedback loops are still not great. When you do your release there can be quite a lot of commits going live at once, and if something is wrong it can be challenging to figure out exactly what broke; especially if you wrote it 2 weeks ago.
- Still overly reliant on manual processes. I have seen teams actually skip a release because a QA was on holiday. This is surely unacceptable in 2018. Manual testing does not scale into the future. People leave, things get forgotten, etc.
- Lets you fall into the trap of writing stories that are dependent on other stories being finished in a "sprint". When they aren't, things can get very complicated.
With CD we recognise that we can go further, deploying new software to live every time the build is green. This has some amazing benefits,
- Extremely fast feedback loops. No longer do you have to think about code you wrote 2 weeks ago when there is a problem in live.
- Forces best practices. In order to be able to deploy to live on green you need excellent monitoring and tests. These are all good things in their own right.
- Reduces stress. "Releasing" is no longer a thing. You can be confident in writing your software again!
- Vastly improves agility. Found a bug? Just fix it! This encourages a more lean way of working vs lots of upfront planning. There isn't even an option for a convoluted release process, you have to keep it simple.
- Forces you to work on stories that are actually releasable, not dependent on stories x, y and z. Forces the best practices on user stories that everyone acknowledges but often ignores.
But what if you break things?
Often people say with CD
Yeah it's nice but what if it breaks? We should have a QA check things over
Here's the thing, no process in the world prevents bugs. You will ship broken code. What's really important is how quickly you can detect and recover from it. Hoping manual testing will catch everything is wishful thinking.
How to CD on a new project
It is much easier to do CD on a new project since you can start small and evolve.
Generally your work should be centered on delivering the most valuable user journeys first, so this is an excellent chance to practice how to ensure that feature works without any humans checking anything.
- Write an end to end test. These are expensive to write and run and should only be reserved for your most important journeys
- Have monitoring with threshold alerts for when things go wrong
- Set up your pipeline so that when your code is pushed, all the automated tests are run; if they pass, deploy to production.
- Have some kind of blue/green release mechanism. Run your automated tests on the deployed release candidate and if they don't pass, don't ship it.
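A blue/green promotion step might look like the sketch below. `smoke_test` and `route_traffic_to` are hypothetical helpers, standing in for whatever runs your automated checks against the idle environment and whatever flips your router or load balancer over to it.

```shell
#!/bin/sh
# Sketch of a blue/green promotion step. The new version has already been
# deployed to the idle colour; we only route real traffic to it once the
# smoke tests pass there.
promote_if_healthy() {
  candidate="$1"   # "blue" or "green": the idle environment
  if smoke_test "$candidate"; then
    route_traffic_to "$candidate"
    echo "promoted $candidate"
  else
    echo "smoke tests failed, $candidate not promoted"
  fi
}
```

If the smoke tests fail, live traffic never touched the bad release; you fix the problem and push again.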
For each subsequent story ask yourself
- How will we know this is working? (monitoring)
- What tests do I need to have enough confidence this will work without any humans checking? Not every story needs a full end-to-end test on a huge distributed system, but obviously you'll need some tests.
- Is this story as small as it can be? If your user stories are massive they are more likely to go wrong. If the story takes a week then that's back to slow feedback loops.
- If you can't answer these questions then you need to rethink the story. Notice these are all just basic agile principles for user stories. Continuous delivery forces you to adhere to the principles that so often get ignored.
How to CD on an existing project
Peel away at the manual process
- You may have some kind of "run book" that is used when shipping the software. See what you could do to automate it.
- Find out what manual processes are happening. Ask why they are needed and what could be done to automate them.
CD up to staging.
Some companies have many environments in their delivery pipeline. A good first step is to automatically ship all the way up to the environment before live. A better step is to remove as many of them as you can. It's OK to have some kind of "dev" environment to experiment with, but ask yourself why you can't just test these things locally in the first place.
Identify a sub-system you could work with as a starting point
If you're working with a distributed system you might be able to identify a system which is easier to CD than the rest. Start with that because it'll give your team some insights into the new way of working and can help you begin to break the cultural barriers.
CD is a cultural issue as much as a technical one
Roles and responsibility
Often a product owner or project manager wants to be the one who is in charge of releasing.
There are circumstances where exposing features to users should be controlled by a non-technical member of your team, but this can be managed with feature toggles.
But the copying of code from one computer to another is the responsibility of the developers on the team. After all we are the ones who are responsible for making sure the system works. It is a technical concern, not a business one.
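One minimal way to give a non-technical team member control over feature exposure, while developers keep ownership of deployment, is a toggle read at runtime. This sketch uses a plain environment variable as the toggle; `FEATURE_NEW_CHECKOUT` and both page variants are made-up names for illustration.

```shell
#!/bin/sh
# Feature-toggle sketch: code for both variants is always deployed,
# but which one users see is controlled by a runtime flag rather than
# by when the code was copied to the server.
checkout_page() {
  if [ "${FEATURE_NEW_CHECKOUT:-off}" = "on" ]; then
    echo "new checkout flow"
  else
    echo "old checkout flow"
  fi
}
```

Real toggle systems are usually backed by a config service rather than environment variables, but the separation is the same: deployment is a developer concern, exposure is a product decision.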
What do QAs do now?
CD is actually liberating for QAs
- Rather than spending their time manually testing poorly tested systems, they can now focus on a more holistic view of the system, helping to facilitate an environment for CD so that the whole team can be confident things are working
- QAs spend more effort helping developers define what needs to be tested and monitored for a story to be written.
- More time for exploratory testing
Re-evaluate your tolerance for defects
Lots of companies think they cannot have any defects and will spend a lot of time and effort on complicated, time-consuming (and therefore expensive) processes to try to stop them.
But think about the cost of all this. If you push a change to production that isn't covered by tests, perhaps a CSS change, consider whether it's really catastrophic if there's a small visual fault for some browsers.
Maybe it is, in which case there are techniques to test specifically for this too.
Recovery
Each release you do with CD will have the following qualities
- Plenty of tests
- Good monitoring
- Small scope
- Still "fresh" in the developer's mind
So in my experience fixing anything that falls through the cracks is easy. It's much less complicated than trying to look through 2 weeks' worth of git history.
I would recommend in most cases not rolling back (unless it's really bad), but just fixing the issue and releasing it. Rollback is sometimes not an option anyway (e.g. database migrations), so the fact that your system is geared to releasing quickly is actually a real strength of CD.
Other quick tips
- Fast feedback loops are key. Measure your `git push` to live time and keep it low.
- If things are getting slow, re-evaluate your end-to-end tests. If you removed a test or refactored it to be a unit test, would you be any less confident? If not, then refactor away.
- You may need to invest some time in making your components more testable to avoid writing lots of slow end-to-end tests. This is a good thing.
- Feature toggles are a useful tool but can become messy, keep an eye on the complexity
- Celebrate it. Moving from one release every 2 weeks to 50 a day feels great.
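One rough way to put a number on the push-to-live time from the first tip is to compare the shipped commit's timestamp with the clock when the deploy finishes. This is a sketch assuming it runs as the last step of a deploy, from a git checkout of the revision that just went live.

```shell
#!/bin/sh
# Sketch: commit-to-live latency in seconds. Meant to run as the final
# step of a deploy, inside a git checkout of the shipped revision.
commit_to_live_seconds() {
  committed=$(git log -1 --format=%ct)   # unix time of the shipped commit
  now=$(date +%s)                        # unix time right now
  echo $((now - committed))
}
```

Feeding that number into your monitoring over time tells you whether your pipeline is getting slower, before anyone starts dreading Friday deploys again.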
Wrapping up
This has been a small introduction to CD, it's a huge topic with plenty of resources to investigate.
Continuous delivery is a big effort both technically and culturally but it pays off massively.
I have worked on distributed systems with over 30 separately deployable components with CD and it has been less stressful than another project I've worked on that had just a handful of systems but a ton of process and ceremony.
Being able to release software when it's written puts a higher emphasis on quality and reduces risk for the team. It also forces you into using agile best practices like testable, small, independently releasable user stories.
Maybe most importantly, it demands that you understand your system and that you're not deploying to production and crossing your fingers. That feels more professional to me.
Latest comments (39)
Yeah right. Everything cool with more testing, a better CI/CD life cycle. I don't like to spend my weekends debugging some random thing related to an important new feature that can't be turned off because of business reasons. I would deploy a PoC, a friends-and-family, on Friday afternoon though. It is not laziness. It is the pressure of knowing that the business is losing millions of dollars every hour and it is your responsibility to fix it, and you had a couple of drinks because it is Saturday night, and the other people who were responsible for the on-call can't be reached. It's prioritizing my me time. On my weekends I code other stuff and I'd rather do that than the things I do 8 hours every day. On my weekends I disconnect from the office and I engage with my family.
You start off by comparing apples and oranges. A standard code release into production is more like a scheduled operation, walking into the ER is like applying an emergency hotfix. It's also fairly typical for hospitals to not have operations over weekends and last thing on Fridays for the same reason most developers don't release code on those times: there's not enough staff on hand to support those things.
I think it's unreasonable to expect any single dev to understand 100% of a medium to large system. Bear in mind, these things are built by teams, and one product or service might actually have hundreds of microservices behind supporting it. Any single one of those could go wrong in a myriad of ways. Even Rain Man would have trouble keeping up with that.
Well, that's just patently wrong. A good suite of automated tests supported with good manual QA will prevent bugs. It won't stop everything, but it's far better than not testing.
Some things just can't be easily tested locally, and some things shouldn't be tested by the developer who wrote them. Having testing environments that very closely match production is vital for proper continuous delivery.
While you still have some form of argument for deploying late in the day and just before the weekend, this only works for
Anything else means that you care enough about the project that you don't want to release bugs that could have been caught with a few hours more testing, or that you really aren't 100% sure about the scope of your changes. That latter is ok; as a developer you should be confident that you're not causing side-effects with your changes, but sometimes there can be no guarantee.
My point is for most teams we don't need this overhead; in fact the additional overhead of process and delayed feedback loops makes it even more likely that your system will be problematic.
I agree, but for a given story a developer should have 100%(ish) knowledge of the surface area of what they're changing. Remember agile advocates for small stories, so I don't think we're asking too much here. In addition, everything else should be tested so there is very low risk in that respect.
I think you may have misinterpreted. What I am trying to say is no matter what you do, you will still probably encounter bugs; even with automated tests, manual checking etc.
Re your other points, this approach is successfully used on large systems. Go to any devops-themed conference and they emphasise that this, along with the best practices I described (monitoring, observability, testing, small releases, etc), is the way forward. Re "production-like staging environments", I recently went to a conference where the talk had a lot about how the industry is kidding itself if it thinks it can be "production-like" in non-prod; particularly with distributed systems and the unpredictability of user behaviour. Our time is much better served optimising for MTTR (mean time to recovery) enterprisedevops.org/article/devop...
This article shows your complete authority and experience on the subject. Enjoyed reading.
Interesting, we only deploy on a Friday afternoon. It's the day where everyone's workstack is at their lowest.
IMHO it's also down to how your userbase prefers to get updates. Sure, no user wants to use buggy code, but bundling and explaining changes in regular interval could also be better use of their attention span.
There's a separation though between releasing and delivery.
In an ideal world, in terms of simplicity, you deliver features as they're ready (continuous delivery).
However you can still ship your code and hide new features with toggles. This means your code is always being deployed, so you get all the benefits of feedback loops but you control what the user sees. The downside is feature toggles have a cost in terms of maintainability and effort.
Yeah, I think that after a threshold of complexity and size of user base, basically every new feature will have to be behind an A/B test or feature toggles.
Thank you for sharing your insights and experience. I think you can't stress enough the importance of good automation and solid tests for your codebase to be able to release confidently.
We do not deploy on Fridays though for two reasons: the first reason is that we have a desktop software which needs to be installed by the users. So if we push something and find a bug afterwards, we can't just update the css and after a page refresh everything works. The users will have to install the software again, which will annoy them if it happens too often.
The second reason is for our backends. We have tests and automation in place, but if something should go wrong, it can get pretty expensive pretty soon. So if you say they people should be professional like doctors or lawyers, this would mean that they should also be held responsible alike. So if a lawyer makes a mistake that costs his client €10k, he should better have an insurance to cover the costs. What happens when we as developers make such a mistake? Should we be responsible and compensate the loss? This would be a strong motivation to get it right with every deployment, but it would probably be unrealistic to happen. The minimum would be to go and fix stuff even in the afternoon or on the weekend if it breaks. Good monitoring does not help if it goes to your work inbox and will not be noticed till Monday.
Please mind that I don't want to oppose your arguments for confident releases with a good CD pipeline, but I think there are still good reasons to be careful with the deployment time. I'd rather wait till Monday to deploy and keep an eye on it than push it live and leave for the weekend, even if it does work most of the time.
I don't get it. After the release, if there's a bug, you need to release the next version anyway and people would need to re-install the software again. So what's the difference if you release on Friday, Monday 6 AM or Christmas? I always thought it is easier with regard to deploy time for "boxed" products rather than continuously delivered services.
The point was going in the direction of "there will be bugs anyway, so you don't need a QA before the release" (which was not meant this way, I know!). Therefore, I was bringing up a different aspect for desktop software. Of course, if there is a bug, it needs to be fixed, but it can also make sense to wait and collect a few bugs and fix them a few days after the release instead of doing a minor release per bug that was found (depending on the severity).
The main difference about the timing is: If you publish software with a critical bug on Christmas eve, you might not spend Christmas with your family because you are hotfixing. If you do this two days before Christmas or just after New Year, you might spend one or even three days fixing that bug as well, but the bug has your full attention during your regular working hours.
Of course, if a developer does not care, this developer would also not spend Christmas fixing a bug, but I assumed that people reading here are more the kind that take developing to heart ;-)
You make some fair points. I tried to communicate that it requires you to evaluate your risk tolerance. I would contend most companies don't need to spend so much time/effort on weekly (or whatever) releases in the name of not shipping bugs; but there are of course exceptions. Certainly desktop software has very different shipping requirements!
While I don't disagree that CD is good, that isn't a good argument to purposefully release software with decreased availability. I understand that any type of issue should be quick to diagnose... but why? It can wait.
You said it best:
From start to close, the whole article seems like it is looking for validation from some onlooker to say "wow you really ARE professional, aren't you!". Who cares? Someone judging you based on your professionalism won't have any concept of what CD is in the first place.
That is really not the impression I'm trying to give. What I've failed to explain is CD results in more reliable software as it stops you taking shortcuts. It forces you to address shortcomings in your testing/monitoring/etc if you're going to be able to release when code is checked into master.
Again, miscommunication on my part I suppose. I would prefer to raise the game of teams I work on by emphasising that releasing and crossing our fingers, so much that we don't want to release on Fridays, is not good enough.
I work at a company that has a CI/CD pipeline, we routinely deploy 10-20 times a day, including on Fridays...but we still say this occasionally. It's a matter of context. There are lots of deploys that we don't do on Fridays precisely because if it's that easy to deploy, what are the odds that you've got a critical yet risky change that a) didn't make it through your pipeline and b) must go out now, at Friday at 4pm?
I don't think this is the right analogy. How about:
If you went to an ER on a Friday afternoon, would you want to be seen by the surgeon who's finishing her shift or the one who's starting hers? Also, how would you feel about seeing the one who's finishing first, and having her hand you off to the one who's starting?
Before you answer, here are some scary numbers: statnews.com/2016/02/01/communicat...
It's always a matter of context :)
I'd say that if the context of your deployment makes it risky, then it's a bit of a smell. I'd start trying to work out why it's risky to deploy this software, and how I can reduce that risk, rather than avoiding it.
Clearly changing a typo in a greeting message, or adding a single line of CSS to adjust a margin on something, is always going to be less risky than a functional update that adds a new feature to a product/service. With all the best efforts to reduce the risk, there is always going to be a difference in risk level for any single piece of work, just like there are differences in the effort of work for any given user story. Taking this into account, deployments are always contextual, and like smells (code or otherwise), that's not necessarily a bad thing. It's just a single measure of risk, and something that shouldn't be taken in isolation.
I like the idea because it allows you to automate almost everything outside of coding and complicated manual tests. I don't think I'd be comfortable with an extremely fast feedback loop though, unless someone else writes the tests for the feature I'm adding (just because it works the way I expect it to doesn't mean that's what you were asking for).
Also, a little off-topic, but I need a "dev" environment because I want an environment as close to production as possible, and my work PC is not strong enough to handle a running app (including all necessary stuff like a database filled with test data, a proxy server, whatever) on top of a dozen Chrome tabs, a few Firefox and IE tabs, a couple of messengers, a few projects open in IDEs and a text editor.
I would argue you need to get out of this habit. You mustn't fall into the trap of not understanding what you're making and hoping someone else will know. This is why discussing what you're writing before you do it is important. Collaboration is key.
And ultimately, let's say you do somehow make slightly the wrong thing. Well, in most cases it won't actually be a big deal. And here's the kicker: you can change software! Perfect is the enemy of good.
Perhaps a way around this is decomposing your app into smaller systems. Remember CD (and agile in general) encourages small stories so you shouldn't need to spin up the entire system in order to be confident your change will work.