Why you should deploy on Friday afternoon

Chris James on May 28, 2018

Don't deploy on Friday afternoons! This expression is taken as programmer wisdom but I hate it. I'm going to try and kill it, with words and e...
 

A great article, thanks. I'm working on a project where agile has become "we do one release every two weeks", which always gets delayed because everything gets tested the day of the release or the day before. It's actually painful and not agile at all. I often don't even remember what has been waiting in the staging pipeline for 2 weeks 🤔 I feel like we can do better, but the biggest hurdle is that agile has been translated into 2-week sprints and nothing else. It's funny that we became rigid with a methodology that's supposed to be flexible and adaptable.

At least we have continuous integration...

 

There have been lots of anti-agile posts lately. One good bit of advice is to simply decouple project processes and technical processes as much as you can get away with. Some organizations are just going to reinvent waterfall no matter what label you slap on the process.

Just because you are doing scrum to plan what you do doesn't mean you can't do CD. If you want to stop time once every two weeks and celebrate all the stuff that went live over the past two weeks, fine. But don't block deployments on some poor overworked PM staring at an issue tracker. That's the opposite of agile. Instead, it ships when it is better than what you had; if it isn't, don't merge it. If what you merged is not good enough or can be improved, change it some more. That's called iterating; it needs to happen more often than once every two weeks to get good results.

 

Can't upvote this enough. I tried to argue that a lot of what CD advocates is what agile is meant to be. You know, being able to respond to business needs as simply and easily as possible!

Either way, decoupling technical and business processes is a good first step. Does your PO/Scrum master/whatever need to know when you copy code from one computer to another?

 

@jillesvangurp I have nothing against agile :-D It's just that in this particular case of this particular project it has become another buzzword that didn't change the status quo much.

But don't block deployments on some poor overworked PM staring at an issue tracker. That's the opposite of agile. Instead, it ships when it is better than what you had; if it isn't, don't merge it. If what you merged is not good enough or can be improved, change it some more. That's called iterating; it needs to happen more often than once every two weeks to get good results.

I agree, but in this case it's the PM, who's also doing the tests, that's blocking the release cycle, not the devs :-D It's a combined problem of a lack of resources (no QA/test team) and not applying agile at all. Anyhow, we can still try to do CD; it's a cultural problem, as Chris said.

There's your problem: you have a human in the loop. You can't do CD with humans in the loop. The whole point of CD is to rely 100% on your automated tests. Your PM is part of the problem, not the solution. If he insists on signing off manually on everything before it ships, that means there's a lack of trust in people to do the right things. That's bad; call it out.

When it comes to acceptance testing, your PM should be doing that in production and keep insisting on incremental improvements until the feature can be signed off on. Likewise, the pressure should be on developers not to merge things that aren't production ready and to keep production functioning correctly. If unacceptable or poorly tested changes are landing in production regularly, either your PM messed up planning the feature or the team messed up building it. Either way, that needs to be fixed, but not by crippling CD. If it keeps happening, you have a competence problem, not a process problem.

The longer PMs drag their heels, the more expensive deployments get. The cost of a deployment grows exponentially with the size of the delta. The price of shipping a 1-line change: almost zero. The price of shipping a delta that touches double-digit percentages of your code base: weeks of testing, integration headaches, haggling, overtime, etc. It gets ugly very quickly. Add to that the lost revenue from features that were ready to go weeks ago. The economics here are brutal. Not doing CD is very expensive. You end up wasting lots of time and money.

The opposite is also true: the cost of fixing issues resulting from small deltas is very low. So, shipping lots of small things ASAP lowers costs by decreasing testing effort, minimising time to market, and decreasing the cost of defect fixing.

 

Thanks for the kind words.

It can be tough; that's why I felt it was important to mention it's a cultural challenge just as much as a technical one. You have to try and convince the right people that releasing more often with less manual process is less risky and scales better.

Good luck!

 

I agree, the hard part is to find a better release schedule

thanks

 

Testing the day before release assumes there are no issues to fix.

Our team got around it by adding 50% extra time to the sprint for fixes or adjustments. If there are no issues to fix, we use this already-booked time to deal with tech debt or fix a bunch of smaller issues that never get the highest priority on their own.

 

In our case what happens is that the "release day" is not a fixed day, because there is too much to test - and this is a problem that speaks for itself :D

I'll work on changing the approach!

We struggled too, as our tester was part of another department and hardly ever available. Once we managed to explain to management that it was a bottleneck - and it didn't happen overnight - we eventually got a dedicated tester for our team. Think it was around two years ago. It has made a massive difference: we get our feedback very quickly, no slipping deadlines, everyone's happy :)

 

I know several experienced people across different companies who have gotten rid of their staging server and now do CD straight to production. I'm considering the same, as it mostly just creates deploy bureaucracy.

  • Obviously, you need good tests. Anytime you have a problem in production that did not break your tests, that's a problem. Goes without saying really.
  • You need insight into your servers. That means monitoring, hardware telemetry, application telemetry, and logging. These are very different things that get confused a lot because they can all be facilitated with a good logging infrastructure. We use things like metricbeat, filebeat, auditbeat, logstash, and kibana. Great stuff; and mission critical if you don't want to run blind.
  • Stuff will happen anyway and you need to not get hung up on it and instead just move forward with a fix or at least a change that takes out whatever is broken. Learn from it and fix your tests as well of course. What matters here is the mean time between issues and the mean time to get the fix out. If this happens on a daily/hourly basis, fix it.
  • No rollbacks though. Don't deploy versions older than what you just deployed. If something breaks, you fix it with a new version. So instead of restoring some old git hash, just do a git revert or some other commit that rolls you back and let that roll through your CI. This keeps your CD simple: it only rolls forward. Your version is always HEAD. If HEAD sucks, create a new HEAD that doesn't suck any way you can (hint: git revert gets the job done). Your fixes roll out the same way everything else does. No manual overrides.
  • Some things you shouldn't test for the first time in production. You need infrastructure to facilitate testing e.g. db schema changes in a sandbox. These are comparatively rare, so you don't need a dedicated environment running 24x7, but it sure is nice to spin up a sandbox on demand and test some things out. This requires automation. Most projects I've been on don't have the ability to do this at all and rely on staging instead. Staging in those projects is usually a mess.
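The roll-forward rule above can be sketched as a small helper. This is a hypothetical script, not part of any tooling mentioned here; it assumes your CI deploys whatever HEAD becomes:

```python
import subprocess

def roll_forward(bad_commit: str) -> str:
    """Undo a bad change by creating a *new* commit that reverts it,
    rather than redeploying an older version. History only moves forward."""
    # `git revert` adds a fresh commit that is the inverse of bad_commit.
    subprocess.run(["git", "revert", "--no-edit", bad_commit], check=True)
    # The new HEAD is what CI deploys next -- no manual override path.
    head = subprocess.run(["git", "rev-parse", "HEAD"],
                          check=True, capture_output=True, text=True)
    return head.stdout.strip()
```

Because the fix flows through the same pipeline as any other change, there is no second, less-tested deployment mechanism to maintain.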
 

Nice comments

db schema changes in a sandbox.

I would argue that you should be able to test this locally. When we run locally it runs every schema change script on a local containerised DB. When you make a change just TDD the change like normal.
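A minimal sketch of that local run, using Python's built-in sqlite3 as a stand-in for the containerised DB (the migration scripts here are hypothetical):

```python
import sqlite3

# Hypothetical migration scripts, applied in order. In a real project these
# would live as files and run against a throwaway containerised database.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn: sqlite3.Connection) -> None:
    """Replay every schema change, in order, against a fresh database."""
    for statement in MIGRATIONS:
        conn.execute(statement)

def test_schema_after_migrations() -> None:
    conn = sqlite3.connect(":memory:")  # fresh DB for every test run
    migrate(conn)
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    assert columns == ["id", "name", "email"]
```

Adding a schema change then works like any other TDD cycle: write the assertion for the new column first, watch it fail, then add the migration.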

 

Sure, that works for toy databases. But that alter table that works fine locally might completely cripple your production DB. Once it does, all the fixes are ugly. Some things just resist closed-world assumptions.

+1 to this and not just the obvious alter tables.

 

I work at a company that has a CI/CD pipeline; we routinely deploy 10-20 times a day, including on Fridays... but we still say this occasionally. It's a matter of context. There are lots of deploys that we don't do on Fridays precisely because deploying is so easy: what are the odds that you've got a critical yet risky change that a) didn't make it through your pipeline and b) must go out now, at 4pm on a Friday?

If you went to an ER on a Friday afternoon and were turned away because the doctors don't trust their own tools and know-how, how would you feel?

I don't think this is the right analogy. How about:

If you went to an ER on a Friday afternoon, would you want to be seen by the surgeon who's finishing her shift or the one who's starting hers? Also, how would you feel about seeing the one who's finishing first, and having her hand you off to the one who's starting?

Before you answer, here are some scary numbers: statnews.com/2016/02/01/communicat...

It's always a matter of context :)

 

I'd say that if the context of your deployment makes it risky, then it's a bit of a smell. I'd start trying to work out why it's risky to deploy this software, and how I can reduce that risk, rather than avoiding it.

 

Clearly changing a typo in a greeting message, or adding a single line of CSS to adjust a margin on something, is always going to be less risky than a functional update that adds a new feature to a product/service. With all the best efforts to reduce the risk, there is always going to be a difference in risk level for any single piece of work, just like there are differences in the effort of work for any given user story. Taking this into account, deployments are always contextual, and like smells (code or otherwise), that's not necessarily a bad thing. It's just a single measure of risk, and something that shouldn't be taken in isolation.

 

Here's the thing, no process in the world prevents bugs. You will ship broken code.

EXACTLY

That's why you don't ship broken code on a Friday afternoon. Because people are less available and you want zero chance of getting a problem while an important stakeholder is already on a plane for the weekend.

 

The point is if I ship something at 4:30 on a Friday, it will be a small change backed by excellent tests and monitoring. If something goes wrong it will be trivial to fix.

If it's something I wouldn't spot easily, I'm just as likely to ship something on Thursday that's not spotted until the weekend.

Bunching up changes (and forgetting about them) for Monday is more likely to cause problems.

 

Not sure that holds in my experience. We deploy dozens of services multiple times a day, many by CD. We also deploy on Fridays but try to avoid afternoon deploys.

There are fewer eyes on things at the weekends, so subtle problems get missed, and you sometimes get rushed work as folks try to squeeze that last thing out to production before the weekend.

We have excellent monitoring and logging; we aren't afraid of breaking things but the human cost of having to debug something at beer o'clock on a Friday is something we don't advocate.

 

While I don't disagree that CD is good, that isn't a good argument to purposefully release software with decreased availability. I understand that any type of issue should be quick to diagnose... but why? It can wait.

You said it best:

no process in the world prevents bugs

From start to close, the whole article seems like it is looking for validation from some onlooker to say "wow you really ARE professional, aren't you!". Who cares? Someone judging you based on your professionalism won't have any concept of what CD is in the first place.

 

that isn't a good argument to purposefully release software with decreased availability.

That is really not the impression I'm trying to give. What I've failed to explain is that CD results in more reliable software, as it stops you taking shortcuts. It forces you to address shortcomings in your testing/monitoring/etc if you're going to be able to release when code is checked into master.

From start to close, the whole article seems like it is looking for validation from some onlooker to say "wow you really ARE professional, aren't you!".

Again, miscommunication on my part I suppose. I would prefer to raise the game of teams I work on by emphasising that releasing and crossing our fingers so much that we don't want to release on Fridays is not good enough.

 

IMHO it's also down to how your userbase prefers to get updates. Sure, no user wants to use buggy code, but bundling and explaining changes at regular intervals could also be a better use of their attention span.

 

There's a separation though between releasing and delivery.

In an ideal world, in terms of simplicity, you deliver features as they're ready (continuous delivery).

However, you can still ship your code and hide new features behind toggles. This means your code is always deploying, so you get all the benefits of fast feedback loops but you control what the user sees. The downside is that feature toggles have a cost in terms of maintainability and effort.
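A minimal feature-toggle sketch (all names here are hypothetical) to illustrate the idea: the new code path ships with every deploy but stays dark until the flag flips.

```python
# In practice flags usually come from config or a flag service; a plain dict
# is the simplest possible stand-in for this sketch.
FLAGS = {"new_checkout": False}

def checkout(cart: list) -> str:
    """Route to the new flow only when its toggle is on."""
    if FLAGS.get("new_checkout"):
        return f"new flow: {len(cart)} items"  # deployed, but dark-launched
    return f"old flow: {len(cart)} items"      # what users see today
```

Flipping `FLAGS["new_checkout"]` to `True` enables the feature without a deploy; the maintenance cost mentioned above is the cleanup once the old path can finally be deleted.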

 

Yeah, I think that after a threshold of complexity and size of user base, basically every new feature will have to be behind an A/B test or feature toggle.

 

Thank you for sharing your insights and experience. I think you can't stress enough the importance of good automation and solid tests for your codebase to be able to release confidently.

We do not deploy on Fridays though, for two reasons: the first reason is that we have desktop software which needs to be installed by the users. So if we push something and find a bug afterwards, we can't just update the CSS and after a page refresh everything works. The users will have to install the software again, which will annoy them if it happens too often.

The second reason is for our backends. We have tests and automation in place, but if something goes wrong, it can get pretty expensive pretty soon. So if you say that people should be professional like doctors or lawyers, this would mean that they should also be held responsible alike. So if a lawyer makes a mistake that costs his client €10k, he had better have insurance to cover the costs. What happens when we as developers make such a mistake? Should we be responsible and compensate the loss? This would be a strong motivation to get it right with every deployment, but it would probably be unrealistic to happen. The minimum would be to go and fix stuff even in the afternoon or on the weekend if it breaks. Good monitoring does not help if it goes to your work inbox and will not be noticed till Monday.

Please mind that I don't want to oppose your arguments for confident releases with a good CD pipeline, but I think there are still good reasons to be careful with the deployment time. I'd rather wait till Monday to deploy and keep an eye on it than push it live and leave for the weekend, even if it does work most of the time.

 

You make some fair points. I tried to communicate that it requires you to evaluate your risk tolerance. I would contend most companies don't need to spend so much time/effort on weekly (or whatever) releases in the name of not shipping bugs; but there are of course exceptions. Certainly desktop software has very different shipping requirements!

 

We do not deploy on Fridays though, for two reasons: the first reason is that we have desktop software which needs to be installed by the users. So if we push something and find a bug afterwards, we can't just update the CSS and after a page refresh everything works. The users will have to install the software again, which will annoy them if it happens too often.

I don't get it. After the release, if there's a bug, you need to release the next version anyway, and people would need to re-install the software. So what's the difference if you release on Friday, Monday 6 AM or Christmas? I always thought it is easier, with regard to deploy time, for "boxed" products than for continuously delivered services.

 

The point was going in the direction of "there will be bugs anyway, so you don't need QA before the release" (which was not meant this way, I know!). Therefore, I was bringing up a different aspect of desktop software. Of course, if there is a bug, it needs to be fixed, but it can also make sense to wait, collect a few bugs and fix them a few days after the release instead of doing a minor release per bug found (depending on the severity).

The main difference about the timing is: If you publish software with a critical bug on Christmas eve, you might not spend Christmas with your family because you are hotfixing. If you do this two days before Christmas or just after New Year, you might spend one or even three days fixing that bug as well, but the bug has your full attention during your regular working hours.

Of course, if a developer does not care, this developer would also not spend Christmas fixing a bug, but I assumed that people reading here are more the kind that take developing to heart ;-)

 

Oh yes, someone finally said that! I've been deploying on Friday afternoons for more than two years, but every time there is someone (from a different company or even a different team) who says it's unprofessional. Same as you, I think it's the other way around - it's unprofessional to not be able to deliver when it's needed/ready. Improve your process instead of trying to find excuses. Or at least stop pretending you're doing agile.

 

Yeah, right. Everything's cool with more testing and a better CI/CD life cycle. I don't like to spend my weekends debugging some random thing related to an important new feature that can't be turned off because of business reasons. I would deploy a PoC or a friends-and-family release on a Friday afternoon, though. It is not laziness. It is the pressure of knowing that the business is losing millions of dollars every hour and it is your responsibility to fix it, and you had a couple of drinks because it is Saturday night, and the other people who were responsible for the on-call can't be reached. It's prioritizing my me-time. On my weekends I code other stuff, and I'd rather do that than the things I do 8 hours every day. On my weekends I disconnect from the office and engage with my family.

 

You start off by comparing apples and oranges. A standard code release into production is more like a scheduled operation; walking into the ER is like applying an emergency hotfix. It's also fairly typical for hospitals not to schedule operations over weekends and last thing on Fridays, for the same reason most developers don't release code at those times: there's not enough staff on hand to support them.

You don't understand your system. You're making changes but you're not 100% sure it'll be OK. Ask yourself why this is. Why are you scared of changing your software?

I think it's unreasonable to expect any single dev to understand 100% of a medium to large system. Bear in mind, these things are built by teams, and one product or service might actually have hundreds of supporting microservices behind it. Any single one of those could go wrong in a myriad of ways. Even Rain Man would have trouble keeping up with that.

Here's the thing, no process in the world prevents bugs.

Well, that's just patently wrong. A good suite of automated tests supported with good manual QA will prevent bugs. It won't stop everything, but it's far better than not testing.

It's ok to have some kind of "dev" environment to maybe experiment with, but ask yourself why you can't just test these things locally in the first place.

Some things just can't be easily tested locally, and some things shouldn't be tested by the developer who wrote them. Having testing environments that very closely match production is vital for proper continuous delivery.

While you still have some form of argument for deploying late in the day and just before the weekend, this only works for

  1. projects that are sufficiently small and where it's easier to understand the whole system, or where the change is so small (fixing a typo in a sentence, etc) that there is no way it could affect anything else
  2. projects that you don't care about too much, and don't mind if something is released in a broken state.

Anything else means that you care enough about the project that you don't want to release bugs that could have been caught with a few hours more testing, or that you really aren't 100% sure about the scope of your changes. The latter is ok; as a developer you should be confident that you're not causing side-effects with your changes, but sometimes there can be no guarantee.

 

A standard code release into production is more like a scheduled operation

My point is that for most teams we don't need this overhead; in fact, the additional process and delayed feedback loops make it even more likely that your system will be problematic.

I think it's unreasonable to expect any single dev to understand 100% of a medium to large system.

I agree, but for a given story a developer should have 100%(ish) knowledge of the surface area of what they're changing. Remember agile advocates small stories, so I don't think we're asking too much here. In addition, everything else should be tested, so there is very low risk in that respect.

Well, that's just patently wrong.

I think you may have misinterpreted. What I am trying to say is that no matter what you do, you will still probably encounter bugs; even with automated tests, manual checking, etc.

Re your other points, this approach is successfully used on large systems. Go to any devops-themed conference and they emphasise that this, along with the best practices I described (monitoring, observability, testing, small releases, etc), is the way forward. Re "production-like staging environments": I recently went to a conference where the talk had a lot about how the industry is kidding itself if it thinks it can be "production-like" in non-prod, particularly with distributed systems and the unpredictability of user behaviour. Our time is much better served optimising for MTTR (mean time to recovery) enterprisedevops.org/article/devop...

 

I like the idea because it allows you to automate almost everything outside of coding and complicated manual tests. I don't think I'd be comfortable with an extremely fast feedback loop though, unless someone else writes the tests for the feature I'm adding (just because it works the way I expect it to doesn't mean that's what you were asking for)

Also, a little offtopic, but I need a "dev" environment because I want an environment as close to production as possible, and my work PC is not strong enough to handle a running app (including all necessary stuff like a database filled with test data, a proxy server, whatever) on top of a dozen Chrome tabs, a few Firefox and IE tabs, a couple of messengers, a few projects open in IDEs and a text editor

 

I don't think I'd be comfortable with an extremely fast feedback loop though, unless someone else writes the tests for the feature I'm adding (just because it works the way I expect it to doesn't mean that's what you were asking for)

I would argue you need to get out of this habit. You mustn't fall into the trap of not understanding what you're making and hoping someone else will know. This is why discussing what you're writing before you do it is important. Collaboration is key.

And ultimately, let's say you do somehow build slightly the wrong thing. Well, in most cases it won't actually be a big deal. And here's the kicker: you can change software! Perfect is the enemy of good.

Also, a little offtopic but I need a "dev" environment because I want an environment as close to production as possible and my work PC is not strong enough to handle a running app

Perhaps a way around this is decomposing your app into smaller systems. Remember CD (and agile in general) encourages small stories so you shouldn't need to spin up the entire system in order to be confident your change will work.

 

Interesting, we only deploy on a Friday afternoon. It's the day where everyone's workstack is at their lowest.

 

This article shows your complete authority and experience on the subject. Enjoyed reading.

 