re: What programming best practice do you disagree with?

re: You keep saying strawman fallacy and then type strawman after strawman I don't think anyone is against trying to make deployments enjoyable and e...

Hi Jeff!

Just to reiterate - nobody is telling you how to release your code - your code base and your process belong to you and nobody else, and nobody is judging you. I'd much rather have a constructive discussion rather than talk about logical fallacies, as I'm sure you would too.

That's rarely the case at all. As I pointed out, teams will often designate a different day for deployment. In that common situation, there's no panic, no one saying "don't deploy to production," nobody is afraid of their deploys. When you leave yourself time to respond to any possible post-deployment emergencies, you give yourself peace of mind. When Thursday Deployment Day rolls around, you're happy to release, and you're not afraid.

You're right - this is common. But...

What if it goes wrong? What if you release a bug? Actually - not 'what if'. You will release a bug, because we're all human and imperfect as you've previously intimated. So there's a bug, in production, on Friday afternoon.

Well, you'll probably have to roll back - it's a big release and it'll be hard to work out exactly which of the many changes caused the breakage. The blame is for tomorrow - right now you have to work on implementing your recovery plan. I'm not sure how long that will take, but let's assume it's longer than rolling back a single commit, since it's going to be bigger and might involve a few extra steps (database migrations, perhaps).

By prioritising the vision of safety you've laid out ('do the dangerous thing less often' - excuse me if I'm paraphrasing, but that's the impression you've left me with), you're prioritising an increased mean time to failure (MTTF), at the expense of mean time to recovery (MTTR). Your program will go wrong less often (you are taking extra care over those releases, as you've said), but when it does go wrong (and we all agree that it always will go wrong in the end), it will take longer to fix it.
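The trade-off between MTTF and MTTR can be made concrete with the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR). A minimal sketch, with purely illustrative numbers (not measurements from any real system):

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Illustrative numbers only.
# Big, careful weekly releases: failures are rare, but recovery is slow.
big_release = availability(mttf_hours=24 * 90, mttr_hours=8)

# Small, frequent releases: failures happen more often, recovery is fast.
small_release = availability(mttf_hours=24 * 7, mttr_hours=0.25)

print(f"big weekly releases:  {big_release:.5f}")
print(f"small daily releases: {small_release:.5f}")
```

With these (made-up) numbers, the fast-recovery system is up a larger fraction of the time even though it fails roughly ten times as often.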

The alternative is to prioritise MTTR over MTTF - this is what continuous delivery is all about. We deliver the code to production in smaller releases - ideally a single commit, but this is not always possible. We aim to automate as much of the quality checking of these releases as is possible - the usual suite of unit/acceptance tests, but also smoke tests and performance tests, and a lot of metrics in the production environment to see what the effect of each release is in real time. These pipelines are optimised for speed - the releases should get out as quickly as possible.
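As a sketch of the pipeline idea above - run the cheapest checks first and block the release at the first failure - here is a hypothetical stage runner. The stage names and commands are placeholders for illustration, not any particular CI product or project layout:

```python
import subprocess

# Hypothetical stages, ordered fastest-first so a bad release
# fails as early and as cheaply as possible.
STAGES = [
    ("unit tests",       ["pytest", "tests/unit"]),
    ("acceptance tests", ["pytest", "tests/acceptance"]),
    ("deploy",           ["./deploy.sh", "production"]),
    ("smoke tests",      ["pytest", "tests/smoke"]),
]

def run_pipeline(stages=STAGES) -> bool:
    """Run each stage in order; stop and report at the first failure."""
    for name, cmd in stages:
        print(f"running {name}: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"{name} failed - release blocked")
            return False
    print("all stages passed - release is live")
    return True
```

The point of the ordering is speed: the release either reaches production quickly or is rejected quickly, which is what keeps MTTR low.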

Then, when things go wrong (and they will go wrong), we can either roll back or roll forward (more often forward) very quickly, because the change was small and the reason for the regression is obvious. Ideally most of the serious errors will have been caught earlier by the automated tests, so the regression shouldn't be too severe, but - as I'll say again - serious things can and will go wrong. In this scenario we aim for them to be small changes that can be fixed quickly.

So what's this got to do with releasing on Friday? Well, if your concern is that a release will take time to fix if it goes wrong (say, all of Friday), and should be managed and manually monitored during its release, and so you're only doing it once a week - I'd say you're prioritising increasing your mean time to failure.

This might be really important for your business, but I'd always argue that it makes more business sense to reduce the mean time to recovery. To quote Roy Osherove (I've no idea whether he's an authority; it's just a good example):

If Amazon.com was down once every three years, but it took them a whole day to recover, consumers won't care that the issue has not happened for three years. All that will be talked about is the long recovery time. But if Amazon.com was down 3 times a day for less than one second, it would barely be noticeable.

So, there it is. Do we want one day in every three years, or three times a day for one second? Do we prioritise stability, or do we prioritise recovery? As I'm building programs that keep changing due to business requirements, I prioritise recovery, and so I release small changes multiple times a day, every day. That's why I promote releasing on Friday as a good idea.
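Osherove's hypothetical numbers are worth checking with a quick bit of arithmetic - the frequent-but-brief scenario actually has less total downtime too, on top of each outage being barely visible:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_THREE_YEARS = 3 * 365

# Scenario A: one full day of downtime every three years,
# averaged out to seconds of downtime per day.
downtime_a = SECONDS_PER_DAY / DAYS_PER_THREE_YEARS

# Scenario B: three one-second outages every day.
downtime_b = 3.0

print(f"A: {downtime_a:.1f} s/day of downtime on average")
print(f"B: {downtime_b:.1f} s/day of downtime")
```

Scenario A averages around 79 seconds of downtime per day; scenario B averages 3. The rare-but-long failure loses on both measures.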

One other thing

First I want to clarify that a blanket rule is not a hard and fast rule, it's a default rule you can start with to flesh out your needs.

I think your meaning would be better expressed if you said 'rule of thumb'; 'blanket', as an adjective, means "covering all cases or instances; total and inclusive".

If "Nothing ever, ever goes wrong after deployment?" is a strawman, it means that sometimes things do go wrong for you too. And what do you do when this happens? Work on the weekend? Why not shrink the risk with a simple, general guideline? I think that is the question.
