Discussion on: Projection Building Blocks: What you'll need to build projections


Replies for: Great post as a concept description. Unfortunately no one ever describes one super important thing - how to deal with broken projections. I don't ...

You raise an excellent point: how do you manage broken projectors?

I'm currently writing a blog post on this topic and a library to take care of it. You can see the library here; its readme explains how it handles these issues.
github.com/barryosull/the-projecti...

The short answer, for now:

If you're booting new projectors in prep for a release and there's a failure you should:
1) Mark the broken projector as "broken"
2) Mark the other new projectors as "stalled"
3) Report the error
4) Stop the release

On the next release, the boot process should attempt to play all "broken"/"stalled" projectors forward again.
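
The boot-time steps above could be sketched roughly like this. It is only an illustration, not the library's actual API: `Projector`, `ReleaseAborted`, `boot_new_projectors` and the status strings are hypothetical names, and the sketch assumes that "the other new projectors" means every other projector in the same boot batch. It also tracks status only, not the position each projector has reached in the event stream.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Projector:
    """Hypothetical projector record; `play` replays events into the projection."""
    name: str
    play: Callable[[], None]
    status: str = "new"  # one of: new, active, broken, stalled


class ReleaseAborted(Exception):
    """Raised to stop the release when a new projector fails to boot."""


def boot_new_projectors(
    projectors: List[Projector],
    report: Callable[[str, Exception], None],
) -> None:
    # Projectors left "broken" or "stalled" by an earlier failed release
    # are attempted again alongside brand new ones.
    pending = [p for p in projectors if p.status in ("new", "broken", "stalled")]
    for projector in pending:
        try:
            projector.play()
        except Exception as error:
            projector.status = "broken"              # 1) mark the broken projector
            for other in pending:
                if other is not projector:
                    other.status = "stalled"         # 2) mark the others as stalled
            report(projector.name, error)            # 3) report the error
            raise ReleaseAborted(projector.name) from error  # 4) stop the release
    # Only when every pending projector played cleanly do they go live.
    for projector in pending:
        projector.status = "active"
```

Calling `boot_new_projectors` again on the next release picks up whatever was left "broken" or "stalled", which is the retry behaviour described above.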

If you're playing projectors as normal and there's an error, you should:
1) Mark the broken projector as "broken"
2) Report the error
3) Continue as normal, ignoring "broken" projectors
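
As a rough sketch of that normal-play behaviour (again with hypothetical names: `play_event`, the `statuses` dict and the `report` callback are assumptions made for illustration, not the library's API):

```python
from typing import Callable, Dict

Event = dict
Handler = Callable[[Event], None]


def play_event(
    event: Event,
    projectors: Dict[str, Handler],
    statuses: Dict[str, str],
    report: Callable[[str, Exception], None],
) -> None:
    """Feed one event to every projector that is not marked "broken"."""
    for name, handle in projectors.items():
        if statuses.get(name) == "broken":
            continue                      # 3) carry on, ignoring broken projectors
        try:
            handle(event)
        except Exception as error:
            statuses[name] = "broken"     # 1) mark the broken projector
            report(name, error)           # 2) report the error
```

The point of the design is that one failing projector stops receiving events, while the healthy ones keep processing the stream.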

We've found this solved the problem nicely.

As to your hypothetical situation, if you release broken code and you can't fix it for a month, then you've got some big problems, regardless of whether you're using projections or not.

Versioning your projectors is a great way to keep old projectors running while booting up new ones. If the new one fails, the old one is still running in the background. This is covered in more detail in the library linked above.
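
One way to picture the versioning idea is the sketch below. It is not taken from the library: `VersionedProjections`, `boot` and `serving_version` are hypothetical names, and the sketch simply assumes that reads are served from the highest version that booted successfully.

```python
from typing import Callable, Dict, Optional, Tuple


class VersionedProjections:
    """Tracks the status of each (projector name, version) pair."""

    def __init__(self) -> None:
        self._status: Dict[Tuple[str, int], str] = {}

    def boot(self, name: str, version: int, build: Callable[[], None]) -> bool:
        """Try to build a new version; older versions are left untouched."""
        try:
            build()
        except Exception:
            self._status[(name, version)] = "broken"
            return False
        self._status[(name, version)] = "active"
        return True

    def serving_version(self, name: str) -> Optional[int]:
        """Return the highest active version, or None if nothing is active."""
        active = [
            version
            for (projector, version), status in self._status.items()
            if projector == name and status == "active"
        ]
        return max(active) if active else None
```

If version 2 fails to boot, `serving_version` keeps returning 1, so the old projection stays in use until a fixed version 2 boots cleanly.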

ww • Feb 5 '18

Thanks for replying.

The thing is that we have been using all the techniques you described for a while already. They play nicely, indeed. But you described a happy path, even for the case where something is broken.

What I am asking is how to deal with that exact situation. I am kinda surprised that no one ever touches this topic. Does everyone write nice code which never fails? Nah!

Barry O Sullivan • Feb 5 '18

"Your system is deployed on the other side of the world, and the next time you, as a developer, have a chance to fix it and deploy your fix will be in a month or so.
What would be your solution to this?"

I'm really confused by this. If you deploy code with a bug, and you can't fix it for over a month, then it's going to be broken for a month. There's no way around this; it doesn't matter whether it's event-driven or not. That's why no one touches the topic.

If you're asking how you ensure the system is stable before releasing, then the answer is automated tests.

If the release process is the bottleneck, you should look into CI/CD, so you can get the fix out faster.

Otherwise you're just stuck.

ww • Feb 5 '18

None of the mentioned techniques gives me 100% protection. (Of course, you might say there is no such thing.) They reduce the likelihood of failure, but they do not eliminate it.
When there are lots of moving parts (UI, domain, projection code, projection storage), it is really easy to miss something. For example, we have all of these: CI/CD, unit testing, integration testing, manual testing, several levels of UAT. And yet I was once in a situation where we were deploying to prod 4 times a day, just because of those tiny missing things which were breaking the whole system. It is possible; I've seen it, I was fixing it. Luckily the users of this software were right next to us, even though they were not happy with such interruptions.

But currently we are in a difficult situation: all the users are somewhere else and there is no way of patching it right away.

Where were we? Right. Would you be happy as a client of a banking system, if a tiny mistake in my account blocks your account as well?
See where I am heading?