DEV Community

Dave Cridland
Dave Cridland

Posted on

Two Generals

Imagine, for a moment, that you're sitting in a dark room. You're not alone. Sitting beside you is someone else, a shadowy and frankly untrustworthy person that we'll call Network. On the far side of Network is your friend. You have a bag of peppermints. Your friend has a bag of chocolate drops. You want to swap some peppermints for some chocolate drops. And for some reason I haven't decided on, it all has to be done in silence.

In order to pass a peppermint to your friend, you need to pass it to Network, who will - hopefully - pass it to your friend. Your friend then passes a chocolate drop back to Network, who - again, hopefully - passes on back. You then pass another peppermint, and so on.

Because Network might take ages to pass things on, and might just eat them instead, there's always quite a bit of uncertainty involved. After you have passed you first peppermint, you're not going to know if your friend has any peppermints unless and until you get a chocolate drop back.

Similarly, your friend has no idea if you've got any chocolate drops unless and until they've got another peppermint.

More generally, you know your state at any given moment, and your friend knows their state, but you never know each others.

This outstandingly simple problem is also delightfully insoluble. In the real world, most messages protocols suffer from it, one way or the other. Take email -
what if you never reply to an email I sent you. Did the email not reach you, or did the reply fail on the way back?

Error messages don't get immunity either - in email, bounces can be lost just as easily as the messages they're reporting on.

The problem is known as the Two General's Problem after an allegory on the subject, and is really about state replication rather than just messages. Or chocolate drops, either, although chocolate is yummy.

It turns out it is impossible to reliably replicate state between two entities across a potentially unreliable network, and it turns out the network is always unreliable. So are the entities actually, although somewhat less so.

Most messaging protocols, though, are built on TCP, which adds a few tricks to reduce the problem. In simple - and I mean very simple - terms, TCP numbers every message passing through it, and these are strictly ordered. If I send you a message "1", then send a message "2", and you only get "2", you know "1" has gone missing somewhere, and you can ask for it (and not process "2" until you've got "1").

Similarly, if you get both and tell me you're up to "2", I know you got both of them.

Of course, I can lose your "ACK" saying what you're up to, and you could never get the missing "1", so this isn't solving the Two Generals completely - but it does mean that at any given moment, I have a reliable view of what your state used to be, and you have some idea of what it's meant to be. To put it another way, it reduces the potential number of states quite drastically - it means that state transitions (and messages, therefore) are strictly ordered.

Application protocols use this and build on it. SMTP, in email, mandates that email acknowledgements happen after not after merely receiving the email, but having stored it to disk or forwarded it on.

But once a TCP session is dropped, it's often very hard to figure out how far things got before it did. And TCP sessions drop for any number of reasons - for most people, it's when you walk into a building with your mobile phone and it switches to WiFi. For others, it might be when their satellite loses Line Of Sight.

So XMPP takes this a little further.

A protocol originally designed to handle casual instant messaging, XMPP is used for some highly mission-critical tasks these days, and protocol designers wanted to make it as reliable as possible. Enter XEP-0198.

XEP-0198 builds TCP's counting trick out into the application protocol. While a TCP session is up, then, both ends can acknowledge each message (actually, any "stanza") going across TCP and confidently know which have been processed, just by using a simple counter (since TCP is doing the hard work for us).

The magic happens when the TCP session breaks. XMPP clients can reconnect, and tell the server how far they'd got - this reestablishes the ordering, and both ends can resend any lost traffic, continuing from before. No traffic can be lost if this "session resumption" occurs successfully, and it also saves a lot of traffic a more traditional reconnect would encounter.

Nothing can fully claim to "solve" the Two Generals Problem, of course - but XMPP, using XEP-0198, demonstrates that the problem can be reduced to the point where it only affects long network outages.

Top comments (0)