Dave Cridland

Posted on Nov 18, 2019

Sending a Message

#messaging

How hard can it be?

Messaging is something of a niche - you can find web developers by the truckload, but when you're after someone with messaging experience, there are really very few of us around.

That's probably because messaging is so simple, right? All we need to do is take a message from one place and put it into another. How hard can that be?

A Discussion about what a Message is

Messages can be anything - a heat sensor might emit the current temperature, or we might want to send log messages, or status updates.

But it's probably easiest to consider the case of text chat, since we're probably all familiar with it.

In its simplest form, a text message can be simply the text itself, whom it's from, and (probably) some indication of where to send it. We'll start with this, then.

We could use any format to discuss this, but it's more useful to work with a concrete example, so I'm going to use the XML syntax of XMPP. XMPP is an Open Standard messaging protocol, and it's used more or less everywhere messaging becomes critically important, like the military, governments, and hospitals. Also, it's used quite heavily in games - Fortnite, for example. There are client libraries for every language, and lots of different servers to choose from too.

XMPP uses addressing based on something that looks very like an email address with an optional "resource identifier" added onto the end (which, fact-finders, I'll leave out of the examples). There are differences between email addresses and those of XMPP, mostly around Unicode support (XMPP has it) and legacy support (XMPP doesn't need to support X.400).

So here's a very simple text message in XMPP:

<message from='me@myserver.net' to='you@yourserver.net'>
  <body>Hey, this is my first message!</body>
</message>

This is an entirely legal XMPP message, with all the required metadata - all the stuff that would, in an email message, be in the headers. It's pretty small, and hopefully the XML won't put you off too much. Normal developers never have to deal with XML at all when using XMPP, any more than web developers have to deal with header parsing - but it's convenient to show, and XMPP's use of XML is relatively clean.
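To make the "you never touch the XML" point concrete, here is a minimal sketch of how a library might build that stanza for you. It uses only Python's standard library; the `build_message` helper is a made-up name for illustration, not part of any real XMPP client library.

```python
import xml.etree.ElementTree as ET


def build_message(sender: str, recipient: str, text: str) -> str:
    """Serialise a minimal chat message stanza (hypothetical helper)."""
    message = ET.Element("message", {"from": sender, "to": recipient})
    body = ET.SubElement(message, "body")
    body.text = text  # ElementTree escapes &, < and > when serialising
    return ET.tostring(message, encoding="unicode")


stanza = build_message(
    "me@myserver.net",
    "you@yourserver.net",
    "Hey, this is my first message!",
)
print(stanza)
```

The caller deals in addresses and text, and the escaping and serialisation happen out of sight.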

Loss, and Tragedy

XMPP works over TCP - a reliable connection - but there's a lot that can go wrong.

If we lose connectivity - if the WiFi goes down or the 4G signal drops - we can't easily know if the message got through before the network died entirely.

Sometimes we do - TCP gives very strong guarantees in some cases, so we know that if we send a second message and that one gets through, the first one certainly did. But the guarantees of TCP are fundamentally about ordering and corruption rather than simple loss.

Rapid network changes - as you get with a smartphone - make what used to be an edge case on desktop a nightmare on mobile. Dealing with other network types can be even worse - XMPP will operate over military radios which can only transmit or receive, not both at once, and take half a minute to switch modes.

While someone not getting the message above is mildly irritating, the nature of where XMPP is used means that the outcomes can be far worse than merely irritating. If we send a message about, for example, new medication for a patient, it's of critical importance we know if it was received.

Acknowledgements

The simplest solution is for the receiver to say they got the message. We can handle this in XMPP by adding an extension. There are two we can use: either the older (and more widespread) Delivery Receipts, or the much newer Chat Markers. I'm going to discuss the latter, because it's a little more interesting from a theoretical standpoint.

First, we're going to add in the additional metadata we need. If we're going to refer to a message by saying we received one, we'll need to have a way to identify which message we're talking about. XMPP handles this by a simple id attribute:

<message from='me@myserver.net' to='you@yourserver.net' id='1'>
  <body>Hey, this is my first message!</body>
</message>

In a real implementation, we'd use a UUIDv4 or similar, but for this example we'll just use an integer counter.
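For comparison, a sketch of what the "real implementation" might look like, assuming Python's standard `uuid` module is acceptable for generating ids. The `new_message_id` helper is a hypothetical name.

```python
import uuid
import xml.etree.ElementTree as ET


def new_message_id() -> str:
    """Return a random UUIDv4 as a string, for use as the id attribute."""
    return str(uuid.uuid4())


message = ET.Element("message", {
    "from": "me@myserver.net",
    "to": "you@yourserver.net",
    "id": new_message_id(),
})
ET.SubElement(message, "body").text = "Hey, this is my first message!"
print(ET.tostring(message, encoding="unicode"))
```

Random ids mean two clients (or two sessions of the same client) never need to coordinate a counter.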

Now we need to indicate to the receiver that we support chat markers. We could do this by discovery - having the receiver ask our client directly - but it's simpler in our case to include this in every message:

<message from='me@myserver.net' to='you@yourserver.net' id='1'>
  <body>Hey, this is my first message!</body>
  <markable xmlns='urn:xmpp:chat-markers:0'/>
</message>

XMPP uses namespaced elements for extensions like this. XML namespaces can get a bit wordy, so we use a URN namespace to keep them as small as possible. The good news is that you can create your own without risk of clashing, using a URL you control.

When we receive such a message, we can respond by telling the sender where we are. But thanks to the ordering that TCP gives us (and XMPP builds on), we don't have to send a response to every message - we can just respond to the last one. Previous messages are guaranteed to be delivered if a subsequent one is.

<message from='you@yourserver.net' to='me@myserver.net' id='2'>
  <received xmlns='urn:xmpp:chat-markers:0' id='1'/>
</message>

There we go. When we receive this, we can show the message has been received (or displayed) by putting a couple of ticks next to the message.
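The "only respond to the last one" idea can be sketched on the receiving side as follows. This is an illustration under assumptions: `received_marker_for` is a hypothetical helper, the stanzas arrive as XML strings in order, and the reply id is supplied by the caller.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

MARKERS_NS = "urn:xmpp:chat-markers:0"


def received_marker_for(stanzas: Iterable[str], reply_id: str) -> Optional[str]:
    """Build one <received/> marker for the last markable message seen."""
    last = None
    for raw in stanzas:
        msg = ET.fromstring(raw)
        markable = msg.find(f"{{{MARKERS_NS}}}markable")
        if msg.get("id") and markable is not None:
            last = msg  # later messages supersede earlier ones
    if last is None:
        return None  # nothing asked to be marked
    reply = ET.Element("message", {
        "from": last.get("to"),
        "to": last.get("from"),
        "id": reply_id,
    })
    ET.SubElement(reply, f"{{{MARKERS_NS}}}received", {"id": last.get("id")})
    return ET.tostring(reply, encoding="unicode")


inbox = [
    "<message from='me@myserver.net' to='you@yourserver.net' id='1'>"
    "<body>Hey, this is my first message!</body>"
    "<markable xmlns='urn:xmpp:chat-markers:0'/></message>",
]
print(received_marker_for(inbox, "2"))
```

ElementTree writes the namespace with a generated prefix rather than as a default namespace, which is equivalent XML.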

But this acknowledgement is, of course, a message (in the sense of "Messaging") as well.

And so it, too, can be lost...

A Short Interlude about Storming Cities

Imagine, for a moment, there is a fortress city, defended so well that a single army cannot hope to conquer it.

Imagine, further, that there is not one, but two armies arrayed against it - one on each side of the city.

Because the city is astride a river, and has the only bridge for miles within its walls, the general of each army can only communicate to the other by sending a messenger to sneak through the city.

Attacking individually would leave the Generals' armies defeated utterly - in order to conquer the city, they must attack at the same time. So all they have to do is send a message to the other General suggesting a time, and know it got through.

But what if the message was lost? The first General would attack alone and be defeated, so unless the first General knows the message got through, they will not attack. Since the second General knows this, the second must send a message back, saying they got the first.

But what if this acknowledgement was lost? The second General would attack, but the first might not, thinking the message hadn't got through. The solution is, of course, to acknowledge the acknowledgement, and ... can you see where this is going?

The Two Generals' Problem is an insoluble problem in messaging. It literally declares that there is no way for both sides to agree on the current state (or, more accurately, there is no way for two parties to simultaneously know the state of the other).

So it looks as if, rather than messaging being quite simple, it's actually impossible. And impossible is quite hard.

If at first you don't succeed...

We can try addressing this by sending messages more than once; but with human messages this becomes tricky quite fast. Humans do not react well to duplicated messages, as we don't typically include the metadata required to spot them at that level.
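At the machine level, a client could at least suppress exact repeats, assuming every message carries a unique id as in the earlier examples. The sketch below is my own illustration rather than anything XMPP mandates; `DuplicateFilter` and its `capacity` parameter are hypothetical.

```python
from collections import deque


class DuplicateFilter:
    """Remembers recently seen (sender, id) pairs to drop repeated messages."""

    def __init__(self, capacity: int = 1000):
        self._capacity = capacity
        self._order = deque()  # oldest key first, for eviction
        self._seen = set()

    def is_new(self, sender: str, message_id: str) -> bool:
        key = (sender, message_id)
        if key in self._seen:
            return False  # a resend of something we already showed
        self._seen.add(key)
        self._order.append(key)
        if len(self._order) > self._capacity:
            # Forget the oldest entry so memory use stays bounded.
            self._seen.discard(self._order.popleft())
        return True


seen = DuplicateFilter()
for sender, message_id in [("me@myserver.net", "1"), ("me@myserver.net", "1")]:
    print(message_id, "new" if seen.is_new(sender, message_id) else "duplicate")
```

This only helps when the resend reuses the original id; a user retyping the same text produces a new message as far as the filter is concerned.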

XMPP's ordering rules really help, but they don't make things perfect. Without any ordering rules at all, though, things can go really very wrong in a critical messaging environment.

Besides, we can't blindly resend all the time, since we'd never know when to stop with, for example, acknowledgements. And we can't just acknowledge acknowledgements forever, either - it'd never end.

It would be useful if we could somehow fix TCP, WiFi, and all the rest so that sending a message was more reliable in the first place.

A Place Between Success and Failure

When we send a message in XMPP, we don't actually send it to the other party.

In common with most messaging systems, we send it to our server, and that takes responsibility for it and sends it on to the other party (or their server, in federated cases).

This means that there's another, hidden party we can use, and it knows literally every detail of our connection. This in turn means we can change our connection a bit to make it considerably more reliable.

First, we're going to stop considering a message as simply either "sent" or "not-sent". We're going to introduce a fuzzy state of "maybe-sent". Anytime we send a message over the TCP session, we'll place it into the "maybe-sent" state, and keep a copy.

Now we need a way to find out what state that message ends up in. We can't do this instantly, but we can eventually. So we'll just ask the server periodically how many messages it got. This won't be a message in the XMPP sense - it'd be far too silly. Instead, it'll be a new thing (defined in Stream Management):

<r xmlns='urn:xmpp:sm:3'/>

This is governed by the same ordering in the TCP session as our messages, so it's reliable in retrospect, just like the messages themselves. This means when the server receives this, it knows exactly how many messages have arrived, and can tell us:

<a xmlns='urn:xmpp:sm:3' h='1'/>

Perfect - now we know the server has received our first message, and we can remove it (and any previous messages) from the "maybe-sent" state and consider it "sent". It might not have got to the other party yet, of course - but it's no longer our responsibility.
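The client-side bookkeeping for this can be sketched roughly as below. The `OutboundQueue` class and its method names are hypothetical, the transport is reduced to a callable, and the sketch ignores the counter wraparound that a long-lived session would eventually have to handle.

```python
from collections import deque
from typing import Callable, List


class OutboundQueue:
    """Holds stanzas in the "maybe-sent" state until the server acks them."""

    def __init__(self) -> None:
        self.acked = 0             # the h value from the last <a/> we saw
        self.maybe_sent = deque()  # copies of stanzas not yet acknowledged

    def send(self, stanza: str, write: Callable[[str], None]) -> None:
        self.maybe_sent.append(stanza)  # keep a copy before it hits the wire
        write(stanza)

    def handle_ack(self, h: int) -> List[str]:
        """Process <a h='...'/>; return the stanzas that are now "sent"."""
        newly_acked = h - self.acked
        if not 0 <= newly_acked <= len(self.maybe_sent):
            raise ValueError("ack count does not match what we sent")
        done = [self.maybe_sent.popleft() for _ in range(newly_acked)]
        self.acked = h
        return done


wire = []
queue = OutboundQueue()
queue.send("<message id='1'/>", wire.append)
queue.send("<message id='2'/>", wire.append)
print(queue.handle_ack(1))     # the first stanza moves to "sent"
print(list(queue.maybe_sent))  # the second is still "maybe-sent"
```

Because `h` is a running total, a single acknowledgement clears everything up to that point, which is what lets us ask only occasionally.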

The UX for this is usually a single tick against the message - it was popularised by WhatsApp, which itself uses a private version of this same protocol developed against their weird fork of XMPP.

Of course, if we walk out of WiFi and switch to 4G, this still leaves messages in the "maybe-sent" state - so they might be lost (or might not be).

Where did you come from, where did you go?

We can address this by asking the server when we reconnect. When negotiating the extension, we just say "Here's my previous session id. I got to here, what about you?"

The server then tells us where it got to and resends any messages that we missed, and we do the same. This essentially resumes the session exactly where it broke, and means we've extended the ordering and reliability rules from TCP across to a new session. Shiny.

The specification tells us how this works. We'd send something like:

<resume xmlns='urn:xmpp:sm:3'
        h='4'
        previd='some-long-sm-id'/>

And the server responds with:

<resumed xmlns='urn:xmpp:sm:3' h='1' previd='some-long-sm-id'/>

Oh, phew! So the server did get our message, even though we lost WiFi! Fantastic.
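What the client does with that `h` value can be sketched like this. The `on_resumed` function is a hypothetical helper; it assumes we kept our "maybe-sent" stanzas in order, along with the last acknowledged count from before the connection dropped.

```python
from typing import List, Sequence


def on_resumed(maybe_sent: Sequence[str], acked_before: int,
               resumed_h: int) -> List[str]:
    """Return the stanzas that still need resending after a resume.

    maybe_sent   -- stanzas sent but not acknowledged, oldest first
    acked_before -- the last h the server gave us before the drop
    resumed_h    -- the h attribute of the <resumed/> element
    """
    confirmed = resumed_h - acked_before
    if not 0 <= confirmed <= len(maybe_sent):
        raise ValueError("server's h does not match our records")
    # Everything the server counted is "sent"; the remainder goes out again.
    return list(maybe_sent[confirmed:])


# The case in the text: one message in flight, and the server says h='1'.
print(on_resumed(["<message id='1'/>"], acked_before=0, resumed_h=1))
```

Had the server replied with `h='0'` instead, the same call would hand the message back for resending on the resumed session.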

Sometimes this too fails - maybe the server gave up waiting for us to reconnect - in which case our messages can be stuck in the "maybe-sent" state. The best option we have here depends on what the message is - some messages, like Chat Markers, can be resent very safely, whereas for human messages we might choose to flag this condition to the user instead, and let them decide.
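One possible policy for that failure case, sketched under my own assumptions: stanzas that carry only a chat marker are resent automatically, and anything with a body is handed back for the user to decide. The `triage` function is hypothetical, and a real client may well want finer-grained rules.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

MARKERS_NS = "urn:xmpp:chat-markers:0"
MARKER_TAGS = {
    f"{{{MARKERS_NS}}}{name}"
    for name in ("received", "displayed", "acknowledged")
}


def triage(maybe_sent: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split stuck stanzas into (resend automatically, ask the user)."""
    resend, ask_user = [], []
    for raw in maybe_sent:
        stanza = ET.fromstring(raw)
        has_body = stanza.find("body") is not None
        is_marker = any(child.tag in MARKER_TAGS for child in stanza)
        if is_marker and not has_body:
            resend.append(raw)    # repeating a marker is harmless
        else:
            ask_user.append(raw)  # a human may not want this sent twice
    return resend, ask_user


stuck = [
    "<message id='2'><received xmlns='urn:xmpp:chat-markers:0' id='1'/></message>",
    "<message id='3'><body>New medication for bed 4</body></message>",
]
auto, manual = triage(stuck)
print(len(auto), "to resend,", len(manual), "to flag")
```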

What we know we know

At this point, these two protocols are working in combination. We can be confident that when we send a message it'll (eventually) get to the recipient, and when we get a chat marker back, we know they'll have received everything up to that point.

Chat Markers also tell us the messages have been displayed, and - just like text messages - they can be resent automatically if we lose the connection.

The Two Generals' Problem isn't solved in XMPP, of course - that would be impossible - but we have managed to make it a genuine edge case, even in very unstable network environments.

XMPP achieves all this by building a multi-layered approach to message reliability, combining existing features like TCP's guarantees with both low-level machine acknowledgements and high-level human ones.

That is why XMPP is used in hospitals and on battlefields - whether real or in a game.
