Building For Interactivity: When Async Is Not Enough

#ux #api #design

In 2023 I spent a lot of time learning about real-time notifications. I learned about the concepts, the different protocols, and how to implement them to seamlessly push updates to a user interface. I won't claim to be an expert, but I definitely feel comfortable with the different ways to get a message from a server to an end user.

The message I have been consistently pushing for months is to stop building synchronous software. Instead of designing systems around request/response API calls, decouple your components and push updates to relevant parts of your system. On its own, this advice fits the bill for modern software development, especially for serverless ecosystems. But I recently started asking myself "why?" Why is this good advice?

From an end-user perspective, this is good advice because you don't want to force people to wait for things to finish - especially if they take more than a few seconds. In 2024, waiting generally results in poor user experience. Poor user experience leads to unhappy users. Unhappy users lead to nobody using your software 😬.

When focusing on a front-end async experience, your backend implementation usually ends up with a set of endpoints that start background processes and return an identifier for tracking purposes. The background process periodically pushes status updates to the user interface via a persistent connection like a WebSocket or Server-Sent Events (SSE). In case your client disconnects, you also end up with an endpoint that fetches the current status of a process so you can always get to the data you need. A simple example would be a transcription API.
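To make that shape concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration: the `TranscriptionJobs` class, its method names, and the status strings are hypothetical, and the `subscribe` callback only stands in for a real WebSocket or SSE connection.

```python
import uuid
from typing import Callable, Dict, List

# Hypothetical callback signature: (job_id, status) -> None
StatusListener = Callable[[str, str], None]


class TranscriptionJobs:
    """In-memory stand-in for an async transcription API (illustrative only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, str]] = {}
        self._listeners: Dict[str, List[StatusListener]] = {}

    def start(self, audio_url: str) -> str:
        """Start endpoint: register the job and return its ID immediately."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"audio_url": audio_url, "status": "queued"}
        self._listeners[job_id] = []
        return job_id

    def subscribe(self, job_id: str, listener: StatusListener) -> None:
        """Stand-in for a client opening a WebSocket or SSE connection."""
        self._listeners[job_id].append(listener)

    def get_status(self, job_id: str) -> str:
        """Status endpoint: lets a reconnecting client catch up."""
        return self._jobs[job_id]["status"]

    def report(self, job_id: str, status: str) -> None:
        """Called by the background process: save the status, then push it."""
        self._jobs[job_id]["status"] = status
        for listener in self._listeners[job_id]:
            listener(job_id, status)
```

A client would call `start`, subscribe for pushed updates, and fall back to `get_status` if it ever loses its connection.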

From a simple numbers perspective, going from a synchronous request/response call to an asynchronous job increases the number of endpoints you need to build and maintain from 1 to 3 (at least). On the plus side, your users won't be sitting there waiting for a long task to finish. But is that enough reason to make the switch? It seems like a lot more code, time, effort, and complexity to keep your users from sitting and watching a spinner.

Wait time matters

According to a Nielsen Norman Group study, there are three important limits to consider when optimizing application performance.

0.1 seconds - About the limit for a person to feel the system is reacting instantaneously.
1 second - About the limit for a user's train of thought to stay uninterrupted, even though they will notice the delay.
10 seconds - About the limit for holding a user's attention. Past this, short-term memory fades and their mind starts wandering away from the task.

As wait times get closer and closer to each limit, your users lose more and more focus. Tasks longer than 10 seconds will generally break a user's flow, and users often abandon the site rather than start the task over from scratch.
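If it helps to see those limits as thresholds, here's a tiny Python sketch. The function name and the labels it returns are made up for illustration and are not terminology from the study.

```python
def perceived_wait(seconds: float) -> str:
    """Map a wait time onto the three response-time limits above."""
    if seconds <= 0.1:
        return "instant"            # feels like a direct reaction
    if seconds <= 1:
        return "noticed"            # delay is noticed, train of thought intact
    if seconds <= 10:
        return "attention held"     # focus is strained but still on the task
    return "attention lost"         # the user's mind has wandered off
```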

That said, making your users simply not look at a spinner while your async job runs is hardly any better. If you bring them back to a dashboard or navigate your users to a screen that's waiting to be updated, they're still waiting. They might not know that something is happening. They might even think that something went wrong if you aren't keeping them updated.

Sure, it used to be okay to navigate someone back home while you did something in the background. But that's not the case anymore. Let's look at a real-life example.

The evolution of messaging

Messaging is the perfect example of async communication. You send a message to a recipient, carry on with your day, and eventually get a response back. You don't have to drop everything and wait for a response; you're free to do whatever you want.

In the United States, the postal system was established in 1775, and mail was carried primarily on horseback. If you wanted to send your friend a message, you'd have to wait anywhere from weeks to months to get it delivered. And that was only one way!

Any async process can be "improved" by lowering the total time it takes from start to finish. And that's exactly what happened with mail delivery. Over the next couple of centuries, delivery time was reduced by changing the method of transportation. Horses turned into stagecoaches. Stagecoaches turned into railroads. Railroads turned into automobiles. Automobiles turned into airmail. Fast forward a bit and now we have messages being delivered instantly over the internet.

Once we reached internet speeds, we peaked on delivery time. You can deliver a text message or DM to anyone in the world in milliseconds. But it's still not good enough.

We got used to instant delivery of messages real fast. But we always want more. If I send a text message, I want to know how long I'm going to wait until whoever I'm texting is going to message me back. If I know I have the person active on the other end, I'll stop what I'm doing and focus on the conversation a bit more. If I know they aren't going to message me back for hours, I'll put my phone away and do something else.

So how did we continue to improve? Well, we now have a brand new set of indicators that tell us how our async process is doing. In many apps, presence indicators tell me if the person I'm messaging actively has the app open. When I send a message, I not only get a delivery receipt, but a read receipt. This means I know if my message made it to the other person's device and if they've seen it or not. On top of all that, I can also see a typing indicator that tells me the other person is actively responding.

I'm not telling you anything you don't already know. This has been around for a while and is something we've grown to expect. But you might not have connected the dots between your daily text messages and the software you're building: your software needs that same level of interactivity.

Interactivity is the key

Modern messaging at its core is still async communication. The primary function is delivering a message from a sender to a receiver. But we've maxed out delivery speed and people are still impatient.

This is because async processes are essentially black boxes to consumers. As a builder, you need to do your best to turn them into "gray boxes," meaning you must provide as much information as you can as often as you can. Remember, if you go more than 10 seconds without engaging a user, you're going to lose them.

Every time you interact with a user, your timer resets.

Let's imagine it takes 20 seconds to get a response back from someone you're actively texting. 20 seconds is double the time our short-term memory will usually allow for any given wait period. What do we do?

When you hit the send button, your attention timer starts. It's up to you to provide enough interactivity to string the user along for the full duration. Let's take a look at the interactions that occur between hitting send and receiving a message back 👇.

Added up, we have a total of 20 seconds, but none of the individual steps take longer than 9 seconds, so we're in the clear as long as we send interactive updates at the right times. The updates show that something is happening and you can continue to draw out the user's attention span.
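One way to sanity-check a flow like this is to measure the longest silence between updates. The sketch below uses a made-up timeline: the timestamps and labels are assumptions chosen to match a 20-second total with a 9-second maximum gap, not measurements from a real app. It then compares the longest gap against the 10-second limit.

```python
from typing import List, Tuple

ATTENTION_LIMIT_SECONDS = 10.0  # the 10-second limit discussed earlier

# Hypothetical timeline: (seconds since hitting send, update shown to the user)
TIMELINE: List[Tuple[float, str]] = [
    (1.0, "delivered"),
    (6.0, "read"),
    (11.0, "typing"),
    (20.0, "reply received"),
]


def longest_silence(events: List[Tuple[float, str]]) -> float:
    """Longest gap in seconds between consecutive updates, counting from send (t=0)."""
    previous = 0.0
    longest = 0.0
    for timestamp, _label in events:
        longest = max(longest, timestamp - previous)
        previous = timestamp
    return longest


def keeps_attention(events: List[Tuple[float, str]],
                    limit: float = ATTENTION_LIMIT_SECONDS) -> bool:
    """True when every gap between updates stays under the attention limit."""
    return longest_silence(events) < limit
```

With this timeline the check passes. Strip out the intermediate updates so only the reply remains, and the same 20 seconds becomes one long silence that fails it.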

What if that's not an option?

An ideal scenario would have a bunch of events that indicate progress is being made. Unfortunately, not all use cases are ideal. Some workflows simply take a long time without any measurable progress indicators. For others, the status updates might take minutes to progress, leaving you out of luck keeping your users engaged.

You might be thinking about providing a timed progress bar. If your workflow takes an average of 4 minutes to complete, then set a timer in your user interface for 4 minutes and 15 seconds and increment the progress every few seconds. Let me ask you a question: do you like looking at progress bars?
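For reference, a timed progress bar like that is only a few lines of code. This is a rough sketch, not a recommendation: `timed_progress` is a hypothetical helper, and holding the bar at 99% until the job really finishes is an assumption added here, since the timer is only an estimate.

```python
EXPECTED_SECONDS = 4 * 60 + 15  # 4-minute average plus a 15-second buffer


def timed_progress(elapsed_seconds: int, done: bool = False,
                   expected_seconds: int = EXPECTED_SECONDS) -> int:
    """Return a percentage for a progress bar driven purely by elapsed time."""
    if done:
        return 100
    percent = (max(elapsed_seconds, 0) * 100) // expected_seconds
    # Never show 100% on the timer alone; the job may run longer than average.
    return min(percent, 99)
```

The UI would call this every few seconds with the elapsed time and pass `done=True` once the backend reports completion.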

If you need your users to wait for 30+ seconds but don't have any way to give meaningful interactivity, distract them in other ways that keep their short-term memory engaged.

A fantastic case study in this approach is Matchbooks. Matchbooks is an app that allows children to use generative AI to safely create their own stories. Kids will describe the main character of their story and provide the details of their storyline. Once they add all the inputs, the app creates wonderfully illustrated books and stories matching exactly what was described. This process takes a few minutes to complete, and for those of you who don't know, children's attention span is way shorter than that.

Rather than providing status updates of the story generation, the app routes kids to a Pictionary-style game. The kids play quick, 10-second games where they are asked to figure out what animal is being drawn on screen. Once they say the correct animal out loud, it immediately progresses to a new one. This process repeats until the story finishes generating in the background.

Once the story is done, a "Continue" button lights up and the kids resume their journey with their book.

With this approach, the kids are fully distracted and engaged while a minutes-long process is going on as part of a workflow. The kids stay happy because they are entertained and when they're ready, they can continue to the story they created. The same concept applies to other domains as well; just keep it relevant to your user.
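The general pattern (start the long job, loop short activities until it finishes, then unlock the next step) can be sketched with `asyncio`. This is not how Matchbooks is built; the function names are hypothetical and `asyncio.sleep` stands in for both the story generation and a round of the game.

```python
import asyncio
from typing import Tuple


async def generate_story(seconds: float) -> str:
    """Stand-in for the minutes-long background job."""
    await asyncio.sleep(seconds)
    return "story ready"


async def play_until_ready(job_seconds: float, round_seconds: float) -> Tuple[int, str]:
    """Run short rounds until the background job finishes, then hand back its result."""
    job = asyncio.create_task(generate_story(job_seconds))
    rounds_played = 0
    while not job.done():
        await asyncio.sleep(round_seconds)  # one quick round of the game
        rounds_played += 1
    # At this point the UI would light up its "Continue" button.
    return rounds_played, job.result()
```

The user always finishes the round they are in before the result is surfaced, so the hand-off never interrupts them mid-activity.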

How to start

Interactivity takes time. It's way more than setting up WebSockets and publishing a few messages. It's about context. It's about relaying the right information to reset your users' internal timers. To figure out what the right information is, my best advice to you is to act like a user. Use your own software and ask yourself "what should I be doing while this is going on?" or "what extra information could be helpful here?" See if you can come up with statuses or metadata that can be presented back to your users.

When you do figure out what you want to surface to users, building it is your obvious next step. Once again, the solution is generally a bit more involved than simply setting up WebSockets in your app. Based on your security, compliance, and specific interactivity needs, the type of notification mechanism can vary greatly. Spend some time researching and figuring out which one works best for you.

Personally, I lean on Momento Topics for my interactivity build-outs. They enable browser-to-browser, browser-to-backend, and backend-to-backend connectivity without creating any infrastructure. I've built fun, interactive reaction apps for my presentations in just a couple of hours and recently built a chatbot with only two Lambda functions and a function URL. It drastically speeds up development time while offering a low-latency, fanout event bus available to all parts of a distributed application.

But this is not a sales pitch for Momento Topics. This is about building better software. Software that not only frees your users from wait times but keeps them entertained as well. Find what matters and delight your users. Implementation details are up to you; they always are.

So give it a shot! Dive in and figure out what you can do to keep your interactions frequent and meaningful. Don't just send users back to a landing page. Keep them hooked and remember to think outside the box. That's how the tech industry kept innovating on messaging when delivery couldn't get any faster.

Happy coding!