Make your web application gracefully survive a general backend failure.

#failure #frontenddev #webdev #jamstack

(This is not a coding post)

Software engineers’ wishful thinking is “My application will never fail”, but in the meantime they prepare for the worst. All engineers try to make their applications resilient; good engineers know their application will eventually fail and prepare for that.

Frontend deployment has evolved a lot in the last 5 years. JAMstack (https://jamstack.org/) is gaining adoption across the board: marketing websites, e-commerce, enterprise and consumer applications. The main benefits include better performance, better security, cheaper and easier scaling, and an improved developer experience. Serving from an edge CDN gives these web applications some characteristics of mobile applications, for example no need for servers to deliver the interface. Most of the time, clients (browsers) accessing the application get static cached assets served at the edge by the CDN, which makes startup really quick. A deploy cycle involves busting the CDN caches so that a new version gets served to the user.

In this scheme, backend API services are called to perform all operations: querying data, processing input data (forms), running workflows, etc. But what happens when the backend API is down? Our users still have the application interface, but the backend is not responding. In these cases most applications have ways to let users know something is not working as expected (the same way mobile applications do).

Surviving a backend failure at startup

Frontend applications served as static assets are cached in the user’s browser and in the edge CDN. There is no actual request to the company’s infrastructure to retrieve them unless there is an update and the caches are busted; the next request then goes to the origin so the new assets can be cached.

Since frontend application delivery is separated from the company’s infrastructure, the frontend application can be running while the infrastructure is not (a backend API failure or a more drastic general failure).

In this case everything the user tries to do in the application will inevitably fail. If the backend API is not responding and the company anticipates the resolution will take some time, a more specialized solution is called for.
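
Although this post is not about code, a small sketch may help make "the backend is not responding" concrete. The following is one possible way for a frontend to notice the situation, assuming the application reports the outcome of each API call to a helper. The class name, the threshold of three consecutive failures and the choice to count only network errors and 5xx responses are all assumptions made for illustration, not a prescribed method.

```typescript
// Hypothetical helper: counts consecutive failed API calls to guess
// whether the backend is down. Names and threshold are illustrative.
class BackendHealthTracker {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // status: the HTTP status code of a response, or null when the
  // request failed at the network level (timeout, DNS, refused, ...).
  record(status: number | null): void {
    // 4xx responses are not counted: the backend answered, so it is up.
    const failed = status === null || status >= 500;
    this.consecutiveFailures = failed ? this.consecutiveFailures + 1 : 0;
  }

  // True once enough calls in a row have failed.
  looksDown(): boolean {
    return this.consecutiveFailures >= this.threshold;
  }
}
```

The application would call `record` after every request and switch to its degraded experience once `looksDown()` returns true. A single successful response resets the counter, so a brief hiccup does not trigger it.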

What do applications do during downtime?

The company can use a feature flag integration to activate a maintenance page (LaunchDarkly, Rollout).
The maintenance page lets users communicate directly with the support team via communication integrations (Intercom, HelpCrunch, LiveChat, etc.).
Errors the user is experiencing are logged (Sentry, New Relic).
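
As a rough illustration of the measures above, the decision "show the app or show the maintenance page" can be reduced to a small pure function. This sketch assumes the flag values have already been fetched from whichever feature flag service is in use; the flag key `maintenance-mode`, the type names and the messages are hypothetical and do not come from any vendor's API.

```typescript
type Screen = "app" | "maintenance";

// Flag values as delivered by the feature flag service (shape assumed).
type FlagValues = { [key: string]: boolean | undefined };

const MAINTENANCE_FLAG = "maintenance-mode"; // hypothetical flag key

interface View {
  screen: Screen;
  showSupportChat: boolean; // whether to render the live chat bubble
  message: string;
}

// The flag lets the company switch the page on deliberately; the
// observed backend state acts as a fallback when nobody has flipped it.
function resolveView(flags: FlagValues, backendLooksDown: boolean): View {
  const flagOn = flags[MAINTENANCE_FLAG] === true;
  if (flagOn || backendLooksDown) {
    return {
      screen: "maintenance",
      showSupportChat: true,
      message: flagOn
        ? "We are going through some issues and are working to solve them."
        : "Something is not working as expected. Please try again shortly.",
    };
  }
  return { screen: "app", showSupportChat: false, message: "" };
}
```

Keeping the decision separate from rendering means the same function can drive any UI framework, and the chat and error logging integrations only need to read the resulting view.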

These are common, sometimes obvious, solutions that client applications use to make a general application failure less stressful for the end user.

Example: While the company’s infrastructure is down, a user accesses the client application and is greeted by a maintenance page stating that the company is going through some issues, that they are working to solve them, and that in the meantime customer support can address the user’s concerns via a live chat bubble directly on the maintenance page. All access logs and page metrics are still being logged as usual.

This example describes a better experience: certainly better than a cryptic error page, a faulty application that does nothing, or simply an empty white page. Depending on your company’s business, any of those could mean an irreparable loss of clients, trust and resources.

What else?

Depending on the business type, other solutions are possible that further improve customer satisfaction and show attention to detail and service.

A low-volume, high-cost sales service could provide, if possible, an offline sales experience with a more human touch. Insurance is a good example: while backend systems are down, clients could still submit an insurance application via email or live chat, to be processed later. That is, instead of the regular sales process, a simplified sales form is displayed and the data gathered is sent via the live chat integration. Customer support follows up with the client once systems are back online. (This is possible because the form has no backend API dependencies.)
The application can showcase the company’s products or guides on how to use them, acting as a marketing website in the meantime.
A resource management system can let clients create simplified versions of their resources and store them locally (in the browser); once the system is back online, those locally stored resources are converted to regular resources and persisted in the central database. Example: a task management application provides a form to create a task, stores it locally in the browser, and defers its persistence and processing until the system is back online. The user can still get some work done, and it will be completed once the system is back online.
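
The task example can be sketched in a few lines. This is only one possible shape, under stated assumptions: drafts are kept as a JSON array under a single storage key, and the storage is anything exposing `getItem` and `setItem` (in a browser, `window.localStorage` fits). The `TaskDraft` fields, the key name and the function names are hypothetical.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Simplified version of a task, created while the backend is down.
interface TaskDraft {
  id: string;
  title: string;
  createdAt: string; // ISO timestamp
}

const DRAFTS_KEY = "offline-task-drafts"; // hypothetical storage key

function loadDrafts(store: KeyValueStore): TaskDraft[] {
  const raw = store.getItem(DRAFTS_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as TaskDraft[]) : [];
  } catch {
    return []; // corrupted data: start over rather than break the page
  }
}

// Adds a draft, replacing any earlier draft with the same id.
function saveDraft(store: KeyValueStore, draft: TaskDraft): void {
  const drafts = loadDrafts(store).filter((d) => d.id !== draft.id);
  drafts.push(draft);
  store.setItem(DRAFTS_KEY, JSON.stringify(drafts));
}

// Removes a draft once the backend has confirmed it was persisted.
function markSynced(store: KeyValueStore, id: string): void {
  const remaining = loadDrafts(store).filter((d) => d.id !== id);
  store.setItem(DRAFTS_KEY, JSON.stringify(remaining));
}
```

Once the system is back online, the application would walk through `loadDrafts(...)`, send each draft to the backend, and call `markSynced` only after a successful response, so a draft is never dropped before it has been stored centrally.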

Conclusions

While joyfully coding your application, prepare for the worst.
Evaluate how your application might fail and provide a better experience if that happens.
Leverage third-party services that could improve the end user’s experience during a failure, especially around communication.

Happy hacking…