A Holistic Approach to Error Handling and Error Monitoring in Applications

#errors #errormonitoring #customerfirst

Gone are the days when applications displayed default 400 and 500 error pages. Modern applications show custom, friendlier error pages instead.

From a customer’s point of view, an error is an error, and it is not a nice experience however it is presented. Error monitoring and handling have a huge and direct impact on customer service. Using error monitoring and handling tools within an Agile development process helps identify and solve errors as early as possible, minimising potential embarrassment down the road.

As a developer, I write applications and, boy, do they throw a lot of errors! This is a fact whether I like it or not: some errors are important and some are not; some are handled gracefully, while others are exceptions that I have missed.

Developers might find it useful to monitor everything about these errors with an error monitoring tool (Bugsnag is the one I’ll be reviewing in this article), as the knowledge is critical for a few reasons. Firstly, customers who use the applications may already be facing errors that affect their overall experience, which in turn can be detrimental to the business and its reputation. Secondly, applications are entry points for security vulnerabilities, and errors can be a telltale sign of suspicious events. Lastly, these errors are generated at run-time, where some code-scanning tools cannot detect them, so a different kind of tool is required to fill this gap.

Application logs can be pretty ephemeral and sometimes do not persist in the system. They can get rotated quickly to avoid storage issues. Another example is Docker: when applications run in containers, the container logs are removed when the container is removed, and there is almost no way to retrieve them afterwards. Therefore, manually looking through these logs for errors is not feasible in today's context.

Here are two common methods of collecting logs, and their pros and cons

Method 1. Write application logs to the local file system, and rely on the system to forward them to a log aggregator

Pros	Cons
Centralized system for logs	Slow to ingest
Log analysis can tell the full story by correlating network logs and system logs	Relies on system to do the log forwarding
-	Developers might not be able to obtain these system level logs due to access control

Example of Method 1:
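
A minimal sketch of this method in Node.js, assuming a hypothetical log file path and field names (none of these come from a specific tool). The application appends one JSON object per line to a local file, and forwarding is left to whatever agent the system runs:

```javascript
// Sketch of Method 1: append structured log lines to a local file.
// The file path and field names are assumptions for illustration.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical location; a real deployment would use whatever path
// the system's log forwarder is configured to watch.
const LOG_FILE = path.join(os.tmpdir(), "app-example.log");

// Serialise one log entry as a single JSON line so that a forwarder
// can parse it without guessing at the format.
function formatLogLine(level, message, fields = {}) {
  return JSON.stringify({
    time: new Date().toISOString(),
    level,
    message,
    ...fields,
  });
}

// Append the entry to the local log file. Forwarding to the aggregator
// is left to the system, as described in Method 1.
function writeLog(level, message, fields = {}, file = LOG_FILE) {
  fs.appendFileSync(file, formatLogLine(level, message, fields) + "\n");
}

// Usage: record a handled error together with some context.
try {
  JSON.parse("{not valid json");
} catch (err) {
  writeLog("error", err.message, { name: err.name, route: "/checkout" });
}
```

Writing one JSON object per line keeps the file easy for a forwarder to parse, but the application itself never learns whether the entry reached the aggregator.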

Method 2. Forward application logs directly from the application to an error monitoring server via an API

Pros	Cons
Direct to API	Analysing application logs alone sometimes cannot tell the full story
Immediate visibility to developers	-
Developers control how these application logs are managed	-

Example of Method 2:
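
A minimal sketch of this method in Node.js. The endpoint URL, the `X-Api-Key` header, and the payload shape are all hypothetical; a real error monitoring tool defines its own API and usually ships a client package that does this for you:

```javascript
// Sketch of Method 2: report errors straight from the application to an
// error monitoring API. The endpoint, header name, and payload shape are
// hypothetical; a real tool defines its own.
const MONITOR_URL = "https://monitoring.example.com/api/errors"; // assumption
const API_KEY = "YOUR_API_KEY"; // placeholder

// Turn an Error into a plain object that can be serialised as JSON.
function buildErrorReport(err, context = {}) {
  return {
    name: err.name,
    message: err.message,
    stack: err.stack,
    occurredAt: new Date().toISOString(),
    context,
  };
}

// Default transport: POST the report with fetch (global in Node.js 18+).
async function postReport(report) {
  const res = await fetch(MONITOR_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-Api-Key": API_KEY },
    body: JSON.stringify(report),
  });
  return res.ok;
}

// Report an error without letting a monitoring failure crash the app.
// The transport is injectable so it can be swapped or stubbed.
async function reportError(err, context = {}, send = postReport) {
  try {
    return await send(buildErrorReport(err, context));
  } catch (sendErr) {
    console.error("Could not deliver error report:", sendErr.message);
    return false;
  }
}
```

A call such as `reportError(new Error("Payment failed"), { route: "/checkout" })` would then send the report as soon as the error is caught, which is where the immediate visibility for developers comes from.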

Luckily, these two methods are not exhaustive and definitely not mutually exclusive. An enterprise-grade application would usually combine both approaches to handle error monitoring holistically, as that gives a better outcome than either one alone. There are many error monitoring tools on the market that fit these needs, and this article does a good comparison of how the different tools fare against each other.

Here’s how easy it was to install and use "Method 2"

I accepted Bugsnag’s 3-minute installation challenge and I passed with flying colors. I registered an account with Bugsnag and created a new JavaScript project. It gave me an API key and instructions to install their package.

After doing that, I pasted snippets of code from their documentation into my Node.js app.

I fired up my application locally and it threw an error.

Bugsnag immediately captured the error and displayed it on a dashboard.

Logs may contain sensitive data and should always be sanitised before being sent over the Internet to third-party software.
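
As a rough illustration of that sanitising step, here is a small sketch that redacts fields whose names look sensitive before a log entry leaves the application. The key list and the `[REDACTED]` marker are assumptions, and the function only handles plain JSON-like data without circular references:

```javascript
// Sketch of sanitising log data before it leaves the application.
// The list of sensitive keys is an assumption; adjust it to the data
// your application actually handles.
const SENSITIVE_KEYS = ["password", "token", "authorization", "cookie", "creditcard"];

// Return true if a key name looks sensitive (case-insensitive substring match).
function isSensitiveKey(key) {
  const lower = key.toLowerCase();
  return SENSITIVE_KEYS.some((s) => lower.includes(s));
}

// Recursively copy a value, replacing sensitive fields with a marker.
// The input object is left untouched.
function sanitise(value) {
  if (Array.isArray(value)) return value.map(sanitise);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = isSensitiveKey(key) ? "[REDACTED]" : sanitise(inner);
    }
    return out;
  }
  return value;
}

// Usage: the password and Authorization header are redacted,
// while the rest of the entry is preserved.
const entry = {
  message: "Login failed",
  user: { email: "jane@example.com", password: "hunter2" },
  headers: { Authorization: "Bearer abc123" },
};
console.log(sanitise(entry));
```

Matching on key names is deliberately simple; it will not catch sensitive values embedded in free-text messages, so those still need care at the point where the message is written.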

Here’s what to do after setting up error monitoring

This is where the error monitoring tool provides the most value to the business. The tool has done its job of capturing errors and organizing them into categories. The rest is up to the project team, which needs to define an internal process for analyzing and handling different types of errors.

As previously mentioned, the main goal of setting up error monitoring is to improve customer experience. Thus, the errors that impact customers the most can be surfaced from this tool, and the project team can decide on the follow-up action, such as communicating with the customer to build a closer relationship, or opening a bug or feature ticket internally to fix the error.
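
Monitoring tools typically do this ranking for you, but the idea can be sketched in a few lines. The event shape (`errorClass`, `userId`) and the choice to rank by distinct affected users before raw occurrence count are assumptions made for illustration:

```javascript
// Sketch of ranking error groups by customer impact.
// The event shape ({ errorClass, userId }) is an assumption for illustration.
function rankByAffectedUsers(events) {
  const groups = new Map();
  for (const { errorClass, userId } of events) {
    if (!groups.has(errorClass)) {
      groups.set(errorClass, { errorClass, occurrences: 0, users: new Set() });
    }
    const group = groups.get(errorClass);
    group.occurrences += 1;
    group.users.add(userId);
  }
  // Sort by distinct users affected, then by occurrence count as a tie-breaker.
  return [...groups.values()]
    .map((g) => ({
      errorClass: g.errorClass,
      occurrences: g.occurrences,
      affectedUsers: g.users.size,
    }))
    .sort((a, b) => b.affectedUsers - a.affectedUsers || b.occurrences - a.occurrences);
}

// Usage: one error hits three different customers, the other hits
// a single customer four times.
const events = [
  { errorClass: "PaymentTimeout", userId: "u1" },
  { errorClass: "PaymentTimeout", userId: "u2" },
  { errorClass: "PaymentTimeout", userId: "u3" },
  { errorClass: "AvatarUploadFailed", userId: "u1" },
  { errorClass: "AvatarUploadFailed", userId: "u1" },
  { errorClass: "AvatarUploadFailed", userId: "u1" },
  { errorClass: "AvatarUploadFailed", userId: "u1" },
];
const ranked = rankByAffectedUsers(events);
console.log(ranked[0].errorClass); // prints "PaymentTimeout"
```

Counting distinct users rather than raw occurrences keeps one noisy customer session from outranking an error that touches many customers.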

Some other errors flagged by the monitoring tool might be security alerts, and these have to be communicated to and handled separately with the security team. For example, some user inputs might be causing internal server errors even though the packets received by the system look legitimate on the network. In this case, it may be worth an investigation, and the security team might need more information.

Here’s what makes a decent error monitoring tool

Intelligent

Most, if not all, error monitoring tools are smart enough to help the user analyze the errors. These tools are aware of the environments, priorities and categories of errors. They also keep a history and track error trends to surface the errors that are affecting the business the most.

Integration

Error monitoring tools are easy to integrate with other DevOps tools on the market, such as Jira (for opening tickets) and PagerDuty (for paging ops who are on standby). A large variety of integrations is definitely a plus. Integrating can be as easy as installing a package and supplying an API key. This ability to integrate easily with other tools makes follow-ups easier and faster in the business workflows.

Here’s something to ponder.

While an application can feature many great services and a team can focus on building more features, a more holistic and valuable approach includes handling and monitoring errors. This helps managers understand the good and the bad of applications, clear technical debt, and deliver a better customer experience. Have you given any thought to handling your errors yet?