<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Victoria Johnston</title>
    <description>The latest articles on DEV Community by Victoria Johnston (@ctrlaltvictoria).</description>
    <link>https://dev.to/ctrlaltvictoria</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1140662%2Faa5dabfd-8437-4dc9-b898-6d69b4cbc696.jpg</url>
      <title>DEV Community: Victoria Johnston</title>
      <link>https://dev.to/ctrlaltvictoria</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ctrlaltvictoria"/>
    <language>en</language>
    <item>
      <title>AI Log #4: Vectorization &amp; NumPy in Machine Learning</title>
      <dc:creator>Victoria Johnston</dc:creator>
      <pubDate>Wed, 06 Dec 2023 10:40:49 +0000</pubDate>
      <link>https://dev.to/ctrlaltvictoria/ai-log-4-vectorization-numpy-in-machine-learning-3npl</link>
      <guid>https://dev.to/ctrlaltvictoria/ai-log-4-vectorization-numpy-in-machine-learning-3npl</guid>
      <description>&lt;p&gt;I am an experienced software engineer diving into AI and machine learning. Are you also learning/interested in learning?&lt;br&gt;
&lt;a href="https://open.substack.com/pub/ctrlaltvictoria"&gt;Learn with me! I’m sharing my learning logs along the way.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3q4LeqxD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lfvjt3zgoi2ahzm7ihdb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3q4LeqxD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lfvjt3zgoi2ahzm7ihdb.png" alt="Vectorization" width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vectorization is a pivotal concept in machine learning; it enables us to handle multi-feature data with efficiency and speed. This learning log delves into vectorization, with a focus on NumPy's role in optimizing operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;NumPy&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;NumPy is an essential Python library in machine learning known for its powerful handling of arrays and matrices. It's the go-to tool for vectorized operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why use vectorization?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vectorization optimizes data processing by handling entire arrays or datasets simultaneously, rather than element-by-element, as in traditional loops. This approach is especially powerful in NumPy due to its ability to leverage the parallel hardware capabilities of computers. Operations like the dot product are executed in parallel, markedly enhancing code efficiency and execution speed compared to sequential for loops.&lt;/p&gt;

&lt;p&gt;The advantage of vectorization becomes even more pronounced in large datasets and complex operations, such as during the gradient descent process in machine learning, where it allows for the simultaneous computation of new feature values.&lt;/p&gt;

&lt;p&gt;Vectorization also helps simplify our code by reducing complexity. The simple NumPy commands below will show what I mean by this.&lt;/p&gt;

&lt;p&gt;tl;dr - if we want our data processing to run faster and more efficiently, we need to use vectorization!&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use vectorization?
&lt;/h3&gt;

&lt;p&gt;In machine learning, models usually have multiple features. These features are often represented in vector format. Consider a model with four features. In such a case, the example data, &lt;em&gt;xi&lt;/em&gt;, consists of input features represented as a vector: [&lt;em&gt;xi&lt;/em&gt;1, &lt;em&gt;xi&lt;/em&gt;2, &lt;em&gt;xi&lt;/em&gt;3, &lt;em&gt;xi&lt;/em&gt;4]. Each of these features is associated with a corresponding weight ([w1, w2, w3, w4]), determined during the model's training phase.&lt;/p&gt;

&lt;p&gt;To use vectorization effectively, we want to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use NumPy arrays to represent these vectors:&lt;/strong&gt; Vectorization is implemented using NumPy arrays, which enable efficient storage and manipulation of data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid for loops:&lt;/strong&gt; Traditional for loops process data element by element and are generally slower. Vectorization replaces these loops with array operations, significantly enhancing computational speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage dot product and other NumPy functions:&lt;/strong&gt; The dot product is a common vectorized operation in machine learning. NumPy's various functions, such as np.dot, np.sum, and np.mean, are designed to operate on whole arrays, making them ideal for vectorized computations.&lt;/li&gt;
&lt;/ol&gt;
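&lt;p&gt;To make the difference concrete, here is a minimal sketch (with illustrative feature and weight values of my own, not from any real model) comparing a for loop against np.dot for the same weighted sum:&lt;/p&gt;

```python
import numpy as np

# Hypothetical feature vector and weights for a four-feature model
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.5, 0.5, 0.5])
b = 1.0

# Loop version: processes one element at a time
loop_result = b
for j in range(len(x)):
    loop_result += w[j] * x[j]

# Vectorized version: one dot product over the whole array
vectorized_result = np.dot(w, x) + b
```

&lt;p&gt;Both produce the same number; on large arrays, the vectorized version is dramatically faster.&lt;/p&gt;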

&lt;h3&gt;
  
  
  &lt;strong&gt;Key Concepts in NumPy&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dimensionality:&lt;/strong&gt; This refers to the number of indices required to select an element from an array. A one-dimensional array needs one index, a two-dimensional array requires two, and so forth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shape:&lt;/strong&gt; This attribute describes an array's dimensions. A 2x3 2D array has a shape of (2,3), while a one-dimensional array (vector) might have a shape of (n, ) for 'n' elements. Individual elements, not being arrays, have a shape of ().&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multidimensional Arrays:&lt;/strong&gt; NumPy arrays can extend beyond two dimensions, adding complexity and flexibility to the data structure and its manipulation.&lt;/li&gt;
&lt;/ul&gt;
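&lt;p&gt;A quick sketch of these concepts, using small illustrative arrays:&lt;/p&gt;

```python
import numpy as np

a1 = np.array([1, 2, 3])          # one-dimensional: one index per element
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])        # two-dimensional: row index and column index

shape1 = a1.shape   # (3,)  -- a vector of 3 elements
shape2 = a2.shape   # (2, 3) -- 2 rows, 3 columns
element = a2[1, 2]  # two indices needed: row 1, column 2 -> 6
```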

&lt;h3&gt;
  
  
  &lt;strong&gt;Common Vectorized NumPy Operations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Creating Arrays&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;np.zeros(num):&lt;/strong&gt; Generates an array of 'num' zeros, defaulting to float64. For example, np.zeros(3) yields an array shaped (3, ).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.random.random_sample(shape):&lt;/strong&gt; Produces an array of random float64 numbers drawn from a uniform distribution over [0, 1). The shape argument should be a tuple that defines the dimensions of the output array. For example, calling random_sample((2, 3)) creates a 2x3 array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.random.rand(d0, d1, ...):&lt;/strong&gt; Also generates random float64 numbers from a uniform distribution over [0, 1). Unlike random_sample, rand accepts multiple arguments, each representing a dimension of the desired output array. For instance, rand(2, 3) directly creates a 2x3 array. It offers a more intuitive way to specify the shape of multidimensional arrays.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.arange(num):&lt;/strong&gt; Creates an array filled with numbers from 0 up to num-1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.array([...]):&lt;/strong&gt; Creates an array with specified values.&lt;/li&gt;
&lt;/ul&gt;
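&lt;p&gt;Here are those creation functions in action (the specific values are just illustrative):&lt;/p&gt;

```python
import numpy as np

zeros = np.zeros(3)                        # three 0.0 values, shape (3,)
samples = np.random.random_sample((2, 3))  # 2x3 array of floats in [0, 1)
randoms = np.random.rand(2, 3)             # same idea, dimensions as separate args
sequence = np.arange(4)                    # array([0, 1, 2, 3])
explicit = np.array([10, 20, 30])          # array with the given values
```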

&lt;p&gt;&lt;strong&gt;Indexing Arrays&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;a[0]:&lt;/strong&gt; Accesses the first element of the array. This operation is consistent with standard coding practices. Attempting to access an index out of range will result in an error.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a[-1]:&lt;/strong&gt; Accesses the last element of the array.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Slicing Arrays&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;a[:3]:&lt;/strong&gt; Retrieves elements at indices up to (but not including) index 3.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a[3:]:&lt;/strong&gt; Selects elements from index 3 to the end of the array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a[:]:&lt;/strong&gt; Accesses all elements within the array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a[start:stop:step]:&lt;/strong&gt; Provides a more flexible slicing option, allowing the selection of elements over a specified range and step.&lt;/li&gt;
&lt;/ul&gt;
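&lt;p&gt;A small sketch of the indexing and slicing operations above, on an illustrative ten-element array:&lt;/p&gt;

```python
import numpy as np

a = np.arange(10)      # array([0, 1, ..., 9])

first = a[0]           # first element: 0
last = a[-1]           # last element: 9
head = a[:3]           # indices 0, 1, 2
tail = a[3:]           # index 3 to the end
evens = a[0:10:2]      # start:stop:step -> every second element
```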

&lt;p&gt;&lt;strong&gt;Single Array Operations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;-a:&lt;/strong&gt; Negates each element in the array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.sum(a):&lt;/strong&gt; Calculates the sum of all elements in the array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.mean(a):&lt;/strong&gt; Computes the average value of the elements in the array.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a**2:&lt;/strong&gt; Squares each element in the array.&lt;/li&gt;
&lt;/ul&gt;
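&lt;p&gt;For example, on a small illustrative array:&lt;/p&gt;

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

negated = -a           # each element negated
total = np.sum(a)      # 1 + 2 + 3 + 4 = 10.0
average = np.mean(a)   # 10 / 4 = 2.5
squared = a**2         # each element squared
```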

&lt;p&gt;&lt;strong&gt;Element-Wise Array Operations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;np.array([-1, 1]) + np.array([1, -1]):&lt;/strong&gt; Results in an array [0, 0]. These operations are applied element-wise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.array([1, 2]) * 2&lt;/strong&gt;: multiplies each element of the array by 2.&lt;/li&gt;
&lt;/ul&gt;
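&lt;p&gt;Sketched in code:&lt;/p&gt;

```python
import numpy as np

# Operations are applied element by element across the arrays
summed = np.array([-1, 1]) + np.array([1, -1])  # [-1+1, 1-1]
doubled = np.array([1, 2]) * 2                  # each element times 2
```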

&lt;p&gt;&lt;strong&gt;Dot Product&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;np.dot(a1, a2):&lt;/strong&gt; Multiplies vectors element-wise and then sums the results, a fundamental operation in many machine learning algorithms. For example, using np.dot(a1, a2) where a1 = [1, 2] and a2 = [3, 4] would compute (1*3) + (2*4).&lt;/li&gt;
&lt;/ul&gt;
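&lt;p&gt;Verifying that example in code:&lt;/p&gt;

```python
import numpy as np

a1 = np.array([1, 2])
a2 = np.array([3, 4])

# Element-wise multiply, then sum: (1*3) + (2*4) = 11
dot = np.dot(a1, a2)
```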

&lt;p&gt;&lt;strong&gt;Matrices&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Matrices, or two-dimensional arrays, are integral in machine learning, and NumPy provides a comprehensive set of functions for their creation and manipulation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Matrix Creation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;np.zeros((m, n)):&lt;/strong&gt; Generates a matrix of zeros with m rows and n columns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.random.random_sample((m, n)):&lt;/strong&gt; Creates a matrix filled with random numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.array([[1], [2], [3]]):&lt;/strong&gt; Creates a matrix with specified values.&lt;/li&gt;
&lt;/ul&gt;
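&lt;p&gt;For example:&lt;/p&gt;

```python
import numpy as np

zeros_matrix = np.zeros((2, 3))                  # 2 rows, 3 columns of 0.0
random_matrix = np.random.random_sample((2, 3))  # 2x3 random floats in [0, 1)
column = np.array([[1], [2], [3]])               # 3x1 matrix of given values
```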

&lt;p&gt;&lt;strong&gt;Reshaping Matrices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;X.reshape(2, 3):&lt;/strong&gt; Changes the shape of the matrix to 2 rows and 3 columns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X.reshape(-1, 1):&lt;/strong&gt; Reshapes the matrix into a column vector.&lt;/li&gt;
&lt;/ul&gt;
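&lt;p&gt;For example, reshaping an illustrative six-element array:&lt;/p&gt;

```python
import numpy as np

X = np.arange(6)          # shape (6,)
grid = X.reshape(2, 3)    # shape (2, 3): 2 rows, 3 columns
col = X.reshape(-1, 1)    # shape (6, 1): a column vector
                          # (-1 asks NumPy to infer that dimension)
```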

&lt;p&gt;&lt;strong&gt;Slicing Matrices&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;X[r, start:stop:step]:&lt;/strong&gt; Accesses elements in row r within a specified range and step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X[:, start:stop:step]:&lt;/strong&gt; Retrieves elements across all rows within a specified range.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X[:, :]:&lt;/strong&gt; Selects all elements in the matrix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X[1, :]:&lt;/strong&gt; Accesses all elements in row 1.&lt;/li&gt;
&lt;/ul&gt;
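&lt;p&gt;A sketch of these slicing patterns on an illustrative 3x4 matrix:&lt;/p&gt;

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]

row_slice = X[1, 0:4:2]   # row 1, every second column: 4 and 6
col_slice = X[:, 1]       # column 1 across all rows: 1, 5, 9
everything = X[:, :]      # the whole matrix
row1 = X[1, :]            # all of row 1: 4, 5, 6, 7
```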

&lt;p&gt;&lt;strong&gt;Advanced Matrix Operations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;np.c_[...]:&lt;/strong&gt; Concatenates arrays along their column boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;np.ptp(arr, axis=0):&lt;/strong&gt; Calculates the peak-to-peak (maximum - minimum) range of elements column-wise.&lt;/li&gt;
&lt;/ul&gt;
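&lt;p&gt;For example, with two illustrative column vectors:&lt;/p&gt;

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

combined = np.c_[a, b]             # 3x2 matrix: a and b as side-by-side columns
ranges = np.ptp(combined, axis=0)  # column-wise max minus min: [2, 2]
```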

&lt;p&gt;These indexing, slicing, and operation techniques in NumPy enable efficient handling and manipulation of data in machine learning, demonstrating the practical benefits of vectorization.&lt;/p&gt;

&lt;p&gt;I only showed a few examples. NumPy is an incredibly extensive library, and its mathematical functions go far beyond simple array operations!&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vectorization, particularly when utilizing NumPy, is an essential concept in machine learning. It dramatically streamlines and accelerates computations, making processing large datasets and executing complex operations much more efficient.&lt;/p&gt;

&lt;p&gt;By leveraging NumPy's array-centric design and functions, we can perform bulk operations on data without the need for slow, iterative loops. This approach not only speeds up the execution but also makes the code more readable and concise.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Disclosure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I am taking Andrew Ng’s Machine Learning Specialization, and these learning logs contain some of what I learned from it. It’s a great course. I highly recommend it!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI Log #3: What is Gradient Descent?</title>
      <dc:creator>Victoria Johnston</dc:creator>
      <pubDate>Wed, 29 Nov 2023 14:06:26 +0000</pubDate>
      <link>https://dev.to/ctrlaltvictoria/ai-log-3-what-is-gradient-descent-4a0k</link>
      <guid>https://dev.to/ctrlaltvictoria/ai-log-3-what-is-gradient-descent-4a0k</guid>
      <description>&lt;p&gt;I am an experienced software engineer diving into AI and machine learning. Are you also learning/interested in learning?&lt;br&gt;
&lt;a href="https://open.substack.com/pub/ctrlaltvictoria" rel="noopener noreferrer"&gt;Learn with me! I’m sharing my learning logs along the way.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Disclosure: I have already learnt basic machine learning, but I am starting this learning log from the beginning because I need a refresher 😅.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Function Recap
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/ctrlaltvictoria/ai-log-2-what-is-a-cost-function-in-machine-learning-9"&gt;previous learning log&lt;/a&gt;, I covered model parameters and cost functions. Here is a quick summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parameters of a Model: Variables that can be altered during training to enhance the model. For example, in the model y = wx + b, w and b are parameters, with one input feature, x.&lt;/li&gt;
&lt;li&gt;Cost Function, J: Indicates the accuracy of a model’s predictions against example data. A smaller J implies closer predicted values (ŷ) to actual values (y), signifying better parameter choices and an improved model.&lt;/li&gt;
&lt;li&gt;J Calculation: J is computed across the example dataset and varies with the choice of parameter values. Thus, in a model with parameters w and b, the cost function is represented as J(w, b).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Gradient Descent
&lt;/h2&gt;

&lt;p&gt;When training a model, our goal is to discover a function that best fits our example data. We achieve this by adjusting parameter values to obtain the lowest possible cost, J.&lt;/p&gt;

&lt;p&gt;Let’s first consider the most obvious way to find optimal parameter values.&lt;/p&gt;

&lt;p&gt;We could test every possible parameter value. At first glance, this solution may seem to work; however, there are two key pitfalls. Firstly, it is impossible to exhaustively test all possible parameter values, as parameters are real numbers; they have infinite range and decimal precision! Secondly, it would be incredibly inefficient and time-consuming.&lt;/p&gt;

&lt;p&gt;We need a more systematic and innovative solution.&lt;/p&gt;

&lt;p&gt;Enter gradient descent. This algorithm involves iteratively testing different parameters and calculating the cost at each step. A ‘step’ is synonymous with iteration — each represents a move closer to the optimal parameter values. Gradient descent is similar to our naive solution in that it requires trying out different parameter values and many steps. The critical difference lies in choosing the next set of parameter values. In gradient descent, we choose the next values based on gradient calculations, leading to a much more directed and efficient process.&lt;/p&gt;

&lt;p&gt;Here is how it works on a high level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starting Point: We begin with specific parameter values, usually chosen randomly or through an informed guess.&lt;/li&gt;
&lt;li&gt;Direction of Descent: We determine the direction to move by calculating the gradient.&lt;/li&gt;
&lt;li&gt;Updating Parameters: We adjust the parameter values to reduce J using the gradient.&lt;/li&gt;
&lt;li&gt;Repetition: This process continues until we meet certain end conditions. Each repetition is a step.&lt;/li&gt;
&lt;/ul&gt;
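&lt;p&gt;Here is a minimal sketch of this loop in Python for a one-parameter model, using an illustrative cost J(w) = (w - 3)^2 (not a cost function from any real model), whose gradient is 2(w - 3) and whose minimum is at w = 3:&lt;/p&gt;

```python
# Minimal gradient descent sketch for the illustrative cost J(w) = (w - 3)**2
def gradient_descent(w_start, learning_rate, num_steps):
    w = w_start
    for _ in range(num_steps):
        gradient = 2 * (w - 3)            # slope of J at the current w
        w = w - learning_rate * gradient  # step in the direction of descent
    return w

w_final = gradient_descent(w_start=0.0, learning_rate=0.1, num_steps=100)
```

&lt;p&gt;Each iteration of the loop is one ‘step’; the parameter moves steadily toward the minimum at w = 3.&lt;/p&gt;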

&lt;p&gt;Here is what gradient descent may look like on a cost graph for a model with one parameter. In this diagram, we use gradient descent to take steps that get us closer to the minimum.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi09g6l6tvoprjong9ae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhi09g6l6tvoprjong9ae.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Gradient descent can handle any number of parameters (it just becomes increasingly complex to visualise!). In general, as long as we have a cost function, we can use gradient descent to find parameter values that minimise the cost, J.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mechanics of Gradient Descent
&lt;/h2&gt;

&lt;p&gt;To truly grasp gradient descent, let’s delve into the details.&lt;/p&gt;

&lt;p&gt;Think back to high school maths… What does the gradient help us calculate? The slope! The gradient in this context gives us the slope of the cost function. Knowing this slope enables us to identify the direction of the steepest descent.&lt;/p&gt;

&lt;p&gt;This concept made more sense when I tried to visualise it. Say I plot a graph of the cost J at different parameter values. Here is an example of how a cost graph with two parameters might look:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkpxulh2kzjgnsh9v5d7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkpxulh2kzjgnsh9v5d7.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine standing on this slope and looking for the quickest way down. Which direction should I move? Down the slope! Ideally, I would move in the steepest direction from my current position. This concept is the essence of gradient descent!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rpwih8bk3wjpir4se7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rpwih8bk3wjpir4se7q.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A critical aspect of this process is the size of the step we take down the slope. This is influenced by a crucial factor known as the learning rate, α. This rate is a positive number dictating the magnitude of each parameter change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gradient Descent Math
&lt;/h2&gt;

&lt;p&gt;Now it’s time for some math.&lt;/p&gt;

&lt;p&gt;Let’s introduce the mathematical formulae central to gradient descent. For each parameter, we calculate a new value using its gradient. The learning rate, α, plays a vital role here, determining the size of the step we take toward the steepest descent.&lt;/p&gt;

&lt;p&gt;The formula for updating each parameter is unique, ensuring specific and effective adjustments. For instance, the updates for parameters w1, w2, etc., in a multi-parameter model are as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4yx2jmfey4ey9sfvo0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4yx2jmfey4ey9sfvo0s.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0zcam16wemnfkaklazq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0zcam16wemnfkaklazq.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
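&lt;p&gt;As a sketch, here is what these updates look like in code for a one-feature linear model f(x) = wx + b with the MSE cost, using illustrative training data I made up (y = 2x, so the optimal parameters are w = 2, b = 0):&lt;/p&gt;

```python
import numpy as np

# Gradient descent for f(x) = w*x + b with the MSE cost
# J = (1 / (2*m)) * sum((w*x + b - y)**2), on illustrative data.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # generated by y = 2x
w, b = 0.0, 0.0
alpha = 0.1                      # learning rate
m = len(x)

for _ in range(1000):
    errors = (w * x + b) - y
    dj_dw = np.dot(errors, x) / m    # partial derivative of J w.r.t. w
    dj_db = np.sum(errors) / m       # partial derivative of J w.r.t. b
    w = w - alpha * dj_dw            # update both parameters each step
    b = b - alpha * dj_db
```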

&lt;h2&gt;
  
  
  Why do we use the partial derivative?
&lt;/h2&gt;

&lt;p&gt;In gradient descent, using partial derivatives instead of full derivatives is a deliberate choice. This is because, in multi-parameter models, each parameter uniquely influences the model’s output. The partial derivative hones in on the impact of a single parameter change, keeping all others constant. For example, adjusting w1 affects the cost J, independent of changes in w2 or b.&lt;/p&gt;

&lt;p&gt;Consequently, the gradient descent update for each parameter is tailored with its own partial derivative, allowing for precise, individualised adjustments:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9pmu4fcef35zudr5t4x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9pmu4fcef35zudr5t4x.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning Rate
&lt;/h2&gt;

&lt;p&gt;As mentioned, the learning rate, α, is a pivotal element that dictates the step size in our journey towards the cost function’s minimum. Thus, picking a reasonable learning rate is critical. Here is how choosing a bad learning rate can negatively affect gradient descent and, in some cases, stop it from working at all:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overshooting with Large Steps: A high α makes our steps too large. We risk stepping past the minimum and onto another slope. We may then continue to bounce from slope to slope without ever settling at the minimum. Here is what overshooting may look like:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqroxrsxv6wy7y5eahm6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqroxrsxv6wy7y5eahm6.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Usually, we can tell if this is happening if the gradient fluctuates back and forth (e.g. between positive and negative) in each iteration. Sometimes, we can get lucky and move closer to the optimal minimum.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Divergence: In extreme cases, these oversized steps don’t just result in bouncing but escalate, moving us further from the minimum and causing the algorithm to diverge entirely:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85u96npqld5mvwtd3rtj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85u96npqld5mvwtd3rtj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inefficient with Tiny Steps: A tiny learning rate can significantly slow the journey, resulting in minuscule progress towards the minimum. Gradient descent will still work, but it will be highly inefficient.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjkp9dhr3ffcrxz9w7tk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjkp9dhr3ffcrxz9w7tk.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips for Fine-Tuning the Learning Rate
&lt;/h2&gt;

&lt;p&gt;To optimise the learning rate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experiment with increments (e.g., 0.001, 0.01, 0.1), seeking a rate that consistently reduces the cost.&lt;/li&gt;
&lt;li&gt;Start with a very small alpha to ensure consistent cost reduction, then gradually increase.&lt;/li&gt;
&lt;li&gt;Aim for the largest learning rate that still guarantees consistent and rapid cost reduction.&lt;/li&gt;
&lt;/ul&gt;
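&lt;p&gt;Sketched in code, a sweep like this checks which rates consistently reduce an illustrative cost J(w) = (w - 3)^2 (the rates and cost are my own examples, not prescriptions):&lt;/p&gt;

```python
# Try a few learning rates and keep only those that consistently reduce the cost
def cost_decreases(alpha, steps=10):
    w = 0.0
    cost = (w - 3) ** 2
    for _ in range(steps):
        w = w - alpha * 2 * (w - 3)   # gradient step on J(w) = (w - 3)**2
        new_cost = (w - 3) ** 2
        if new_cost > cost:           # cost went up: this rate is too large
            return False
        cost = new_cost
    return True

results = {alpha: cost_decreases(alpha) for alpha in (0.001, 0.01, 0.1, 1.5)}
```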

&lt;h2&gt;
  
  
  When to Stop
&lt;/h2&gt;

&lt;p&gt;Determining the right moment to stop gradient descent is a strategic decision. We can choose different end conditions; here are a few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Convergence: We stop when parameter changes are negligible, suggesting proximity to the minimum.&lt;/li&gt;
&lt;li&gt;Fixed Number of Steps: We set a predetermined number of iterations for practical reasons, especially when computational resources are a constraint.&lt;/li&gt;
&lt;li&gt;Threshold-Based: We halt when the decrease in cost J falls below a set threshold, indicating diminishing returns on further iterations.&lt;/li&gt;
&lt;/ul&gt;
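&lt;p&gt;Here is a sketch of a threshold-based stopping rule combined with a fixed step cap, on an illustrative cost J(w) = (w - 3)^2:&lt;/p&gt;

```python
# Stop once the decrease in cost J falls below a threshold (diminishing returns),
# or after a fixed maximum number of steps.
def descend_until_converged(w, alpha, threshold, max_steps):
    cost = (w - 3) ** 2
    for step in range(max_steps):
        w = w - alpha * 2 * (w - 3)   # gradient step on J(w) = (w - 3)**2
        new_cost = (w - 3) ** 2
        if threshold > cost - new_cost:   # cost barely changed: stop
            return w, step + 1
        cost = new_cost
    return w, max_steps

w_final, steps_taken = descend_until_converged(
    w=0.0, alpha=0.1, threshold=1e-6, max_steps=10000)
```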

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Gradient descent optimises model parameters by iteratively adjusting them based on gradient calculations, aiming to minimise the cost function, J.&lt;/li&gt;
&lt;li&gt;The algorithm involves determining the steepest descent direction using partial derivatives and adjusting each parameter with a calculated step size influenced by the learning rate (α). The process repeats until it meets specific end conditions, like convergence or a fixed number of iterations.&lt;/li&gt;
&lt;li&gt;The choice of learning rate is crucial as it influences the efficiency of reaching the minimum cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gradient descent is a highly versatile algorithm applied in a wide array of machine learning methods, including complex models like deep learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclosure
&lt;/h2&gt;

&lt;p&gt;I am taking Andrew Ng’s Machine Learning Specialization, and these learning logs contain some of what I learned from it. It’s a great course. I highly recommend it!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI Log #2: What is a Cost Function in Machine Learning?</title>
      <dc:creator>Victoria Johnston</dc:creator>
      <pubDate>Fri, 17 Nov 2023 07:41:35 +0000</pubDate>
      <link>https://dev.to/ctrlaltvictoria/ai-log-2-what-is-a-cost-function-in-machine-learning-9</link>
      <guid>https://dev.to/ctrlaltvictoria/ai-log-2-what-is-a-cost-function-in-machine-learning-9</guid>
      <description>&lt;p&gt;I am an experienced software engineer diving into AI and machine learning. Are you also learning/interested in learning?&lt;br&gt;
&lt;a href="https://open.substack.com/pub/ctrlaltvictoria"&gt;Learn with me! I’m sharing my learning logs along the way.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Disclosure: I have already learnt basic machine learning, but I am starting this learning log from the beginning because I need a refresher 😅.&lt;/p&gt;

&lt;h2&gt;
  
  
  Log #2: Cost Functions in Machine Learning
&lt;/h2&gt;

&lt;p&gt;Cost functions indicate how well a machine learning model performs against a set of example data.&lt;/p&gt;

&lt;p&gt;The output of a cost function is referred to as ‘the cost’ or ‘J’. A small cost suggests that the model performs well, which means that the difference between the model’s predicted value and the actual value is generally small. In contrast, a high cost suggests the opposite and that the difference between the predicted and actual values is considerable.&lt;/p&gt;

&lt;p&gt;Different machine learning methods use different cost functions, and several cost functions can exist even within a single method! For instance, the cost function used on a regression model may differ from that on a classification model or neural network. Regression models alone have multiple cost functions, for example, mean error (ME), mean absolute error (MAE), mean squared error (MSE), etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Functions in Model Training
&lt;/h2&gt;

&lt;p&gt;Cost functions are essential for training a model. During training, we use them to determine the weights and biases for our model’s function.&lt;/p&gt;

&lt;p&gt;Say I have a simple regression problem (covered in my last learning log). I have a series of example data (training data), each with one input feature (x) and the actual output (y).&lt;/p&gt;

&lt;p&gt;Here is the function for the simple regression model:&lt;br&gt;
f(x) = ŷ = wx + b&lt;/p&gt;

&lt;p&gt;We refer to ‘w’ and ‘b’ as the parameters. Parameters of a model are variables that can be changed during training to improve the model.&lt;/p&gt;

&lt;p&gt;During training, I want to find values for these parameters that produce a predicted value (ŷ) as close to the actual value (y) across all my example data.&lt;/p&gt;

&lt;p&gt;We refer to the difference between ŷ (predicted) and y (actual) as the error. A cost function computes an error across the whole set of example data, factoring in each example's error. In other words, it is a computed ‘collective’ error across the group of example data, not the error for a single example!&lt;/p&gt;

&lt;p&gt;Thus, when I write above, “a predicted value (ŷ) as close to the actual value (y) across all my example data,”&lt;/p&gt;

&lt;p&gt;I equivalently mean “the lowest computed cost across all my example data,”&lt;/p&gt;

&lt;p&gt;or “the lowest value of the cost function across all my example data.”&lt;/p&gt;

&lt;p&gt;We achieve this by experimenting with different parameters and calculating the cost at each step until we find values that yield the lowest cost.&lt;/p&gt;

&lt;p&gt;The cost function is vital; without it, we could not determine how good specific parameter values are and thus would be unable to choose optimal parameters for our model. A model is only as good as the parameter values we choose for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Regression Cost Functions
&lt;/h2&gt;

&lt;p&gt;I will only showcase some cost functions in this learning log and cover the rest in detail in future logs. At this stage, it is more important to understand why cost functions are essential and how we use them.&lt;/p&gt;

&lt;p&gt;As I mentioned already, there are different types of cost functions. Here are some regression cost functions:&lt;/p&gt;

&lt;p&gt;The most straightforward cost function is called mean error (ME). It works exactly how it sounds: the mean error across the set of example data. However, it is generally not recommended because the errors can be positive or negative, so adding them together may not reflect the collective error effectively.&lt;/p&gt;

&lt;p&gt;An alternative cost function is the mean absolute error (MAE), where we can calculate the mean of the absolute values of the errors instead.&lt;/p&gt;

&lt;p&gt;One of the most commonly used regression cost functions is the mean squared error (MSE). It is calculated by summing the squares of the errors and dividing by twice the number of examples, i.e. calculating the mean of the squared errors and then dividing by 2. Note: some variations of MSE do not divide by 2, but including it makes the downstream derivative calculus cleaner.&lt;/p&gt;

&lt;p&gt;Here is what the MSE cost function looks like in mathematical notation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PjSs_ahx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vt45ek4zo50collc7z8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PjSs_ahx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vt45ek4zo50collc7z8n.png" alt="Image description" width="682" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In some versions of the function, the output is the cost, ‘J’.&lt;/p&gt;
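&lt;p&gt;Here is a rough sketch of the three cost functions above in code. The toy arrays and helper names are my own, not from the course:&lt;/p&gt;

```javascript
// Toy sketch of three regression cost functions.
// yHat holds predicted values, y the actual values (hypothetical arrays).
const errors = (yHat, y) => yHat.map((p, i) => p - y[i]);

// Mean error (ME): positive and negative errors can cancel out.
const meanError = (yHat, y) =>
  errors(yHat, y).reduce((s, e) => s + e, 0) / y.length;

// Mean absolute error (MAE): take absolute values before averaging.
const meanAbsoluteError = (yHat, y) =>
  errors(yHat, y).reduce((s, e) => s + Math.abs(e), 0) / y.length;

// Mean squared error (MSE), with the conventional extra division by 2.
const meanSquaredError = (yHat, y) =>
  errors(yHat, y).reduce((s, e) => s + e * e, 0) / (2 * y.length);

meanError([2, 4], [1, 5]);         // → 0 (the +1 and -1 errors cancel)
meanAbsoluteError([2, 4], [1, 5]); // → 1
meanSquaredError([2, 4], [1, 5]);  // → 0.5
```

&lt;p&gt;Note how ME reports zero error even though both predictions are wrong, which is exactly why it is generally not recommended.&lt;/p&gt;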

&lt;h2&gt;
  
  
  Visualising Cost
&lt;/h2&gt;

&lt;p&gt;We can visualise the error of a simple regression model by observing the difference between ŷ (the value of y on the line) and y (the actual value).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MFhoUtnd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x901smzlekgresffn8ax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MFhoUtnd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x901smzlekgresffn8ax.png" alt="Image description" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we learned in the last learning log, we want the line to fit the example data as well as possible. Intuitively, the line of best fit is the one with the smallest collective error across the example data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Graph
&lt;/h2&gt;

&lt;p&gt;We use another graph, the cost graph, to visualise the cost across different parameter values. For simplicity, let’s say that our model only depends on one parameter, θ. We can plot a graph with θ on the x-axis and the cost, J, on the y-axis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XQC0mbMX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d1v2qenm4vjebi3dwu9v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XQC0mbMX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d1v2qenm4vjebi3dwu9v.png" alt="Image description" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we can visualise the cost for all values of θ. The best value of θ would be the one that yields the lowest cost — the value of θ at the bottom of the U shape. The further we move from the optimal value of θ, the higher the cost and the worse the model.&lt;/p&gt;
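&lt;p&gt;Here is a small sketch of that idea; the one-parameter model, example data, and candidate values are all hypothetical. We sweep values of θ, compute the cost at each, and keep the one at the bottom of the U:&lt;/p&gt;

```javascript
// Hypothetical one-parameter model f(x) = θx, fit to toy example data.
const xs = [1, 2, 3];
const ys = [2, 4, 6]; // the underlying relationship is y = 2x

// MSE-style cost J(θ), with the conventional division by 2.
const cost = (theta) =>
  xs.reduce((s, x, i) => s + (theta * x - ys[i]) ** 2, 0) / (2 * xs.length);

// Sweep candidate values of θ and keep the one with the lowest cost.
const candidates = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4];
const best = candidates.reduce((a, theta) =>
  cost(theta) >= cost(a) ? a : theta
);
// best → 2, the value of θ at the bottom of the U shape (cost 0)
```

&lt;p&gt;In practice, gradient descent replaces this brute-force sweep, but the picture is the same: slide θ toward the bottom of the cost curve.&lt;/p&gt;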

&lt;p&gt;As a general rule, the cost function graph for a regression problem will be U/bow/hammock-shaped. When there is only one parameter, we can visualise it in two dimensions, where it will look like a U shape. If there are two parameters, we can visualise it in three dimensions, and it will look more like a hammock.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PBFeryBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/srf9ydpmjtgkh0byl046.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PBFeryBu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/srf9ydpmjtgkh0byl046.png" alt="Image description" width="495" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same trend follows as we add more parameters, but visualisation becomes increasingly tricky.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cost functions are critical in machine learning.&lt;/li&gt;
&lt;li&gt;We use cost functions when training a model to determine how good the model is.&lt;/li&gt;
&lt;li&gt;During training, we continuously calculate the cost with different parameter values to decide which parameter values yield the best model.&lt;/li&gt;
&lt;li&gt;Different machine learning methods use different cost functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Disclosure
&lt;/h2&gt;

&lt;p&gt;I am taking Andrew Ng’s Machine Learning Specialization, and these learning logs contain some of what I learned from it. It’s a great course. I highly recommend it!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>AI Log #1: Machine Learning To Linear Regression</title>
      <dc:creator>Victoria Johnston</dc:creator>
      <pubDate>Mon, 13 Nov 2023 09:57:17 +0000</pubDate>
      <link>https://dev.to/ctrlaltvictoria/ai-log-1-machine-learning-to-linear-regression-2bl3</link>
      <guid>https://dev.to/ctrlaltvictoria/ai-log-1-machine-learning-to-linear-regression-2bl3</guid>
      <description>&lt;p&gt;I am an experienced software engineer diving into AI and machine learning. Are you also learning/interested in learning? Learn with me! I’m sharing my learning logs along the way.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclosure: I have already learnt basic machine learning, but I am starting this learning log from the beginning because I need a refresher 😅.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Log 1: Linear Regression
&lt;/h2&gt;

&lt;p&gt;My learning journey starts with Linear Regression because it is a fundamental building block for understanding machine learning.&lt;/p&gt;

&lt;p&gt;However, before diving into linear regression, I will jog my memory on machine learning, supervised learning, and regression models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Machine Learning
&lt;/h2&gt;

&lt;p&gt;This definition was the one I found most helpful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data — Bing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The way I understand it:&lt;/p&gt;

&lt;p&gt;Machine Learning is when we use machines to determine an outcome without explicitly coding the logic that determines that outcome. Instead, we determine the outcome based on example data plus fancy math (algorithms and statistical models).&lt;/p&gt;

&lt;p&gt;Here is an example I will use throughout this learning log.&lt;/p&gt;

&lt;p&gt;Say I want to predict the amount of snow (centimetres) based on temperatures (degrees) below freezing.&lt;/p&gt;

&lt;p&gt;Non-machine learning way - I use some hard-coded logic: ‘The amount of snow will be the absolute value of the temperature. So if it’s -1°C, we get 1 cm of snow.’&lt;/p&gt;

&lt;p&gt;The non-machine learning method can, at times, be suitable. However, a clear set of rules cannot define all situations! Sometimes, finding patterns in data and using algorithms and statistical models is more appropriate, and this method would require machine learning. Unlike traditional programming, where we code all possible conditions and outcomes, machine learning algorithms learn to make decisions from data.&lt;/p&gt;

&lt;p&gt;Interestingly, the machine learning method is the solution that I think of instinctively if someone asks me how I would predict the amount of snow based on temperature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Machine Learning Lingo
&lt;/h2&gt;

&lt;p&gt;Input Features (x): The data we feed in, like temperature in degrees.&lt;br&gt;
Outcome Variable (y): What we’re predicting, like snowfall in centimetres.&lt;br&gt;
Predicted Outcome (ŷ): The outcome our model predicts.&lt;br&gt;
Model (f): The math that transforms ‘x’ into ‘ŷ’.&lt;br&gt;
Example Data: The historical data we use for training.&lt;br&gt;
We use ‘m’ to denote the amount of example data we have.&lt;br&gt;
We refer to a specific example data as x_i_ or y_i_, where &lt;em&gt;i&lt;/em&gt; is the example data’s index (or row).&lt;/p&gt;

&lt;p&gt;All together now: f(x) = ŷ&lt;/p&gt;

&lt;p&gt;I want to make something very clear:&lt;/p&gt;

&lt;p&gt;f(x_i_) will not necessarily be y_i_, because the statistical model we determine may not predict every example perfectly; in fact, a model that fits all the points exactly is unlikely!&lt;/p&gt;

&lt;p&gt;For this reason, we have the symbol ŷ as the output of f(x); it is the predicted value and may differ from the actual value!&lt;/p&gt;

&lt;h2&gt;
  
  
  Supervised Learning
&lt;/h2&gt;

&lt;p&gt;Machine learning has two primary subcategories: supervised and unsupervised learning.&lt;/p&gt;

&lt;p&gt;Supervised learning is when the type of our desired outcome is known, a.k.a. we tell the model what we want it to predict.&lt;/p&gt;

&lt;p&gt;We tell the model, ‘Here — given these inputs, I want you to predict this outcome type.’&lt;/p&gt;

&lt;p&gt;The example data we train with must have input features and outcome variables.&lt;/p&gt;

&lt;p&gt;The snow/temperature example I used above is an example of supervised learning. In our example: ‘Here — given negative temperature in degrees, I want you to predict the centimetres of snow.’&lt;/p&gt;

&lt;p&gt;I will cover unsupervised learning in future diary entries, but out of curiosity, it is when we don’t tell the model the outcome we want. We give it data and ask it to find something interesting! The model finds patterns, and we can hopefully use the patterns to help us solve our problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Regression
&lt;/h2&gt;

&lt;p&gt;Regression is a method of supervised learning that predicts a continuous numeric value. A continuous numeric value is any real value, like an integer or floating point number.&lt;/p&gt;

&lt;p&gt;For example, the number of centimetres of snow that the model can predict is 3.00cm.&lt;br&gt;
It can also be 2.33212cm.&lt;br&gt;
It can also be 1.2393cm.&lt;br&gt;
And so on.&lt;/p&gt;

&lt;p&gt;I am using my snow/temperature example above because it, too, can be a regression problem. The outcome we want? The number of centimetres of snow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Linear Regression
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Linear regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It’s used to predict values within a continuous range. — ML cheatsheet&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;How I make sense of it:&lt;/p&gt;

&lt;p&gt;It falls under a regression method because its output is continuous.&lt;br&gt;
It is ‘linear’ because its model relies on a mathematical equation with a constant slope; it models a linear relationship between input and output.&lt;/p&gt;

&lt;p&gt;There are two main types: simple linear regression and multi-variable linear regression.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple linear regression.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is also known as univariate linear regression.&lt;br&gt;
Remember this straight-line equation from high school math that we used on 2D graphs (only with an x and y axis)?&lt;/p&gt;

&lt;p&gt;y = mx + c&lt;/p&gt;

&lt;p&gt;&lt;em&gt;x is the value on the x-axis, y is the value on the y-axis, m is the gradient (the slope), c is the intercept&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The equation for simple linear regression in machine learning is the same, but the gradient is ‘w’ for weight, and the intercept is ‘b’ for bias.&lt;/p&gt;

&lt;p&gt;f(x) = ŷ = wx + b&lt;/p&gt;

&lt;p&gt;The model function definition sometimes includes ‘w’ and ‘b’.&lt;/p&gt;

&lt;p&gt;f_wb_(x) = wx + b&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bBTr0gu6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sc4n9oi2g2ku7xpe1ao3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bBTr0gu6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sc4n9oi2g2ku7xpe1ao3.png" alt="Image description" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A simple linear regression equation could be&lt;br&gt;
f(x) = -1/2x&lt;br&gt;
If the temperature is -2 degrees, the predicted snow amount would be -1/2 × (-2) = 1 cm.&lt;br&gt;
Another simple linear regression equation could be&lt;br&gt;
f(x) = -1/2x + 1&lt;br&gt;
If the temperature is -2 degrees, then the predicted amount of snow would be (-1/2 × (-2)) + 1 = 2 cm.&lt;/p&gt;
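&lt;p&gt;As a quick sketch, the two toy models above can be written directly in code (the helper name is mine, for illustration):&lt;/p&gt;

```javascript
// Simple linear regression model: f(x) = wx + b.
const makeModel = (w, b) => (x) => w * x + b;

// The two toy models from the text: weight -1/2, bias 0 and 1.
const modelA = makeModel(-0.5, 0);
const modelB = makeModel(-0.5, 1);

modelA(-2); // → 1 cm of predicted snow at -2 degrees
modelB(-2); // → 2 cm
```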

&lt;p&gt;We are using math to predict the outcome! The logic is not explicitly declared or hard-coded. We use past examples to help us determine the equation we use to do it, and the equation’s weight and bias are likely to keep changing as we add more examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-variable linear regression.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-variable linear regression is a linear regression that relies on multiple variables. Very often, our models will depend on more than one variable! Regarding our snow/temperature example, we will likely want to use more than temperature to determine the amount of snowfall. We could also use the altitude, humidity, etc.&lt;/p&gt;

&lt;p&gt;Multi-variable linear regression employs the same technique as simple linear regression but with a constant slope along each input dimension.&lt;/p&gt;

&lt;p&gt;Here is what its equation looks like:&lt;/p&gt;

&lt;p&gt;f(x_1_, x_2_) = ŷ = w_1_x_1_ + w_2_x_2_ + b&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X92I8eGH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3sxq96g87s9a2w4sgbmg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X92I8eGH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3sxq96g87s9a2w4sgbmg.png" alt="Image description" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calculate the weights and bias.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instinctively, for our linear regression equation to be most effective, we would want it to fit our example data as closely as possible. We want to find values for our weights and biases that minimise the error between ŷ and the actual value of y across all our example data.&lt;/p&gt;

&lt;p&gt;We use a ‘cost function’ to quantify the difference across all example data; thus, we want to find the weights and biases that produce the minimal cost function output.&lt;/p&gt;

&lt;p&gt;We can then use a method known as ‘gradient descent’ to determine which weights and biases produce the minimal cost function output. In future learning logs, I will cover cost functions and gradient descent in more detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Machine Learning = machines help us determine an outcome without explicitly coding the logic that determines that outcome.&lt;br&gt;
Supervised learning = subcategory of machine learning where we know the outcome type we want.&lt;br&gt;
Regression algorithm = A method of supervised learning where we predict a numeric outcome.&lt;br&gt;
Linear regression = A type of regression algorithm where the predicted output is continuous and has a constant slope.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disclosure
&lt;/h2&gt;

&lt;p&gt;I am taking Andrew Ng’s Machine Learning Specialization, and these learning logs contain some of what I learned from it. It’s a great course. I highly recommend it!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Mastering Express.js Error Handling: From Custom Errors to Enhanced Error Responses</title>
      <dc:creator>Victoria Johnston</dc:creator>
      <pubDate>Fri, 18 Aug 2023 07:33:12 +0000</pubDate>
      <link>https://dev.to/ctrlaltvictoria/mastering-expressjs-error-handling-from-custom-errors-to-enhanced-error-responses-55p8</link>
      <guid>https://dev.to/ctrlaltvictoria/mastering-expressjs-error-handling-from-custom-errors-to-enhanced-error-responses-55p8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hello again, it’s Victoria here 👋&lt;/p&gt;

&lt;p&gt;Let’s be real. In software development, errors are a given! It’s not about avoiding errors, but rather how you manage them that truly separates a well-constructed piece of software from the rest.&lt;/p&gt;

&lt;p&gt;Today, we’re going to delve into the world of error handling within Express.js, a versatile and minimalist web application framework for Node.js. We’ll go from understanding how Express.js processes errors, to creating and applying custom errors, all while enhancing the quality of your code!&lt;/p&gt;

&lt;h2&gt;
  
  
  I. Understanding Error Handling in Express.js
&lt;/h2&gt;

&lt;p&gt;Express.js comes equipped with built-in error handling mechanisms. However, to leverage these effectively, it’s crucial to comprehend how errors propagate within Express.&lt;/p&gt;

&lt;p&gt;Consider a classic Express.js middleware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.get('/users', async (req, res, next) =&amp;gt; { 
  try { 
    const users = await getUsersFromDatabase();
    res.json(users); 
  } catch (error) { 
    next(error); 
  } 
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this scenario, if getUsersFromDatabase() throws an error, the catch block captures it and passes it to next(). This handoff facilitates the error's journey down the middleware stack until it encounters our dedicated error handling middleware. By delegating error handling to specific middleware, we ensure consistency across our routes and sidestep code duplication. &lt;a href="https://expressjs.com/en/guide/error-handling.html"&gt;This insightful article&lt;/a&gt; from the Express.js guide offers a deeper look.&lt;/p&gt;

&lt;h2&gt;
  
  
  II. Custom Error Handling Middleware in Express.js
&lt;/h2&gt;

&lt;p&gt;Error handling middleware in Express.js resembles other middleware, save for one critical difference: it accepts four arguments ((err, req, res, next)) instead of three. Here's a rudimentary example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.use((err, req, res, next) =&amp;gt; { 
  console.error(err); 
  res.status(500).send('An error occurred!'); 
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this instance, all errors, irrespective of their type or severity, receive the same bland response. To make this more dynamic, we need to generate and employ custom errors.&lt;/p&gt;

&lt;p&gt;This basic error handling middleware does its job, but it lacks usability! Every error triggers the same monotonous response, “An error occurred!”, paired with a generic 500 status code, which makes it challenging to identify the root cause of the issue. We might see the original error logged, but without context, debugging can be cumbersome. To inject expressiveness and utility into our errors, let’s craft some custom ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  III. Creating Custom Errors
&lt;/h2&gt;

&lt;p&gt;Constructing custom errors not only imbues our errors with greater context but also streamlines the process of identifying and resolving them.&lt;/p&gt;

&lt;p&gt;Let’s fashion a DatabaseError and a ValidationError class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class DatabaseError extends Error { 
  constructor(message) { 
    super(message); 
    this.name = 'DatabaseError'; 
    this.statusCode = 500; 
  } 
} 

class ValidationError extends Error {
  constructor(message) { 
    super(message); 
    this.name = 'ValidationError'; 
    this.statusCode = 400; 
  } 
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These classes augment the native Error class and incorporate a statusCode property. We'll harness this property when formulating error responses. If you're wondering about the significance of creating custom errors, &lt;a href="https://javascript.info/custom-errors"&gt;this comprehensive write-up&lt;/a&gt; provides a thorough explanation.&lt;/p&gt;

&lt;h2&gt;
  
  
  IV. Using Custom Errors
&lt;/h2&gt;

&lt;p&gt;With our custom errors primed and ready, let’s modify our earlier middleware example to utilize them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.get('/users', async (req, res, next) =&amp;gt; { 
  try { 
    const users = await getUsersFromDatabase(); 
    res.json(users); 
  } catch (error) { 
    if (error instanceof SomeDatabaseSpecificError) { 
      next(new DatabaseError('Failed to retrieve users from database!')); 
    } else { 
      next(error); 
    } 
  } 
 });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, when we encounter a database-specific error, we create a new instance of our DatabaseError and pass it to next(). Other types of errors continue to be passed as-is.&lt;/p&gt;

&lt;h2&gt;
  
  
  V. Customizing Error Responses
&lt;/h2&gt;

&lt;p&gt;Armed with our custom errors, it’s time to polish our error handling middleware to yield appropriate responses based on the error type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.use((err, req, res, next) =&amp;gt; { 
  if (err instanceof DatabaseError || err instanceof ValidationError) { 
    console.error(`${err.name}: ${err.message}`);
    res.status(err.statusCode).json({ error: err.message }); 
  } else { 
    console.error(err); 
    res.status(500).send('An error occurred!'); 
  } 
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  VI. A Comprehensive Example
&lt;/h2&gt;

&lt;p&gt;Let’s consolidate all these techniques into a single application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class DatabaseError extends Error { /* ... */ }
class ValidationError extends Error { /* ... */ }

const getUsersFromDatabase = async () =&amp;gt; { /* ... */ };
const validateRequest = req =&amp;gt; { /* ... */ };

app.get('/users', async (req, res, next) =&amp;gt; { 
  try {
    validateRequest(req);
    const users = await getUsersFromDatabase();
    res.json(users); 
  } catch (error) { 
    if (error instanceof SomeDatabaseSpecificError) { 
      next(new DatabaseError('Failed to retrieve users from database!')); 
    } else if (error instanceof SomeValidationSpecificError) { 
      next(new ValidationError('Invalid request!')); 
    } else {
      next(error); 
    } 
  } 
});

app.use((err, req, res, next) =&amp;gt; { /* ... */ });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Error handling is an art that demands conscious practice to perfect.&lt;/p&gt;

&lt;p&gt;In this article, we’ve examined how to gracefully manage errors in Express.js. By employing custom error classes and designing an Express.js error handling middleware that leverages them, we can generate expressive, context-aware error responses. This approach not only simplifies the user experience but also eases our debugging process.&lt;/p&gt;

&lt;p&gt;Until next time, fellow programmers!&lt;/p&gt;

&lt;p&gt;For more coding tutorials, please subscribe to my YouTube channel: &lt;a href="https://www.youtube.com/@CtrlAltVictoria"&gt;https://www.youtube.com/@CtrlAltVictoria&lt;/a&gt; and Twitter &lt;a href="https://twitter.com/ctrlaltvictoria"&gt;https://twitter.com/ctrlaltvictoria&lt;/a&gt; 💕🚀&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Backend Error Handling: Practical Tips from a Startup CTO</title>
      <dc:creator>Victoria Johnston</dc:creator>
      <pubDate>Thu, 17 Aug 2023 13:48:04 +0000</pubDate>
      <link>https://dev.to/ctrlaltvictoria/backend-error-handling-practical-tips-from-a-startup-cto-h6</link>
      <guid>https://dev.to/ctrlaltvictoria/backend-error-handling-practical-tips-from-a-startup-cto-h6</guid>
      <description>&lt;p&gt;Hey there! I’m the CTO of a web3 startup, and before that, I was a senior engineer working on infrastructure and full-stack systems at Google. I’ve learnt a few things along the way about coding, and today, I’m excited to dive into something that we often overlook until it’s too late: error handling and logging.&lt;/p&gt;

&lt;p&gt;When I first jumped ship from Google to start my own venture, it was all about moving fast, hacking stuff together, and making things work. One thing that I initially put on the back burner was implementing a robust error handling and logging system. After all, why waste time preparing for errors when you can just code to avoid them, right? Well, fast forward to countless hours spent debugging and troubleshooting, and I’ve learnt my lesson.&lt;/p&gt;

&lt;p&gt;In this post, I’ll share with you some practical tips and best practices on error handling and logging that I’ve picked up, alongside examples from my own experiences. If I knew back then what I know now, I would have set up error handling right from the start. But as they say, hindsight is 20/20.&lt;/p&gt;

&lt;p&gt;Let’s get into it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is Error Handling Important?
&lt;/h2&gt;

&lt;p&gt;Before we dive into the practicalities, let’s take a moment to consider why error handling is important. Well, for starters, errors are inevitable. No matter how careful you are, how experienced your team is, or how thorough your QA process might be, things can and will go wrong. That’s just the reality of software development.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Debugging: A proper error handling and logging system can make debugging significantly easier and faster. By having detailed error messages and logs, you can trace back the series of events that led to the error, making it easier to reproduce and fix.&lt;/li&gt;
&lt;li&gt;Resilience: Handling errors appropriately can make your system more resilient. Instead of crashing the whole system, a well-placed try/catch can contain the error and allow the system to recover and continue running.&lt;/li&gt;
&lt;li&gt;User Experience: Users don’t like seeing raw error messages or, worse, having the app crash on them. Good error handling can allow you to provide user-friendly error messages and fallbacks, leading to a better user experience.&lt;/li&gt;
&lt;li&gt;Security: Detailed error messages can reveal more about your system than you might want. You can avoid potential security risks by controlling what gets revealed in an error message.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Getting Hands-on with Error Handling
&lt;/h2&gt;

&lt;p&gt;Now, let’s get down to business. How do you handle errors effectively in a backend environment? There’s no one-size-fits-all answer, as it heavily depends on your tech stack, team size, project complexity, and a dozen other factors. That being said, there are some universally good practices to adhere to.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Centralized Error Handling
&lt;/h2&gt;

&lt;p&gt;It’s a good practice to centralize your error handling as much as possible. This approach simplifies code readability and maintainability and ensures consistency. If you’re using a framework like Express.js, you can use middleware for this.&lt;/p&gt;

&lt;p&gt;Here’s a simplified example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.use((err, req, res, next) =&amp;gt; {
  console.error(err.stack);
  res.status(500).send('Something broke!');
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This error-handling middleware would catch errors that occur in your route handlers and send a generic response. But what about sending more user-friendly messages or dealing with different error types? That’s where custom error classes come in.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Custom Error Classes
&lt;/h2&gt;

&lt;p&gt;Custom error classes in JavaScript (or whatever language you’re using) allow you to create specific error types, each potentially having its own error handling. Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class ValidationError extends Error {
  constructor(message) {
    super(message);
    this.name = "ValidationError";
    this.statusCode = 400;
  }
}

class DatabaseError extends Error {
  constructor(message) {
    super(message);
    this.name = "DatabaseError";
    this.statusCode = 500;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these classes, you can throw specific errors in your code, and your error-handling middleware can behave differently depending on the error type.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app.use((err, req, res, next) =&amp;gt; {
  if (err instanceof ValidationError) {
    res.status(err.statusCode).send(err.message);
  } else if (err instanceof DatabaseError) {
    res.status(err.statusCode).send('A database error occurred');
  } else {
    res.status(500).send('Something broke!');
  }
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are just simplified examples. In a real-world scenario, you might log the errors to an external service, handle more error types, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Proper use of Try/Catch
&lt;/h2&gt;

&lt;p&gt;Try/catch blocks are your bread and butter for catching errors as they occur. It’s important only to catch errors that you can handle. If you can’t handle the error (for example, you don’t know why it would occur), it’s usually better to let it bubble up to the global error handler.&lt;/p&gt;

&lt;p&gt;Here’s a good use of a try/catch block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try {
  const user = await getUserFromDb(userId);
} catch (err) {
  if (err instanceof NotFoundError) {
    // We know why this error occurred and we can handle it
    return createNewUser(userId);
  }
  // We can't handle any other errors, rethrow them
  throw err;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we’re only catching a specific error that we know might happen and that we can handle. Any other errors get rethrown and can be handled by our global error handler.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Importance of Good Logging
&lt;/h2&gt;

&lt;p&gt;Now, while error handling is about dealing with errors as they occur, logging is about recording what happened so you can look back on it in the future. This can be extremely helpful when debugging.&lt;/p&gt;

&lt;p&gt;There are several things you should consider when implementing logging:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What to log: You want to log any information that might be useful for debugging. This can include input parameters, output results, and any intermediate variables. However, be aware of privacy and security issues. Never log sensitive information like passwords.&lt;/li&gt;
&lt;li&gt;When to log: Ideally, you want to log as much as possible, but there’s always a trade-off between detail and performance/storage. Consider using different log levels (error, warning, info, debug) to control this.&lt;/li&gt;
&lt;li&gt;Where to log: For local development, logging to the console might be sufficient. But for a production system, you’ll want to use a logging service that can handle large volumes of logs, manage retention policies, and provide search and analysis tools. This could be a cloud service like Google’s Cloud Monitoring, a self-hosted solution like Elasticsearch, or a log management service like Loggly or Datadog.&lt;/li&gt;
&lt;/ol&gt;
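&lt;p&gt;The log-level idea from point 2 can be sketched with a tiny leveled logger. This is purely illustrative (a real library like winston or pino offers far more), and createLogger is a hypothetical helper:&lt;/p&gt;

```javascript
// Minimal leveled logger: messages above the configured verbosity
// threshold are dropped, so production can run at 'info' while
// development runs at 'debug'.
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

function createLogger(level = 'info') {
  const threshold = LEVELS[level];
  const log = (msgLevel, msg) => {
    if (LEVELS[msgLevel] > threshold) return; // too verbose for this level: drop
    console.log(`[${msgLevel.toUpperCase()}] ${msg}`);
  };
  return {
    error: (msg) => log('error', msg),
    warn: (msg) => log('warn', msg),
    info: (msg) => log('info', msg),
    debug: (msg) => log('debug', msg),
  };
}

// Usage sketch:
// const logger = createLogger('info');
// logger.error('db connection lost'); // printed
// logger.debug('cache miss');         // dropped at the 'info' level
```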

&lt;p&gt;Here’s an example of how you might log a function’s input and output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function add(a, b) {
  console.log(`add was called with ${a} and ${b}`);
  const result = a + b;
  console.log(`add result is ${result}`);
  return result;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a production scenario, you’d replace console.log with a call to your logging library or service, and you might add more detail (like a timestamp or the name of the function).&lt;/p&gt;
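&lt;p&gt;As one way to add that detail, the console.log calls above could go through a small helper that stamps each entry with a timestamp and the function name, emitted as JSON so a log service can index the fields. The logEntry helper here is a hypothetical sketch, not a real library API:&lt;/p&gt;

```javascript
// Sketch: build a structured log line with a timestamp, the name of
// the function being logged, and any extra fields worth recording.
function logEntry(fnName, message, extra = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    function: fnName,
    message,
    ...extra,
  });
}

// Usage sketch, inside the add function above:
// console.log(logEntry('add', 'add was called', { a, b }));
```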

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I hope this post has given you practical insight into error handling and logging in backend development. If there’s one thing I want you to take away from this, it’s that error handling and logging are not an afterthought. They are an integral part of your code that can save you countless hours of debugging and many headaches. So, invest the time upfront to set up a good error handling and logging system. You’ll thank yourself later!&lt;/p&gt;

&lt;p&gt;Stay tuned for future posts where I plan to delve deeper into some of these topics. If you have any questions, feel free to drop a comment or reach out to me. Happy coding!&lt;/p&gt;

&lt;p&gt;For more coding tutorials, please subscribe to my YouTube channel: &lt;a href="https://www.youtube.com/@CtrlAltVictoria"&gt;https://www.youtube.com/@CtrlAltVictoria&lt;/a&gt; and Twitter &lt;a href="https://twitter.com/ctrlaltvictoria"&gt;https://twitter.com/ctrlaltvictoria&lt;/a&gt; 💕🚀&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>beginners</category>
      <category>node</category>
    </item>
  </channel>
</rss>
