<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: darode</title>
    <description>The latest articles on DEV Community by darode (@darode).</description>
    <link>https://dev.to/darode</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F325365%2F24e9b94b-01f1-485a-9d22-33c9ce34e00b.jpeg</url>
      <title>DEV Community: darode</title>
      <link>https://dev.to/darode</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/darode"/>
    <language>en</language>
    <item>
      <title>The art of caching for backend applications</title>
      <dc:creator>darode</dc:creator>
      <pubDate>Wed, 18 Jan 2023 22:20:42 +0000</pubDate>
      <link>https://dev.to/darode/the-art-of-caching-for-backend-applications-3755</link>
      <guid>https://dev.to/darode/the-art-of-caching-for-backend-applications-3755</guid>
      <description>&lt;p&gt;Caching is a common technique we use when creating our applications to increase performance by storing frequently-used data in ephemeral but fast storage. A cache can help you to reduce the amount of time that an application takes to access that data, which can improve the overall user experience and make the application more responsive. &lt;/p&gt;

&lt;p&gt;When you have enough experience programming you start learning new things apart from the typical frameworks, libraries, design patterns, etc. One of these is how you can manage your information to improve your application performance, and here it’s where a cache comes in place.&lt;/p&gt;

&lt;p&gt;In this article, we will explore the basics of caching in software development and discuss some of the key considerations when implementing a cache for your application.&lt;/p&gt;

&lt;h2&gt;What is caching?&lt;/h2&gt;

&lt;p&gt;First of all, we need to explain in depth a little bit more about caching.&lt;/p&gt;

&lt;p&gt;As we said, caching is a way to store data in a temporary location so that our application can access it faster and scale better.&lt;/p&gt;

&lt;p&gt;This can be useful for reducing the amount of time that an application takes to retrieve data that is used frequently, such as the results of a database query or the contents of a frequently-visited web page. By storing this data in a cache, the application can access it faster the next time it is needed, which can improve the overall performance of the application dramatically.&lt;/p&gt;

&lt;p&gt;This is the main reason why it’s very important for a software developer to master all aspects of caching and use it to make faster and more reliable applications.&lt;/p&gt;

&lt;p&gt;That’s the theory but there is a lot more to consider when you actually implement a cache.&lt;/p&gt;

&lt;h2&gt;Key features of a cache&lt;/h2&gt;

&lt;p&gt;Before coding anything or installing the new trendy cache engine you need to think about the key aspects of a cache and make a lot of decisions.&lt;/p&gt;

&lt;p&gt;Depending on your use case, the resources you have, the programming language you are using and the way your application is going to use the data, you will choose a different kind of cache and a different configuration for it.&lt;/p&gt;

&lt;p&gt;The main parameters you should pay attention to are the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;type of cache: you can choose to store data in a local or a distributed cache&lt;/li&gt;
&lt;li&gt;data to store: you can store data as is or pre-compile it to save time on future usages in your app&lt;/li&gt;
&lt;li&gt;filling strategy: you can implement a lazy or an eager cache; depending on your use case, the chosen strategy could improve or ruin your performance&lt;/li&gt;
&lt;li&gt;size of cache: you need to know the amount of data you are going to store because it defines the number of cache entries you can keep&lt;/li&gt;
&lt;li&gt;keys: pay attention to the key or keys you use to store your data so it’s searchable the way you need it&lt;/li&gt;
&lt;li&gt;TTL (time to live): another essential parameter that defines how long data will be kept in the cache&lt;/li&gt;
&lt;li&gt;eviction policy: decides which key will be removed from the cache to make room for new data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I know... there are a lot of parameters to configure and a ton of decisions to be made even before you write a single line of code... that’s why caching is so interesting!&lt;/p&gt;

&lt;p&gt;To make it easier we will review some of the most important ones in the following sections.&lt;/p&gt;

&lt;h2&gt;Local or distributed cache&lt;/h2&gt;

&lt;p&gt;Essentially we can identify two types of cache: local or distributed.&lt;/p&gt;

&lt;p&gt;A local cache works in the same machine that is going to use the information. In other words, a local cache uses the same machine resources as your application does. &lt;/p&gt;

&lt;p&gt;You can find two approaches to implementing a local cache: memory-based or disk-based.&lt;/p&gt;

&lt;p&gt;A memory-based cache stores data in memory, which could be inside your application’s own memory, in another process, in shared memory, etc. The main advantage of this implementation is that it is the fastest way to access data. On the other hand, it has a particular caveat: data is only available as long as the cache engine is running.&lt;/p&gt;

&lt;p&gt;A disk-based implementation uses disk storage, which is slower than memory but persists even if the application restarts. Another advantage is that disk is usually cheaper and bigger than memory, so you can store more data at a lower cost.&lt;/p&gt;

&lt;p&gt;A distributed cache works across a network and is accessed by multiple machines. It’s useful for scaling applications and for sharing data between different servers supporting huge applications. The typical way to implement these caches is to install an instance or a cluster of a specific cache engine server. These systems are meant to scale horizontally, so it’s very common for them to ship features like clustering and sharding out of the box.&lt;/p&gt;

&lt;p&gt;Which one is better? It depends on your use case... Because there are thousands of use cases for a cache, there are thousands of options to choose from. I have created the following comparison table to help you choose between a distributed or a local cache for your application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G0edoRoE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/abr66teebmthdpnkkc5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G0edoRoE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/abr66teebmthdpnkkc5y.png" alt="Image description" width="518" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see local and distributed caches are really different, but both make a lot of sense depending on your use case.&lt;/p&gt;

&lt;p&gt;Let’s say you have some data, for example weather information for a particular city, and you want to cache it. In this example the data is the same for all requests, doesn’t change very often and is small in size. This use case could use a small local cache on disk or in memory, so you avoid scaling a remote cache if you receive a lot of requests.&lt;/p&gt;

&lt;p&gt;Another use case that needs a distributed cache could be a request access control (rate limiting) system. These systems store, for example, the number of requests per minute per API user. To count all requests properly, you need the information from all servers in a shared store, so in this use case you need a distributed cache that shares the information between all servers.&lt;/p&gt;
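&lt;p&gt;To make the idea concrete, here is a minimal sketch of a fixed-window rate limiter in Python. A plain dict stands in for the shared store; in a real deployment the counters would live in a distributed cache like Redis so every server sees the same numbers. All names are illustrative:&lt;/p&gt;

```python
import operator
import time

# a plain dict stands in for the shared store; in production these counters
# would live in a distributed cache (e.g. Redis) so all servers share them
_counters = {}

def allow_request(user_id, limit_per_minute=60):
    window = int(time.time() // 60)        # current one-minute window
    key = f"ratelimit:{user_id}:{window}"  # one counter per user and window
    count = _counters.get(key, 0) + 1
    _counters[key] = count
    # allow the request only while the counter stays within the limit
    return operator.le(count, limit_per_minute)
```

&lt;p&gt;Because the key includes the current window, old counters simply stop being read once the minute rolls over.&lt;/p&gt;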

&lt;p&gt;It’s important to notice that we are not speaking about a particular technology here. That’s because, in most cases, you can implement a local or a remote cache using the same technology. For example, you can use a Redis instance for a local or a remote cache. Some of you could think it’s overengineering, but it makes a lot of sense if you want to replace a local cache with a distributed one in the future. In that case, you will be able to keep the same technology without changing a single line of code in your application.&lt;/p&gt;

&lt;h2&gt;Choosing the best filling strategy&lt;/h2&gt;

&lt;p&gt;When we talk about the filling strategy we are talking about the way data will be inserted into the cache. Here we have two ways to do it: eager or lazy.&lt;/p&gt;

&lt;p&gt;Using an eager approach means your application can handle a miss in your cache: you insert or update data on a regular basis, and it doesn’t matter if some data is occasionally missing. The best way to implement this approach is a background process that compiles the data and loads it into your cache.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xVdCjq_m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ttquaqxlfydruoaemujt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xVdCjq_m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ttquaqxlfydruoaemujt.png" alt="Image description" width="880" height="280"&gt;&lt;/a&gt;&lt;/p&gt;
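&lt;p&gt;As an illustration of the eager approach, this small Python sketch refreshes a cache entry from a background thread on a fixed schedule. The data source and key name are made up for the example:&lt;/p&gt;

```python
import threading
import time

cache = {}

def compile_weather_data():
    # stand-in for the real work: querying a database or an external API
    # and pre-computing whatever the requests will need
    return {"city": "Madrid", "temp_c": 21}

def refresh_once():
    cache["weather:madrid"] = compile_weather_data()

def refresh_forever(interval_seconds=300):
    # eager filling: a background job rebuilds the cache on a schedule,
    # so requests never pay the cost of building the data themselves
    while True:
        refresh_once()
        time.sleep(interval_seconds)

# run the refresher as a daemon thread next to the application
threading.Thread(target=refresh_forever, daemon=True).start()
```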

&lt;p&gt;The lazy approach assumes that your application cannot afford to miss information in the cache. Your application first tries to fetch data from the cache and, if a miss occurs, it falls back to a secondary data source (usually a relational database), fetches the information and then inserts it into the cache for future usages. To implement a lazy approach you need logic in your application that handles missing data and uses the fallback. Depending on your programming language or framework you will find different options to do it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---qHCztq6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/otbdn34di16tdcwq3ixt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---qHCztq6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/otbdn34di16tdcwq3ixt.png" alt="Image description" width="880" height="299"&gt;&lt;/a&gt;&lt;/p&gt;
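&lt;p&gt;The lazy (cache-aside) flow can be sketched like this in Python, with a dict playing the role of the cache and a stub function playing the role of the database; all names are illustrative:&lt;/p&gt;

```python
import operator
import time

cache = {}  # key: (value, expires_at)

def fetch_product_from_db(product_id):
    # stand-in for the secondary data source (usually a relational database)
    return {"id": product_id, "name": "Example product"}

def get_product(product_id, ttl_seconds=60):
    key = f"product-info:{product_id}"
    entry = cache.get(key)
    # cache hit: return the stored value while it has not expired
    if entry is not None and operator.lt(time.time(), entry[1]):
        return entry[0]
    # cache miss: fall back to the database and populate the cache
    value = fetch_product_from_db(product_id)
    cache[key] = (value, time.time() + ttl_seconds)
    return value
```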

&lt;p&gt;Ok but... what about a hybrid approach? Great question! You can certainly combine both strategies and, in most cases, it’s the best option if you want to get the most performance without missing any data.&lt;/p&gt;

&lt;h2&gt;Designing your keys&lt;/h2&gt;

&lt;p&gt;The most important aspect is that the keys should be searchable by your application, so pay attention and take your time to make the right decisions here.&lt;/p&gt;

&lt;p&gt;Keeping this in mind, you should adapt your keys to the way your application is going to retrieve data. So, if your application retrieves data based on, let’s say, “product_id”, you should create a key using this information. For example, a key could have this format:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;product-info:123&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If your application needs multiple parameters you can concatenate all of them to create your key. For example, suppose you want to store product information with translations per language, like the product description. In this case, a valid key could have the following format:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;product-info:123:en&lt;br&gt;
product-info:123:es&lt;/code&gt;&lt;/p&gt;
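&lt;p&gt;A tiny helper keeps this key format consistent across the whole application; the function name is just an example:&lt;/p&gt;

```python
def build_key(prefix, *parts):
    # join the prefix and every parameter with ":" so keys stay
    # predictable and searchable the way the application needs
    return ":".join([prefix] + [str(part) for part in parts])

print(build_key("product-info", 123))        # product-info:123
print(build_key("product-info", 123, "en"))  # product-info:123:en
```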

&lt;p&gt;A common pitfall when choosing keys is storing huge amounts of data under the same key. Cache systems are very fast at indexing and managing information, but that doesn’t mean they are not affected by network speed. So, if you are using a distributed cache, pay attention to the size of the data you store under each key because it will be transferred through the network.&lt;/p&gt;

&lt;p&gt;To avoid this problem you can split your cache into several keys with specific information depending on how your application will use it. &lt;/p&gt;

&lt;p&gt;In the example we mentioned before, instead of storing all localized information under the same key you can split it in two. In one key you can store generic information like price, rating, package size, etc. and use another key to store localized information like description, name, etc.&lt;/p&gt;

&lt;p&gt;This way you are going to have two kinds of keys, like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;product-info:123&lt;br&gt;
product-lang:123:es&lt;br&gt;
product-lang:123:en&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Your application can use whatever is needed while retrieving the least data on each request.&lt;/p&gt;

&lt;h2&gt;Calculating the best TTL value&lt;/h2&gt;

&lt;p&gt;Another important parameter in your cache is TTL (time to live).&lt;/p&gt;

&lt;p&gt;As said before, the TTL is like the expiration date of your data: it defines when the value of a particular key can be deleted or replaced, so it’s no longer available to your application. Choosing the correct TTL is important to maximize your cache hit ratio without serving stale data.&lt;/p&gt;

&lt;p&gt;A high TTL value will increase the hit ratio but could turn your cache into an unusable source of information because the data is not up to date. A low TTL value lowers your hit ratio but refreshes the information very often.&lt;/p&gt;

&lt;p&gt;You can configure a static TTL value, the same for all keys, or a dynamic one, with a different value depending on some key parameter. Dynamic TTL values are very useful because you can improve the hit ratio while keeping important keys up to date. For example, you can assign different TTL values to each type of product, language, etc.&lt;/p&gt;
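&lt;p&gt;A dynamic TTL can be as simple as a lookup table keyed by the type prefix of each key. The categories and values below are assumptions for the sake of the example, not recommendations:&lt;/p&gt;

```python
TTL_BY_TYPE = {
    "product-info": 3600,    # generic product data changes rarely
    "product-stock": 30,     # stock must stay close to real time
    "product-lang": 86400,   # translations almost never change
}

def ttl_for(key):
    prefix = key.split(":")[0]
    return TTL_BY_TYPE.get(prefix, 300)  # default for unknown key types
```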

&lt;h2&gt;Configuring the correct eviction policy&lt;/h2&gt;

&lt;p&gt;The last important parameter to configure your cache is the eviction policy. It decides which key will be deleted or replaced to store new information.&lt;/p&gt;

&lt;p&gt;The most common eviction policies are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;random: removes any key&lt;/li&gt;
&lt;li&gt;LRU (least recently used): removes the key that has not been used for the longest time&lt;/li&gt;
&lt;li&gt;LFU (least frequently used): removes the key that is accessed least often&lt;/li&gt;
&lt;li&gt;TTL-based: removes the keys closest to expiring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each strategy has pros and cons but as a quick reference, you could use them as follows. &lt;/p&gt;

&lt;p&gt;You can use a random eviction policy when you don’t expect any key to be used more than the others; it’s a good default strategy to start caching.&lt;/p&gt;

&lt;p&gt;If your cache has some keys that are accessed very often, use the LRU or LFU eviction policies to ensure the most popular keys are preserved.&lt;/p&gt;

&lt;p&gt;If your cache uses different TTL values per key, a good option is a TTL-based eviction policy, so you preserve keys with a higher TTL, which could be more important.&lt;/p&gt;

&lt;p&gt;Choosing an eviction policy sounds a little tricky but in reality it’s a matter of trial and error. Changing the eviction policy usually doesn’t break the cache, so you can try one and switch to another to see if it works better for your use case.&lt;/p&gt;
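&lt;p&gt;To see what an eviction policy does under the hood, here is a minimal LRU cache in Python built on &lt;code&gt;OrderedDict&lt;/code&gt;. Real cache engines implement this for you, so treat it as a learning sketch:&lt;/p&gt;

```python
from collections import OrderedDict
import operator

class LRUCache:
    """Minimal LRU eviction: when full, drop the least recently used key."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if operator.gt(len(self.data), self.max_entries):
            self.data.popitem(last=False)  # evict the least recently used key
```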

&lt;h2&gt;What is the best technology for my cache?&lt;/h2&gt;

&lt;p&gt;Finally, we should talk a little bit about specific technologies or engines you can use to implement your cache.&lt;/p&gt;

&lt;p&gt;First of all, let’s talk about local caches. You can use your hard drive to implement a cache. Yeah, I know... it’s not a proper engine but, for some local cache use cases, caching information in a local file is more than enough and for sure it’s the easiest way to do it.&lt;/p&gt;
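&lt;p&gt;For example, a file-based local cache with a TTL fits in a few lines of Python; the paths and names here are illustrative:&lt;/p&gt;

```python
import json
import operator
import os
import tempfile
import time

CACHE_DIR = tempfile.gettempdir()  # any writable directory works

def _path(key):
    return os.path.join(CACHE_DIR, f"cache-{key}.json")

def disk_set(key, value, ttl_seconds=300):
    payload = {"value": value, "expires_at": time.time() + ttl_seconds}
    with open(_path(key), "w") as f:
        json.dump(payload, f)

def disk_get(key):
    try:
        with open(_path(key)) as f:
            payload = json.load(f)
    except FileNotFoundError:
        return None
    if operator.gt(time.time(), payload["expires_at"]):
        return None  # an expired entry behaves like a miss
    return payload["value"]
```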

&lt;p&gt;The most used engines are in-memory databases, which you can use for local and remote caches. They are blazing fast and really easy to use because you don’t need to create indices or configure endless parameters. Keep in mind that some of these engines clear all data when the process restarts. Some examples of this technology are Redis and Memcached.&lt;/p&gt;

&lt;p&gt;Of course, you can also use a database engine with more traditional capabilities (indices, expressive queries, etc.). NoSQL databases are very interesting for implementing a cache because they support a really huge number of requests per second. These databases have the advantage of persisting data across process restarts, and they allow you to query your data (filtering, projections, etc.). In this group you can find databases like MongoDB, Elasticsearch or Cassandra.&lt;/p&gt;

&lt;p&gt;If you decide to use a distributed cache, you need to check the horizontal scalability options of the different technologies and their additional costs. For example, if you deploy a cluster or an HA architecture, your setup will start with at least three instances, so the costs will at least triple, if not quadruple.&lt;/p&gt;

&lt;p&gt;As you can see there is no single technology that is better than the others; any of them could be a good fit for your cache.&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;As a brief summary, a cache is a way to store the most used data of your application in fast storage to increase its performance.&lt;/p&gt;

&lt;p&gt;The most important features of a cache are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the type (local or distributed)&lt;/li&gt;
&lt;li&gt;the filling strategy&lt;/li&gt;
&lt;li&gt;how you define the keys&lt;/li&gt;
&lt;li&gt;the TTL you choose to improve the hit ratio&lt;/li&gt;
&lt;li&gt;the eviction policy to delete data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most important thing is to ensure you have a proper hit ratio to increase your application performance as much as possible.&lt;/p&gt;

&lt;p&gt;You can use a specific technology like Redis or Memcached, which are in-memory databases specially designed for caching, or you can use other engines like NoSQL databases (MongoDB, Elasticsearch, etc.), which are designed to handle a high number of requests and give you advanced features like filtering.&lt;/p&gt;

&lt;p&gt;Implementing a cache for an application is easier than it seems, and it’s a wonderful experience when you see the performance increase you get and how configurable these technologies are.&lt;/p&gt;

&lt;p&gt;Good luck and cache it all!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>cache</category>
    </item>
    <item>
      <title>Scaling backend applications for dummies</title>
      <dc:creator>darode</dc:creator>
      <pubDate>Tue, 18 Oct 2022 22:21:06 +0000</pubDate>
      <link>https://dev.to/darode/scaling-backend-applications-for-dummies-5h6a</link>
      <guid>https://dev.to/darode/scaling-backend-applications-for-dummies-5h6a</guid>
      <description>&lt;p&gt;Maybe you, as every developer, have faced a situation when your application starts to slow down and shows poor performance.&lt;/p&gt;

&lt;p&gt;I know, it’s disappointing but it just happens. When you start working on an application and it gets more users and requests, issues like overload, unexpected system failures and service downtime will occur. It’s completely ok, it just means you need to scale up your systems.&lt;/p&gt;

&lt;p&gt;If you have enough experience you may plan for some scenarios upfront based on previous projects and that’s a really good starting point because it means your architecture is, at least at some point, scalable.&lt;/p&gt;

&lt;p&gt;If you are new to this concept don’t worry, I have created a series of articles to introduce some concepts of scaling applications from the basics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scale... what?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before going further I think it's time to clarify some concepts.&lt;/p&gt;

&lt;p&gt;When we talk about scaling in software development we refer to the action of adding more resources (hardware) to run a particular application.&lt;/p&gt;

&lt;p&gt;But if my program runs perfectly, why do I need to scale?&lt;/p&gt;

&lt;p&gt;Great question! Usually when you develop an application you do it in a certain context and when this context changes your application could be affected. &lt;/p&gt;

&lt;p&gt;Let me explain with an example. You are programming an ecommerce like Shopify. At the beginning you didn’t pay attention to database performance and that’s completely ok: the app has a few users and response times are quite good. Then one day the performance drops and you realize that, after a marketing campaign, your application now has 100k concurrent users. Your application is the same but it doesn’t work as before. So what happened? Exactly! The context your application runs in changed.&lt;/p&gt;

&lt;p&gt;These situations tend to happen from time to time. This is why it is so important to take care of the performance and scalability of your application from the beginning.&lt;/p&gt;

&lt;p&gt;I’m not saying that you should create the next Google or Facebook with data centers around the world from day one, but you need a strategy and tools to scale your application. And that’s why this series of articles exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can I scale my application?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we are on the same page, it's time to go deeper into scalability concepts.&lt;/p&gt;

&lt;p&gt;First of all I want to depict the typical flow when scaling an application.&lt;/p&gt;

&lt;p&gt;The first part of scaling an application is knowing what is going on under the hood. In other words, we need to find what makes performance decrease in order to apply a solution to the problem.&lt;/p&gt;

&lt;p&gt;The second part is to apply a solution, obviously. At this point we are really lucky because there are a lot of documented solutions to help us scale our applications.&lt;/p&gt;

&lt;p&gt;If you ask me for a brief summary, the process should be something like this:&lt;/p&gt;

&lt;p&gt;Analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable observability tools&lt;/li&gt;
&lt;li&gt;Make a root cause analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Improve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimize code and business logic&lt;/li&gt;
&lt;li&gt;Optimize database usage (e.g. slow queries)&lt;/li&gt;
&lt;li&gt;Use background processing&lt;/li&gt;
&lt;li&gt;Apply advanced scaling patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article and the following ones we will cover all these aspects and phases of the process. Let’s start talking about observability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enable observability tools&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This sounds really obvious but it is not.&lt;/p&gt;

&lt;p&gt;Before applying any solution to your scaling problem you need to know where the problem is and what is causing it. Different problems require different solutions: we are not going to apply the same improvements for slow queries in a database as for an overloaded server that cannot handle more requests.&lt;/p&gt;

&lt;p&gt;So first of all you need to audit your application. In the best case scenario your application already has some observability tools in place like Kibana, Grafana, Prometheus or similar. If so, you are lucky because you can check this data to spot the problem. If not, my advice is to implement an observability tool right now.&lt;/p&gt;

&lt;p&gt;Because this article is an introduction to general concepts, I’m not going to give you a deep explanation of how to install and use any of these tools; there are hundreds of articles about it on the internet. But I want to give you some tips on what data you should gather to get good observability of your application.&lt;/p&gt;

&lt;p&gt;The most important information you want to know about your application is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application id: to identify your application in case you have many of them&lt;/li&gt;
&lt;li&gt;request status: true/false or an error code, it’s up to you&lt;/li&gt;
&lt;li&gt;request time: time elapsed from start to finish of the request, in seconds or milliseconds&lt;/li&gt;
&lt;li&gt;server information: server name, cluster id or similar; it helps you to spot server-specific problems&lt;/li&gt;
&lt;li&gt;profiling information: time spent running particular methods or classes of your application, calls to external APIs, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should gather this information for each request. If your app handles a huge number of requests, consider sampling at least 10%-20% of them. You can store this information as logs, but if you use an observability tool to display it you will be more productive because you get all your data at a glance.&lt;/p&gt;
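&lt;p&gt;As a sketch of how to gather these fields, the following Python wrapper emits one structured log line per request. The field names and values are illustrative, and in a real setup the output would be shipped to your observability tool instead of printed:&lt;/p&gt;

```python
import json
import time
import uuid

def instrumented(handler, app_id="checkout-api", server="web-01"):
    # wraps a request handler and emits one structured log entry per
    # request with the fields listed above
    def wrapper(request):
        start = time.time()
        ok = True
        try:
            return handler(request)
        except Exception:
            ok = False
            raise
        finally:
            print(json.dumps({
                "application_id": app_id,
                "request_id": str(uuid.uuid4()),
                "request_status": ok,
                "request_time_ms": round((time.time() - start) * 1000, 2),
                "server": server,
            }))
    return wrapper
```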

&lt;p&gt;If you want to take your observability to the next level you can look at APM libraries. An APM lets you gather low-level information like errors and stack traces from your app with a few lines of code. These libraries are available for most programming languages and frameworks and integrate with most observability tools.&lt;/p&gt;

&lt;p&gt;In this section we should mention another source of information that is really useful for dealing with performance problems: classic server monitoring. This information is usually available through specific server monitoring applications like Nagios, Centreon, Zabbix or Pandora FMS, or through more generic observability tools like the ones mentioned above.&lt;/p&gt;

&lt;p&gt;Information about the usage of resources like CPU, RAM, network, etc. can tell you a lot about your application performance and help you analyze the problem. So don’t underestimate it when conducting a performance analysis of your application.&lt;/p&gt;

&lt;p&gt;With all the insights of what is happening inside our application we can spot performance problems and start making improvements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make a root cause analysis&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, you have all the information available and you need to start searching for the problems.&lt;/p&gt;

&lt;p&gt;In my experience, poor application performance is usually caused by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requests to external APIs that take too long&lt;/li&gt;
&lt;li&gt;unoptimized database queries&lt;/li&gt;
&lt;li&gt;non-scalable code&lt;/li&gt;
&lt;li&gt;lack of resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Problems with third-party requests should be easy to spot because they tend to be consistent. They are usually caused by a problem in the vendor’s systems and can be temporary or structural. Despite being easy to find, these problems are not so easy to fix, mainly because you cannot control the code behind the API. So basically you should contact your vendor and see what they can do. If you are lucky and it’s an internal API developed by another team, just grab a coffee and talk with your colleagues to see what is happening and how you can help.&lt;/p&gt;

&lt;p&gt;If your problem is related to non-optimal queries, the approach is completely different. You should analyze your queries with your database engine’s “explain” command and look for improvements. We will cover this topic in the following sections.&lt;/p&gt;

&lt;p&gt;Another typical problem is non-optimal code. This is very general, I know… but in my experience a lot of problems are caused by code that is not optimized in terms of business logic and use of resources. Some examples are: repeated queries during the same request, business logic executed during the request that could be deferred, legacy code that is no longer useful but was never removed, and so on…&lt;/p&gt;

&lt;p&gt;A lack of resources is sometimes easy to spot if you have correct observability in place. Even if your code, queries, etc. are optimized and perfect, you can reach the limits of your database engine, web server, etc. In this case you will see errors with messages like “connection limit reached”. We will explore some options to solve them in the following sections.&lt;/p&gt;

&lt;p&gt;Using the observability information you should be able to identify problems fairly easily. Once you do that, the next step is to start solving them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling by improving code performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First of all I want to talk about code performance because in my opinion it is the easiest problem to solve but the last one people address.&lt;/p&gt;

&lt;p&gt;Before optimizing database queries, making major refactors in your code or improving your hardware, you should review your implemented business logic.&lt;/p&gt;

&lt;p&gt;But… What should you look for? Great question! I can give you a list of examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated execution of the same code during a request: cache the result, don’t compute the same thing twice&lt;/li&gt;
&lt;li&gt;legacy code that is no longer needed but is still being executed: remove it immediately&lt;/li&gt;
&lt;li&gt;heavyweight loops or iterating the same collection several times: loop over collections as few times as you can and avoid nested loops when possible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the most common pitfalls, but for sure there are plenty of other bad practices to avoid. This is an introductory article, so treat this list as a guide for making a root cause analysis.&lt;/p&gt;
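&lt;p&gt;For instance, the first pitfall can be avoided by memoizing results for the lifetime of a request. This illustrative Python sketch counts the stand-in database calls to show the saving:&lt;/p&gt;

```python
class RequestContext:
    # caches values for the lifetime of a single request so the same
    # expensive call is never executed twice
    def __init__(self):
        self._memo = {}
        self.db_calls = 0  # counter only to demonstrate the saving

    def get_user_settings(self, user_id):
        if user_id not in self._memo:
            self.db_calls += 1  # stand-in for a real database query
            self._memo[user_id] = {"user": user_id, "theme": "dark"}
        return self._memo[user_id]
```

&lt;p&gt;However many times the request asks for the same user’s settings, the underlying query runs only once.&lt;/p&gt;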

&lt;p&gt;If you fix these issues you can gain performance with low effort, without making a huge refactor or spending a lot of money on more powerful servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling by improving database performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Second place goes to query performance optimization.&lt;/p&gt;

&lt;p&gt;In my experience one of the most common issues when working with databases is just writing the structure or inserting information without paying attention to performance.&lt;/p&gt;

&lt;p&gt;For example, creating the correct index could make your application ten times faster.&lt;/p&gt;

&lt;p&gt;So it’s crucial to ensure you have the correct indexes in place and that your queries and data structures are optimal to get the most of your database engine.&lt;/p&gt;

&lt;p&gt;I won’t dive deeper into database optimization because this is an introductory article; you can find a lot of resources online for your database engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling by leveraging background processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your code and database queries are optimized and some part of your process still takes a long time, you have one last silver bullet… background processing.&lt;/p&gt;

&lt;p&gt;Some parts of your business logic, like logging or inserting a huge amount of data, can slow down your application requests. Sometimes these actions don’t belong to the request’s business logic and you can defer them.&lt;/p&gt;

&lt;p&gt;A very good example is a “create user” request that sends the welcome email within the same request. The request just needs to insert the user information into the database and return OK when done; you can send the email from a background process a few seconds later.&lt;/p&gt;

&lt;p&gt;But.. How can you do it? Ok, it’s easier than it seems.&lt;/p&gt;

&lt;p&gt;You can implement an event bus pattern using any of the available message queue servers like RabbitMQ, Kafka, AWS SQS, etc.&lt;/p&gt;

&lt;p&gt;The idea is to publish an event describing some action, for example “user created”, from the main request. Then a separate process, usually a daemon running in the background, listens for these events: for example, a “WelcomeEmailSenderListener” that sends the welcome email.&lt;/p&gt;
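&lt;p&gt;This flow can be sketched in Python with the standard library &lt;code&gt;queue&lt;/code&gt; standing in for a real broker like RabbitMQ or Kafka; the event and listener names are made up for the example:&lt;/p&gt;

```python
import queue
import threading

events = queue.Queue()  # stand-in for RabbitMQ, Kafka, AWS SQS, etc.
sent_emails = []

def create_user(name):
    # the request only stores the user and publishes an event, so it
    # returns quickly; the email is sent later by a background listener
    user = {"name": name}
    events.put(("user_created", user))
    return user

def welcome_email_listener():
    # a daemon that consumes events and runs the deferred work
    while True:
        event, payload = events.get()
        if event == "user_created":
            sent_emails.append(f"Welcome, {payload['name']}!")
        events.task_done()

threading.Thread(target=welcome_email_listener, daemon=True).start()
```

&lt;p&gt;With a real broker, the listener would run as a separate process or service, so you could scale it independently from the web servers.&lt;/p&gt;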

&lt;p&gt;Background processing makes your requests faster and allows you to scale each part of your system separately, allocating the resources each one needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some advanced scaling patterns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you have implemented all the techniques above and your application still performs badly, you need to apply some advanced scaling patterns. These patterns will help you gain performance for your applications.&lt;/p&gt;

&lt;p&gt;Some of these patterns are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement some caching system&lt;/li&gt;
&lt;li&gt;Scale servers vertically&lt;/li&gt;
&lt;li&gt;Scale servers horizontally&lt;/li&gt;
&lt;li&gt;Implement database sharding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will cover these patterns in depth in the following articles, but as a brief introduction, all of them are based on improving the usage of the hardware resources you have to run your application.&lt;/p&gt;

&lt;p&gt;In the real world we mix and match to get the best of all of them, so you will end up with more complex infrastructure and applications but, on the other hand, a more reliable system that allows you to increase the number of requests and users effortlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scaling is very important to ensure your application can deal with a huge number of requests and a lot of users, which is a good sign that the business is performing well.&lt;/p&gt;

&lt;p&gt;Enabling good observability to gather performance data from your applications is key to understanding why your application has low performance.&lt;/p&gt;

&lt;p&gt;It’s a good practice to check your code and database performance before adding more hardware to run your applications. Some easy-to-fix issues like removing legacy code, refactoring business logic or creating some database indexes could improve your app performance ten times.&lt;/p&gt;

&lt;p&gt;You can improve your application performance further by using more advanced scaling patterns like caching, vertical and horizontal scaling, sharding, etc., which allow you to get the most out of the hardware you use to run your application.&lt;/p&gt;

&lt;p&gt;This is the first article of a series that will cover some aspects of scaling applications, you will find more information about how to scale your application in the next articles.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
