DEV Community

Julien Dubois for Microsoft Azure

Java distributed caching in the cloud with the new Azure discovery plugin for Hazelcast

Many thanks to Alparslan Avci and Mesut Celik from Hazelcast for their invaluable help when writing this blog entry!

The need for distributed caching in the cloud

Scaling stateless applications in the cloud is easy; scaling a database is a lot more trouble.

If you want to run applications efficiently in the cloud, you'll need to use a caching mechanism: if you don't cache your data, any spike in traffic will automatically result in a spike in your database requests. And your database is probably the part of your architecture that scales the least well, in particular if you're using an SQL database. It is also probably the most costly part of your architecture, so even if it does scale up, your bill is going to scale up too, and you probably don't want that.
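To make that reasoning concrete, here is a minimal sketch of a cache sitting in front of a slow lookup. It is not taken from JHipster or Hazelcast: the class names, the `databaseHitsFor` helper and the fake "database" loader are all hypothetical, and the cache is a plain in-memory map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CacheAsideSketch {

    /** Wraps a slow lookup (standing in for a database query) with an in-memory cache. */
    static class CachedLookup<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        CachedLookup(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            // The loader is only called when the key is not cached yet.
            return cache.computeIfAbsent(key, loader);
        }
    }

    /** Simulates a burst of identical reads and returns how many reached the "database". */
    static int databaseHitsFor(int requests) {
        AtomicInteger databaseHits = new AtomicInteger();
        CachedLookup<Long, String> users = new CachedLookup<>(id -> {
            databaseHits.incrementAndGet(); // hypothetical database round trip
            return "user-" + id;
        });
        for (int i = 0; i < requests; i++) {
            users.get(42L);
        }
        return databaseHits.get();
    }

    public static void main(String[] args) {
        System.out.println("Database hits for 1000 requests: " + databaseHitsFor(1000));
    }
}
```

With the cache in place, a burst of 1000 identical reads reaches the loader only once; without it, each request would be a database query.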

That's why in JHipster, we have worked very hard, for years, and with several major cache providers, in order to give you the best solution possible.

Typically, we have two types of usage for cache:

  • Hibernate second-level cache (often called "L2 cache"), that directly caches your entities.
  • Spring Cache abstraction, that will cache more complex business objects.

Both have their advantages, and that's why we set them both up in JHipster.

Then, we have different caching implementations, that fall in the following 3 main categories:

  • Local caches (like Caffeine): they are the most efficient, but they don't scale across nodes. So when you add more application nodes, you will have cache synchronization issues, and you will see old (stale) data.
  • Remote caches (like Redis): they use a remote server, so they will scale up easily, but their performance is often orders of magnitude slower (to get an object you need to fetch it over the network, and then deserialize it, whereas with a local cache you are just following a pointer to an object which is already inside your heap).
  • Distributed caches (like Hazelcast): they would act as a local cache, but are able to scale when you add application nodes thanks to a distributed synchronization mechanism. High-end solutions like Hazelcast can also work as remote caches, but will benefit from having a local cache (often called "near cache") for improved performance. Those look like the best solution, but they come at a price: they are more complex to set up! Years ago, people would use multicast to find nodes automatically, but in the cloud you cannot use such a network feature, and you need a specific mechanism for the nodes to find each other.
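The difference between a purely local cache and a synchronized one can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `Node` class, the shared `database` map and the invalidation loop are hypothetical, and they do not represent how Hazelcast actually replicates or partitions data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalCacheStalenessSketch {

    /** Shared store standing in for the database. */
    static final Map<String, String> database = new ConcurrentHashMap<>();

    /** An application node with its own local cache. */
    static class Node {
        final Map<String, String> localCache = new ConcurrentHashMap<>();
        final List<Node> peers = new ArrayList<>();

        String read(String key) {
            // Served from the local cache when possible, loaded from the database otherwise.
            return localCache.computeIfAbsent(key, database::get);
        }

        /** Purely local write: other nodes are not told about the change. */
        void writeLocalOnly(String key, String value) {
            database.put(key, value);
            localCache.put(key, value);
        }

        /** Write followed by an invalidation sent to every known peer. */
        void writeAndInvalidate(String key, String value) {
            writeLocalOnly(key, value);
            for (Node peer : peers) {
                peer.localCache.remove(key);
            }
        }
    }

    /** Returns what node B reads after each kind of write performed on node A. */
    static String[] demo() {
        database.clear();
        database.put("greeting", "hello");
        Node a = new Node();
        Node b = new Node();
        a.peers.add(b);
        b.peers.add(a);

        b.read("greeting");                       // B now caches the initial value
        a.writeLocalOnly("greeting", "bonjour");
        String afterLocalWrite = b.read("greeting");  // B still serves its cached copy

        a.writeAndInvalidate("greeting", "hola");
        String afterInvalidation = b.read("greeting"); // B reloads from the database
        return new String[] {afterLocalWrite, afterInvalidation};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("After local-only write: " + result[0]);
        System.out.println("After write with invalidation: " + result[1]);
    }
}
```

The invalidation loop only works because each node knows its peers, which is exactly the discovery problem that multicast used to solve and that needs a different answer in the cloud.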

In this post, we're going to have a look at a new solution, the Azure discovery plugin for Hazelcast: it uses a specific Azure API in order to scale automatically, solving the complex issue of configuring the distributed cache.

Introduction to the new Azure discovery plugin for Hazelcast

The Azure discovery plugin for Hazelcast is an Open Source project developed by Alparslan Avci from Hazelcast, and available on GitHub at https://github.com/hazelcast/hazelcast-azure.

Such a plugin was released many years ago, so you might find information about an older version: this new release is a complete rewrite, which is much better as it uses the Azure Instance Metadata service.

The Azure Instance Metadata service is a REST Endpoint accessible to all IaaS virtual machines created via the Azure Resource Manager. The endpoint is available at a well-known non-routable IP address (169.254.169.254) that can be accessed only from within a virtual machine.

The Azure discovery plugin for Hazelcast will need to authenticate to that REST endpoint, and then will query it to know which other virtual machines should be in the cluster.
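To give an idea of what talking to that endpoint looks like, here is a small sketch that builds two requests with the JDK's HTTP client (Java 11+): one describing the current virtual machine, one asking for an access token for the machine's managed identity. This is not the plugin's code. The class and method names are hypothetical, and the `api-version` values are assumptions that should be checked against the Azure documentation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class InstanceMetadataSketch {

    // Well-known, non-routable address of the Azure Instance Metadata service.
    static final String IMDS_BASE = "http://169.254.169.254/metadata";

    /** Request describing the current VM (api-version is an assumed value). */
    static HttpRequest instanceRequest() {
        return imdsGet("/instance?api-version=2019-06-01");
    }

    /** Request for an access token issued to the VM's managed identity (api-version assumed). */
    static HttpRequest tokenRequest() {
        return imdsGet("/identity/oauth2/token?api-version=2018-02-01"
            + "&resource=https://management.azure.com/");
    }

    private static HttpRequest imdsGet(String pathAndQuery) {
        return HttpRequest.newBuilder(URI.create(IMDS_BASE + pathAndQuery))
            .header("Metadata", "true") // the service rejects calls without this header
            .timeout(Duration.ofSeconds(2))
            .GET()
            .build();
    }

    /** Sends the request and returns the raw JSON body; only works from inside an Azure VM. */
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(instanceRequest().uri());
        System.out.println(tokenRequest().uri());
        // Pass --send when running on an Azure virtual machine to query the service.
        if (args.length > 0 && args[0].equals("--send")) {
            System.out.println(fetch(instanceRequest()));
        }
    }
}
```

The token request only succeeds once the virtual machine has a managed identity, which is why the identity and role setup described later in this post is required.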

Installing the Azure discovery plugin for Hazelcast

All the code shown in this article comes from a specific Open Source project that I have created, and which you can check at https://github.com/jdubois/jhipster-hazelcast-azure.

This is a modified JHipster application, as this plugin only works with the newly released Hazelcast 4, which Spring Boot does not fully support yet (Spring Boot Actuator isn't compatible with Hazelcast 4, but this will be fixed with the next Micrometer release).

To install Hazelcast 4 and the new Azure discovery plugin for Hazelcast, you need to add them as dependencies in your pom.xml file:

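<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast</artifactId>
    <version>4.0</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-hibernate53</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-azure</artifactId>
    <!-- Assumed version: check the plugin's README for the release matching Hazelcast 4.0 -->
    <version>2.0</version>
</dependency>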

If you have the Spring Boot support for Hazelcast in your dependencies (which would be the case for a JHipster application), namely com.hazelcast:hazelcast-spring, you can remove it, as we are not going to use it.

Configuring the Azure discovery plugin for Hazelcast

For this example, we will use JHipster's CacheConfiguration object, but greatly simplified, so that it is easier to understand and can be used in any standard Spring Boot application:

package com.mycompany.myapp.config;

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.core.env.Profiles;

import javax.annotation.PreDestroy;

@Configuration
public class CacheConfiguration {

    private final Logger log = LoggerFactory.getLogger(CacheConfiguration.class);

    private final Environment env;

    public CacheConfiguration(Environment env) {
        this.env = env;
    }

    @PreDestroy
    public void destroy() {
        log.info("Closing Cache Manager");
        Hazelcast.shutdownAll();
    }

    @Bean
    public HazelcastInstance hazelcastInstance() {
        log.debug("Configuring Hazelcast");
        Config config = new Config();
        config.setInstanceName("hazelcasttest");
        // If running in Azure, use the Hazelcast Azure plugin
        if (env.acceptsProfiles(Profiles.of("azure"))) {
            log.info("Configuring the Hazelcast Azure plug-in");
            config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
            config.getNetworkConfig().getJoin().getAzureConfig().setEnabled(true);
        }
        return Hazelcast.newHazelcastInstance(config);
    }
}

The important part here is the following line:

config.getNetworkConfig().getJoin().getAzureConfig().setEnabled(true);

There are many more available configuration options detailed at https://github.com/hazelcast/hazelcast-azure, but that default setup is already good enough for most usual needs: the plugin will use the Azure Instance Metadata service to find other virtual machines in the same resource group, and will create a cluster with all of them.

Some important notes on this cluster:

  • It will use the default private Virtual Network that is created when creating the Virtual Machines. That means that all communications will be hidden from the outside world, and that you don't need to configure a firewall for it (it is already open by default on that private network).
  • This network uses the 10.0.0.0/24 IP address range, so you will probably see your Virtual Machines using IPs like 10.0.0.4 and 10.0.0.5 to communicate.
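If you want to check that a member address really belongs to that private range, the arithmetic is simple: a /24 prefix keeps the first 24 bits of the address, so any address starting with 10.0.0 matches. Here is a small sketch of that check; the class and method names are hypothetical, and it only handles IPv4.

```java
public class SubnetCheckSketch {

    /** Returns true when the IPv4 address falls inside the given CIDR block. */
    static boolean inSubnet(String address, String cidr) {
        String[] parts = cidr.split("/");
        int prefixLength = Integer.parseInt(parts[1]);
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("Invalid prefix length: " + prefixLength);
        }
        // A /24 prefix gives the mask 255.255.255.0; shifting by 32 is a no-op in Java,
        // so the /0 case is handled separately.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toInt(parts[0]) & mask) == (toInt(address) & mask);
    }

    /** Packs the four octets of a dotted IPv4 address into a single int. */
    private static int toInt(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String octet : octets) {
            int b = Integer.parseInt(octet);
            if (b < 0 || b > 255) {
                throw new IllegalArgumentException("Invalid octet in: " + ipv4);
            }
            value = (value << 8) | b;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(inSubnet("10.0.0.4", "10.0.0.0/24"));
        System.out.println(inSubnet("10.0.0.5", "10.0.0.0/24"));
        System.out.println(inSubnet("10.0.1.4", "10.0.0.0/24"));
    }
}
```

The first two addresses are inside the range, while 10.0.1.4 is not, because its third octet differs within the first 24 bits.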

Configuring the virtual machines to access the Azure Instance Metadata service

The plugin will only work if it can query the Azure Instance Metadata service and get the list of the other virtual machines that are configured in the current resource group.

You will need to go to the Azure Portal, and for each virtual machine you will need to do the following operations.

Enable a system assigned managed identity for each virtual machine

For each virtual machine, look at the Identity menu item in the left menu, and enable a "System assigned" identity:

Enable a system assigned managed identity

This will allow you to give roles to that virtual machine.

Give the "Reader" role on the resource group to each virtual machine

Still in the Azure Portal, go to the resource group in which your virtual machines have been created.

In the left menu, select Access control (IAM), and give the Reader role to all the Virtual Machines that you want in your cluster:

Give the READ role

Running everything and testing the cluster

Now that everything is configured, you can run your Spring Boot applications in each Virtual Machine.

If everything is correctly set up, each of them should have Hazelcast starting, and the nodes should all join the same cluster, like in this screenshot below:

Cluster running

In that example, we can see that we have 2 nodes, using the private network that we described above (the IPs are 10.0.0.4 and 10.0.0.5). You can see that the current node is 10.0.0.4 (it is marked with "this"), and you'll see the opposite on the other node, where "this" appears next to 10.0.0.5.

If you want to test that the cluster is working fine, an easy solution with the sample application that we created on https://github.com/jdubois/jhipster-hazelcast-azure is to do the following:

  • Open up port 8080 on the public IP of each virtual machine
  • Access each application using that public IP
  • Modify some cached data on one node (as we use the Hibernate L2 cache, all database data is cached), and verify on the other node that the data changed accordingly

Synchronized data

Conclusion

A distributed cache is the best solution to scale cloud-native applications, but configuring it in the cloud can be challenging: the new Azure discovery plugin for Hazelcast is a great solution to solve that issue efficiently.

This solution will work great when running virtual machines in Azure, but how will it work with more advanced Azure services?

If you want to check the code used in this article, it is available at https://github.com/jdubois/jhipster-hazelcast-azure.
