Why Redis Cache times out in Azure Function App on Consumption Plan? - A Journey

Asad Raheem (tehmas) ・4 min read

I decided to move a power-user feature to an Azure Function App. The feature made extensive use of Redis Cache. In a controlled environment, the move resulted in better scalability and performance.

The problem?

Redis timeout exceptions were being thrown in production. Always? No. Sometimes? Yes, and that was an even bigger problem, because it made the root cause difficult to trace.

I was following the approach described in the Microsoft documentation.

private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
{
    string cacheConnection = ConfigurationManager.AppSettings["CacheConnection"].ToString();
    return ConnectionMultiplexer.Connect(cacheConnection);
});

public static ConnectionMultiplexer Connection
{
    get
    {
        return lazyConnection.Value;
    }
}

First Hunch

The Redis server load might have exceeded what the plan allows. To my surprise, that was not the case: server load hardly ever exceeded 10%.

Second Hunch

The Redis server is single-threaded, so an oversized object in the cache might have been holding up other commands.

Avoid using certain Redis commands that take a long time to complete, unless you fully understand the impact of these commands. For example, do not run the KEYS command in production. Depending on the number of keys, it could take a long time to return. Redis is a single-threaded server and it processes commands one at a time. If you have other commands issued after KEYS, they will not be processed until Redis processes the KEYS command.

That was also not the case.

Third Hunch

Another feature that synchronously fetched a large object from Redis might have been causing the issue, but that didn't make sense: such features weren't used frequently.

Fourth Hunch

Noisy neighbors. The Azure Redis Cache Standard tier C0 plan was being used, and it turns out C0 caches aren't meant for production use.

The Basic tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are meant for simple dev/test scenarios since they have a shared CPU core, little memory, and are prone to "noisy neighbor" issues.

I upgraded the plan and waited patiently. The issue still wasn't resolved.

Time for Experimentation

I built a test harness that generated a large number of asynchronous requests to Redis Cache using the same lazy initialization pattern.
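A harness of this kind can be sketched roughly as follows. This shows the shape only and is not the original test code: FakeConnection is a hypothetical stand-in for ConnectionMultiplexer so the sketch runs without a Redis server, and BurstHarness and RunBurstAsync are made-up names. Against a real cache, the fake would be swapped for the actual connection.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ConnectionMultiplexer (assumption: no real Redis here).
public sealed class FakeConnection
{
    // Counts how many times a connection object was actually constructed.
    public static int Created;

    public FakeConnection()
    {
        Interlocked.Increment(ref Created);
        Thread.Sleep(50); // simulate a slow connect
    }

    // Pretend cache read; a real harness would call the Redis client here.
    public Task<string> GetAsync(string key) => Task.FromResult("value-for-" + key);
}

public static class BurstHarness
{
    // Same shape as the lazy pattern above, with the fake connection swapped in.
    private static readonly Lazy<FakeConnection> lazyConnection =
        new Lazy<FakeConnection>(() => new FakeConnection());

    public static FakeConnection Connection => lazyConnection.Value;

    // Fires `requests` concurrent reads and returns how many completed successfully.
    public static async Task<int> RunBurstAsync(int requests)
    {
        var tasks = Enumerable.Range(0, requests)
            .Select(i => Task.Run(() => Connection.GetAsync("key" + i)));
        string[] results = await Task.WhenAll(tasks);
        return results.Count(r => r.StartsWith("value-for-"));
    }
}
```

Calling `await BurstHarness.RunBurstAsync(200)` from a console app or a test starts 200 reads that all race for the first access to `Connection`, which is the situation a cold start with a burst of queue messages creates.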

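Voilà! The much-awaited timeout finally occurred on my local system. It occurred when multiple threads were trying to access the cache. With the lazy loading pattern shown above, every request tried to initialize the cache connection at the same time. (Note that the Lazy<T>(Func<T>) constructor used above defaults to LazyThreadSafetyMode.ExecutionAndPublication rather than None; in that mode, too, an exception thrown by the initialization method is cached.) According to the documentation for LazyThreadSafetyMode.None: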

The Lazy<T> instance is not thread safe; if the instance is accessed from multiple threads, its behavior is undefined. Use this mode only when high performance is crucial and the Lazy<T> instance is guaranteed never to be initialized from more than one thread. If you use a Lazy<T> constructor that specifies an initialization method (valueFactory parameter), and if that initialization method throws an exception (or fails to handle an exception) the first time you call the Value property, then the exception is cached and thrown again on subsequent calls to the Value property.

But how was this occurring in production? The answer: the Consumption plan.

On the Consumption plan, the function app is not always running. The Redis connection was initialized whenever the function was triggered by an Azure Storage Queue message. The problem occurred when the function app received a burst of messages, either while it wasn't already running or while it was scaling out.

Solution

Pass a LazyThreadSafetyMode value to the constructor. Yes, that's it. Other than None, there are two options: PublicationOnly and ExecutionAndPublication. For my use case, I needed PublicationOnly, which the documentation describes as follows:

When multiple threads try to initialize a Lazy<T> instance simultaneously, all threads are allowed to run the initialization method (or the parameterless constructor, if there is no initialization method). The first thread to complete initialization sets the value of the Lazy<T> instance. That value is returned to any other threads that were simultaneously running the initialization method, unless the initialization method throws exceptions on those threads. Any instances of T that were created by the competing threads are discarded.

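// Needs: using System; using System.Configuration; using System.Threading; using StackExchange.Redis;
// PublicationOnly: competing threads may all run the factory; the first value to complete is kept.
private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
{
    string cacheConnection = ConfigurationManager.AppSettings["CacheConnection"].ToString();
    return ConnectionMultiplexer.Connect(cacheConnection);
}, LazyThreadSafetyMode.PublicationOnly);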

public static ConnectionMultiplexer Connection
{
    get
    {
        return lazyConnection.Value;
    }
}

The fix itself was simple, but figuring out the exact conditions in the production environment was difficult.
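One documented difference between the modes is how they treat an initialization method that throws, which may matter for a connection factory that can time out. The sketch below is a minimal illustration under stated assumptions, not the production code: LazyModeDemo and AttemptsWithOneFailure are hypothetical names, and a simulated TimeoutException stands in for a failed Redis connect.

```csharp
using System;
using System.Threading;

public static class LazyModeDemo
{
    // Reads Value twice from a Lazy<string> whose factory fails on its first run,
    // and returns how many times the factory was actually executed.
    public static int AttemptsWithOneFailure(LazyThreadSafetyMode mode)
    {
        int attempts = 0;
        var lazy = new Lazy<string>(() =>
        {
            // The first attempt simulates a transient connect failure (assumption).
            if (Interlocked.Increment(ref attempts) == 1)
                throw new TimeoutException("simulated connect timeout");
            return "connected";
        }, mode);

        for (int i = 0; i < 2; i++)
        {
            try
            {
                _ = lazy.Value;
            }
            catch (TimeoutException)
            {
                // A real caller would log the failure and decide whether to retry.
            }
        }
        return attempts;
    }
}
```

With ExecutionAndPublication the failed first attempt is cached, so the second read rethrows the same exception without running the factory again. With PublicationOnly the exception is not cached, so the second read runs the factory again and can succeed.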

Note: The Redis snippets above use ConfigurationManager to access App Settings, which I kept to stay consistent with the documentation. Since Azure Functions v2, Environment.GetEnvironmentVariable should be used instead.
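Reading the setting through an environment variable could look something like this. It is a small sketch: the CacheSettings class and GetCacheConnection method are hypothetical names, and "CacheConnection" simply mirrors the setting name used in the Redis snippets.

```csharp
using System;

public static class CacheSettings
{
    // Returns the Redis connection string from the process environment,
    // failing fast with a clear message if the setting is missing.
    public static string GetCacheConnection()
    {
        var value = Environment.GetEnvironmentVariable("CacheConnection");
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException("App setting 'CacheConnection' is not set.");
        }
        return value;
    }
}
```

Inside the lazy factory, the `ConfigurationManager.AppSettings[...]` lookup would then be replaced with a call to `CacheSettings.GetCacheConnection()`.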

I hope this helps.
