Azure Functions Beyond the Basics
Continues from Azure Functions for .NET Developers (Parts 1-9)
- Part 1: Running Azure Functions in Docker: Why and How
- Part 2: Docker Pitfalls I Hit (And How to Avoid Them)
- Part 3: Scaling Azure Functions: Consumption vs Premium vs Dedicated (you are here)
Your Consumption plan function works fine in dev. Then production traffic arrives, the app scales to zero during a quiet period, and the next request takes 6.8 seconds. The question that follows is always the same: do you switch to Premium at $146/month, or is there something between free-with-cold-starts and always-warm-but-always-billing? Azure Functions has five hosting options now (Consumption, Flex Consumption, Premium, Dedicated, and Container Apps), each with a different billing model and a different answer to that question. This article covers the four App Service-based plans; Container Apps is a different deployment model aimed at containerized microservices. All code samples are in the companion repo.
Consumption: true serverless, true cold starts
Microsoft now labels the Consumption plan as "legacy" in its hosting docs and is directing new serverless workloads to Flex Consumption. But Consumption is still where most Functions apps start, and where many should stay. You deploy your code, the platform handles the rest. No servers to manage, no capacity to plan. You pay only when your functions execute.
How the scale controller works
The scale controller monitors event rates for each trigger type and decides how many instances to run. Since runtime v4.19.0, it uses target-based scaling by default. The formula is one division:
desired instances = event source length / target executions per instance
What "event source length" means depends on the trigger. For Storage Queues, it's queue length. For Service Bus, active message count. For Event Hubs, unprocessed events per partition. For Cosmos DB, pending changes in the change feed. The controller reads these signals and adjusts instance count accordingly.
The controller adds up to four instances at a time. HTTP triggers get new instances at most once per second. Non-HTTP triggers scale at most once every 30 seconds. This is fast enough for gradual traffic ramps but won't help with sudden spikes from zero.
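To see the formula at work, take the Storage Queue trigger: its target executions per instance is the batchSize setting in host.json (default 16), so a backlog of 1,600 messages asks for 1,600 / 16 = 100 instances, delivered at the ramp rates above. Raising the target makes each instance absorb more of the backlog before scale-out kicks in. A minimal host.json sketch with an illustrative value (check your own trigger's target setting in the target-based scaling docs):

{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 32
    }
  }
}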
Instance limits and billing
Each Consumption instance gets 1.5 GB of memory and one CPU core. The maximum instance count is 200 on Windows and 100 on Linux (with a 500-instance-per-subscription-per-hour rate limit on Linux).
Billing has two components:
- Executions: $0.20 per million, with 1,000,000 free per month
- Execution time: $0.000016 per GB-second, with 400,000 GB-seconds free per month
Memory is rounded up to the nearest 128 MB bucket. Execution time rounds to the nearest millisecond, with a minimum billable unit of 128 MB x 100 ms. For a function that runs a few thousand times a day at under a second each, you'll stay well inside the free grant.
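To make that concrete with hypothetical numbers: 3 million executions a month, 512 MB memory, 500 ms average duration. Executions: (3M - 1M free) x $0.20 per million = $0.40. Execution time: 3M x 0.5 GB x 0.5 s = 750,000 GB-seconds, of which 350,000 are billable after the free grant, at $0.000016 each, about $5.60. Call it $6/month total.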
Cold start reality on .NET
After roughly 20 minutes of inactivity, the Consumption plan scales to zero. The next request waits for the platform to provision a fresh instance and start your application from scratch.
On .NET isolated worker, that cold start typically lands between 2 and 7 seconds. Heavy DI registrations push it past 10. The in-process model was faster, but Microsoft is retiring it in November 2026.
For timer triggers, queue processors, and other background work, a few seconds of cold start is invisible. For HTTP endpoints that a user is waiting on, it's a problem.
What Consumption can't do
The hard constraints that push teams to other plans:
- No VNet integration. If your function needs to reach resources inside a virtual network, Consumption is off the table.
- 10-minute execution timeout. The default is 5 minutes, configurable to 10. Long-running orchestrations or batch jobs need a different plan.
- No per-function scaling. All functions in the app scale together. A chatty timer trigger can cause the platform to allocate instances that your HTTP trigger didn't need.
- 600 active outbound connections per instance. Hit this with parallel HTTP calls to external APIs and requests start failing.
Linux Consumption is retiring September 30, 2028. Microsoft is directing all new Linux serverless workloads to Flex Consumption. If you're starting a new project on Linux, skip Consumption entirely.
Flex Consumption: the middle ground
Flex Consumption is the plan Microsoft now recommends for new serverless workloads. It addresses the two biggest Consumption limitations: no VNet support and no way to reduce cold starts without jumping to a $146/month Premium plan.
The plan scales to zero like Consumption, but adds always-ready instances that you can configure to stay warm. It supports VNet integration out of the box. And it scales to 1,000 instances instead of Consumption's 200.
Always-ready instances vs on-demand
By default, Flex Consumption behaves like regular Consumption: zero instances when idle, on-demand instances when events arrive. The difference is you can configure always-ready instances that stay running regardless of traffic.
Always-ready instances are assigned to scale groups:
- http: all HTTP and SignalR triggers
- durable: orchestration, activity, and entity triggers
- blob: Event Grid-based blob triggers
- function:<FUNCTION_NAME>: a specific function
Setting always-ready to 2 for the http group keeps two instances permanently running for HTTP functions. Those handle traffic first. If demand exceeds their capacity, the platform adds on-demand instances on top.
az functionapp scale config always-ready set \
  --resource-group my-rg \
  --name my-func-app \
  --settings http=2
On-demand instances scale to zero when idle. Always-ready instances are billed continuously whether they're executing functions or not. If you enable zone redundancy, the minimum is 2 always-ready instances per group.
Billing: per-second, not per-execution
Flex Consumption bills differently from Consumption. Instead of per-execution pricing with sampled memory, you choose a fixed instance size upfront (512 MB, 2,048 MB, or 4,096 MB) and pay per GB-second of active execution time.
On-demand rates are $0.000026 per GB-second and $0.40 per million executions. The monthly free grant is smaller than Consumption: 250,000 executions and 100,000 GB-seconds (compared to Consumption's 1,000,000 and 400,000).
Always-ready instances have a separate billing structure with no free grant. The baseline (idle) rate is $0.000004 per GB-second, roughly 6.5x cheaper than the on-demand execution rate. When always-ready instances are actively executing, the execution time rate is $0.000016 per GB-second (the same as Consumption's rate, and cheaper than on-demand).
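Back-of-envelope with those rates: one always-ready 2,048 MB instance sitting idle for a 30-day month costs 2 GB x 2,592,000 s x $0.000004, about $20.74 at the baseline rate. Two of them for the http group is roughly $41/month, still well under Premium's $146 floor.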
The minimum billable execution is 1,000 ms (1 second). After that, billing rounds to the nearest 100 ms. This is less granular than Consumption's per-millisecond rounding, so very fast functions cost relatively more on Flex: a 50 ms invocation bills as a full second here, versus Consumption's 128 MB x 100 ms minimum.
Each instance also gets an extra 272 MB platform buffer that isn't billed. This is memory the Functions host and worker process use, not your function code.
Scale behavior
Flex Consumption scales per function by trigger type. HTTP and SignalR triggers scale together. Durable Functions triggers scale together. Blob triggers (Event Grid source) scale together. Everything else scales independently per function. This fixes a real problem from Consumption, where a noisy timer trigger could cause unnecessary instance allocation for your HTTP functions.
Maximum instances: 1,000 (default limit is 100, configurable via CLI). All Flex Consumption apps in a subscription and region share a regional quota of 250 cores by default. The formula: instances x cores per instance (0.25 for 512 MB, 1 for 2,048 MB, 2 for 4,096 MB). One app running 1,000 instances at 512 MB consumes the entire quota (1,000 x 0.25 = 250 cores). You can request an increase through Azure support, but plan for this limit when running multiple Flex apps in the same region.
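Raising the per-app cap is a single scale-config call. A sketch, reusing the hypothetical resource names from earlier:

az functionapp scale config set \
  --resource-group my-rg \
  --name my-func-app \
  --maximum-instance-count 500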
The constraints to know about
Flex Consumption comes with real limitations:
- One app per plan. Consumption and Premium let you put up to 100 function apps on one plan. Flex is one-to-one.
- No deployment slots. Rolling updates are in public preview as an alternative (zero-downtime deployments without slot swaps), but if your deployment strategy depends on slot swaps, this is a blocker today.
- Linux only. No Windows support.
- Isolated worker only. The C# in-process model is not supported.
- App init timeout: 30 seconds. If your startup code takes longer, the instance fails to initialize. This is not configurable.
- Blob trigger uses Event Grid only. The polling-based blob trigger is not available.
Flex Consumption also supports Azure Files storage mounts, letting you mount SMB shares as local directories. This is useful for large binaries, ML models, or shared reference data that you don't want to package in your deployment.
The Linux-only constraint is less of an issue than it sounds. Linux is where .NET Functions performance is best, and the in-process model (the main reason teams stayed on Windows) is being retired anyway.
VNet integration works the same way as Premium: subnet delegation to Microsoft.App/environments, support for private endpoints on storage accounts, Key Vault references over VNet, and native virtual network triggers for non-HTTP event sources.
Premium: warm instances, guaranteed
Premium (Elastic Premium) is the plan teams reach for when cold starts become unacceptable. It keeps at least one instance running at all times, so your functions never start from zero. That guarantee comes with a price floor: even with zero traffic, you're billed for that minimum instance.
What you get for $146/month
Billing is per-second based on vCPU-seconds and GB-seconds allocated across instances. No per-execution charge. At pay-as-you-go rates in US regions, that's ~$116.80 per vCPU per month plus ~$8.32 per GB per month, so an EP1 instance (1 vCPU, 3.5 GB) works out to $116.80 + 3.5 x $8.32, roughly $146/month. Savings plans (1-year or 3-year commitments) offer roughly 17% off.
There is no free grant on Premium. From the moment your plan exists, the meter is running.
Pre-warmed instances and elastic scale
Premium uses two layers to eliminate cold starts:
Always-ready instances run continuously, regardless of load. You configure how many per app, up to 20. These are billed 24/7, executing or not. If you have multiple function apps on the same Premium plan, the plan's minimum instance count equals the highest always-ready count among all apps.
Prewarmed buffer instances sit behind the always-ready pool. The default is 1. When all active instances are handling traffic, the prewarmed instance swaps to active and the platform immediately provisions a new buffer instance to take its place. This means scale-out events get a warm instance instead of a cold one.
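The plan's floor and burst ceiling can be set from the CLI. A sketch with hypothetical names (the per-app always-ready and prewarmed counts are separate site-level settings):

az functionapp plan update \
  --resource-group my-rg \
  --name my-premium-plan \
  --min-instances 2 \
  --max-burst 50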
You can define a warmup trigger that runs during the prewarming window. This is where you force-initialize lazy dependencies, open database connections, and prime HTTP connection pools before the instance receives real traffic:
public class Warmup
{
    private readonly HttpClient _httpClient;
    private readonly Lazy<ExpensiveAnalyticsClient> _analytics;

    public Warmup(HttpClient httpClient, Lazy<ExpensiveAnalyticsClient> analytics)
    {
        _httpClient = httpClient;
        _analytics = analytics;
    }

    [Function("Warmup")]
    public void Run([WarmupTrigger] object warmupContext)
    {
        // Force-construct the lazy dependency so the first real request doesn't pay for it.
        _ = _analytics.Value;

        // Fire-and-forget request that primes DNS, TLS, and the connection pool.
        _ = _httpClient.GetAsync("/health", HttpCompletionOption.ResponseHeadersRead);
    }
}
The warmup trigger only fires during scale-out, not on restarts or deployments. It's available on Premium and Flex Consumption, not on the Consumption plan.
Elastic scale can burst up to 100 instances on Windows and 20-100 on Linux depending on region. Scaling beyond the minimum is best-effort: the platform allocates instances as fast as it can, but rapid spikes can outpace the prewarmed buffer. When that happens, you get cold starts even on Premium.
VNet and other features
VNet integration is supported but not automatic. You configure it at creation time or after, using regional VNet integration with a dedicated subnet (at least 100 available IPs). Private endpoints for inbound traffic are fully supported: you can create a private IP in your VNet and restrict all public access.
Non-HTTP triggers from VNet-secured resources (Service Bus with private endpoints, for example) require enabling Runtime Scale Monitoring. Without it, the scale controller can't read the event source metrics to decide when to scale.
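Turning it on is a one-time update against the site config; a sketch using the hypothetical app name from earlier:

az resource update \
  --resource-group my-rg \
  --name my-func-app/config/web \
  --resource-type Microsoft.Web/sites \
  --set properties.functionsRuntimeScaleMonitoringEnabled=1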
Other features that set Premium apart:
- Execution timeout: 30 minutes default, configurable to unbounded. Consumption caps at 10 minutes.
- Deployment slots: 3 (including production). Consumption gets 2, Flex gets 0.
- Apps per plan: up to 100 function apps on a single Premium plan, sharing the VM pool.
- Custom Linux container images are supported.
When Premium is the wrong call
The most common mistake is jumping to Premium from Consumption solely because of cold starts, without evaluating the alternatives.
If VNet was your only reason, Flex Consumption now gives you VNet integration with scale-to-zero pricing. No need to pay $146/month for network access.
If your workload is sporadic (a few hundred invocations a day), the math doesn't work. That function costs pennies on Consumption. On Premium EP1, it costs $146/month regardless of usage. The cold start tax has to be genuinely painful to justify that gap.
And watch the SKU names. EP1 is Elastic Premium. P1v2 is a Dedicated App Service plan. They behave completely differently: EP1 scales dynamically based on event volume, P1v2 gives you a fixed VM that you scale manually. If your Terraform or Bicep has sku = "P1v2" and you expected autoscaling, check again.
Dedicated: fixed compute, fixed bill
The Dedicated plan runs your functions on a standard App Service plan. Same infrastructure, same pricing, same scaling model as a web app. Multiple function apps and web apps can share the same plan.
This is the plan you pick when you already have App Service infrastructure and want to add functions without creating a separate billing line item.
Pricing and compute
Dedicated pricing follows the standard App Service tiers (Basic, Standard, Premium v2/v3) at Windows pay-as-you-go rates for US East. Linux is cheaper (roughly 40-50% less for P-series tiers). P1v2 is a previous-generation SKU; Microsoft recommends P1v3 for new deployments.
Billing is hourly, prorated to the second, per scaled-out instance. Reserved instances (1-year or 3-year) can save up to 55% on Linux. The cost is fixed: you pay the same whether your functions execute zero times or a million times per day.
Scaling: you manage it
There is no event-driven scaling on Dedicated. The scale controller that powers Consumption and Premium does not apply here.
Your options:
- Manual scale-out: set the instance count in the portal or via CLI
- Rule-based autoscale (Standard tier and above): trigger scale-out based on CPU percentage, memory usage, or a schedule
Autoscale on App Service is slower than Premium's elastic scale. It reacts to sustained load patterns, not individual event bursts. App Service also has a newer "automatic scaling" feature for HTTP-based traffic, but it's not supported when Functions apps are in the plan.
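Rule-based autoscale is configured in Azure Monitor, not in the Functions runtime. A sketch with hypothetical names: one autoscale profile on the plan, plus a scale-out rule on sustained CPU:

az monitor autoscale create \
  --resource-group my-rg \
  --resource my-plan \
  --resource-type Microsoft.Web/serverfarms \
  --name my-autoscale \
  --min-count 1 \
  --max-count 4 \
  --count 1

az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name my-autoscale \
  --condition "CpuPercentage > 70 avg 10m" \
  --scale out 1

Add a matching scale-in rule (CpuPercentage < 30, scale in 1) so the plan drifts back down after the spike.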
Maximum instances: 10-30 per plan, or 100 in an App Service Environment (ASE).
Always On must be enabled in the App Service configuration. Without it, the Functions runtime goes idle after a period of inactivity. Unlike Consumption's scale-to-zero (which the platform manages), an idle Dedicated plan just means your functions silently stop processing. You're still billed for the compute.
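Always On itself is one config call; a sketch with the hypothetical app name used throughout:

az functionapp config set \
  --resource-group my-rg \
  --name my-func-app \
  --always-on true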
When Dedicated fits
Dedicated makes sense in specific circumstances:
- You already have an underutilized App Service plan. Adding functions to existing compute costs nothing extra. The plan is already paid for.
- You run mixed workloads. A web app and a set of background processing functions on the same plan, sharing resources.
- You need deployment slots. Up to 20, far more than Premium's 3.
- Predictable billing matters more than efficiency. Some finance teams prefer a fixed monthly line item over variable serverless costs.
The downside is resource contention. If your web app and function app share an S1 instance and the web app spikes, your function throughput drops. There's no isolation within the plan.
Cold start mitigation: what to try first
If you're staying on Consumption or Flex Consumption, cold starts are part of the deal. The strategies below are ordered by impact, highest first. Not all of them apply to every plan.
1. ReadyToRun compilation
The single highest-impact change for .NET cold starts on Consumption and Flex Consumption. Two lines in your .csproj:
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
</PropertyGroup>
ReadyToRun pre-compiles your assemblies to native code. The JIT compiler still runs for hot paths at runtime, but the initial load skips the bulk of compilation overhead. In practice, this cuts cold start time roughly in half.
The trade-off: your deployment package grows 2-3x because the assemblies contain both the native precompiled code and the original IL. For a typical Functions app, that's still well under the 1 GB deployment limit.
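ReadyToRun happens at publish time, not build time, so make sure your pipeline deploys publish output. With the two properties above in the .csproj, a plain publish picks them up:

dotnet publish -c Release

If you'd rather not hardcode the RID in the project file, drop the RuntimeIdentifier property and pass -r linux-x64 on the command line instead.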
2. Placeholder optimization for .NET isolated
The Functions platform can pre-provision a worker process before your app code loads. Enable it with an app setting:
WEBSITE_USE_PLACEHOLDER_DOTNETISOLATED=1
This requires .NET 6+, a 64-bit process, and the latest Azure Functions SDK versions. The placeholder worker starts the .NET runtime and gets the IPC channel ready while your code is still being loaded, shaving off part of the startup sequence.
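Setting it from the CLI; a sketch with the hypothetical app name used in the earlier examples:

az functionapp config appsettings set \
  --resource-group my-rg \
  --name my-func-app \
  --settings WEBSITE_USE_PLACEHOLDER_DOTNETISOLATED=1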
Combine this with ReadyToRun for the best result on Consumption.
3. Trim your DI registrations
Every service you register in Program.cs adds to startup time. On a warm instance this is negligible. On a cold start, it compounds.
Register HTTP clients and SDK clients as singletons so they're constructed once and reused. Wrap expensive dependencies in Lazy<T> so they're only built when a function actually needs them:
// Singleton HttpClient: one handler for the app's lifetime, connections pooled and reused.
builder.Services.AddSingleton(_ => new HttpClient(new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(2)
})
{
    BaseAddress = new Uri("https://api.example.com")
});

// Lazy<T> defers the expensive construction until a function actually resolves it.
builder.Services.AddSingleton<Lazy<ExpensiveAnalyticsClient>>(sp =>
    new Lazy<ExpensiveAnalyticsClient>(() =>
    {
        var logger = sp.GetRequiredService<ILogger<ExpensiveAnalyticsClient>>();
        return new ExpensiveAnalyticsClient(logger);
    }));
The PooledConnectionLifetime on SocketsHttpHandler rotates DNS entries without disposing the HttpClient instance. This avoids socket exhaustion (the same problem IHttpClientFactory solves, but without requiring per-request factory calls in a singleton context).
Fewer functions per app also helps. Each function adds discovery and registration overhead at startup.
4. Warmup trigger (Premium and Flex Consumption only)
On plans that support prewarmed instances, the warmup trigger lets you run initialization code before the instance takes real traffic. Force-construct your lazy dependencies, open database connections, and send a throwaway HTTP request to prime the connection pool. See the Premium section above for the code.
The warmup trigger only fires during scale-out. It does not fire on restarts, deployments, or slot swaps. One per app, and the function must be named Warmup (case-insensitive).
What works where
Not every strategy applies to every plan:
On Dedicated with Always On enabled, cold start is largely a non-issue because instances stay running. On Premium, the always-ready and prewarmed instances handle most of it. ReadyToRun and DI trimming matter most on the serverless plans where instances start from scratch.
Choosing a plan: the decision matrix
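The short version, compiled from the sections above:

| | Consumption | Flex Consumption | Premium | Dedicated |
| --- | --- | --- | --- | --- |
| Scale to zero | Yes | Yes (on-demand) | No | No |
| Cold starts | Yes | Reduced via always-ready | Eliminated (prewarmed) | Avoided with Always On |
| VNet integration | No | Yes | Yes | Yes |
| Max instances | 200 (Windows) / 100 (Linux) | 1,000 | 100 (Windows) / 20-100 (Linux) | 10-30 (100 in ASE) |
| Deployment slots | 2 | 0 (rolling updates in preview) | 3 | Up to 20 |
| Billing floor | $0 | $0 (unless always-ready) | ~$146/month (EP1) | Fixed per tier |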
Which plan for which workload
Consumption if your traffic is sporadic, you don't need VNet access, and your users can tolerate a few seconds of cold start. Timer triggers, low-volume queue processors, webhook receivers that aren't latency-sensitive. If your bill on Consumption is under $10/month, there's no reason to move.
Flex Consumption if you need VNet integration or more than 200 instances, but still want scale-to-zero pricing. Evaluate this before jumping to Premium. The always-ready instances give you a dial between pure serverless and always-warm, and you pay only for what you configure. The constraints (one app per plan, no deployment slots, Linux only) are the deciding factors.
Premium EP1 if your HTTP endpoints are latency-sensitive and cold starts are genuinely costing you users or revenue. Also the right choice for functions that run continuously or need more than 10 minutes of execution time. If you're running multiple function apps, a shared Premium plan can amortize the $146/month minimum across them.
Dedicated if you already have an App Service plan with spare capacity, need more than 3 deployment slots, or your finance team requires a fixed monthly line item. Don't create a Dedicated plan specifically for Functions unless you have a concrete reason: the lack of event-driven scaling makes it the least "serverless" option.
The mistake to avoid
The most common path is: start on Consumption, hit cold start problems in production, jump straight to Premium at $146/month. Flex Consumption sits between them and didn't exist when many teams made that decision. If you're evaluating today, Flex Consumption with 1-2 always-ready instances gives you warm starts with scale-to-zero pricing for on-demand instances. Test it before committing to Premium's minimum.
Are you running Consumption or Premium in production right now?