Tatsuro Shibamura for Polymind

Posted on Mar 31 • Edited on Apr 24 • Originally published at blog.shibayan.jp

Deep Dive into Azure Cosmos DB Physical Partitions

#azure #cosmosdb

To use Azure Cosmos DB effectively, the most important task is designing the partition key. However, “partitions” in Cosmos DB actually come in two types: logical partitions and physical partitions.

This topic is briefly covered in the official documentation, so it’s a good idea to review it first. In most cases, physical partitions are fully managed by Azure, so you rarely need to think about them.

Partitioning and horizontal scaling - Azure Cosmos DB | Microsoft Learn

Learn about partitioning, logical, physical partitions in Azure Cosmos DB, best practices when choosing a partition key, and how to manage logical partitions.

learn.microsoft.com

The partition key you choose when designing your data model determines the logical partitions: every distinct key value forms one logical partition. Both logical and physical partitions have their own limits, but in most cases, you only need to care about the 20 GB limit of logical partitions.

Although you usually don’t need to think about physical partitions, understanding them can be very helpful during design. So here’s a concise summary based on my experience.

Basics of Physical Partitions

As described in the official documentation, physical partitions are part of Cosmos DB’s internal implementation and represent the unit of compute that processes queries and operations.

Each physical partition has four replicas, and quorum-based consistency is implemented across them.

Global Distribution With - Under the Hood - Azure Cosmos DB | Microsoft Learn

This article provides technical details relating to global distribution of Azure Cosmos DB

learn.microsoft.com

In globally distributed configurations, replication is performed at the physical partition level.

A container always has at least one physical partition. Azure automatically adds physical partitions when either of the following happens:

Storage exceeds 50 GB, or
Provisioned throughput exceeds 10,000 RU/s
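
The two thresholds above can be turned into a rough estimate. The following is a minimal sketch, assuming the count is driven only by the 50 GB and 10,000 RU/s figures; the function name is hypothetical, and the real count can be higher because the service decides when to split.

```python
import math

# Limits quoted in the article; treated here as fixed constants.
MAX_STORAGE_GB_PER_PARTITION = 50
MAX_RU_PER_PARTITION = 10_000


def estimate_physical_partitions(storage_gb: float, provisioned_ru: int) -> int:
    """Rough lower bound on the number of physical partitions.

    Assumption: enough partitions are needed to satisfy both the storage
    limit and the throughput limit, and there is never fewer than one.
    """
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB_PER_PARTITION)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PARTITION)
    return max(1, by_storage, by_throughput)
```

For example, 60 GB of data at 4000 RU/s already needs at least two partitions because of storage alone, while 10 GB at 25,000 RU/s needs at least three because of throughput.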

Throughput and Physical Partitions

Each physical partition has a maximum throughput of 10,000 RU/s. In practice, however, I’ve found that hitting the 50 GB storage limit happens more often than hitting the RU limit.

A common problem unfolds like this:

A container originally has 4000 RU/s on a single physical partition
Storage exceeds 50 GB
The system splits into 2 physical partitions

Because provisioned RU/s is divided evenly across physical partitions:

1 partition → 4000 RU/s
2 partitions → 2000 RU/s each
4 partitions → 1000 RU/s each

Each split therefore lowers the throughput available to any one partition, so a workload that used to fit can suddenly start returning 429 Too Many Requests errors.
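
The arithmetic can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the check assumes the busiest partition's demand is known.

```python
def ru_per_partition(total_ru: int, partition_count: int) -> float:
    """Provisioned RU/s that each physical partition receives under an even split."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    return total_ru / partition_count


def would_throttle(hottest_partition_demand_ru: float,
                   total_ru: int,
                   partition_count: int) -> bool:
    """True if the busiest partition needs more RU/s than its even share."""
    return hottest_partition_demand_ru > ru_per_partition(total_ru, partition_count)
```

With 4000 RU/s provisioned, a set of keys that needs 3000 RU/s is fine on one partition, but is throttled once the container has split into two partitions of 2000 RU/s each.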

In such cases, you should follow the official guidance and adjust RU accordingly:

Best Practices for Scaling Provisioned Throughput - Azure Cosmos DB | Microsoft Learn

Learn about best practices for scaling manual and autoscale provisioned throughput for databases and containers.

learn.microsoft.com

If you already know your data will exceed 50 GB, it can be a good strategy to provision enough RU/s upfront so that the required number of physical partitions is allocated early.

Also note that when new physical partitions are created, data replication occurs, temporarily consuming additional RU. This can lead to temporary throughput degradation.

Connection Modes and Physical Partitions

For better performance, Direct mode is the recommended connection mode in the .NET and Java SDKs.

SQL SDK Connectivity Modes - Azure Cosmos DB | Microsoft Learn

Learn about the different connectivity modes available on the Azure Cosmos DB SQL SDKs.

learn.microsoft.com

As the name suggests, Direct mode connects directly to backend physical partitions, reducing overhead.

To make this work, the SDK must know which physical partition stores a given partition key.

If you’ve enabled Application Insights, you may have seen requests to pkranges during application startup. This API returns the mapping between partition key ranges and physical partitions:

Get Partition Key Ranges - Azure Cosmos DB REST API | Microsoft Learn

Get partition key ranges REST API syntax. Request and response headers, body, status codes and examples.

learn.microsoft.com

This API is not intended for direct use in user code.
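
To make the idea of this mapping concrete, here is a minimal sketch of range-based routing: hash the partition key, then find the range that contains the hash. Everything in it is an assumption for illustration. Cosmos DB uses its own hashing scheme, not MD5, and the hash space size, function names, and evenly sized ranges are all hypothetical.

```python
import bisect
import hashlib

HASH_SPACE = 2**32  # hypothetical size of the hash space for this sketch


def build_upper_bounds(partition_count: int) -> list[int]:
    """Exclusive upper bounds that split the hash space into equal ranges."""
    return [HASH_SPACE * (i + 1) // partition_count for i in range(partition_count)]


def hash_key(partition_key: str) -> int:
    """Stand-in hash; the real service does not use MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def find_range_id(partition_key: str, upper_bounds: list[int]) -> int:
    """Index of the range whose [lower, upper) interval contains the key's hash."""
    return bisect.bisect_right(upper_bounds, hash_key(partition_key))
```

A client that caches the bounds can route every request locally, which is why the lookup only has to happen once per container and is refreshed when partitions split.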

Because this mapping is fetched when a container is first accessed, the first request can be slower. However, recent versions of the .NET SDK provide an API to explicitly initialize this mapping:

CosmosClient.CreateAndInitializeAsync Method (Microsoft.Azure.Cosmos) - Azure for .NET Developers | Microsoft Learn

Creates a new CosmosClient with the account endpoint URI string and TokenCredential. In addition to that it initializes the client with containers provided i.e The SDK warms up the caches and connections before the first call to the service is made. Use this to obtain lower latency while startup of your application. CosmosClient is thread-safe. Its recommended to maintain a single instance of CosmosClient per lifetime of the application which enables efficient connection management and performance. Please refer to the performance guide.

learn.microsoft.com

By calling this during application warm-up, you can minimize overhead for subsequent requests.

Cross-Partition Queries and Physical Partitions

In Cosmos DB design, cross-partition queries should generally be avoided. However, this rule can be relaxed if all data fits within a single physical partition.

In the following documentation, the term “partition” refers to physical partitions:

Query a Container - Azure Cosmos DB | Microsoft Learn

Learn how to query containers in Azure Cosmos DB by using both in-partition and cross-partition queries.

learn.microsoft.com

If all logical partitions are contained within a single physical partition, indexes can be used effectively, making cross-partition queries relatively low-cost.

Problems arise when a container spans multiple physical partitions, as queries must be executed across all of them, increasing cost.

The most efficient cross-partition query is one that targets only partition keys within a single physical partition, but this is not something you can control directly from user code.

Instead, you should rely on SDK features. A good example is the ReadMany API, which efficiently handles cross-partition access.
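
One way to picture why an SDK-level API can be cheaper than a naive fan-out is to group the requested items by the physical partition that owns them, so that each partition is contacted once. The sketch below shows only that grouping step. It is not the SDK's implementation, and the `range_id_of` lookup is a hypothetical stand-in for the cached partition mapping.

```python
from collections import defaultdict
from typing import Callable, Iterable


def group_by_physical_partition(
    items: Iterable[tuple[str, str]],
    range_id_of: Callable[[str], int],
) -> dict[int, list[tuple[str, str]]]:
    """Group (item_id, partition_key) pairs by owning physical partition.

    range_id_of maps a partition key to a partition ID; it is a
    hypothetical stand-in for the mapping the SDK caches.
    """
    groups: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for item_id, partition_key in items:
        groups[range_id_of(partition_key)].append((item_id, partition_key))
    return dict(groups)


# Hypothetical mapping: two tenants share partition 0, one lives on partition 1.
lookup = {"tenant-a": 0, "tenant-b": 0, "tenant-c": 1}
requested = [("1", "tenant-a"), ("2", "tenant-b"), ("3", "tenant-c")]
groups = group_by_physical_partition(requested, lookup.__getitem__)
```

Here three items with three different partition keys collapse into two groups, so only two backend partitions need to be contacted.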

That said, you should still avoid a design that depends on cross-partition queries for everything.

Change Feed and Physical Partitions

Physical partitions are closely related to the Change Feed feature.

While Change Feed guarantees ordering at the logical partition level, its actual processing is based on physical partitions.

This is easier to understand when looking at the Pull model.

In the Pull model, parallel processing is achieved using the FeedRange class, which is assigned per physical partition. By distributing FeedRange across multiple machines, you can process Change Feed in parallel.

Change Feed Pull Model - Azure Cosmos DB | Microsoft Learn

Learn how to use the Azure Cosmos DB change feed pull model to read the change feed. Understand the differences between the change feed pull model and the change feed processor.

learn.microsoft.com

FeedRange Class (Microsoft.Azure.Cosmos) - Azure for .NET Developers | Microsoft Learn

Represents a unit of feed consumption that can be used as unit of parallelism.

learn.microsoft.com

As a result, a single Change Feed Processor can only scale out to the number of physical partitions. Processing is effectively single-threaded per physical partition, which simplifies writing logic that depends on ordering.
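
The scale-out ceiling is easy to see with a small simulation. This is a sketch under the assumption that each feed range is handled by exactly one worker at a time; the round-robin assignment and function names are hypothetical and do not reflect how the processor actually balances leases.

```python
def assign_feed_ranges(feed_range_count: int, worker_count: int) -> dict[int, list[int]]:
    """Round-robin feed ranges over workers; each range goes to exactly one worker."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    assignments: dict[int, list[int]] = {w: [] for w in range(worker_count)}
    for feed_range in range(feed_range_count):
        assignments[feed_range % worker_count].append(feed_range)
    return assignments


def busy_workers(assignments: dict[int, list[int]]) -> int:
    """Number of workers that received at least one feed range."""
    return sum(1 for ranges in assignments.values() if ranges)
```

With four feed ranges, two workers each take two ranges, but six workers leave two of them idle: adding workers beyond the number of physical partitions does not add throughput.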

However, this also means that scale-out limits are reached relatively quickly.

To address this, you should minimize processing within each Change Feed Processor and instead create multiple smaller processors for different purposes.

Metrics and Physical Partitions

By now, you should have a solid understanding of physical partitions, so let’s move on to monitoring.

Even with careful partition key design, data can still become skewed toward specific physical partitions (so-called hot partitions).

To identify them, you can split metrics by PartitionKeyRangeId.

Monitor Normalized Request Units - Azure Cosmos DB | Microsoft Learn

Learn how to monitor the normalized request unit usage of an operation in Azure Cosmos DB. Owners of an Azure Cosmos DB account can understand which operations are consuming more request units.

learn.microsoft.com
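
Once the metric is split by PartitionKeyRangeId, spotting a hot partition is a simple filter. The sketch below assumes the values have already been exported as a mapping from PartitionKeyRangeId to normalized RU consumption in percent; the sample numbers and the 90% threshold are made up for illustration.

```python
def find_hot_partitions(normalized_ru_percent: dict[str, float],
                        threshold: float = 90.0) -> list[str]:
    """Return the PartitionKeyRangeIds whose normalized RU consumption
    is at or above the threshold (an arbitrary cut-off for this sketch)."""
    return sorted(
        range_id
        for range_id, percent in normalized_ru_percent.items()
        if percent >= threshold
    )


# Hypothetical sample: partition "1" is close to its limit, the others are not.
sample = {"0": 35.0, "1": 98.5, "2": 41.0}
hot = find_hot_partitions(sample)
```

In this sample only partition "1" is flagged, which is the typical signature of a hot partition: one range near 100% while the rest have plenty of headroom.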

Recently, new metrics such as Physical Partition Throughput have been introduced, making it easier to troubleshoot partition-level issues.

You can also view these metrics in Cosmos DB Insights in Azure Monitor, which significantly improves observability.

Operations on Physical Partitions (Preview)

At Build 2022, several Cosmos DB updates were announced, including preview features that allow direct operations on physical partitions.

One of them is Merge partitions, which combines physical partitions.

Merge partitions (preview) - Azure Cosmos DB | Microsoft Learn

Reduce the number of physical partitions used for your container with the merge capability in Azure Cosmos DB.

learn.microsoft.com

Since RU is evenly distributed across physical partitions, fewer partitions can result in higher effective throughput per partition. Merging excess partitions helps optimize performance.

Another feature allows redistributing throughput across physical partitions.

Redistribute Throughput Across Partitions (Preview) - Azure Cosmos DB | Microsoft Learn

Learn how to redistribute throughput across partitions in Azure Cosmos DB to optimize performance. To improve partition performance, follow these step-by-step instructions and best practices.

learn.microsoft.com

No matter how carefully you design your partition key, some data skew is hard to avoid in real-world scenarios. This feature allows you to assign more RU/s to hot partitions without increasing the total RU/s, improving overall efficiency.
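
The idea behind redistribution is plain bookkeeping: raise one partition's share and lower the others by the same total. The following sketch shows only that arithmetic. It does not call the preview feature, the function name is hypothetical, and apart from the 10,000 RU/s ceiling mentioned earlier it ignores any limits the service enforces.

```python
MAX_RU_PER_PARTITION = 10_000  # per-partition ceiling quoted earlier in the article


def redistribute(current: dict[str, int], hot_id: str, target_ru: int) -> dict[str, int]:
    """Give hot_id target_ru and adjust the other partitions equally,
    keeping the container's total RU/s unchanged."""
    if target_ru > MAX_RU_PER_PARTITION:
        raise ValueError("a physical partition cannot exceed 10,000 RU/s")
    others = sorted(pid for pid in current if pid != hot_id)
    if not others:
        raise ValueError("need at least two partitions to redistribute")
    extra = target_ru - current[hot_id]
    share, remainder = divmod(extra, len(others))
    result = {hot_id: target_ru}
    for index, pid in enumerate(others):
        # Spread any remainder over the first few partitions so the sum stays exact.
        result[pid] = current[pid] - share - (1 if index < remainder else 0)
    if min(result.values()) < 0:
        raise ValueError("not enough RU/s on the other partitions")
    return result
```

Starting from four partitions at 1000 RU/s each, raising the hot partition to 2500 RU/s leaves the other three at 500 RU/s, and the total stays at 4000 RU/s.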

Conclusion

By understanding physical partitions, you can push Cosmos DB optimization further in terms of both performance and cost.

However, this is not mandatory knowledge for most use cases.

Rather than focusing on physical partitions from the beginning, the most important thing is to design a good partition key during data modeling.

Most physical partition-related optimizations can be handled later, so invest your effort where it matters most: your initial data model design.
