Deep Dive into Azure Cosmos DB Physical Partitions
When using Azure Cosmos DB effectively, the most important aspect is designing the partition key. However, when we talk about “partitions” in Cosmos DB, there are actually two types: logical partitions and physical partitions.
This topic is briefly covered in the official documentation, so it’s a good idea to review it first. In most cases, physical partitions are fully managed by Azure, so you rarely need to think about them.
When designing your data model, the partition key corresponds to logical partitions. Both logical and physical partitions have their own limits, but in most cases, you only need to care about the 20 GB limit of logical partitions.
Although you usually don’t need to think about physical partitions, understanding them can be very helpful during design. So here’s a concise summary based on my experience.
Basics of Physical Partitions
As described in the official documentation, physical partitions are part of Cosmos DB’s internal implementation and represent the unit of compute that processes queries and operations.
Each physical partition has four replicas, and quorum-based consistency is implemented across them.
In globally distributed configurations, replication is performed at the physical partition level.
A container always has at least one physical partition. However, Azure automatically increases the number of physical partitions when:
- Storage exceeds 50 GB, or
- Provisioned throughput exceeds 10,000 RU/s
Throughput and Physical Partitions
Each physical partition has a maximum throughput of 10,000 RU/s. In practice, however, I’ve found that hitting the 50 GB storage limit happens more often than hitting the RU limit.
A common issue occurs when:
- A container originally has 4000 RU/s on a single physical partition
- Storage exceeds 50 GB
- The system splits into 2 physical partitions
Since RU is evenly distributed across physical partitions:
- 1 partition → 4000 RU
- 2 partitions → 2000 RU each
- 4 partitions → 1000 RU each
This effectively reduces the throughput available per partition, which can lead to sudden 429 Too Many Requests errors.
In such cases, you should follow the official guidance and adjust RU accordingly:
If you already know your data will exceed 50 GB, it can be a good strategy to provision enough RU upfront so that the required number of physical partitions are allocated early.
Also note that when new physical partitions are created, data replication occurs, temporarily consuming additional RU. This can lead to temporary throughput degradation.
Connection Modes and Physical Partitions
To improve performance, the C# and Java SDKs recommend using Direct mode.
As the name suggests, Direct mode connects directly to backend physical partitions, reducing overhead.
To make this work, the SDK must know which physical partition stores a given partition key.
If you’ve enabled Application Insights, you may have seen requests to pkranges during application startup. This API returns the mapping between partition keys and physical partitions:
This API is not intended for direct use in user code.
Because this mapping is fetched when a container is first accessed, the first request can be slower. However, recent versions of the .NET SDK provide an API to explicitly initialize this mapping:
By calling this during application warm-up, you can minimize overhead for subsequent requests.
Cross-Partition Queries and Physical Partitions
In Cosmos DB design, cross-partition queries should generally be avoided. However, this rule can be relaxed if all data fits within a single physical partition.
In the following documentation, the term “partition” refers to physical partitions:
If all logical partitions are contained within a single physical partition, indexes can be used effectively, making cross-partition queries relatively low-cost.
Problems arise when a container spans multiple physical partitions, as queries must be executed across all of them, increasing cost.
The most efficient cross-partition query is one that targets only partition keys within a single physical partition, but this is not something you can control directly from user code.
Instead, you should rely on SDK features. A good example is the ReadMany API, which efficiently handles cross-partition access.
That said, designing everything around cross-partition queries should always be avoided.
Change Feed and Physical Partitions
Physical partitions are closely related to the Change Feed feature.
While Change Feed guarantees ordering at the logical partition level, its actual processing is based on physical partitions.
This is easier to understand when looking at the Pull model.
In the Pull model, parallel processing is achieved using the FeedRange class, which is assigned per physical partition. By distributing FeedRange across multiple machines, you can process Change Feed in parallel.
As a result, even a single Change Feed Processor can only scale up to the number of physical partitions. In other words, processing is effectively single-threaded per physical partition, which simplifies writing logic that depends on ordering.
However, this also means that scale-out limits are reached relatively quickly.
To address this, you should minimize processing within each Change Feed Processor and instead create multiple smaller processors for different purposes.
Metrics and Physical Partitions
By now, you should have a solid understanding of physical partitions, so let’s move on to monitoring.
Even with careful partition key design, data can still become skewed toward specific physical partitions (so-called hot partitions).
To identify them, you can split metrics by PartitionKeyRangeId.
Recently, new metrics such as Physical Partition Throughput have been introduced, making it easier to troubleshoot partition-level issues.
You can also view these metrics in Cosmos DB Insights in Azure Monitor, which significantly improves observability.
Operations on Physical Partitions (Preview)
At Build 2022, several Cosmos DB updates were announced, including preview features that allow direct operations on physical partitions.
One of them is Merge partitions, which combines physical partitions.
Since RU is evenly distributed across physical partitions, fewer partitions can result in higher effective throughput per partition. Merging excess partitions helps optimize performance.
Another feature allows redistributing throughput across physical partitions.
No matter how carefully you design your partition key, data skew is inevitable in real-world scenarios. This feature allows you to assign more RU to hot partitions without increasing total RU, improving overall efficiency.
Conclusion
By understanding physical partitions, you can push Cosmos DB optimization further in terms of both performance and cost.
However, this is not mandatory knowledge for most use cases.
Rather than focusing on physical partitions from the beginning, the most important thing is to design a good partition key during data modeling.
Most physical partition-related optimizations can be handled later, so invest your effort where it matters most: your initial data model design.
Top comments (0)