<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Trupti Raikar</title>
    <description>The latest articles on DEV Community by Trupti Raikar (@truptiraikar8).</description>
    <link>https://dev.to/truptiraikar8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3677313%2F1481f968-1b94-489f-9d71-729499864c3a.png</url>
      <title>DEV Community: Trupti Raikar</title>
      <link>https://dev.to/truptiraikar8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/truptiraikar8"/>
    <language>en</language>
    <item>
      <title>Table Partitioning in S4 HANA</title>
      <dc:creator>Trupti Raikar</dc:creator>
      <pubDate>Thu, 25 Dec 2025 18:24:54 +0000</pubDate>
      <link>https://dev.to/truptiraikar8/table-partitioning-1260</link>
      <guid>https://dev.to/truptiraikar8/table-partitioning-1260</guid>
<description>&lt;p&gt;In a multiple-host SAP S/4HANA system, data is spread across several servers to improve scalability and performance, and HANA offers several ways to distribute it. Each index server is usually assigned to a separate host. Partitioned tables are split into multiple partitions, each of which may be kept on a different index server, and the load can be balanced by allocating different tables to different servers. Table replication, which keeps a copy of the same table on several servers, speeds up joins and queries. The main partitioning methods are described below.&lt;/p&gt;

&lt;p&gt;Hash Partitioning &lt;br&gt;
Hash partitioning is used to get around the 2-billion-row limit and to divide rows evenly among partitions for load balancing. Each distinct key value is mapped to exactly one partition based on its hash, which keeps the distribution of rows even. The main purposes of hash partitioning are to: distribute data evenly among the hosts of a multiple-node system; speed up parallel query execution by letting each node scan its local partition; reduce contention for inserts and updates, because every partition manages a distinct slice of the data; and keep partition sizes manageable while allowing large tables to scale.&lt;/p&gt;

&lt;p&gt;Example query for hash partitioning: &lt;br&gt;
CREATE COLUMN TABLE ZSALES_DATA ( &lt;br&gt;
SALES_ID BIGINT, &lt;br&gt;
CUSTOMER_ID INT, &lt;br&gt;
REGION NVARCHAR(20), &lt;br&gt;
AMOUNT DECIMAL(15,2) &lt;br&gt;
) &lt;br&gt;
PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 8;&lt;br&gt;
Here, HANA hashes each CUSTOMER_ID internally and allocates the row to one of the 8 partitions.&lt;/p&gt;
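
&lt;p&gt;With the ZSALES_DATA table above, an equality filter on the partitioning column lets HANA prune the scan to the single partition the key hashes to (the CUSTOMER_ID value below is just an illustration): &lt;br&gt;
SELECT SUM(AMOUNT) &lt;br&gt;
FROM ZSALES_DATA &lt;br&gt;
WHERE CUSTOMER_ID = 12345; &lt;br&gt;
Only the one partition holding that customer's rows is scanned, while a filter on a non-partitioning column such as REGION would still touch all 8 partitions.&lt;/p&gt;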

&lt;p&gt;Round-Robin Partitioning &lt;br&gt;
Here, rows are divided evenly and sequentially among partitions in a round-robin fashion: the first row goes to partition 1, the second to partition 2, the third to partition 3, the fourth back to partition 1, and so forth. Round-robin partitioning works best when there is no natural partitioning key (no column suitable for hash or range) and you want parallelism and balanced data for bulk loads or analytics. There is no selective filtering; query patterns scan all partitions. Because related rows are spread across all partitions, it is inefficient for key-filtering queries.&lt;/p&gt;

&lt;p&gt;CREATE COLUMN TABLE ZSALES_ROBIN ( &lt;br&gt;
SALES_ID BIGINT, &lt;br&gt;
REGION NVARCHAR(20), &lt;br&gt;
AMOUNT DECIMAL(15,2) &lt;br&gt;
) &lt;br&gt;
PARTITION BY ROUNDROBIN PARTITIONS 4; &lt;/p&gt;

&lt;p&gt;Range Partitioning &lt;br&gt;
Range partitioning divides a large table into sub-tables (partitions) based on value ranges of a specific column, typically a date, numeric, or sequence field. Each partition stores the rows that fall within a defined range, for example data split by year, by month, or by region code range.&lt;/p&gt;

&lt;p&gt;CREATE COLUMN TABLE ZSALES_RANGE ( &lt;br&gt;
SALES_ID BIGINT, &lt;br&gt;
FISCAL_YEAR INT, &lt;br&gt;
REGION NVARCHAR(20), &lt;br&gt;
AMOUNT DECIMAL(15,2) &lt;br&gt;
) &lt;br&gt;
PARTITION BY RANGE (FISCAL_YEAR) ( &lt;br&gt;
PARTITION 2019 &lt;= VALUES &lt; 2020, &lt;br&gt;
PARTITION 2020 &lt;= VALUES &lt; 2021, &lt;br&gt;
PARTITION 2021 &lt;= VALUES &lt; 2022, &lt;br&gt;
PARTITION 2022 &lt;= VALUES &lt; 2023, &lt;br&gt;
PARTITION OTHERS &lt;br&gt;
); &lt;br&gt;
PARTITION OTHERS catches all values outside the defined ranges.&lt;/p&gt;
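
&lt;p&gt;With ZSALES_RANGE defined as above, a query that filters on the partitioning column only needs to read the matching partition (the year value below is just an illustration): &lt;br&gt;
SELECT SUM(AMOUNT) &lt;br&gt;
FROM ZSALES_RANGE &lt;br&gt;
WHERE FISCAL_YEAR = 2021; &lt;br&gt;
HANA prunes the scan to the partition covering 2021 and never touches the others, which is the main reason range partitioning suits time-series workloads.&lt;/p&gt;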

&lt;p&gt;Best practices in partitioning &lt;/p&gt;

&lt;p&gt;Range partitioning is the default choice for time-series or housekeeping data, where most queries filter by a chronological or sequential column such as CALMONTH, DOC_DATE, or LOAD_TS. With this method the database can perform partition pruning, which significantly improves performance because only the relevant date ranges are examined when executing queries. Because entire partitions can be merged or dropped without affecting active ones, it also simplifies lifecycle operations such as archiving or deleting old data. However, if recent time ranges accumulate a disproportionate share of the data (hot partitions), a composite range-hash strategy is advised: divide by time range first, then hash-subpartition for even load distribution.&lt;/p&gt;
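
&lt;p&gt;A composite scheme can be sketched with HANA's multi-level partitioning syntax. The table and column names below are made up, and the example uses hash at the first level with range subpartitions at the second, the combination most widely supported; check which level combinations your HANA release allows: &lt;br&gt;
CREATE COLUMN TABLE ZSALES_MULTI ( &lt;br&gt;
SALES_ID BIGINT, &lt;br&gt;
CUSTOMER_ID INT, &lt;br&gt;
FISCAL_YEAR INT, &lt;br&gt;
AMOUNT DECIMAL(15,2) &lt;br&gt;
) &lt;br&gt;
PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 4, &lt;br&gt;
RANGE (FISCAL_YEAR) ( &lt;br&gt;
PARTITION 2019 &lt;= VALUES &lt; 2023, &lt;br&gt;
PARTITION OTHERS &lt;br&gt;
); &lt;br&gt;
Each of the 4 hash partitions is subdivided by fiscal year, so key-based access stays balanced while old ranges can still be dropped as units.&lt;/p&gt;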

&lt;p&gt;Hash partitioning is used for key-based access patterns, such as when queries frequently filter on high-cardinality columns like CUSTOMER_ID or DOC_ID. It distributes data evenly across nodes or partitions and helps scale workloads horizontally by supporting parallel processing during large joins and scans.&lt;/p&gt;

&lt;p&gt;Round-robin partitioning distributes rows evenly and sequentially in a circular pattern across partitions, ignoring column values. Although it guarantees balance, it is independent of the data, which means it works best for staging, temporary, or transient tables used during bulk data loads or ETL procedures that frequently involve full scans. It should be avoided, however, for selective queries or OLTP-style lookups, because related rows may land in different partitions, resulting in higher access overhead and worse performance for key-based reads.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>How is Column Store fetch faster than Row Store?</title>
      <dc:creator>Trupti Raikar</dc:creator>
      <pubDate>Wed, 24 Dec 2025 18:06:53 +0000</pubDate>
      <link>https://dev.to/truptiraikar8/how-is-column-store-fetch-faster-than-row-store-2n04</link>
      <guid>https://dev.to/truptiraikar8/how-is-column-store-fetch-faster-than-row-store-2n04</guid>
<description>&lt;p&gt;Consider a table called ORDERS with 100,000,000 rows and these columns: OrderID (4-byte int), CustomerID (4-byte int), Country (dictionary-encoded; 1-byte code), OrderDate (8 bytes), Amount (8-byte decimal), and five additional miscellaneous columns averaging 8 bytes each. In a row store, the approximate row width is 4 + 4 + 1 + 8 + 8 + (5 × 8) = 65 bytes, which with alignment and row overhead rounds to roughly 72 bytes.&lt;/p&gt;

&lt;p&gt;Ex: SELECT SUM(Amount) FROM ORDERS WHERE Country = 'IN'. &lt;/p&gt;

&lt;p&gt;If Country = 'IN' matches 5% of the rows (selectivity 5%), the cost of fetching the data from a row store table is as follows: &lt;/p&gt;

&lt;p&gt;Case A: Complete table scan. Bytes read ≈ rows × row size = 100,000,000 × 72 B = 7.2 GB. However, the query really needs only the 8-byte Amount column (plus the 1-byte Country code for filtering), roughly 11% of each row, so nearly 90% of what is read is useless. That means more CPU work and slower performance.&lt;/p&gt;

&lt;p&gt;Case B: A B-tree index on the Country column. Using the index, find the 5% of matching rows = 5,000,000 row IDs. The bytes touched in the base table are approximately 5,000,000 × 72 B = 360 MB, plus tens to hundreds of MB of index pages and the CPU overhead of chasing random pointers. Even though far fewer total bytes are read than in a full scan, performance can still lag behind a tight sequential column scan because of random access and cache misses.&lt;/p&gt;

&lt;p&gt;Data fetch cost in a column store table: each column is stored independently, so the same query reads only the columns it needs (Country and Amount). Bytes read ≈ rows × column width: Amount column → 100 million × 8 bytes = 800 MB; Country column → 100 million × 1 byte = 100 MB.&lt;/p&gt;

&lt;p&gt;The engine first scans the Country column (100 MB) to mark which rows belong to 'IN', then sums only the marked rows while scanning the Amount column (up to 800 MB). The CPU operates very efficiently because this data is stored sequentially and processed in batches. If the data is compressed or clustered, parts of it may even be skipped, lowering the read size below 900 MB. So a column store reads only about 900 MB of tightly packed, sequentially readable data instead of the 7.2 GB a row store scan does, making it roughly eight times faster and much more CPU-friendly.&lt;/p&gt;
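
&lt;p&gt;These per-column sizes can be checked directly on a HANA system through the M_CS_COLUMNS monitoring view (the ORDERS table here is the hypothetical example above, and the exact view columns may vary by HANA release): &lt;br&gt;
SELECT COLUMN_NAME, MEMORY_SIZE_IN_TOTAL &lt;br&gt;
FROM M_CS_COLUMNS &lt;br&gt;
WHERE TABLE_NAME = 'ORDERS'; &lt;br&gt;
Comparing MEMORY_SIZE_IN_TOTAL per column against the raw byte estimates above also shows how much dictionary compression saves for low-cardinality columns such as Country.&lt;/p&gt;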

</description>
      <category>architecture</category>
      <category>database</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
