
Trupti Raikar


Table Partitioning in SAP S/4HANA

In a multiple-host SAP S/4HANA system, data is spread across multiple servers to enhance scalability and performance, and HANA supports several data distribution methods. Each index server is usually assigned to a separate host. Partitioned tables are divided into multiple partitions, each of which may be kept on a separate index server, and the load can be balanced by allocating different tables to different servers. Table replication, which keeps the same table present on several servers, speeds up joins and queries. The various partitioning methods include:

Hash Partitioning
Hash partitioning is used to get around the 2 billion row limit and to divide rows evenly among partitions for load balancing. Each distinct key value is mapped to one partition based on the hash result, which ensures an even distribution of rows. The primary purposes of hash partitioning are to distribute data evenly among the hosts of a multiple-node system, to speed up parallel queries by letting each node scan its local partition, to reduce contention for inserts and updates because every partition manages a distinct slice of the data, and to keep partition sizes manageable while enabling large tables to scale.

Example query for hash partitioning:
CREATE COLUMN TABLE ZSALES_DATA (
SALES_ID BIGINT,
CUSTOMER_ID INT,
REGION NVARCHAR(20),
AMOUNT DECIMAL(15,2)
)
PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 8;
Here, HANA hashes each CUSTOMER_ID internally and allocates the row to one of the 8 partitions.
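
To verify that the rows really end up evenly distributed, the per-partition record counts can be checked in the monitoring views. A minimal sketch, assuming the table was created in the current schema and the M_CS_TABLES monitoring view is accessible:

-- Show the row count of each partition of the column table ZSALES_DATA.
-- PART_ID identifies the partition; RECORD_COUNT should be roughly equal across partitions.
SELECT PART_ID, RECORD_COUNT
FROM M_CS_TABLES
WHERE TABLE_NAME = 'ZSALES_DATA'
ORDER BY PART_ID;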

Round-Robin Partitioning
In this case, rows are divided evenly and sequentially among partitions in a round-robin fashion: the first row goes to partition 1, the second to partition 2, the third to partition 3, the fourth back to partition 1, and so forth. Round-robin partitioning works best where there is no natural partitioning key (no column suitable for hash or range), where bulk loads or analytics need parallelism and balanced data, and where query patterns scan the whole table rather than filtering selectively. Because related rows are spread out across all partitions, it is inefficient for key-filtering queries.

CREATE COLUMN TABLE ZSALES_ROBIN (
SALES_ID BIGINT,
REGION NVARCHAR(20),
AMOUNT DECIMAL(15,2)
)
PARTITION BY ROUNDROBIN PARTITIONS 4;
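
If the number of partitions needs to change later, for example after adding hosts, the table can be repartitioned in place. A minimal sketch, assuming ZSALES_ROBIN exists as created above:

-- Redistribute the existing rows across 8 round-robin partitions instead of 4.
ALTER TABLE ZSALES_ROBIN PARTITION BY ROUNDROBIN PARTITIONS 8;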

Range Partitioning
Range partitioning divides a large table into sub-tables (partitions) based on value ranges of a specific column, typically a date, numeric, or sequence field. Each partition stores rows that fall within a defined range, for example data by year, month, or region code range.

CREATE COLUMN TABLE ZSALES_RANGE (
SALES_ID BIGINT,
FISCAL_YEAR INT,
REGION NVARCHAR(20),
AMOUNT DECIMAL(15,2)
)
PARTITION BY RANGE (FISCAL_YEAR) (
PARTITION 2019 <= VALUES < 2020,
PARTITION 2020 <= VALUES < 2021,
PARTITION 2021 <= VALUES < 2022,
PARTITION 2022 <= VALUES < 2023,
PARTITION OTHERS
);
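
Range partitions can also be maintained individually over time without touching the rest of the table. A minimal sketch, assuming ZSALES_RANGE was created as above; the exact maintenance syntax can vary by HANA release:

-- Add a partition for the next fiscal year.
ALTER TABLE ZSALES_RANGE ADD PARTITION 2023 <= VALUES < 2024;

-- Drop the oldest partition, for example after its data has been archived.
ALTER TABLE ZSALES_RANGE DROP PARTITION 2019 <= VALUES < 2020;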

Best practices in partitioning

Range partitioning is the default choice for time-series or housekeeping data, where most queries filter by a chronological or sequential column such as CALMONTH, DOC_DATE, or LOAD_TS. With this method the database can perform partition pruning, which significantly improves performance because only the relevant date ranges are examined when executing queries. It also makes lifecycle operations such as archiving or dropping old data easier, because entire partitions can be merged or deleted without affecting active ones. However, if recent time ranges accumulate a disproportionate share of the data (hot partitions), a composite range-hash strategy is advised: divide by time range first, then hash sub-partition for even load distribution, as sketched below.
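
As a rough illustration of a composite strategy, the classic multi-level syntax in HANA combines hash at the first level with range at the second level. The following is a minimal sketch using a hypothetical table ZSALES_ML; the range-first (range-hash) variant described above uses a different, release-dependent syntax:

-- Level 1: hash on CUSTOMER_ID spreads rows evenly across 4 partitions.
-- Level 2: range on FISCAL_YEAR keeps time slices separate for pruning and housekeeping.
CREATE COLUMN TABLE ZSALES_ML (
SALES_ID BIGINT,
CUSTOMER_ID INT,
FISCAL_YEAR INT,
AMOUNT DECIMAL(15,2)
)
PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 4,
RANGE (FISCAL_YEAR) (
PARTITION 2022 <= VALUES < 2023,
PARTITION 2023 <= VALUES < 2024,
PARTITION OTHERS
);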

Hash partitioning is used for key-based access patterns, such as when queries frequently filter on high-cardinality columns like CUSTOMER_ID or DOC_ID. It guarantees that data is distributed
evenly across nodes or partitions. This method helps scale workloads horizontally by supporting parallel processing during large joins and scans.

Round-robin partitioning distributes rows evenly and sequentially in a circular pattern across partitions, ignoring column values. Although it guarantees balance, it is independent of the data, so it works best for staging, temporary, or transient tables used during bulk loads or ETL procedures that typically involve full scans. It should nevertheless be avoided for selective queries or OLTP-style lookups, because related rows may end up in different partitions, which increases access overhead and worsens performance for key-based reads.
