In a gbase database handling tens of billions of rows, smart partitioning is no longer optional — it's mandatory. This guide covers partition type selection, pruning validation, daily maintenance, and common pitfalls to help you build a solid partitioning strategy for your GBASE cluster.
Supported Partitioning Types
GBase 8a offers four partitioning methods:
- RANGE – split by column value intervals (ideal for time‑series data)
- LIST – split by discrete enumerated values (e.g., region, status)
- HASH – distribute evenly via hash modulus
- KEY – similar to HASH, but the database chooses the hash function
In practice, RANGE partitioning on a date column is the most common choice.
RANGE Partitioning
Monthly Partitioning Example
CREATE TABLE orders (
order_id BIGINT NOT NULL,
customer_id BIGINT NOT NULL,
order_date DATE NOT NULL,
amount DECIMAL(12,2),
status VARCHAR(20),
PRIMARY KEY (order_id, order_date) -- partition key must be part of PK
)
PARTITION BY RANGE (TO_DAYS(order_date)) (
PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01')),
PARTITION p202406 VALUES LESS THAN (TO_DAYS('2024-07-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE -- catch-all
);
Key rules:
- The partition key column must be included in the primary key or unique key, otherwise the DDL will fail.
- For daily partitioning, keep the total partition count under 1024 to avoid metadata overhead.
LIST Partitioning Example
CREATE TABLE sales_regional (
sale_id BIGINT NOT NULL,
region VARCHAR(20) NOT NULL,
sale_date DATE NOT NULL,
amount DECIMAL(12,2),
PRIMARY KEY (sale_id, region)
)
PARTITION BY LIST COLUMNS (region) (
PARTITION p_north VALUES IN ('北京','天津','河北','山西','内蒙古'),
PARTITION p_east VALUES IN ('上海','江苏','浙江','安徽','山东'),
PARTITION p_south VALUES IN ('广东','广西','海南','福建'),
PARTITION p_central VALUES IN ('湖北','湖南','河南','江西'),
PARTITION p_west VALUES IN ('四川','重庆','云南','贵州','西藏'),
PARTITION p_other VALUES IN ('其他')
);
Partition Pruning
Pruning kicks in only when the WHERE clause references the partition key. Verify it with EXPLAIN PARTITIONS:
EXPLAIN PARTITIONS
SELECT * FROM orders
WHERE order_date BETWEEN '2024-03-01' AND '2024-03-31';
- ✅ If only
p202403appears in thepartitionscolumn, pruning works. - ❌ If all partitions are listed, pruning is disabled — check your query.
Common pruning killers:
- Applying a function:
WHERE MONTH(order_date) = 3❌ - Implicit type conversion:
WHERE order_date = 20240301❌ - Mixing partition-key and non‑key conditions with
OR❌ — considerUNION ALLinstead.
Day‑to‑Day Partition Maintenance
Add a New Partition
ALTER TABLE orders
REORGANIZE PARTITION p_future INTO (
PARTITION p202407 VALUES LESS THAN (TO_DAYS('2024-08-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Always update statistics immediately!
ANALYZE TABLE orders;
Drop Historical Partitions (orders of magnitude faster than DELETE)
ALTER TABLE orders DROP PARTITION p202101, p202102, p202103;
Merge or Split Partitions
-- Merge two small partitions
ALTER TABLE orders
REORGANIZE PARTITION p202401, p202402 INTO (
PARTITION p2024_q1 VALUES LESS THAN (TO_DAYS('2024-03-01'))
);
-- Split a large partition
ALTER TABLE orders
REORGANIZE PARTITION p_future INTO (
PARTITION p202407 VALUES LESS THAN (TO_DAYS('2024-08-01')),
PARTITION p202408 VALUES LESS THAN (TO_DAYS('2024-09-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Common Mistakes
- Too many partitions: Over 500 partitions cause metadata bloat and slower plan generation. Aim for 50–200 partitions.
-
Low‑cardinality partition key: Using a column like
status(3 values) causes severe data skew. -
No catch‑all partition: Without
MAXVALUE, out‑of‑range inserts fail immediately. -
Skipping
ANALYZE TABLEafter partition changes: The optimizer sees empty partitions and may choose poor plans.
Automate Future Partition Creation
A simple script added to crontab ensures you never forget to create next month's partition:
#!/bin/bash
NEXT_MONTH=$(date -d "+1 month" +%Y%m)
NEXT_MONTH_START=$(date -d "+1 month" +%Y-%m-01)
NEXT_NEXT_MONTH_START=$(date -d "+2 month" +%Y-%m-01)
PARTITION_NAME="p${NEXT_MONTH}"
mysql -h 192.168.1.10 -P 5258 -u gbase -pyour_password sales_db <<EOF
ALTER TABLE orders
REORGANIZE PARTITION p_future INTO (
PARTITION ${PARTITION_NAME} VALUES LESS THAN (TO_DAYS('${NEXT_NEXT_MONTH_START}')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
ANALYZE TABLE orders;
EOF
Crontab: 0 1 1 * * /bin/bash /usr/local/scripts/auto_add_partition.sh
Design Decision Flow
- Estimate data size: below 10 million rows → no partition needed; above 10 million → consider partitioning.
- Date column? → RANGE. Discrete categories? → LIST. No obvious pattern? → HASH.
- Granularity: daily inserts > 5 million → partition by day; 1–5 million → by month; < 1 million → by quarter or year.
- Ensure the partition key is in the primary key, then validate pruning with
EXPLAIN.
A well‑designed partitioning layout is the foundation of query performance in a gbase database. Spend the time to get it right at design time, and you'll save yourself a world of trouble later.
Top comments (0)