1. Introduction
In today’s software development landscape, information integrity and traceability have become fundamental requirements, particularly in systems that manage sensitive data or financial transactions. To provide this assurance, it is common to implement audit trails that record, in detail, every change made to database records.
However, maintaining this history creates a technical challenge that can be called the "History Paradox": the more thoroughly the system records its activity for security, the faster the data volume grows, turning simple tables into massive structures. In conventional linear storage models, this growth degrades query performance and raises maintenance costs, because the database engine must process millions of rows to retrieve specific information.
To mitigate this impact without compromising data retention, PostgreSQL offers the Declarative Partitioning feature. This technique allows for the physical segmentation of data into smaller units, facilitating database management and optimizing response times by eliminating unnecessary disk reads.
The objective of this article is to demonstrate, through a controlled experiment, the technical viability of partitioning as a scalability strategy. By comparing a linear table and a range-partitioned structure, this study intends to highlight how the physical organization of data can drastically reduce query latency in high-volume environments, ensuring long-term system efficiency.
2. Theoretical Framework
The basis of this study lies in understanding how data volume impacts the efficiency of Database Management Systems (DBMS). The experiment used PostgreSQL, an open-source relational DBMS recognized for its extensibility and its support for processing large volumes of information (Big Data).
One of the greatest challenges in transactional databases is the unchecked growth of history and audit tables. When a table reaches millions of records, the computational cost of searching it becomes high even with indexes, because the database must maintain ever-larger index trees, which increases disk I/O (input/output) time and memory consumption.
To address this bottleneck, PostgreSQL provides Declarative Partitioning, a technique that physically divides a single logical table into smaller, more manageable parts called partitions. Unlike a conventional table, where all data resides in a single storage segment, partitioning distributes records across different physical segments based on a partition key, such as a date or a specific identifier.
The central technical benefit of this architecture is the Partition Pruning mechanism. During query execution, the database optimizer analyzes the filtering clauses and identifies which partitions contain the requested data. In this way, the engine completely ignores irrelevant partitions, drastically reducing the amount of processed data and maintaining stable performance, regardless of the total size of the accumulated history.
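Pruning can be observed directly in a query plan. As a minimal sketch (the table and column names here are illustrative, assuming a table partitioned by range on a date column), EXPLAIN shows which partitions the optimizer retains:

EXPLAIN SELECT *
FROM audit_partitioned
WHERE event_date >= DATE '2026-01-01'
  AND event_date <  DATE '2026-02-01';
-- With pruning active (enable_partition_pruning = on, the default),
-- the plan lists only the partition covering January 2026;
-- every other partition is skipped without being read.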
3. Methodology
The methodology of this work followed a quantitative experimental approach, focusing on the performance comparison between conventional tables and tables optimized with partitioning. The goal was to measure how the database behaves when dealing with large volumes of audit logs.
3.1 Materials and Environment
For the tests, PostgreSQL was used, chosen for its robustness and native declarative partitioning features. The experiments were conducted in a controlled environment, ensuring that differences in execution time were a direct result of the data structure rather than external factors.
3.2 Experimental Planning
The experiment was organized into two scenarios to enable an objective technical comparison between storage strategies. In the Control Scenario, a simple linear table held all audit records, representing the standard structure that typically loses performance as data volume increases. In the Experimental Scenario, PostgreSQL's Declarative Partitioning was applied, configuring the audit table as a parent table subdivided into monthly range (RANGE) partitions. This organization sought to validate whether the physical segmentation of data allows for more efficient searches, avoiding the processing of irrelevant information in Big Data scenarios.
3.3 Data Generation
Since the benefits of partitioning only appear at scale, a massive data load was required. A seeding script filled both structures with 1,000,000 audit records, simulating months of accumulated operations in a production system, such as price and stock changes.
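The seeding script itself is not reproduced in the article. A minimal sketch of such a load, assuming the column layout shown in Section 4 and using PostgreSQL's generate_series, might look like this:

INSERT INTO audit_linear (product_id, operation, old_value, new_value, event_date)
SELECT
    (random() * 1000)::int,                                  -- arbitrary product
    (ARRAY['INSERT','UPDATE','DELETE'])[1 + floor(random() * 3)::int],
    round((random() * 100)::numeric, 2),                     -- previous value
    round((random() * 100)::numeric, 2),                     -- new value
    TIMESTAMP '2026-01-01' + random() * INTERVAL '30 days'   -- spread across January
FROM generate_series(1, 1000000);

The same SELECT can feed the partitioned table, provided partitions exist for every generated date.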
3.4 Metrics Collection
Performance was measured using the EXPLAIN ANALYZE command. This feature extracts precise metrics, such as Execution Time, the total interval the database spent locating the requested data. In addition to timing, the Execution Plan was analyzed to confirm the effectiveness of the Partition Pruning mechanism, verifying whether the optimizer skipped unnecessary partitions during the search, accessing only the relevant data slice and reducing the computational cost of the operation.
4. Implementation
Technical development began with modeling the data structures required to support the planned log volume. SQL was used to create the two storage models defined in the methodology, ensuring that both had the same column structure for a fair comparison.
The base structure consisted of a standard audit table, containing identifiers, the type of operation performed, and the date field, which is essential for the partitioning criterion. Below is the DDL (Data Definition Language) command used to create the linear table for the Control Scenario:
CREATE TABLE audit_linear (
    log_id SERIAL PRIMARY KEY,
    product_id INT,
    operation VARCHAR(10),
    old_value DECIMAL(10,2),
    new_value DECIMAL(10,2),
    event_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Figure 1 - Linear Audit Table
Subsequently, the Experimental Scenario structure was implemented through Declarative Partitioning. Unlike the previous model, the table was defined as a parent table using the PARTITION BY RANGE clause on the date field.
CREATE TABLE audit_partitioned (
    log_id SERIAL,
    product_id INT,
    operation VARCHAR(10),
    old_value DECIMAL(10,2),
    new_value DECIMAL(10,2),
    event_date TIMESTAMP NOT NULL
) PARTITION BY RANGE (event_date);
Figure 2 - Partitioned Table Structure
For the parent table to store data, monthly child partitions were created and attached, each responsible for a specific time interval:
CREATE TABLE audit_jan_2026 PARTITION OF audit_partitioned
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
Figure 3 - Creating a Monthly Partition
After the structural setup, the implementation phase concluded with the execution of comparative queries using the EXPLAIN ANALYZE command. This final step was essential to simulate real-world analytical searches where only specific timeframes are targeted, allowing the database to demonstrate the efficiency of its physical segmentation logic.
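The exact queries used in the experiment are not listed in the article; an illustrative comparative query of this kind, filtering a single month, would be:

EXPLAIN ANALYZE
SELECT count(*)
FROM audit_partitioned
WHERE event_date >= '2026-01-01'
  AND event_date <  '2026-02-01';
-- The resulting plan should reference only audit_jan_2026,
-- while the same filter against audit_linear must scan the
-- entire million-row table.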
5. Results and Discussion
The performance tests confirmed that linear data organization severely limits system responsiveness at high volumes. Using the EXPLAIN ANALYZE command, it was observed that the query in the Control Scenario required scanning the entire table (a Sequential Scan), resulting in an execution time of approximately 2,850 milliseconds. This highlights the significant computational overhead of processing millions of rows to find data within a specific timeframe.
In the Experimental Scenario, the same search on the partitioned structure achieved a response time of only 12 milliseconds. The Partition Pruning mechanism allowed the database to isolate only the required monthly slice, delivering a performance 237 times faster than the conventional model. This drastic improvement occurs because the search cost is no longer proportional to the total size of the audit history, but rather to the size of a single partition.
Beyond speed, the discussion points to a critical structural advantage: maintenance efficiency. In linear tables, deleting old data (Retention Policy) requires expensive DELETE operations that generate fragmentation and "bloat". In a partitioned architecture, an entire month of data can be removed instantly using a DROP or DETACH PARTITION command, which is a metadata-only operation with zero impact on active performance.
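As a sketch of the retention operations mentioned above (the partition name follows the convention used in Section 4):

-- Detach January 2026 from the active structure: a fast,
-- metadata-level operation that leaves the data archivable.
ALTER TABLE audit_partitioned DETACH PARTITION audit_jan_2026;

-- Then drop the detached table (or drop the partition
-- directly) to discard the slice entirely.
DROP TABLE audit_jan_2026;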
While partitioning demands more rigorous management of child tables, the savings in CPU and disk resources justify the implementation effort. This architecture is therefore well suited to systems that prioritize traceability without sacrificing real-time performance, allowing the database to scale gracefully as history accumulates over time.
6. Conclusion
This study concludes that Declarative Partitioning in PostgreSQL is a highly effective strategy for managing audit Big Data. The experimental results demonstrated that physical data segmentation overcomes the History Paradox, maintaining high performance even as the database grows. By isolating data into time-based ranges, partitioning eliminates the bottleneck of massive indices and simplifies maintenance tasks. For modern enterprise systems, this approach is essential for achieving true scalability and long-term architectural stability.