Managing ERP Bill of Materials Denormalization in 3 Steps

#erp #database

Denormalization: The Hidden Cost of Bill of Materials in ERP

In Enterprise Resource Planning (ERP) systems, Bill of Materials (BOM) management is a critical issue. A BOM, which is a hierarchical list of a product's sub-components and their quantities, directly affects many operational processes, from production planning to inventory tracking. However, the complexity and frequently changing nature of this hierarchical structure also bring the risk of denormalization, which can lead to performance issues in databases. Especially in large-scale manufacturing systems, calculating the BOM from scratch for every query can create serious performance bottlenecks.

While working in a manufacturing ERP, we were constantly receiving complaints about delayed shipping reports and inventory accuracy. In our initial investigations, I realized that one of the underlying causes of these problems was the performance degradation experienced when querying BOM data. The BOM has a complex structure, starting from a main product and going down to the lowest level of raw materials. Calculating all the sub-parts required to create a product requested by the customer could sometimes take seconds, but under heavy system load, this time could extend to tens of seconds. This situation not only reduced operational efficiency but also eroded user confidence in the system.

Denormalization is a database design technique that allows data redundancy to improve query performance. Unlike normalization, it requires more effort to maintain data consistency. For hierarchical and frequently queried data like BOMs, denormalization, when applied correctly, can provide significant performance gains. However, incorrect or uncontrolled denormalization can lead to data inconsistency, update anomalies, and maintenance difficulties. In this post, I will share a three-step approach I used to manage BOM denormalization in ERP systems, which both enhances performance and preserves data consistency.

Step 1: Understanding the Existing Structure and Identifying the Performance Root Cause

Before starting any optimization work, a detailed analysis of the current situation is essential. Understanding how our BOM data is stored, which queries consume the most resources, and why these queries run so slowly is the first step to finding the right solution. At this stage, it is important to focus not only on theoretical knowledge but also on data obtained from the real production environment.

When we performed this analysis in a manufacturing company's ERP system, the most frequently used queries included those listing all components and their quantities required to complete a specific assembly batch. These queries were typically executed using recursive Common Table Expressions (CTE) or a loop-based approach in PostgreSQL. Looking at the EXPLAIN ANALYZE outputs of the queries, we saw that the cost of JOIN operations and repetitive calculations deep within the tree was quite high. Especially in cases where the same sub-component was used by multiple main products, the information for this component was fetched repeatedly in every query.

During this analysis, we also found that some BOMs were much deeper and more complex than expected. For example, when calculating the sub-components required for an assembly product, we sometimes had to go down 15-20 layers deep. The JOIN and filtering operations performed at each layer exponentially increased the total query time. Furthermore, inconsistencies or missing information in data entry could cause queries to run differently than expected and further degrade performance. In short, the root of the problem lay not only in the design but also in the data itself and the query logic.

ℹ️ Example Analysis Output (Simulated)

An EXPLAIN ANALYZE output like the one below showed the bottlenecks in the BOM query:

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Sort  (cost=15000.23..16000.50 rows=500000 width=120)
   Sort Key: item_id, level
   ->  Recursive CTE  'bom_cte'
         ->  Initial Step:
               ->  Seq Scan on products p  (cost=0.00..100.50 rows=1 width=80)
                     Filter: (product_code = 'MAIN_PROD_XYZ')
         ->  Iterate 10 times
               ->  Hash Join  (cost=200.34..500.67 rows=50000 width=100)
                     Hash Cond: (bc.parent_item_id = p.item_id)
                     ->  Seq Scan on bom_components bc  (cost=0.00..150.80 rows=10000 width=60)
                     ->  Hash  (cost=180.90..180.90 rows=1000 width=40)
                           ->  CTE Scan on bom_cte  (cost=0.00..100.00 rows=1000 width=40)

This output showed that the recursive query was repeated 10 times, processing a large amount of data in each iteration. Especially the Hash Join and Seq Scan operations were the main factors negatively affecting performance.

In this first step, it is important to focus not only on technical details but also on business processes. Determining which levels of the BOM are queried more frequently, which data fields are critical, and which business processes are directly affected by the speed of these queries ensures that optimization efforts are directed to the right places.

Step 2: Defining and Implementing the Denormalization Strategy

Once the performance bottlenecks of the existing structure have been identified, it's time to define the denormalization strategy. This is the process of reducing the number and complexity of queries in database design by combining or repeating related data in a single table. In the context of BOMs, this usually means storing sub-component information in the same table as the parent component or in a separate "flattened" table.

In our manufacturing ERP, we decided to move the BOM to a flattened table. This new table stored each component-sub-component relationship in a separate row. However, we didn't just store the relationships; we also added the total quantity required for each sub-component for the main product, and even some basic inventory information, to this table. This way, to see all components and quantities of a main product, we only needed to query this single table.

The most important point to consider when defining a denormalization strategy is to ensure data consistency. Any change in the main BOM table should be supported by mechanisms that automatically update the denormalized tables. In our case, we ensured this consistency by performing PostgreSQL's TRUNCATE and INSERT operations within a transaction, or for more complex scenarios, by using the LISTEN/NOTIFY mechanism, or by making updates at regular intervals with a simple cron job. For example, when a sub-component in a BOM changes, this change is first recorded in the main bom_components table, and then this change is reflected in the denormalized table via a trigger or a background service.

⚠️ Risk of Update Anomalies

The biggest risk of denormalization is update anomalies. If data in the main table is updated but its counterpart in the denormalized table is not, the system starts working with incorrect data. Therefore, the robustness of update mechanisms and error management are critically important. One of the approaches we implemented was to add a validation step after each update.

When defining this strategy, we first identified which data truly needed to be repeated. Instead of denormalizing all fields, we selected the most frequently queried fields that had the greatest impact on performance. This helped us maximize performance gains while minimizing data redundancy. For example, information such as the unit price or stock codes of each sub-component, which do not change frequently and are often used in queries, were denormalized.

Step 3: Continuous Monitoring, Optimization, and Maintenance

Implementing a denormalization strategy is only one part of the process. Ensuring data consistency, continuously monitoring performance, and adapting to the system's evolving needs over time are vital for long-term success. ERP systems are dynamic environments, and BOMs can evolve over time. Therefore, it is necessary to regularly evaluate the effects of denormalization and make necessary adjustments.

We defined a set of metrics to monitor the denormalized tables and the main BOM structure. These included the number of rows in the denormalized table, the data inconsistency rate relative to the main table (if any), and the average and worst-case response times of BOM queries. We also regularly tracked the most used queries and their performance using tools like PostgreSQL's pg_stat_statements. This monitoring mechanism allowed us to quickly intervene when any performance degradation or data inconsistency symptoms appeared.

For example, in the past month, with the introduction of a new product line, the average depth of BOMs had increased. This increase slightly reduced the performance of some complex queries in the denormalized table. After noticing the situation, we optimized the relevant queries and, in addition to the denormalized table, added another structure that stored a specially prepared "cached" query result for these new deep trees. This preserved overall performance while meeting the needs of the new product line.

💡 Automated Maintenance and Improvement

Regular execution of VACUUM and ANALYZE operations in PostgreSQL directly affects the performance of both main and denormalized tables. Optimizing autovacuum settings and ensuring regular ANALYZE operations are critical for maintaining the long-term effectiveness of denormalization. In our system, we had precisely adjusted the autovacuum parameters according to data change rates.

Denormalization is not a one-time process; it is a continuous improvement process. Regularly reviewing the data model, queries, and maintenance procedures is key to sustaining the operational efficiency and reliability of the ERP system. This three-step approach helped us maximize performance gains and maintain the highest level of data consistency when managing BOM denormalization. As a result, shipping reports are now generated on time, and complaints about inventory accuracy have largely disappeared.

Conclusion: Enhancing ERP Performance with Controlled Denormalization

BOM denormalization in ERP systems is a powerful technique that can significantly increase operational efficiency when applied correctly. However, the potential risks of this method should not be overlooked. To avoid problems such as data inconsistency, update anomalies, and maintenance difficulties, the denormalization strategy must be carefully planned, implemented, and continuously monitored.

In the first step, a deep understanding of the existing system's performance bottlenecks and data structure forms the basis for determining the correct denormalization strategy. In the second step, the chosen strategy (e.g., moving the BOM to a flattened table) must be meticulously implemented, and mechanisms to ensure data consistency must be established. Finally, continuous monitoring, tracking of performance metrics, and proactive maintenance guarantee the long-term success of denormalization.

The three-step approach I shared in this post not only helped me solve a real problem I encountered in a large-scale manufacturing ERP but also outlined the general principles of this process. Controlled denormalization, when used correctly, can make your ERP systems run faster, more efficiently, and more reliably. When applying these techniques in your own systems, always remember to consider your specific needs and data structure.