The Anatomy of ERP Bill of Materials Denormalization: Performance and…

#erp #sistemmimarisi #software

The Anatomy of ERP Bill of Materials Denormalization: A Guide for Performance and Scalability

In a manufacturing ERP, after hours of analysis, we finally found the root cause of a performance bottleneck in the Bill of Materials. The situation was, in fact, due to complications arising from denormalization, a fairly common problem. In this post, I will delve into the anatomy of Bill of Materials denormalization in ERP systems; I will discuss the concrete problems I encountered, their causes, and solutions in detail. Drawing from my experiences, I will explain ways to make this complex structure more performant and scalable.

Such denormalizations are initially done to reduce data redundancy and improve query performance. However, over time and as data volume increases, these strategies can lead to unexpected performance issues. Especially hierarchical and frequently queried data structures like the Bill of Materials are among the first to feel the negative effects of denormalization. In this post, by sharing my own cases and the lessons learned from them, I aim to guide technology professionals facing similar situations.

Bill of Materials Structures and Fundamental Problems

A Bill of Materials (BOM) is a hierarchical structure that defines the components, sub-components, and their quantities for a main product. Traditionally, various methods are used to represent these structures in a database: adjacency list, nested set, materialized path, and closure table. Each has its own advantages and disadvantages. However, a common situation in ERP systems is the "denormalization" of these structures to meet performance requirements.

Denormalization is usually done directly for query performance. For example, to retrieve all sub-components of a main product in a single query, one might copy part of the sub-component information into the main product table. While this may seem fast at first, ensuring data consistency and managing updates can become a nightmare. When a sub-component changes, this information needs to be updated everywhere it was copied. This increases write time and processing load.

ℹ️ The Fundamental Paradox of Denormalization

Denormalization can improve read performance but severely strain write performance and data consistency. Especially in complex hierarchies, striking this balance is a significant engineering problem.

While developing a manufacturing ERP, Bill of Materials queries for supply chain and production planning modules needed to return results within seconds. However, in the system's old architecture, a separate query had to be run for each sub-component. This caused incredible slowness for a product with thousands of components. For a main product with over 1500 components, just listing the Bill of Materials could take 45 seconds. This was an unacceptable situation.

Denormalization Techniques and Encountered Challenges

One common technique used in Bill of Materials denormalization is to create a "flat" table containing all related sub-components and their quantities for each product. This table is linked to the main product table, and queries are performed directly on this flat table. For example, instead of a products table, a table named product_components is created, where each row represents a component of a main product.

In our example, the product_components table was structured as follows:

CREATE TABLE product_components (
    id SERIAL PRIMARY KEY,
    main_product_id INT NOT NULL, -- ID of the main product
    component_product_id INT NOT NULL, -- ID of the component
    quantity DECIMAL(10, 4) NOT NULL, -- Required quantity per main product
    component_level INT NOT NULL, -- Level in the hierarchy (0 for main product, 1 for first-level sub-component, etc.)
    -- Other relevant fields: unit, description, etc.
);

This structure significantly sped up queries like "list all components of the main product." All details would come with a single SELECT * FROM product_components WHERE main_product_id = X query. However, the real problem was ensuring the freshness of this data. When a component's quantity changed or a new component was added, all relevant rows in the product_components table needed to be updated. This process became very costly depending on the complexity of the Bill of Materials.

⚠️ Denormalization Update Cost

A single component change in a product_components table can lead to thousands of rows being updated. This creates a significant I/O load on the database and negatively impacts system response time.

In one case, a production line engineer wanted to change the quantity of a component for a main product. This was an operation that should normally take only a few seconds. However, due to the denormalized product_components table, this update operation took approximately 15 minutes. This was because the system was trying to update the quantity in all places where this component appeared across all variations of the main product. This situation eroded user confidence in the system.

Performance Improvement Strategies

I followed several strategies to cope with the problems caused by denormalization. First, I started by determining which queries were most affected and which denormalized fields were truly critical. This analysis was typically done using tools like pg_stat_statements or information obtained from database logs.

As a result of the analysis, I realized that we didn't always need a complete list of the Bill of Materials. Most of the time, we needed only the directly attached first-level components of a main product or components at a specific level. In this case, instead of full denormalization, it was necessary to consider a less denormalized structure containing only the necessary information or a completely different data model.

💡 Data Model Selection

It is always important to choose the most appropriate data model. For hierarchical data like a Bill of Materials, it is also possible to optimize the performance of classic methods such as adjacency list, nested set, materialized path, or closure table. Full denormalization may not always be the best solution.

In our situation, instead of completely removing the product_components table, we decided to use a hybrid approach with a "materialized path." While we kept the product_components table for the main product and directly linked components, for accessing deeper level components, we added a field to store a "path" for each product. This path was a string containing all main product IDs from the root product to the current component (e.g., 1/15/234/567).

After this change, the products table became:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    parent_id INT NULL, -- ID of the parent component (for adjacency list)
    materialized_path LTREE, -- For materialized path (PostgreSQL specific)
    -- Other product information
);

PostgreSQL's LTREE data type is highly efficient for storing and querying materialized paths. This made querying a specific branch of the Bill of Materials through the products table much faster. With queries like SELECT * FROM products WHERE id = subtree('/1/15'), we could easily retrieve all sub-branches of a specific main product.

Real-World Scenario: Root Cause Analysis and Solution

On one occasion, in a customer's ERP system, the production order creation process was abnormally slow. Especially for complex products, creating a production order could take 5-10 minutes. This severely disrupted production planning. Our initial suspicion was database indexes, but basic indexes were already in place.

Detailed investigations revealed that the source of the problem was again how the Bill of Materials data was queried. The system was using an old denormalization method, which was no longer actually in use, to collect all necessary components for the production order. This method repeatedly self-joined for each component. The query's execution plan looked terrible:

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Seq Scan on product_components pc1  (cost=0.00..150000.00 rows=100000 width=32)
   Filter: (main_product_id = 12345)
   ->  Join  (cost=0.00..120000.00 rows=50000 width=32)
         Join Filter: (pc2.main_product_id = pc1.component_product_id)
         ->  Seq Scan on product_components pc2
               Filter: (main_product_id = 12345)
               ->  Join ... (continues like this...)

This query was self-joining 15-20 times even for a simple product. This led to an exponentially increasing performance loss, especially for products with deep hierarchies. The fact that it took so long even for a simple product explained why it took minutes for complex products.

As a solution, we first eliminated this unnecessary and erroneous query. Then, we implemented the LTREE-based materialized path approach I mentioned above. During production order creation, we started using new and optimized queries to retrieve the Bill of Materials. After this change, the creation time for the same production order dropped from 45-60 seconds to 3-5 seconds.

💡 The Difference of Observability

A robust observability infrastructure is essential for detecting such problems. Logging, metric collection, and tracing tools play a critical role in finding the root cause of performance issues. Tools like pg_stat_statements and EXPLAIN ANALYZE are indispensable for analyzing database queries.

This case clearly demonstrated how dangerous denormalization can be and the impact of data modeling choices on system performance. A solution that initially seemed "fast" can eventually undermine the system's foundation. Therefore, denormalization decisions should always be made with careful trade-off analysis and regularly reviewed.

Next Steps and Recommendations

When addressing Bill of Materials denormalization, we should not only solve the current problem but also develop strategies to prevent similar problems in the future. This requires both adhering to data modeling principles and considering the system's growth potential.

My first recommendation is always to perform denormalization based on need. If denormalization is required for specific query performance, only enough denormalization should be done to meet the needs of that query. Instead of converting to a completely "flat" table, more controlled hierarchical data structures like materialized path or closure table can be preferred. PostgreSQL's LTREE or INTARRAY data types provide great convenience in this regard.

Secondly, mechanisms should be established to ensure data consistency. This can be through triggers, stored procedures, or business logic implemented at the application layer. However, triggers should be used carefully as they can negatively impact database performance. Ensuring consistency at the application layer can offer a more flexible approach. For example, when a component changes, an event listener that applies this change can update all relevant fields or tables.

ℹ️ Event Sourcing and CQRS

In more complex systems, architectural patterns like Event Sourcing and Command Query Responsibility Segregation (CQRS) can be used to improve data consistency and performance. Event Sourcing records all changes as events, while CQRS separates read and write models, allowing both to be optimized.

Thirdly, regular performance tests and monitoring are essential. As the system grows and data volume increases, even an initially well-performing denormalization strategy can lead to performance issues. Therefore, performing performance tests before deployment to production and at regular intervals allows us to detect potential bottlenecks in advance. Monitoring the slowest queries with tools like pg_stat_statements and regularly checking query plans with EXPLAIN ANALYZE offers a proactive approach.

Finally, I want to emphasize to young developers or professionals new to this field that denormalization is not a "savior" but rather a powerful tool that must be used carefully. It is essential to always understand the trade-offs and consider the long-term health of the system.