BOM Denormalization in ERP: Is It Always the Solution?

#erp #yazilimmimarisi #database #performance

BOM Denormalization: Shortcut or Trap?

In a manufacturing ERP, especially in systems with complex bills of materials (BOM), experiencing performance issues is inevitable. When I encounter such problems, one of the first solutions that usually comes to my mind is database denormalization. The idea of flattening BOM data—especially frequently queried relationships—to reduce query times from milliseconds to microseconds is highly appealing. However, my field experience has shown that BOM denormalization is not always a silver bullet; in fact, in some cases, it can lead to even bigger problems. In this post, I will explain why BOM denormalization is not always the answer, when it actually works, and its potential pitfalls, using concrete examples from my own experience.

When I first stepped into this domain, BOM queries in a large manufacturing company's ERP system had turned into an absolute nightmare. Fetching all sub-components and assembly levels of a product could sometimes take 5 to 10 seconds. This was incredibly slowing down the order preparation processes. My initial analyses showed that continuous recursive CTE (Common Table Expression) queries and deep JOIN operations were putting a massive strain on the system. Navigating back and forth between tables was exhausting the database engine, consuming CPU and I/O resources. At this point, the idea of denormalization looked very bright.

The Appeal of Denormalization: Fast Queries, Simple Structure

The biggest promise of denormalization is to increase query performance. In normalized database models, obtaining related data usually requires joining (JOIN) multiple tables. These JOIN operations can be costly, especially on large datasets and complex relationships. This cost increases even further in hierarchical data like bills of materials. When you want to see the components at all levels of a product, the database has to scan, match, and combine results from many tables.

ℹ️ What is Denormalization?

Denormalization is the process of intentionally moving away from normalization rules to add data redundancy or combine tables. Its primary goal is to increase data access speed. This is typically preferred for reporting and analytical queries.

For example, in a product's BOM (Bill of Materials) table, each row represents a component and its relationship to the parent product. If a parent product has multiple sub-components, and those sub-components have their own sub-components, this structure grows like a tree. To fetch this tree in a flat format, we use Recursive CTEs. However, every time these CTEs run, the database engine has to recalculate every level of the tree. With denormalization, we can, for instance, create a single table named ProductHierarchy and keep all ancestors and direct sub-components of each product in this table. This way, the query can be completed with a simple SELECT from a single table. When we applied this method at a manufacturing company, BOM query times dropped from 2 seconds to 150 milliseconds. At first glance, this was a massive improvement.

Real-World Traps: Data Consistency and Maintenance Overhead

However, as always, this improvement came with a price. The biggest disadvantage of denormalization is that maintaining data consistency becomes much harder. In a normalized structure, a piece of information is stored in only one place. When you need to make a change, updating that single place is enough. When you denormalize, however, the same information can exist in multiple places. This increases the risk of inconsistency during updates, inserts, or deletes.

To give an example, let's say we define a product as "Component A". Suppose this "Component A" is a part of parent products "X", "Y", and "Z". If we denormalized the data and associated "Component A" with "X", "Y", and "Z" in the ProductHierarchy table, and we later realize that "Component A" no longer has any relation to product "Y", we have to update three different records. If one of these updates is forgotten or left incomplete due to an error, we end up with an inconsistent state in our database. Such errors can spread rapidly, especially in high-throughput ERP systems, leading to reporting errors, incorrect production planning, or shipping issues.

⚠️ Risks of Denormalization

Denormalization leads to data redundancy, increasing storage space and making it harder to ensure data consistency. Update, insert, and delete operations become more complex, and additional mechanisms may be required to maintain data integrity.

Another major issue is the maintenance overhead. A denormalized data model can become more complex than a normalized one. Keeping track of which data is stored where can be confusing for new developers. Additionally, data cleaning and restructuring operations require more effort. In my own projects, while developing the backend for financial calculators, which was a side project of mine, I considered using denormalization to manage complex financial product trees. However, in an environment where data flow changes constantly and users expect instant updates, the additional workload and potential error risk of maintaining data consistency kept me away from this idea. Instead, I aimed to achieve similar performance levels through more aggressive caching strategies and optimized queries.

When Does Denormalization Make Sense?

It is worth noting that denormalization is not always a bad idea. In some scenarios, especially where data write frequency is low and read operations are extremely critical, denormalization can be an excellent solution. For example, data warehouses used for reporting purposes typically use denormalized structures. In these systems, data is usually loaded in bulk and rarely updated. The main goal is to allow analysts to perform fast data analysis with complex queries.

Another example is static data that users need to see instantly and does not change very often—such as product categories on an e-commerce site or thread titles on a forum. For this type of data, using a pre-calculated and aggregated structure instead of traversing the database on every query can significantly boost performance. In my Android spam blocker app, I denormalized relatively static data, such as countries and operator codes, to increase query speed. This allowed the app to process incoming calls faster in real-time.

Alternative Approaches: Ways to Avoid Denormalization

If you want to avoid the risks of denormalization, there are other ways to increase performance. The first of these is to optimize database queries. This means using the right indexes, avoiding unnecessary JOINs, making queries more efficient, and even finding bottlenecks by analyzing query plans with the EXPLAIN command.