DEV Community

Cover image for Magento 2 EAV (Entity,Attribute,Value) must die in 2021
Yegor Shytikov
Yegor Shytikov

Posted on

Magento 2 EAV (Entity,Attribute,Value) must die in 2021

MySQL historically uses outdated EAV architecture. with multiple tables an 3 trees long SQL queries.

Alt Text

Modern Databases doesn't need to have this architecture.

EAV is mostly useful to deal with unstructured data while still being able to do some sort of searching. EAV is very hard to scale and also tends to lead to very inefficient development experience.

Modern database has JSON support, such as PostgreSQL, then attributes can be queried, indexed and joined. This can offer performance improvements of over 1000x over naive EAV implementations., but does not necessarily make the overall Magento application more robust.

Note that there are two ways in which XML or JSON data can be stored: one way is to store it as a plain string, opaque to the database server; the other way is to use a database server that can “see into” the structure. There are obviously some severe drawbacks to storing opaque strings: these cannot be queried directly, one cannot form an index based on their contents, and it is impossible to perform joins based on the content.
Building an application that has to manage data gets extremely complicated when using EAV models, because of the extent of infrastructure that has to be developed in terms of metadata tables and application-framework code. Using JSON solves the problem of server-based data validation (which must be done by middle-tier and browser-based code in EAV-based frameworks), but has the following drawbacks:
It is programmer-intensive. JSON schemas are notoriously tricky to write by hand, a recommended approach is to create them by defining relational tables, generating JSON-schema code, and then dropping these tables. By contrast, in production systems that use EAV, such users define new attributes (and the data-type and validation checks associated with each) through a GUI application. Because the validation-associated metadata is required to be stored in multiple relational tables in a normalized design, a GUI application that ties these tables together and enforces the appropriate metadata-consistency checks is the only practical way to allow entry of attribute information, even for advanced developers — even if the end-result uses JSON instead of separate relational tables.
The server-based diagnostics that result with an JSON solution if incorrect data is attempted to be inserted (e.g., range check or regular-expression pattern violations) are cryptic to the end-user: to convey the error accurately, one would, at the least, need to associate a detailed and user-friendly error diagnostic with each attribute.
The solution does not address the user-interface-generation problem.
All of the above drawbacks are remediable by creating a layer of metadata and application code, but in creating this, the original “advantage” of not having to create a framework has vanished. The fact is that modeling sparse data attributes robustly is a hard database-application-design problem no matter which storage approach is used. Sarka’s work, however, proves the viability of using an JSON field instead of type-specific relational EAV tables for the data-storage layer, and in situations where the number of attributes per entity is modest (e.g., variable product attributes for different product types) the JSON-based solution is more compact than an EAV-table-based one. (JSON itself may be regarded as a means of attribute-value data representation, though it is based on structured text rather than on relational tables.)

MySQL 5.7.8+ introduces a number of functions for working with JSON values. These functions include those listed here:

Functions that create JSON values: JSON_ARRAY(), JSON_MERGE(), and JSON_OBJECT().
Functions that search JSON values: JSON_CONTAINS(), JSON_CONTAINS_PATH(), JSON_EXTRACT(), JSON_KEYS(), and JSON_SEARCH().
Functions that modify JSON values: JSON_APPEND(), JSON_ARRAY_APPEND(), JSON_ARRAY_INSERT(), JSON_INSERT(), JSON_QUOTE(), JSON_REMOVE(), JSON_REPLACE(), JSON_SET(), and JSON_UNQUOTE().
Functions that provide information about JSON values: JSON_DEPTH(), JSON_LENGTH(), JSON_TYPE(), and JSON_VALID().

Let's check how we can use this with modern MySQL or MariaDb.

SON is fast becoming the standard format for data interchange and for unstructured data, and MariaDB Platform (in fact, all MariaDB versions 10.2 and later) include a range of JSON supporting functions. There are a few good reasons why a JSON datatype hasn’t been implemented, but one is that there are just not many advantages to that, as JSON is a text-based format. Here I’ll describe JSON and the use cases for it, as well as the MariaDB JSON functions and their uses. We’ll also look at some other MariaDB features that are useful for JSON processing.

Insert Json into Maria Db is not the issue. However Search over Json is an issue!

The contents of all EAV columns in a particular row of a table can be collectively aggregated into a single JSON column, whose contents are of the form {"price":"10.00"} This has the interesting consequence that the SQL statement SELECT * from will not return the individual JSON columns, but concatenate all of them into a single JSON column. Create JSON attributing a Virtual(MariaDB)/ Generated(MySQL) columns are convenient for business applications such as product information, where the applicable attributes can be highly variable depending on the product type, but where the total number of variable attributes per product type are relatively modest.

In Magento we are having things like attributes: color, width, length and size. These we consider non-relational as the interpretation of these attributes is different depending of the type of garment (like size M or shoe sizes or a “zebra striped” color) and some garments might have a unique attribute, like designer or recommended by staff. Our table might then look something like this:

MariaDB> CREATE TABLE products(id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(9,2) NOT NULL,
  stock INTEGER NOT NULL,
  attr VARCHAR(1024));
Enter fullscreen mode Exit fullscreen mode

In this table we have a few columns that look like columns in any relational database table, and then we have a column, called attr, that can hold any relevant attribute for the garment in question and we will store that as a JSON string. You probably notice that we aren’t using a JSON datatype here as that is not present in MariaDB, despite the fact that there are JSON functions, but those JSON functions act on a text string with JSON content.

As the attr column in our example is plain text, we can of course index it as usual, but that is probably not what you want to do. Rather it would be neat to create an index on individual attributes in that JSON string. MariaDB doesn’t yet support functional indexes (i.e., functions not on values but on computed values). What MariaDB does have, though, is Virtual Columns, which can be indexed (and as of MariaDB 10.2 these virtual columns don’t have to be persistent). Learn more about Virtual Columns.

When a secondary index is created on a virtual generated column, generated column values are materialized in the records of the index. If the index is a covering index (one that includes all the columns retrieved by a query), *** generated/virtual column values are retrieved from materialized values in the index structure instead of computed “on the fly” ***. It is like apply custom fields to a schema.

There are additional write costs to consider when using a secondary index on a virtual column due to computation performed when materializing virtual column values in secondary index records during INSERT and UPDATE operations.

The easiest way to explain this is with an example. Let’s say we want an index on the colour attribute, if such a thing exists. For this we need two things: A virtual column that contains the colour attribute as extracted from the attr column, and then an index on that. In this case we will be using the JSON_VALUE function that takes a JSON value and a path, the latter describing the JSON operation to be performed, somewhat like a query language for JSON.

We end up with something like this:

MariaDB> ALTER TABLE products ADD attr_colour VARCHAR(32) AS (JSON_VALUE(attr, '$.colour'));
MariaDB> CREATE INDEX products_attr_colour_ix ON products(attr_colour);
With that in place, let’s see how that works:

MariaDB> SELECT * FROM products WHERE attr_colour = 'white';
+----+--------+-------+-------+---------------------------------+-------------+
| id | name   | price | stock | attr                            | attr_colour |
+----+--------+-------+-------+---------------------------------+-------------+
|  2 | Shirt  | 10.50 |    78 | {"size": 42, "colour": "white"} | white       |
|  3 | Blouse | 17.00 |    15 | {"colour": "white"}             | white       |
+----+--------+-------+-------+---------------------------------+-------------+
2 rows in set (0.00 sec)
And let’s see if that index is working as it should:

MariaDB> EXPLAIN SELECT * FROM products WHERE attr_colour = 'white';
+------+-------------+----------+------+-------------------------+-------------------------+---------+-------+------+------------+
| id   | select_type | table    | type | possible_keys           | key                     | key_len | ref   | rows | Etra       |
+------+-------------+----------+------+-------------------------+-------------------------+---------+-------+------+------------+
|    1 | SIMPLE      | products | ref  | products_attr_colour_ix | products_attr_colour_ix | 99      | const |    2 | Uing where |
+------+-------------+----------+------+-------------------------+-------------------------+---------+-------+------+------------+
1 row in set (0.00 sec)
Enter fullscreen mode Exit fullscreen mode

And just to show that the column attr_colour is a computed column that depends on the attr column, lets try updating the colour for the blouse and make that red instead of white and then search that. To replace a value in a JSON object MariaDB provides the JSON_REPLACE functions.

MariaDB> UPDATE products SET attr = JSON_REPLACE(attr, '$.colour', 'red') WHERE name = 'Blouse';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
MariaDB> SELECT attr_colour FROM products WHERE name = 'blouse';
+-------------+
| attr_colour |
+-------------+
| red         |
+-------------+
1 row in set (0.00 sec)
Enter fullscreen mode Exit fullscreen mode

MySQL Magento JSon reStriction:

MySQL able to create a single JSON field with 100 million elements, and MySQL stores this in approximately 800MB. The size of JSON documents stored in JSON columns is limited to the value of the max_allowed_packet system variable

Magento and PostgreSQL

Unfortunately Magento 2 is so legacy and don't support modern PostgreSQL DB. in poster it is much easier to manipulate with Json.

Let's use a example model with product entries to illustrate some basic operations when working with JSON data in PostgreSQL.

Alt Text

The operations in this section will be essentially the same either if you use json or jsonb, but let's review them to refresh what we can do with JSON and to set our use case before we see the jsonb goodies right after.

Define a Column in a Table:

Simply enough, we specify the data column with the jsonb data type:

CREATE TABLE product (  
  product_id serial NOT NULL,
  data jsonb
);
Enter fullscreen mode Exit fullscreen mode

Insert JSON Data

To enter data to the product table we just pass the whole JSON string as a field value:

INSERT INTO product VALUES (1, '{"title": "Sleeping Beauties", "genres": ["Fiction", "Thriller", "Horror"], "published": false}');  
INSERT INTO books VALUES (2, '{"title": "Influence", "genres": ["Marketing & Sales", "Self-Help ", "Psychology"], "published": true}');  
INSERT INTO books VALUES (3, '{"title": "The Dictator''s Handbook", "genres": ["Law", "Politics"], "authors": ["Bruce Bueno de Mesquita", "Alastair Smith"], "published": true}');  
INSERT INTO books VALUES (4, '{"title": "Deep Work", "genres": ["Productivity", "Reference"], "published": true}');  
INSERT INTO books VALUES (5, '{"title": "Siddhartha", "genres": ["Fiction", "Spirituality"], "published": true}');
Enter fullscreen mode Exit fullscreen mode

Query Data

We can now query specific keys within the JSON data:

SELECT data->'title' AS title FROM product;  
Enter fullscreen mode Exit fullscreen mode

Creating Indices

A dramatic improvement of jsonb over the json PostgreSQL data type, is the ability to index JSON data.

Our example only has 5 entries, but if they were thousands --or millions-- of entries, we could cut seek times in more than half by building indices.

We could, for example, index published product:

CREATE INDEX idx_published ON product ((data->'published'));  
Enter fullscreen mode Exit fullscreen mode

In general, I think storing entity properties in JSONB format can greatly simplify your database design and maintenance. If, like me, you do a lot of ad-hoc querying, having everything stored in the same table as the entity is really useful. The fact that it simplifies interacting with your data is already a plus, but the resulting database is also 3x smaller and from my tests it seems that the performance penalties are very limited. In some cases JSONB even performes faster then EAV, which makes it even better.

Test Magento JSONB EAV setup

The following tables were created for representing the data as EAV.

CREATE TABLE entity ( 
  id           SERIAL PRIMARY KEY, 
  name         TEXT, 
  description  TEXT
);
CREATE TABLE entity_attribute (
  id          SERIAL PRIMARY KEY, 
  name        TEXT
);
CREATE TABLE entity_attribute_value (
  id                  SERIAL PRIMARY KEY, 
  entity_id           INT    REFERENCES entity(id), 
  entity_attribute_id INT    REFERENCES entity_attribute(id), 
  value               TEXT
);
Enter fullscreen mode Exit fullscreen mode

The table below represents the same data, but with the attributes in a JSONB column which I called properties.

CREATE TABLE entity_jsonb (
  id          SERIAL PRIMARY KEY, 
  name        TEXT, 
  description TEXT,
  properties  JSONB
);
Enter fullscreen mode Exit fullscreen mode

A lot simpler, isn’t it?

Top comments (0)