Here is a simple use case: the application inserts a JSON document with the ID in the document, and we want to expose it as a column. All three solutions work in PostgreSQL >= 12, where the first one is probably the best. The other two are for older versions, or for YugabyteDB, which is currently (2.11.2) compatible with PostgreSQL 11.
Solution 1: Stored Generated Column
Since version 12, PostgreSQL supports declarative generated columns stored in the table. I create the table and add the column afterwards to show what is added to an existing table:
drop table if exists franck_table cascade;
create table franck_table(j jsonb);
alter table franck_table
add column recid text
generated always as (j->>'recid') stored
primary key;
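Note that a generated always column is maintained by the database and cannot be written directly; an explicit value is rejected. A quick sketch to illustrate:
-- this should fail: recid is generated always and cannot be set explicitly
insert into franck_table(j, recid)
values ('{"recid": 1}'::jsonb, '1');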
With this I can insert the document:
insert into franck_table(j)
select to_jsonb(v) from (
select recid,'xxx' as data from generate_series(1,1000) recid
) v;
and select by the generated primary key:
select * from franck_table where recid='42';
This uses the primary key index:
postgres=# select * from franck_table where recid='42';
j | recid
------------------------------+-------
{"data": "xxx", "recid": 42} | 42
postgres=# explain (costs false)
select * from franck_table where recid='42';
QUERY PLAN
----------------------------------------------------
Index Scan using franck_table_pkey on franck_table
Index Cond: (recid = '42'::text)
(2 rows)
This is probably the best solution: little additional storage, but all column capabilities: indexing, being referenced by a foreign key, partitioning, and so on.
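For example, a minimal sketch of a foreign key referencing it (franck_child is a hypothetical table, just for illustration):
create table franck_child(
 recid text references franck_table(recid),
 info text
);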
Solution 2: Virtual column in a view
If there's no need to materialize the column, we can add this computation to a view, and use the view instead of the table:
drop table if exists franck_table cascade;
create table franck_table(j jsonb);
create view franck_view as
select j->>'recid' as recid, j
from franck_table;
I cannot declare it as a key, but I can create a unique index on the expression, for fast access and uniqueness enforcement. However, it cannot be used for referential integrity.
create unique index franck_index
on franck_table((j->>'recid'));
My DML now goes to the view; this is a capability of SQL databases (logical/physical independence):
insert into franck_view(j)
select to_jsonb(v) from (
select recid,'xxx' as data from generate_series(1,1000) recid
)v;
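With the unique index in place, a second document with the same recid is rejected. A quick sketch with a hypothetical duplicate document:
-- this should fail with a duplicate key error on franck_index
insert into franck_view(j)
values ('{"data": "dup", "recid": 42}'::jsonb);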
I can select efficiently thanks to the index:
select * from franck_view where recid='42';
explain (costs false)
select * from franck_view where recid='42';
This uses the function-based index defined on the JSON attribute:
yugabyte=# select * from franck_view where recid='42';
recid | j
-------+------------------------------
42 | {"data": "xxx", "recid": 42}
(1 row)
yugabyte=# explain (costs false)
yugabyte-# select * from franck_view where recid='42';
QUERY PLAN
----------------------------------------------------
Index Scan using franck_index on franck_table
Index Cond: ((j ->> 'recid'::text) = '42'::text)
(2 rows)
The view is there to provide an API compatible with what was expected (a recid column), but you can also query the JSON attribute directly.
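For example, a query written directly on the table, with the same expression as the index, can use it:
select j from franck_table
where j->>'recid' = '42';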
This solution is nice, but we cannot define this column as a primary key. And, in YugabyteDB, we probably want a primary key, to have hash-sharding on it. Without a primary key, an internal UUID is generated, and all access goes through a secondary index, which is less efficient.
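For reference, a minimal sketch of declaring explicit hash-sharding on a primary key in YugabyteDB (the default is equivalent when recid is the first primary key column):
create table franck_table(
 recid text,
 j jsonb,
 primary key (recid hash)
);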
Solution 3: Trigger to generate the column
People tend to overlook triggers because they prefer to have the business logic in the application tier. But this is not business logic: it is data logic, with physical optimization behind the logical SQL API. A SQL database needs triggers for all the cases that do not have a declarative equivalent.
drop table if exists franck_table cascade;
create table franck_table(recid text primary key, j jsonb);
Once the column is defined, a trigger function can use any procedural logic to define the behavior on DML. Here we just need to set the value on insert:
create function franck_trigger() returns trigger as $$
begin
 -- set the column from the "recid" attribute of the JSON document
 new.recid:=new.j->>'recid';
 return new;
end;
$$ language plpgsql;
The trigger itself is declarative, defining which procedure is called, and when:
create trigger franck_trigger before insert on franck_table
for each row execute function franck_trigger();
Defining data logic in triggers requires careful thinking about all cases. This one is simple, but in general you need to think about what should happen on insert, update, delete and truncate, as sketched below for updates.
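For example, a minimal sketch of also keeping the column in sync on updates, assuming the same function can recompute recid from the new document:
create trigger franck_trigger_update
before update of j on franck_table
for each row execute function franck_trigger();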
The trigger will now do the necessary during normal DML:
insert into franck_table(j)
select to_jsonb(v) from (
select recid,'xxx' as data from generate_series(1,1000) recid
)v;
I can select efficiently thanks to the primary key index:
select * from franck_table where recid='42';
explain (costs false)
select * from franck_table where recid='42';
This is the most efficient:
yugabyte=# select * from franck_table where recid='42';
recid | j
-------+------------------------------
42 | {"data": "xxx", "recid": 42}
(1 row)
yugabyte=# explain (costs false)
yugabyte-# select * from franck_table where recid='42';
QUERY PLAN
----------------------------------------------------
Index Scan using franck_table_pkey on franck_table
Index Cond: (recid = '42'::text)
(2 rows)
In YugabyteDB, because tables are stored in their primary key LSM tree, an Index Scan on the primary key is an index-only access. I explained this in Index Scan in YugabyteDB.
In summary, you can, and should, declare your data logic in a SQL database. Many SQL features can do it with a simple standard declaration. When the specific feature is not available, triggers can do the same with your own procedural logic.