DEV Community

Franck Pachot
Franck Pachot

Posted on

Computed columns in PostgreSQL 🐘 / YugabyteDB🚀

Here is a simple use case: the application inserts a JSON document with the ID in the document, and we want to expose it as a column. The 3 solutions work in PostgreSQL >= 12 so the first one is probably the best. The two others are for older versions, or YugabyteDB which is currently (2.11.2) compatible with PostgreSQL 11.

Solution 1: Stored Generated Column

Since version 12, PostgreSQL supports declarative generated columns stored in the table. I create the table and add the column afterwards to show what is added to an existing table:

drop table if exists franck_table cascade;

create table franck_table(j jsonb);

alter table franck_table
 add column recid text
 generated always as (j->>'recid') stored
 primary key;
Enter fullscreen mode Exit fullscreen mode

With this I can insert the document:

insert into franck_table(j)
 select to_jsonb(v) from (
  select recid,'xxx' as data from generate_series(1,1000) recid
) g;
Enter fullscreen mode Exit fullscreen mode

and select by the generated primary key:

select * from franck_table where recid='42';
Enter fullscreen mode Exit fullscreen mode

This uses the primary key index:

postgres=# select * from franck_table where recid='42';

              j               | recid
------------------------------+-------
 {"data": "xxx", "recid": 42} | 42

postgres=# explain (costs false)
select * from franck_table where recid='42';

                     QUERY PLAN
----------------------------------------------------
 Index Scan using franck_table_pkey on franck_table
   Index Cond: (recid = '42'::text)
(2 rows)
Enter fullscreen mode Exit fullscreen mode

This is probably the best solution, with little additional storage, but all column capabilities: indexing, can be referenced by a foreign key, can be used for partitioning, and so on.

Solution 2: Virtual column in a view

If there's no need to materialize the column, we can add this computation to a view, and use the view instead of the table:

drop table if exists franck_table cascade;

create table franck_table(j jsonb);

create view franck_view as
 select j->>'recid' as recid, j
 from franck_table;
Enter fullscreen mode Exit fullscreen mode

I cannot declare it as a key but can have a unique index on it for fast access, and uniqueness enforcement. But it cannot be used for referential integrity.

create unique index franck_index
 on franck_table((j->>'recid'));
Enter fullscreen mode Exit fullscreen mode

My DML now goes to the view, this is a capability of SQL databases (logical/physical independence):

insert into franck_view(j)
 select to_jsonb(v) from (
  select recid,'xxx' as data from generate_series(1,1000) recid
)v;
Enter fullscreen mode Exit fullscreen mode

I can select efficiently thanks to the index:

select * from franck_view where recid='42';

explain (costs false)
select * from franck_view where recid='42';
Enter fullscreen mode Exit fullscreen mode

This uses the function based index define on the JSON attribute:

yugabyte=# select * from franck_view where recid='42';

 recid |              j
-------+------------------------------
 42    | {"data": "xxx", "recid": 42}
(1 row)

yugabyte=# explain (costs false)
yugabyte-# select * from franck_view where recid='42';

                     QUERY PLAN
----------------------------------------------------
 Index Scan using franck_index on franck_table
   Index Cond: ((j ->> 'recid'::text) = '42'::text)
(2 rows)
Enter fullscreen mode Exit fullscreen mode

The view is there to provide an API compatibile whit what was expected (a recidcolumn) but you can query directly the JSON attribute.

This solution is nice, but we cannot define this column as a primary key. And, in YugabyteDB, we probably want a primary key to have hash-sharding on it. Without a primary key, an internal UUID is generated, but this means that all access will be secondary index access which is less efficient.

Solution 3: Trigger to generate the column

People tend to overlook the need for triggers, because they prefer to have the business logic in the application tier. But this is not business logic but data logic with physical optimization behind the logical SQL API. A SQL database need triggers for all the cases that do not have a declarative equivalent.

drop table if exists franck_table cascade;

create table franck_table(recid text primary key, j jsonb);
Enter fullscreen mode Exit fullscreen mode

Once the column defined, a trigger function can use any procedural logic to define the behavior on DML. Here we just need to set the value on insert:

create function franck_trigger() returns trigger as $$
begin
 new.recid:=new.j->>'recid';
 return new;
end;
$$ language plpgsql;
Enter fullscreen mode Exit fullscreen mode

The trigger itself is declarative, defining which procedure is called, and when:

create trigger franck_trigger before insert on franck_table 
for each row execute function franck_trigger();
Enter fullscreen mode Exit fullscreen mode

Defining data logic in triggers require careful thinking about all cases. This one is simple, but in general you need to think about what should happen during insert, update, delete and truncate.

The trigger will now do the necessary during normal DML:

insert into franck_table(j)
 select to_jsonb(v) from (
  select recid,'xxx' as data from generate_series(1,1000) recid
)v;
Enter fullscreen mode Exit fullscreen mode

I can select efficiently thanks to the index:

select * from franck_table where recid='42';

explain (costs false)
select * from franck_table where recid='42';
Enter fullscreen mode Exit fullscreen mode

This is the most efficient:

yugabyte=# select * from franck_table where recid='42';

 recid |              j
-------+------------------------------
 42    | {"data": "xxx", "recid": 42}
(1 row)

yugabyte=# explain (costs false)
yugabyte-# select * from franck_table where recid='42';

                     QUERY PLAN
----------------------------------------------------
 Index Scan using franck_table_pkey on franck_table
   Index Cond: (recid = '42'::text)
(2 rows)
Enter fullscreen mode Exit fullscreen mode

In YugabyteDB, because tables are stored in their index LSM Tree, an Index Scan on the primary key is an index only access. I explained this in Index Scan in YugabyteDB

In summary, you can, and should, declare your data logic in a SQL database. Many SQL features can do it with a simple standard declaration. When not having the specific feature, triggers can do the same with you own procedural logic.

Discussion (0)