In the previous posts I described multiple ways to read from local replicas, for cases where your table is global but accessed mostly from one region. I'll continue this series with another case, where you want to geo-partition data: the table is global at the logical level, but its data is stored in regional partitions. This uses PostgreSQL declarative partitioning and works fine when the partition key is part of the primary key and mentioned in the query. If it is not, all partitions have to be read to find the right one. But it is still possible to have a local read query without knowing the partition key. Here is an example.
I'll do this in a lab on my laptop, using Docker Compose containers, tagging the servers with the region. Of course, I'll not measure the latency, as everything runs on the same machine. But I'll check the execution plans to verify that I read only local partitions.
Create lab cluster
I start a regional cluster with 3 regions: earth, moon and mars. I use a replication factor of RF=1 so that I need only one server per region.
I've used this docker-compose setup.
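A minimal sketch of what such a docker-compose file can look like. The image, service names, ports and flags are assumptions to illustrate the idea: each yb-tserver is tagged with its placement through the `--placement_cloud`, `--placement_region` and `--placement_zone` flags.

```yaml
# Sketch only: one master (RF=1) and one tserver per region, tagged with placement flags.
services:
  yb-master-0:
    image: yugabytedb/yugabyte:latest
    command: bin/yb-master --fs_data_dirs=/home/yugabyte/data
      --master_addresses=yb-master-0:7100 --rpc_bind_addresses=yb-master-0:7100
      --replication_factor=1
  yb-tserver-0:
    image: yugabytedb/yugabyte:latest
    command: bin/yb-tserver --fs_data_dirs=/home/yugabyte/data
      --tserver_master_addrs=yb-master-0:7100 --rpc_bind_addresses=yb-tserver-0:9100
      --placement_cloud=star --placement_region=earth --placement_zone=base
    ports: ["5433:5433"]
  # yb-tserver-1 and yb-tserver-2 are identical except for
  # --placement_region=moon / --placement_region=mars and the published port.
```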
Now, I'll connect to the node on earth with psql -p 5433 yugabyte:
yugabyte=# show listen_addresses;
listen_addresses
------------------
yb-tserver-0
(1 row)
yugabyte=# select host,node_type,cloud,region,zone
from yb_servers() order by 1;
host | node_type | cloud | region | zone
--------------+-----------+-------+--------+------
yb-tserver-0 | primary | star | earth | base
yb-tserver-1 | primary | star | moon | base
yb-tserver-2 | primary | star | mars | base
(3 rows)
I create one tablespace per region:
create tablespace earth with (replica_placement=$placement${
"num_replicas": 1,"placement_blocks": [
{"cloud": "star","region": "earth","zone": "base","min_num_replicas": 1}
]}$placement$);
create tablespace moon with (replica_placement=$placement${
"num_replicas": 1,"placement_blocks": [
{"cloud": "star","region": "moon","zone": "base","min_num_replicas": 1}
]}$placement$);
create tablespace mars with (replica_placement=$placement${
"num_replicas": 1,"placement_blocks": [
{"cloud": "star","region": "mars","zone": "base","min_num_replicas": 1}
]}$placement$);
Create partitioned table
I create a list-partitioned table with a partition in each tablespace:
create extension if not exists pgcrypto;
create table customers(
id uuid default gen_random_uuid()
, planet text
, info text
, primary key(id,planet)
) partition by list(planet);
create table customers_earth partition of customers
for values in ('earth') tablespace earth;
create table customers_moon partition of customers
for values in ('moon') tablespace moon;
create table customers_mars partition of customers
for values in ('mars') tablespace mars;
Here my customers are identified by an id. However, I need to add the partition key planet to the primary key. There's no way to enforce that the id is unique across all regions, because PostgreSQL, and then YugabyteDB, has no global indexes. If you need that, you can use a trigger. But, given that it is a uuid, you probably don't need it. Just take the full primary key as the identifier of the customer.
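If you really wanted the trigger-based enforcement, a sketch could look like this. The function and trigger names are mine, and note the caveats in the comments: every insert pays a cross-region check, and concurrent inserts of the same id are not fully serialized by it.

```sql
-- Hypothetical enforcement of global id uniqueness with a trigger.
-- Every insert then pays a cross-region read, and concurrent inserts of
-- the same id are not fully serialized by this check, so use with care.
create or replace function check_id_unique() returns trigger as $$
begin
  if exists (select 1 from customers
              where id = new.id and planet <> new.planet) then
    raise exception 'id % already exists on another planet', new.id;
  end if;
  return new;
end;
$$ language plpgsql;
-- attach it to each partition (shown for earth; same for moon and mars):
create trigger customers_earth_check_id before insert on customers_earth
  for each row execute function check_id_unique();
```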
Now adding some data:
with planets(name) as (
values('earth'),('moon'),('mars')
) insert into customers(planet, info) select name, n::text
from planets, generate_series(1,10000) n;
My docker-compose has a container that shows the tablet statistics every 10 seconds. This is how I check which reads and writes go to each region.
Querying without the partition key
Now we get to the main point. I said that ideally you have the full key when querying for a customer:
yugabyte=#
explain analyze select id,planet from customers
where id='1c888089-c4f2-4da4-ba61-59957b965bf8' and planet='earth';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..4.11 rows=1 width=48) (actual time=0.981..0.984 rows=1 loops=1)
-> Index Scan using customers_earth_pkey on customers_earth (cost=0.00..4.11 rows=1 width=48) (actual time=0.980..0.983 rows=1 loops=1)
Index Cond: ((id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid) AND (planet = 'earth'::text))
Planning Time: 0.227 ms
Execution Time: 1.031 ms
Peak Memory Usage: 24 kB
(6 rows)
I query a customer from planet earth and it reads only one partition, the one stored on earth.
But what if I have only the id?
yugabyte=#
explain analyze select id,planet from customers
where id='1c888089-c4f2-4da4-ba61-59957b965bf8' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..46.65 rows=300 width=48) (actual time=1.598..8.184 rows=1 loops=1)
-> Index Scan using customers_earth_pkey on customers_earth (cost=0.00..14.65 rows=100 width=48) (actual time=1.597..1.599 rows=1 loops=1)
Index Cond: (id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid)
-> Index Scan using customers_mars_pkey on customers_mars (cost=0.00..15.25 rows=100 width=48) (actual time=1.554..1.554 rows=0 loops=1)
Index Cond: (id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid)
-> Index Scan using customers_moon_pkey on customers_moon (cost=0.00..15.25 rows=100 width=48) (actual time=5.027..5.028 rows=0 loops=1)
Index Cond: (id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid)
Planning Time: 0.252 ms
Execution Time: 8.245 ms
Peak Memory Usage: 56 kB
(10 rows)
This is not so good, because the customer can be anywhere: we have to query all partitions to find the right one.
Don't look at the time here, because it is a lab on my laptop. But if you really deploy between Earth, Moon and Mars you will see a latency of 2.51 seconds for the Index Scan using customers_moon_pkey, and minutes for customers_mars_pkey. You may increase some timeouts for that, and set a preferred location for the yb-master. Anyway, even with a cross-region deployment on Earth only, we need a solution for those who have only the id.
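Setting a preferred location can be done with yb-admin. A sketch, assuming the master address from this lab; adapt it to your cluster:

```shell
# Hypothetical example: prefer placing leaders in the earth region
yb-admin --master_addresses yb-master-0:7100 set_preferred_zones star.earth.base
```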
Duplicate Indexes
In order to avoid cross-region reads in this case, I can create a duplicate covering index in each region. This is a redundant index on the primary key columns, stored in the regions other than the one holding the table rows. For this I create two additional indexes for each table partition, mentioning the tablespace where I want each of them to be stored:
create index customers_moon_id_planet_earth on customers_moon(id,planet) tablespace earth;
create index customers_mars_id_planet_earth on customers_mars(id,planet) tablespace earth;
create index customers_earth_id_planet_moon on customers_earth(id,planet) tablespace moon;
create index customers_mars_id_planet_moon on customers_mars(id,planet) tablespace moon;
create index customers_earth_id_planet_mars on customers_earth(id,planet) tablespace mars;
create index customers_moon_id_planet_mars on customers_moon(id,planet) tablespace mars;
It is good to have meaningful naming conventions and, even better, to generate them. Or at least check them:
yugabyte=# select tablename, tablespace, indexname
, pg_get_indexdef(indexname::regclass)
from pg_indexes
where tablename like 'customers%' order by 1,2,3;
tablename | tablespace | indexname | pg_get_indexdef
-----------------+------------+--------------------------------+-------------------------------------------------------------------------------------------------------
customers_earth | earth | customers_earth_pkey | CREATE UNIQUE INDEX customers_earth_pkey ON public.customers_earth USING lsm (id HASH, planet ASC)
customers_earth | mars | customers_earth_id_planet_mars | CREATE INDEX customers_earth_id_planet_mars ON public.customers_earth USING lsm (id HASH, planet ASC)
customers_earth | moon | customers_earth_id_planet_moon | CREATE INDEX customers_earth_id_planet_moon ON public.customers_earth USING lsm (id HASH, planet ASC)
customers_mars | earth | customers_mars_id_planet_earth | CREATE INDEX customers_mars_id_planet_earth ON public.customers_mars USING lsm (id HASH, planet ASC)
customers_mars | mars | customers_mars_pkey | CREATE UNIQUE INDEX customers_mars_pkey ON public.customers_mars USING lsm (id HASH, planet ASC)
customers_mars | moon | customers_mars_id_planet_moon | CREATE INDEX customers_mars_id_planet_moon ON public.customers_mars USING lsm (id HASH, planet ASC)
customers_moon | earth | customers_moon_id_planet_earth | CREATE INDEX customers_moon_id_planet_earth ON public.customers_moon USING lsm (id HASH, planet ASC)
customers_moon | mars | customers_moon_id_planet_mars | CREATE INDEX customers_moon_id_planet_mars ON public.customers_moon USING lsm (id HASH, planet ASC)
customers_moon | moon | customers_moon_pkey | CREATE UNIQUE INDEX customers_moon_pkey ON public.customers_moon USING lsm (id HASH, planet ASC)
(9 rows)
Each table partition (tablename) has, in each region (tablespace), an index on (id,planet). And, yes, I should have created them as UNIQUE. That's a harmless mistake.
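Generating those statements is easy. A sketch, assuming the partition and tablespace names used above (and creating them UNIQUE this time):

```sql
-- Generate the duplicate index DDL for every (partition, remote region) pair.
with regions(name) as (values ('earth'),('moon'),('mars'))
select format(
  'create unique index %I on %I (id, planet) tablespace %I;'
 , 'customers_'||p.name||'_id_planet_'||t.name  -- index name
 , 'customers_'||p.name                         -- table partition
 , t.name)                                      -- remote tablespace
from regions p join regions t on p.name <> t.name
order by 1;
```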
Now I run my query from above, looking for the region by providing only the id:
yugabyte=#
explain analyze select id,planet from customers
where id='1c888089-c4f2-4da4-ba61-59957b965bf8' ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..45.45 rows=300 width=48) (actual time=0.890..3.030 rows=1 loops=1)
-> Index Scan using customers_earth_pkey on customers_earth (cost=0.00..14.65 rows=100 width=48) (actual time=0.890..0.891 rows=1 loops=1)
Index Cond: (id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid)
-> Index Only Scan using customers_mars_id_planet_earth on customers_mars (cost=0.00..14.65 rows=100 width=48) (actual time=1.293..1.293 rows=0 loops=1)
Index Cond: (id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid)
Heap Fetches: 0
-> Index Only Scan using customers_moon_id_planet_earth on customers_moon (cost=0.00..14.65 rows=100 width=48) (actual time=0.844..0.844 rows=0 loops=1)
Index Cond: (id = '1c888089-c4f2-4da4-ba61-59957b965bf8'::uuid)
Heap Fetches: 0
Planning Time: 43.061 ms
Execution Time: 3.105 ms
Peak Memory Usage: 56 kB
(12 rows)
Remember that I'm connected to earth. Finding the planet for any id from there reads the primary key customers_earth_pkey for the local partition customers_earth, and the duplicate indexes customers_mars_id_planet_earth and customers_moon_id_planet_earth for the partitions customers_mars and customers_moon.
Then, if there are other columns to read, you should have them included in the local index [but be careful if you expect updates on them - see Denis' comment below]. But if there's more, like joining to other tables, it is better to connect to the right region, to get the SQL processed there, and add the region in the where clause:
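For the included-columns case, a covering variant of one of the duplicate indexes could look like this. The name and the include list are illustrative:

```sql
-- Covering variant: info is stored as a non-key column in the remote copy,
-- so a lookup by id from earth can also return it without leaving the region.
-- Remember that every update of info then has to maintain this remote index.
create index customers_mars_id_planet_info_earth
  on customers_mars (id, planet) include (info) tablespace earth;
```

And for the join case, connecting to the right region: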
$ docker exec -it yb-tserver-0 ysqlsh -h yb-tserver-0
ysqlsh (11.2-YB-2.15.0.1-b0)
Type "help" for help.
yugabyte=# show listen_addresses ;
listen_addresses
------------------
yb-tserver-0
(1 row)
yugabyte=# select id,planet from customers
where id='85338353-162a-4a62-bcb9-78fd58a6b500' ;
id | planet
--------------------------------------+--------
85338353-162a-4a62-bcb9-78fd58a6b500 | mars
(1 row)
yugabyte=# select host from yb_servers() where region='mars';
host
--------------
yb-tserver-2
yugabyte=# \c yugabyte yugabyte yb-tserver-2
You are now connected to database "yugabyte" as user "yugabyte" on host "yb-tserver-2" at port "5433".
yugabyte=# show listen_addresses ;
listen_addresses
------------------
yb-tserver-2
(1 row)
yugabyte=# explain analyze select * from customers where id='85338353-162a-4a62-bcb9-78fd58a6b500' and planet='mars';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..4.11 rows=1 width=80) (actual time=2.763..2.765 rows=1 loops=1)
-> Index Scan using customers_mars_pkey on customers_mars (cost=0.00..4.11 rows=1 width=80) (actual time=2.762..2.764 rows=1 loops=1)
Index Cond: ((id = '85338353-162a-4a62-bcb9-78fd58a6b500'::uuid) AND (planet = 'mars'::text))
Planning Time: 34.833 ms
Execution Time: 2.842 ms
Peak Memory Usage: 40 kB
(6 rows)
Another advantage of having the region in the primary key is that it appears in the foreign keys of all child tables. Then, if they are also partitioned, only the local partition will be read.
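For example, a hypothetical orders table referencing customers would carry planet in its foreign key, and can be list-partitioned the same way, so that queries through the foreign key prune to one region:

```sql
-- Hypothetical child table: the FK includes planet, and the table is
-- partitioned on it, so joins from a regional customer stay regional.
create table orders(
  order_id uuid default gen_random_uuid()
 , customer_id uuid
 , planet text
 , amount numeric
 , primary key(order_id, planet)
 , foreign key(customer_id, planet) references customers(id, planet)
) partition by list(planet);
create table orders_earth partition of orders
 for values in ('earth') tablespace earth;
-- same for orders_moon and orders_mars in their tablespaces
```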
Note that maintaining the additional indexes has a cost. Here is the same insert as above, but now with indexes to maintain in all regions:
with planets(name) as (
values('earth'),('moon'),('mars')
) insert into customers(planet, info) select name, n::text
from planets, generate_series(1,10000) n;
Top comments (2)
It's a coincidence, but I was researching how to query a geo-partitioned cluster efficiently when all you have is an id of a record (with no geo-column). Very timely! Just curious how big the impact for writes would be, as long as the app needs to update all the indexes across distant regions. Btw, love the planetary data model!
Maybe not a coincidence as this was driven by discussions with colleagues and users 😀
Only the inserts and deletes have to update all indexes, because I don't expect an update of the primary key. Except if you include more columns in the indexes -> I'll add a comment on this suggestion, yes, good point