
Franck Pachot for YugabyteDB Distributed PostgreSQL Database


Moving data from CockroachDB🪳 to PostgreSQL🐘 or YugabyteDB🚀

A user asked on the YugabyteDB community Slack how to migrate from CockroachDB. You may think that "wire compatibility" with PostgreSQL makes it easy; it does not. The ways we usually export data from PostgreSQL don't work with CRDB. However, because PostgreSQL is powerful, and YugabyteDB benefits from all of its SQL features, there is an easy solution with a Foreign Data Wrapper.

Start the lab

For this test, I'm starting the two databases in Docker, initializing the TPCC demo database in CockroachDB, and creating an empty database in YugabyteDB:


# create the shared Docker network and start the containers

docker network create yb

docker run --network yb -d --rm --name yb -p 5433:5433 -p 7000:7000 -p 15433:15433 yugabytedb/yugabyte:latest bash -c "
yugabyted start --background false --tserver_flags=ysql_beta_features=true
"

docker run --network yb -d --rm --name cr -p 26257:26257 cockroachdb/cockroach:latest bash -c "
cockroach start-single-node --insecure
"

# create demo tpcc tables in 🪳

docker exec -it cr cockroach workload init tpcc

# create empty db in 🚀

export PGHOST=localhost PGPORT=5433 PGUSER=yugabyte PGDATABASE=yugabyte
until pg_isready ; do sleep 1 ; done | uniq &&
psql -c "
create database tpcc
"
export PGDATABASE=tpcc

DDL: Migrate the schema

CRDB accepts connections through the PostgreSQL wire protocol but is not compatible with the PostgreSQL catalog views, and therefore doesn't support pg_dump:

Franck@YB:~ $ pg_dump -U root -h localhost -p 26257 -d tpcc -s

pg_dump: error: query failed: ERROR:  column "x.tableoid" does not exist
pg_dump: error: query was: SELECT x.tableoid, x.oid, x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition FROM pg_extension x JOIN pg_namespace n ON n.oid = x.extnamespace

However, they provide a proprietary way to generate the CREATE TABLE statements:

# export ddl
psql -U root -h localhost -p 26257 -d tpcc -Ato ddl.sql -c "
show create all tables
"

I like this simple command: even if its output is not PostgreSQL-compatible, it is a good workaround for the absence of pg_dump --schema-only.
I have extracted the DDL into a ddl.sql file:
[screenshot: the generated ddl.sql]

As you can see, the generated DDL is not PostgreSQL-compatible. I have worked with a lot of SQL databases and have never seen this syntax. In SQL, we create indexes with CREATE INDEX. Indexes can be created implicitly from CREATE TABLE to enforce unique constraints, but declaring an index among the logical attributes of a table (the list of columns and constraints) is not common.
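
To give an idea of the dialect, here is a simplified, hypothetical excerpt in the spirit of what show create all tables returns (table, column and index names are illustrative, not the exact TPCC output). The INDEX ... STORING clause inside CREATE TABLE is the CockroachDB dialect; in PostgreSQL this becomes a separate CREATE INDEX ... INCLUDE statement:

CREATE TABLE public.customer (
	c_id INT8 NOT NULL,
	c_w_id INT8 NOT NULL,
	c_last VARCHAR(16) NOT NULL,
	c_balance DECIMAL(12,2) NOT NULL,
	CONSTRAINT customer_pkey PRIMARY KEY (c_w_id, c_id),
	INDEX customer_idx (c_w_id, c_last) STORING (c_balance)
);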

I've quickly written a small awk script to get it back to PostgreSQL-compatible DDL:

awk '
# create extensions that are default in CR
NR==1{
 print "create extension if not exists pgcrypto;" > "tab-"FILENAME
}
# remove all references to the public schema
/TABLE public[.]/{
 $0=gensub(/(REFERENCES |TABLE )public[.]/,"\\1","g")
}
# move commas at the end of lines to the beginning of the next line
nextline!=""{
 $0=gensub(/(^\t*)(.*)$/,"\\1"nextline"\\2",1)
 nextline=""
}
/,$/{
 sub(/,$/,"")
 nextline=","
}
# INDEX clause inside CREATE TABLE is not standard SQL syntax
/^CREATE TABLE/{
 table=gensub(/^CREATE TABLE (.*) [(]/,"\\1",1)
}
/^\t*,(UNIQUE )?INDEX/{
 indexes=indexes"\n"gensub(/ STORING /," INCLUDE ",1,gensub(/^\t*,(UNIQUE )?(INDEX)([^(]+)(.*)( STORING)?(.*)$/,"create \\1\\2 \\3 on "table" \\4 \\5 \\6;",1))
 $0=gensub(/(^\t*),(.*)$/,"\\1--\\2",1)
}
# validate constraints at creation
/^ALTER TABLE.*ADD CONSTRAINT.*/{
 $0=gensub(/(.*)(NOT VALID)?(;)$/,"\\1\\3",1)
 {print > "ref-"FILENAME}
 $0="--"$0
}
/^ALTER TABLE.*VALIDATE CONSTRAINT.*;$/{
$0="--"$0
}
# print that to the create table file
{print > "tab-"FILENAME}
END{
print indexes > "ind-"FILENAME
}
' ddl.sql


This does a few additional transformations:

  • remove references to the public schema so the tables can be created in a dedicated one
  • move the trailing comma (,) to the beginning of the next line, to make lines easier to comment out
  • comment out the INDEX declarations and generate the CREATE INDEX statements to run after the data import
  • add CREATE EXTENSION for the functions that CRDB uses without declaring them (gen_random_uuid() here)
  • move the creation of referential-integrity constraints into a separate script to run at the end, instead of creating them immediately as NOT VALID. By the way, with COPY in YugabyteDB you can create them beforehand and import with DISABLE_FK_CHECK if you want to bypass the check

The result goes into a tab-ddl.sql file for the table creation, ind-ddl.sql for the secondary indexes, and ref-ddl.sql for the foreign keys.
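
For the hypothetical customer excerpt above, the script would produce something like this (exact whitespace may differ): the INDEX line is commented out in tab-ddl.sql, and ind-ddl.sql gets the equivalent CREATE INDEX with STORING translated to INCLUDE.

-- tab-ddl.sql
create extension if not exists pgcrypto;
CREATE TABLE customer (
	c_id INT8 NOT NULL
	,c_w_id INT8 NOT NULL
	,c_last VARCHAR(16) NOT NULL
	,c_balance DECIMAL(12,2) NOT NULL
	,CONSTRAINT customer_pkey PRIMARY KEY (c_w_id, c_id)
	--INDEX customer_idx (c_w_id, c_last) STORING (c_balance)
);

-- ind-ddl.sql
create INDEX customer_idx on customer (c_w_id, c_last) INCLUDE (c_balance);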

This is easy to run on the YugabyteDB target. I'm setting search_path to create those tables in a tpcc schema (this is the reason I removed the references to public in the DDL script):

psql -v ON_ERROR_STOP=1 \
 -c 'create schema tpcc' -c 'set search_path=tpcc,"$user", public' \
 -ef tab-ddl.sql | tee tab-ddl.log


I'm using the PostgreSQL psql here, stopping at the first error, because when you migrate data you don't want any unseen surprises. Better to stop immediately than to expect someone to read through the log.

I can see the tables from the YugabyteDB web console on http://localhost:15433/

[screenshot: the tpcc tables listed in the YugabyteDB web console]

They are empty. Ready to import data.
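
If you prefer psql to the web console, a quick check (a sketch, assuming the tpcc schema and the connection variables set earlier):

\dt tpcc.*
select count(*) from tpcc.warehouse;
-- should return 0 before the import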

DML: Migrate the data

Dump

CockroachDB has no tool to dump data in a format that is easy to import elsewhere. You may be surprised by that, but I'm not. Oracle was famous for providing a SQL*Loader with no SQL*Unloader (rumor has it that one was ready to be released when Larry Ellison decided to remove a tool that would make it too easy to move away from his major source of revenue).
The same goes for CockroachDB, as discussed in the GitHub issues about the lack of a dump tool:

UPDATE: COPY is now supported - I've added a \copy method below

Export to CSV

In the issues mentioned above, the suggested workaround for the absence of a dump tool is a CSV export with cockroach sql --format csv. I tried it, but it just displays the result with commas separating the fields. It doesn't provide the quality needed for a migration. For example, I tried it with the TPCC schema but encountered many errors with the timestamp format or, even worse, no distinction between a NULL value and the 'NULL' character string:

[screenshot: CSV output from cockroach sql --format csv]

As you can see, " ("") ",NULL,NULL,0 is the CSV produced for the row (' (") ','NULL',null,0), where it should be " ("") ",NULL,,0

For a migration, you cannot rely on such a tool.
As a comparison, here is what PostgreSQL or YugabyteDB provides:

[screenshot: the same row exported to CSV from PostgreSQL/YugabyteDB]
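
To reproduce the comparison yourself, here is a minimal sketch to run in psql against PostgreSQL or YugabyteDB, using a throwaway csv_demo table (a hypothetical name) with the same values as above:

create table csv_demo (a text, b text, c text, d int);
insert into csv_demo values (' (") ', 'NULL', null, 0);
\copy csv_demo to stdout with csv
-- prints: " ("") ",NULL,,0   (the real NULL is empty, the 'NULL' string is kept as-is)
drop table csv_demo;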

\copy to CSV

The PostgreSQL-compatible way to export to CSV, with COPY, is finally supported, and it may be the preferred way.

I've generated the export and import commands with a query on pg_tables:


tpcc=> -- Export to CSV
tpcc=> select format('\copy %I to %L with csv',tablename,tablename||'.csv')
tpcc->  from pg_tables where schemaname='public';
                    format
-----------------------------------------------
 \copy warehouse to 'warehouse.csv' with csv
 \copy district to 'district.csv' with csv
 \copy customer to 'customer.csv' with csv
 \copy history to 'history.csv' with csv
 \copy "order" to 'order.csv' with csv
 \copy new_order to 'new_order.csv' with csv
 \copy item to 'item.csv' with csv
 \copy stock to 'stock.csv' with csv
 \copy order_line to 'order_line.csv' with csv
(9 rows)

tpcc=> -- Import from CSV
tpcc=> select format('\copy %I from %L with csv',tablename,tablename||'.csv')
tpcc->  from pg_tables where schemaname='public';
                     format
-------------------------------------------------
 \copy warehouse from 'warehouse.csv' with csv
 \copy district from 'district.csv' with csv
 \copy customer from 'customer.csv' with csv
 \copy history from 'history.csv' with csv
 \copy "order" from 'order.csv' with csv
 \copy new_order from 'new_order.csv' with csv
 \copy item from 'item.csv' with csv
 \copy stock from 'stock.csv' with csv
 \copy order_line from 'order_line.csv' with csv
(9 rows)

I ran those commands to export from CockroachDB and import into YugabyteDB and did not encounter the same problems as with the EXPORT command. So this is probably a better method than the next one, using a Foreign Data Wrapper, which was the only reliable option in previous versions.
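
For example, to move one table end to end from a single psql session, switching connections with \c (a sketch, assuming the connection settings of this lab and that the CSV file is written to the client's current directory):

\c postgresql://root@localhost:26257/tpcc
\copy warehouse to 'warehouse.csv' with csv
\c postgresql://yugabyte@localhost:5433/tpcc
set search_path=tpcc,"$user",public;
\copy warehouse from 'warehouse.csv' with csv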

Foreign Data Wrapper

As YugabyteDB is PostgreSQL-compatible, we can use a Foreign Data Wrapper. Of course, it also depends on the compatibility of the remote database. I tried this in psql:

psql <<'SQL'
CREATE EXTENSION postgres_fdw;
CREATE SERVER cr FOREIGN DATA WRAPPER postgres_fdw OPTIONS (
 host 'cr', port '26257', dbname 'tpcc'
);
CREATE USER MAPPING FOR yugabyte SERVER cr OPTIONS (
 user 'root', password ''
);
IMPORT FOREIGN SCHEMA public FROM SERVER cr INTO public;
\det
SQL

and this was looking good...

[screenshot: the statements above completing without error]

...except that nothing is there:

yugabyte=# \det
 List of foreign tables
 Schema | Table | Server
--------+-------+--------
(0 rows)

Unfortunately, nothing was imported from the schema. This is not a surprise, as CockroachDB is not compatible with the PostgreSQL catalog views that IMPORT FOREIGN SCHEMA queries on the remote side.

Let's try with a table, declaring the structure:

CREATE foreign TABLE warehouse (
  w_id INT8 NOT NULL
  ,w_name VARCHAR(10) NOT NULL
  ,w_street_1 VARCHAR(20) NOT NULL
  ,w_street_2 VARCHAR(20) NOT NULL
  ,w_city VARCHAR(20) NOT NULL
  ,w_state CHAR(2) NOT NULL
  ,w_zip CHAR(9) NOT NULL
  ,w_tax DECIMAL(4,4) NOT NULL
  ,w_ytd DECIMAL(12,2) NOT NULL
) server cr;

This looks good:

CREATE FOREIGN TABLE

tpcc=# \det

   List of foreign tables
 Schema |   Table   | Server
--------+-----------+--------
 public | warehouse | cr
(1 row)

tpcc=# select * from warehouse;

 w_id | w_name | w_street_1 | w_street_2 | w_city | w_state |   w_zip   | w_tax  |   w_ytd
------+--------+------------+------------+--------+---------+-----------+--------+-----------
    0 | 8      | 17         | 13         | 11     | SF      | 640911111 | 0.0806 | 300000.00
(1 row)

tpcc=# insert into tpcc.warehouse select * from warehouse;

INSERT 0 1

tpcc=# select * from tpcc.warehouse;

 w_id | w_name | w_street_1 | w_street_2 | w_city | w_state |   w_zip   | w_tax  |   w_ytd
------+--------+------------+------------+--------+---------+-----------+--------+-----------
    0 | 8      | 17         | 13         | 11     | SF      | 640911111 | 0.0806 | 300000.00
(1 row)

tpcc=# truncate table tpcc.warehouse;
TRUNCATE TABLE

I was able to read data from CockroachDB and insert it into YugabyteDB. Great. I have truncated what I just imported because I'll automate this for all tables.

As YugabyteDB exposes the same catalog views as PostgreSQL, I'm generating the CREATE FOREIGN TABLE and INSERT statements from them:

with
n as (
 select oid as relnamespace
 , nspname as sch
 from pg_namespace
 where nspname ='tpcc'
), c as (
 select oid as attrelid
 , relnamespace
 , relname as tab
 from pg_class
 where relkind='r'
), a as (
 select attrelid
 , attname as col
 , format('%I %s',attname,format_type(atttypid, atttypmod)) as def
 from pg_attribute
 where attnum>0 and not attisdropped
)
select
 format(
  'create foreign table %I (%s) server cr'
  ,tab
  , string_agg(def,', ')
 )
from n natural join c natural join a group by sch,tab
union all
select
 format(
  'insert into %I.%I(%s) select %s from %I'
  ,sch
  ,tab
  ,string_agg(col,',')
  ,string_agg(col,','),tab)
from n natural join c natural join a group by sch,tab
union all
select
 format(
  'drop foreign table %I'
  ,tab
 )
from n natural join c
\gexec

This generates the statements by reading the structure of the target tables (in the tpcc schema) and executes the result with \gexec:

[screenshot: the generated statements executed with \gexec]

My warehouse foreign table was already created; the others are created here. Then the rows are imported. At the end, I drop the foreign tables to make sure we don't use them anymore. The application can now work on the new tables in YugabyteDB, after creating the secondary indexes and foreign keys:

psql -v ON_ERROR_STOP=1 \
 -c 'set search_path=tpcc,"$user", public' \
 -ef ind-ddl.sql | tee ind-ddl.log
psql -v ON_ERROR_STOP=1 \
 -c 'set search_path=tpcc,"$user", public' \
 -ef ref-ddl.sql | tee ref-ddl.log

All this is an example and can be customized. For large tables, you may want to pre-split them (see the sketch after the next snippet); if not, YugabyteDB can auto-split them as they grow. And if you prefer to go through CSV files, create only the foreign tables and use PostgreSQL COPY to get a reliable CSV file. Here is how to generate the \copy commands with the WITH clause of the previous query:

...
union all
select
 format(
  '\copy (select * from %I) to %L csv;'
  ,tab,tab||'.csv'
 )
from n natural join c
;
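
Pre-splitting, mentioned above, simply means adding YugabyteDB's SPLIT clause when creating the large tables in tab-ddl.sql. A minimal sketch, with a deliberately abbreviated, hypothetical order_line definition and an arbitrary tablet count:

create table order_line (
 ol_w_id int not null,
 ol_d_id int not null,
 ol_o_id int not null,
 ol_number int not null,
 ol_amount decimal(6,2),
 primary key (ol_w_id, ol_d_id, ol_o_id, ol_number)
) split into 16 tablets;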

Of course, your application must be down for this kind of migration. To reduce downtime (and stress), there are replication solutions like https://www.arcion.io, which has connectors for all of these databases.

If you are migrating from CockroachDB, you may be positively surprised by all the possibilities offered by the PostgreSQL-compatible ecosystem. All of this used standard PostgreSQL queries on the pg_catalog schema, with psql and \gexec. I didn't install any other tools here. And they are reliable because thousands of PostgreSQL users have worked with them daily, on all kinds of data, for years. YugabyteDB re-uses the same code for the SQL layer, changing only what is needed to scale out with high performance.

Top comments (3)

pablopla

YugabyteDB can't export data to PostgreSQL
github.com/yugabyte/yugabyte-db/is...

Franck Pachot

You can export data.
For metadata (DDL) you can pg_dump from YugabyteDB. PostgreSQL adds the access method (LSM-tree in this case) even if not standard. That cannot be controlled by Yugabyte. But it is really easy to change. I've 3 lines of AWK to do that. If you have any problem moving from YB to PG please tell me I can help

Franck Pachot

Thinking more about it, this should be a patch for PostgreSQL, something like: ignore access methods that are extensions.