Franck Pachot for YugabyteDB Distributed PostgreSQL Database

SQLcl to transfer data from Oracle to PostgreSQL or YugabyteDB πŸ…ΎπŸ˜πŸš€

UPDATE: you can also use YugabyteDB Voyager to move from Oracle to YugabyteDB

Old DBAs have stories about Oracle providing a "SQL*Loader" without any "SQL*Unloader" because Larry Ellison didn't want his customers to move out. This has changed: there's an easy way to export to CSV with a simple set sqlformat csv in SQLcl. Follow Jeff Smith's blog to learn more about it.
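
For example, exporting a single table to a CSV file from SQLcl can look like this (a minimal sketch: the connection string and the regions table are placeholders, not from my lab):

sql -s user/password@mydb <<'SQL'
set sqlformat csv
set feedback off pagesize 0
spool regions.csv
select * from regions;
spool off
SQL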

Here is an example. I wanted to move some sample data from Oracle to YugabyteDB to compare the size. I have an Always Free Autonomous Database, which includes the SSB sample schema. There is a LINEORDER table which is a few hundred GB. I'll get the DDL with dbms_metadata. The only change I had to make was sub(" NUMBER,"," NUMERIC,"), and I disabled the constraints and collation clauses.
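
In isolation, that DDL extraction and datatype conversion looks roughly like this (a sketch, with a placeholder password and a hypothetical lineorder.sql output file; the full script below does the same inline):

~/sqlcl/bin/sql -s demo/password@o21c_tp <<'SQL' | awk '{ sub(" NUMBER,"," numeric,") ; print }' > lineorder.sql
set feedback off pagesize 0 long 999999999 verify off
select dbms_metadata.get_ddl('TABLE','LINEORDER','SSB') from dual ;
SQL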

Of course, there are professional tools to convert an Oracle schema to PostgreSQL: the good old ora2pg, or AWS SCT, which is also great for assessing the level of changes required by a migration. But for something quick, I'm good with awk πŸ˜‰

Then the export is easy with set sqlformat csv and a few settings to output only data, like feedback off pagesize 0 long 999999999 verify off. I pipe all that to awk, which builds the \copy commands that take these CSV lines as-is. I like to work in small steps, so I split the load into multiple COPY commands with (NR-data)%batchsize, data being set at the beginning of the COPY command (the script below uses batches of 1,000,000 lines). Sending them in parallel would be easy, but I may not need it because YugabyteDB is multithreaded.

Here is the script I use - I have my Autonomous Database wallet in TNS_ADMIN, and SQLcl installed in my home directory (on an Oracle free tier ARM VM where I also run my YugabyteDB lab).

{
TNS_ADMIN=/home/opc/wallet_oci_fra ~/sqlcl/bin/sql -s demo/"bsky-social-baena-3gpea"@o21c_tp @ /dev/stdin SSB LINEORDER <<SQL
set feedback off pagesize 0 long 999999999 verify off
whenever sqlerror exit failure
begin
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SEGMENT_ATTRIBUTES', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'STORAGE', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'CONSTRAINTS', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'REF_CONSTRAINTS', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SQLTERMINATOR', true);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'COLLATION_CLAUSE', 'NEVER');
end;
/
set sqlformat default
select dbms_metadata.get_ddl('TABLE','&2','&1') from dual ;
set sqlformat csv
select * from "&1"."&2" ;
SQL
} | awk '
# the CREATE TABLE statement comes first: derive the target table and schema names from it
/^ *CREATE TABLE /{
 table=$0 ; sub(/^ *CREATE TABLE/,"",table)
 print "drop table if exists "table";"
 schema=table ; sub(/\"[.]\".*/,"\"",schema)
 print "create schema if not exists "schema";"
}
# the CSV header (column names are double-quoted) marks the start of the data: emit the first \copy
/^"/{
 data=NR-1
 print "\\copy "table" from stdin with csv header"
}
# while still in the DDL part, convert Oracle NUMBER columns to PostgreSQL numeric
data<1{
 sub(" NUMBER,"," numeric,")
}
{print}
# close the current \copy and start a new one every 1000000 lines
data>0 && (NR-data)%1000000==0{
 print "\\."
 print "\\copy "table" from stdin with csv"
}
END{
 print "\\."
}
'

The output can be piped directly to psql 😎
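
For instance, if the script above is saved as ora2yb.sh (a hypothetical name), loading into a YugabyteDB node listening on the default YSQL port, with the default yugabyte user and database, looks like this:

sh ora2yb.sh | psql -h 127.0.0.1 -p 5433 -U yugabyte -d yugabyte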

Here is my screen when starting the load:
[screenshot]
It is a lab, so measuring elapsed time doesn't make much sense, but I looked at the rows_inserted statistics to verify that everything is distributed to the 3 nodes of my distributed SQL database. Even with a single client session, the load is distributed across the whole cluster.
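
A rough way to check this from the shell (an assumption on my side: the yb-tserver web UI listens on the default port 9000 on each node and exposes the per-table rows_inserted metric on its Prometheus endpoint; the node addresses are placeholders) is to sum the metric per node:

for node in 10.0.0.1 10.0.0.2 10.0.0.3 ; do
 echo -n "$node: "
 curl -s "http://$node:9000/prometheus-metrics" |
  awk '/^rows_inserted/{ sum+=$2 } END{ print sum+0 }'
done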

This works the same with PostgreSQL because it is the same API: YugabyteDB reuses PostgreSQL on top of its distributed storage.
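
With the same hypothetical ora2yb.sh, the only change for a vanilla PostgreSQL target is the connection:

sh ora2yb.sh | psql -h localhost -p 5432 -U postgres -d postgres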

All the components in this test are free and easy to use.
