DEV Community

Intelligent Convert


Practical Case of Oracle to PostgreSQL Migration

Intelligent Converters conducted an Oracle to PostgreSQL migration project for a large cargo delivery company, as described in this whitepaper. The primary motivation for the migration was to reduce the cost of the database management system and transition to an open-source solution offering sufficient scalability, security, and customization capabilities. Here is a concise overview of the source database:

  • The source database server: Oracle 12c
  • Database size: 50 GB with 190 tables
  • PL/SQL code: 120,000 lines encompassing stored procedures, functions, and triggers

The Oracle to PostgreSQL migration project took approximately two months to complete, including half a month for assessment and planning, one month for the migration process, and another half month for testing. The migration team consisted of three specialists: two developers and one database administrator. There were six fundamental phases in the project:

  1. Investigation and Planning: Identify all Oracle-specific data storage and processing methods utilized in the source database, as well as their extent of use.
  2. Schema Migration: Select the appropriate tools to facilitate schema migration and implement them.
  3. Data Migration Methodology: Choose the most suitable approach to minimize system downtime during the migration process.
  4. Data Migration Execution: Perform the data migration, ensuring that all necessary transformations required by the PostgreSQL DBMS are handled.
  5. Converting PL/SQL Code: Oracle packages, stored procedures, functions and triggers must be converted into the PostgreSQL format, utilizing tools for partial automation and conducting manual post-processing when necessary.
  6. Testing and Optimization: Conduct performance and functional tests on the resulting database, fine-tuning it as needed to ensure optimal performance.

Schema Migration

In Oracle, there are certain data types that do not have a direct equivalent in PostgreSQL. One such example is the DATE data type, which encompasses both the date and time parts. On the other hand, PostgreSQL supports the following data types:

  • Date: Represents a pure date without the time component.
  • Time: Represents a pure time without the date component, with an optional time zone specification.
  • Timestamp: Represents both the date and time, also with an optional time zone specification.

For Oracle to PostgreSQL migration, there are two options for mapping this data: either using the TIMESTAMP data type, or installing the orafce compatibility extension, which provides the Oracle-style date type oracle.date.
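For instance, an Oracle DATE column that stores both date and time maps naturally onto TIMESTAMP; the table and column names below are illustrative:

```sql
-- Oracle source:
-- CREATE TABLE orders (id NUMBER PRIMARY KEY, created_at DATE);

-- PostgreSQL target: TIMESTAMP keeps the time part that Oracle DATE carries
CREATE TABLE orders (
    id         BIGINT PRIMARY KEY,
    created_at TIMESTAMP
);
```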

Spatial types also require special consideration during the migration process. Oracle has a built-in SDO_GEOMETRY type for handling spatial data, whereas PostgreSQL requires the installation of PostGIS to work with spatial data types.
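With PostGIS installed, an SDO_GEOMETRY column typically becomes a typed geometry column. A minimal sketch (the table name and SRID below are assumptions):

```sql
-- Install PostGIS before migrating spatial columns
CREATE EXTENSION IF NOT EXISTS postgis;

-- Oracle SDO_GEOMETRY -> PostGIS geometry column
CREATE TABLE depots (
    id   BIGINT PRIMARY KEY,
    geom geometry(Point, 4326)  -- SRID 4326 (WGS 84) assumed
);
```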

Below is a safe mapping for the most notable data types in Oracle to PostgreSQL migration:

Oracle                       PostgreSQL
BINARY_FLOAT                 REAL
BINARY_INTEGER               INTEGER
BINARY_DOUBLE                DOUBLE PRECISION
BLOB, RAW(n), LONG RAW       BYTEA (1 GB limit)
CLOB, LONG                   TEXT (1 GB limit)
DATE                         TIMESTAMP
NUMBER, NUMBER(*)            DOUBLE PRECISION, or BIGINT if part of a primary key
NUMBER(n,0), NUMBER(n)       SMALLINT if n < 5; INT if 5 <= n < 9; BIGINT if 9 <= n < 19; DECIMAL(n) if n >= 19
NUMBER(p,s)                  DECIMAL(p,s)
REAL                         DOUBLE PRECISION

When dealing with numeric types, it is crucial to consider the specific requirements within the database. If accuracy is the primary concern, Oracle numeric types should be mapped to PostgreSQL's NUMERIC type. However, if calculation speed is of utmost importance, the ideal mapping would be to either the REAL or DOUBLE PRECISION types in PostgreSQL.
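As a sketch (the table and column names are illustrative), the same Oracle NUMBER columns can be mapped either way depending on the priority:

```sql
-- Oracle: amount NUMBER(12,2), weight NUMBER
CREATE TABLE invoices (
    amount NUMERIC(12,2),     -- exact arithmetic: accuracy first
    weight DOUBLE PRECISION   -- approximate arithmetic: speed first
);
```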

Data Migration

Data is the most important part of an Oracle to PostgreSQL migration, and it requires a careful choice of strategy and tools, especially for large databases, where the wrong approach may cause unacceptable system downtime. There are three common approaches to data migration:

  • Snapshot migrates the entire table's data in a single step. However, it requires the source database to be held offline for the entire period of data reading to prevent data loss or corruption.
  • Piecewise snapshot splits a table's data into fragments and migrates the resulting chunks simultaneously using parallel threads or processes. Downtime is still required, but it is significantly reduced compared to the snapshot method. In this project, the Intelligent Converters specialists used the piecewise snapshot method for large tables containing millions of rows.
  • Changed data replication (CDR) continuously loads data by tracking incremental changes, reducing system downtime to nearly zero.
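The piecewise snapshot approach can be sketched as parallel workers, each reading its own key range of a table; the table name and range boundaries below are illustrative:

```sql
-- Worker 1 reads the first key range
SELECT * FROM shipments WHERE id >= 1       AND id < 1000000;
-- Worker 2 reads the next range in parallel
SELECT * FROM shipments WHERE id >= 1000000 AND id < 2000000;
```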

Now, let's discuss how the third method may be applied for Oracle to PostgreSQL migration. It has two major implementations, each with its weaknesses. The first implementation is trigger-based CDR. It involves creating triggers on insert, update, and delete operations for each replicated table in the source database. These triggers track all changes by storing information about the events in a special 'history' table. Based on this data, the CDR tool replicates the changes to the target database. However, the trigger-based CDR approach introduces additional overhead to the source database due to the need to write into the 'history' table for every data update.

The second method is known as Transaction Log Change Data Replication (CDR). This approach utilizes the transaction logs generated by the Oracle DBMS to track and replicate changes to the target PostgreSQL database. The advantage of this method over trigger-based CDR is that it doesn't require modifications to the source database. However, Transaction Log CDR also has its own weaknesses:

  • Risk of Missing Changes: Oracle regularly archives transaction log files, so the replication tool may miss some changes if it fails to read a transaction log before it is archived.
  • Data Loss or Damage: If there is a loss of connection to the target database during the replication of changes from the transaction log (or any other type of error occurs), there is a possibility of data loss or damage due to the lack of control over the transaction log.

It is important to consider these limitations when choosing the appropriate method for data migration and replication. Besides reducing system downtime, another challenge of Oracle to PostgreSQL migration is handling data formats and external data capabilities that are missing in the target DBMS.

In the previous section, BYTEA was mentioned as the recommended PostgreSQL data type for binary data. However, this mapping is not suitable for large binary data when the average field size exceeds 10 MB. This is due to the way BYTEA data is read: it can only be extracted as a single fragment, and piecewise reading is not possible, which can lead to significant RAM overhead. To overcome this issue, PostgreSQL offers an alternative called the LARGE OBJECT type. LARGE OBJECT values are stored in the system table 'pg_largeobject', which is present in every database. A single database can hold up to 4 billion large objects, and one LARGE OBJECT value can be as large as 4 TB. Additionally, piecewise reading is supported, addressing the limitations of BYTEA.
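A minimal sketch of the large object API (the bytes and OID below are illustrative): lo_from_bytea creates an object and returns its OID, and lo_get reads an arbitrary slice, which is exactly the piecewise access that BYTEA lacks:

```sql
-- Create a large object from binary data; returns the new object's OID
SELECT lo_from_bytea(0, '\xdeadbeef');

-- Read 2 bytes starting at offset 0 (substitute the OID returned above)
SELECT lo_get(16393, 0, 2);
```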

Another important aspect of migration is properly handling the ROWID, which is a pseudo-column that identifies the physical address of a record in a table in Oracle. PostgreSQL provides a similar service field called ctid, but it is not a direct equivalent of ROWID. According to the PostgreSQL documentation, the ctid value may change due to the vacuuming process. To emulate the functionality of ROWID in PostgreSQL, there are three basic methods:

  • Use an existing primary key (or create a new one) to identify rows instead of ROWID.
  • Add a serial or bigserial column with auto-generated values and designate it as the primary key or a unique key to replace ROWID functionality.
  • When it is not possible to define a single-column primary key, create a unique index that spans multiple columns. Determine the minimal unique set of fields for each row to establish this index.
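The second and third options can be sketched as follows (the table and column names are illustrative):

```sql
-- Option 2: auto-generated surrogate key replacing ROWID references
ALTER TABLE shipments ADD COLUMN row_id BIGSERIAL;
ALTER TABLE shipments ADD CONSTRAINT shipments_row_id_uq UNIQUE (row_id);

-- Option 3: multi-column unique index when no single column identifies a row
CREATE UNIQUE INDEX shipments_line_uq ON shipments (order_no, line_no);
```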

In Oracle, external data stored outside the database can be linked and treated as a regular table using the "external table" feature of the DBMS. Similarly, PostgreSQL utilizes the Foreign Data Wrapper library for the same purpose. For example, the "file_fdw" extension is available in PostgreSQL to interact with external CSV files as if they were regular tables.
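A minimal file_fdw setup might look like this (the file path and column list are assumptions):

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Expose an external CSV file as if it were a regular table
CREATE FOREIGN TABLE ext_rates (
    code varchar,
    rate numeric
) SERVER csv_files
OPTIONS (filename '/data/rates.csv', format 'csv', header 'true');
```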

Migration of PL/SQL Code

This section explores the issues involved in migrating Oracle packages, stored procedures, functions, and triggers to PostgreSQL.

Packages. When migrating Oracle packages to PostgreSQL, it's important to note that PostgreSQL does not have a built-in equivalent feature for packages. However, this functionality can be emulated by organizing all related components belonging to a package within a PostgreSQL schema that shares the same name. In this approach, global variables can be stored in a dedicated service table within the schema. This allows for logical grouping and organization of code similar to packages in Oracle, despite the absence of a direct package concept in PostgreSQL.
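This emulation can be sketched for a hypothetical package pkg_billing:

```sql
-- The package becomes a schema of the same name
CREATE SCHEMA pkg_billing;

-- Global package variables live in a one-row service table
CREATE TABLE pkg_billing.globals (tax_rate numeric);
INSERT INTO pkg_billing.globals VALUES (0.20);

-- Package routines become schema-qualified functions
CREATE FUNCTION pkg_billing.with_tax(amount numeric)
RETURNS numeric
LANGUAGE sql
AS $$ SELECT amount * (1 + (SELECT tax_rate FROM pkg_billing.globals)) $$;
```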

Type casting. PostgreSQL requires strict type casting when invoking functions and operators, and when inserting or updating table data with the results of expressions. A workaround is the pseudo-type 'anyelement', which makes a function polymorphic, so a single definition can handle arguments of different data types:

create or replace function my_concat(str1 anyelement, str2 anyelement)
returns varchar
language plpgsql
as $$
begin
    return str1::varchar || str2::varchar;
end;
$$;


A stored procedure or function may declare either a single anyelement argument or several; in the latter case, all anyelement parameters must resolve to the same actual type at call time.
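For example, the function above can be called with any pair of like-typed arguments, but untyped string literals must be cast explicitly so the polymorphic type can be resolved:

```sql
SELECT my_concat(10, 20);                -- integer arguments: returns '1020'
SELECT my_concat('a'::text, 'b'::text);  -- plain 'a', 'b' would fail to resolve
```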

Sequences. Oracle and PostgreSQL have a similar syntax for declaring sequences (note that Oracle spells NOCYCLE as a single word, while PostgreSQL uses NO CYCLE). For example, in PostgreSQL:

CREATE SEQUENCE SEQ1 START WITH 10 INCREMENT BY 2 MAXVALUE 1000000 CACHE 20 NO CYCLE;

However, there is a difference in referencing the next value of a sequence. In Oracle, the syntax is 'sequence.nextval', whereas in PostgreSQL, it is 'nextval('sequence')'.
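For example, using the sequence declared above:

```sql
-- Oracle:     SELECT seq1.NEXTVAL FROM dual;
-- PostgreSQL:
SELECT nextval('seq1');
```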

Autonomous Transactions. In Oracle, autonomous transactions enable a subprogram to independently commit or rollback SQL operations without affecting the main transaction. This functionality allows certain operations, such as inserting data into a table within an insert-trigger, to succeed even if the main transaction fails. To achieve this, the corresponding INSERT statement must be enclosed within an autonomous transaction. This ensures that the insert operation is handled separately from the main transaction, providing the desired level of independence and allowing for specific operations to proceed regardless of the outcome of the main transaction:

CREATE OR REPLACE PROCEDURE insert_critical_data(v_data varchar2)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
    INSERT INTO info_table (data) VALUES (v_data);
    commit;
END;


PostgreSQL does not provide direct support for autonomous transactions. However, there are workarounds available to achieve similar functionality. One approach is to refactor the code so that autonomous transactions are not required. Alternatively, you can utilize the 'dblink' module in PostgreSQL. With dblink, you can establish a new connection and execute a query using that connection, which is immediately committed regardless of the main transaction's outcome. This allows you to perform operations, such as inserting a row into a table, that will be committed even if the calling transaction is rolled back:

CREATE OR REPLACE FUNCTION insert_critical_data(v_data TEXT)
RETURNS void
AS
$BODY$
DECLARE
    v_sql text;
BEGIN
    -- requires the dblink extension: CREATE EXTENSION dblink;
    PERFORM dblink_connect('myconn',
         'dbname=mydbname user=… password=… host=… port=…');
    v_sql := format('INSERT INTO info_table (data) VALUES (%L)', v_data);
    PERFORM dblink_exec('myconn', v_sql);
    PERFORM dblink_disconnect('myconn');
END;
$BODY$
LANGUAGE plpgsql;


Triggers. In Oracle, the source code of a trigger's body is included within the CREATE TRIGGER statement itself. However, in PostgreSQL, the trigger source code must be composed as a separate standalone function, which is then referenced from the CREATE TRIGGER statement. This particularity allows PostgreSQL to maintain a clearer separation between the trigger's definition and its associated logic:

CREATE OR REPLACE FUNCTION store_changes()
  RETURNS TRIGGER 
  LANGUAGE PLPGSQL
  AS
$$
BEGIN
    IF (NEW.first_name <> OLD.first_name) OR (NEW.last_name <> OLD.last_name)
       OR (NEW.email <> OLD.email) THEN
        INSERT INTO changes_log(id, changed_on)
        VALUES (OLD.id, now());
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER make_changes
  BEFORE UPDATE
  ON employees
  FOR EACH ROW
  EXECUTE PROCEDURE store_changes();

Built-in Functions. Oracle and PostgreSQL offer similar sets of built-in functions, although they are not identical. The following table presents functions that need to be ported to their equivalent counterparts during Oracle to PostgreSQL migration:

Oracle                              PostgreSQL
ADD_MONTHS($date, $n_month)         $date + $n_month * interval '1 month'
DECODE($exp, $when, $then, ...)     CASE $exp WHEN $when THEN $then ... END
INSTR($str1, $str2)                 POSITION($str2 in $str1)
LAST_DAY($date)                     (date_trunc('MONTH', $date::date) + interval '1 month - 1 day')::date
MONTHS_BETWEEN($date2, $date1)      extract(month from age($date2::date, $date1::date))
REGEXP_LIKE($str, $pattern)         $str ~ $pattern
ROWNUM (see note)                   row_number() over ()
SYSDATE                             CURRENT_TIMESTAMP
SYS_GUID()                          uuid_generate_v1() (requires the uuid-ossp extension)
TRUNC($date, $format)               date_trunc($format, $date)
(note) The predicate 'WHERE ROWNUM < N' must be translated into 'LIMIT N' in PostgreSQL.

For this Oracle to PostgreSQL migration project, two products of Intelligent Converters were used: Oracle-to-PostgreSQL and the Code Converter. The first tool fully automates migration of the schema, data, indexes, constraints, and foreign keys. The second is designed to partially automate the conversion of triggers, stored procedures, functions, and views; it translates most Oracle built-in functions and PL/SQL patterns into PostgreSQL format. Although manual post-processing of the output source code was still required, the converter reduced the total duration of the migration project by about 40%.
