Insolita

Posted on Jul 26, 2020

Notes for migrating an app from Mysql to Postgres

#webdev #database #postgres #mysql

Actually, in most cases, we work with databases through abstraction layers of ORM/ActiveRecord libraries, which give us the ability to work with different databases in the same manner. That can make an illusion that we can easily be switching to another database system. Just run a new database system, change database driver in framework settings, and upload data dump to the new database. So, it can be the truth for some simple cases. But for larger projects, we forced to use raw database queries from time to time. It happens for different reasons - when we want to add better optimization or want to use database-specific features, or when we need a complex query. My article is about several differences between MySQL and PostgreSQL that we should keep in mind if we want to migrate from one to another. And if any kind of raw SQL queries are in our application, we should pay special attention to it.

Quotes

For MySQL, we can use single quotes '' as well as double quotes "" for strings and backticks for system data - table names, column names.

For PostgreSQL, we must use only single quotes for strings and double quotes for table, column names, etc...

Comments

MySQL supports ANSI standard comment prefix '--' double-hyphen and also hashtags. PostgreSQL understands only double-hyphen prefix.

Case Sensitiveness

MySQL under Linux os is case sensitive for column, index, stored routine, and event names. And it is not sensitive for database and table names.
Imagine, that we have table "my_table" with columns "name" and "myText"

select name, myText from my_table; -- ok
select Name, mytext from my_table; -- ok
select name, myText from My_Table; -- wrong

Postgres is case-sensitive for quoted system names and insensitive for unquoted (But all unquoted names casts to lowercase)

select name, myText from my_table; -- wrong
select name, myText from "My_Table"; -- wrong

select name, "myText" from my_table; -- ok
select NAME, "myText" from my_table; -- ok
select name, "myText" from My_Table; -- ok

Operators

MySQL use AND,OR as well as alises &&, || - for logical operations.
Postgres use only AND, OR. The operator || used for string concatenations.

MySQL use ^ for byte XOR operation.
Postgres use ^ for exponentiation and # for byte XOR.

Postgres haven't SOUNDS LIKE operator. (But you can use fuzzystrmatch extension, that can soundex and even more)

MySQL LIKE operator is case-insensitive. Postgres LIKE operator case-sensitive. Use ILIKE operator for an ignore-case search.

Strictness

MySQL got real strictness mode settings only since 5.7 version. So if your MySQL version less than 5.7, or you use it without strict-mode - you must pay attention for sure that your application well-validate data size/length accordingly column limits. Because Postgres is strict for this.
Also, one of the often mistakes that come from MySQL non-strictness is Grouping queries without explicit columns aggregating.

I assume that we have a table users

create table users(
 id bigserial primary key,
 name varchar(255) not null,
 age smallint default 0 null,
 department varchar(20) not null
);

MySql in non-strict mode allows to make queries like

select *, count(id) from users group by department;
select name, age, count(id), department from users group by department;

Postgres requires explicit aggregation for columns that are not a part of GROUP BY. So valid query should be looks like:

select avg(age), array_agg(name),count(id), department from users group by department;

Data Types

I make review only for several ones. You can find more details in the official documentation. MySQl PostgreSQL

The BOOL type. In the MySQL it just an alias for TINYINT(1) and can take only 0, false, 1, true values;
In the Postgres - bool is a separated type. It can be filled with values: true, false, 'true', 'false','yes', 'no', 't', 'f', 'y', 'n', '0', '1'. But it can't accept unquoted 0 and 1 values, which may become a problem for a data migration from MySQL. You should wrap 0 and 1 values in quotes, or add explicit type casting 0::bool, 1::bool.
There are no TINYINT type in Postgres. The minimal integer type is 2-byte SMALLINT with values in range from -32768 to +32767
Postgres haven't the UNSIGNED option. If you need to restrict data, you can add CHECK constraint.

CREATE TABLE users (
  id serial primary key, 
  name  varchar(255),
  age smallint CHECK(age > 0) 
);
-- OR
ALTER TABLE users ADD CHECK(age >0)

Postgres functions for date and time manipulations are really different. If you use some of them in MySQL, don't forget to replace it with analogs
Postgres haven't the feature ON UPDATE CURRENT_TIMESTAMP that allows automatically update timestamp value on every row update. You should write the trigger function for the same behavior, or make it on the application level.
JSON. There are no problems for move data from the MySQL json field to Postgres. But syntax for operations with json in queries absolutely different. Also, you should know that Postgres offers a more efficient JSONB type. (What a difference?)
Postgres haven't a SET column type. The nearest alternative is Array with additions constraints.
BYTEA - is a Postgres alternative for MySQL BINARY type. But it has known problems with data conversion. But solutions are known also.
ENUM. Postgres support it, but the way for it creation is different. You should define own TYPE as ENUM, and then declare it for a column.

CREATE TYPE job_state AS ENUM ('pending', 'active', 'fail', 'success');
CREATE TABLE jobs (
    id bigserial,
    name varchar(255),
    payload jsonb,
    state  job_state
);
INSERT INTO jobs (name, payload, state) 
VALUES ('do it', {"foo": "bar"}, 'pending')

Creating/Updating Table Schema

Unlike MySQL, Postgres can wrap in transactions not only data modifications but the table structure operations too. And it is useful for migrations.
Postgres can't change column positions. ... ADD COLUMN * AFTER, ... ADD COLUMN * BEFORE are not supported
There are differences for column change operations In MySql, it can be used with one query:

ALTER TABLE jobs MODIFY name varchar(100) NULL DEFAULT 'cron_job' COMMENT 'job name';

Same for PostgreSQL will be like:

ALTER TABLE jobs ALTER COLUMN name TYPE varchar(100);
ALTER TABLE jobs ALTER COLUMN name DROP NOT NULL; 
ALTER TABLE jobs ALTER COLUMN name SET DEFAULT 'cron_jobs';
COMMENT ON COLUMN "jobs"."name" IS 'job name';

MySQL doesn't support default values for columns with types BLOB, TEXT, JSON, GEOMETRY. Postgres can set it.

Generated Columns

MySQL supports generated columns, that calculate self-value based on values of other columns automatically. It looks like

ALTER TABLE my_box ADD COLUMN volume integer as (width * height * len);

PostgreSQL with version < 12 doesn't support it. But since 12 it can same with syntax like:

ALTER TABLE my_box ADD COLUMN volume integer GENERATED ALWAYS AS (width * height * len) STORED;

Autoincrement Index

Unlike MySQL, Postgres allows us to has more than one autoincremented column (and even with different increment settings). It called sequences.
How to get last inserted id?

You can create the insert query with an option for return id.

INSERT INTO users (name, age) VALUES ('Joe', 20) RETURNING id;

You can use functions currval() for last inserted id and nextval() for next id.

SELECT currval(pg_get_serial_sequence('users', 'id'));
SELECT nextval(pg_get_serial_sequence('users', 'id'));
-- pg_get_serial_sequence is a function for return sequence name by table name and index column name. In most cases it called like tablename_columnname_id_seq.

How to change the next increment value?

(It could become a necessary operation if you fill table ids manually)

SELECT setval(
pg_get_serial_sequence('users', 'id'),
 (SELECT MAX(id) FROM users)
);

The TRUNCATE operation doesn't flush sequence by default. It could create unwanted effects. Use the truncate command with an explicit option

TRUNCATE TABLE table_name RESTART IDENTITY;

Full-text Search

If you use full-text search functional with MySQL, you should be ready for serious work to migrate, because Postgres analog is really different.
I just publish a link to the official documentation.

How to move the data?

PgChameleon https://pgchameleon.org/documents/
PgLoader https://pgloader.io/ and other converters
via framework and ORM with both db connections. Read from one, modify and insert into another.

So, seems that is all, that I remember from my experience. If you also have an experience in migration applications from one DB system to another - welcome to comments.

DEV Community