Comparing Data Handling in MySQL and PostgreSQL: the RETURNING clause and CTE's

As a follow-up to my last post titled "Comparing Data in MySQL and PostgreSQL", I'd like to discuss some differences between these two databases when it comes to actually working with data. And like my previous post, many of these observations were enlightening as to the benefits PostgreSQL can offer.

One major inconvenience of working with MySQL is the need to run an additional query in order to get a row that was just inserted. At best, MySQL only returns the id of the most recently inserted row, but no additional columns (the id is critical since it was auto-incremented and the user would presumably not know what it is). If using MySQL in node.js, for example, an INSERT query returns a json message such as this, where the id of the inserted row is 6:

{
    "fieldCount": 0,
    "affectedRows": 1,
    "insertId": 6,
    "serverStatus": 3,
    "warningCount": 0,
    "message": "",
    "protocol41": true,
    "changedRows": 0
}

In command-line MySQL, however, one would have to use the LAST_INSERT_ID() function to run a query to get said id:

mysql> SELECT LAST_INSERT_ID();
       +------------------+
       | LAST_INSERT_ID() |
       +------------------+
       |                6 |
       +------------------+
       1 row in set (0.03 sec)

...and then you would still have to perform another query to get any other columns from the row.

PostgreSQL, however, has the unbelievably sensible RETURNING clause, which can't be lived without after knowing about it (guilty). RETURNING is essentially a bridge from an INSERT query directly into a SELECT query in which you decide what parts of the inserted row are returned and how. Here's an example in a PostgreSQL database sports, where a sport is inserted into a table sports, and its id is returned, along with the name (aliased as "sportName"), but NOT the season:

sports=# INSERT INTO sports(name, season)
         VALUES('football', 'fall')
         RETURNING sports.id, sports.name AS "sportName";

         id | sportName 
         ----+-----------
           3 | football
         (1 row)

         INSERT 0 1

RETURNING is a not-so-subtle difference in functionality between these two databases that's hard to ignore. But its functionality has more ramifications when considering Common Table Expressions. CTE's are essentially a means of aliasing subqueries that present the user with a very readable code block. Both MySQL and Postgres have CTE's, but when combined with Postgres's RETURNING clause, a CTE can accomplish streamlined sequential insertions without the need for multiple transactions. In the example below, a CTE is used to get the id of the sport we added previously, then a new league 'NFL' is inserted with the id of the selected sport as foreign key id_sport, and once the league exists, a team is inserted using the league's id and sport's id.

sports=# WITH sport AS (
           SELECT id
           FROM sports
           WHERE name = 'football'
         ), league AS (
           INSERT INTO leagues (name, id_sport, level)
           VALUES ('NFL', (SELECT sport.id FROM sport), 'professional')
           RETURNING id, id_sport
         )
         INSERT INTO teams (designation, nickname, id_sport, id_league)
         SELECT 'New Orleans', 'Saints', sport.id, league.id
         FROM sport
         JOIN league ON TRUE
         RETURNING * \gx

         -[ RECORD 1 ]------------
         id          | 7
         designation | New Orleans
         nickname    | Saints
         id_sport    | 3
         id_league   | 2

         INSERT 0 1

Accomplishing the same thing in MySQL might look something like the following example, where the league is inserted, then in a new query, the id of the just-inserted league is selected to use as the foreign key to insert the team. And if any information beyond the id is needed for the team, another query will certainly have to be made of course.

mysql> INSERT INTO leagues (name, id_sport, level)
    -> VALUES ('NFL', (
    ->     SELECT sports.id
    ->     FROM sports
    ->     WHERE name = 'football'
    ->   ), 'professional');
       Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO teams (designation, nickname, id_sport, id_league)
    -> SELECT 'New Orleans', 'Saints', (
    ->     SELECT sports.id
    ->     FROM sports
    ->     WHERE name = 'football'
    ->   ), (
    ->     SELECT LAST_INSERT_ID()
    ->   );
       Query OK, 1 row affected (0.00 sec)
       Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from teams \G
       *************************** 1. row ***************************
                id: 7
       designation: New Orleans
          nickname: Saints
          id_sport: 3
         id_league: 2
       1 row in set (0.00 sec)

The difference here may seem trivial, but Postgres's singular query capability could have a significant impact when writing queries away from the command line and within an application. This is, in fact, something I personally have been inconvenienced by when using MySQL before I knew any different. It wasn't until writing these comparisons that I came to realize some of the things I thought a database should logically be able to do were, indeed, possible—with another database. This is not to say that MySQL is without merit, of course (it is the most popular open-source database after all—over twice as popular as Postgres, in fact), but that sometimes, branching out into unfamiliar territory can produce solutions you may not have known existed. Fittingly, other database users appear to be making some of the same conclusions, as PostgreSQL and others continue to make gains in evening out the playing field of relational databases.
source: db-engines.com/

Top comments (5)

Andrei Gatej • Sep 22 '20

Great article, thank you for sharing?

I’d be curious to know, what’s the difference between the first pg snippet and doing the same thing “imperatively” in node?(e.g make a req to insert first, then get the id, the make the second request).

Is it that with the first approach, everything is happening under the same TCP connection(or the same transaction)?

Grant • Sep 23 '20 • Edited

Hi Andrei!

In the first PG snippet, since the row/id can be returned automatically, it eliminates a step. For example, in node, you could save the result of that first INSERT (the id) to a variable, then use the variable in another query. Without this, you would have to do three steps like you mentioned: INSERT, SELECT, and whatever else. Either way it would be done in one connection, but it's shorter with PG, and it can really help if you want more columns than just the id. It can also be helpful if you do an UPDATE, which is another situation where you might not really be sure what happened, like the auto-incrementing id. Does that answer your question?

Andrei Gatej • Sep 23 '20

Yes, thank you very much. So with both approaches, only one connection is made? I find it a bit strange, since with the “node” approach we have multiple requests and with the other one we have only one big query. Or is the same connection used among all the requests?

Grant • Sep 23 '20

Yes, it would be one connection if I understand what you are asking, though you would presumably do multiple steps within a transaction, so that you have better control of how the overall operation behaves (for ACID compliance).