PostgreSQL offers several types of tables, each designed for specific tasks. The most common and well-known is the heap table; I wrote about its structure in another article. A standard table lets you store rows, update data, and run both OLAP and OLTP queries.
However, a number of table types are often simply forgotten. In my opinion, the most interesting of them are unlogged and temporary tables. In this article we'll look at them and compare them with logged tables.
Logged Tables in PostgreSQL
In PostgreSQL, a logged table is a type of table for which all changes to the data are recorded in the write-ahead log, which is used for crash recovery and replication.
Let's see what a log is in a relational database, by analogy with the real world. You get a key at the reception desk and sign the visitors' log. The signature is your ID, and the timestamp you enter shows when you took the key. Your logbook entry is a record, made ahead of time, of what you intend to do with the key you received: for example, take it from the stand and go open the office.
When we start a transaction, PostgreSQL saves events to log files. In PostgreSQL these are the write-ahead log files, or WAL files: the change you want to make is first written to the WAL, and only then is it applied to the table itself, in memory or on disk.
Log files are needed to uphold the ACID guarantees of a relational database. Let me remind you that ACID is a set of relational database properties that guarantee, among other things, the reliability of a transaction:
- Atomicity.
- Consistency.
- Isolation.
- Durability.
These four letters also have to do with how a relational database can recover from errors.
The log files make data recovery possible if the server or a process fails for some reason. When the process comes back up, the relational database restores itself by reading the log files. To do this, Postgres uses an algorithm from the ARIES family (Algorithms for Recovery and Isolation Exploiting Semantics), which:
- Generates undo and redo lists.
- Replays transactions from the redo list, taking into account whether or not a checkpoint had completed.
- Rolls back transactions from the undo list to preserve data consistency.
Another reason for using log files is data replication from the primary server to standby servers. With the help of the write-ahead log files, we can replay on the replicas the events that occurred on the primary database server.
Moreover, log files can serve as a source of change events for reproducing data changes in a corporate data warehouse, for example with the Debezium tool.
Let's look at a simple example:
BEGIN;
INSERT INTO test.log VALUES (1, 'Peter');
COMMIT; -- or ROLLBACK;
I start the transaction with BEGIN, then insert a test row, with Peter as the name and 1 as the identifier, into the pre-created logged table test.log. After that, I issue either COMMIT or ROLLBACK.
As a result, our simple transaction records its every step at the log file level:
Unlogged Tables in PostgreSQL
An unlogged table is a type of table for which changes are not recorded in the write-ahead log. So, let's consider the same test case with an unlogged table to see the changes.
BEGIN;
INSERT INTO test.unlog VALUES (1, 'Peter');
COMMIT; -- or ROLLBACK;
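For reference, the unlogged table was created in advance with the UNLOGGED keyword; this is the only syntactic difference from a regular table at creation time:
-- The UNLOGGED keyword is the only change versus a regular table.
CREATE UNLOGGED TABLE test.unlog
(id INTEGER,
name VARCHAR);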
Unlike the previous example, data changes made by INSERT, UPDATE, or DELETE operations don't get into the log files. They go directly to the data storage segment that corresponds to our table:
To better understand the difference between logged and unlogged tables, let's move on to a more illustrative example.
Logged table. I create a standard table, which is logged by default:
CREATE TABLE test.log
(id INTEGER,
name VARCHAR);
BEGIN;
SELECT txid_current();
SELECT pg_current_wal_lsn();
-- 1/AC0AFC8
SELECT pg_walfile_name(pg_current_wal_lsn());
-- 00000001000000010000000A
INSERT INTO test.log VALUES (1,'test1');
INSERT INTO test.log VALUES (2,'test2');
INSERT INTO test.log VALUES (3,'test3');
COMMIT;
The first function, txid_current, returns the transaction number. Next, I get the current LSN (in the code comment, I noted the value returned by pg_current_wal_lsn). I then pass this LSN to the pg_walfile_name function, which returns the name of the WAL file that corresponds to this LSN record.
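As a side note, one way to see how much WAL a transaction generated is to diff two LSNs. A minimal sketch, assuming the LSN captured at the start of the session above:
-- Bytes of WAL written between the captured LSN and now.
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '1/AC0AFC8');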
To understand how to interpret this, let's go back to the analogy with the key and the log at the reception desk. I made an entry in the log that it was I who took the key; that entry is nothing other than the LSN. If many people take keys often, the log (like a notebook) fills up sooner or later, and you need to start a new one. In the world of logged tables, a new log is a new WAL file; it appears when the current one is filled up. By default, a WAL segment in PostgreSQL is 16 MB; the size is fixed when the cluster is initialized (the initdb --wal-segsize option), not in postgresql.conf.
Then in the test I perform simple inserts into the table, one after another, and at the end I commit the data with COMMIT. Let's see what happened as a result:
Using the pg_waldump utility, which ships with the PostgreSQL installation packages, we can extract the records of our session and our transaction from the WAL file. The utility is driven by command-line options: I specify the LSN from which to start dumping the binary file and the name of the WAL file I received as part of my session.
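A sketch of the invocation, using the LSN and WAL file name from the session above (run it from the pg_wal directory, or point at it with the -p option; paths may differ on your installation):
pg_waldump --start=1/AC0AFC8 00000001000000010000000A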
We then find that the WAL file for our logged table contains the INSERT commands of our transaction, in order. There are three of them here:
- INSERT+INIT.
- INSERT with offset 2.
- INSERT with offset 3.
At the end we find the COMMIT record, which commits the data. The data is logged and then becomes visible to all users.
Unlogged tables. Now let's see how an unlogged table behaves. The test is the same, except that beforehand I created the table with the UNLOGGED option, which tells the database that the table isn't logged.
CREATE UNLOGGED TABLE test.unlog
(id INTEGER,
name VARCHAR);
BEGIN;
SELECT txid_current();
SELECT pg_current_wal_lsn();
-- 1/AC0B0E0
SELECT pg_walfile_name(pg_current_wal_lsn());
-- 00000001000000010000000A
INSERT INTO test.unlog VALUES (1,'test1');
INSERT INTO test.unlog VALUES (2,'test2');
INSERT INTO test.unlog VALUES (3,'test3');
COMMIT;
If we run the same test, dump the WAL, and read it with the pg_waldump utility, we'll see only the COMMIT record. There are no inserts in the results:
The indicator in the figure above shows that there are no corresponding events.
An unlogged table is a data set that isn't covered by the WAL.
When Unlogged Tables are Suitable
Unlogged tables are useful when the data in them isn't critical, but the filling speed is.
The simplest example is OLAP queries that build data marts. If the marts are filled by a repeatable process, you don't need to fear data loss: the same ETL (sometimes ELT) process that transforms the data at the SQL level (or outside SQL) can simply be restarted to produce the data again.
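As an illustration, a rebuildable staging table for a mart might look like this (the staging.daily_sales and source.sales names are hypothetical):
-- Hypothetical unlogged staging table: fast to fill, safe to lose.
CREATE UNLOGGED TABLE staging.daily_sales
(sale_date DATE,
amount NUMERIC);
-- The ETL process can always repopulate it from the source after a crash.
TRUNCATE staging.daily_sales;
INSERT INTO staging.daily_sales
SELECT sale_date, amount FROM source.sales;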
Headers of Logged and Unlogged Tables
To continue the comparison, let's look at the headers of our two tables. For this I use the pageinspect extension, which provides functions for inspecting the pages of a specific table. Let's look at the result of the page_header function, which returns the header of page number 0 for both tables.
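A minimal sketch of that comparison (pageinspect has to be installed by a superuser; lsn is the field of interest):
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- Header of page 0 for each table.
SELECT lsn, lower, upper FROM page_header(get_raw_page('test.log', 0));
SELECT lsn, lower, upper FROM page_header(get_raw_page('test.unlog', 0));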
In a side-by-side comparison, the table headers are exactly the same, because the data in them is identical. The only difference is the first column, which stores the LSN: the logged table has a value there, while the unlogged table simply has none. This means that the unlogged table is represented by data in its own data file, but not in the log files.
If the PostgreSQL process crashes for some reason, or the power goes out, that's an abnormal shutdown of your PostgreSQL instance. In this case, when the system is restored, PostgreSQL applies the TRUNCATE command to the unlogged table. This command deletes all rows from the table, which means your data is lost. Note that after a "correct" stop of the PostgreSQL server, for example via systemctl stop postgres, the data in unlogged tables is preserved for the next start.
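You can also see how a table is classified in the system catalog: the relpersistence column in pg_class is 'p' for a permanent (logged) table, 'u' for unlogged, and 't' for temporary:
SELECT relname, relpersistence
FROM pg_class
WHERE relname IN ('log', 'unlog');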
So, with unlogged tables, you need to keep the following in mind:
- There is a risk of data loss.
- High speed of filling / changing data.
How to Convert a Logged Table into an Unlogged One or Vice Versa
You can switch a table between the logged and unlogged states with the following commands:
ALTER TABLE test.t1 SET LOGGED;
ALTER TABLE test.t1 SET UNLOGGED;
It's important to know that when you convert an unlogged table into a logged one, the change temporarily blocks SQL traffic that uses the table in CRUD operations. These are all the operations available to a user in a table: INSERT, UPDATE, DELETE, SELECT.
User sessions that try to access the table will wait until the ALTER TABLE command completes; that is, you'll have sessions waiting on the backend and/or frontend. Of course, if you've configured a timeout for the session, it will wait only for the set time.
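One way to bound that wait is to cap the lock acquisition time before running the conversion, so the command fails fast instead of queueing behind long transactions. A minimal sketch:
-- Give up on the conversion if the lock isn't acquired within 5 seconds.
SET lock_timeout = '5s';
ALTER TABLE test.t1 SET LOGGED;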
When you convert a table to a logged one, all of its contents are written to the log files. This is necessary so that the WAL holds all the data in the event of a failure. If the table is large, filling the WAL files will take much longer than a couple of seconds.
The reverse SET UNLOGGED operation, which converts a table to an unlogged one, is fast. It simply changes the status at the PostgreSQL metadata level: the table is no longer logged and no longer needs to write to the WAL files. The SET UNLOGGED command still takes a short lock at the metadata level, so you need a small window in CRUD operations for the change to apply.
Pros and Cons of Unlogged Tables
Let's summarize unlogged tables in PostgreSQL.
Pros:
- High speed for UPDATE, DELETE, and INSERT operations. In a test on my PostgreSQL server, the TPS (transactions per second) for the unlogged table was 9 times higher than for the logged one.
- Any indexes and TOAST tables will automatically be unlogged, since these are dependent objects that "inherit" the properties of the main object.
Let me remind you that TOAST tables are satellite tables. They help store long values by splitting them into chunks in an additional table that backs the original one. I talked about this in more detail in another article.
Cons:
- Automatic clearing of data in the table after a crash.
- The table content is not replicated to standby servers, since there are simply no events for its data in the primary server's WAL files.
Temporary Tables in PostgreSQL
Let us turn our attention to temporary tables. Temporary tables in PostgreSQL are a type of table that exists only for the duration of a database session or transaction.
CREATE TEMPORARY TABLE tmp
(id INTEGER,
name VARCHAR)
ON COMMIT DELETE ROWS;
-- or: ON COMMIT PRESERVE ROWS
-- or: ON COMMIT DROP
When creating a temporary table, you can't explicitly specify a schema for it. All temporary tables in PostgreSQL land in a dedicated schema named pg_temp_xxx, where xxx is the backend ID of your session (for example, pg_temp_8 or pg_temp_165). PostgreSQL creates the schema for the session automatically.
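You can ask PostgreSQL which temporary schema your session was assigned:
-- Returns the session's temporary schema (it appears once the first
-- temporary table is created).
SELECT pg_my_temp_schema()::regnamespace;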
The creation syntax includes options that define the life cycle of a temporary table:
- ON COMMIT PRESERVE ROWS is the default. If you COMMIT or ROLLBACK within transactions, the rows in the table are preserved until you end the session; only then is the table dropped automatically.
- ON COMMIT DROP means that on COMMIT or ROLLBACK of the transaction, the table is immediately and automatically dropped from your session.
- ON COMMIT DELETE ROWS means that the structure survives a COMMIT or ROLLBACK, but the rows are deleted (see the sketch after this list). In any case, when the session is closed, the table is dropped automatically.
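A quick way to see the ON COMMIT DELETE ROWS behavior from a psql session:
CREATE TEMPORARY TABLE tmp_demo (id INTEGER) ON COMMIT DELETE ROWS;
BEGIN;
INSERT INTO tmp_demo VALUES (1);
SELECT count(*) FROM tmp_demo;  -- returns 1 inside the transaction
COMMIT;
SELECT count(*) FROM tmp_demo;  -- returns 0: the rows are gone, the table remains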
The life cycle of a temporary table looks schematically as shown in the figure below:
When Temporary Tables are Suitable and When Not
Temporary tables are suitable as a temporary buffer for storing your data set during a mart rebuild. They are convenient to use for analytical traffic.
I've encountered temporary tables in situations where a mart that gathers a large volume of data is built as a monolithic SQL query, and that query can run for several hours. To optimize such a query, you can split it into SQL parts that prepare the data step by step and store the intermediate results in temporary tables.
You can temporarily store data within your entire pipeline (a series of processes), filling a temporary table during data retrieval and transformation. At the end of the process, you can use the temporary table, for example, in JOIN operations with permanent tables. After the data is committed and you exit the session, all temporary tables are dropped automatically. The next time the process runs, they're created afresh, and the cycle repeats.
A temporary table can also have indexes. You can create a temporary structure, load data into it, build an index on it, and then query the temporary table with the benefit of that index.
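A sketch of that pattern; the orders and customers tables here are hypothetical:
-- Hypothetical intermediate step of a long mart-building query.
CREATE TEMPORARY TABLE tmp_orders AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;
CREATE INDEX ON tmp_orders (customer_id);
ANALYZE tmp_orders;  -- give the planner statistics on the new table
SELECT c.name, t.total
FROM customers c
JOIN tmp_orders t ON t.customer_id = c.customer_id;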
I don't recommend using temporary tables on the backend as part of OLTP traffic. Examples of such traffic are concert ticket purchases or bank transactions: fast business transactions that have to take your data, save it, and return a response.
A large, uncontrolled flow of temporary-table creation, and hence of temporary schemas, interferes with the optimizer and the process of rebuilding statistics. You have to be careful with this.
Tests with Examples of Creating a Temporary Table
Let's look at the following test with the creation of a temporary table inside a transaction:
BEGIN;
SELECT txid_current();
SELECT pg_current_wal_lsn();
-- 1/AC54128
SELECT pg_walfile_name(pg_current_wal_lsn());
-- 00000001000000010000000A
CREATE TEMPORARY TABLE temp
(id INTEGER,
name VARCHAR)
ON COMMIT PRESERVE ROWS;
COMMIT;
-- close connection
Before creating the table itself, I get the same information as in the previous example with the logged table: the LSN and the name of the current WAL file that corresponds to this LSN record.
Let's see what's in the log file:
Creating the temporary table produced 120 events. That is, the creation itself initiates writes to the log file: the system catalog changes it makes are logged. And as soon as we close the connection to the database, 40 more events appear, recording that the table has been dropped.
Now we'll preliminarily create a temporary table in the session:
CREATE TEMPORARY TABLE temp
(id INTEGER,
name VARCHAR)
ON COMMIT PRESERVE ROWS;
BEGIN;
SELECT txid_current();
SELECT pg_current_wal_lsn();
-- 1/AC90368
SELECT pg_walfile_name(pg_current_wal_lsn());
-- 00000001000000010000000A
INSERT INTO temp VALUES (1,'test1');
INSERT INTO temp VALUES (2,'test2');
INSERT INTO temp VALUES (3,'test3');
COMMIT;
With this test, the log file shows the same picture as for the unlogged table:
Headers of Logged and Temporary Tables
If we compare the headers of page 0 of a regular logged table and a temporary table, the picture is exactly the same as with the unlogged table:
Everything matches except the LSN. For a temporary table it's empty, and because of that the table's contents aren't shipped to a standby: if there's no record in the WAL files, physical replication has nothing to send to the standby servers.
Pros and Cons of Temporary Tables
Pros:
- Acceleration for UPDATE, DELETE, INSERT operations.
- Any indexes and TOAST tables will automatically be temporary.
- You don't have to define a database schema for temporary tables: it's created automatically, and your temporary table is placed in it automatically, too.
Cons:
- Table content is not physically replicated to StandBy servers.
- It's necessary to create a temporary table each time for a new session.
In the Next Article
We've looked at the difference between logged and unlogged tables, and also talked about cases where temporary tables can be useful. In the following material we'll analyze:
- Clustered tables.
- Foreign tables.
- Partitioned tables.
- Inherited tables.