Quadcode team for Quadcode

Posted on Dec 21, 2022 • Edited on Dec 27, 2022 • Originally published at Medium

TOAST tables in PostgreSQL

#postgres #beginners #database

In the previous article we learned about the structure of a Heap table and a page in PostgreSQL. In this part we'll deal with TOAST tables, also known as satellite tables. They help to split the data in terms of the length of rows.

TOAST tables

TOAST means The Oversized-Attribure Storage Technique. A TOAST table is a regular Heap table, and it essentially inherits the properties you specify on the original table.

SELECT relname, relpages, oid
  FROM pg_class,
             (SELECT reltoastrelid
                 FROM pg_class
              WHERE relname = 't') AS ss
 WHERE oid = ss.reltoastrelid OR
               oid = (SELECT indexrelid
                            FROM pg_index
                         WHERE indrelid = ss.reltoastrelid)

Let's go back to our table t from the previous article:

CREATE TABLE public.t
(
 A INTEGER,
 B INTEGER,
 C VARCHAR
);

BEGIN;
INSERT INTO t (A, B, C) VALUES (1, 1, 'String #1');
COMMIT;

On the left we see a link to our table 16559—this is the identifier of the table stored in the meta layer:

If we rewrite the query WHERE class.oid = 16559, specifying the identifier, we'll see our TOAST table with the name "pg_toast_16556". That is, the pattern of its creation is pg_toast plus a suffix defining the OID of the parent table.

It's worth noting that the TOAST table doesn't contain a link to another TOAST. That is, there's not an infinite number of toasts; there's the first level and that's it.

To understand what a TOAST table is, we'll conduct a test. I set the string size to more than 2 kilobytes and insert it into my custom table t. I have generated a number here, and this string turned out to be very long:

Let's see what happens. If we go back and look at page #0 of my original table, we'll see that the page is practically empty. Where did the 2 kB data go?

If I make a direct query to my TOAST table, it turns out that the data are lying here:

SELECT ctid, * 
FROM pg_toast.pg_toast_16556

They're divided into chunks, and each chunk contains its own part of the metadata. They're essentially separate. What does this mean? If you have long strings, then by making a query SELECT ctid, * FROM table, you force Postgres to access the TOAST table, read these chunks, glue them together for you, and return this information outward.

My personal opinion: if you do SELECT ctid with an asterisk, beware of the asterisk and specify which columns you want. Most likely, in most cases you won't need long strings. But when long strings are needed, then Postgres works like this: it'll consolidate your data and store it in a separate structure that helps the original table function.

If you look at the TOAST table from the point of view of metadata, using the pageinspect extension, then absolutely it has a sequence number, lower- and upper-case, offsets. That is, everything is the same as in a regular Heap table.

SELECT *
FROM page_header(get_raw_page('pg_toast.pg_toast_16556',0));

TOAST strategies

The default Postgres strategy is extended. It says that you have a TOAST table and you also archive data. Postgres tries to compress this data inside the page.

The strategy can be changed. In addition to extended there are:

Plain—prevents compression and out-of-line storage.
External—allows out-of-line storage but prevents compression.
Main—allows out-of-line storage if will not be choice and allows compression.

They're applied depending on the combinations of parameters that you want to achieve. For example, do you need compression or not, do you need to store it in a TOAST table or not?

For example, there's a table t. I query the metadata and get a history of the attributes that saturate my table:

SELECT attr.attname, 
              t.typname, 
              CASE
                WHEN attstorage = 'p' THEN 'plain'
                WHEN attstorage = 'x' THEN 'extended'
                WHEN attstorage = 'e' THEN 'external'
                WHEN attstorage = 'm' THEN 'main'
              END AS attstorage
FROM pg_attribute attr INNER JOIN 
            pg_type t ON t.OID = attr.atttypid
WHERE attrelid =16556
ORDER BY attr.attnum

In pink, I highlighted the system attributes that are mapped in my table. The green ones are the ones I created myself, custom ones. My columns are a, b, c.

As you can see in the last column of the table in the figure 7, this is basically a plain strategy specifically for types not from the string family (plus they must be of fixed length).

The C attribute is VARCHAR type; it has extended set forth by default. This means that the strategy allows compression and also storage in TOAST. Roughly speaking, by writing a SELECT * query, you get a small negative, since you need to go to TOAST, and glue the lines that need to be unzipped beforehand.

Different combinations of strategies can, possibly, improve your model. Why possibly? Because you need to check benchmarks and do load testing on your database and a specific model.

If you use an external strategy, which means that you can store it on a TOAST table, but there won't be compression, then the downside here is that the disks will be eaten up. But in the modern world, disks are cheap, everything should be fast, optimal, and responses should be received within milliseconds. You can start your research with an external strategy.

How can the strategy be changed? The command is usually:

ALTER TABLE public.t 
ALTER COLUMN c 
SET STORAGE PLAIN;

TOAST types are the usual types of working with strings, text formats, CHAR and VARCHAR. Perhaps you'll create your own custom types, which will be a so-called wrapper of the system type.

When choosing a particular strategy, it's necessary to test it using a benchmark, TPS metrics (transaction per second), latency, and only after that draw conclusions about whether to switch the strategy or not.

Book recommendations

Finally, I want to recommend interesting books from the point of view of what's inside a database. Not only Postgres is here, but other things you'll find useful:

"Database Internals: A Deep Dive into How Distributed Data Systems Work" — Alex Petrov.
"Readings in Database Systems" — Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker.
"PostgreSQL Notes for Professionals", a PDF course compiled from Stack Overflow Documentation.
"Understanding EXPLAIN".

DEV Community

TOAST tables in PostgreSQL

TOAST tables

TOAST strategies

Book recommendations

Top comments (0)

Read next

Aurora Limitless - Resharding

Removing code smells: Using dependency injection through Props in React

Day 9: Terminal Forms 📇

New to Dev.to. What do you usually do here?