In the previous article we learned about the structure of a Heap table and a page in PostgreSQL. In this part we'll deal with TOAST tables, also known as satellite tables. They help to split the data in terms of the length of rows.
TOAST tables
TOAST means The Oversized-Attribure Storage Technique. A TOAST table is a regular Heap table, and it essentially inherits the properties you specify on the original table.
SELECT relname, relpages, oid
FROM pg_class,
(SELECT reltoastrelid
FROM pg_class
WHERE relname = 't') AS ss
WHERE oid = ss.reltoastrelid OR
oid = (SELECT indexrelid
FROM pg_index
WHERE indrelid = ss.reltoastrelid)
Let's go back to our table t
from the previous article:
CREATE TABLE public.t
(
A INTEGER,
B INTEGER,
C VARCHAR
);
BEGIN;
INSERT INTO t (A, B, C) VALUES (1, 1, 'String #1');
COMMIT;
On the left we see a link to our table 16559—this is the identifier of the table stored in the meta layer:
If we rewrite the query WHERE class.oid = 16559
, specifying the identifier, we'll see our TOAST table with the name "pg_toast_16556". That is, the pattern of its creation is pg_toast
plus a suffix defining the OID of the parent table.
It's worth noting that the TOAST table doesn't contain a link to another TOAST. That is, there's not an infinite number of toasts; there's the first level and that's it.
To understand what a TOAST table is, we'll conduct a test. I set the string size to more than 2 kilobytes and insert it into my custom table t. I have generated a number here, and this string turned out to be very long:
Let's see what happens. If we go back and look at page #0 of my original table, we'll see that the page is practically empty. Where did the 2 kB data go?
If I make a direct query to my TOAST table, it turns out that the data are lying here:
SELECT ctid, *
FROM pg_toast.pg_toast_16556
They're divided into chunks, and each chunk contains its own part of the metadata. They're essentially separate. What does this mean? If you have long strings, then by making a query SELECT ctid, * FROM table
, you force Postgres to access the TOAST table, read these chunks, glue them together for you, and return this information outward.
My personal opinion: if you do SELECT ctid with an asterisk, beware of the asterisk and specify which columns you want. Most likely, in most cases you won't need long strings. But when long strings are needed, then Postgres works like this: it'll consolidate your data and store it in a separate structure that helps the original table function.
If you look at the TOAST table from the point of view of metadata, using the pageinspect
extension, then absolutely it has a sequence number, lower- and upper-case, offsets. That is, everything is the same as in a regular Heap table.
SELECT *
FROM page_header(get_raw_page('pg_toast.pg_toast_16556',0));
TOAST strategies
The default Postgres strategy is extended. It says that you have a TOAST table and you also archive data. Postgres tries to compress this data inside the page.
The strategy can be changed. In addition to extended there are:
- Plain—prevents compression and out-of-line storage.
- External—allows out-of-line storage but prevents compression.
- Main—allows out-of-line storage if will not be choice and allows compression.
They're applied depending on the combinations of parameters that you want to achieve. For example, do you need compression or not, do you need to store it in a TOAST table or not?
For example, there's a table t
. I query the metadata and get a history of the attributes that saturate my table:
SELECT attr.attname,
t.typname,
CASE
WHEN attstorage = 'p' THEN 'plain'
WHEN attstorage = 'x' THEN 'extended'
WHEN attstorage = 'e' THEN 'external'
WHEN attstorage = 'm' THEN 'main'
END AS attstorage
FROM pg_attribute attr INNER JOIN
pg_type t ON t.OID = attr.atttypid
WHERE attrelid =16556
ORDER BY attr.attnum
In pink, I highlighted the system attributes that are mapped in my table. The green ones are the ones I created myself, custom ones. My columns are a, b, c.
As you can see in the last column of the table in the figure 7, this is basically a plain strategy specifically for types not from the string family (plus they must be of fixed length).
The C attribute is VARCHAR type; it has extended set forth by default. This means that the strategy allows compression and also storage in TOAST. Roughly speaking, by writing a SELECT * query
, you get a small negative, since you need to go to TOAST, and glue the lines that need to be unzipped beforehand.
Different combinations of strategies can, possibly, improve your model. Why possibly? Because you need to check benchmarks and do load testing on your database and a specific model.
If you use an external strategy, which means that you can store it on a TOAST table, but there won't be compression, then the downside here is that the disks will be eaten up. But in the modern world, disks are cheap, everything should be fast, optimal, and responses should be received within milliseconds. You can start your research with an external strategy.
How can the strategy be changed? The command is usually:
ALTER TABLE public.t
ALTER COLUMN c
SET STORAGE PLAIN;
TOAST types are the usual types of working with strings, text formats, CHAR and VARCHAR. Perhaps you'll create your own custom types, which will be a so-called wrapper of the system type.
When choosing a particular strategy, it's necessary to test it using a benchmark, TPS metrics (transaction per second), latency, and only after that draw conclusions about whether to switch the strategy or not.
Book recommendations
Finally, I want to recommend interesting books from the point of view of what's inside a database. Not only Postgres is here, but other things you'll find useful:
- "Database Internals: A Deep Dive into How Distributed Data Systems Work" — Alex Petrov.
- "Readings in Database Systems" — Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker.
- "PostgreSQL Notes for Professionals", a PDF course compiled from Stack Overflow Documentation.
- "Understanding EXPLAIN".
Top comments (0)