In an ideal world, you'd store files separately from your database. Keep the DB lean, stick files in object storage or a file server, and call it a day. But reality? Not so simple.
Some developers find it easier to work with files directly in the database. Others need to meet strict compliance requirements around storing original documents and digital signatures. And some teams just don't want the hassle of maintaining yet another storage system. So despite best practices, many teams end up storing files in their DBMS, but often without fully understanding the implications.
File storage options in PostgreSQL
PostgreSQL gives you a few ways to store files:
- regular tables;
- pg_largeobject — a catalog for large objects.
Let's dig into each.
Regular tables
Standard PostgreSQL tables can handle the text, bytea, and jsonb data types. There's also a basic file-access API with functions like:
- pg_ls_dir — list files and directories;
- pg_stat_file — get file metadata (size, modification time, permissions, etc.);
- pg_read_file — read a text file and return its contents as a string;
- pg_read_binary_file — read a file (including binary data) and return its contents as bytea.
Note: The full list is in the "Generic File Access Functions" section of the PostgreSQL documentation, but these functions aren't meant for user data management (a quick example follows).
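Still, a quick look at them is useful for what comes next. The path below is just an illustration, and by default only superusers or specially granted roles may call these functions:
select pg_ls_dir('/tmp');                                        -- list directory contents
select size, modification from pg_stat_file('/tmp/report.csv');  -- file metadata
select pg_read_file('/tmp/report.csv', 0, 200);                  -- first 200 bytes as text
select pg_read_binary_file('/tmp/report.csv');                   -- whole file as bytea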
That said, regular tables have two big limitations when it comes to files.
First limitation: the 1 GB wall
By default, a single field in a regular table can't hold more than 1 GB of data. Try to insert something bigger, and you'll hit an error:
create table user_data (
id bigint,
data_bytea bytea
);
# insert into user_data (id, data_bytea)
values (8, pg_read_binary_file('/tmp/file.1gb.bytea'));
ERROR: file length too large
Time: 669.132 ms
# select pg_read_binary_file('/tmp/file.1gb.bytea');
ERROR: file length too large
Time: 666.684 ms
You can work around this by chunking your data. Split a 5 GB file into 100 MB pieces and store each as a separate row. This bypasses the field size limit, but it complicates your application architecture — you'll need extra tables to track and reassemble the fragments.
-- First 100 MB chunk (offset 0)
insert into user_data (id, data_bytea)
values (81, pg_read_binary_file('/tmp/file.1gb.bytea', 0, 1024*1024*100));
-- Second chunk starts where the first one ended (offset 100 MB)
insert into user_data (id, data_bytea)
values (82, pg_read_binary_file('/tmp/file.1gb.bytea', 1024*1024*100, 1024*1024*100));
-- Third chunk (offset 200 MB), and so on to the end of the file
insert into user_data (id, data_bytea)
values (83, pg_read_binary_file('/tmp/file.1gb.bytea', 2*1024*1024*100, 1024*1024*100));
Second limitation: inconsistent insert performance
The other issue? Insert speeds can become wildly unpredictable. Your storage might be humming along fine, transactions cruising through, and then suddenly — parallel inserts start crawling or hanging completely.
Let's simulate this: fire off multiple inserts of heavy data.
insert into user_data values (777, convert_to(repeat('b', 10000), 'UTF8'));
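To generate real contention, put that statement into a script and run it from many sessions at once, for example with pgbench -n -c 16 -T 60 -f heavy_insert.sql (the file name and client count here are arbitrary):
-- heavy_insert.sql: one large-value insert per transaction;
-- a random id keeps the rows distinct across concurrent clients
\set id random(1, 1000000000)
insert into user_data values (:id, convert_to(repeat('b', 10000), 'UTF8'));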
To understand what's happening, check pg_stat_activity to see what active processes are waiting for:
SELECT pid,
extract(epoch from now() - state_change) as sec_state_change,
state_change,
wait_event,
wait_event_type,
state,
query
FROM pg_stat_activity
WHERE query LIKE 'insert into user_data%'
AND state = 'active';
Result:
pid | sec_state_change | state_change | wait_event | wait_event_type | state | query
--------+------------------+------------------------+----------------+-----------------+--------+-------
1013836 | 196.148285 | 17:58:03.280158+03 | ClientRead | Client | idle | SHOW TRANSACTION ISOLATION LEVEL
1013866 | 403.011387 | 17:54:36.417056+03 | BufferIO | IPC | active | insert into user_data (id,data) values (777...
1013867 | 403.011375 | 17:54:36.417068+03 | | | active | insert into user_data (id,data) values (777...
1013868 | 403.011407 | 17:54:36.417036+03 | BufferIO | IPC | active | insert into user_data (id,data) values (777...
1013869 | 403.011383 | 17:54:36.41706+03 | BufferIO | IPC | active | insert into user_data (id,data) values (777...
1013870 | 403.011375 | 17:54:36.417068+03 | | | active | insert into user_data (id,data) values (777...
1013871 | 403.011384 | 17:54:36.417059+03 | BufferIO | IPC | active | insert into user_data (id,data) values (777...
1013872 | 403.011389 | 17:54:36.417054+03 | BufferIO | IPC | active | insert into user_data (id,data) values (777...
1013873 | 403.011077 | 17:54:36.417366+03 | BufferMapping | LWLock | active | insert into user_data (id,data) values (777...
1013874 | 403.011070 | 17:54:36.417373+03 | | | active | insert into user_data (id,data) values (777...
1013875 | 403.011138 | 17:54:36.417305+03 | | | active | insert into user_data (id,data) values (777...
You'll see widespread waits on BufferMapping and BufferIO.
Here's what's happening: when inserting large values, PostgreSQL splits data into TOAST chunks on the fly, compresses them, and hunts for free space to write them.
- BufferMapping means processes are competing to allocate new buffer cache pages for this data.
- BufferIO indicates the system is bottlenecked on disk I/O performance while writing those pages.
This all comes down to the complex machinery running under the hood when large values are written. You can read about the intricacies of large-value storage in the official docs and detailed articles. The key takeaway: under high load, this mechanism becomes a bottleneck.
For each user table, PostgreSQL creates just one TOAST table, regardless of how many columns contain large data. Values from all columns end up in this single storage, which has hard limits:
- Maximum size: 32 TB.
- Maximum number of rows (chunks): ~4.2 billion (chunk IDs are 32-bit values).
Important: The 4.2 billion record limit is the total for all columns combined, not per column.
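You can check which TOAST table backs a given table and how much of that headroom is already used. The TOAST table name in the second query is a placeholder; substitute the one returned by the first query:
-- Which TOAST table stores the out-of-line values, and how big is it?
select reltoastrelid::regclass as toast_table,
       pg_size_pretty(pg_relation_size(reltoastrelid)) as toast_size
from pg_class
where relname = 'user_data';

-- How many of the ~4.2 billion chunk IDs are already in use
-- (this scans the whole TOAST table, so run it with care on large systems)
select count(distinct chunk_id) from pg_toast.pg_toast_16384;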
TOAST supports four storage strategies for columns on disk (you can check and change a column's strategy, as shown after this list):
- PLAIN — for fixed-length values like integers;
- EXTERNAL — for external storage of uncompressed data;
- EXTENDED — for external storage of compressed data;
- MAIN — for inline storage of compressed data.
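For example, payloads that are already compressed (video, archives, encrypted blobs) gain nothing from TOAST compression, so it can make sense to switch such a column to EXTERNAL:
-- Current strategy per column: p = PLAIN, e = EXTERNAL, x = EXTENDED, m = MAIN
select attname, attstorage
from pg_attribute
where attrelid = 'user_data'::regclass and attnum > 0;

-- Store data_bytea out of line without trying to compress it
alter table user_data alter column data_bytea set storage external;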
Note:
The TOAST mechanism kicks in when a row value exceeds TOAST_TUPLE_THRESHOLD (usually 2 KB). TOAST will compress and/or move field values outside the table until the row value drops below 2 KB or can't be compressed further.
So the distribution works like this:
- If data fits in the original table (under 2 KB), write it directly to the column.
- If data exceeds 2 KB, store it in the TOAST table and put a reference in the main table.
If your system actively works with files, this limit can sneak up on you—and once you hit it, inserting new data becomes impossible.
You can overcome this barrier through partitioning. Each partition gets its own TOAST table, pushing back the overall limit. But this requires significant design and maintenance effort from your DBA.
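A minimal sketch of that approach, using hash partitioning so that each partition brings its own TOAST table and its own ~4.2 billion chunk budget:
create table user_data_part (
    id bigint,
    data_bytea bytea
) partition by hash (id);

-- Four partitions, four independent TOAST tables
create table user_data_part_0 partition of user_data_part for values with (modulus 4, remainder 0);
create table user_data_part_1 partition of user_data_part for values with (modulus 4, remainder 1);
create table user_data_part_2 partition of user_data_part for values with (modulus 4, remainder 2);
create table user_data_part_3 partition of user_data_part for values with (modulus 4, remainder 3);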
Large object storage
PostgreSQL also provides the pg_largeobject system table for storing large objects. Each file is identified by an OID assigned at creation.
Note: Since pg_largeobject is a system table, it can't be partitioned.
The API for pg_largeobject resembles standard file operations: create, read, modify, and delete objects. Like TOAST, it has hard limits: a single large object can be up to 4 TB, the pg_largeobject table itself is capped at 32 TB in total, and the number of objects is limited to about 4.2 billion (32-bit OIDs).
Let's see how this works. Important: the actual file data isn't stored in your user table — only the reference (OID).
-- 1. Create a table to store only the file identifier (OID)
CREATE TABLE lo (
id bigint PRIMARY KEY,
lo_oid oid
);
-- 2. Load a file into storage and record the OID in our table
-- (Assumes '/tmp/file.2gb.bytea' exists on the server)
INSERT INTO lo (id, lo_oid) VALUES (1, lo_import('/tmp/file.2gb.bytea'));
-- 3. Check data in the user table
SELECT * FROM lo;
-- Result:
-- id | lo_oid
-- ---+------------
-- 1 | 2032353971 <-- This is the reference to the object
-- 4. Check the pg_largeobject system table where data is physically stored (split into pages)
SELECT loid, pageno, data FROM pg_largeobject WHERE loid = 2032353971 LIMIT 2;
-- Result:
-- loid | pageno | data
-- -----------+--------+-------
-- 2032353971 | 0 | \x58546f...
-- 2032353971 | 1 | \x81234b...
-- 5. Deleting an object
-- IMPORTANT: Deleting a row from the 'lo' table DOES NOT delete the file from pg_largeobject.
-- You must explicitly delete it using lo_unlink:
SELECT lo_unlink(2032353971);
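One way to keep the two in sync is the lo contrib extension that ships with PostgreSQL: its lo_manage trigger unlinks the referenced object whenever the row is updated or deleted. The sketch below assumes it can be attached to our lo_oid column as-is; the extension's docs use its lo domain type for such columns, so check them before relying on this. The vacuumlo utility can then mop up anything that still ends up orphaned.
-- The lo extension is part of contrib
CREATE EXTENSION IF NOT EXISTS lo;

-- Unlink the large object automatically when its referencing row changes
CREATE TRIGGER t_lo_cleanup
    BEFORE UPDATE OR DELETE ON lo
    FOR EACH ROW EXECUTE FUNCTION lo_manage(lo_oid);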
However, this approach has quirks. With intensive inserts, you might encounter OID generation contention or exhaust the OID limit. In monitoring, this shows up as OidGen lock waits:
select pid, wait_event, wait_event_type, state, left(query,15)
from pg_stat_activity
where backend_type = 'client backend';
pid | wait_event | wait_event_type | state | left
-------+---------------+-----------------+--------+-----------------
977260 | ClientRead | Client | idle | SHOW TRANSACTIO
977309 | | | active | INSERT INTO lo
977310 | OidGen | LWLock | active | INSERT INTO lo
977311 | | | active | INSERT INTO lo
977312 | | | active | INSERT INTO lo
977313 | | | active | INSERT INTO lo
977314 | BufferIO | IPC | active | INSERT INTO lo
977315 | | | active | INSERT INTO lo
977316 | BufferMapping | LWLock | active | INSERT INTO lo
977317 | | | active | INSERT INTO lo
977318 | BufferIO | IPC | active | INSERT INTO lo
977426 | | | active | select pid, wai
(12 rows)
So with pg_largeobject:
- You can store up to 32 TB in total (and up to 4 TB per object) and about 4.2 billion objects.
- No partitioning support.
- Deleting a row (OID) from your user table leaves data in pg_largeobject untouched.
There are also significant operational downsides. Migrating objects to another storage system is cumbersome because there are no convenient built-in mechanisms for it. All this makes pg_largeobject less than ideal for full-featured large file handling in high-load systems.
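If you do have to move objects out, it usually comes down to scripting it yourself, for example with the server-side lo_export function. The target directory and naming scheme below are illustrative, and the path must be writable by the PostgreSQL server process:
-- Write every referenced object to a file on the database server
select lo_export(lo_oid, '/tmp/export/file_' || id::text)
from lo;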
Note: Besides these options, you could roll your own implementation in PostgreSQL for large object storage, but custom solutions don't guarantee full compatibility or reliable operation with the RDBMS.
What Postgres Pro Enterprise brings to the table
Postgres Pro Enterprise (starting from version 16) includes specialized mechanisms that lift the standard PostgreSQL limitations and ease migration from Oracle. For database file storage, you can use three extensions:
- pgpro_sfile — extension for storing large objects inside the database;
- pgpro_bfile — extension for working with files stored on disk;
- dbms_lob — an Oracle DBMS_LOB package analogue that uses pgpro_sfile and pgpro_bfile.
Let's explore each one.
The pgpro_sfile module
The pgpro_sfile module lets you store numerous large objects (called sfiles in the extension's terminology) and provides functionality similar to Oracle's BLOB.
Key features:
- Removing limits. Total storage volume and object count are limited only by disk space and the bigint type (2^63 - 1). This solves the 32 TB and 4.2 billion object problem.
- Storage structure. Data is stored in a set of special tables (in the pgpro_sfile_data schema), managed by the module. Each object is split into blocks, and blocks into pages of ~8 KB. User tables store only sfile object identifiers.
- Performance. Supports parallel read and write operations. For example, you can write to one large object from multiple sessions simultaneously, as different blocks can be written to different storage partitions in parallel.
- Management. Implements functions for creating (sf_create), reading (sf_read), writing (sf_write), trimming (sf_trim), and deleting (sf_delete) objects.
Detailed function and architecture descriptions are available in the module documentation.
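Purely as an illustration, working with an sfile might look roughly like the sketch below. The function names come from the list above, but the signatures and argument order are my assumptions, so treat this as pseudocode and rely on the module documentation for the real API.
-- HYPOTHETICAL sketch: signatures are assumed, not taken from the docs
create table user_files (
    id      bigint primary key,
    file_id bigint   -- sfile identifier
);

insert into user_files (id, file_id) values (1, sf_create());        -- new empty sfile
select sf_write(file_id, pg_read_binary_file('/tmp/file.1gb.bytea'))
from user_files where id = 1;                                         -- write data into it
select sf_read(file_id) from user_files where id = 1;                 -- read it back
select sf_delete(file_id) from user_files where id = 1;               -- drop the object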
The pgpro_bfile module
pgpro_bfile is a Postgres Pro Enterprise extension designed for working with external files stored on the server's disk. The module provides capabilities for managing references to files outside the database, similar to Oracle's BFILE type.
The bfile type contains two fields:
- dir_id — directory ID in the bfile_directories table.
- file_name — external file name.
Key features:
- Security. Access is through directory objects. An admin creates a "directory" in the DB (bfile_directory_create), linked to a filesystem path, and grants read/write permissions to specific users (bfile_grant_directory).
- Data access. The module provides functions for reading (bfile_read), writing (bfile_write), copying, and retrieving file properties.
- Virtualization. File access goes through Postgres virtual file functions, so standard RDBMS restrictions and settings apply.
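A rough sketch of how this fits together is shown below. Again, the function names are taken from the list above, but the exact signatures and the way a bfile reference is built are assumptions on my part; check the pgpro_bfile documentation for the real calls.
-- HYPOTHETICAL sketch: argument lists are assumed
select bfile_directory_create('reports_dir', '/data/reports');    -- map a DB directory to a filesystem path
select bfile_grant_directory('reports_dir', 'app_user');          -- let app_user access it
select bfile_read(bfile('reports_dir', 'invoice_2024.pdf'));      -- read an external file via a bfile reference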
Overall, the pgpro_bfile module is an optimal solution for working with files of any size on external file storage when you don't need to store the data inside the database.
The dbms_lob module
dbms_lob is an extension that emulates the Oracle DBMS_LOB package interface. It serves as a "wrapper" over pgpro_sfile and pgpro_bfile and is critical for migration projects, allowing application code to be ported with minimal changes.
The module operates with these data types:
- CLOB — type for storing large volumes of text data (like long documents, XML/JSON files). It's analogous to Oracle CLOB and stores data as the standard text type.
- BLOB — type for storing binary data (images, videos, PDFs) in pgpro_sfile extension tables. It replaces Oracle BLOB and allows storing files of virtually unlimited size.
- BFILE — data type for referencing a file in the server's filesystem, provided by the pgpro_bfile extension. It's analogous to Oracle's BFILE.
dbms_lob provides an extensive API for interacting with large objects, including functions for:
- Creation and conversion: empty_blob(), empty_clob(), to_blob(), to_clob().
- Opening and closing: open(), close(), isopen().
- Reading: getlength(), read(), substr(), instr().
- Modification: write(), writeappend(), erase(), trim().
- Multi-LOB operations: compare(), append(), copy().
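To give a flavor of how ported code looks, here's a rough sketch in the DBMS_LOB style. The column type and the dbms_lob calls follow the Oracle-like naming above, but the exact Postgres Pro types and signatures are assumptions, so verify them against the module documentation.
-- HYPOTHETICAL sketch: types and signatures are assumed
create table documents (
    id   bigint primary key,
    body blob                 -- BLOB backed by pgpro_sfile
);

insert into documents (id, body) values (1, empty_blob());
select dbms_lob.getlength(body) from documents where id = 1;          -- object length
select dbms_lob.substr(body, 1024, 1) from documents where id = 1;    -- first 1 KB (amount, offset)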
So the dbms_lob module for Postgres Pro Enterprise not only fully replicates Oracle's DBMS_LOB functionality but also allows storing data both inside and outside the database. This makes it the best solution when migrating applications from Oracle.
The bottom line
Storing files in a database is often considered suboptimal: it increases DB load and complicates operations. But when business requirements demand it, PostgreSQL and Postgres Pro Enterprise users have solid tools at their disposal.
While standard PostgreSQL mechanisms (TOAST, Large Objects) impose limits on volume (32 TB) and object count, specialized Postgres Pro Enterprise modules (pgpro_sfile, dbms_lob) let you break through these barriers, enabling petabyte-scale data storage and providing convenient interfaces for development and Oracle migration.
