DEV Community

Chiazam Ochiegbu


Dealing with Large Objects in PostgreSQL

Databases provide data types as containers to store values: you use the int data type for numeric values and the char and varchar data types for string values. Each data type has its own limitations on the size and kind of data it can store. A database model is based on real-life problems, so these are not the only kinds of data you will confront. We do not live in the ice age anymore; we have to store large images, audio, and video files, and varchar is certainly not the answer. Objects that require huge storage and cannot be handled by the simple built-in data types are usually referred to as Large Objects (LOs) or Binary Large Objects (BLOBs). To handle these LOs, you need a storage mechanism that can store them easily and access them efficiently.

Let's discuss the following:

  • Why large objects?
  • PostgreSQL large objects
  • Large objects in action
  • Manipulating large objects through the libpq client interface library

Why large objects?

Let's compare the kinds of data we deal with to see why and where we need large objects.
You can categorize large objects in general, based on the structure or type of data. The types you usually come across are as follows:

  • Simple
  • Complex
  • Semi-structured
  • Unstructured

Of the first two, simple structured data is data that can be organized in plain tables with standard data types, while complex structured data is data that requires features such as user-defined data types.
In the age of the Internet, those are not the only forms of data you have to handle; there are XML and JSON as well. A relational database does not interpret them in a general way, so this kind of data can be categorized as semi-structured. Images, audio, and video, which are used massively today, cannot be stored like the first three kinds of data because they cannot be broken down into smaller logical structures for interpretation by standard means. They are therefore unstructured data and need a different mechanism to handle them.
PostgreSQL answers this with its large objects feature, which stores objects of considerably large size. It has been around since the early releases of PostgreSQL and has only gotten better over time.

PostgreSQL large objects

Interestingly, PostgreSQL provides two ways to store large objects, depending on the requirements you have to meet. They are as follows:

  • Implementation of the BYTEA data type
  • Implementation of large object storage

Though our area of interest here is large objects, we will skim through some characteristics of BYTEA. It is similar to the VARCHAR and TEXT character string types, but it has a few distinctive features. It can store raw, unstructured data, which character strings cannot. It also allows zero octets (null bytes): VARCHAR does not permit storing zero octets or other octet sequences that are invalid according to the database's character set encoding. With BYTEA you manipulate the actual raw bytes, whereas the processing of character strings depends on the locale setting.

Compared with large object storage, the big difference with BYTEA is storage size: each BYTEA value permits storage of up to 1 GB, whereas a large object can hold up to 4 TB. The large object feature also provides functions that make manipulating external data much easier; doing the same with BYTEA can be quite complex.

The preceding discussion was a brief comparison of the choices PostgreSQL offers for storing binary data: BYTEA or large object storage. Your requirements are the best judge of which one to pick.
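The difference can be sketched in SQL. This is a hypothetical schema (the table names, column names, and file path are illustrative only); note that the server-side lo_import() reads a file from the database server's filesystem and requires appropriate privileges:

```sql
-- BYTEA: the binary data lives inline in the row (up to 1 GB per value).
CREATE TABLE thumbnails (
    id    serial PRIMARY KEY,
    image bytea
);

-- Large object storage: the row holds only an OID reference;
-- the data itself (up to 4 TB) lives in pg_largeobject.
CREATE TABLE videos (
    id      serial PRIMARY KEY,
    content oid
);

-- BYTEA values are inserted like any other column value.
INSERT INTO thumbnails (image) VALUES (decode('deadbeef', 'hex'));

-- lo_import() creates the large object and returns its OID,
-- which we store as the reference.
INSERT INTO videos (content) VALUES (lo_import('/tmp/clip.mp4'));
```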

Implementing large objects

Things are remembered best when they are listed, so this is how we will commit the PostgreSQL large object implementation to memory:

  • Large objects, unlike BYTEA, are not a data type but an entry in a system table.
  • All large objects are stored in the pg_largeobject system table.
  • Each large object also has a corresponding entry in the pg_largeobject_metadata system table.
  • Large objects are broken up into chunks of a default size (2 kB) and stored as rows in the database.
  • These chunks in rows are B-tree indexed; hence, this ensures fast searches during read/write operations.
  • From PostgreSQL 9.3 onwards, the maximum size of a large object in a table can be 4 TB.
  • Large objects are not stored in user tables; rather, a value of the Object Identifier (OID) type is stored. You use this OID value to access the large object. So, when you have to access a large object, you reference the OID value that points to the large object in the pg_largeobject system table.
  • PostgreSQL provides a read/write Application Programming Interface (API) that offers client- and server-side functions. Using this API, you can create, modify, and delete large objects. These functions take OIDs as references to large objects, for example, to transfer the contents of a file into the database or to extract an object from the database into a file.
  • From PostgreSQL 9.0 onwards, large objects have an associated owner and a set of access permissions. Retrieving data using these functions gives you back the same binary data you added. Examples of these functions are lo_create(), lo_unlink(), lo_import(), and lo_export().
  • PostgreSQL provides the ALTER LARGE OBJECT command to change the definition of a large object. Remember that its only functionality is to assign a new owner.
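The points above can be tied together in a short SQL session. The file paths, the OID 16403, and the role report_user are hypothetical; lo_import()/lo_export() run on the database server and need filesystem access there:

```sql
-- Import a server-side file; returns the OID of the new large object.
SELECT lo_import('/tmp/report.pdf');    -- suppose it returns 16403

-- Every large object has an entry in pg_largeobject_metadata
-- recording its owner and access permissions.
SELECT oid, lomowner FROM pg_largeobject_metadata WHERE oid = 16403;

-- Reassign the owner: the only thing ALTER LARGE OBJECT can do.
ALTER LARGE OBJECT 16403 OWNER TO report_user;

-- Export the object back to a file, then remove it from the database.
SELECT lo_export(16403, '/tmp/report_copy.pdf');
SELECT lo_unlink(16403);
```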

Note
Functions for large objects must be called within a transaction block. psql runs with autocommit on by default, so make sure that you issue the BEGIN command explicitly.
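A minimal sketch of the transaction requirement, using the server-side read/write functions. The OID 16403 is hypothetical, and the descriptor value 0 is an assumption (lo_open() returns whichever descriptor is free); descriptors are only valid until the end of the transaction:

```sql
BEGIN;
-- Open for read/write: x'60000' = INV_READ (x'40000') | INV_WRITE (x'20000').
SELECT lo_open(16403, x'60000'::int);   -- suppose it returns descriptor 0
SELECT lowrite(0, '\x48656c6c6f');      -- write 5 raw bytes ("Hello")
SELECT lo_lseek(0, 0, 0);               -- seek back to the start (SEEK_SET)
SELECT loread(0, 5);                    -- read the same 5 bytes back
SELECT lo_close(0);
COMMIT;                                 -- descriptor 0 is now invalid
```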
