Definition
A status field is a relational database column or relationship (foreign key) in a table that indicates the current state or condition of each record in that table.
The value of this field is expected to change throughout the lifetime of any given table record.
Use Case
For this use case example, within a contrived e-commerce system, a catalogue products
table might have status values such as:
in stock |
In warehouse stock. May be purchased and shipped. |
on order |
On back order. May be purchased and shipped. |
unavailable |
Unavailable for purchase. May be viewed in customer order history. |
deleted |
Should not be viewable to customers. |
Various database structures may be used to store this status. We'll explore these below.
A naive design: Use a varchar field
The most naive database design would simply store this status
field as a varchar
type:
/* Postgres */
DROP TABLE IF EXISTS products CASCADE;
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
sku VARCHAR(50) NOT NULL UNIQUE,
status VARCHAR(20) NOT NULL
);
DROP INDEX IF EXISTS idx_product_status;
CREATE INDEX idx_product_status
ON products(status);
Adding some sample data
INSERT INTO products
(title, sku, status)
VALUES ('EcoBoost Portable Charger', 'SKU-ECBW-1234', 'in stock'),
('AquaPure Water Filter', 'SKU-AQPF-5678', 'on order'),
('SolarGlow Garden Lights', 'SKU-SGLA-9101', 'unavailable'),
('FitFlex Yoga Mat', 'SKU-FFYM-1121', 'deleted'),
('BreezeAir Conditioner', 'SKU-BACA-3141', 'in stock'),
('CrispSound Bluetooth Speaker', 'SKU-CSBS-5161', 'on order'),
('SmoothBlend Juicer', 'SKU-SBJG-7181', 'unavailable'),
('QuickCook Microwave Oven', 'SKU-QCMO-9201', 'deleted'),
('UltraView Binoculars', 'SKU-UVBK-1221', 'in stock'),
('ProFit Running Shoes', 'SKU-PFRS-3241', 'in stock');
will give us a data set that looks like this:
SELECT product_id,
title,
sku,
status
FROM products;
product_id | title | sku | status
------------+------------------------------+---------------+-------------
1 | EcoBoost Portable Charger | SKU-ECBW-1234 | in stock
2 | AquaPure Water Filter | SKU-AQPF-5678 | on order
3 | SolarGlow Garden Lights | SKU-SGLA-9101 | unavailable
4 | FitFlex Yoga Mat | SKU-FFYM-1121 | deleted
5 | BreezeAir Conditioner | SKU-BACA-3141 | in stock
6 | CrispSound Bluetooth Speaker | SKU-CSBS-5161 | on order
7 | SmoothBlend Juicer | SKU-SBJG-7181 | unavailable
8 | QuickCook Microwave Oven | SKU-QCMO-9201 | deleted
9 | UltraView Binoculars | SKU-UVBK-1221 | in stock
10 | ProFit Running Shoes | SKU-PFRS-3241 | in stock
(10 rows)
The problem here is that the status
varchar type is unconstrained; the value could be any string and this could lead to data inconsistencies.
An improved design: use an enum field
This improved design makes use of an ENUM
type to define a restricted set of valid status
values:
/* Postgres */
DROP TABLE IF EXISTS product_status CASCADE;
DROP TYPE IF EXISTS product_status CASCADE;
CREATE TYPE product_status AS ENUM (
'in stock',
'on order',
'unavailable',
'deleted'
);
DROP TABLE IF EXISTS products CASCADE;
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
sku VARCHAR(50) NOT NULL UNIQUE,
status product_status NOT NULL -- **Note** type here
);
DROP INDEX IF EXISTS idx_product_status;
CREATE INDEX idx_product_status
ON products(status);
This limits the possible value of status
to one of the defined string values; 'in stock', 'on order', 'unavailable' or 'deleted'.
There are several benefits of using an enum
type over a varchar
:
-
Data Integrity: ensure that the value is always within a specific set of values. This is not possible with varchar (unless you add a
CHECK
constraint). -
Performance:
enum
values are stored as integers, making comparing and filtering data more efficient.. -
Indexing:
enum
types can be more efficient than indexingvarchar
columns, which can lead to faster search and retrieval of data. -
Storage:
enum
values are stored as integers, which can be more space-efficient thanvarchar
.
However, adding new values to an enum
type requires database schema changes, which may be a heavy operation if your database is large.
Metadata
Metadata is data that provides information about other data.
These enum
status values have the following metadata with regards to the relevant products
table record:
Value | In stock | Buyable | Active |
---|---|---|---|
in stock |
Yes | Yes | Yes |
on order |
No | Yes | Yes |
unavailable |
No | No | Yes |
deleted |
No | No | No |
This metadata now need to be implemented through some business logic in code.
Something like:
# status.py
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class ProductStatus:
"""A data model for product status"""
is_in_stock: bool
is_buyable: bool
is_active: bool
@classmethod
def create(cls, status: str) -> ProductStatus:
"""Create a `ProductStatus` instance derived from the given string"""
match status.lower():
case "in stock":
return ProductStatus(
is_in_stock=True,
is_buyable=True,
is_active=True,
)
case "on order":
return ProductStatus(
is_in_stock=False,
is_buyable=True,
is_active=True,
)
case "unavailable":
return ProductStatus(
is_in_stock=False,
is_buyable=False,
is_active=True,
)
case "deleted":
return ProductStatus(
is_in_stock=False,
is_buyable=False,
is_active=False,
)
case _:
raise ValueError(f"Unable to determine product status '{status}'")
This works well enough, but it does split the domain between the database and the code base.
It would be better if we could represent the state within the database structure itself.
The next refactor: Add state columns
In order to store these state values better in this database, we could add a few flag columns to the products
table:
/* Postgres */
DROP TABLE IF EXISTS products CASCADE;
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
sku VARCHAR(50) NOT NULL UNIQUE,
is_in_stock BOOLEAN NOT NULL,
is_buyable BOOLEAN NOT NULL,
is_active BOOLEAN NOT NULL
);
This is an improvement, as we now have status attributes for each products
table record.
But, some limitations remain.
We cannot add any metadata to the various status flags. We also would need to add further columns if we ever needed a status that requires additional state flags. This would necessitate an ALTER
operation on our large products
table.
We have also lost the ability to succinctly allocate a single status value to a product record.
Good database design: Apply normalisation
The best design would be to abstract product status
from the products
table.
To achieve this, we normalise the database structure by adding a foreign key to a product_status
table.
Using a foreign key for representing record status in a database table, rather than an enum, has several advantages:
- Data Integrity: Foreign keys enforce referential integrity, preventing invalid or inconsistent status values.
- Flexibility and Extensibility: Easily add or modify status values without altering the table structure.
- Normalization: Foreign keys follow the principles of database normalization by reducing data redundancy.
- Consistency: A foreign key ensures that the status values are consistent throughout the database, as they are stored in a single table.
- Indexing and Performance: Foreign key relationships are optimised, making lookups and joins between related tables more efficient.
/* Postgres */
DROP TYPE IF EXISTS product_status CASCADE; -- Drop the type that we created earlier
DROP TABLE IF EXISTS product_status CASCADE;
CREATE TABLE product_status (
product_status_id SERIAL PRIMARY KEY,
product_status_usid VARCHAR(50) NOT NULL UNIQUE, -- unique string identifier
description VARCHAR(200) NULL,
is_in_stock BOOLEAN NOT NULL,
is_buyable BOOLEAN NOT NULL,
is_active BOOLEAN NOT NULL
);
DROP TABLE IF EXISTS products CASCADE;
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
sku VARCHAR(50) NOT NULL UNIQUE,
product_status_id INTEGER NOT NULL,
FOREIGN KEY (product_status_id)
REFERENCES product_status (product_status_id)
);
Next, let's create records in product_status
, for the various status values, and associated state flags.
/* Postgres */
INSERT INTO product_status
(product_status_usid, description, is_in_stock, is_buyable, is_active)
VALUES ('in stock', 'Product is in stock', true, true, true),
('on order', 'Product is on back order', false, true, true),
('unavailable', 'Product is unavailable', false, false, true),
('deleted', 'Product is deleted', false, false, false);
Which gives us:
SELECT product_status_id AS id,
product_status_usid AS usid,
description,
is_in_stock AS in_stock,
is_buyable AS buyable,
is_in_stock AND is_buyable AS shippable, -- derived attribute
is_active AS active
FROM product_status;
id | usid | description | in_stock | buyable | shippable | active
----+-------------+------------------------------------+----------+---------+-----------+--------
1 | in stock | Product is in stock | t | t | t | t
2 | on order | Product is on back order | f | t | f | t
3 | unavailable | Product is unavailable | f | f | f | t
4 | deleted | Product is deleted | f | f | f | f
(4 rows)
And re-add our sample product data, this time using a foreign key id for the status:
INSERT INTO products
(title, sku, product_status_id)
VALUES ('EcoBoost Portable Charger', 'SKU-ECBW-1234', 1),
('AquaPure Water Filter', 'SKU-AQPF-5678', 2),
('SolarGlow Garden Lights', 'SKU-SGLA-9101', 3),
('FitFlex Yoga Mat', 'SKU-FFYM-1121', 4),
('BreezeAir Conditioner', 'SKU-BACA-3141', 1),
('CrispSound Bluetooth Speaker', 'SKU-CSBS-5161', 2),
('SmoothBlend Juicer', 'SKU-SBJG-7181', 3),
('QuickCook Microwave Oven', 'SKU-QCMO-9201', 4),
('UltraView Binoculars', 'SKU-UVBK-1221', 1),
('ProFit Running Shoes', 'SKU-PFRS-3241', 1);
We can now use a JOIN
to return fields from both tables:
SELECT p1.title,
p1.sku,
p2.description AS status
FROM products AS p1
JOIN product_status AS p2
ON p1.product_status_id = p2.product_status_id;
title | sku | status
------------------------------+---------------+--------------------------
EcoBoost Portable Charger | SKU-ECBW-1234 | Product is in stock
AquaPure Water Filter | SKU-AQPF-5678 | Product is on back order
SolarGlow Garden Lights | SKU-SGLA-9101 | Product is unavailable
FitFlex Yoga Mat | SKU-FFYM-1121 | Product is deleted
BreezeAir Conditioner | SKU-BACA-3141 | Product is in stock
CrispSound Bluetooth Speaker | SKU-CSBS-5161 | Product is on back order
SmoothBlend Juicer | SKU-SBJG-7181 | Product is unavailable
QuickCook Microwave Oven | SKU-QCMO-9201 | Product is deleted
UltraView Binoculars | SKU-UVBK-1221 | Product is in stock
ProFit Running Shoes | SKU-PFRS-3241 | Product is in stock
(10 rows)
The value of a usid
The unique string identifier (usid) product_status_usid
value is useful for reducing cognitive load when constructing queries.
For example:
SELECT p1.sku AS sku
FROM products AS p1
JOIN product_status AS p2
ON p1.product_status_id = p2.product_status_id
WHERE p2.product_status_usid = 'in stock'; -- All products in stock
is easier to understand at a glance, than
SELECT p1.sku AS sku
/* ... snipped ... */
WHERE p1.product_status_id = 1; -- What does 1 denote?
Similarly, when referring to these foreign key records in code, we do not want to use a primary key integer value as a constant (as these are strictly-speaking not constant) identifier. Rather, we would want to use the usid for this.
Extensibility
Adding a new status
Should we need to add a new status (for example pre-order
) to our system, it is as simple as adding a new record to the product_status
table.
/* Postgres */
INSERT INTO product_status
(product_status_usid, description, is_in_stock, is_buyable, is_active)
VALUES ('pre-order', 'Product is available for pre-order', false, true, true);
Adding a status log
Another benefit that this abstraction offers us, is the ability to extend our architecture fairly easily.
For example, to add a table to log status changes.
/* Postgres */
DROP TABLE IF EXISTS product_status_log CASCADE;
CREATE TABLE product_status_log (
product_id INTEGER NOT NULL,
product_status_id INTEGER NOT NULL,
logged_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
FOREIGN KEY (product_id)
REFERENCES products (product_id),
FOREIGN KEY (product_status_id)
REFERENCES product_status (product_status_id)
);
CREATE INDEX idx_product_status
ON product_status_log (product_id, product_status_id);
And we have a nice log
SELECT p3.product_status_usid AS status,
p2.logged_at AS log_timestamp
FROM products AS p1
JOIN product_status_log AS p2
ON p1.product_id=p2.product_id
JOIN product_status AS p3
ON p2.product_status_id = p3.product_status_id
WHERE p1.sku = 'SKU-SGL-9101'
ORDER BY p2.logged_at ASC;
status | log_timestamp
-------------+-------------------------------
in stock | 2023-08-07 22:46:21.388738+02
on order | 2023-08-07 22:46:57.509255+02
in stock | 2023-08-07 22:47:01.686259+02
on order | 2023-08-07 22:47:19.070394+02
in stock | 2023-08-07 22:47:26.662571+02
unavailable | 2023-08-07 22:47:31.837687+02
deleted | 2023-08-07 22:47:37.574532+02
(7 rows)
Cheers!
Change log
Date | Description |
---|---|
2023-10-18 | Add "Definition" section |
2023-10-18 | Add "Use Case" section |
2023-10-18 | Qualify "adding new values to an enum type ..." |
2023-10-19 | Apply SQL style |
2023-10-20 | Add images |
Top comments (6)
Can you help me understand this statement
"However, adding new values to an enum type requires an ALTER TYPE statement, which can be a heavy operation if your database is large."
Adding new values to new enum types impacts the product table?
My apologies; you are quite correct @ramakrishnan83
That statement may hold true for MySQL, which requires an ALTER TABLE to add a value to an ENUM. I do not believe it is true for Postgres, which has abstracted the enum into a TYPE that may be altered.
Great article! Well done! 👏
Keep going
good work
Nice!
How would you go about if
product_status
on order
status would have an associatedDate
value?