Introduction
We have all used software that lets us undo or redo our actions. Think about making a huge change to an image in Photoshop, only to undo it with the press of a button, then changing your mind and redoing that change. You can even keep changing your mind and undo/redo continuously.
But what about online apps, things that have a database in the backend? Can we easily undo and redo our last action? What if it involves updating multiple tables?
The answer is YES! We can add rollback and replay features to a database, even if our changes involve multiple tables. Surprisingly, it's not hard to implement.
Not only can we undo and redo our database edits, but we can also keep track of what happened and exactly how the records in our database were updated. This works even if we didn't plan the schema with undos/redos in mind from the get-go: we can retrofit any schema with this solution. All we need is a self-maintaining log table that serves as a kind of "event sourcing" table; the rest of the schema can do whatever it wants. The log table is completely independent of the rest of the schema.
We will see a Postgres implementation. I use PostgreSQL's procedural language (PL/pgSQL), but the same idea can be implemented on your application server instead of on the database, or on a different database. I chose to put the logic in the database rather than on the application server to avoid a lot of round-trips between the two. If I perform the operation in the database, the data is already there, and I don't waste precious time moving it over the network.
I have made a toy accounting application GitHub repo to demo an application that uses the undo/redo feature: https://github.com/brilliant-ember/SQL-Undo-Redo-Qureries/
Here's the secret sauce:
Step 1. Create an event_log table
This table can't be written to directly by the application; instead it is populated by event triggers. It's an append-only table, so we can't delete or edit records (except for some required updates, as we will see later).
Let's take a look at the event_log table columns
- id and event_timestamp: the id is the primary key of the event, and event_timestamp is when the trigger inserted the event.
- event_type: it can be user for events initiated directly by the user, or undo, or redo. Mind you, even the undos and redos are initiated by the user, but we distinguish between direct user events and events done via undo/redo.
- There are only 3 possible SQL operations: INSERT, UPDATE, DELETE, and we perform them on a specific table and a specific row_id, so we record all of that information. In case of an insert, we record the inserted row id.
- The wal_lsn column creates a reference to the write-ahead log, to make tracking against the internal Postgres log easier.
- The edited_fields column holds the new data to be written: fields and their new values (after the change). In case we are deleting data, we write '_deleted_' for every column field we are deleting. This column is how a redo knows what data to reapply to the row we want to redo.
- previous_state holds the fields of the row before the update takes place: fields and their values before the change. In case we are inserting a new record, the previous state will be empty. This column is where an undo takes its data: it reverts the row to the data of the previous state.
- The voided_by_event_id column tells us if our event got "cancelled" by another event. If a user event got undone, the undo event is the voiding event and the user event is voided by that undo event. Similarly, if an undo got "cancelled" by a redo, the same thing happens. Note that a redo only works on undo events, but an undo works on both redo events and user events.
- transaction_id is for atomic events: if the user events are atomic, then so must be the undo, and if an undo is atomic then so must be the redo.
- The fk_dep column is very interesting. It records the foreign key dependencies of the row, in the format {columnName: {foreignTableName: foreign_key}}. It's effectively a dependency tree that tells us what foreign dependencies this row has.
Note
The event_log table and log_event trigger must be created at the end of your migrations, after all other tables have been created. This is because at the very end you will need to create a dynamic trigger that inserts into the event log on every edit to any of the other tables. If event_log were in the middle and you created other tables after it, your down migration would fail, since those later tables depend on the log_event function you are trying to delete.
CREATE TABLE event_log (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
sql_op TEXT NOT NULL CHECK (sql_op IN ('INSERT', 'UPDATE', 'DELETE')),
event_type TEXT NOT NULL CHECK (event_type IN ('user', 'undo', 'redo')),
table_name TEXT NOT NULL,
row_id TEXT NOT NULL,
-- user_id UUID NOT NULL,
wal_lsn PG_LSN NOT NULL,
edited_fields JSONB NOT NULL,
previous_state JSONB, -- can be null because the first creation of a record will not have a previous state
notes TEXT,
-- usually null, but if this event has been voided by
-- another event, that voiding event's id will be here.
-- For example, if this event is a user action event that later
-- got undone, the event that undid this user action
-- will be in this column. Similarly, if an undo event got redone,
-- the redoing event's id will be in this voided_by_event_id column,
-- and if a redo got undone, the undoing event's id will be here
voided_by_event_id BIGINT REFERENCES event_log(id),
-- if there's a transaction and we update multiple tables
-- and we'd want to undo them at the same time
-- passed via context
transaction_id BIGINT,
-- {foreignColumnName: {foreignTable: 'foreign_key'}}
-- This is used for inter-table dependencies, useful in two cases:
-- 1. When we need to write in the same transaction.
-- For example, a created tire (child_row) needs a car (parent_row) to already be there,
-- and they're usually created in the same database transaction. This keeps track of that.
-- 2. When we need to redo an insert but we deleted the row that the foreign key
-- refers to and reinserted it with a different id;
-- this fk_dep will be used to search for the newly inserted id.
fk_dep JSONB
);
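To make the columns concrete, here is a hypothetical example. The accounts table, its columns, and all the values are made up for illustration; only the event_log shape comes from the schema above.

```sql
-- Hypothetical example: assuming an "accounts" table with "name" and
-- "balance" columns, an ordinary application update like this:
UPDATE accounts SET balance = 150 WHERE id = 7;

-- would be captured by the trigger as an event_log row roughly like:
--   sql_op             = 'UPDATE'
--   event_type         = 'user'
--   table_name         = 'accounts'
--   row_id             = '7'
--   edited_fields      = {"balance": 150}
--   previous_state     = {"balance": 100, "name": "savings"}
--   voided_by_event_id = NULL   -- stays null until an undo voids it
```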
Step 2 Create a materialized view to help us keep track of foreign table dependencies
We will need a way to keep track of every foreign dependency we have, both column-wise and table-wise. This information doesn't change often (only when you change the schema), so it's best to create this materialized view in the very last migration so it records all the foreign key dependencies.
This view will create two ordered lists where the referenced tables and fk_columns line up, in case you have multiple foreign dependencies on the same table but with different column names. It will look like this:
| table_name | referenced_tables (name[]) | fk_columns (name[]) |
|---|---|---|
| child | {parent} | {parent_id} |
CREATE MATERIALIZED VIEW fk_cache AS
SELECT
src_table.relname AS table_name,
array_agg(tgt_table.relname ORDER BY pair.ord) AS referenced_tables,
array_agg(src_col.attname ORDER BY pair.ord) AS fk_columns
FROM
pg_constraint c
JOIN pg_class src_table ON src_table.oid = c.conrelid
JOIN pg_class tgt_table ON tgt_table.oid = c.confrelid
-- Match source and target columns by position
JOIN LATERAL unnest(c.conkey) WITH ORDINALITY AS src(attnum, ord) ON TRUE
JOIN LATERAL unnest(c.confkey) WITH ORDINALITY AS tgt(attnum, ord) ON tgt.ord = src.ord
JOIN pg_attribute src_col ON src_col.attrelid = c.conrelid AND src_col.attnum = src.attnum
JOIN LATERAL (
SELECT src.ord -- position in FK constraint
) AS pair ON TRUE
WHERE c.contype = 'f'
GROUP BY src_table.relname;
CREATE UNIQUE INDEX fk_cache_table_name_idx
ON fk_cache (table_name);
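Because the view is materialized, it reflects the schema only as of its last refresh. If a later migration adds or drops a foreign key, refresh it:

```sql
-- Re-run after any migration that changes foreign keys, so the
-- undo/redo machinery sees the current dependencies. CONCURRENTLY
-- works here because of the unique index on table_name.
REFRESH MATERIALIZED VIEW CONCURRENTLY fk_cache;
```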
Step 3 Create the trigger function
The trigger function is what inserts entries into the event_log table; there's no other way of inserting data into it, and of course you can't delete data from it (don't forget to update user permissions to enforce this :D ).
The log function depends on two things passed via context variables:
- myapp.transaction_started: a flag that tells the log_event trigger this event is part of a transaction, which is needed so we record the transaction id in the event_log record. Your app can set it via the provided helper procedure flag_transaction_start().
- myapp.event_type: defaults to the 'user' type unless this event was triggered by an undo or redo, which set this variable to 'undo' or 'redo' accordingly. An important thing to note: if the event_type is undo or redo, that means we are voiding another event, so this trigger will also update that log record's voided_by_event_id column to the new event's id.
Note that the helper function record_jsonb removes the columns I don't want to record in the event_log, like the id, created_at, and updated_at columns. Feel free to add any columns you want to exclude from this logging; any column you exclude will not be undoable/redoable.
Helper functions:
CREATE OR REPLACE PROCEDURE flag_transaction_start()
AS $$
BEGIN
SET LOCAL myapp.transaction_started = 'true';
END;
$$ LANGUAGE plpgsql;
-- if there is a row that depends on another row being created first
-- this function will return that in a jsonb format.
-- for example if order depends on cart being created first
-- func(order, order_row_by_id_x) => {'column_name':{'cart_table' : "1"}}
-- which means that we need cart with id 1 to be created first before we create the order
-- of course the cart can have any column name, so we handle that by column_name key-object pair
CREATE OR REPLACE FUNCTION get_row_foreign_key_dependencies(input_table TEXT, row_data JSONB)
RETURNS JSONB AS $$
DECLARE
fk_map JSONB := '{}';
table_fk_arr TEXT[];
col_fk_arr TEXT[];
fk_col TEXT;
fk_table TEXT;
fk_edited_same_transaction BOOLEAN;
i INT;
BEGIN
SELECT tables_arr, cols_arr INTO table_fk_arr, col_fk_arr FROM get_foreign_key_array(input_table);
IF table_fk_arr is NULL OR cardinality(table_fk_arr) = 0 THEN
return NULL;
END IF;
IF cardinality(table_fk_arr) <> cardinality(col_fk_arr) THEN
RAISE EXCEPTION 'the number of tables does not line up with the number of columns.';
END IF;
FOR i IN 1..cardinality(table_fk_arr) LOOP
fk_col := col_fk_arr[i]::TEXT;
IF row_data ? fk_col THEN
fk_table := table_fk_arr[i];
fk_map := jsonb_set(
fk_map,
ARRAY[fk_col],
jsonb_build_object(fk_table, row_data->fk_col),
true
);
END IF;
END LOOP;
RETURN fk_map;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE function record_jsonb(r RECORD)
RETURNS JSONB AS $$
BEGIN
RETURN to_jsonb(r) - 'id' - 'created_at' - 'updated_at'; -- we remove the columns we don't need
END;
$$ LANGUAGE plpgsql;
-- returns all the foreign keys of the input table
CREATE OR REPLACE FUNCTION get_foreign_key_array(input_table TEXT, tables_arr OUT TEXT[], cols_arr out TEXT[])
AS $$
BEGIN
SELECT referenced_tables, fk_columns INTO tables_arr, cols_arr FROM fk_cache where table_name=input_table;
END;
$$ LANGUAGE plpgsql;
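As a quick illustration of the two functions above, suppose a hypothetical orders table whose cart_id column references carts (both table and column names are made up for this example):

```sql
-- get_foreign_key_array('orders') would return
--   tables_arr = {carts}, cols_arr = {cart_id}
-- and resolving a concrete row's dependencies:
SELECT get_row_foreign_key_dependencies(
    'orders',
    '{"id": 10, "cart_id": 1, "total": 25}'::JSONB
);
-- returns {"cart_id": {"carts": 1}}
```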
Trigger function
CREATE OR REPLACE FUNCTION log_event()
RETURNS TRIGGER AS $$
DECLARE
changed JSONB := '{}'::JSONB;
k TEXT;
udata JSONB;
txn_id BIGINT := NULL;
txn_flag TEXT := current_setting('myapp.transaction_started', true);
fk_map JSONB;
event_type TEXT := current_setting('myapp.event_type', true);
voided_by_event_id BIGINT;
undone_or_redone_user_event_id BIGINT;
new_event_id BIGINT;
BEGIN
IF txn_flag IS NOT NULL AND txn_flag <> '' AND txn_flag::BOOLEAN THEN
txn_id := txid_current_if_assigned();
END IF;
IF event_type IS NULL OR event_type='' THEN
event_type='user';
END IF;
IF TG_OP ='UPDATE' THEN
udata := record_jsonb(NEW);
FOR k IN SELECT jsonb_object_keys(udata) LOOP
IF udata->k IS DISTINCT FROM record_jsonb(OLD)->k THEN
changed := changed || jsonb_build_object(k, udata->k);
END IF;
END LOOP;
ELSIF TG_OP = 'INSERT' THEN
fk_map := get_row_foreign_key_dependencies(
TG_TABLE_NAME,
to_jsonb(NEW)
);
changed := record_jsonb(NEW);
ELSIF TG_OP ='DELETE' THEN
FOR k IN SELECT jsonb_object_keys(record_jsonb(OLD)) LOOP
changed := changed || jsonb_build_object(k, '_deleted_');
END LOOP;
fk_map := get_row_foreign_key_dependencies(
TG_TABLE_NAME,
to_jsonb(OLD)
);
END IF;
IF fk_map = '{}' THEN
fk_map = NULL;
END IF;
INSERT INTO event_log (
sql_op,
table_name,
row_id,
-- user_id,
wal_lsn,
edited_fields,
previous_state,
transaction_id,
fk_dep,
event_type
)
VALUES (
TG_OP,
TG_TABLE_NAME,
COALESCE(NEW.id::TEXT, OLD.id::TEXT),
-- userid,
pg_current_wal_lsn(),
changed,
record_jsonb(OLD),
txn_id,
fk_map,
event_type
) RETURNING id INTO new_event_id;
IF event_type = 'undo' OR event_type = 'redo' THEN
undone_or_redone_user_event_id := current_setting('myapp.target_event_id')::BIGINT;
UPDATE event_log SET voided_by_event_id = new_event_id WHERE id = undone_or_redone_user_event_id;
END IF;
return NEW;
END;
$$ LANGUAGE plpgsql;
Step 4 Create the redo and undo functions
We have only 3 possible SQL operations: UPDATE, INSERT, DELETE.
- An insert cancels a delete
- A delete cancels an insert
- An update cancels another update

Kinda like rock-paper-scissors, but not really. We will have a lot of helper functions here, but first let's look at the undo and redo functions to get a general sense of direction. These functions action a single event (like undoing a single user event) and return the id of what we did: if we inserted a new row into the affected table we return that newly inserted id, and if we updated or deleted a row, we return the affected row id.
CREATE OR REPLACE FUNCTION undo_event(target_event event_log, row_id OUT TEXT)
AS $$
BEGIN
SET LOCAL myapp.event_type = 'undo';
EXECUTE format('SET LOCAL myapp.target_event_id = %L', target_event.id::TEXT);
row_id := target_event.row_id;
IF TRIM(UPPER(target_event.sql_op)) = 'UPDATE' THEN
CALL undo_redo_update_query(target_event.id, target_event.previous_state, target_event.table_name, target_event.row_id);
ELSIF TRIM(UPPER(target_event.sql_op)) = 'INSERT' THEN
SELECT t.row_id INTO row_id FROM undo_redo_delete_query(target_event.table_name, target_event.row_id) t;
ELSIF TRIM(UPPER(target_event.sql_op)) = 'DELETE' THEN
SELECT t.new_id INTO row_id
FROM undo_redo_insert_query(target_event.previous_state,target_event.table_name, target_event.row_id, target_event.fk_dep) t;
ELSE
RAISE EXCEPTION 'unknown sql operation while undoing event id %, the operation was %', target_event.id, target_event.sql_op;
RETURN;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION redo_event(undo_event event_log, row_id OUT TEXT)
AS $$
DECLARE
e event_log; -- the user event that got undone by the undo event
BEGIN
IF undo_event.event_type != 'undo' THEN
RAISE EXCEPTION 'Cannot redo since the event id % is not of type undo, it was type %', undo_event.id, undo_event.event_type;
END IF;
-- we want the event that got voided by the undo event; an undo can undo a redo event or a user event
SELECT * INTO e FROM event_log WHERE voided_by_event_id = undo_event.id AND (event_type='user' OR event_type='redo');
IF NOT FOUND THEN
RAISE EXCEPTION 'No event found for id %', undo_event.id;
RETURN;
END IF;
SET LOCAL myapp.event_type = 'redo';
EXECUTE format('SET LOCAL myapp.target_event_id = %L', undo_event.id::TEXT);
row_id := e.row_id;
-- the undo undid the update, so the redo will reapply the update
IF TRIM(UPPER(e.sql_op)) = 'UPDATE' THEN
CALL undo_redo_update_query(e.id, e.edited_fields, e.table_name, e.row_id);
-- the undo undid the insert, so the redo will reapply the insert
ELSIF TRIM(UPPER(e.sql_op)) = 'INSERT' THEN
SELECT t.new_id INTO row_id
FROM undo_redo_insert_query(e.edited_fields, e.table_name, e.row_id, e.fk_dep) t;
-- the undo undid the delete, so the redo will reapply the delete
ELSIF TRIM(UPPER(e.sql_op)) = 'DELETE' THEN
SELECT t.row_id INTO row_id FROM undo_redo_delete_query(e.table_name, e.row_id) AS t;
ELSE
RAISE EXCEPTION 'unknown sql operation while redoing event %, the operation was %', e.id, e.sql_op;
RETURN;
END IF;
END;
$$ LANGUAGE plpgsql;
Now, the code above only works for actioning a single event. What if we have multiple events in a transaction, like creating multiple records in different tables that we want to undo atomically? We create two more functions that wrap the undo and redo functions to handle this case. These wrappers call the flag_transaction_start procedure we created earlier alongside the trigger.
As explained earlier, the order of operations is important. When we insert, we want to insert the parents first, then the children, to avoid missing foreign dependency problems; when we delete, we want to delete in the reverse direction for the same reason. These two functions handle this ordering by using ORDER BY id DESC to ensure we action events in the correct order.
Each of these wrapper functions returns a map of the shape {table_name_actioned: array_of_ids_actioned}.
-- returns a map of shape {table_name: [ids touched]}
CREATE OR REPLACE FUNCTION perform_undo(target_event event_log)
RETURNS JSONB AS $$
DECLARE
txn_id BIGINT;
rec event_log;
row_id TEXT;
out_map JSONB := '{}'; -- {table_name: [actioned_ids (updated or inserted or deleted)]}
ids_arr TEXT[];
BEGIN
IF target_event IS NULL THEN
RAISE WARNING 'perform_undo received a null target event';
RETURN out_map;
END IF;
txn_id := target_event.transaction_id;
-- single row undo (a single event, no transaction)
IF txn_id IS NULL THEN
SELECT r.row_id INTO row_id FROM undo_event(target_event) r;
ids_arr := array_append(ids_arr, row_id);
out_map := jsonb_set('{}', ARRAY[target_event.table_name], to_jsonb(ids_arr));
RETURN out_map;
-- undo whole transaction (can involve multiple rows and multiple tables)
ELSE
CALL flag_transaction_start();
-- the ordering is important: we undo in the reverse of the order
-- the user inputted the events (children get undone before parents).
FOR rec IN
SELECT * FROM event_log
WHERE transaction_id = txn_id AND event_type != 'undo'
ORDER BY id DESC
LOOP
SELECT r.row_id INTO row_id FROM undo_event(rec) r;
IF out_map ? rec.table_name THEN
SELECT array_agg(elem)::TEXT[] INTO ids_arr
FROM jsonb_array_elements_text(out_map->rec.table_name) tbl(elem);
ELSE
ids_arr := ARRAY[]::TEXT[];
END IF;
ids_arr := array_append(ids_arr, row_id);
out_map := jsonb_set(out_map, ARRAY[rec.table_name], to_jsonb(ids_arr));
END LOOP;
END IF;
RETURN out_map;
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION perform_redo(e event_log)
RETURNS JSONB AS $$
DECLARE
txn_id BIGINT;
rec event_log;
row_id TEXT;
out_map JSONB := '{}'; -- {table_name: [actioned_ids (updated or inserted or deleted)]}
ids_arr TEXT[];
BEGIN
IF e IS NULL THEN
RAISE WARNING 'perform_redo received a null target event';
RETURN out_map;
END IF;
txn_id := e.transaction_id;
-- single row redo (redo a single undo)
IF txn_id IS NULL THEN
SELECT r.row_id INTO row_id FROM redo_event(e) r;
ids_arr := array_append(ids_arr, row_id);
out_map := jsonb_set('{}', ARRAY[e.table_name], to_jsonb(ids_arr));
RETURN out_map;
-- redo whole transaction (can involve multiple rows and multiple tables)
ELSE
CALL flag_transaction_start();
-- we redo the undos in reverse order (i.e. the original user order) to preserve any foreign key dependency
FOR rec IN
SELECT * FROM event_log
WHERE transaction_id = txn_id AND event_type = 'undo'
ORDER BY id DESC
LOOP
SELECT r.row_id INTO row_id FROM redo_event(rec) r;
IF out_map ? rec.table_name THEN
SELECT array_agg(elem)::TEXT[] INTO ids_arr
FROM jsonb_array_elements_text(out_map->rec.table_name) tbl(elem);
ELSE
ids_arr := ARRAY[]::TEXT[];
END IF;
ids_arr := array_append(ids_arr, row_id);
out_map := jsonb_set(out_map, ARRAY[rec.table_name], to_jsonb(ids_arr));
END LOOP;
END IF;
RETURN out_map;
END
$$ LANGUAGE plpgsql;
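A minimal usage sketch: fetch the newest event that is still undoable (or the newest undo that is still redoable) and hand it to the wrapper. The "newest unvoided event" selection strategy is an assumption on my part; your application may track its own undo stack instead.

```sql
-- Undo the most recent user or redo event that has not been voided yet.
SELECT perform_undo(e)
FROM event_log e
WHERE e.event_type IN ('user', 'redo')
  AND e.voided_by_event_id IS NULL
ORDER BY e.id DESC
LIMIT 1;

-- Redo the most recent undo event that has not been voided yet.
SELECT perform_redo(e)
FROM event_log e
WHERE e.event_type = 'undo'
  AND e.voided_by_event_id IS NULL
ORDER BY e.id DESC
LIMIT 1;
```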
Now for the helper functions
These functions do a lot of the heavy lifting for the logic, as well as the recursive search for the fk_dep.
First set of helpers. These are the insert, delete, and update functions
-- when undoing a delete by inserting or redoing an insert we have
-- to respect the foreign key constraints
-- for example if in the undo we created a parent, and then we want to create the children
-- the children will refer to the parent we created, not the one from the event table
CREATE OR REPLACE FUNCTION undo_redo_insert_query(fields JSONB, table_name TEXT, old_id TEXT, fk_dep JSONB, new_id OUT TEXT)
AS $$
DECLARE
query_data JSONB := '{}';
colname TEXT;
old_val TEXT;
new_fk TEXT;
tbl_name TEXT;
BEGIN
FOR colname IN SELECT * FROM jsonb_object_keys(fields) LOOP
old_val := fields->>colname;
IF old_val IS NOT NULL THEN
IF fk_dep ? colname THEN
-- in this case old_val will be the foreign table id
tbl_name := extract_table_name_from_fk_dep(fk_dep, colname);
SELECT t.row_id INTO new_fk FROM get_leaf(old_val, tbl_name, NULL) AS t;
query_data := jsonb_set(query_data, ARRAY[colname], to_jsonb(new_fk));
ELSE
query_data := jsonb_set(query_data, ARRAY[colname], to_jsonb(old_val));
END IF;
ELSE
query_data := jsonb_set(query_data, ARRAY[colname], 'null'::JSONB);
END IF;
END LOOP;
EXECUTE (SELECT format(
'INSERT INTO %I (%s) VALUES (%s) RETURNING id::TEXT',
table_name,
string_agg(format('%I', k), ', '),
string_agg(format('%L', query_data ->> k), ', ')
)
FROM jsonb_object_keys(query_data) AS k)
INTO new_id;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION undo_redo_delete_query(table_name TEXT, row_id INOUT TEXT)
AS $$
BEGIN
SELECT t.row_id INTO row_id FROM recursively_find_restored_row_id(NULL, table_name, row_id) AS t;
EXECUTE format(
'DELETE FROM %I WHERE id::TEXT = %L',
table_name,
row_id
);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE PROCEDURE undo_redo_update_query(event_id BIGINT, fields JSONB, table_name TEXT, row_id TEXT)
AS $$
DECLARE
q TEXT;
fk_dep JSONB := '{}';
colname TEXT;
original_fk TEXT;
new_fk TEXT;
set_clauses TEXT := '';
add_comma BOOLEAN := FALSE;
tbl_name TEXT;
BEGIN
SELECT t.row_id, t.fk_dep INTO row_id, fk_dep FROM get_leaf(row_id, table_name, event_id) AS t;
FOR colname IN SELECT * FROM jsonb_object_keys(fields) LOOP
original_fk := fields->>colname;
IF add_comma THEN
set_clauses := set_clauses || ', ';
END IF;
IF original_fk IS NOT NULL THEN
IF fk_dep ? colname THEN
-- if we are inserting into the foreign column, we want to make sure we use the latest id if the one in the fk_dep is deleted
tbl_name := extract_table_name_from_fk_dep(fk_dep, colname);
SELECT t.row_id INTO new_fk FROM get_leaf(original_fk, tbl_name, NULL) AS t;
set_clauses := set_clauses || format('%I = %L', colname, new_fk::TEXT);
ELSE
set_clauses := set_clauses || format('%I = %L', colname, original_fk::TEXT);
END IF;
ELSE
set_clauses := set_clauses || format('%I = NULL', colname);
END IF;
add_comma := TRUE;
END LOOP;
-- we use updated_at=updated_at to trigger an update even if the jsonb is empty.
-- This is important so the trigger gets called and marks this event as voided,
-- even if no actual undo/redo needs to be done. This will trigger a warning but it's okay.
IF NOT add_comma THEN -- if we never added a comma then fields was empty
EXECUTE (SELECT format('UPDATE %I SET updated_at=updated_at WHERE id::TEXT=%L', table_name, row_id));
RETURN;
END IF;
q := format(
'UPDATE %I SET %s WHERE id::TEXT = %L',
table_name,
set_clauses,
row_id
);
EXECUTE q;
END;
$$ LANGUAGE plpgsql;
The second set of helpers involves the recursive search to find the leaf node of the dependency tree
-- records can be deleted and then reinserted with a new primary key, so we need to find
-- the latest re-inserted primary key whenever we either do updates,
-- or inserts that need foreign keys (we need the latest id for the fk, we can't insert using the outdated key)
CREATE OR REPLACE FUNCTION get_leaf(row_id INOUT TEXT, table_name TEXT, event_id BIGINT, fk_dep OUT JSONB )
AS $$
DECLARE
q TEXT;
r BOOLEAN;
BEGIN
IF row_id IS NULL THEN
RAISE EXCEPTION 'cannot action %:% because row id is null', table_name, row_id;
END IF;
-- check if we are already at the leaf node
q := format('SELECT EXISTS (SELECT 1 FROM event_log WHERE id = %L AND voided_by_event_id = NULL) ', row_id);
EXECUTE q INTO r;
IF r THEN
q:= format('SELECT fk_dep FROM event_log WHERE id=%L', row_id);
EXECUTE q INTO fk_dep;
RETURN;
END IF;
SELECT t.row_id, t.event_id INTO row_id, event_id FROM recursively_find_restored_row_id(event_id, table_name, row_id) AS t;
q := format('SELECT EXISTS (SELECT 1 FROM %I WHERE id = %L) ', table_name, row_id);
EXECUTE q INTO r;
IF NOT r THEN
RAISE EXCEPTION 'cannot action %:% because it does not exist. How was the record deleted?', table_name, row_id;
END IF;
-- we need to check foreign key deps
-- in case we update a foreign key column to something that got deleted and then restored with a new id
q := format('SELECT fk_dep FROM event_log WHERE id=%L', event_id);
EXECUTE q INTO fk_dep;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION extract_table_name_from_fk_dep(fk_dep JSONB, colname TEXT, tbl_name OUT TEXT)
AS $$
BEGIN
IF (SELECT COUNT(*) FROM jsonb_object_keys(fk_dep -> colname)) <> 1 THEN
RAISE EXCEPTION 'expected that the fk map will only have one element in the colname key %, fk_map: %', colname, fk_dep;
END IF;
SELECT k INTO tbl_name FROM jsonb_object_keys(fk_dep->colname) AS k LIMIT 1;
END;
$$ LANGUAGE plpgsql;
-- When we delete a record and then restore it with an undo (same logic for redo), the restored
-- record has a different id from the original one we deleted, but we want to connect them in case
-- we chain undos or redos after restoring the delete.
-- For example: we update a record, delete it, restore it by undo, then undo the update.
-- The original updated record's row_id is not found, as it was deleted (we then restored it,
-- but its id is now different), so we need to walk up the event_log to find the new
-- row_id (descendant). We do so recursively in case there are more deletes and restores.
-- PARAMS
-- event_id: can be NULL. It's only used as an output variable, but since we are using recursion we must make it INOUT.
-- table_name and row_id are the main parameters that must be supplied.
-- RETURN
-- event_id: the event that restored the row after the delete
-- row_id: the row_id that got restored (the replacement row_id after we restored the deleted one)
-- This function looks for the latest row_id for the table name. For example, if table_name:row_id_3
-- got deleted, but then the delete got undone, creating table_name:row_id_4, then this function
-- is smart enough to return row_id_4 even though the input was row_id_3.
CREATE OR REPLACE FUNCTION recursively_find_restored_row_id(event_id INOUT BIGINT, table_name TEXT, row_id INOUT TEXT, depth INT DEFAULT 0)
AS $$
DECLARE
check_query TEXT;
check_result BOOLEAN;
voided_by_eid BIGINT;
BEGIN
-- RAISE NOTICE 'before-update-check %: % - depth: %', table_name, row_id, depth;
IF row_id IS NULL OR row_id='0' THEN
RAISE EXCEPTION 'Recursive search depth % - table %:% event_id % it is not allowed to have nil row_id', depth, table_name, row_id, event_id;
END IF;
IF depth > 100 THEN
RAISE EXCEPTION 'Recursive search depth % exceeds limit of 100 - table %:% event_id %', depth, table_name, row_id, event_id;
END IF;
check_query := format('SELECT EXISTS (SELECT 1 FROM %I WHERE id = %L) ', table_name, row_id);
EXECUTE check_query INTO check_result;
IF check_result THEN
RETURN ;
END IF;
-- we didn't find the row id in the table, so find who deleted it.
-- If it was deleted by an event that is not voided, then the
-- row_id doesn't have a corresponding descendant that can accept the update.
check_query := format('SELECT voided_by_event_id FROM event_log WHERE table_name=%L AND row_id=%L AND sql_op=%L',
table_name, row_id, 'DELETE');
EXECUTE check_query INTO voided_by_eid;
IF voided_by_eid IS NULL OR voided_by_eid=0 THEN
-- we didn't find anything. can't update
RAISE NOTICE 'did not find a descendant of the row id that can accept the action for %: %', table_name, row_id;
row_id := NULL;
event_id := NULL;
RETURN;
END IF;
check_query := format('SELECT row_id from event_log WHERE id=%L', voided_by_eid);
EXECUTE check_query INTO row_id;
SELECT t.row_id, t.event_id INTO row_id, event_id FROM recursively_find_restored_row_id(voided_by_eid, table_name, row_id, depth+1) AS t;
END;
$$ LANGUAGE plpgsql;
Step 5 register the trigger
The function must be triggered on all tables except the event_log table. This should run in the very last migration, after all your other tables have been created.
DO $$
DECLARE
table_rec RECORD;
BEGIN
FOR table_rec IN
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE' AND table_name != 'event_log'
LOOP
EXECUTE format('CREATE TRIGGER %I_event_trigger
AFTER INSERT OR UPDATE OR DELETE ON public.%I
FOR EACH ROW EXECUTE FUNCTION public.log_event();',
table_rec.table_name, table_rec.table_name);
END LOOP;
END $$;
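After the migration runs, you can sanity-check that every table got its trigger (this query only assumes the naming pattern used in the loop above):

```sql
-- List the triggers the loop above created, one row per table/operation.
SELECT event_object_table, trigger_name, event_manipulation
FROM information_schema.triggers
WHERE trigger_name LIKE '%event_trigger'
ORDER BY event_object_table, event_manipulation;
```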
Important points
Why do we use previous_state and edited_fields instead of just storing deltas?
Good question!
The answer is, like everything else in engineering, it's a tradeoff! In this case, a tradeoff between simplicity, traceability, and storage efficiency.
Yes, you can do it by storing only deltas.
Our approach (previous_state and edited_fields) is clearer and simpler for both humans and code: there is no need to compute diffs on the fly. It also allows bidirectional replay: with previous_state we roll back, and with edited_fields we reapply (we don't implement reapplying in this guide, though).
Its disadvantage compared to the delta approach is that it uses slightly more storage (we only store the edited fields, not the full object, so it's not too bad), since we store two maps instead of one. But that's fine, because it saves CPU time: we never have to compute diffs on the fly.
In a database setting, storage is MUCH cheaper than CPU time, so using slightly more space to save a lot of CPU power is a good tradeoff.
More about the fk_dep column
This table is critically important when we try to restore records that have a hierarchical structure. Imagine a cart table and a purchase table. Where the purchases are children of the cart. The purchase rows can't exist without a cart_id. If a user deleted the cart then so will he delete the purchases. And this will be recorded as 2 events
- 1. Delete purchases | user event
- 2. Delete cart | user event
This ordering of events is critical, because a purchase can't exist without a cart. Imagine that the cart has an id of 5; the fk_dep column of the purchase will then look like this:
{
belongs_to_cart: {carts_table: 5}
}
An undo will create the following events:
- Insert cart | undo event
- Insert purchases | undo event

The newly created cart will have a new id of 6; the previous id of 5 is not recycled. But here we have a big problem: the undo function applies values from the previous state, so it would use the old cart_id of 5, which doesn't exist anymore, and we would get a big fat error. The fk_dep column to the rescue! Since we stored the foreign key dependency, we can look up the event that voided the event that deleted cart_id 5. Voiding the delete means a new cart was inserted, and that new cart id is the one that replaced cart_id 5, so the id we should use is 6. The recursively_find_restored_row_id function above is exactly the recursive function that tracks these restored object ids.
Note that we are not allowed to recycle primary keys in the other tables; the id is always generated and always sequential. Say we have an Accounts table, and we created an account with id 1, then deleted that account, then performed an undo to restore it. The restored account's id will be 2, even though it's the exact same account. It is good practice never to recycle primary keys and to always increment.
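Continuing the cart example, the replacement id can be found by following voided_by_event_id from the delete event to the undo event that re-inserted the row. This is a one-step version of the walk that recursively_find_restored_row_id generalizes to chains (the 'carts' table name and the ids are illustrative):

```sql
-- One-step lookup: the undo event that voided the DELETE of carts row '5'
-- is itself the INSERT of the replacement row, so its row_id is the new id.
SELECT restored.row_id AS new_cart_id
FROM event_log del
JOIN event_log restored ON restored.id = del.voided_by_event_id
WHERE del.table_name = 'carts'
  AND del.row_id = '5'
  AND del.sql_op = 'DELETE';
-- With the example above this returns '6'. The recursive function repeats
-- this walk in case the row was deleted and restored several times.
```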
Stacking undos and redos
There are some considerations that need to be made when stacking undos and redos. The code already handles these cases, but I mentioned them here for educational purposes.
A redo can only reapply user actions, but can it also reapply other redos?
An undo_event can undo a user_event or a redo_event. A redo can only "cancel" an undo_event, in other words reapply a user action. But should it be able to reapply another redo? Short answer is yes, and here's why. Imagine the following:
- 1. user event (an edit on some table)
- 2. undo that event (remove user event and reverse user's action)
- 3. redo (cancel the undo, restore user event)
- 4. undo the redo (remove user event)
- 5. redo ( cancel the undo, restore user event)
- By action 5 you will run into a problem: the redo takes an undo event as its argument, then looks at the event that was voided by that undo, and in step 5 that event is another redo. A redo function that expects only user actions will be confused, so you need to make sure it accepts redo events too. This works because the edited_fields will be the same on the redo_event and the user_event.
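A minimal Python sketch of that dispatch rule, using assumed field names (`event_type`, `voids`, `edited_fields`) rather than the repo's actual schema:

```python
def redo(undo_event, event_log):
    """Re-apply the action that the given undo event cancelled."""
    if undo_event["event_type"] != "undo":
        raise ValueError("redo can only cancel an undo event")
    original = event_log[undo_event["voids"]]
    # The voided event may be a user event OR an earlier redo event; both
    # carry the same edited_fields, so both can be re-applied identically.
    if original["event_type"] not in ("user", "redo"):
        raise ValueError("unexpected event type: " + original["event_type"])
    return {"event_type": "redo", "voids": undo_event["voids"],
            "edited_fields": original["edited_fields"]}

log = {1: {"event_type": "user", "edited_fields": {"name": "new name"}},
       2: {"event_type": "undo", "voids": 1}}
log[3] = redo(log[2], log)                    # redo of a user event
log[4] = {"event_type": "undo", "voids": 3}
log[5] = redo(log[4], log)                    # step 5: redo of a redo event
assert log[5]["edited_fields"] == {"name": "new name"}
```

Without the `"redo"` case in the membership check, step 5 would raise instead of restoring the user's edit.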
When undoing events in a transaction, why do we create event logs rows that had no changes?
Let's say you had two Accounts, both with a balance of zero, and there was a Transfer of zero dollars from Account 1 to Account 2. You have 3 database writes happening in the same transaction: a1, a2, t1. Note that the only real change is the creation of the transfer record; the account rows were not changed, their balances are exactly the same.
This will be recorded as 3 user events, and if we undo we will also create 3 undo events. Not only that, these events will share a transaction id. This is to signal that these events are atomic and go together. If we didn't record the same transaction id on all of them, they would be identified as separate events, and a new undo operation could act on one of them without the others. That case can lead to data-integrity bugs. So we must always identify transaction events by giving them all the same transaction id, even if no values got changed. This is why we record a dummy change in the update_query helper function above: it makes sure the event-log trigger fires even though nothing really changed.
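The grouping rule can be sketched like this in Python (`txn_id` and the event shape are assumptions for illustration; the real implementation reads the transaction id column of the event_log):

```python
# Hypothetical log entries for the zero-dollar transfer; txn_id marks atomicity.
events = [
    {"id": 1, "table": "accounts",  "row": "a1", "txn_id": 7},  # no real change
    {"id": 2, "table": "accounts",  "row": "a2", "txn_id": 7},  # no real change
    {"id": 3, "table": "transfers", "row": "t1", "txn_id": 7},  # the real change
    {"id": 4, "table": "accounts",  "row": "a1", "txn_id": 8},  # a later edit
]

def events_to_undo(events):
    """Undo targets every event of the latest transaction, never a subset."""
    last_txn = max(e["txn_id"] for e in events)
    return [e for e in events if e["txn_id"] == last_txn]

assert [e["id"] for e in events_to_undo(events)] == [4]
# After txn 8 is undone, the next undo must grab all three txn-7 events together:
remaining = [e for e in events if e["txn_id"] != 8]
assert [e["id"] for e in events_to_undo(remaining)] == [1, 2, 3]
```

If events 1 and 2 were never written (because their rows "didn't change"), the second undo would reverse the transfer but leave no record that a1 and a2 belonged to the same atomic unit.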
Why do we even need a recursive search?
Recursion is fantastic for traversing tree data structures, and we use it to traverse the foreign-key dependency tree. But what does that mean exactly, and what happens if we don't use it?
Imagine the following sequence of events:
- User creates a new account in the account table
- User updates the account name
- User deletes the account
- User undoes the delete (so the undo re-inserts the account with a new ID)
- User undoes the update to get back the original name (BOOM, you get a big fat error here, because the event_log row you are undoing stores the row_id of the original account from before the delete, not the new id created when the undo re-inserted it. You need a recursive function that tracks the deletes and finds the correct current row_id, so the update can be undone on the restored account)
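Here is a hedged Python sketch of that recursive lookup (the field names `op`, `row_id`, and `voids_event` are my assumptions about the log schema, and the real version is a PLPGSQL function):

```python
def current_row_id(row_id, table, events):
    """Follow delete -> undo-insert chains to find the row's current id."""
    for ev in events.values():
        if ev["op"] == "INSERT" and "voids_event" in ev:
            voided = events[ev["voids_event"]]
            if (voided["op"] == "DELETE" and voided["table"] == table
                    and voided["row_id"] == row_id):
                # the row was deleted and re-inserted under a new id; recurse
                # in case that new id was itself deleted and restored again
                return current_row_id(ev["row_id"], table, events)
    return row_id

events = {
    1: {"op": "INSERT", "table": "accounts", "row_id": 1},  # create account
    2: {"op": "UPDATE", "table": "accounts", "row_id": 1},  # rename it
    3: {"op": "DELETE", "table": "accounts", "row_id": 1},  # delete it
    4: {"op": "INSERT", "table": "accounts", "row_id": 2,
        "voids_event": 3},                                  # undo of the delete
}
# Undoing the rename (event 2) must target id 2, not the stale id 1 in the log:
assert current_row_id(1, "accounts", events) == 2
```

The recursion is what handles delete/restore/delete/restore chains: each restore produces a fresh id, and we keep following the chain until we land on an id that was never deleted.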
Why do we allow undo chains to be interrupted by user events, but not redo chains?
Short answer: to prevent implicit dependencies and branching. This is a subtle but incredibly complicated problem, and the most intuitive solution is to not allow redos to be interrupted by user actions. Here is the explanation (fair warning: you need to pay attention and maybe read it more than once).
Think about this case. You have the financial_account table and the financial_transaction table (not to be confused with a database transaction):
- create 3 accounts: a1, a2, a3
- create two financial transactions, t1 and t2: t1 subtracts 50\$ from a1 and t2 adds 50\$ to a2, so now a1's balance is -50 and a2's balance is positive 50. a3's balance is still zero.
- update the amounts to 100\$, so now a1 is -100 and a2 is 100. a3 is still zero
- do an undo operation, so now a1 and a2 are back to -50 and 50.
- change the target of t2 to a3. so now a3 is 50 instead of zero, and a2 is now zero
- undo that t2 target change, so now a1 and a2 are back to -50 and 50. Now start applying redos; a problem happens here, can you detect it?
- redo. Now the 6th event's undo is cancelled and the 5th event is reapplied, so t2 points to a3 again: a3 is 50 instead of zero, and a2 is now zero.
- redo again. Now the 4th event's undo is cancelled and the 3rd event is reapplied, so a1 is -100 and a2 is 100. But a3 is still 50, and a3 should have been 0... this is a serious data-integrity bug.
What is going on? The bug is at the 8th event: a3 should be 0 instead of 50.
Here is a snapshot of the Event Log:
Event # | Type | Description |
---|---|---|
1 | User | Create a1, a2, a3 |
2 | User | t1: -50 from a1, t2: +50 to a2 |
3 | User | Update t1/t2 amount to 100 |
4 | Undo | Undo event 3 → a1 = -50, a2 = +50 |
5 | User | Change t2's target from a2 → a3 |
6 | Undo | Undo event 5 → t2 back to a2 |
7 | Redo | Redo event 5 → t2 target = a3, a1 = -50, a3 = 50 |
8 | Redo | Redo event 3 → a1 = -100, a2 = +100, a3 = 50❗❗ |
The 💥 Problem at Event 8
- t2 has been changed to point to a3, not a2 anymore.
- But we are replaying event 3, which assumes t2 is pointing to a2 and adds +100 to it.
⚠️ Why This Happens
There are implicit dependencies between events, and the redo logic isn't checking that the context is unchanged since the original event occurred.
In other words, event 3 depends on t2 being linked to a2, but that precondition is no longer true after redoing event 5.
So we are blindly reapplying old actions without validating whether they're still coherent.
Possible solutions
1. Best solution
- The simplest way to avoid event chains that lead to compound state is to prevent redos from jumping over user events. In other words, redo works only while the most recent events are undos; the moment the latest event is a user event, the redo fails because there is nothing left for it to redo.
- This solution is common in text editors and many user-facing software like Photoshop.
- The database state is directly caused by user actions (causal), so it's intuitive to an end user.
- Its causal nature allows for better traceability, since you don't get a state that depends on the order of previous compound events. The final state is always derivable from the last event; we don't need a specific ordering of previous events to arrive at it, so there's no ambiguity as to why a certain state is what it is.
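A small Python sketch of that guard, under the assumption that each log row carries a `type`, an `id`, and, for undo/redo events, a `voids` pointer (the function name `next_redoable` is hypothetical):

```python
def next_redoable(event_log):
    """Most recent un-voided undo event, or None if a user event is on top."""
    voided = {ev.get("voids") for ev in event_log if ev["type"] == "redo"}
    for ev in reversed(event_log):
        if ev["type"] == "undo" and ev["id"] not in voided:
            return ev
        if ev["type"] == "user":
            return None  # a user event interrupts the chain: nothing to redo
    return None

log = [{"type": "user", "id": 1},
       {"type": "undo", "id": 2, "voids": 1}]
assert next_redoable(log)["id"] == 2      # tail of the log is an undo: redo OK

log.append({"type": "user", "id": 3})     # a new user event lands on top
assert next_redoable(log) is None         # redo is now disallowed
```

Because the scan stops at the first user event, a redo can never "jump over" it and replay an action whose preconditions that user event may have invalidated.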
2. Implement event-chains, branching solution (very complicated)
In this case, the desired state will be the following (at event 8)
- user creates cash(a1), groceries(a2), and bills(a3) accounts.
- he makes a financial transaction of 50\$ from cash to groceries, so a1 is -50 and a2 is 50
- he updates that to 100\$, so a1 and a2 are -100 and 100
- he undoes that, and they are back to -50 and 50
- he switches from groceries to bills account, so a1 is -50, and bills is 50. groceries is 0
- he undoes that, and now a1 is -50, a2 is 50 and a3 is 0
- he redos, so now we're back to the state after event 5: a1 is -50, bills (a3) is 50, groceries (a2) is 0
- he redos again, so a1 is -100 and a3 is 100 (focus here), a2 is still zero
- Note that no single user event made a3 equal 100; it was arrived at through a series of undos and redos
- You must accept that state can result from chains of events, not just single user actions. This breaks the causal relationship between user actions and the final state.
Possible research we can look at
- A Temporal Model for Multi-Level Undo and Redo by W. Keith Edwards et al https://sites.cc.gatech.edu/people/home/keith/pubs/templogic-uist00.pdf