DEV Community

Cover image for Linking Data in PostgreSQL Without Explicit Keys: A Practical Guide
JetThoughts Dev for JetThoughts

Posted on • Edited on • Originally published at jetthoughts.com

Linking Data in PostgreSQL Without Explicit Keys: A Practical Guide

Databases often evolve in unpredictable ways. Tables that were initially designed to operate independently might later need to be connected. This situation can arise in various scenarios, such as when data is generated by separate processes, systems, or applications that lack coordination.

In this article, we’ll explore how to link two tables in PostgreSQL based on their creation timestamps when no explicit keys exist. Specifically, we’ll demonstrate how to establish a relationship between records where the difference in creation times is no more than 2 seconds.

The Problem

Let’s consider a realistic example: an e-commerce platform where orders and payments were initially tracked independently.

  • No explicit linking keys were added between these tables because they were designed for separate workflows.
  • Payments and orders are created by different processes, introducing potential time delays.
  • The only feasible way to connect them is by using their created_at timestamps, assuming payments happen within a few seconds of the corresponding order being placed.

The Data

Here’s how the tables are structured:

Orders Table

Column Type Description
id SERIAL Primary key
order_number VARCHAR Unique identifier for the order
created_at TIMESTAMP Timestamp of order creation

Payments Table

Column Type Description
id SERIAL Primary key
transaction_id VARCHAR Unique identifier for the payment
created_at TIMESTAMP Timestamp of payment creation

Sample data:

INSERT INTO orders (order_number, created_at)
VALUES
    ('ORD001', '2024-12-01 10:00:00'),
    ('ORD002', '2024-12-01 10:00:05');

INSERT INTO payments (transaction_id, created_at)
VALUES
    ('TXN12345', '2024-12-01 10:00:01'),
    ('TXN67890', '2024-12-01 10:00:07');
Enter fullscreen mode Exit fullscreen mode

The Solution

Therefore, we need to create that relation among these tables where the created time difference between the created_at values in both tables is not longer than 2 seconds apart. This can be achieved with a query calculated on time difference via EXTRACT(EPOCH) filtering out the rows that fulfill that condition.

Creating a View

CREATE VIEW order_payment_links AS
SELECT
    o.id AS order_id,
    o.order_number,
    p.id AS payment_id,
    p.transaction_id,
    o.created_at AS order_created_at,
    p.created_at AS payment_created_at
FROM
    orders o
JOIN
    payments p
ON
    ABS(EXTRACT(EPOCH FROM (o.created_at - p.created_at))) <= 2;
Enter fullscreen mode Exit fullscreen mode

This view dynamically generates the linked data without modifying the original tables.

Querying the View
You can retrieve the linked data with a simple query:

SELECT * FROM order_payment_links;
Enter fullscreen mode Exit fullscreen mode

Result:

order_id order_number payment_id transaction_id order_created_at payment_created_at
1 ORD001 1 TXN12345 2024-12-01 10:00:00 2024-12-01 10:00:01

Here, the first order (ORD001) is linked to the first payment (TXN12345) because the timestamps differ by only 1 second. The second order and payment are not linked due to the greater time difference.

Why This Approach Works

This solution is effective in scenarios where explicit foreign keys are missing, and retrofitting them isn’t feasible. By leveraging timestamp-based linking:

  • No schema changes are required, making it safe for existing systems.
  • Flexible linking is possible, based on business-specific rules.

However, this method has limitations:

  • Timestamp precision is critical; inaccurate timestamps could lead to incorrect links.
  • Performance may degrade as the dataset grows because the query compares every possible pair.

Conclusion

When working with systems where tables were not originally designed to be related, linking records by timestamps can provide a practical workaround. PostgreSQL’s robust query capabilities make it easy to establish relationships dynamically.

This approach isn’t perfect for every use case, but it’s a valuable tool when retrofitting connections into an existing database without disrupting the current setup.

Top comments (0)