Kyungsu Kang

Posted on Mar 24

Understanding Snapshot Structures

#erd #snapshot #database #architecture

Problem Situation 1

While running a post service, a user reported a post at some point.

By the time it was checked, the post had already been deleted, leaving no way to verify whether the user's claim was true.

Fortunately, modern developers use a technique called soft-delete to solve this problem.

Instead of actually deleting data, they record the deletion date and make the post invisible if the date exists.

By checking the list of deleted data, the reported post was found, and the issue was resolved.

However...

Problem Situation 2

One day, a user reported a post again.

Once again, the post had already been deleted, so an attempt was made to check the deleted content, but surprisingly, the post was empty.

Did the user lie? Checking the last modified time of the post, it appeared that something had been edited before deletion.

It turns out that the post author was a developer (!!!).

Before deleting the post, they edited the content, replacing it with spaces.

Since a database is just a table with rows and columns like an Excel sheet, once modified, the previous content is lost.

So how can this issue be solved?

Implementing a Snapshot Structure

Article ERD

To solve this issue, there is a concept called snapshots.

Developers often associate snapshots with replication, but what I mean here is not history.

Rather than thinking of it as a past record, consider it as existing data that varies based on time.

When watching a streaming video on YouTube, is the footage from one second ago a past video while the current frame is the present video?

The data exists according to the chosen point in time—it is not a past artifact.

To understand this concept, let's create a snapshot table.

Suppose we have an Article table with the following seven columns:

id
writerId
title
content
createdAt
updatedAt
deletedAt

Among these, the mutable columns are the title and content.

To create a snapshot structure, we extract these mutable columns into a separate table.

We now have an ArticleSnapshot table that stores these mutable columns.

The snapshots are stored separately, making Article hold a 1:N relationship with snapshots.

Instead of updating posts, we now create new snapshots—eliminating the need for DB update queries.
The latest snapshot represents the currently visible version of the post.

Lastly, we remove unnecessary columns.

Since updates no longer exist, updatedAt is unnecessary in Article.
Similarly, updatedAt and deletedAt are unnecessary in ArticleSnapshot.

If a post is modified, a new snapshot is created; if deleted, deletedAt is recorded in Article.

With this structure, we can track changes and even restore previous versions when needed.

Both images above represent the same concept.

The latest snapshot is the currently visible version of the post.

Now, since a snapshot is created for every modification, the problems we previously encountered can be resolved!

Another Use Case: Products & Shopping Carts

Snapshots can also be applied to products.

Whenever a product's name changes, a new snapshot is created.

If green-colored users represent those who added a product to their cart, we can gain interesting insights.

For example, how does the number of people adding a product to their cart change based on name modifications?

While users will always see the latest name, administrators may not need to.

This helps analyze whether name changes affect sales, which could be useful for vendors in a marketplace.

That's why snapshots are not past data but actively used data!

Benefits of Using Snapshots

Whenever a post is modified, a new snapshot is created, and the latest snapshot represents the current state.

But as mentioned earlier, previous snapshots are still useful:

To verify reported posts by checking their original content before deletion.
To track post contents at the time comments were made.
To allow users to restore a post to a previous version.
To analyze whether post modifications impact SEO positively or negatively.

In real-world scenarios, data rarely exists as a single node without historical tracking.

For example, if someone comments on a post and the author later modifies it, the comment might seem out of place.

What if a dispute arises, and someone requests the original post content at the time of the argument?

Again, snapshots are not past data but active, useful data.

Whether in posts, products, or shopping carts, snapshots are essential for understanding historical context and performing analytics.

Without a snapshot structure, companies may have to rely on data lakes, which can be extremely expensive.

Conclusion: My Company Doesn't Use Snapshots

Yes, most companies don’t use them, and they probably won’t.

I only encountered this concept at my current job, and I believe it's rare for the following reasons:

Small companies don't prioritize data analytics, and few people understand its value.
Backend developers don’t see the need to retain past data.
- Many think soft-delete is enough.
- Startups are focused on immediate problem-solving rather than long-term planning.
- This leads to simply adding columns per requirement.
If leadership doesn’t value data, developers who spend time on design may be seen as slow.

However, as companies grow, data becomes increasingly important.

At some point, customers will start requesting historical data.

How will you respond when they ask for it?

Without snapshots, companies may end up adopting expensive data lake solutions.

Wouldn’t it be better to avoid that cost altogether?

If you enjoyed this article, I’ll write another one about designing commerce or marketing ERDs!

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

Struggling with slow API calls? Dan Mindru walks through how he used Sentry's new Trace View feature to shave off 22.3 seconds from an API call.

Get a practical walkthrough of how to identify bottlenecks, split tasks into multiple parallel tasks, identify slow AI model calls, and more.

Top comments (0)

5 Playwright CLI Flags That Will Transform Your Testing Workflow

--last-failed: Zero in on just the tests that failed in your previous run
--only-changed: Test only the spec files you've modified in git
--repeat-each: Run tests multiple times to catch flaky behavior before it reaches production
--forbid-only: Prevent accidental test.only commits from breaking your CI pipeline
--ui --headed --workers 1: Debug visually with browser windows and sequential test execution

Learn how these powerful command-line options can save you time, strengthen your test suite, and streamline your Playwright testing experience. Practical examples included!

Watch Video 📹️

DEV Community