Eventual Consistency: When to Choose It Over Strong Consistency

#career #distributedsystems #architecture #consistencymodels

When I first started building distributed systems, I expected all data to be up to date everywhere, instantly. The ACID guarantees of a classic SQL database put my mind at ease. But as the systems scaled, I saw firsthand how those guarantees slowed things down and turned the database into a bottleneck. That's when I turned to eventual consistency, and it opened up new options in my architectural decisions.

In this post, I'll explain the fundamental differences between strong consistency and eventual consistency, which one I chose in real-world scenarios from my own projects and why, and the trade-offs each brings. My goal isn't to give you the "right" answer, but to offer a perspective that will help you make informed decisions for your own systems.

Strong Consistency: Guarantees and Their Cost

When I think of strong consistency, the first thing that comes to mind is the guarantee that every read sees the most recent committed write, no matter which node or replica serves it. On a single database server, the ACID (Atomicity, Consistency, Isolation, Durability) properties of classic relational databases form the basis of this guarantee. (The "C" in ACID refers to integrity constraints being preserved, not to replicas agreeing with each other.) With the isolation level set to SERIALIZABLE, concurrently running transactions don't interfere with each other and produce the same result as if they had run one at a time.

I never compromised on strong consistency in the accounting module I developed for a bank's internal platform, or in the critical inventory-tracking areas of a manufacturing company's ERP. Incorrect balance or stock information there could cost the company millions of liras and even lead to legal issues. For example, to keep two concurrent transfers from the same account from both spending the same balance, locking the rows with SELECT ... FOR UPDATE or using the SERIALIZABLE isolation level in PostgreSQL was non-negotiable for me.

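```sql
-- Example of a SERIALIZABLE transfer in PostgreSQL
-- (hesaplar = accounts, bakiye = balance)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Lock both account rows in a fixed order (by id) to avoid deadlocks, and read the balance
SELECT id, bakiye FROM hesaplar WHERE id IN (123, 456) ORDER BY id FOR UPDATE;

-- Debit only if the balance is sufficient; the application checks that this
-- updated exactly one row and issues ROLLBACK instead of continuing if it didn't
UPDATE hesaplar SET bakiye = bakiye - 100 WHERE id = 123 AND bakiye >= 100;
UPDATE hesaplar SET bakiye = bakiye + 100 WHERE id = 456;

COMMIT;
```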
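Under SERIALIZABLE, PostgreSQL may abort a transaction with a serialization failure (SQLSTATE 40001) instead of blocking, and the client is expected to retry the whole transaction. A minimal retry wrapper, sketched in Python: `SerializationFailure` is a hypothetical stand-in for whatever exception your database driver raises for SQLSTATE 40001, and `flaky_transfer` is a fake transaction used only to show the behavior.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SerializationFailure(Exception):
    """Hypothetical stand-in for the exception your driver raises on SQLSTATE 40001."""


def run_serializable(txn: Callable[[], T], max_attempts: int = 5,
                     base_delay_s: float = 0.01) -> T:
    """Run txn and retry it from the start if it aborts with a serialization failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # txn must run the whole transaction (BEGIN ... COMMIT) so a retry starts fresh
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing transactions are less likely to collide again
            time.sleep(base_delay_s * (2 ** (attempt - 1)) * random.random())
    raise AssertionError("unreachable")


# Usage: a fake transaction that fails twice before committing
attempts = {"n": 0}


def flaky_transfer() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


print(run_serializable(flaky_transfer))  # committed
```

The key design point is that the retry re-runs the entire transaction, including the balance check, because the data it read the first time may have changed.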

This guarantee comes at a high price. In distributed systems, strong consistency usually means higher latency, lower throughput, and a more complex architecture. For example, if an UPDATE doesn't return until every replica has acknowledged it, the system's response time is bound by the slowest replica. In the systems I worked on, pushing past 50-100 transactions per second during traffic peaks was challenging. There was always a bottleneck, usually lock contention in the database or network latency.
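To make the latency point concrete: if a write returns only after a certain number of replicas acknowledge it, its latency is set by the slowest replica among those it waits for. A tiny Python sketch, assuming fixed per-replica round-trip times (the numbers are illustrative, not measurements from my systems):

```python
def write_latency_ms(replica_rtts_ms: list[float], acks_required: int) -> float:
    """Latency of a write that waits for `acks_required` acknowledgements.

    Acks arrive in order of round-trip time, so the write completes when the
    acks_required-th fastest replica responds.
    """
    if not 1 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 1 and the number of replicas")
    return sorted(replica_rtts_ms)[acks_required - 1]


rtts = [2.0, 3.0, 40.0]  # illustrative: two nearby replicas and one distant replica
print(write_latency_ms(rtts, acks_required=3))  # 40.0: wait for every replica
print(write_latency_ms(rtts, acks_required=2))  # 3.0: wait for a majority
print(write_latency_ms(rtts, acks_required=1))  # 2.0: wait for the first ack only
```

A single distant or overloaded replica is enough to drag every "wait for all" write up to its round-trip time.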

Eventual Consistency: Flexibility and Its Cost

Eventual consistency, as the name suggests, only guarantees that data will become consistent eventually. When data is updated, the update may take some time to propagate to every copy in the system, and during that window users reading the same data from different locations might see different values. The guarantee is that if no new updates arrive, all copies eventually converge to the same value.
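A toy simulation of that convergence, in Python: each replica keeps a (timestamp, value) pair, a write lands on one replica first, and a hypothetical anti-entropy `sync` step merges state between replicas using last-writer-wins on the timestamp. It's a sketch of the idea under those assumptions, not how any particular database replicates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    ts: int = 0                   # logical timestamp of the latest write seen
    value: Optional[str] = None

    def write(self, ts: int, value: str) -> None:
        self.ts, self.value = ts, value

    def merge_from(self, other: "Replica") -> None:
        # Last-writer-wins on (timestamp, value); comparing values as well breaks
        # ties deterministically, so replicas converge even if two writes share a timestamp
        if (other.ts, other.value or "") > (self.ts, self.value or ""):
            self.ts, self.value = other.ts, other.value


def sync(replicas: list[Replica]) -> None:
    """One anti-entropy round: every replica merges state from every other replica."""
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.merge_from(b)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write(ts=1, value="v1")
print(r2.value)  # None: r2 has not received the update yet (a stale read)
sync([r1, r2, r3])
print({r.name: r.value for r in (r1, r2, r3)})  # {'r1': 'v1', 'r2': 'v1', 'r3': 'v1'}
```

Between the write and the sync, a reader hitting r2 sees stale data; once writes stop and syncs keep running, every replica ends up with the same value.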

When processing user activity logs for my side product or generating a "popular products" list for a website, there was no need for every log or popularity metric to be instantly up-to-date everywhere. A delay of a few seconds, or even a few minutes, wouldn't cause a problem. For example, whether a counter for a product a user liked increased instantly or not didn't significantly affect the overall user experience. In such scenarios, I leveraged eventual consistency using a cache system like Redis or a message queue like Apache Kafka.

💡 Eventual consistency with Redis

When I used Redis for session management or real-time counters, I wrote to Redis first, before the primary data source, so the user saw the new value immediately. In the background, a worker read the data from Redis and persisted it to the main database. This write-behind model gave the user quick feedback while also reducing the load on the main database. The trade-off is that anything written to Redis but not yet persisted can be lost if Redis fails before the worker runs.
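A minimal sketch of that write-behind flow in Python. Plain dicts stand in for Redis and the main database so the example runs on its own; in a real setup the increment would be something like Redis `INCRBY` and the flush a bulk database write. `WriteBehindCounters`, `incr`, and `flush` are hypothetical names.

```python
import threading
from collections import defaultdict


class WriteBehindCounters:
    """Counters that answer from a fast store and are persisted to the database later."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict[str, int] = defaultdict(int)  # stand-in for Redis
        self.db: dict[str, int] = defaultdict(int)        # stand-in for the main database

    def incr(self, key: str, amount: int = 1) -> int:
        """Record an increment and return the value the user should see right away."""
        with self._lock:
            self._pending[key] += amount
            return self.db[key] + self._pending[key]

    def flush(self) -> int:
        """Background worker step: apply all pending deltas to the database as one batch."""
        with self._lock:
            batch, self._pending = self._pending, defaultdict(int)
            for key, delta in batch.items():
                self.db[key] += delta  # in a real system: one bulk upsert per flush
        return len(batch)


counters = WriteBehindCounters()
counters.incr("likes:prod_abc")
print(counters.incr("likes:prod_abc"))  # 2: the user sees the new value immediately
print(counters.db["likes:prod_abc"])    # 0: not persisted yet
counters.flush()
print(counters.db["likes:prod_abc"])    # 2: persisted by the background worker
```

Storing deltas rather than absolute values lets many increments collapse into one database write per key per flush.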

This flexibility, of course, comes at a cost: data staleness and the need for conflict resolution. If a user sees an outdated copy of the data, this can lead to confusion or incorrect operations. That's why, when I opted for eventual consistency, I had to think upfront about how to manage these potential inconsistencies and how to explain them to the user.

Real-World Choices: My Experiences

Finding a balance between strong and eventual consistency in my architectural decisions was usually shaped by business requirements and performance expectations. When developing an ERP for a manufacturing company, strong consistency was indispensable for modules like raw material inventory tracking or shipment lists. Incorrect stock information could halt the production line or lead to incorrect shipments. Here, an average of 80-100 critical transactions occurred per second, and the accuracy of each was vital.

However, for production speed indicators on operator screens in the same ERP, or for real-time visualization of sensor data across the factory, eventual consistency was more than sufficient. Since it was a metric changing every second, seeing it with a 5-second delay wouldn't bother anyone. In fact, this allowed us to process thousands of data points per second and provide near real-time dashboards. This dramatically reduced the load on the database, improving the overall system performance.

For example, a stock table in PostgreSQL operating with strong consistency:

```sql
CREATE TABLE stoklar (
    urun_id UUID PRIMARY KEY,
    miktar INTEGER NOT NULL,
    son_guncelleme TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

And a production statistics table operating with eventual consistency:

```sql
CREATE TABLE uretim_istatistikleri (
    operator_id UUID,
    vardiya_tarihi DATE,
    uretim_sayisi INTEGER,
    toplam_gecikme_ms INTEGER,
    last_processed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    PRIMARY KEY (operator_id, vardiya_tarihi)
);
```

My approach to these two tables was completely different. In the stock table, every transaction had to be committed instantly and propagated to all replicas. Production statistics, on the other hand, could be collected by a worker every 30 seconds and updated in bulk. This approach allowed different parts of the system to be optimized for different needs.
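The 30-second batch could look roughly like this. A minimal Python sketch, assuming the worker receives raw production events as (operator_id, shift date, units, delay in ms) records, a hypothetical shape, and folds them into one row per primary key of uretim_istatistikleri. Each flush then becomes a single bulk upsert (in PostgreSQL, `INSERT ... ON CONFLICT (operator_id, vardiya_tarihi) DO UPDATE` adding the batch totals to the stored ones) instead of one write per event. Operator ids are plain strings here for brevity.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, NamedTuple


class ProductionEvent(NamedTuple):
    operator_id: str
    vardiya_tarihi: date  # shift date
    units: int
    delay_ms: int


Row = tuple[str, date, int, int]  # (operator_id, vardiya_tarihi, uretim_sayisi, toplam_gecikme_ms)


def aggregate_batch(events: Iterable[ProductionEvent]) -> list[Row]:
    """Fold one batch of events into a single row per (operator_id, vardiya_tarihi) key."""
    totals: dict[tuple[str, date], list[int]] = defaultdict(lambda: [0, 0])
    for e in events:
        acc = totals[(e.operator_id, e.vardiya_tarihi)]
        acc[0] += e.units
        acc[1] += e.delay_ms
    return [(op, day, units, delay) for (op, day), (units, delay) in sorted(totals.items())]


d = date(2024, 1, 15)
batch = [
    ProductionEvent("op-1", d, 5, 120),
    ProductionEvent("op-1", d, 3, 0),
    ProductionEvent("op-2", d, 7, 40),
]
for row in aggregate_batch(batch):
    print(row)
# ('op-1', datetime.date(2024, 1, 15), 8, 120)
# ('op-2', datetime.date(2024, 1, 15), 7, 40)
```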

Managing Eventual Consistency: Mechanisms and Tactics

Working with eventual consistency doesn't mean "we can ignore the data" just because "data will eventually be consistent." On the contrary, it requires much more careful planning of inconsistency moments and how they will be resolved. In my projects, I used a few core mechanisms and tactics to manage this situation:

Idempotency: Using an idempotency key to ensure that the same request is processed only once, even if it arrives multiple times, became indispensable for my APIs. Especially in payment systems or critical operations, resending the same request due to network errors was a common scenario I encountered.
```
POST /api/orders HTTP/1.1
Content-Type: application/json
Idempotency-Key: e4d8c7a6-f1b2-4c3d-9e5a-7b0f1c2d3e4f

{
    "product_id": "prod_abc",
    "quantity": 1
}
```
Thanks to this Idempotency-Key, the server can reject a second request with the same key without processing it, or return the result of the first operation.
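On the server side, that behavior can be sketched in a few lines of Python. The first request for a key runs the handler and stores its response; a repeat with the same key and body gets the stored response; a repeat with the same key but a different body is rejected. The class and function names are hypothetical, and a production version would keep keys in a shared store with an expiry and handle two requests with the same key arriving at the same moment.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotencyConflict(Exception):
    """The same Idempotency-Key was reused with a different request body."""


class IdempotentEndpoint:
    def __init__(self, handler: Callable[[dict], Any]) -> None:
        self._handler = handler
        # key -> (fingerprint of the request body, stored response)
        self._seen: dict[str, tuple[str, Any]] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def handle(self, idempotency_key: str, body: dict) -> Any:
        fp = self._fingerprint(body)
        if idempotency_key in self._seen:
            stored_fp, response = self._seen[idempotency_key]
            if stored_fp != fp:
                raise IdempotencyConflict(idempotency_key)
            return response  # replay: return the first result without re-running the handler
        response = self._handler(body)
        self._seen[idempotency_key] = (fp, response)
        return response


orders: list[dict] = []


def create_order(body: dict) -> dict:
    orders.append(body)
    return {"order_id": len(orders), "status": "created"}


api = IdempotentEndpoint(create_order)
key = "e4d8c7a6-f1b2-4c3d-9e5a-7b0f1c2d3e4f"
first = api.handle(key, {"product_id": "prod_abc", "quantity": 1})
retry = api.handle(key, {"product_id": "prod_abc", "quantity": 1})
print(first == retry, len(orders))  # True 1
```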
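Transactional Outbox Pattern: I frequently used the Transactional Outbox pattern to make two separate operations, such as writing a record to the database and sending a message to a message queue, behave atomically. Instead of publishing directly, the service writes the message to an outbox table in the same database transaction as the record, and a separate relay reads the outbox and publishes its entries. If the transaction commits, the message will eventually be sent; if it rolls back, no message is sent. Without this, the database and the message queue could drift apart, for example when the database write succeeds but the publish fails.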
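A self-contained sketch of the pattern, using Python's built-in sqlite3 module in place of the real database and a plain callback in place of the message broker (both assumptions made so the example runs anywhere; the table and function names are hypothetical). The order row and its outbox row are written in one transaction, and a relay publishes unsent rows and marks them as sent. If the relay crashes after publishing but before marking a row, that message is sent again on the next run, so delivery is at-least-once and consumers should be idempotent.

```python
import json
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id TEXT, quantity INTEGER);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT,
                         payload TEXT, sent INTEGER NOT NULL DEFAULT 0);
""")


def place_order(product_id: str, quantity: int) -> int:
    # One transaction: both rows are committed together, or neither is
    with conn:
        cur = conn.execute("INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                           (product_id, quantity))
        order_id = cur.lastrowid
        payload = json.dumps({"order_id": order_id, "product_id": product_id,
                              "quantity": quantity})
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order_created", payload))
    return order_id


def relay_outbox(publish: Callable[[str, str], None]) -> int:
    """Publish unsent outbox rows in order, marking each one as sent after publishing."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # a crash right after this line means a re-send later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


published: list[tuple[str, str]] = []


def fake_publish(topic: str, payload: str) -> None:
    published.append((topic, payload))  # stand-in for a real message-queue producer


place_order("prod_abc", 1)
print(relay_outbox(fake_publish))  # 1
print(relay_outbox(fake_publish))  # 0: nothing left to send
```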
Conflict Resolution: If two users update the same data concurrently and these updates are processed on different servers, a conflict arises. In that case, you need to define a strategy such as last-writer-wins, merging, or custom business logic. When designing the synchronization mechanism for one of my side products, I implemented the last-writer-wins logic incorrectly and realized some updates were being lost. Users continued to see old data for a while, which was frustrating. I then solved the problem by adding versioning and custom merge logic.
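One common form of such versioning is the version vector; I'm using it here as an illustration rather than as the exact scheme from that project. Each replica increments its own counter on every local update, and comparing two vectors tells you whether one version supersedes the other or whether they are concurrent, which is exactly the case last-writer-wins resolves by silently dropping one update. A minimal Python sketch; the merge rule (set union over a record's tags) is a hypothetical example of custom merge logic.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors: 'equal', 'before' (a precedes b), 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def reconcile(va: dict[str, int], tags_a: set[str],
              vb: dict[str, int], tags_b: set[str]) -> tuple[dict[str, int], set[str]]:
    """Keep the newer version; for concurrent versions, merge instead of dropping one."""
    order = compare(va, vb)
    if order in ("equal", "after"):
        return va, tags_a
    if order == "before":
        return vb, tags_b
    # Concurrent: take the element-wise max of the vectors and merge the data.
    # The replica that stores this result should then increment its own counter.
    merged = {n: max(va.get(n, 0), vb.get(n, 0)) for n in va.keys() | vb.keys()}
    return merged, tags_a | tags_b  # hypothetical merge rule: union keeps both edits


laptop = {"phone": 1, "laptop": 1}  # edited on the laptop after syncing the phone's first edit
phone = {"phone": 2}                # edited on the phone at the same time
print(compare(laptop, phone))       # concurrent
print(compare({"phone": 1}, phone)) # before
```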

⚠️ The Deceptiveness of Eventual Consistency

The word "eventual" can lead developers to think "it will sort itself out somehow." In reality, it means you have to actively design how and when inconsistencies get resolved. Otherwise, you may end up with silent data loss or long-lived inconsistencies in your system.

Decision Criteria: Finding the Right Approach

Choosing between strong and eventual consistency is not a "good" or "bad" decision; it depends entirely on your needs and the context of your project. I usually make a decision by considering the following criteria:

| Criterion | Strong Consistency | Eventual Consistency |
|---|---|---|
| Business domain | Finance, accounting, inventory management, order processing | Social media feeds, IoT data, analytics, logs |
| Data criticality | High (data loss or inaccuracy unacceptable) | Medium to low (temporary inconsistency tolerable) |
| User experience | Users expect instantly updated data | A slight delay or stale data is acceptable |
| Performance needs | Accuracy takes priority over throughput; lower transaction volume | High transaction volume, low write latency |
| Scalability | Harder and more costly to scale horizontally | Easier and more cost-effective to scale horizontally |
| Cost | Higher (infrastructure, maintenance) | Lower |

My general approach is to start with the simplest solution that meets the requirements and add complexity only as needed. If business requirements don't strictly demand strong consistency, starting with eventual consistency gives more flexibility and scalability, and in the critical areas it's easier to add strong consistency guarantees or isolate those parts. For example, in a manufacturing ERP, I used a separate database with strong consistency guarantees for financial transactions, while building a separate, more flexible, eventually consistent architecture for the production tracking screens.

Common Mistakes and My Solutions

I made many mistakes while using these two consistency models and learned lessons from them. One of the most common mistakes was misinterpreting the word "eventual" and assuming that data would "somehow" become consistent. This could lead to data loss or long-term inconsistent states.

When designing the synchronization mechanism for one of my side products, I forgot to add a manual sync trigger. Users needed a way to sync their data manually when they suspected it wasn't up to date, but I had assumed everything would be handled automatically in the background. As a result, some users kept seeing old data for a while, and their feedback was understandably frustrated. After that incident, I understood the importance of always giving users transparent feedback, such as "This data may not be current; you can sync it now if you want."
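A minimal sketch of that kind of feedback in Python: compare the time of the last successful sync with an acceptable age and, if it's exceeded, return a message the UI can show next to a manual sync button. The function name and the 5-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def staleness_notice(last_synced_at: datetime, now: datetime,
                     max_age: timedelta = timedelta(minutes=5)) -> Optional[str]:
    """Return a user-facing notice if the local copy is older than max_age, else None."""
    age = now - last_synced_at
    if age <= max_age:
        return None
    minutes = int(age.total_seconds() // 60)
    return f"Data was last synced {minutes} min ago and may not be current. Sync now?"


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(staleness_notice(now - timedelta(minutes=2), now))   # None
print(staleness_notice(now - timedelta(minutes=12), now))
# Data was last synced 12 min ago and may not be current. Sync now?
```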

Another mistake was designing a workflow that expected strong consistency on top of a system with eventual consistency: for example, triggering another operation immediately after an order was created, assuming the stock deduction had already happened. If stock information is updated with eventual consistency, this can lead to wrong decisions or problems like overselling. In such cases, I had to redesign the workflow to account for eventual consistency or add a waiting mechanism (e.g., polling) between critical steps. I often implemented this with systemd timers or a message queue's delayed-delivery features.
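For the polling variant, a minimal Python sketch: before the next step runs, poll a read until it reflects the write (for example, until the stock deduction for an order is visible), backing off between attempts and giving up after a deadline. `wait_until` and `stock_deduction_visible` are hypothetical names, and a timeout should send the order to a retry or compensation path rather than proceed as if the stock had already been deducted.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float = 5.0,
               initial_interval_s: float = 0.05, max_interval_s: float = 1.0) -> bool:
    """Poll `condition` with exponential backoff; return True once it holds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    interval = initial_interval_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))  # never sleep past the deadline
        interval = min(interval * 2, max_interval_s)


checks = {"n": 0}


def stock_deduction_visible() -> bool:
    # Hypothetical stand-in: becomes true on the third read, as if replication caught up
    checks["n"] += 1
    return checks["n"] >= 3


if wait_until(stock_deduction_visible, timeout_s=2.0, initial_interval_s=0.01):
    print("stock updated, continue with the next step")
```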

```ini
# Example of running a consistency check script regularly with a systemd timer
# /etc/systemd/system/check-consistency.service
[Unit]
Description=Data consistency check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check_consistency_script.sh

# /etc/systemd/system/check-consistency.timer
# By default a timer activates the service with the same name (check-consistency.service)
# Enable with: systemctl enable --now check-consistency.timer
[Unit]
Description=Run data consistency check every 5 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

With such timers, I made the "eventual" part of eventual consistency more reliable by performing consistency checks in the background.
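The contents of check_consistency_script.sh aren't shown here, so as a hypothetical illustration: a consistency check often boils down to reconciling a source of truth against a derived copy and reporting the keys that differ. A minimal Python sketch, assuming both sides can be loaded as key-to-value mappings (for example, stock quantities from the primary table and from a read-side projection); `find_drift` is a hypothetical name.

```python
from typing import Optional


def find_drift(source: dict[str, int],
               replica: dict[str, int]) -> list[tuple[str, Optional[int], Optional[int]]]:
    """Return (key, source_value, replica_value) for each key that is missing or differs.

    None marks a key that is absent on that side.
    """
    drift = []
    for key in sorted(source.keys() | replica.keys()):
        s, r = source.get(key), replica.get(key)
        if s != r:
            drift.append((key, s, r))
    return drift


primary = {"urun-1": 10, "urun-2": 4, "urun-3": 0}
projection = {"urun-1": 10, "urun-2": 5}
for key, expected, actual in find_drift(primary, projection):
    print(f"{key}: source={expected} replica={actual}")
# urun-2: source=4 replica=5
# urun-3: source=0 replica=None
```

Run from the oneshot service, a script like this would typically exit with a non-zero status when it finds drift, so the failure shows up in the unit's status and journal.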

Conclusion: The Right Choice Based on Context

In conclusion, there is no "one-size-fits-all" solution in our architectural decisions. Both strong consistency and eventual consistency approaches have their own advantages and disadvantages. The important thing is to find the right balance by thoroughly understanding your project's requirements, budget, team's expertise, and performance goals. In my experience, a hybrid approach often yields the best results: maintaining strong consistency for critical workflows while leveraging the flexibility of eventual consistency in other areas that require scalability and performance.

Remember, every decision we make when designing a system comes with trade-offs. What matters is being aware of them and choosing the most suitable option. For a deeper dive into this topic, see my article on error management in distributed systems.