Discussion on: Making The Invalid Impossible - Choosing The Right Data Model

This is fine for synchronous method calls but will fail in distributed systems, where asynchronous message passing means no service can determine whether some other service has updated a value (assuming we're not updating a database record in place, for instance). For this particular problem, you need conflict-free replicated data types (CRDTs).

Suppose multiple instructors submit grades for the same student (say, in a class with teaching assistants), and you are the Dean facing a couple of anxious parents out in the hallway who worry that their child is going to fail out and cost them tens of thousands of dollars. You would need to be able to tell whether any particular update from an instructor was authoritative. Failing that, your data type would need to record any discrepancy so that a human (such as the Dean) could arbitrate the definitive value, or the system would need to implement a strategy for deciding which conflicting write is the definitive one.
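As a rough illustration of a data type that "records any discrepancy", here is a minimal Python sketch of a grow-only set of grade submissions. The names `GradeWrite` and `GradeRegister` are made up for this example and do not come from the original post. Merging two replicas is a set union, so replicas converge no matter in which order they exchange state, and a disagreement stays visible for a human to arbitrate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeWrite:
    """One grade submission by one instructor (hypothetical type)."""
    instructor: str
    grade: str


@dataclass(frozen=True)
class GradeRegister:
    """Grow-only set of submissions for a single student (hypothetical type)."""
    writes: frozenset = frozenset()

    def submit(self, instructor: str, grade: str) -> "GradeRegister":
        # Submitting never overwrites anything; it only adds a write.
        return GradeRegister(self.writes | {GradeWrite(instructor, grade)})

    def merge(self, other: "GradeRegister") -> "GradeRegister":
        # Set union is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        return GradeRegister(self.writes | other.writes)

    def distinct_grades(self) -> set:
        return {w.grade for w in self.writes}

    def needs_arbitration(self) -> bool:
        # More than one distinct grade means the instructors disagree.
        return len(self.distinct_grades()) > 1
```

Two replicas that each accepted a different grade merge into a register where `needs_arbitration()` is true, regardless of merge order. Picking the definitive grade automatically (for example, last writer wins) would be a separate policy layered on top.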

Frank Rosner • Nov 6 '17

Thanks for the comment! I agree with you that the solution is certainly not meant to work in a distributed system. The focus was more about how you can reduce the amount of possible state, especially by avoiding invalid state. I believe that this is true for both local and distributed systems.

To avoid conflicts with concurrent access you can use locking. In this case I believe that optimistic locking would be a good fit, as conflicts when updating are probably rare. If one does occur, you'd have to retry.
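A minimal sketch of optimistic locking with retry, in Python. `GradeStore` is a hypothetical in-memory stand-in for a database row with a version column; the internal lock only models the atomic compare-and-set that a real store would provide, and the names are assumptions made for this example.

```python
import threading


class VersionConflict(Exception):
    """Raised when the row changed between our read and our write."""


class GradeStore:
    """Hypothetical in-memory store: student -> (version, grade)."""

    def __init__(self):
        self._lock = threading.Lock()  # models the store's atomic check-and-write
        self._rows = {}

    def read(self, student):
        with self._lock:
            return self._rows.get(student, (0, None))

    def write_if_version(self, student, expected_version, grade):
        with self._lock:
            current_version, _ = self._rows.get(student, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"expected version {expected_version}, found {current_version}"
                )
            self._rows[student] = (current_version + 1, grade)


def update_grade(store, student, decide, max_retries=5):
    """Read, compute the new grade, and write only if nobody else wrote first."""
    for _ in range(max_retries):
        version, old_grade = store.read(student)
        try:
            store.write_if_version(student, version, decide(old_grade))
            return
        except VersionConflict:
            continue  # someone else updated the row; re-read and try again
    raise VersionConflict(f"gave up after {max_retries} attempts")
```

No lock is held while `decide` runs, which is the point of the optimistic approach: the cost of a conflict is paid only when one actually happens.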

If you need to synchronize replicas of this data structure you can store the data in a distributed data store that has a mechanism for reaching consensus. Raft, for example, is a comparatively simple consensus algorithm that would allow you to have many replicas of the data and still be consistent.

Did I understand what you were referring to? How would you implement the solution using CRDTs?

maphengg • Nov 6 '17

Locking is fine in a single instance of an application but doesn't help in a distributed system where availability is the priority over consistency of data.

Frank Rosner • Nov 6 '17

I agree. You have to understand whether you want availability or consistency if there is a network partition. Based on your requirements you should select the coordination algorithm that fits your needs.

Aleksandr Sorokoumov • Nov 6 '17 • Edited

Hey @maphengg,

Even in the presence of concurrent record modification, the last data model @frosnerd proposed would not lead to an inconsistent state.
For example, there is no sequence of transactions that would lead to CourseResult containing a different number of students and records.
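To make that concrete, here is a small Python sketch. It assumes the model keys grades by student in a single map, which is my reading of the post rather than a copy of its code, and the `Grade` values and `with_grade` helper are invented for the example. Because the set of students is derived from the same map that holds the records, the two counts cannot drift apart, whatever order updates arrive in.

```python
from dataclasses import dataclass, field
from enum import Enum


class Grade(Enum):
    A = "A"
    B = "B"
    C = "C"
    D = "D"
    F = "F"


@dataclass(frozen=True)
class CourseResult:
    # Assumed shape: one map from student name to grade.
    records: dict = field(default_factory=dict)

    @property
    def students(self) -> set:
        # Derived from the records, never stored separately.
        return set(self.records)

    def with_grade(self, student: str, grade: Grade) -> "CourseResult":
        # Returns a new value; an existing entry for the student is replaced.
        return CourseResult({**self.records, student: grade})
```

Concurrent updates can still disagree about which grade a student gets, but every interleaving produces a value with exactly one record per student.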

Could you please elaborate on how CRDTs would help in the situation you describe? It is not clear to me how different grades for a single student can converge to a conflict-free state.

maphengg • Nov 6 '17 • Edited

If you have multiple instances of the service, and multiple users hitting different instances at the same time in different data centers, you will have conflicts. CRDTs are designed for this very situation. When I found this article I thought it would be about event sourcing, version vectors, or something related to that topic. All I wanted to do was point out that there's a narrow band of applications that this solution is appropriate for.
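For readers unfamiliar with version vectors, a minimal Python sketch of the idea follows. The function names and the plain-dict representation (replica id mapped to a counter) are assumptions for illustration, not any particular library's API. Each replica bumps its own counter on a write; comparing two vectors tells you whether one write happened before the other or whether they are concurrent and therefore in conflict.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors; missing replica ids count as 0."""
    replicas = a.keys() | b.keys()
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b; b supersedes a
    if b_le_a:
        return "after"       # b happened before a; a supersedes b
    return "concurrent"      # neither saw the other: a real conflict


def merge(a: dict, b: dict) -> dict:
    """Pointwise maximum, used after a conflict has been resolved."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}
```

For example, `compare({"dc1": 1}, {"dc2": 1})` reports `"concurrent"`: two data centers each accepted a write without having seen the other's.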

Frank Rosner • Nov 6 '17

Thanks for sharing! I think there is a misconception here. What I presented is not a solution. I was discussing ways of avoiding invalid state by choosing the right data model. It has nothing to do with distributed systems, multiple data centers, etc.

Sorry that you didn't find what you expected. Would you like to share some more information about event sourcing, version vectors, etc.? I am very interested :)

Although my post did not present a solution, I disagree that there is only a narrow band of applications where making invalid state impossible is appropriate. In fact, I believe that choosing a data model in which you cannot even represent invalid state is appropriate in every application out there, distributed or not.

maphengg • Nov 7 '17

The narrow band for this solution is a single-threaded application. That's totally fine; I just thought I would point out that it applies to that set of applications. What I expected to get from the title has to do with my own interests. I misread. Case closed.