There's this famous database design problem about account balance. Should you store the account balance in a column, or in a table of transactions and calculate it on the fly whenever you need it?
If you have an account balance column, you'll always depend on the back-end application, database triggers, or something similar to keep the field up to date.
If you have a transactions table, you'll have to process a lot of data to get this information, which costs I/O and time. That doesn't seem to scale well.
And, in my view, both can run into concurrency issues, right?
My question is: how do banks do it?
Search around for event sourcing. It is a standard way (ledger-like) to keep important data and events. It is like your 2nd version, but to tackle the downside of processing, you can take snapshots every now and then. Basically, you can generate the 1st version if you have the raw data (the 2nd).
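A minimal sketch of that snapshot idea in Python (the `Event`/`Ledger` names and cents-based amounts are invented for the example): the event log stays the source of truth, and a periodic snapshot means a balance read only replays events newer than the snapshot.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # "deposit" or "withdrawal"
    amount: int    # money in cents, to avoid float rounding

class Ledger:
    """Append-only event log with periodic balance snapshots."""

    def __init__(self, snapshot_every=100):
        self.events = []            # the source of truth (your 2nd version)
        self.snapshot = (0, 0)      # (events_covered, balance) at snapshot time
        self.snapshot_every = snapshot_every

    def append(self, event):
        self.events.append(event)
        # Take a snapshot periodically so reads never replay the full log.
        if len(self.events) % self.snapshot_every == 0:
            self.snapshot = (len(self.events), self._replay_from(self.snapshot))

    def _replay_from(self, snapshot):
        count, balance = snapshot
        for e in self.events[count:]:
            balance += e.amount if e.kind == "deposit" else -e.amount
        return balance

    def balance(self):
        # Replay only the events newer than the last snapshot.
        return self._replay_from(self.snapshot)
```

The snapshot is disposable: if it were ever lost or suspect, a full replay of the log regenerates it, which is exactly the "you can generate the 1st version from the 2nd" point.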
Event sourcing really blew my mind; Redux uses event sourcing to create predictable application state from events. At a high level, if you understand Redux, you understand how bank transactions might work.
Look into CQRS-ES! CQRS is powerful when combined with ES!
I've never written software for banks/financials, but the event sourcing guy (Greg Young) did...
In your #2, the bank probably wouldn't recalculate the balance on every read. They would keep the transactions as the source of truth, and the account balance in a secondary model (for example, a different table). As new transactions are entered, the secondary model is also updated to reflect the effect on the account balance. So there isn't a heavy cost in reading balance information. In effect it is doing the same thing as the account balance column, but it is an optimization, not the source of truth. I'd also recommend looking at Pat Helland's article, Accountants Don't Use Erasers.
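A sketch of that "transactions as source of truth, derived balance table" pattern, using an in-memory SQLite database (the table and column names are made up; the upsert syntax needs SQLite ≥ 3.24). Writing the transaction row and the derived balance row in one database transaction also speaks to the concurrency worry: both land together or not at all.

```python
import sqlite3

# In-memory database standing in for the bank's store.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY,
                               account TEXT, amount_cents INTEGER);
    CREATE TABLE balances     (account TEXT PRIMARY KEY,
                               balance_cents INTEGER);
""")

def post_transaction(account, amount_cents):
    # One atomic unit: the transaction row (source of truth) and the
    # derived balance row are committed together or not at all.
    with db:
        db.execute(
            "INSERT INTO transactions (account, amount_cents) VALUES (?, ?)",
            (account, amount_cents))
        db.execute(
            """INSERT INTO balances (account, balance_cents) VALUES (?, ?)
               ON CONFLICT(account) DO UPDATE
               SET balance_cents = balance_cents + excluded.balance_cents""",
            (account, amount_cents))

def read_balance(account):
    # Cheap read from the derived table; no replay of transactions needed.
    row = db.execute("SELECT balance_cents FROM balances WHERE account = ?",
                     (account,)).fetchone()
    return row[0] if row else 0
```

If the derived table ever drifted, it could be rebuilt from `SUM(amount_cents)` over `transactions`, which is what makes it an optimization rather than the source of truth.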
At the end of the day, you have to find out how the business itself works and model software to match that. Concurrency and other issues are human problems as well as digital ones (how often do you see two people writing on the same piece of paper at the same time?), so the business likely and necessarily already has policies and procedures in place to deal with it. I usually find that modeling the way the business works obviates many technical challenges.
The truth is, they do both and more.
The accounts table has account information and is updated with a running total balance along with other items such as amount of overdraft/credit available and number of account holders. This table is used for day to day display of account data to end users and customers.
Then, as you rightly say, there is a transactions table. This lists out each discrete transaction along with detail on type, references, amount, etc.
It is rare that the bank will use the transactions data to calculate an account balance, but it is used for the reconciliations that run. The transactions table is also heavily used by analysts and fraud systems to monitor for transaction fraud and velocity changes.
Aside from the above two tables, they also tend to have a transactions history table and the same for accounts. These become the long log of data you would expect to see, whilst the main tables are typically limited. This is why many banks can only show X days or Y months of transaction data to customers when they log in to an app or online. The live tables become quicker to process since they aren't holding all of the data.
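A toy illustration of that live/history split, assuming the live store is capped by row count rather than by date (the names here are invented): old rows roll into the history log, so customer-facing reads only touch the small live set.

```python
from collections import deque

class TransactionStore:
    """Live table kept small; older rows roll into a history table."""

    def __init__(self, live_limit=90):
        self.live = deque()           # recent rows, shown to customers
        self.history = []             # long log, for audits/reconciliation
        self.live_limit = live_limit  # "last 90 days", modelled as a row cap

    def add(self, txn):
        self.live.append(txn)
        # Move the oldest rows into history so the live table stays fast.
        while len(self.live) > self.live_limit:
            self.history.append(self.live.popleft())

    def recent(self):
        # What the app/online banking screen would query.
        return list(self.live)
```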
A rule of thumb: never create calculated fields or columns in your DB. Leave that to your front-end apps or views. As long as you keep the details of debit/credit transactions, that's enough to get the balance right in a report or UI.
It also lets you view the balance in more flexible ways, like opening and closing balance.
That's actually a great design rule, but what do you do when the database grows big? Calculating the balance from the transactions in that situation can cost a lot, right?
Transaction data is always big, especially at banks, which run some of the most transactional systems out there. Having a calculated or separately maintained balance field in your DB has a higher cost than calculating it in a report or a view; users don't view their balances as often as they transact. If you do two writes for each request, one to insert the transaction value and type and another to update the balance, that's bad for performance. You want your transactions to be as fast as possible first. Triggers also hurt performance, if that's one of the ways you'd do it.
Normally banks don't let you view your balance over a 10-year span in one shot; for example, my current bank allows 3 months back on their online interface or as a download. There is a limit for everything, so showing a huge number of transactions, say a year's worth, in one click and expecting it to be fast is not achievable.
That's why HTTP has the 202 (Accepted) response, which can be used for long-running jobs: you request a balance report for a year-long period, the bank's system accepts the request and handles it by running a background job, probably at off-peak time, and then sends the result back over email. There are different ways to handle this.
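The "accept now, compute later" flow can be sketched like this, with a job table and a queue standing in for the bank's real infrastructure (all names here are hypothetical):

```python
import queue
import uuid

# Job registry and work queue stand in for the bank's real infrastructure.
jobs = {}
work = queue.Queue()

def request_report(account, year):
    """Accept the request immediately (like an HTTP 202) and return a job id."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    work.put((job_id, account, year))
    return job_id

def run_worker():
    """Drain the queue; in a real system this would run off-peak."""
    while True:
        try:
            job_id, account, year = work.get_nowait()
        except queue.Empty:
            return
        # Stand-in for the expensive year-long balance report; the real
        # system would then email the result to the customer.
        jobs[job_id] = {"status": "done",
                        "result": f"report for {account}, {year}"}
```

The caller never blocks on the expensive query; it polls (or gets notified about) the job id instead.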
Field in a no-SQL, fast response data store (Mongo, DynamoDB, etc).
Transaction table for long term historic storage.
Yes, you will need logic to keep the column up to date based on the transaction table.
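A toy version of that pairing, with a dict playing the fast key-value store and a list playing the long-term transaction table (hypothetical names throughout):

```python
# Write-through pair: a fast key-value "store" for the current balance
# and an append-only list standing in for the long-term transaction table.
fast_store = {}        # what Mongo/DynamoDB would hold
transactions = []      # long-term historic storage

def record(account, amount_cents):
    # Append to the durable log first, then keep the cached balance in sync.
    transactions.append((account, amount_cents))
    fast_store[account] = fast_store.get(account, 0) + amount_cents
```

In a real deployment the sync step is the hard part (the two stores can disagree after a partial failure), which is why the transaction table has to remain the authority.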
How do banks do it? Deep down inside nearly every single bank is a room that contains a bunch of IBM AS/400s or similar. These massive cabinet-sized machines run COBOL/FORTRAN. Every other piece of tech is an abstraction over this system.