Back in 2020 Citigroup accidentally sent almost USD$900 million to Revlon's creditors. About half the money was returned, but 10 creditors held on to the money and a judge has just ruled that they can keep the money.
In short, Revlon sent Citi a few million for them to transfer to creditors as an interest payment. A Citigroup employee didn't fill out a form in a program correctly, so Citi paid off Revlon's loan in full using their own money. Oops.
Here's the form; can you see the problem?
(sourced from the court record)
Thought not. In order to not send lots of money you need to check all of the PRINCIPAL, FUND and FRONT boxes.
What can we learn from this?
The real question is what can the rest of us learn from this mistake? Not that people make mistakes or should be fired (three different people looked at and approved the transfer), but the importance of good UI. To me, there are three main takeaways.
Safe defaults
Usually we should make the common case the default; that's just good UX. But when one of the options is dangerous and the other safe, make the default the safe one.
Think about your ops tools: how many have --dry-run flags instead of --for-realsies? A repeated trend I've seen in outages is someone ran a command, didn't realise they were targeting production and oops.
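The pattern described above, as an added example that is not from the original post: a small command-line tool whose destructive path is opt-in. Without --execute it only reports what it would do, and production is never the implicit target. The purge command, the FakeStore backend and the key names are all hypothetical stand-ins so the example runs offline.

```python
"""Safe-by-default CLI skeleton: the dangerous action is opt-in.

Added example, not code from the original post. The `purge` command,
the FakeStore backend and the sample keys are hypothetical.
"""
import argparse
import sys


class FakeStore:
    """Stand-in for a real datastore so the example runs offline."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.deleted = []

    def delete(self, key):
        self.keys.discard(key)
        self.deleted.append(key)


def build_parser():
    parser = argparse.ArgumentParser(prog="purge")
    parser.add_argument("prefix", help="delete every key starting with this prefix")
    # The dangerous path is opt-in: with no --execute the tool only reports.
    parser.add_argument("--execute", action="store_true",
                        help="actually delete (default: dry run)")
    # Production is never implicit; the operator has to name it.
    parser.add_argument("--env", choices=["dev", "staging", "prod"], default="dev")
    return parser


def purge(argv, stores):
    """Run the command; return (env, executed, affected_keys).

    `stores` maps an environment name to its FakeStore.
    """
    args = build_parser().parse_args(argv)
    store = stores[args.env]
    affected = sorted(k for k in store.keys if k.startswith(args.prefix))
    if not args.execute:
        print(f"[dry-run] would delete {len(affected)} key(s) in {args.env}: {affected}")
        return args.env, False, affected
    for key in affected:
        store.delete(key)
    print(f"deleted {len(affected)} key(s) in {args.env}")
    return args.env, True, affected


def demo_stores():
    """Constructed sample data: three environments with a few keys each."""
    return {
        "dev": FakeStore(["tmp/a", "tmp/b", "keep/c"]),
        "staging": FakeStore([]),
        "prod": FakeStore(["tmp/a", "keep/c"]),
    }


if __name__ == "__main__":
    # e.g.  python purge.py tmp/            -> dry run against dev
    #       python purge.py tmp/ --execute --env prod
    purge(sys.argv[1:], demo_stores())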
UI states should match business logic states
Is every combination of checkboxes actually valid? If you have redundant information in the UI you should either consolidate it, or error on mismatch. Either you know what the user wants, in which case you didn't need to ask for it in the first place, or you don't, in which case don't guess. As with all rules you may want to bend this one for convenience, but the more dangerous an operation, the more pedantic you should be.
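Both options from the paragraph above, as an added example that is not from the original post: the three box names mirror the post's description of the form, but the mapping, the Loan type and the amounts are hypothetical. The first path keeps the redundant boxes and refuses any combination that contradicts itself; the second consolidates them into a single intent field so there is nothing left to mismatch.

```python
"""Make UI state map onto exactly one business-logic state.

Added example, not the post's or any real system's code. The box names
follow the post's description; everything else is hypothetical.
"""
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    INTEREST_ONLY = "interest_only"
    FULL_PAYOFF = "full_payoff"


# Three redundant checkboxes that, as described in the post, must all agree.
REDUNDANT_BOXES = ("principal", "fund", "front")


def checked_boxes(form):
    return [box for box in REDUNDANT_BOXES if form.get(box, False)]


def form_is_consistent(form):
    """Only all-checked or none-checked are meaningful states."""
    n = len(checked_boxes(form))
    return n == 0 or n == len(REDUNDANT_BOXES)


def intent_from_form(form):
    """Option 1: keep the redundant boxes, but error on mismatch.

    A partial selection is a contradiction; the safe response is to refuse,
    not to pick one interpretation on the user's behalf.
    """
    checked = checked_boxes(form)
    if not form_is_consistent(form):
        missing = [b for b in REDUNDANT_BOXES if b not in checked]
        raise ValueError(f"inconsistent form: checked {checked}, unchecked {missing}")
    return Intent.INTEREST_ONLY if checked else Intent.FULL_PAYOFF


def intent_from_consolidated(form):
    """Option 2: ask once, with a single field that can only hold valid values."""
    return Intent(form["intent"])  # raises ValueError on anything unknown


@dataclass(frozen=True)
class Loan:
    principal: float
    accrued_interest: float


def amount_to_send(intent, loan):
    """Derive the wire amount from the business state, never from raw boxes."""
    if intent is Intent.INTEREST_ONLY:
        return loan.accrued_interest
    return loan.principal + loan.accrued_interest


if __name__ == "__main__":
    # Constructed sample loan; the figures are made up.
    loan = Loan(principal=1_000_000.0, accrued_interest=25_000.0)
    print(amount_to_send(intent_from_form({"principal": True, "fund": True, "front": True}), loan))
    print(amount_to_send(intent_from_consolidated({"intent": "full_payoff"}), loan))
    try:
        intent_from_form({"principal": True})
    except ValueError as exc:
        print("rejected:", exc)
```

The consolidated variant is the stronger fix: with one field there is no second copy of the user's intent to drift out of sync, so the validator in the first variant becomes unnecessary.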
Adding more reviewers only goes so far.
A common pattern after outages is to add more reviewers for changes: "someone ran the foobar tool and took out a datacentre, now all users of the foobar tool need a second person to check the command and approve before running it". In a pinch it can help, but it's a short-term mitigation. It slows you down, takes more person-time, randomises people, and still doesn't give you as much safety as improving the tool itself.
Obligatory Disclaimers
- The above is a very brief summary of what happened, it's more complicated than I made it out to be. Please read the linked article and court record for more details
- All opinions are my own and do not represent my employer