Kevin Burns
Data Constraints: From Imperative to Declarative

There once was a client who wanted to port a Lambda from TypeScript to Go. This was a Lambda with a capital L. Thousands of lines of code spread across dozens of TypeScript files, many of them automatically generated database models copy/pasted from other repositories. The initiative was sound: to consolidate the business logic spread across various independent lambdas into a single application and, in so doing, unify a set of often inconsistent data structure definitions.

You see, the database was MongoDB, which meant that the data model was whatever the developer decided it was when any given document was inserted into the database by any particular application. Some call this a “schemaless” database, but I prefer the term “schema-on-read”.

Although MongoDB supports JSON Schema validation on write as a functional equivalent to a database schema, this project used none. This is not surprising given that MongoDB’s marketing strategy consists primarily of incessantly trumpeting the advantages of unstructured data to unwitting JavaScript developers in the most obnoxious and damaging way possible.
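For the record, opting in to schema-on-write is a single admin command. Here is a minimal sketch with the Node.js driver; the "orders" collection and its fields are hypothetical, not from the project:

```typescript
import { MongoClient } from 'mongodb';

// A minimal sketch, not the project's actual schema: the "orders"
// collection and its fields here are illustrative only.
async function enableValidation(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('app');

    // collMod attaches a $jsonSchema validator to an existing collection;
    // validationAction: 'error' rejects any write that fails the schema.
    await db.command({
      collMod: 'orders',
      validator: {
        $jsonSchema: {
          bsonType: 'object',
          required: ['customerId', 'amountCents', 'createdAt'],
          properties: {
            customerId: { bsonType: 'objectId' },
            amountCents: { bsonType: 'int', minimum: 0 },
            createdAt: { bsonType: 'date' },
          },
        },
      },
      validationLevel: 'strict',
      validationAction: 'error',
    });
  } finally {
    await client.close();
  }
}
```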

Relational algebra is better than drugs

I successfully avoided MongoDB for my entire career until one day came a client with an interesting set of problems. Despite my aversion to MongoDB on principle, the strongest advice I could give was to not have used it in the first place, which is sometimes not an option given the unreliable state of present-day time travel technology.

To provide meaningful advice to clients on how to use MongoDB well and to prevent existing implementations from degrading, I would need expert knowledge of MongoDB, and that could only come from spending months neck-deep in a real-world system with truckloads of dynamic, unstructured data feeding the needs of real-world customers.

I seized the opportunity and took on many tasks solving complex, elusive problems, which often led to the discovery of undefined behavior. We will focus on one such task:

The lambda migration

This lambda wasn’t ready yet. It was known to have bugs and edge cases, and the company’s leadership (CEO, COO, CTO) would need to look at it from every conceivable angle to ensure that the business logic was consistent with their expectations before it could be ported to Go. Leadership worked together intensively behind closed doors for a week until they produced a version of the lambda that they felt met the needs of their customers.

It was my job as a contractor to port the logic from TypeScript to Go. I examined the logic. Studied it assiduously. Then I began porting it to Go in such a way that I could run an integration test suite against both my new version in Go and the existing lambda in TypeScript to ensure that the output and database effects were identical in all conceivable scenarios.
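The harness is the interesting part, so here is a rough sketch of the side-by-side idea. Everything in it is hypothetical; the real suite, its scenarios, and how the Go binary was invoked are not reproduced here:

```typescript
import assert from 'node:assert/strict';
import { Collection, Document } from 'mongodb';

// Both implementations are treated as black boxes with the same shape.
// These names are stand-ins; in practice one side was the TypeScript
// lambda and the other was the Go port invoked through a local harness.
type Handler = (event: Document) => Promise<Document>;

// Run one scenario against each implementation from the same starting
// state and assert that the response and the database effects match.
async function compareImplementations(
  tsHandler: Handler,
  goHandler: Handler,
  collection: Collection,
  scenario: Document,
): Promise<void> {
  const snapshot = () =>
    collection
      .find({}, { projection: { _id: 0 } })
      .sort({ createdAt: 1 }) // hypothetical field; ensures a deterministic order
      .toArray();

  await collection.deleteMany({});
  const tsResult = await tsHandler(scenario);
  const tsState = await snapshot();

  await collection.deleteMany({});
  const goResult = await goHandler(scenario);
  const goState = await snapshot();

  assert.deepEqual(goResult, tsResult);
  assert.deepEqual(goState, tsState);
}
```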

Once the battery of tests satisfied me that my implementation matched the intended logic, I began to suspect that the internally consistent and complete nature of the business logic meant it could be reduced to a set of unique database constraints.

When I raised this with the client, I received the following response, verbatim, which I will treasure with all my heart until the day that I die:

“Our business logic is too complex to be expressed as database constraints.”

RIP Isaac Hayes

Naturally, I took this as a challenge. After a few hours learning how indexes work in MongoDB, I produced a pair of compound unique indexes (one sparse and one full) that would impose the exact same constraints enforced by the business logic on all new and existing data.
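The actual index keys belong to the client, so the field names below are stand-ins, but the shape of the solution was exactly this: one full compound unique index and one sparse one. (A compound sparse index skips documents that contain none of the indexed fields, so its uniqueness rule applies only where those fields are present.)

```typescript
import { Collection } from 'mongodb';

// Hypothetical field names; the real keys came from the business rules.
async function applyConstraints(bookings: Collection): Promise<void> {
  // Full compound unique index: at most one document per key pair,
  // enforced across the entire collection.
  await bookings.createIndex(
    { customerId: 1, resourceId: 1 },
    { unique: true, name: 'uniq_customer_resource' },
  );

  // Sparse compound unique index: documents containing none of the
  // indexed fields are skipped, so the constraint applies only where
  // externalRef (or customerId) is actually set.
  await bookings.createIndex(
    { customerId: 1, externalRef: 1 },
    { unique: true, sparse: true, name: 'uniq_customer_externalref' },
  );
}
```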

Extending the new integration test suite to support a third side-by-side implementation quickly proved that the pair of indexes did in fact produce behavior identical to the 50 or so lines of business logic (which were now safe to delete) in all conceivable scenarios.

Leadership was stunned. I had accurately transformed a set of critical business rules from imperative logic to declarative constraints. Management no longer had to reference a spreadsheet with dozens of combinatorial expected outcomes. A description of the expected behavior could be reduced to two statements in plain English.

However, collections in MongoDB are not the tidy containers of data you expect coming from a decade of experience with schema-on-write databases like MySQL and Postgres.

When applying the new unique indexes to the collection, the migration failed. It failed because the collection contained data that didn’t satisfy the constraints. This data had been inserted at a time when the logic for the lambda(s) was different. The errant data would need to be altered or deleted before the constraints could be applied.
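Finding the offenders is a straightforward aggregation: group by the would-be index key and keep every key that maps to more than one document. Field names are stand-ins again:

```typescript
import { Collection, Document } from 'mongodb';

// Group by the fields of the proposed unique index and surface any
// key that maps to more than one document. Field names are illustrative.
async function findConflicts(bookings: Collection): Promise<Document[]> {
  return bookings
    .aggregate([
      {
        $group: {
          _id: { customerId: '$customerId', resourceId: '$resourceId' },
          ids: { $push: '$_id' }, // the documents that collide on this key
          count: { $sum: 1 },
        },
      },
      { $match: { count: { $gt: 1 } } },
    ])
    .toArray();
}
```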

Analysis showed that this data drifted from the constraints in a number of different ways. Hundreds of errant records reduced to a handful of distinct departures from the expected schema. Each type of data problem went on a spreadsheet with its first occurrence, its last occurrence, the number of occurrences, and a column to track what ought to be done about it (DELETE, FIX, or TBD), along with relevant notes.
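Most of those spreadsheet columns can come straight out of the database. A sketch, assuming each document carries a createdAt timestamp and each drift type can be expressed as a query filter (both assumptions, not details from the project):

```typescript
import { Collection, Document, Filter } from 'mongodb';

// For each named class of drift, compute first occurrence, last
// occurrence, and count. The drift filters themselves are hypothetical.
async function summarizeDrift(
  bookings: Collection,
  driftTypes: Record<string, Filter<Document>>,
): Promise<Document[]> {
  const rows: Document[] = [];
  for (const [name, filter] of Object.entries(driftTypes)) {
    const [summary] = await bookings
      .aggregate([
        { $match: filter },
        {
          $group: {
            _id: null,
            first: { $min: '$createdAt' },
            last: { $max: '$createdAt' },
            count: { $sum: 1 },
          },
        },
      ])
      .toArray();
    if (summary) rows.push({ type: name, ...summary });
  }
  return rows;
}
```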

Most of the errant rows could be discarded as complete trash with no discernible value. In some cases the problems could be fixed with a migration. However, there also existed a class of problems where the data was incorrect in a way that could be corrected, except that to do so would produce unexpected behavior for end users.

There were enough data problems whose fixes would adversely affect customers to present a significant challenge to the business. How would they communicate to their B2B customers that they'd been overcharged as a result of a bug in the system? Would they need to issue refunds to those business customers, and would those business customers need to issue refunds to their own customers?

Had the imperative logic not been converted to declarative data constraints, these accounting errors would have continued unnoticed.

Issues like this are important because, left unaddressed, they often lead to a culture where it is generally understood (and too often accepted) that long-standing customers are more likely to encounter systemic defects than newer ones. This drives churn, damaging customer retention and brand reputation.

This pattern is interesting because it reflects the history of MongoDB itself, when users left in droves after the industry became wise to the misleading durability and consistency guarantees used to achieve its early benchmarketing results. MongoDB has come a long way since then, and it is now a lot better than it was at not losing data.

If there is hope for MongoDB, I believe there is hope for humanity.

Conclusion

Modeling business logic as declarative constraints from the outset is likely to save loads of time and significantly improve the quality of your software product. If you can push the line from imperative logic toward declarative constraints, you will impress your peers by discovering unknown unknowns.
