
Data Quality: Technical Debt From Hell

Miguel Barba on September 24, 2017

This post was originally published here. Technical debt can take many forms as most of you probably know it. Data quality is just one of those f...

Nice post. I'm the author of an upcoming book (in early access program now) called The Art of Data Usability which is at its core about data quality. I've never thought of data quality as technical debt. That's a really nice way to frame it. I really like it :)

One thing I'd recommend is setting up monitoring of your quality attributes (like the incoherency you talk about). You monitor the attributes to make sure the quality stays at the level you want (so that you don't start collecting technical debt again), but you do it from the start (when you start working on lowering the debt) so you know when you've reached that level of quality. As you said, we all start making mistakes, have a bad day or something. Monitoring quality helps us stay focused.

You can think of it as data quality tests. You monitor afterwards for regression testing and you develop the metrics and monitoring before you start as some sort of a TDD approach.
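To make the "data quality tests" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `QualityCheck` type, the `pass_rate` and `run_checks` helpers, the `email` field and the 0.9 threshold are hypothetical names and values, not something from the post or from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Mapping

# A record is just a mapping of column name to value (hypothetical shape).
Record = Mapping[str, object]


@dataclass
class QualityCheck:
    """One monitored quality attribute with the level we want to hold."""
    name: str
    is_valid: Callable[[Record], bool]  # rule applied to each record
    min_pass_rate: float                # target level, between 0.0 and 1.0


def pass_rate(records: Iterable[Record], check: QualityCheck) -> float:
    """Fraction of records that satisfy the check (1.0 for an empty set)."""
    rows: List[Record] = list(records)
    if not rows:
        return 1.0
    passed = sum(1 for row in rows if check.is_valid(row))
    return passed / len(rows)


def run_checks(records: Iterable[Record],
               checks: Iterable[QualityCheck]) -> Dict[str, bool]:
    """Return, per check, whether the data currently meets its target."""
    rows = list(records)
    return {c.name: pass_rate(rows, c) >= c.min_pass_rate for c in checks}


# Hypothetical rule: every record should carry a non-empty email.
email_present = QualityCheck(
    name="email_present",
    is_valid=lambda row: bool(row.get("email")),
    min_pass_rate=0.9,
)
```

You would define the checks and thresholds before starting the cleanup (the TDD part), watch the pass rate climb while paying down the debt, and keep running the same checks on a schedule afterwards as regression tests.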

Again, a really good post and a fresh perspective on data quality.


Thanks for the feedback.

And congrats on your book, by the way!

"One thing I'd recommend is setting up monitoring of your quality attributes" - Yes, that would be the ideal scenario and it used to happen here but unfortunately the team responsible for doing it is from another department and our priorities and approaches to problem solving aren't always as aligned as they should be, so this ends up having a negative impact when it comes to detect and correct data issues on a regular basis.


I was reading the latest post by John Allspaw when I realized that it sums up perfectly the concept I was referring to when I wrote this post:

"My main argument isn’t that technical debt’s definition has morphed over time; many people have already made that observation. Instead, I believe that engineers have used the term to represent a different (and perhaps even more unsettling) phenomenon: a type of debt that can’t be recognized at the time of the code’s creation. They’ve used the term “technical debt” simply be- cause it’s the closest descriptive label they’ve had, not because it’s the same as what Cunningham meant. This phenomenon has no countermeasure like refactoring that can be applied in anticipation, because it’s invisible until an anomaly reveals its presence."

Feel free to read the complete post here, because it's quite worth it!


Painful stuff. One area I have seen recently is not owning or having a solid handle on the full domain model of one's data. Or even just being at the whim of a third party's representation of it. There's a huge effort behind this if it's not considered from the beginning or early on.
