DEV Community

Jon Staab
Jon Staab

Posted on

2

If in doubt, list it out

Today (and for the last few weeks), I've been dealing with a bad technical decision I made about three years ago, which has suddenly come back to bite me, as the hidden requirements have suddenly come to light. I want to walk through my situation, and how I'm learning the painful lesson that it's best to enumerate domain objects rather than mush them into a lossy aggregate.

Background

I work on an inventory management platform for consignment stores — when I was hired to build the system from scratch with no prior domain experience, it sounded simple.

It's not simple. Consignment normally works with unique inventory, except when it doesn't. In addition, you have two models of paying suppliers living side-by-side: traditional retail where you pay your suppliers when you acquire the inventory (or thereabouts), and consignment, where you pay your suppliers when the inventory sells.

To make this worse, it's possible for both payout models to be applied to a single SKU, either for the same instance of an item, or for multiple instances over time.

Where I went wrong

In the name of "saving space", I compressed every item in the system into a single row in the items table.

There's nothing necessarily wrong with this, as we keep an event log that lists every update to an item, so we can get at the item's state as of a point in time (e.g., when it was taken into inventory, or sold). We also keep various special purpose logs, including balance accruals and item status changes to help us decouple supplier payout state from the current state of the item.

But the code started to smell when we had to shoehorn the concept of "status" (a concept having to do with unique inventory) into our plural-cardinality model: is this item sold? lost? in stock?

I accommodated this with a jsonb field in our database that kept track of how many items for the SKU could be attributed to a given stock, e.g., {"active": 3, "sold": 1, "lost": 1}. I knew it was a bad design, but I couldn't predict how it might cause problems, so I left it.

Discovering my mistakes

I'm now writing customer-facing inventory analytics code, which includes calculating Cost of Goods Sold. Because the state of any given enumeration of an item has been lost (consignor split, price, and cost) due to subsequent edits, I now have to add a new table called "item_sales", which lists the attributes of an item needed to calculate COGS as of the time of the sale, and fill it by replaying the relevant events.

Also, because we need to support partial refunds, I can't compress these records into one per sale, with a record of quantity; I have to mark each enumeration of an item as refunded independently.

So I'm left with the design I tried to avoid in the first place, except it's ad-hoc and very difficult to maintain and understand. If I had enumerated every item instance in the first place, I would have had all the COGS data as of the sale date. And once an item sold I could have "locked" those instances to avoid mutating the data even if its un-sold brethren were modified.

Conclusion

We left designing analytics until we already had two years worth of customer data. Analytics is pretty much your worst-case scenario for understanding your data, and contrasts strongly with a CRUD-paradigm view of what's true now with no regard for history. Even an event log (though useful for migrating to a new schema) won't save you, since events often have to be procedurally re-played rather than simply aggregated.

Learn from my mistakes — if in doubt, list it out. It's as Rich Hickey says when evangelizing for Datomic: storage space is cheap, developer time isn't. Keeping redundant copies of your data is redundant until it isn't. You have to understand what's possible in your domain, and point your design in a direction that will handle that from every angle.

Quadratic AI

Quadratic AI – The Spreadsheet with AI, Code, and Connections

  • AI-Powered Insights: Ask questions in plain English and get instant visualizations
  • Multi-Language Support: Seamlessly switch between Python, SQL, and JavaScript in one workspace
  • Zero Setup Required: Connect to databases or drag-and-drop files straight from your browser
  • Live Collaboration: Work together in real-time, no matter where your team is located
  • Beyond Formulas: Tackle complex analysis that traditional spreadsheets can't handle

Get started for free.

Watch The Demo 📊✨

Top comments (0)

Image of DataStax

Langflow: Simplify AI Agent Building

Langflow is the easiest way to build and deploy AI-powered agents. Try it out for yourself and see why.

Get started for free

👋 Kindness is contagious

Explore a trove of insights in this engaging article, celebrated within our welcoming DEV Community. Developers from every background are invited to join and enhance our shared wisdom.

A genuine "thank you" can truly uplift someone’s day. Feel free to express your gratitude in the comments below!

On DEV, our collective exchange of knowledge lightens the road ahead and strengthens our community bonds. Found something valuable here? A small thank you to the author can make a big difference.

Okay