Untangling the Rails Monolith - quick look at the database

#ruby #rails #ddd #database

Having worked on a few fairly large Rails projects, I've come to appreciate domain separation.

Each component should work as an (almost) independent service. This reduces complexity and dependencies on other areas, which in turn shortens development time and lets the team move faster.

Simpler code = quicker code reviews = faster releases

I have a monolith, what do I do?

Every project has some areas it can be split into. Most apps probably have areas such as User management, Payment processing, Report generation, X management, etc.

Of course we need to know whether User X has already paid for their subscription, or which bank account was used to process the last payment, but not all of this data has to be kept in one or two models.

We can manage Users - their personal info, contact data, preferences - without knowing which subscription they are currently on. When changing a profile picture we don't have to know whether they have already added a credit card, or how many X resources they have added to the system.

Where I'm going with this: each area can operate independently of the others, so we should always try to decouple as much as possible and be diligent about keeping the context boundaries clear.

It's rather rare to land in a greenfield project where we can design everything from scratch (which, by the way, isn't as easy as it might seem), so let's look at a legacy app and how we could introduce small changes that pay off in the future.

Untangling a legacy monolith - database level

I guess it's a common scenario - let's prepare an MVP and check if it works...

Oh, it actually makes sense and there's demand for that! Let's build something more on top of it!

Oooh, there's even more demand, let's add some more features...

And the development lifecycle goes on and on, but the foundation stays the same - the small MVP project that wasn't prepared to scale that much. So we might end up with a table similar to this:

# schema.rb

create_table "employees", ... do |t|
    t.string "uuid"
    t.string "name"
    t.string "email"
    t.integer "organization_id"
    t.boolean "active"
    t.string "onboarding_status"
    t.string "preferred_working_days"
    t.string "contact_preference"
    t.boolean "contract_up_to_date"
    t.string "payment_method"
    t.string "payment_details"
    t.string "address_street"
    t.string "city"
    t.string "zip"
    t.string "country"
    t.string "phone_number"
    t.string "notes"
end

It's a mess, but since the business demanded it, the devs had to deliver. Still, it could have been done in a slightly more future-proof way.

We could move most of these columns out into at least three smaller tables, making each one quicker to work with (both when running queries and in terms of cognitive load for developers).

# schema.rb

create_table "employees", ... do |t|
    t.string "uuid"
    t.string "name"
    t.string "email" # might be a contact detail, but it's often used to find a given employee by their email
    t.integer "organization_id"
    t.boolean "active"
    t.string "notes"
    t.index ["email"]
end

create_table "employee_settings", ... do |t|
    t.string "preferred_working_days"
    t.boolean "contract_up_to_date"
    t.integer "employee_id"
    t.string "onboarding_status"
end

create_table "employee_payments", ... do |t|
    t.string "payment_method"
    t.string "payment_details"
    t.integer "employee_id"
end

create_table "employee_contact_details", ... do |t|
    t.string "address_street"
    t.string "city"
    t.string "zip"
    t.string "country"
    t.string "phone_number"
    t.string "contact_preference"
    t.integer "employee_id"
end

Each table represents a separate area of employee "details": payments, contact details, or generic settings (which over time might grow into another big table that could itself be split into smaller ones).
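To make the split concrete, the column grouping can be written down as plain data before any migration exists. The following is a minimal sketch in plain Ruby (no ActiveRecord involved). `COLUMN_GROUPS` and `split_employee_row` are hypothetical names used only for illustration, the sample row is made up, and the grouping simply mirrors the schema above.

```ruby
# Hypothetical mapping from each new table to the legacy columns it takes over.
# The grouping mirrors the schema above; adjust it to your own domain.
COLUMN_GROUPS = {
  "employee_settings" => %w[preferred_working_days contract_up_to_date onboarding_status],
  "employee_payments" => %w[payment_method payment_details],
  "employee_contact_details" => %w[address_street city zip country phone_number contact_preference]
}.freeze

# Splits one wide legacy row (a Hash keyed by column name) into one Hash of
# attributes per new table. Every new row points back at the employee.
def split_employee_row(row)
  COLUMN_GROUPS.to_h do |table, columns|
    [table, row.slice(*columns).merge("employee_id" => row.fetch("id"))]
  end
end

# Made-up sample data, only to show the shape of the result.
legacy_row = {
  "id" => 1,
  "name" => "Jane Doe",
  "email" => "jane@example.com",
  "payment_method" => "bank_transfer",
  "payment_details" => "hypothetical details",
  "city" => "Gdansk",
  "phone_number" => "555-0100",
  "preferred_working_days" => "mon,tue"
}

split = split_employee_row(legacy_row)
puts split["employee_payments"]
```

Columns that stay on employees (name, email and so on) are simply not listed in any group, so they never leak into the new tables.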

How can we strive to achieve this kind of structure in a legacy app?

Every time we need to add a new setting, detail, boolean, etc., we should ask: can it be moved into a separate table? Is it crucial to keep it in the same table as the other data?

Every time we add a new column to an existing table, it might be an opportunity to start decoupling the existing monolith. Start with small steps, and over time you'll see the benefit.

Need to store the t-shirt size that should be ordered for an employee? Let's create a new employee_equipment table - maybe in the future you'll have to store their shoe size, or glove size if they work in a warehouse.
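In a Rails app this could be a single small migration. The sketch below assumes Rails 7.0 and at most one equipment row per employee. The class name, the `tshirt_size` column and the unique index are illustrative choices, not taken from a real codebase.

```ruby
# db/migrate/<timestamp>_create_employee_equipment.rb (hypothetical file name)
class CreateEmployeeEquipment < ActiveRecord::Migration[7.0]
  def change
    create_table :employee_equipment do |t|
      # Plain integer to match the employee_id columns used in the schema above.
      t.integer :employee_id, null: false
      t.string :tshirt_size
      t.timestamps
    end

    # Assumption: at most one equipment row per employee.
    add_index :employee_equipment, :employee_id, unique: true
  end
end
```

The employees table stays untouched, and later additions such as shoe or glove size become new columns on this small table instead.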

Look for opportunities to start with clean tables. Adding a new column to an existing table is the easiest option, but migrating data to new tables later takes a lot of work to do safely and without disrupting the whole system.
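Part of doing such a data migration safely is checking that nothing was lost or changed along the way. Here is a minimal plain-Ruby sketch of such a check, assuming the legacy row and the copied row are both available as hashes. `mismatched_columns` is a hypothetical helper, not a Rails API, and the sample values are made up.

```ruby
# Returns the names of the columns whose values differ between the legacy
# row and the row copied into the new table. An empty array means the copy
# is consistent for the columns we checked.
def mismatched_columns(legacy_row, new_row, columns)
  columns.reject { |column| legacy_row[column] == new_row[column] }
end

PAYMENT_COLUMNS = %w[payment_method payment_details].freeze

# Made-up rows: the copy lost the payment_details value.
legacy = { "id" => 7, "payment_method" => "card", "payment_details" => "visa" }
copied = { "employee_id" => 7, "payment_method" => "card", "payment_details" => nil }

diff = mismatched_columns(legacy, copied, PAYMENT_COLUMNS)
status = diff.empty? ? "ok" : "mismatch in #{diff.join(', ')}"
puts "employee #{legacy['id']}: #{status}"
```

A check like this only makes sense while the old columns still exist next to the new table, so it is probably worth running before the old columns are dropped.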

Apart from reading the task requirement "store employee's t-shirt size", think about the big picture - how might this impact the employees table in the future? How can it be done in a more efficient way?

Untangling the code

In the next post I'll explore an approach which I find very useful for setting clear boundaries between domains while keeping the data flowing between them. This, again, reduces code coupling, complexity and the probability of unexpected behaviour. I'll talk about APIs inside the monolith.