DEV Community

Beck_Moulton
Beck_Moulton

Posted on

The Impossible Normalization: Why Your Database Hates Biology 🧬

We need to talk about User Objects.

If you’ve been in web dev for more than five minutes, you’ve probably built an e-commerce backend or a todo app. The data modeling usually goes something like this:

  1. A User exists.
  2. A Product exists (it has a SKU, a Price, and a Description).
  3. The User buys the Product.

It’s clean. It’s deterministic. If I buy a Red T-Shirt (Size L), that object doesn't change. It sits in a warehouse, it gets shipped, and it arrives as a Red T-Shirt.

Now, imagine that Red T-Shirt could spontaneously change its size depending on the weather, or describe itself as "sort of reddish-pink but hurts when you touch it."

Welcome to HealthTech.

I've been wrestling with medical data schemas lately, and I want to share the absolute engineering headache that is mapping biological variance—messy, fluid human problems—into rigid database tables.

The "Simple" Symptom Trap

Let's say you're building an app for patients to track their symptoms. The junior dev in us immediately thinks: "Easy. SQL table."

CREATE TABLE symptoms (
  id SERIAL PRIMARY KEY,
  user_id INT,
  name VARCHAR(255), -- "Headache"
  severity INT,      -- 1 to 10
  created_at TIMESTAMP
);
Enter fullscreen mode Exit fullscreen mode

Ship it, right?

Wrong.

Two days later, a user writes in: "My headache is a 7/10, but it’s specifically behind my left eye and it pulses when I stand up."

Your name column just broke. You can't just store string "Headache". You need location (left eye), quality (pulsating), and aggravating factors (standing up).

So you think, okay, I'll switch to a NoSQL document store. I'll just dump a JSON blob!

{
  "symptom": "Headache",
  "meta": {
    "location": "Left retro-orbital",
    "quality": "Pulsating",
    "trigger": "Orthostatic"
  }
}
Enter fullscreen mode Exit fullscreen mode

Great. Now try to query that across 10,000 users to find everyone with "head issues." You can't, because User A wrote "Headache" and User B wrote "Migraine" and User C wrote "My head hurts."

Enter the Ontologies: SNOMED-CT and LOINC

To solve this, the industry invented standard terminologies. The big boss here is SNOMED-CT. It assigns a unique code to basically every medical concept in existence.

A headache isn't "Headache". It's SCTID: 25064002.

This sounds like a developer's dream (normalization!), but it turns into an implementation nightmare. Biology is a graph, not a list. A specific type of headache is a child of "Pain", which is a child of "Clinical Finding".

To store this properly, we usually look toward FHIR (Fast Healthcare Interoperability Resources). It's the standard for exchanging this data. But have you seen a FHIR object?

Here is a simplified JSON for a mere observation of body temperature:

{
  "resourceType": "Observation",
  "status": "final",
  "category": [
    {
      "coding": [
        {
          "system": "http://terminology.hl7.org/CodeSystem/observation-category",
          "code": "vital-signs",
          "display": "Vital Signs"
        }
      ]
    }
  ],
  "code": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "8310-5",
        "display": "Body temperature"
      }
    ]
  },
  "valueQuantity": {
    "value": 39.1,
    "unit": "degrees C",
    "system": "http://unitsofmeasure.org",
    "code": "Cel"
  }
  // ... timestamps, performers, device used ...
}
Enter fullscreen mode Exit fullscreen mode

We went from temperature: 39.1 to a 30-line JSON object.

Why? Because in biology, context is everything. A temperature of 39.1 taken orally is different from one taken axillary (armpit). If you don't store the metadata, the data is medically dangerous.

The Map is Not The Territory

The biggest philosophical hurdle we face as engineers here is that human health is continuous, but databases are discrete.

When a user selects "Moderate Pain" from a dropdown, we are collapsing a complex, fluid biological sensation into a distinct enum. We lose resolution.

I actually wrote more about this specific loss of data resolution in my personal blog recently—if you're into the deeper architectural theory of this, check out my other articles.

The challenge is creating a UI that feels human (e.g., "Where does it hurt?") while backend mapping it to these rigid, terrifyingly complex ontologies like FHIR and SNOMED without the user ever knowing.

So, what's the solution?

There isn't a silver bullet, but here is what seems to work best in the wild:

  1. Store the Raw Intent: Keep what the user actually typed/clicked. Never throw away the source of truth.
  2. Map Later: Use an ingestion pipeline to map "My head hurts" to SCTID: 25064002. Don't force the user to speak SNOMED.
  3. Hybrid Schemas: Use relational columns for the high-level stuff (Patient ID, Date) and JSONB columns for the clinical payload (the FHIR resource).

It’s messy, it’s frustrating, and it’s definitely harder than selling T-shirts. But hey, at least it’s never boring.

Anyone else here suffering through HL7/FHIR integrations right now? Let me know in the comments so we can cry together. 😭

Top comments (0)