Hidaya Vanessa
Consuming APIs from a Backend POV: Normalizing Data Across Multiple Endpoints

How my bookstore project taught me that APIs don’t always tell the same story

I’ve been building an online bookstore called Hearthside Reads.
Nothing fancy, just books, authors, and the metadata that makes a bookstore feel complete: ISBNs, descriptions, covers, the usual stuff.

Like most people, I didn’t want to manually enter all this data, so I did what made sense.

I turned to external APIs.


Where it all started

The first API I used was the Open Library API.

It worked… but not consistently.

Some books came back without ISBNs. Others had no descriptions. In some cases, the author data felt a bit off or incomplete.

At first, I thought:

“Maybe this is just how it is.”

But I wanted richer data, so I added a second source: Google Books API.

My thinking was simple:

If one API is missing something, the other one probably has it.

And that part was true.

What I didn’t anticipate was the new set of problems that came with it.


Where things started getting messy

Once I started consuming data from both APIs, I noticed a few things almost immediately:

  • The same book showed up more than once
  • Author names were formatted differently
  • ISBNs existed in one response but not the other
  • Descriptions didn’t always match

Same book. Different versions of the truth.

Here’s a simplified example.

Open Library response

{
  "title": "The Hobbit",
  "authors": [{ "name": "J.R.R. Tolkien" }],
  "isbn_10": ["0345339681"]
}

Google Books response

{
  "volumeInfo": {
    "title": "The Hobbit",
    "authors": ["J. R. R. Tolkien"],
    "industryIdentifiers": [
      { "type": "ISBN_13", "identifier": "9780345339683" }
    ]
  }
}

Both are correct. Both describe the same book. But if you store this data as it comes, you’re asking for trouble.
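The author-name mismatch alone can create duplicates: “J.R.R. Tolkien” and “J. R. R. Tolkien” fail a naive string comparison. Here’s a sketch of one way to paper over the initials issue — `normalize_author` is a hypothetical helper for illustration, not part of either API:

```python
import re

def normalize_author(name):
    # Collapse runs of whitespace, then remove the space between
    # single-letter initials: "J. R. R. Tolkien" -> "J.R.R. Tolkien"
    name = " ".join(name.split())
    return re.sub(r"\.\s+(?=[A-Z]\.)", ".", name)

print(normalize_author("J. R. R. Tolkien"))  # J.R.R. Tolkien
print(normalize_author("J.R.R. Tolkien"))    # J.R.R. Tolkien (unchanged)
```

Real-world author data has messier cases than this (diacritics, “Lastname, Firstname” order), so treat this as a starting point, not a full solution.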


The real problem (that took me a while to see)

The problem wasn’t Open Library, and it certainly wasn’t Google Books. The problem was me assuming external APIs would agree with each other.

They don’t.

Each API has its own structure, priorities, and idea of what “complete” data looks like. That’s when I ran into the concept that quietly fixed everything: normalization.

So… what is normalization?

In the simplest terms:

Normalization is deciding what your data should look like, then forcing everything else to conform to it.

For non‑techies:

  • It’s cleaning and standardizing information before saving it
  • It’s making sure one book doesn’t end up with five slightly different identities

For techies:

  • It’s mapping external API responses into a single internal schema

Either way, the idea is the same:

One system. One structure. One source of truth.


Why normalization actually matters

Before normalization, I had:

  • Duplicate books in my database
  • Inconsistent author names
  • Unreliable ISBN lookups

After normalization:

  • One book = one record
  • Predictable fields
  • Much cleaner logic downstream

It’s one of those things that doesn’t feel exciting, but quietly saves you hours of debugging later.


Achieving Normalization

Step one: Decide what a “book” means to you

Before touching any API logic, I had to answer a simple question:
"What does a book look like inside my system?"

Here’s the structure I settled on:

Book = {
    "title": str,
    "authors": list[str],
    "isbn_10": str | None,
    "isbn_13": str | None,
    "description": str | None
}

This became my reference point.

Anything coming from outside had to be reshaped to fit this.
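That structure is shorthand, but it can be made enforceable. One option (a sketch, assuming Python 3.9+; not something the approach requires) is to write the same shape as a `TypedDict` so a type checker can catch drift:

```python
from typing import Optional, TypedDict

class Book(TypedDict):
    title: str
    authors: list[str]
    isbn_10: Optional[str]
    isbn_13: Optional[str]
    description: Optional[str]

# A record that satisfies the shape; a type checker flags
# missing or misnamed fields at review time, not at runtime.
hobbit: Book = {
    "title": "The Hobbit",
    "authors": ["J.R.R. Tolkien"],
    "isbn_10": "0345339681",
    "isbn_13": None,
    "description": None,
}
```

The normalizers below can then declare they return a `Book`, which keeps all three functions honest about the schema.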

Step two: Normalize each API separately

Instead of mixing logic, I treated each API independently.

Open Library normalization

def normalize_openlibrary(data):
    # Open Library returns "description" as either a plain string
    # or a dict like {"type": ..., "value": ...}
    description = data.get("description")
    if isinstance(description, dict):
        description = description.get("value")

    return {
        "title": data.get("title"),
        "authors": [a.get("name") for a in data.get("authors", [])],
        # `or [None]` guards against the key holding an empty list
        "isbn_10": (data.get("isbn_10") or [None])[0],
        "isbn_13": (data.get("isbn_13") or [None])[0],
        "description": description
    }

Google Books normalization

def normalize_googlebooks(data):
    info = data.get("volumeInfo", {})

    isbn_10 = None
    isbn_13 = None

    for identifier in info.get("industryIdentifiers", []):
        if identifier["type"] == "ISBN_10":
            isbn_10 = identifier["identifier"]
        elif identifier["type"] == "ISBN_13":
            isbn_13 = identifier["identifier"]

    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "isbn_10": isbn_10,
        "isbn_13": isbn_13,
        "description": info.get("description")
    }

At this point, both APIs were finally speaking the same language.

Step three: Merge without duplicating

Normalization gets your data into the same shape. Merging decides which data wins.

My rules were simple:

  1. Prefer ISBN‑13 when available
  2. Use Google Books as a fallback for missing descriptions

def merge_books(primary, fallback):
    return {
        "title": primary["title"] or fallback["title"],
        "authors": primary["authors"] or fallback["authors"],
        "isbn_10": primary["isbn_10"] or fallback["isbn_10"],
        "isbn_13": primary["isbn_13"] or fallback["isbn_13"],
        "description": primary["description"] or fallback["description"],
    }

Nothing fancy. Just clear rules.
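The “without duplicating” part deserves its own sketch. Once records share a shape, the ISBN makes a natural dedup key. Here’s one hypothetical way to collapse duplicates, preferring ISBN‑13 per rule 1 — `dedupe_books` is an illustration, not the exact code from my project:

```python
def dedupe_books(books):
    # Key each record by ISBN-13, then ISBN-10, then title as a last resort.
    # The first record seen for a key wins.
    seen = {}
    for book in books:
        key = book["isbn_13"] or book["isbn_10"] or book["title"].lower()
        if key not in seen:
            seen[key] = book
    return list(seen.values())

# Two normalized records for the same book, one per API
books = [
    {"title": "The Hobbit", "authors": ["J.R.R. Tolkien"],
     "isbn_10": "0345339681", "isbn_13": "9780345339683", "description": None},
    {"title": "The Hobbit", "authors": ["J. R. R. Tolkien"],
     "isbn_10": None, "isbn_13": "9780345339683", "description": "A hobbit's tale."},
]

print(len(dedupe_books(books)))  # 1
```

Falling back to the lowercased title is the weakest link here (different editions share titles), which is exactly why rule 1 prefers the ISBN whenever it exists.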
The mental model that helped me: APIs are the raw ingredients, normalization is the recipe, and the database is the final dish.

If you skip the recipe, you still get food, just not something you’d confidently serve.


What I took away from this

  • APIs don’t owe you consistency
  • More data sources = more responsibility
  • Normalization isn’t optional once you scale

Most importantly, I learned that backend work isn’t just about fetching data. It’s about deciding what truth looks like in your system and enforcing it.

If you’re consuming multiple APIs and things feel slightly off, normalization is probably the missing piece.

Happy building 🚀
