Derek Ziemer

Posted on Mar 12 • Edited on Mar 23

Why Bible APIs Should Be Version-Agnostic

#webdev #api #ai #javascript

When I tell developers I built a Bible reference resolver, I can see them mentally categorizing it.

Regex. String matching. Probably handles Jn316 and John 3:16 and calls it done.

That's not what this is.

The Problem Nobody Talks About

Most Bible APIs are built around a simple assumption: the developer provides a clean, well-formed reference and the API returns verse text.

That assumption breaks immediately in production.

Real users write things like:

iitim3:16-17
Genisis 1:1
acts 2:1 to 4
Rom 8:1-4, 28; 12:1-2
Ps.23, vv.1–3
ESV John 3:16
canticles 2:1
Obadiah 15

A naive parser handles maybe 60% of these. A regex handles a bit more. But every edge case you add reveals three more you hadn't considered.
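To make that concrete, here is a deliberately simple pattern of the kind a first pass often uses. It is an illustration written for this post, not any particular library's parser. It accepts only Book C:V or Book C:V-V, with an optional leading digit.

```typescript
// A deliberately naive reference pattern: an optional leading digit, one book word,
// then chapter:verse with an optional "-verse" suffix. Nothing else is accepted.
const NAIVE_REF = /^(\d?\s?[A-Za-z]+)\s+(\d+):(\d+)(?:-(\d+))?$/;

// The messy inputs listed above.
const samples: string[] = [
  "iitim3:16-17",
  "Genisis 1:1",
  "acts 2:1 to 4",
  "Rom 8:1-4, 28; 12:1-2",
  "Ps.23, vv.1–3",
  "ESV John 3:16",
  "canticles 2:1",
  "Obadiah 15",
];

for (const s of samples) {
  // A "match" only means the shape fits; the captured book name is never checked.
  console.log(`${NAIVE_REF.test(s) ? "match   " : "no match"}  ${s}`);
}
```

Only Genisis 1:1 and canticles 2:1 fit the pattern, and neither can be resolved without a book table that knows misspellings and alternative names. Every branch you add for ordinals, "to", commas, or translation prefixes then has to coexist with all the others.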

The deeper problem is structural. Most Bible APIs tie their identifiers to a specific translation:

John 3:16 in KJV
John 3:16 in NIV
John 3:16 in ESV

From a developer's perspective, these represent the same coordinate. But most APIs treat them as different entities. That means your stored references, your database indexes, and your AI retrieval pipeline are all implicitly tied to a translation choice you made at build time.

That's fragile architecture.

What Bible Software Companies Figured Out (And Kept Private)

Here's something interesting: Bible software companies like Logos and Accordance built extremely sophisticated internal engines over decades. Reference normalization, canonical coordinates, traversal systems, cross-translation alignment—all of it exists, all of it works, and none of it is public.

The public API ecosystem never caught up. Most public Bible APIs are still essentially:

GET /verse
GET /chapter
GET /search

The developer is responsible for everything that happens before and after: parsing user input, validating references, handling edge cases, building traversal logic, and managing coordinates across translations.

That's a lot of infrastructure to build yourself before you've written a single line of your actual application.

The Real Problem: Input Is Messy, Coordinates Should Not Be

The insight that changed how I thought about this:

Reference parsing and text retrieval are two different problems.

Most APIs conflate them. You send a reference, you get text back. Clean input in, content out.

But in production systems—especially AI systems—you need something in between. You need a layer that takes messy human input and produces reliable, stable, version-agnostic coordinates that your system can store, traverse, and retrieve against regardless of which translation you eventually serve.

That's what I built.

The Architecture: Two Distinct Layers

Layer 1 — Version-Agnostic Infrastructure

Human Input → Reference Integrity Engine → Canonical Coordinates

/resolve — Normalize any human input to canonical coordinates.
/expand — Atomic verse IDs from any reference.
/context — Traverse surrounding verses by canonical index.
/lookup — Reconstruct human-readable references from coordinates.
/diff — Compare two references and return canonical overlap and differences.
/range — Resolve a reference range into canonical boundaries.
/distance — Compute verse distance between two coordinates.
/slice — Retrieve a canonical verse range by verse_index.
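If you call these endpoints from TypeScript, a union of the path names keeps typos out of request code. The endpoint names come from the list above; the base URL and the q query parameter are placeholders, because request parameters aren't documented in this post.

```typescript
// Layer 1 endpoint names from the list above.
type Layer1Endpoint =
  | "resolve"
  | "expand"
  | "context"
  | "lookup"
  | "diff"
  | "range"
  | "distance"
  | "slice";

// Builds a request URL. The base URL and parameter names are hypothetical placeholders.
// The leading slash means any path already on baseUrl is replaced.
function buildLayer1Url(baseUrl: string, endpoint: Layer1Endpoint, params: Record<string, string>): string {
  const url = new URL(`/${endpoint}`, baseUrl);
  for (const key of Object.keys(params)) {
    url.searchParams.set(key, params[key]); // form-encodes spaces and colons
  }
  return url.toString();
}

// Usage with a placeholder host:
console.log(buildLayer1Url("https://api.example.com", "resolve", { q: "acts 2:1 to 4" }));
// https://api.example.com/resolve?q=acts+2%3A1+to+4
```

A misspelled endpoint such as "resolv" then fails at compile time instead of returning a 404 at runtime.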

Layer 2 — Text Retrieval (Version Required)

/scripture — Fetch verse text in a specific translation.
/passage — Resolve and retrieve in one call.
/batch — Bulk retrieval.
/search — Full-text search.

Layer 1 works independently of Layer 2. You can use the entire canonical infrastructure stack without ever calling a text retrieval endpoint. Bring your own licensed translation or vector store—the coordinate system works regardless.
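Here is a minimal sketch of that split, assuming you persist only verse_id values from Layer 1 and keep licensed text in a store of your own. TextStore is a hypothetical in-memory stand-in, and the stored strings are placeholders rather than real verse text.

```typescript
// Layer 1 output is all you persist: version-agnostic verse IDs.
type VerseId = number;

// Hypothetical in-memory store for your own licensed text, keyed by translation and verse_id.
// In production this might be a database table or vector-store metadata.
class TextStore {
  private readonly texts = new Map<string, string>();

  private key(translation: string, id: VerseId): string {
    return `${translation}:${id}`;
  }

  put(translation: string, id: VerseId, text: string): void {
    this.texts.set(this.key(translation, id), text);
  }

  get(translation: string, id: VerseId): string | undefined {
    return this.texts.get(this.key(translation, id));
  }
}

// A saved reference never mentions a translation (45008001 is Romans 8:1).
const savedRefs: VerseId[] = [45008001];

const store = new TextStore();
store.put("KJV", 45008001, "<KJV text of Romans 8:1>");
store.put("ESV", 45008001, "<ESV text of Romans 8:1>");

// Switching translations changes the lookup, not the stored references.
for (const translation of ["KJV", "ESV"]) {
  console.log(translation, savedRefs.map((id) => store.get(translation, id)));
}
```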

The Reference Integrity Engine

This is not a parser. A parser converts structured input into structured output. The Reference Integrity Engine handles input that is not well-formed. It recovers from malformed references, resolves ambiguity explicitly, and returns deterministic outputs.
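Part of recovering from malformed input is a clean-up pass before any book lookup happens. The sketch below handles several of the cases listed earlier: a leading translation code, Roman-numeral ordinals glued to a book name, missing spaces, and "to" or en-dash ranges. The translation list and ordinal book stems are small illustrative assumptions, not the engine's actual tables.

```typescript
// Illustrative tables; a real engine would load complete lists.
const TRANSLATION_CODES = new Set(["esv", "kjv", "niv"]);
// Stems of books that take an ordinal (1 Samuel, 2 Kings, 3 John, ...).
const ORDINAL_STEMS = "sam|ki|chr|cor|th|tim|pet|jo|jn";
const ROMAN_ORDINAL = new RegExp(`^(iii|ii|i)\\s*(?=${ORDINAL_STEMS})`);

interface Normalized {
  text: string;
  translation: string | null;
}

function normalizeInput(raw: string): Normalized {
  let text = raw.trim().toLowerCase();
  let translation: string | null = null;

  // "ESV John 3:16": peel off a leading translation code.
  const prefix = text.match(/^([a-z]+)\s+/);
  if (prefix && TRANSLATION_CODES.has(prefix[1])) {
    translation = prefix[1].toUpperCase();
    text = text.slice(prefix[0].length);
  }

  // "iitim" -> "2 tim". The lookahead keeps "isaiah" from becoming "1 saiah".
  text = text.replace(ROMAN_ORDINAL, (_match: string, numeral: string) => `${numeral.length} `);

  text = text
    .replace(/^(\d)([a-z])/, "$1 $2") // "2tim" -> "2 tim"
    .replace(/([a-z])(\d)/g, "$1 $2") // "tim3:16" -> "tim 3:16"
    .replace(/[\u2013\u2014]/g, "-") // en and em dashes -> hyphen
    .replace(/\s+(?:to|through|thru)\s+/g, "-") // "2:1 to 4" -> "2:1-4"
    .replace(/\s+/g, " ")
    .trim();

  return { text, translation };
}

console.log(normalizeInput("iitim3:16-17")); // { text: '2 tim 3:16-17', translation: null }
console.log(normalizeInput("ESV John 3:16")); // { text: 'john 3:16', translation: 'ESV' }
```

Book-name recognition (misspellings like Genisis, aliases like canticles) would then run on the cleaned text rather than on raw input.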

Real-World Testing Examples

Roman numeral ordinals: iitim3:16-17 → 2Tim.3.16-2Tim.3.17 ✓

Misspelled book names: Genisis 1:1 → Gen.1.1 ✓

Natural language ranges: acts 2:1 to 4 → Acts.2.1-Acts.2.4 ✓

Translation identifiers: ESV John 3:16 → John.3.16 (translation: ESV) ✓

Alternative book names: canticles 2:1 → Song.2.1 ✓

Single-chapter edge cases: Obadiah 15 → Obad.1.15 ✓

Compound disjoint references: Rom 8:1-4, 28; 12:1-2 → Rom.8.1-Rom.8.4, Rom.8.28, Rom.12.1-Rom.12.2 ✓
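Compound references depend on carrying context forward: in Rom 8:1-4, 28; 12:1-2, the bare 28 inherits chapter 8 from the segment before it. Here is a minimal sketch of that carry-over, assuming the book has already been resolved to an OSIS code and the input has been cleaned up. For simplicity it treats commas and semicolons alike, so whole-chapter lists such as Rom 8; 12 are out of scope. The singleChapter flag is a hypothetical parameter covering books like Obadiah, where a lone number is a verse in chapter 1.

```typescript
// Expands the chapter/verse part of a reference into OSIS segments,
// e.g. expandCompound("Rom", "8:1-4, 28; 12:1-2").
function expandCompound(osisBook: string, spec: string, singleChapter = false): string[] {
  const segments: string[] = [];
  // Single-chapter books (Obadiah, Jude, ...) start in chapter 1.
  let chapter: number | null = singleChapter ? 1 : null;

  for (const raw of spec.split(/[;,]/)) {
    const part = raw.trim();
    if (part === "") continue;

    // "C:V", "C:V-V", "V" or "V-V"
    const m = part.match(/^(?:(\d+):)?(\d+)(?:-(\d+))?$/);
    if (!m) throw new Error(`Unrecognized segment: "${part}"`);

    if (m[1] !== undefined) chapter = Number(m[1]); // an explicit chapter resets the context
    if (chapter === null) throw new Error(`No chapter in scope for "${part}"`);

    const start = `${osisBook}.${chapter}.${m[2]}`;
    segments.push(m[3] !== undefined ? `${start}-${osisBook}.${chapter}.${m[3]}` : start);
  }
  return segments;
}

console.log(expandCompound("Rom", "8:1-4, 28; 12:1-2").join(", "));
// Rom.8.1-Rom.8.4, Rom.8.28, Rom.12.1-Rom.12.2
console.log(expandCompound("Obad", "15", true).join(", "));
// Obad.1.15
```

Checking each segment against the canon's chapter and verse counts is a separate step that runs on this output.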

Explicit Ambiguity

Input:
Samuel

Response:

{
  "type": "single",
  "valid": false,
  "ambiguous": true,
  "candidates": [
    {
      "key": "1SA",
      "id": 9,
      "name": "1 Samuel",
      "osis": "1Sam",
      "weight": 70,
      "score": 0.8275
    },
    {
      "key": "2SA",
      "id": 10,
      "name": "2 Samuel",
      "osis": "2Sam",
      "weight": 70,
      "score": 0.8275
    }
  ]
}

Silent ambiguity resolution is how AI systems produce confident wrong answers. Surfacing it lets the application decide whether to prompt the user or log it for review.
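One way to surface ambiguity instead of silently picking a winner is to score every book and return all candidates tied for the top score. The sketch below assumes normalized edit-distance scoring over a tiny hypothetical alias table. It does not reproduce the weight and score values in the response above, which come from the engine's own scoring.

```typescript
// A tiny, hypothetical alias table. A real one covers all 66 books and many aliases.
interface Book {
  osis: string;
  names: string[];
}

const BOOKS: Book[] = [
  { osis: "Gen", names: ["genesis", "gen"] },
  { osis: "1Sam", names: ["1 samuel", "1 sam"] },
  { osis: "2Sam", names: ["2 samuel", "2 sam"] },
  { osis: "Song", names: ["song of songs", "song of solomon", "canticles"] },
  { osis: "Obad", names: ["obadiah", "obad"] },
];

// Classic Levenshtein distance using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // previous row's value at j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row's value at j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

interface BookMatch {
  ambiguous: boolean;
  candidates: { osis: string; score: number }[];
}

// Scores each book by its best-matching name (1 = exact) and keeps every top-score tie.
function matchBook(input: string, minScore = 0.6): BookMatch {
  const q = input.trim().toLowerCase();
  const scored = BOOKS.map((book) => ({
    osis: book.osis,
    score: Math.max(...book.names.map((n) => 1 - levenshtein(q, n) / Math.max(q.length, n.length))),
  }))
    .filter((c) => c.score >= minScore)
    .sort((x, y) => y.score - x.score);

  const top = scored.filter((c) => c.score === scored[0]?.score);
  return { ambiguous: top.length > 1, candidates: top };
}

console.log(matchBook("Samuel")); // ambiguous: 1Sam and 2Sam tie
console.log(matchBook("Genisis")); // single candidate: Gen
```

An empty candidate list is a clean rejection, one candidate is a resolution, and more than one is handed back to the application instead of being guessed.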

The Canonical Coordinate System

Every resolved reference produces stable, version-agnostic coordinates. For Romans 8:1, the response is:

{
    "status": "success",
    "results_count": 1,
    "data": [
        {
            "verse_id": 45008001,
            "verse_index": 28118,
            "book": {
                "id": 45,
                "name": "Romans"
            },
            "chapter": 8,
            "verse": 1,
            "reference": "Romans 8:1",
            "osis_id": "Rom.8.1"
        }
    ]
}

These identifiers are immutable. The verse_index is a global sequential position (1 through 31,102), making traversal possible without complex client-side boundary logic.
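The example response suggests that verse_id packs book, chapter, and verse into one integer (45, 008, 001 for Romans 8:1), so the sketch below assumes that BBCCCVVV layout. It also shows why a global verse_index simplifies traversal: a context window or a verse distance is plain arithmetic that crosses chapter and book boundaries without knowing chapter lengths. Mapping an index back to book, chapter, and verse still needs a lookup table, which is what /lookup is for. Treating /distance as a difference of verse_index values is likewise an assumption.

```typescript
// Assumes the BBCCCVVV layout implied by verse_id 45008001 = Romans 8:1.
interface VerseParts {
  book: number;
  chapter: number;
  verse: number;
}

function decodeVerseId(id: number): VerseParts {
  return {
    book: Math.floor(id / 1_000_000),
    chapter: Math.floor(id / 1_000) % 1_000,
    verse: id % 1_000,
  };
}

function encodeVerseId({ book, chapter, verse }: VerseParts): number {
  return book * 1_000_000 + chapter * 1_000 + verse;
}

// verse_index runs from 1 to 31,102 across the canon.
const FIRST_INDEX = 1;
const LAST_INDEX = 31_102;

// Indices of the verses within `radius` of `index`, clamped to the ends of the canon.
function contextWindow(index: number, radius: number): number[] {
  const start = Math.max(FIRST_INDEX, index - radius);
  const end = Math.min(LAST_INDEX, index + radius);
  return Array.from({ length: end - start + 1 }, (_, i) => start + i);
}

// Verse distance as a difference of global positions (assumed semantics).
function verseDistance(a: number, b: number): number {
  return Math.abs(a - b);
}

console.log(decodeVerseId(45008001)); // { book: 45, chapter: 8, verse: 1 }
console.log(contextWindow(28118, 2)); // [ 28116, 28117, 28118, 28119, 28120 ]
```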

Why This Matters for AI Systems

LLMs often hallucinate Scripture references. Any serious AI application needs a grounding layer. This stack maps directly onto AI pipeline needs:

Grounding — Use /resolve to validate LLM output before it touches your retrieval system.
Indexing — Use /expand to convert references into atomic IDs for vector stores.
RAG Context — Use /context to retrieve surrounding verses, traversing chapter and book boundaries automatically.
Citations — Use /lookup to reconstruct readable references for the end-user.
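For the grounding step, the application mostly needs to sort an LLM's cited references into three buckets: resolved, rejected, and ambiguous. Here is a minimal sketch, assuming a resolver function that wraps your /resolve call. The valid and ambiguous fields mirror the responses shown earlier; the osis field name and the Resolver signature are hypothetical.

```typescript
// Hypothetical result shape; "valid" and "ambiguous" mirror the /resolve output above.
interface ResolveResult {
  valid: boolean;
  ambiguous?: boolean;
  osis?: string; // hypothetical field holding the canonical reference
}

// Stand-in for an HTTP call to /resolve.
type Resolver = (reference: string) => Promise<ResolveResult>;

interface GroundingReport {
  grounded: string[]; // canonical references safe to retrieve against
  rejected: string[]; // citations that do not exist in the canon
  needsReview: string[]; // ambiguous citations: ask the user or log them
}

async function groundCitations(citations: string[], resolve: Resolver): Promise<GroundingReport> {
  const report: GroundingReport = { grounded: [], rejected: [], needsReview: [] };
  const results = await Promise.all(citations.map((c) => resolve(c)));

  results.forEach((result, i) => {
    const cited = citations[i];
    if (result.ambiguous) report.needsReview.push(cited);
    else if (result.valid && result.osis) report.grounded.push(result.osis);
    else report.rejected.push(cited);
  });
  return report;
}

// Example with a canned resolver instead of a network call.
const canned: Record<string, ResolveResult> = {
  "John 3:16": { valid: true, osis: "John.3.16" },
  Samuel: { valid: false, ambiguous: true },
};
const stubResolver: Resolver = async (ref) => canned[ref] ?? { valid: false };

groundCitations(["John 3:16", "Samuel", "Ps 23:99"], stubResolver).then((report) => console.log(report));
```

Only the grounded list flows on to retrieval; the other two lists are where a confident wrong answer would otherwise have come from.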

What Gets Rejected and Why

Canon-aware validation means the engine knows the structural boundaries of the Bible, down to how many chapters each book has and how many verses each chapter has:

ps 23:99 → Rejected (Invalid verse in span)

1 john 1-9 → Rejected (Invalid chapter span)

great commission → Rejected (Book not recognized)

Topical phrases are rejected cleanly. The engine resolves Scripture references, not theological concepts.
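Under the hood, canon-aware validation comes down to bounds checks against a table of chapter and verse counts. The sketch below fills in only the entries needed for the rejections above (Psalms has 150 chapters and Psalm 23 has 6 verses; 1 John has 5 chapters and 1 John 1 has 10 verses). The table shape and OSIS-style keys are assumptions, and the rejection reasons reuse the wording above.

```typescript
// Partial, illustrative bounds tables keyed by OSIS book code and "Book.Chapter".
const CHAPTERS: Record<string, number> = { Ps: 150, "1John": 5 };
const VERSES: Record<string, number> = { "Ps.23": 6, "1John.1": 10 };

type Verdict = { ok: true } | { ok: false; reason: string };

function checkChapterSpan(book: string, from: number, to: number): Verdict {
  const count = CHAPTERS[book];
  if (count === undefined) return { ok: false, reason: "Book not recognized" };
  if (from < 1 || to < from || to > count) return { ok: false, reason: "Invalid chapter span" };
  return { ok: true };
}

function checkVerseSpan(book: string, chapter: number, from: number, to: number): Verdict {
  const chapterCheck = checkChapterSpan(book, chapter, chapter);
  if (!chapterCheck.ok) return chapterCheck;
  const count = VERSES[`${book}.${chapter}`];
  if (count === undefined) throw new Error(`No verse counts loaded for ${book} ${chapter}`);
  if (from < 1 || to < from || to > count) return { ok: false, reason: "Invalid verse in span" };
  return { ok: true };
}

console.log(checkVerseSpan("Ps", 23, 99, 99)); // { ok: false, reason: 'Invalid verse in span' }
console.log(checkChapterSpan("1John", 1, 9)); // { ok: false, reason: 'Invalid chapter span' }
```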

The Honest Limits

English-only: Book names are recognized only in English.

No Apocrypha: Books like Sirach or Tobit return clean rejections.

One More Thing

The biggest long-term value here is the coordinate system.

A stable, public standard for Scripture coordinates doesn't currently exist. If enough developers use a shared coordinate system, the verse_id becomes something you can exchange between systems and reference in open datasets.