LLM-powered extraction kept silently corrupting my database. Here's what I built to fix it.

tags: node, llm, opensource, api

Joyal Seejo — Fri, 12 Jun 2026 09:05:39 +0000

I've been building an extraction API for the past month. The use case is specific — reading informal WhatsApp orders in mixed Hindi/English/Malayalam and turning them into structured records for Indian distributors. Something like:

"bhai 50 bags opc 53 cement calicut tuesday urgent"

needs to become:

{
  "product": "OPC Grade 53 Cement",
  "quantity": 50,
  "unit": "bags",
  "location": "Calicut",
  "delivery_date": "Tuesday",
  "urgency": "urgent"
}

Regex dies on the first message. Template matching dies on the second. The only approach that actually works is an LLM. But the moment I put it in production, I hit a problem nobody warned me about.

LLMs lie about returning JSON

Not in a hallucination sense. In a more annoying sense — they return almost JSON. Things like:

Here's the extracted data:

```json
{ "product": "cement", "quantity": 50 }
```

Or:

Based on the message, I can identify:
{ "product": "cement", "quantity": "50" }
Note: quantity returned as string since units weren't explicit.

Both of these make JSON.parse throw a SyntaxError. Nothing else fails: the API call itself succeeds. If you're not handling that error carefully, you either silently skip the record or crash the job.

The failure modes I documented:

JSON wrapped in markdown code fences
Explanatory text before or after the JSON
Fields present in schema but missing from response (not null, just absent)
Type mismatches (quantity: "50" instead of quantity: 50)
Field name variations (delivery_date vs date_of_delivery)

None of these are obvious bugs. They all look like successful API calls.
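The last three failure modes survive JSON.parse, so they need their own check after parsing. Here is a minimal sketch of what that check could look like. The schema shape (field name mapped to an expected `typeof` result), the alias table, and the `checkAgainstSchema` helper are all assumptions made for illustration, not the format the project actually uses.

```javascript
// Hypothetical schema shape: field name -> expected typeof result.
const orderSchema = {
  product: 'string',
  quantity: 'number',
  delivery_date: 'string'
}

// Hypothetical alias table for known field-name variations.
const aliases = { date_of_delivery: 'delivery_date' }

function checkAgainstSchema(parsed, schema, aliasMap = {}) {
  const problems = []
  const normalized = {}

  // Map known aliases back to their canonical field names.
  for (const [key, value] of Object.entries(parsed)) {
    const canonical = aliasMap[key] ?? key
    if (canonical !== key) problems.push(`renamed field: ${key} -> ${canonical}`)
    normalized[canonical] = value
  }

  // Flag fields that are absent or have the wrong primitive type.
  for (const [field, expectedType] of Object.entries(schema)) {
    if (!(field in normalized)) {
      problems.push(`missing field: ${field}`)
    } else if (normalized[field] !== null && typeof normalized[field] !== expectedType) {
      problems.push(
        `type mismatch: ${field} is ${typeof normalized[field]}, expected ${expectedType}`
      )
    }
  }

  return { normalized, problems }
}

const result = checkAgainstSchema(
  { product: 'cement', quantity: '50', date_of_delivery: 'Tuesday' },
  orderSchema,
  aliases
)
console.log(result.problems)
```

For the sample input this reports the renamed field and the string-typed quantity. An empty `problems` array is the only case that should go straight to the database.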

The fix that actually worked

First instinct: strip markdown fences with regex. That works until the model puts fences inside the JSON for a nested code field. Then it doesn't.
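To make that failure concrete, here is a small sketch of the naive approach. The `stripFences` helper and its regex are my own illustration, not code from the project. The non-greedy match stops at the first closing fence it sees, which in the second input sits inside a string value.

```javascript
// Build the fence marker programmatically to keep this listing readable.
const FENCE = '`'.repeat(3)

// Naive approach: grab whatever sits between the first pair of fences.
function stripFences(text) {
  const match = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/)
  return match ? match[1].trim() : text.trim()
}

// Case 1: plain fenced JSON with a preamble. The regex handles this.
const simple =
  `Here's the extracted data:\n\n${FENCE}json\n{ "product": "cement", "quantity": 50 }\n${FENCE}`
console.log(JSON.parse(stripFences(simple)).quantity) // 50

// Case 2: a string field that itself contains a fenced snippet.
// The match ends at the inner fence, leaving a truncated object.
const nested = `${FENCE}json\n{ "snippet": "${FENCE}js\\nrun()\\n${FENCE}" }\n${FENCE}`
let nestedFailed = false
try {
  JSON.parse(stripFences(nested))
} catch (err) {
  nestedFailed = true
  console.log('naive strip failed:', err.name)
}
```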

What actually worked was corrective prompting. When parsing fails, instead of retrying with the same prompt, you feed the bad response back and tell the model specifically what it did wrong:

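import Anthropic from '@anthropic-ai/sdk'

// Reads ANTHROPIC_API_KEY from the environment by default
const anthropic = new Anthropic()

// buildPrompt(), selectModel() and ExtractorError are project helpers defined elsewhere
async function extract(input, schema, maxRetries = 2) {
  // Conversation history; grows by two messages after every failed attempt
  const messages = [
    { role: 'user', content: buildPrompt(input, schema) }
  ]

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await anthropic.messages.create({
      model: selectModel(input, schema),
      max_tokens: 1024, // required by the Messages API; the value is an assumption
      messages
    })

    const text = response.content[0].text

    try {
      const parsed = JSON.parse(text)
      return { data: parsed, attempts: attempt + 1 }
    } catch {
      // don't just retry: show the model its own bad output and say what was wrong
      messages.push(
        { role: 'assistant', content: text },
        { role: 'user', content:
          'That response was not valid JSON. Return ONLY a raw JSON object. ' +
          'First character must be {. Last character must be }. Nothing else.'
        }
      )
    }
  }

  throw new ExtractorError('Failed after retries')
}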

In practice about 90% of parse failures resolve on the second attempt with this approach. The model usually knows it produced bad output — it just needs to be explicitly called out on it.
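The correction message can also be made more specific by naming what was wrong with the particular response. The following is one possible refinement, not what the project does: `buildCorrection` is a hypothetical helper, and the three checks it runs are simple string heuristics.

```javascript
// Hypothetical helper: turns a bad response into a targeted correction message.
function buildCorrection(badText) {
  const trimmed = badText.trim()
  const issues = []

  if (trimmed.includes('`'.repeat(3))) {
    issues.push('it contained markdown code fences')
  }
  if (!trimmed.startsWith('{')) {
    issues.push('the first character was not {')
  }
  if (!trimmed.endsWith('}')) {
    issues.push('the last character was not }')
  }
  if (issues.length === 0) {
    // Looks like an object from the outside, so the problem is inside it.
    issues.push('it looked like an object but did not parse as JSON')
  }

  return (
    'That response was not valid JSON: ' + issues.join('; ') + '. ' +
    'Return ONLY a raw JSON object. ' +
    'First character must be {. Last character must be }. Nothing else.'
  )
}

const bad =
  'Based on the message, I can identify:\n{ "product": "cement" }\nNote: unit unclear.'
console.log(buildCorrection(bad))
```

The returned string would replace the fixed user message pushed in the catch branch. Whether the extra detail improves the retry success rate is something to measure rather than assume.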

The thing I didn't expect to need: per-field confidence

Once JSON was reliable, I hit the next problem. The model would confidently return:

{ "delivery_date": "Tuesday" }

But was that Tuesday this week or next week? Was "calicut" a city name, a warehouse code, or a customer shorthand? The extraction worked but I had no idea how much to trust individual fields.

I added a _meta object to the schema contract:

{
  "product": "OPC Grade 53 Cement",
  "quantity": 50,
  "delivery_date": "Tuesday",
  "_meta": {
    "confidence": 0.87,
    "field_confidences": {
      "product": 0.99,
      "quantity": 0.97,
      "delivery_date": 0.61
    },
    "warnings": ["delivery_date is ambiguous — no week specified"]
  }
}

This changes the extraction from a black box into something auditable. A downstream system can auto-approve high-confidence extractions and flag low-confidence ones for human review. For the distributor use case this matters a lot — a wrong delivery date costs real money.
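A rough sketch of that routing step, assuming the `_meta` shape shown above. The `routeExtraction` name, the 0.9 threshold, and the two decision labels are placeholders I picked for illustration. A real threshold would need tuning against reviewed data.

```javascript
// Hypothetical router: decides whether a record can skip human review.
function routeExtraction(record, threshold = 0.9) {
  const meta = record._meta

  // No confidence data at all is treated as "do not trust".
  if (!meta || !meta.field_confidences) {
    return { decision: 'needs_review', lowFields: [], reason: 'no confidence data' }
  }

  // Collect every field whose confidence falls below the threshold.
  const lowFields = Object.entries(meta.field_confidences)
    .filter(([, score]) => score < threshold)
    .map(([field]) => field)

  const hasWarnings = (meta.warnings ?? []).length > 0
  const decision =
    lowFields.length === 0 && !hasWarnings ? 'auto_approve' : 'needs_review'

  return { decision, lowFields }
}

const order = {
  product: 'OPC Grade 53 Cement',
  quantity: 50,
  delivery_date: 'Tuesday',
  _meta: {
    confidence: 0.87,
    field_confidences: { product: 0.99, quantity: 0.97, delivery_date: 0.61 },
    warnings: ['delivery_date is ambiguous: no week specified']
  }
}

console.log(routeExtraction(order))
// { decision: 'needs_review', lowFields: [ 'delivery_date' ] }
```

Routing on the per-field scores rather than the overall `confidence` keeps one shaky field from hiding behind several strong ones.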

Testing LLM-dependent code without burning API credits

This took me an embarrassingly long time to figure out. The answer is obvious in retrospect: mock the SDK entirely.

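// __tests__/mocks/anthropic.js
import { jest } from '@jest/globals' // the jest object must be imported under native ESM

// Register the mock before the code under test is loaded with a dynamic import()
jest.unstable_mockModule('@anthropic-ai/sdk', () => ({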
  default: class MockAnthropic {
    messages = {
      create: jest.fn().mockResolvedValue({
        content: [{
          text: JSON.stringify({
            product: 'OPC Grade 53 Cement',
            quantity: 50,
            _meta: { confidence: 0.97, field_confidences: {}, warnings: [] }
          })
        }],
        usage: { input_tokens: 150, output_tokens: 40 }
      })
    }
  }
}))

Now the test suite runs in under 10 seconds and calls zero real API endpoints. 56 tests, no Anthropic bill.
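The same idea covers the retry path: script a fake that returns bad output first and valid JSON second, then check that the correction message was sent. The sketch below uses a hand-rolled fake instead of Jest so it runs on plain Node. `makeFakeClient` and `extractWith` are hypothetical names, and `extractWith` is a trimmed copy of the retry loop that takes the client as a parameter, which is an assumption about how the code could be structured. With Jest, chaining `mockResolvedValueOnce` on the mocked `create` gives the same sequencing.

```javascript
// Scripted fake: returns the queued texts in order, one per create() call.
function makeFakeClient(texts) {
  const messageCounts = []
  return {
    messageCounts,
    messages: {
      create: async (params) => {
        // Record how long the conversation was when this call was made.
        messageCounts.push(params.messages.length)
        const index = Math.min(messageCounts.length - 1, texts.length - 1)
        return { content: [{ type: 'text', text: texts[index] }] }
      }
    }
  }
}

// Trimmed copy of the retry loop with the client injected (an assumption).
async function extractWith(client, prompt, maxRetries = 2) {
  const messages = [{ role: 'user', content: prompt }]

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await client.messages.create({
      model: 'fake-model',
      max_tokens: 256,
      messages
    })
    const text = response.content[0].text

    try {
      return { data: JSON.parse(text), attempts: attempt + 1 }
    } catch {
      messages.push(
        { role: 'assistant', content: text },
        { role: 'user', content: 'That response was not valid JSON. Return ONLY a raw JSON object.' }
      )
    }
  }

  throw new Error('Failed after retries')
}

const fake = makeFakeClient([
  'Here is the data: { "product": "cement" }', // first reply: not parseable
  '{ "product": "cement", "quantity": 50 }'    // second reply: valid JSON
])

extractWith(fake, 'bhai 50 bags opc 53 cement').then((result) => {
  console.log(result.attempts, fake.messageCounts)
})
```

The recorded message counts show that the second call carried the original prompt, the bad reply, and the correction.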

What I ended up with

The full thing is open source — schema-defined extraction with the reliability layer, per-field confidence, retry logic, dead letter queue for persistent failures, webhook delivery logs, and a sandbox mode (mx_test_ keys) that returns instant mock data without hitting the API.

7 pre-built schemas included: invoice, receipt, purchase order, shipment, support ticket, lead contact, job application.

https://github.com/joyalseejo/morphex-api.git

The original problem (Indian B2B text in mixed languages) is still the primary use case I'm building toward. But the reliability layer turned out to be useful for any LLM extraction pipeline regardless of language or domain.