DEV Community

Jack Watanabe

UNABLE_TO_LOCK_ROW: unable to obtain exclusive access to this record — Why Your Bulk API Jobs Fail and How to Fix It

You're running a Bulk API job to update 90,000 Opportunities. It's 2am. The job finishes, and you see this:

UNABLE_TO_LOCK_ROW: unable to obtain exclusive access to this record

Not on one record. On thousands. Some records updated, others silently skipped. No retry. No recovery. Just a partial failure that nobody will notice until a VP asks why the pipeline numbers don't add up.

I've hit this error on three separate projects over the past five years. The first time, I wasted two days trying to "fix" it with retry logic before realizing I was solving the wrong problem. This article is what I wish someone had handed me back then.

If you search the Salesforce StackExchange, you'll find hundreds of threads on this error spanning over a decade. The core issue hasn't changed — and there's still no satisfying native solution.

What Actually Happens: The 10-Second Death Window

Salesforce uses pessimistic row locking on its multi-tenant database. When a transaction touches a record, it acquires an exclusive lock. Any other transaction trying to touch the same record has to wait.

The wait timeout? 10 seconds. You can't change it.

Transaction A: UPDATE Opportunity WHERE AccountId = '001xx...'
   Acquires lock on Account record (parent rollup)
   Processing... (takes 8 seconds)

Transaction B: UPDATE Opportunity WHERE AccountId = '001xx...'
   Tries to lock same Account record
   Waits...
   10 seconds pass
   UNABLE_TO_LOCK_ROW 

Here's what tripped me up the first time: Bulk API processes batches in parallel by default. If two batches contain child records of the same parent Account, they will collide. This isn't a bug — it's how multi-tenant row locking works.

The Parent Record Trap

The record you're updating often isn't the one causing the lock. It's the parent record with rollup summary fields or triggers:

You update:     Opportunity A  (AccountId = 001xx)
                Opportunity B  (AccountId = 001xx)  ← same parent

Salesforce also locks: Account 001xx (to recalculate rollup fields)

With 90,000 Opportunities spread across 5,000 Accounts, multiple batches will inevitably contain Opportunities from the same Account. The collision isn't a possibility — it's a certainty.
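You can sanity-check that claim with a toy simulation: 90,000 records with random parents drawn from 5,000 Accounts, chopped into 10,000-record batches in arrival order (the numbers come from the scenario above; the script itself is purely illustrative):

```python
import random
from collections import defaultdict

random.seed(42)  # reproducible toy run

# 90,000 child records, each assigned one of 5,000 parent Accounts
records = [random.randrange(5000) for _ in range(90000)]

# Default-style batching: 10,000 records per batch, in arrival order
batches = [records[i:i + 10000] for i in range(0, len(records), 10000)]

# Which batches does each parent appear in?
seen_in = defaultdict(set)
for batch_no, batch in enumerate(batches):
    for parent in batch:
        seen_in[parent].add(batch_no)

split = sum(1 for b in seen_in.values() if len(b) > 1)
print(f"{split} of 5000 parents span multiple batches")
```

On a typical run, essentially every parent lands in several batches. Run those batches in parallel and the collisions aren't bad luck, they're arithmetic.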

Data Skew Makes It Worse

Salesforce's own engineering team defines "data skew" as an Account with more than roughly 10,000 child records. Think enterprise accounts, partner records, or catch-all "house accounts." When multiple transactions hit a skewed Account, lock contention becomes catastrophic — every batch touching that Account's children will queue up behind the same lock.
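A quick way to find out whether you have this problem is to count children per parent before you run the job. A sketch using simple_salesforce (the `sf` connection is assumed to exist, and the 10,000 threshold is the rough rule of thumb above; adjust both for your org):

```python
SKEW_THRESHOLD = 10000  # rough cutoff for "data skew"

SKEW_QUERY = (
    "SELECT AccountId, COUNT(Id) cnt "
    "FROM Opportunity "
    "GROUP BY AccountId "
    f"HAVING COUNT(Id) > {SKEW_THRESHOLD}"
)

def find_skewed(rows, threshold=SKEW_THRESHOLD):
    """Return (AccountId, child count) pairs over the threshold, worst first."""
    hits = [(r["AccountId"], r["cnt"]) for r in rows if r["cnt"] > threshold]
    return sorted(hits, key=lambda pair: -pair[1])

# Usage (network call; sf is an authenticated simple_salesforce connection):
# rows = sf.query_all(SKEW_QUERY)["records"]
# for account_id, count in find_skewed(rows):
#     print(account_id, count)
```

Any Account this surfaces deserves special handling before you tune anything else.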

The 4 Workarounds (And Their Honest Trade-offs)

I've used all four of these in production. None of them is a complete solution, but depending on your constraints, one might be good enough.

1. Switch to Serial Mode

The simplest fix. Force batches to process one at a time:

<!-- Bulk API v1: Job creation request -->
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>update</operation>
  <object>Opportunity</object>
  <concurrencyMode>Serial</concurrencyMode>
  <contentType>CSV</contentType>
</jobInfo>

This eliminates lock contention entirely — batches never overlap. The downside is brutal: on one project, a 90,000-record job went from 3 minutes to over 40 minutes. If your integration runs on a schedule (say, syncing before market open), that kind of slowdown might not be acceptable. But if your job runs overnight and nobody cares about elapsed time, serial mode is the lowest-effort fix.
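If you submit jobs from a script rather than Data Loader, you can template that payload yourself. Here's a sketch; the endpoint path, API version, and `session_id` follow the standard v1 conventions but are placeholders to verify against your org. (As far as I can tell, Bulk API 2.0 never exposed `concurrencyMode`, so serial mode means a v1 job.)

```python
def serial_job_xml(operation: str, sobject: str) -> str:
    """Build a Bulk API v1 job-creation payload forcing Serial concurrency."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">'
        f'<operation>{operation}</operation>'
        f'<object>{sobject}</object>'
        '<concurrencyMode>Serial</concurrencyMode>'
        '<contentType>CSV</contentType>'
        '</jobInfo>'
    )

# Usage (network call; session_id comes from your auth flow):
# import requests
# resp = requests.post(
#     f"{instance_url}/services/async/58.0/job",
#     data=serial_job_xml("update", "Opportunity"),
#     headers={"X-SFDC-Session": session_id,
#              "Content-Type": "application/xml; charset=UTF-8"},
# )
```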

2. Sort Records by Parent ID Before Upload

Pre-sort your data so that all children of the same parent land in the same batch:

import pandas as pd

df = pd.read_csv('opportunities.csv')
df = df.sort_values('AccountId')  # Group by parent
df.to_csv('opportunities_sorted.csv', index=False)

This reduces cross-batch collisions dramatically. In my experience, sorting alone cut lock errors by about 80% on a dataset with moderate Account distribution. But it doesn't eliminate them — batch boundaries (every 10,000 records by default) can still split a parent group. And if another process touches the same Accounts concurrently, sorting within your job won't help.
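You can go one step further and align batch boundaries to parent groups: chunk the sorted records yourself and submit each chunk as its own batch, never splitting a parent across chunks. A sketch (the function name and the 10,000 default are my own; note that a single parent bigger than `max_size` still gets one oversized batch, which is the data-skew case again):

```python
from itertools import groupby

def batches_by_parent(records, key="AccountId", max_size=10000):
    """Chunk records (already sorted by `key`) without splitting a parent group.

    Returns a list of batches, each at most max_size records, except when a
    single parent's group alone exceeds max_size.
    """
    batches, current = [], []
    for _, group in groupby(records, key=lambda r: r[key]):
        group = list(group)
        # Flush the current batch if this whole group won't fit
        if current and len(current) + len(group) > max_size:
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```

Each returned batch can then be uploaded as a separate Bulk API batch, so no two batches ever share a parent.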

3. Use FOR UPDATE in Apex

If you control the Apex trigger or batch class that fires during the load, you can acquire locks explicitly:

Account[] accts = [
    SELECT Id FROM Account
    WHERE Id IN :accountIds
    FOR UPDATE
];
// Now process — the lock is already held

This fails fast instead of timing out at 10 seconds. Useful when you want to catch contention early and handle it in code. The catch: FOR UPDATE is Apex-only. External integrations calling the REST or Bulk API can't use it. And acquiring locks early means holding them longer, which can paradoxically increase contention if your transaction does heavy processing.

4. Retry with Exponential Backoff

Retry only the failed records, with increasing delays:

import time
import random

def bulk_update_with_retry(sf, records, max_retries=3):
    """Retry only lock failures. `sf` is an authenticated simple_salesforce
    connection; each record dict must include an 'Id'."""
    for attempt in range(max_retries):
        result = sf.bulk.Opportunity.update(records)

        failed = [r for r in result if not r['success']
                  and 'UNABLE_TO_LOCK_ROW' in str(r.get('errors', []))]

        if not failed:
            return result

        wait = (2 ** attempt) + random.uniform(0, 1)
        print(f"Lock contention on {len(failed)} records. "
              f"Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(wait)

        records = [r for r in records
                   if r['Id'] in {f['id'] for f in failed}]

    raise Exception(f"{len(records)} records failed after {max_retries} retries")

This works for transient contention — the occasional overlapping batch. But under sustained load, retries can create a retry storm: multiple clients retrying simultaneously, all re-colliding. I once saw a retry loop that ran for 12 minutes before finally clearing, because three integrations kept stepping on each other's retries.

Quick Comparison

Approach         Lock Errors       Throughput   Complexity   Multi-Integration?
Serial mode      0                 Very slow    None         No
Sort by parent   ~80% reduction    Normal       Low          No
FOR UPDATE       Fails fast        Normal       Medium       No (Apex only)
Client retry     Eventually 0      Degraded     Medium       No (retry storm risk)

When Workarounds Aren't Enough: The Multi-Source Problem

Here's the scenario that broke my retry logic and forced me to rethink the whole approach:

09:00 — Marketing automation syncs 50K Contacts    (Bulk API)
09:00 — ERP integration updates 20K Opportunities  (Bulk API)
09:01 — Sales rep triggers Flow on Account         (UI)
09:01 — CPQ recalculates 5K Quotes                 (Apex batch)

Four independent processes, all touching records that share parent Accounts. Even with sorting, serial mode, and retry logic in each individual integration, they still collide with each other. Each integration is optimized in isolation, but lock contention is a coordination problem — you can't solve it without coordinating.

The Architectural Solution: External Write Serialization

The pattern that finally worked for us is an external queue that serializes writes to the same logical entity:

                      ┌─────────────────────┐
  Marketing sync ────→│                     │
  ERP integration ───→│   Write Serializer  │──→ Salesforce
  CPQ batch ─────────→│   (external queue)  │    (sequential writes
  Manual flows ──────→│                     │     per Account)
                      └─────────────────────┘

  Key: All writes to Account 001xx go through
  the same queue, in order, one at a time.

How It Works

  1. Intercept: All write operations route through the serializer instead of hitting Salesforce directly
  2. Group: Operations are grouped by their lock key (typically the parent Account ID)
  3. Sequence: Within each group, operations execute one at a time
  4. Parallelize: Different groups (different Accounts) still execute in parallel — throughput stays high
  5. Retry: If a write fails, only that group retries, with the queue maintaining order
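Steps 2 through 4 are the heart of the pattern and fit in a few lines. Here's a deliberately minimal in-process sketch in Python (all names are illustrative; the production version is the Redis-backed Node.js code below): a per-key lock serializes writes to one parent, while a shared thread pool keeps different parents parallel.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

class WriteSerializer:
    """Writes sharing a lock key run one at a time; distinct keys run in parallel."""

    def __init__(self, write_fn, workers=8):
        self._write = write_fn          # e.g. a function that calls the Bulk API
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock registry itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks[key]

    def submit(self, key, records):
        """Queue a write; returns a Future. `key` is typically the parent Account ID."""
        def run():
            with self._lock_for(key):   # serialize per parent
                return self._write(key, records)
        return self._pool.submit(run)
```

A real deployment needs the queue to live outside any one process, so every integration shares it, and to survive restarts. That durability is exactly what the BullMQ version below adds.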

What It Actually Looks Like in Practice

Before: 90,000 records → ~12,000 lock errors     → 3am incident
After:  90,000 records → 0 lock errors            → ~15% longer total processing time

The trade-off is latency. Serializing writes per Account adds queue wait time — in our case, roughly 15% overhead on total job duration. You're also operating a Redis-backed queue that itself needs monitoring and failure handling. But compared to debugging partial data failures at 3am, every team I've worked with considers that a good deal.

Implementation with BullMQ (Node.js)

Here's a simplified version using Redis-backed queues. This is close to what we run in production, minus the auth and monitoring layers:

// write-serializer.js
const { Queue, Worker } = require('bullmq');
const jsforce = require('jsforce');

const connection = { host: process.env.REDIS_HOST, port: 6379 };
const writeQueue = new Queue('sf-writes', { connection });

// Group writes by parent Account ID
async function enqueueWrite(operation, records) {
  const groups = {};
  for (const record of records) {
    const key = record.AccountId || record.Id;
    if (!groups[key]) groups[key] = [];
    groups[key].push(record);
  }

  for (const [accountId, groupRecords] of Object.entries(groups)) {
    await writeQueue.add(
      `write-${accountId}`,
      { operation, records: groupRecords },
      {
        attempts: 3,
        backoff: { type: 'exponential', delay: 2000 },
        removeOnComplete: 100,
        // Per-Account serialization via BullMQ groups: 1 job per Account
        // at a time. Note that groups require BullMQ Pro; with open-source
        // BullMQ you'd need a queue per key or a distributed lock instead.
        group: { id: accountId, concurrency: 1 }
      }
    );
  }
}

// Process writes
const worker = new Worker('sf-writes', async (job) => {
  const conn = new jsforce.Connection({ /* ... */ });
  const { operation, records } = job.data;

  const result = await conn.sobject('Opportunity')
    [operation](records, { allOrNone: false });

  const failures = result.filter(r => !r.success);
  if (failures.length > 0) {
    throw new Error(`${failures.length} records failed: ${
      JSON.stringify(failures[0].errors)
    }`);
  }

  return { processed: records.length, failures: 0 };
}, { connection, concurrency: 20 });

What's missing from this example, all of which a production deployment probably needs:

  • OAuth token refresh with pre-emptive renewal
  • Dynamic batch sizing based on object complexity
  • Dead letter handling (Slack/PagerDuty alerts for records that fail after all retries)
  • Prometheus metrics for queue depth and latency
  • Graceful shutdown that waits for in-flight operations

If building and maintaining this infrastructure in-house isn't practical for your team, AsyncGuard handles write serialization, dead letter management, and retry orchestration as a managed service — it's what I built after implementing this pattern one too many times.

The Trade-offs (Because There Are Always Trade-offs)

Let's be honest about what this pattern costs:

  • Operational overhead: You're now running a Redis cluster (or managed equivalent) that needs monitoring, backups, and failover planning. If Redis goes down, all writes queue up — or worse, drop.
  • Latency increase: Writes are no longer direct. Even in the best case, you're adding a queue hop. Under heavy load, the per-Account serialization can create hot queues for skewed Accounts.
  • Migration effort: Every integration that writes to Salesforce needs to route through the serializer. This is a significant coordination effort, especially if you have integrations managed by different teams or vendors.
  • Debugging complexity: When something fails, you're debugging across two systems (queue + Salesforce) instead of one.

For a single integration with occasional lock errors, this is probably overkill. Sort by parent ID and add retry logic — that's sufficient. The write serializer pattern pays for itself when you have three or more concurrent integrations hitting overlapping records.

When to Use Which Approach

Rather than a one-size-fits-all recommendation, here's how I think about it:

Single integration, low volume (<10K records):
Sort by parent ID. Maybe add retry. Done.

Single integration, high volume or time-sensitive:
Serial mode if you can afford the slowdown. Sort + retry if you can't.

Multiple integrations, shared parent records:
This is where the write serializer pattern earns its keep. The per-integration workarounds can't coordinate across processes.

Skewed data (10K+ children on a single Account):
Consider redistributing child records across multiple "parking" Accounts as Salesforce recommends. Then apply the appropriate pattern above for your remaining contention.

TL;DR

The UNABLE_TO_LOCK_ROW error comes from Salesforce's pessimistic row locking with a non-configurable 10-second timeout. Bulk API's default parallel processing makes collisions inevitable when batches share parent records.

Four workarounds exist (serial mode, parent sorting, FOR UPDATE, retry logic), but each optimizes a single integration in isolation. When multiple integrations touch overlapping records simultaneously, you need an external write serializer — a queue that coordinates writes per parent entity across all sources.

The pattern works. It has trade-offs. Choose the approach that matches your actual contention level.


If you've dealt with this error in a way I haven't covered, I'd genuinely like to hear about it — drop a comment below.


I built AsyncGuard to solve exactly this class of problem — silent failures in enterprise async processing. If you're tired of debugging partial data at 3am, it might be worth a look.

