You're running a Bulk API job to update 90,000 Opportunities. It's 2am. The job finishes, and you see this:
```
UNABLE_TO_LOCK_ROW: unable to obtain exclusive access to this record
```
Not on one record. On thousands. Some records updated, others silently skipped. No retry. No recovery. Just a partial failure that nobody will notice until a VP asks why the pipeline numbers don't add up.
I've hit this error on three separate projects over the past five years. The first time, I wasted two days trying to "fix" it with retry logic before realizing I was solving the wrong problem. This article is what I wish someone had handed me back then.
Search Salesforce Stack Exchange and you'll find hundreds of threads on this error spanning more than a decade. The core issue hasn't changed — and there's still no satisfying native solution.
## What Actually Happens: The 10-Second Death Window
Salesforce uses pessimistic row locking on its multi-tenant database. When a transaction touches a record, it acquires an exclusive lock. Any other transaction trying to touch the same record has to wait.
The wait timeout? 10 seconds. You can't change it.
```
Transaction A: UPDATE Opportunity WHERE AccountId = '001xx...'
  → Acquires lock on Account record (parent rollup)
  → Processing... (takes 8 seconds)

Transaction B: UPDATE Opportunity WHERE AccountId = '001xx...'
  → Tries to lock same Account record
  → Waits...
  → 10 seconds pass
  → UNABLE_TO_LOCK_ROW ❌
```
Here's what tripped me up the first time: Bulk API processes batches in parallel by default. If two batches contain child records of the same parent Account, they will collide. This isn't a bug — it's how multi-tenant row locking works.
## The Parent Record Trap
The record you're updating often isn't the one causing the lock. It's the parent record with rollup summary fields or triggers:
```
You update:            Opportunity A (AccountId = 001xx)
                       Opportunity B (AccountId = 001xx)  ← same parent
Salesforce also locks: Account 001xx (to recalculate rollup fields)
```
With 90,000 Opportunities spread across 5,000 Accounts, multiple batches will inevitably contain Opportunities from the same Account. The collision isn't a possibility — it's a certainty.
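You can check that arithmetic without touching Salesforce. Here's a quick back-of-the-envelope simulation in plain Python, using the numbers from the scenario above (random upload order, 10,000-record batches):

```python
# Illustrative only: how many Accounts end up with children in more
# than one Bulk API batch, given a random upload order?
import random
from collections import defaultdict

random.seed(42)
BATCH_SIZE = 10_000
account_ids = [f"001xx{i:04d}" for i in range(5_000)]

# 90,000 Opportunities, each assigned a random parent Account
records = [random.choice(account_ids) for _ in range(90_000)]

# Which batches does each Account's set of children fall into?
batches_per_account = defaultdict(set)
for position, account_id in enumerate(records):
    batches_per_account[account_id].add(position // BATCH_SIZE)

spanning = sum(1 for b in batches_per_account.values() if len(b) > 1)
print(f"{spanning} of {len(batches_per_account)} Accounts span multiple batches")
```

With roughly 18 children per Account spread over 9 batches, essentially every Account spans at least two batches — which is exactly why parallel batches collide on the parent lock.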
## Data Skew Makes It Worse
Salesforce's own engineering team defines "data skew" as an Account with more than roughly 10,000 child records. Think enterprise accounts, partner records, or catch-all "house accounts." When multiple transactions hit a skewed Account, lock contention becomes catastrophic — every batch touching that Account's children will queue up behind the same lock.
## The 4 Workarounds (And Their Honest Trade-offs)
I've used all four of these in production. None of them is a complete solution, but depending on your constraints, one might be good enough.
### 1. Switch to Serial Mode
The simplest fix. Force batches to process one at a time:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Bulk API v1: Job creation request -->
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>update</operation>
  <object>Opportunity</object>
  <concurrencyMode>Serial</concurrencyMode>
  <contentType>CSV</contentType>
</jobInfo>
```
This eliminates lock contention entirely — batches never overlap. The downside is brutal: on one project, a 90,000-record job went from 3 minutes to over 40 minutes. If your integration runs on a schedule (say, syncing before market open), that kind of slowdown might not be acceptable. But if your job runs overnight and nobody cares about elapsed time, serial mode is the lowest-effort fix.
### 2. Sort Records by Parent ID Before Upload
Pre-sort your data so that all children of the same parent land in the same batch:
```python
import pandas as pd

df = pd.read_csv('opportunities.csv')
df = df.sort_values('AccountId')  # Group children of the same parent together
df.to_csv('opportunities_sorted.csv', index=False)
```
This reduces cross-batch collisions dramatically. In my experience, sorting alone cut lock errors by about 80% on a dataset with moderate Account distribution. But it doesn't eliminate them — batch boundaries (every 10,000 records by default) can still split a parent group. And if another process touches the same Accounts concurrently, sorting within your job won't help.
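You can close the batch-boundary gap by building the batches yourself, so a parent group never straddles a boundary. A sketch (assumes records are already sorted by `AccountId`, as above; `parent_aligned_batches` is a name I'm inventing here, and `MAX_BATCH` mirrors the default batch size):

```python
# Sketch: split sorted records into batches of at most max_size rows
# without ever splitting one AccountId group across two batches.
from itertools import groupby

MAX_BATCH = 10_000

def parent_aligned_batches(records, key=lambda r: r["AccountId"],
                           max_size=MAX_BATCH):
    batches, current = [], []
    for _, group in groupby(records, key=key):
        group = list(group)
        # Start a new batch if this parent's children would overflow it.
        # Note: a single parent with more than max_size children (data
        # skew) still produces an oversized batch — handle those first.
        if current and len(current) + len(group) > max_size:
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```

Each batch you get back can then be uploaded as its own Bulk API batch, and no two batches will ever fight over the same parent from within your job.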
### 3. Use FOR UPDATE in Apex
If you control the Apex trigger or batch class that fires during the load, you can acquire locks explicitly:
```apex
Account[] accts = [
    SELECT Id FROM Account
    WHERE Id IN :accountIds
    FOR UPDATE
];
// Now process — the lock is already held
```
This fails fast instead of timing out at 10 seconds. Useful when you want to catch contention early and handle it in code. The catch: FOR UPDATE is Apex-only. External integrations calling the REST or Bulk API can't use it. And acquiring locks early means holding them longer, which can paradoxically increase contention if your transaction does heavy processing.
### 4. Retry with Exponential Backoff
Retry only the failed records, with increasing delays:
```python
import time
import random

def bulk_update_with_retry(sf, records, max_retries=3):
    for attempt in range(max_retries):
        result = sf.bulk.Opportunity.update(records)
        failed = [r for r in result if not r['success']
                  and 'UNABLE_TO_LOCK_ROW' in str(r.get('errors', []))]
        if not failed:
            return result
        wait = (2 ** attempt) + random.uniform(0, 1)
        print(f"Lock contention on {len(failed)} records. "
              f"Retrying in {wait:.1f}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(wait)
        records = [r for r in records
                   if r['Id'] in {f['id'] for f in failed}]
    raise Exception(f"{len(records)} records failed after {max_retries} retries")
```
This works for transient contention — the occasional overlapping batch. But under sustained load, retries can create a retry storm: multiple clients retrying simultaneously, all re-colliding. I once saw a retry loop that ran for 12 minutes before finally clearing, because three integrations kept stepping on each other's retries.
## Quick Comparison
| Approach | Lock Errors | Throughput | Complexity | Multi-Integration? |
|---|---|---|---|---|
| Serial mode | 0 | Very slow | None | No |
| Sort by parent | ~80% reduction | Normal | Low | No |
| FOR UPDATE | Fails fast | Normal | Medium | No (Apex only) |
| Client retry | Eventually 0 | Degraded | Medium | No (retry storm risk) |
## When Workarounds Aren't Enough: The Multi-Source Problem
Here's the scenario that broke my retry logic and forced me to rethink the whole approach:
```
09:00 — Marketing automation syncs 50K Contacts    (Bulk API)
09:00 — ERP integration updates 20K Opportunities  (Bulk API)
09:01 — Sales rep triggers Flow on Account         (UI)
09:01 — CPQ recalculates 5K Quotes                 (Apex batch)
```
Four independent processes, all touching records that share parent Accounts. Even with sorting, serial mode, and retry logic in each individual integration, they still collide with each other. Each integration is optimized in isolation, but lock contention is a coordination problem — you can't solve it without coordinating.
## The Architectural Solution: External Write Serialization
The pattern that finally worked for us is an external queue that serializes writes to the same logical entity:
```
                     ┌─────────────────────┐
Marketing sync ────→ │                     │
ERP integration ───→ │  Write Serializer   │ ──→ Salesforce
CPQ batch ─────────→ │  (external queue)   │     (sequential writes
Manual flows ──────→ │                     │      per Account)
                     └─────────────────────┘

Key: All writes to Account 001xx go through
the same queue, in order, one at a time.
```
### How It Works
1. **Intercept:** All write operations route through the serializer instead of hitting Salesforce directly.
2. **Group:** Operations are grouped by their lock key (typically the parent Account ID).
3. **Sequence:** Within each group, operations execute one at a time.
4. **Parallelize:** Different groups (different Accounts) still execute in parallel — throughput stays high.
5. **Retry:** If a write fails, only that group retries, with the queue maintaining order.
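The core of those steps fits in a few lines. Here's a minimal in-process sketch in Python, purely illustrative: the real pattern puts this behind an external queue shared by every integration, and `serialize_writes`, `lock_key`, and `apply_fn` are names I'm inventing for the sketch.

```python
# Sketch: group operations by lock key, run each group strictly in
# order, run different groups in parallel.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def serialize_writes(operations, lock_key, apply_fn, max_workers=8):
    """operations: iterable of dicts; lock_key: op -> group key;
    apply_fn: performs the actual Salesforce write for one op."""
    groups = defaultdict(list)
    for op in operations:                 # intercept + group
        groups[lock_key(op)].append(op)

    def run_group(ops):                   # sequence within a group
        return [apply_fn(op) for op in ops]

    # parallelize across groups — one worker per group at a time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(run_group, ops)
                   for key, ops in groups.items()}
        return {key: f.result() for key, f in futures.items()}
```

An in-process version like this only helps a single client, which is exactly its limitation: the multi-source problem needs the same serialization shared across all integrations, hence the external queue.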
### What It Actually Looks Like in Practice

- Before: 90,000 records → ~12,000 lock errors → 3am incident
- After: 90,000 records → 0 lock errors → ~15% longer total processing time
The trade-off is latency. Serializing writes per Account adds queue wait time — in our case, roughly 15% overhead on total job duration. You're also operating a Redis-backed queue that itself needs monitoring and failure handling. But compared to debugging partial data failures at 3am, every team I've worked with considers that a good deal.
### Implementation with BullMQ (Node.js)
Here's a simplified version using Redis-backed queues. This is close to what we run in production, minus the auth and monitoring layers:
```javascript
// write-serializer.js
const { Queue, Worker } = require('bullmq');
const jsforce = require('jsforce');

const connection = { host: process.env.REDIS_HOST, port: 6379 };
const writeQueue = new Queue('sf-writes', { connection });

// Group writes by parent Account ID
async function enqueueWrite(operation, records) {
  const groups = {};
  for (const record of records) {
    const key = record.AccountId || record.Id;
    if (!groups[key]) groups[key] = [];
    groups[key].push(record);
  }

  for (const [accountId, groupRecords] of Object.entries(groups)) {
    await writeQueue.add(
      `write-${accountId}`,
      { operation, records: groupRecords },
      {
        attempts: 3,
        backoff: { type: 'exponential', delay: 2000 },
        removeOnComplete: 100,
        // Per-group concurrency (1 job per Account at a time). Note:
        // groups are a BullMQ Pro feature; with open-source BullMQ
        // you'd need per-Account queues or a custom lock instead.
        group: { id: accountId, concurrency: 1 }
      }
    );
  }
}

// Process writes
const worker = new Worker('sf-writes', async (job) => {
  const conn = new jsforce.Connection({ /* ... */ });
  const { operation, records } = job.data;
  const result = await conn.sobject('Opportunity')
    [operation](records, { allOrNone: false });

  const failures = result.filter(r => !r.success);
  if (failures.length > 0) {
    throw new Error(`${failures.length} records failed: ${
      JSON.stringify(failures[0].errors)
    }`);
  }
  return { processed: records.length, failures: 0 };
}, { connection, concurrency: 20 });
```
What's missing from this example: OAuth token refresh with pre-emptive renewal, dynamic batch sizing based on object complexity, dead letter handling (Slack/PagerDuty alerts for records that fail after all retries), Prometheus metrics for queue depth and latency, and graceful shutdown that waits for in-flight operations. A production deployment probably needs all of these.
If building and maintaining this infrastructure in-house isn't practical for your team, AsyncGuard handles write serialization, dead letter management, and retry orchestration as a managed service — it's what I built after implementing this pattern one too many times.
## The Trade-offs (Because There Are Always Trade-offs)
Let's be honest about what this pattern costs:
- Operational overhead: You're now running a Redis cluster (or managed equivalent) that needs monitoring, backups, and failover planning. If Redis goes down, all writes queue up — or worse, drop.
- Latency increase: Writes are no longer direct. Even in the best case, you're adding a queue hop. Under heavy load, the per-Account serialization can create hot queues for skewed Accounts.
- Migration effort: Every integration that writes to Salesforce needs to route through the serializer. This is a significant coordination effort, especially if you have integrations managed by different teams or vendors.
- Debugging complexity: When something fails, you're debugging across two systems (queue + Salesforce) instead of one.
For a single integration with occasional lock errors, this is probably overkill. Sort by parent ID and add retry logic — that's sufficient. The write serializer pattern pays for itself when you have three or more concurrent integrations hitting overlapping records.
## When to Use Which Approach
Rather than a one-size-fits-all recommendation, here's how I think about it:
- **Single integration, low volume (<10K records):** Sort by parent ID. Maybe add retry. Done.
- **Single integration, high volume or time-sensitive:** Serial mode if you can afford the slowdown. Sort + retry if you can't.
- **Multiple integrations, shared parent records:** This is where the write serializer pattern earns its keep. The per-integration workarounds can't coordinate across processes.
- **Skewed data (10K+ children on a single Account):** Consider redistributing child records across multiple "parking" Accounts, as Salesforce recommends. Then apply the appropriate pattern above for your remaining contention.
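If you go the parking-Account route, the redistribution should be deterministic so a given child always maps to the same parking parent, no matter which process assigns it. A sketch (the parking Account IDs here are placeholders; you'd pre-create those Accounts and plug in the real 18-character IDs):

```python
# Sketch: deterministically spread children of one skewed Account
# across N pre-created "parking" Accounts using a stable hash.
import hashlib

PARKING_ACCOUNTS = [f"001PARK{i:03d}" for i in range(10)]  # placeholder IDs

def parking_account_for(child_id, parking=PARKING_ACCOUNTS):
    # A stable hash means the same child always lands on the same
    # parent, even across reruns and across different integrations.
    digest = hashlib.sha256(child_id.encode()).hexdigest()
    return parking[int(digest, 16) % len(parking)]
```

Because the mapping is a pure function of the child ID, two integrations that both touch the same child will agree on its parent without coordinating, and lock contention on the original skewed Account drops by roughly the number of parking Accounts.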
## TL;DR
The UNABLE_TO_LOCK_ROW error comes from Salesforce's pessimistic row locking with a non-configurable 10-second timeout. Bulk API's default parallel processing makes collisions inevitable when batches share parent records.
Four workarounds exist (serial mode, parent sorting, FOR UPDATE, retry logic), but each optimizes a single integration in isolation. When multiple integrations touch overlapping records simultaneously, you need an external write serializer — a queue that coordinates writes per parent entity across all sources.
The pattern works. It has trade-offs. Choose the approach that matches your actual contention level.
If you've dealt with this error in a way I haven't covered, I'd genuinely like to hear about it — drop a comment below.
I built AsyncGuard to solve exactly this class of problem — silent failures in enterprise async processing. If you're tired of debugging partial data at 3am, it might be worth a look.