A browser automation queue can look healthy while it is already corrupting state.
One worker picks up a task. Another worker picks up a different task. Both resolve to the same account context. Both try to use the same browser Profile.
The scheduler thinks it is doing useful work.
The browser state tells a different story.
One run sees a logged-in session. Another run gets redirected to login. A cookie changes during execution. Local storage is updated halfway through a task. A proxy check passes, but the page looks like it belongs to another run.
This is not always a Playwright problem.
It is often a Profile ownership problem.
Once browser automation depends on real Profile state, the Profile cannot be treated like a stateless input folder. It needs a lock.
Where this problem appears
A basic queue usually routes work like this:
task -> any free worker
That is fine for stateless jobs.
It is not enough when a browser task depends on a specific account environment. For account-based automation, the routing model usually becomes:
task -> account_context -> profile_id -> worker
That solves the first problem: the task runs in the intended account context. I covered that routing layer in routing browser automation tasks by account context, where the main problem is choosing the right Profile before assigning a worker.
This article starts at the next failure point: once the right Profile is selected, how do you prevent two workers from using it at the same time?
task_101 -> profile_store_us_018
task_102 -> profile_store_us_018
If both tasks start, two workers can mutate the same browser state.
That state may include:
- cookies
- local storage
- IndexedDB
- cache
- extension state
- active tabs
- login session
- proxy binding
- task notes
- recovery checkpoints
A browser Profile is not just a launch argument. In a real automation workflow, it is part of the account context.
Browser-level locks are not a workflow model
Some teams assume Chromium or the operating system will protect the Profile directory.
Sometimes it will.
A second process may fail to open the same user data directory. You may see a browser launch error. You may get a locked directory error. You may get a vague crash.
But that is still too late.
At that point, the scheduler has already assigned the task. The worker has already tried to start. The queue may retry. Another worker may pick up the same task. The logs may say “browser failed to launch” instead of “Profile already in use.”
A useful Profile lock should be visible to the automation system before the browser launches.
The scheduler needs to know something like this:
{
  "profile_id": "profile_store_us_018",
  "locked_by": "worker_07",
  "task_id": "task_101",
  "lease_expires_at": "2026-06-17T15:30:00Z",
  "status": "running"
}
Now the system can make a routing decision before touching browser state.
A Profile lock should answer three questions
A lock that only says “busy” is not enough.
It should answer three questions.
First: who owns this Profile right now?
profile_id -> worker_id + task_id
Second: how long is that ownership valid?
lease_expires_at -> when the lock should be considered stale
Third: what should happen if the worker disappears?
stale lock -> recover, inspect, retry, or require human review
Without those answers, the lock becomes another hidden failure mode.
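As a minimal sketch, the three questions can be read straight off a lock record with the fields shown earlier (locked_by, task_id, lease_expires_at). The helper name inspectLock and the action labels it returns are assumptions for illustration, not part of any library.

```javascript
// Hypothetical helper: answers "who owns it, for how long, and what next?"
// for a lock record shaped like the JSON example earlier in this article.
function inspectLock(lock, nowMs = Date.now()) {
  if (!lock) {
    // No record at all: the Profile is free to acquire.
    return { owner: null, msRemaining: 0, action: "acquire" };
  }
  const expiresMs = Date.parse(lock.lease_expires_at);
  const msRemaining = Math.max(0, expiresMs - nowMs);
  return {
    // 1. Who owns this Profile right now?
    owner: { workerId: lock.locked_by, taskId: lock.task_id },
    // 2. How long is that ownership still valid?
    msRemaining,
    // 3. What should happen next? An expired lease is treated as stale
    //    and sent to recovery instead of being silently overwritten.
    action: msRemaining > 0 ? "wait_or_reschedule" : "recover_stale_lock",
  };
}
```

What "recover_stale_lock" actually triggers (inspect, retry, or human review) is a policy decision that this sketch leaves open.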
A simple Profile lock model
The simplest model is one active lock per Profile.
profile_id: unique
task_id: current task
worker_id: current worker
lease_expires_at: timestamp
heartbeat_at: timestamp
status: running | releasing | stale | failed
The scheduler should acquire the lock before the worker opens the browser.
A basic flow looks like this:
1. Receive task
2. Resolve account context
3. Resolve profile_id
4. Try to acquire Profile lock
5. If acquired, start browser task
6. Send heartbeat during execution
7. Save evidence before release
8. Release lock after task finishes
A Redis-style lock may look like this:
SET profile_lock:profile_store_us_018 task_101 NX PX 900000
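That gives the worker a 15-minute lease: NX sets the key only if it does not already exist, and PX 900000 expires it after 900,000 milliseconds.
If the worker crashes, the lock eventually expires.
But expiration alone is not enough. A task that runs past its lease loses the lock while the browser is still open, which is why the heartbeat in step 6 should extend the lease. You also need an execution record explaining what happened before the lock disappeared.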
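To make the lease behavior concrete, here is a rough in-memory sketch of acquire, heartbeat, and release. It is a stand-in, not a Redis client: the class name ProfileLockStore, its method names, and the injectable clock are all assumptions, and a real deployment would keep this state in Redis or a database so every worker sees it.

```javascript
// In-memory stand-in for a lease-based lock store (assumption: production
// code would back this with Redis or a database instead of a Map).
class ProfileLockStore {
  constructor(now = () => Date.now()) {
    this.now = now;         // injectable clock, so expiry is testable
    this.locks = new Map(); // profileId -> { ownerId, expiresAt }
  }

  // Like SET ... NX PX: succeeds only if no unexpired lock exists.
  acquire(profileId, ownerId, leaseMs) {
    const current = this.locks.get(profileId);
    if (current && current.expiresAt > this.now()) {
      return { acquired: false, currentOwner: current.ownerId };
    }
    this.locks.set(profileId, { ownerId, expiresAt: this.now() + leaseMs });
    return { acquired: true, currentOwner: ownerId };
  }

  // Heartbeat: extend the lease, but only for the current, unexpired owner.
  heartbeat(profileId, ownerId, leaseMs) {
    const current = this.locks.get(profileId);
    if (!current || current.ownerId !== ownerId || current.expiresAt <= this.now()) {
      return false; // lease lost; the worker should stop touching the Profile
    }
    current.expiresAt = this.now() + leaseMs;
    return true;
  }

  // Release only if the caller still owns the lock, so a late worker
  // cannot delete a lock that another task has since taken over.
  release(profileId, ownerId) {
    const current = this.locks.get(profileId);
    if (!current || current.ownerId !== ownerId) return false;
    this.locks.delete(profileId);
    return true;
  }
}
```

The owner check in release is the part a plain key deletion misses. With Redis, that compare-then-delete step is commonly done atomically, for example with a short Lua script that checks the stored value before deleting the key.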
Add ownership checks before launching Playwright
In Playwright, the risky part usually begins when you launch a persistent browser context or connect automation to an existing Profile.
The lock should happen before that.
A simplified example:
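const { chromium } = require("playwright");

// resolveProfileId, acquireProfileLock, rescheduleTask, profilePath,
// executeBrowserTask, saveTaskResult, saveFailureEvidence, and
// releaseProfileLock are application helpers defined elsewhere.
async function runTask(task) {
  const profileId = resolveProfileId(task.accountContext);

  // Acquire ownership before any browser process touches the Profile.
  const lock = await acquireProfileLock({
    profileId,
    taskId: task.id,
    workerId: process.env.WORKER_ID,
    leaseMs: 15 * 60 * 1000
  });

  if (!lock.acquired) {
    await rescheduleTask(task.id, {
      reason: "profile_lock_conflict",
      profileId,
      currentOwner: lock.currentOwner
    });
    return;
  }

  let context;
  try {
    context = await chromium.launchPersistentContext(
      profilePath(profileId),
      {
        headless: false
      }
    );
    await executeBrowserTask(context, task);
    await saveTaskResult(task.id, "completed");
  } catch (error) {
    // Save evidence while this worker still owns the Profile.
    await saveFailureEvidence({
      taskId: task.id,
      profileId,
      error
    });
    throw error;
  } finally {
    // Close the browser first so nothing is still writing to the
    // Profile directory when the lock is released.
    if (context) {
      await context.close().catch(() => {});
    }
    await releaseProfileLock({
      profileId,
      taskId: task.id
    });
  }
}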
The exact storage layer does not matter as much as the rule:
Do not open the Profile until ownership is confirmed.
Do not release the lock before saving evidence
A common mistake is releasing the Profile lock immediately after a failure.
That makes the queue look healthy, but it can destroy the debugging path.
The safer order is:
1. Task fails
2. Capture current URL
3. Capture screenshot
4. Save console or network summary if available
5. Save profile_id, proxy_id, task_id, and worker_id
6. Mark final task status
7. Release Profile lock
The lock protects the Profile during execution.
The evidence protects the team after execution.
If the lock is released before evidence is saved, the next worker may open the same Profile and change the state you needed to inspect.
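The ordering can be sketched as a single failure handler that releases the lock only as its last step. Everything on deps here is an assumed application helper injected by the caller (not a Playwright API), and the function name finishFailedTask is hypothetical.

```javascript
// Hypothetical failure handler. All functions on `deps` are assumed
// application helpers supplied by the caller, not Playwright APIs.
async function finishFailedTask(run, deps) {
  // Step 5: identifiers that tie the evidence to this exact run.
  const evidence = {
    taskId: run.taskId,
    profileId: run.profileId,
    proxyId: run.proxyId,
    workerId: run.workerId,
  };

  // Steps 2-4: best-effort captures. A failed screenshot is recorded
  // as a capture error instead of aborting the remaining steps.
  for (const [key, capture] of Object.entries(deps.captures)) {
    try {
      evidence[key] = await capture(run);
    } catch (err) {
      evidence[key] = { captureError: String(err) };
    }
  }

  await deps.saveEvidence(evidence);                  // persist before anything else
  await deps.markTaskStatus(run.taskId, "failed");    // step 6
  await deps.releaseLock(run.profileId, run.taskId);  // step 7, always last
  return evidence;
}
```

If saveEvidence throws, this sketch never reaches the release step, so the lock stays in place until its lease expires. That trades a temporarily blocked Profile for preserved state, which may or may not suit a given queue.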
Worker concurrency should be controlled per Profile
Many automation systems only control global concurrency:
max_workers = 10
That limits total execution volume.
It does not prevent two workers from touching the same Profile.
For account-based browser automation, concurrency should also exist at the Profile level:
profile_store_us_018 -> max_active_runs = 1
profile_store_de_042 -> max_active_runs = 1
profile_test_guest_001 -> max_active_runs = 3
Not every Profile needs the same rule.
A logged-in business account should usually have one active owner.
A disposable test Profile may allow more parallelism.
A read-only monitoring Profile may have different rules from a Profile that performs state-changing actions.
The rule should be based on what the task can mutate.
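A per-Profile limit can be sketched as a small counter keyed by profile_id. The class name ProfileConcurrency and its methods are assumptions, and this version only tracks runs inside one process; a multi-worker setup would need the same counts in shared storage.

```javascript
// Hypothetical per-Profile limiter. Profiles without an explicit rule
// default to one active run, the safer choice for real accounts.
class ProfileConcurrency {
  constructor(limits, defaultLimit = 1) {
    this.limits = limits;         // profileId -> max_active_runs
    this.defaultLimit = defaultLimit;
    this.active = new Map();      // profileId -> Set of running task IDs
  }

  // Returns true if the task may start on this Profile right now.
  tryStart(profileId, taskId) {
    const running = this.active.get(profileId) ?? new Set();
    const limit = this.limits[profileId] ?? this.defaultLimit;
    if (running.has(taskId)) return true; // already counted
    if (running.size >= limit) return false;
    running.add(taskId);
    this.active.set(profileId, running);
    return true;
  }

  // Frees the slot when the task finishes or fails.
  finish(profileId, taskId) {
    const running = this.active.get(profileId);
    if (!running) return;
    running.delete(taskId);
    if (running.size === 0) this.active.delete(profileId);
  }
}
```

A global max_workers cap can still sit on top of this. The two limits answer different questions.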
When a Profile may not need an exclusive lock
An exclusive Profile lock is not always required.
The important question is whether the task can mutate shared browser state.
A Profile may not need exclusive ownership when:
- it is a short-lived temporary Profile
- it has no shared login state
- it does not reuse cookies or local storage
- it is used only for isolated test runs
- the task is fully read-only
- the Profile is intentionally disposable after execution
For example, a worker that creates a new temporary browser context for every run may not need a Profile-level lock. There is no long-lived account state to protect.
A read-only monitoring task may also allow limited concurrency if it does not modify cookies, local storage, account settings, or active workflow state.
But the moment a Profile represents a real account environment, the default should change.
If the task can alter login state, account settings, session data, proxy expectations, or recovery evidence, exclusive ownership is safer.
A practical rule is:
shared Profile state -> lock it
temporary isolated state -> concurrency may be allowed
Lock conflicts are workflow signals
A Profile lock conflict is not just a queue delay.
It may mean:
- two tasks were routed to the same account context
- a previous worker crashed
- a task took longer than expected
- the Profile is being used manually
- the task priority model is unclear
- the Profile should be split into separate workflows
That is why lock conflicts should be logged as workflow events.
Example:
{
  "event": "profile_lock_conflict",
  "profile_id": "profile_store_us_018",
  "blocked_task_id": "task_102",
  "current_task_id": "task_101",
  "current_worker_id": "worker_07",
  "decision": "rescheduled",
  "next_check_seconds": 300
}
This gives the team a useful signal.
The system may need better task sequencing, not just more workers.
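One rough way to turn those events into a sequencing signal is to count conflicts per Profile. The function name countConflictsByProfile is an assumption, and the event shape follows the example above.

```javascript
// Hypothetical aggregation over logged workflow events. Only events of
// type "profile_lock_conflict" are counted; everything else is ignored.
function countConflictsByProfile(events) {
  const counts = {};
  for (const e of events) {
    if (e.event !== "profile_lock_conflict") continue;
    counts[e.profile_id] = (counts[e.profile_id] ?? 0) + 1;
  }
  return counts;
}
```

A Profile that keeps showing up in these counts is a candidate for task sequencing or a workflow split, not for more workers.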
Add Profile behavior to the task manifest
The worker should not infer Profile rules at runtime.
A task manifest can make the expected behavior explicit:
{
  "task_id": "task_101",
  "account_id": "store_us_018",
  "profile_id": "profile_store_us_018",
  "proxy_id": "proxy_us_dallas_01",
  "requires_exclusive_profile_lock": true,
  "allowed_actions": ["inspect_login", "read_dashboard"],
  "lock_timeout_seconds": 900,
  "human_review_on_lock_conflict": true
}
This makes the task easier to reason about.
If another task already owns the Profile, the scheduler should not blindly retry. It should decide whether to wait, reschedule, skip, or send the task to review.
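That decision can be sketched as a small pure function. Only human_review_on_lock_conflict comes from the manifest above. The conflict fields (waitedSeconds, leaseRemainingSeconds), the two thresholds, and the returned labels are assumptions chosen for illustration.

```javascript
// Hypothetical conflict policy. `conflict` describes the blocked task's
// situation; both thresholds are illustrative defaults, not recommendations.
function decideOnConflict(manifest, conflict, options = {}) {
  const { shortWaitSeconds = 30, maxWaitSeconds = 1800 } = options;

  // The manifest asks for a human to look at any conflict.
  if (manifest.human_review_on_lock_conflict) return "review";

  // The task has already waited too long: stop retrying.
  if (conflict.waitedSeconds >= maxWaitSeconds) return "skip";

  // The current owner's lease is about to end: a short wait is cheaper
  // than sending the task back through the queue.
  if (conflict.leaseRemainingSeconds <= shortWaitSeconds) return "wait";

  return "reschedule";
}
```

Keeping the policy in one function means the scheduler's choice shows up in logs as a named decision, not as an anonymous retry.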
Checklist before scaling workers
Before increasing worker count, check whether your Profile model can answer these questions:
- Can every task resolve to one intended Profile?
- Can the scheduler see whether that Profile is already owned?
- Does the lock include task ID and worker ID?
- Does the lock have a lease and heartbeat?
- Is stale lock recovery handled explicitly?
- Is evidence saved before lock release?
- Are lock conflicts visible in task logs?
- Can some Profiles allow concurrency while others require exclusivity?
- Can manual use of a Profile be detected or recorded?
- Can the team tell whether a failure came from code, Profile state, proxy mapping, or worker contention?
If the answer to any of these is no, adding more workers may only make state corruption faster.
The practical rule
Account-aware routing answers one question:
Where should this task run?
Profile locking answers another:
Is it safe to run there right now?
A reliable browser automation system needs both.
routing decides the intended environment
locking protects the environment during execution
For small scripts, this may feel like overengineering.
For teams running browser tasks against real account contexts, it becomes basic infrastructure.
This is where browser automation starts to look less like isolated scripting and more like environment operations. If your team is already debugging Profile state, proxy mapping, cookies, and task logs together, tools such as Web4 Browser can help manage those pieces inside a shared browser environment workspace instead of scattering the logic across queue code, browser scripts, and manual notes.