CodeKing

My Two AI Tasks Kept Fighting for the Same Mouse

The second my local AI assistant learned how to operate Windows apps, I ran into a new bug.

Not a model bug. Not a prompt bug.

A traffic bug.

One task was logging into a site through the desktop. Another task wanted to open a different app. Both requests were valid. Both were "in progress." And both were trying to use the same physical mouse, keyboard, and screen.

That is when an assistant stops feeling like software and starts feeling like two interns fighting over one laptop.

The first version knew there were multiple runs, but not that they shared one desktop

In CliGate, I already had the shape of a resident assistant: background runs, task records, channel conversations, runtime delegation, and desktop automation on Windows.

The problem was that "multiple tasks" and "multiple desktop tasks" are not the same thing.

Most work can run in parallel just fine. A coding task can edit files while another task checks the weather or summarizes a document. But desktop control is a physical boundary. There is only one active pointer, one focused window, one keyboard target.

If the system treats every run as equally parallel, desktop tasks do not feel concurrent. They feel destructive.

One can steal focus. Another can click into the wrong window. A third can decide there is already a similar run and cancel the wrong thing entirely.

The bug was confusing activity with resource ownership

The old mental model was too shallow:

conversation -> active runs -> decide whether to cancel one

That can work for status displays. It is not enough for desktop scheduling.

What I actually needed was a separate truth the assistant could look at:

resource: desktop
holder: run X
waiters: run Y, run Z

That changed the question from:

  • "Are there other active runs?"

into:

  • "Is the desktop currently held by another run, and if so, should this task wait, answer from state, or be cancelled because the user explicitly said stop?"

That is a much more useful question.
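To make the holder/waiters idea concrete, here is a minimal Python sketch of that state. It is not CliGate's actual code: the `ExclusiveResource` class, its `request` and `release` methods, and the run IDs are all hypothetical names chosen for illustration, and FIFO ordering for waiters is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class ExclusiveResource:
    """Hypothetical model: who holds a resource and who is waiting for it."""

    name: str
    holder: Optional[str] = None
    waiters: Deque[str] = field(default_factory=deque)

    def request(self, run_id: str) -> str:
        """Grant the resource if it is free, otherwise queue the run (FIFO)."""
        if self.holder is None or self.holder == run_id:
            self.holder = run_id
            return "granted"
        if run_id not in self.waiters:
            self.waiters.append(run_id)
        return "queued"

    def release(self, run_id: str) -> Optional[str]:
        """Release the resource; returns whoever holds it afterwards."""
        if run_id != self.holder:
            # A waiter giving up (for example, after cancellation) just leaves the queue.
            if run_id in self.waiters:
                self.waiters.remove(run_id)
            return self.holder
        # The holder is done: hand the resource to the next waiter, if any.
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder


desktop = ExclusiveResource("desktop")
desktop.request("run-X")  # granted: the desktop was free
desktop.request("run-Y")  # queued behind run-X
desktop.request("run-Z")  # queued behind run-Y
```

With this in place, the scheduler never has to infer ownership from "which runs look active". It reads `desktop.holder` and `desktop.waiters` directly.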

The fix was resource-aware scheduling, not blanket cancellation

I ended up moving toward a simple rule set.

If two tasks do not need the same exclusive resource, let them run in parallel.

If they both need the desktop, serialize them.

If the user asks for status, answer from the existing run instead of touching the desktop again.

If the user says "stop," cancel the target run explicitly.

That sounds obvious in hindsight, but it changed the behavior a lot.

Instead of seeing another run and trying to kill it, the assistant can now treat the desktop like a queueable resource. One task holds it. The next desktop task waits. When the holder finishes, the next one starts automatically.

That is much closer to how a human assistant would behave. You would not cancel a login flow just because someone also asked you to open another app. You would say: "I'm in the middle of this one, I'll do the next desktop step right after."
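The "serialize desktop work, let everything else run" behavior can be sketched with a single lock. This is an illustration under assumptions, not the project's scheduler: it uses Python's `asyncio.Lock` as the desktop lease, and the task names, the `needs_desktop` flag, and the `run_task` helper are made up for the example.

```python
import asyncio
from typing import List


async def run_task(name: str, needs_desktop: bool,
                   desktop: asyncio.Lock, log: List[str]) -> None:
    """Run one task, holding the desktop lock only if the task needs it."""

    async def work() -> None:
        log.append(f"{name}:start")
        await asyncio.sleep(0.01)  # stand-in for real clicks, typing, or file edits
        log.append(f"{name}:end")

    if needs_desktop:
        # Desktop tasks queue here; the next one starts when the holder releases.
        async with desktop:
            await work()
    else:
        # Non-desktop tasks never touch the lock, so they run in parallel.
        await work()


async def main() -> List[str]:
    desktop = asyncio.Lock()
    log: List[str] = []
    await asyncio.gather(
        run_task("login", True, desktop, log),
        run_task("open_app", True, desktop, log),
        run_task("write_script", False, desktop, log),
    )
    return log


log = asyncio.run(main())
print(log)
```

In the printed log, `open_app` only starts after `login` has ended, while `write_script` starts without waiting for either. Nothing gets cancelled; the second desktop task simply waits its turn.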

It also cleaned up the UX around interruption

The nice side effect is that the assistant becomes easier to talk to during long tasks.

While a desktop job is running, I can still ask:

  • what happened to the last run?
  • how much is left?
  • can you also write a script in the repo?

Those should not all be treated as reasons to interrupt the desktop flow.

The assistant can answer status questions from run state, do non-desktop work in parallel, and only queue the things that truly need the same physical controls.
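That routing can be written down as one small decision function. The sketch below is an assumption-heavy illustration: the `Action` names, the `route` function, and the idea that a request arrives already classified as `"status"`, `"stop"`, or `"task"` are all hypothetical, and how that classification happens is out of scope here.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ANSWER_FROM_STATE = "answer_from_state"
    CANCEL_TARGET = "cancel_target"
    RUN_IN_PARALLEL = "run_in_parallel"
    RUN_NOW = "run_now"
    QUEUE_FOR_DESKTOP = "queue_for_desktop"


def route(kind: str, needs_desktop: bool, desktop_holder: Optional[str]) -> Action:
    """Decide what to do with a new request while other runs may be active."""
    if kind == "status":
        # Status questions read existing run state and never touch the desktop.
        return Action.ANSWER_FROM_STATE
    if kind == "stop":
        # Cancellation happens only when the user explicitly asks for it.
        return Action.CANCEL_TARGET
    if not needs_desktop:
        # Work that does not need the exclusive resource can proceed immediately.
        return Action.RUN_IN_PARALLEL
    if desktop_holder is None:
        return Action.RUN_NOW
    # The desktop is busy: wait in line instead of interrupting the holder.
    return Action.QUEUE_FOR_DESKTOP


# While run-X holds the desktop:
print(route("status", False, "run-X"))  # how much is left?
print(route("task", False, "run-X"))    # can you also write a script in the repo?
print(route("task", True, "run-X"))     # open another app
```

The point of the ordering is that the "is the desktop busy?" check comes last. Status questions and explicit stops are resolved before resource ownership even matters.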

That made the product feel less like a brittle automation demo and more like an actual operator with limited hands.

The rule I am keeping

If an agent can touch the real desktop, it needs to understand the difference between:

  • parallel work
  • exclusive resources
  • explicit cancellation
  • simple status follow-ups

Without that split, concurrency just becomes another word for collision.

That is now part of how I am shaping CliGate, the local control plane I use for Claude Code, Codex CLI, Gemini CLI, channels, desktop automation, and a resident assistant layer on top.

The project is open source here: CliGate.

If you are building local agents, are you treating the desktop as just another tool call, or as a resource that needs scheduling?
