<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sergey Dobrov</title>
    <description>The latest articles on DEV Community by Sergey Dobrov (@jbinary).</description>
    <link>https://dev.to/jbinary</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3617962%2Fd0a00be4-2bbf-47c8-9520-da23346408e4.jpeg</url>
      <title>DEV Community: Sergey Dobrov</title>
      <link>https://dev.to/jbinary</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jbinary"/>
    <language>en</language>
    <item>
      <title>40 Lines of Python to Fake a Serial Mouse</title>
      <dc:creator>Sergey Dobrov</dc:creator>
      <pubDate>Tue, 10 Mar 2026 15:41:55 +0000</pubDate>
      <link>https://dev.to/jbinary/40-lines-of-python-to-fake-a-serial-mouse-5cob</link>
      <guid>https://dev.to/jbinary/40-lines-of-python-to-fake-a-serial-mouse-5cob</guid>
      <description>&lt;p&gt;During the COVID lockdowns my son and I started playing &lt;em&gt;The Settlers II&lt;/em&gt; in DOSBox.&lt;/p&gt;

&lt;p&gt;One of the coolest features of the game is that two players can play on the same computer in split screen, each controlling their own cursor — a surprisingly social multiplayer mode for a 1996 strategy game.&lt;/p&gt;

&lt;p&gt;The trick is that the second player uses a serial mouse.&lt;/p&gt;

&lt;p&gt;Unfortunately modern operating systems don't really expose the concept of multiple independent mice anymore — they all get merged into a single pointer.&lt;/p&gt;

&lt;p&gt;So if I wanted that old-school multiplayer experience back, I needed an adapter.&lt;/p&gt;

&lt;p&gt;My first thought was simple: maybe I can fake a serial mouse?&lt;/p&gt;

&lt;p&gt;In Unix systems everything is &lt;em&gt;a file&lt;/em&gt;. If DOSBox expects a serial device, perhaps I can generate the right byte stream and feed it to the emulator?&lt;/p&gt;

&lt;p&gt;Reading through the Linux docs and running &lt;code&gt;hexdump&lt;/code&gt; on the mouse device, I realized I needed to convert the PS/2 mouse protocol into the Microsoft serial mouse protocol. Descriptions of both are easy to find online: &lt;a href="https://roborooter.com/post/serial-mice/" rel="noopener noreferrer"&gt;https://roborooter.com/post/serial-mice/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A few key differences stand out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both communicate via groups of three bytes;&lt;/li&gt;
&lt;li&gt;Both encode movement as signed deltas, but they lay the bits out differently: PS/2 keeps the sign bits in the status byte, while the Microsoft protocol reserves the most significant bit of each byte for framing and packs the two high bits of each coordinate into the first byte;&lt;/li&gt;
&lt;li&gt;As a result PS/2 effectively has one extra bit of precision per axis (9-bit deltas versus 8-bit).&lt;/li&gt;
&lt;/ul&gt;
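
&lt;p&gt;To make the packing concrete, here is a small sketch of my own (not part of the original script, and with button bits omitted) that builds a Microsoft-protocol packet from signed deltas:&lt;/p&gt;

```python
def ms_serial_packet(dx, dy):
    """Encode one movement as a Microsoft serial mouse packet (buttons omitted).

    Each byte carries 7 useful bits; the top bit is the framing bit,
    set only on the first byte of a packet.
    """
    dx %= 256                    # reinterpret the signed delta as a byte
    dy %= 256
    x_hi, x_lo = divmod(dx, 64)  # split into high 2 bits / low 6 bits
    y_hi, y_lo = divmod(dy, 64)
    byte1 = 0b01000000 + y_hi * 4 + x_hi
    return bytes([byte1, x_lo, y_lo])

# Moving 5 left and 3 down still fits in a single three-byte packet:
packet = ms_serial_packet(-5, 3)
```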

&lt;p&gt;The first idea was to just write the packets into a pipe and connect DOSBox to it.&lt;/p&gt;

&lt;p&gt;Unfortunately, DOSBox expects something that behaves like a real serial device, not just a pipe.&lt;/p&gt;

&lt;p&gt;Eventually I landed on &lt;code&gt;socat&lt;/code&gt;, which can create a pair of pseudo-terminals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;socat &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; pty,raw,echo&lt;span class="o"&gt;=&lt;/span&gt;0 pty,raw,echo&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates two linked devices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/dev/pts/24
/dev/pts/25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whatever you write to one appears on the other.&lt;/p&gt;
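
&lt;p&gt;You can try the same idea from Python alone with &lt;code&gt;os.openpty()&lt;/code&gt;, which hands you one linked master/slave pair — a rough stand-in for the two devices &lt;code&gt;socat&lt;/code&gt; links together:&lt;/p&gt;

```python
import os
import tty

# os.openpty() returns a connected master/slave pair, similar in spirit
# to the two pts devices socat creates.
master, slave = os.openpty()
tty.setraw(slave)          # raw mode: no echo, no line buffering
print(os.ttyname(slave))   # a /dev/pts/N path another program could open

os.write(master, b"hello")
assert os.read(slave, 5) == b"hello"
```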

&lt;p&gt;Now DOSBox can connect to one side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;serial1&lt;/span&gt;=&lt;span class="n"&gt;directserial&lt;/span&gt; &lt;span class="n"&gt;realport&lt;/span&gt;:&lt;span class="n"&gt;pts&lt;/span&gt;/&lt;span class="m"&gt;25&lt;/span&gt; &lt;span class="n"&gt;rxdelay&lt;/span&gt;:&lt;span class="m"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the script writes to the other.&lt;/p&gt;

&lt;p&gt;The first version worked — but the mouse felt strange: movements were extremely smooth and continued even after I stopped moving the mouse.&lt;/p&gt;

&lt;p&gt;Modern mice have extremely high DPI, which means the adapter was sending a huge number of tiny movements.&lt;/p&gt;

&lt;p&gt;DOSBox replayed them with a delay.&lt;/p&gt;

&lt;p&gt;The solution was simple: accumulate movement and send it in larger steps.&lt;/p&gt;

&lt;p&gt;I deliberately didn’t try to make it neat or reusable — it only needed to run on my laptop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;#
&lt;/span&gt;    &lt;span class="n"&gt;byte1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mb"&gt;0b01000000&lt;/span&gt;
    &lt;span class="n"&gt;byte1&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mb"&gt;0b11000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
    &lt;span class="n"&gt;byte1&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mb"&gt;0b11000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="c1"&gt;#
&lt;/span&gt;    &lt;span class="n"&gt;buff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;byte1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;buff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mb"&gt;0b00111111&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;buff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mb"&gt;0b00111111&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buff&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="n"&gt;BUNDLING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;SENSITIVITY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;


&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/dev/pts/24&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;acc_dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acc_dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/dev/input/mouse1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mouse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;let&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s get it started!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;b1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mouse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mb"&gt;0b00010000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mb"&gt;0b00100000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;
            &lt;span class="n"&gt;acc_dx&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt;
            &lt;span class="n"&gt;acc_dy&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt;

            &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc_dx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;BUNDLING&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc_dx&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;SENSITIVITY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;acc_dx&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;SENSITIVITY&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc_dy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;BUNDLING&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc_dy&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;SENSITIVITY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;acc_dy&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;SENSITIVITY&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;dy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the script is very simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Main loop reads PS/2 packets from &lt;code&gt;/dev/input/mouse1&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Decodes movement deltas;&lt;/li&gt;
&lt;li&gt;Accumulates them over time;&lt;/li&gt;
&lt;li&gt;Applies &lt;code&gt;BUNDLING&lt;/code&gt; to avoid overwhelming DOSBox with packets;&lt;/li&gt;
&lt;li&gt;Applies &lt;code&gt;SENSITIVITY&lt;/code&gt; so that a high-DPI mouse feels right at the low resolution of the old game;&lt;/li&gt;
&lt;li&gt;Every now and then the &lt;code&gt;send&lt;/code&gt; routine builds a Microsoft serial mouse packet from the accumulated movement and writes it to one &lt;code&gt;pts&lt;/code&gt; device, so that &lt;code&gt;socat&lt;/code&gt; mirrors it into the other one, which DOSBox reads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the Microsoft protocol has slightly lower resolution for movement deltas than PS/2, so very large movements would technically need to be split across multiple packets. In practice this isn't an issue because mice report movement in small increments.&lt;/p&gt;
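
&lt;p&gt;If you did want to handle that edge case, the splitting could be sketched like this (&lt;code&gt;split_delta&lt;/code&gt; is a hypothetical helper, not part of the original script):&lt;/p&gt;

```python
def split_delta(delta, limit=127):
    """Split a large movement delta into chunks that each fit a signed byte."""
    chunks = []
    while abs(delta) > limit:
        step = limit if delta > 0 else -limit
        chunks.append(step)
        delta -= step
    chunks.append(delta)   # remainder always fits
    return chunks
```

&lt;p&gt;Each chunk would then be sent as its own packet; the deltas sum back to the original movement.&lt;/p&gt;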

&lt;h2&gt;
  
  
  The UNIX way
&lt;/h2&gt;

&lt;p&gt;In the end the entire adapter was about 40 lines of Python, no third-party libraries — just the standard library and a bit of Unix plumbing.&lt;/p&gt;

&lt;p&gt;Interestingly, modern DOSBox builds now support this feature natively: &lt;a href="https://www.dosbox-staging.org/releases/release-notes/0.80.0/#dual-mouse-gaming" rel="noopener noreferrer"&gt;https://www.dosbox-staging.org/releases/release-notes/0.80.0/#dual-mouse-gaming&lt;/a&gt;, so the little adapter is no longer necessary.&lt;/p&gt;

&lt;p&gt;But solving the problem was half the fun — and it’s a nice illustration of how Unix-style abstractions let you insert a tiny translator between two pieces of software from completely different eras.&lt;/p&gt;

&lt;p&gt;Even if one of them thinks it's talking to a serial mouse from 1995.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>python</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Best Engineers Can Move Between Abstraction Layers</title>
      <dc:creator>Sergey Dobrov</dc:creator>
      <pubDate>Wed, 18 Feb 2026 10:57:14 +0000</pubDate>
      <link>https://dev.to/jbinary/the-best-engineers-can-move-between-abstraction-layers-ee8</link>
      <guid>https://dev.to/jbinary/the-best-engineers-can-move-between-abstraction-layers-ee8</guid>
      <description>&lt;p&gt;One pattern I’ve noticed over the years: the strongest engineers can move between abstraction layers almost effortlessly.&lt;/p&gt;

&lt;p&gt;They write high-level code comfortably — but when something behaves strangely, they instinctively descend a few layers, reason about what’s happening underneath, and come back with a fix.&lt;/p&gt;

&lt;p&gt;Here’s a simple thought experiment.&lt;/p&gt;

&lt;p&gt;Ask someone why Python code using dense numeric arrays is much faster than similar logic written with regular Python objects.&lt;/p&gt;

&lt;p&gt;A surface-level answer is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Because NumPy is written in C.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A deeper answer mentions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contiguous memory&lt;/li&gt;
&lt;li&gt;cache locality&lt;/li&gt;
&lt;li&gt;pointer indirection&lt;/li&gt;
&lt;li&gt;object metadata overhead&lt;/li&gt;
&lt;li&gt;how CPUs actually operate on memory&lt;/li&gt;
&lt;/ul&gt;
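
&lt;p&gt;You can see part of that overhead directly from the standard library — the exact sizes below are CPython-specific, so treat the numbers as illustrative:&lt;/p&gt;

```python
import sys
from array import array

n = 1000
as_objects = list(range(n))        # a list stores pointers to int objects
as_dense = array('q', range(n))    # one contiguous block of 8-byte ints

# Each Python int is a full heap object with a header...
print(sys.getsizeof(12345))        # roughly 28 bytes on 64-bit CPython
# ...while the dense array spends just its item size per element:
print(as_dense.itemsize)           # 8
```

&lt;p&gt;Contiguous 8-byte slots are what lets the CPU stream through cache lines instead of chasing pointers.&lt;/p&gt;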

&lt;p&gt;That explanation travels from Python syntax down to memory layout and hardware behavior — and back.&lt;/p&gt;

&lt;p&gt;That ability to move across layers isn’t academic. It shows up in very ordinary work.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Very Ordinary API Problem
&lt;/h2&gt;

&lt;p&gt;Recently I was integrating with an external REST API that was painfully slow.&lt;/p&gt;

&lt;p&gt;It returned paginated results and also reported the total number of matching records. The task was straightforward: fetch everything.&lt;/p&gt;

&lt;p&gt;The original implementation used broad filters over the entire time range and fetched page by page with large page sizes.&lt;/p&gt;

&lt;p&gt;At the HTTP layer, that approach makes sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fewer requests&lt;/li&gt;
&lt;li&gt;Larger pages&lt;/li&gt;
&lt;li&gt;Simpler control flow&lt;/li&gt;
&lt;li&gt;“Minimize round trips”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels efficient. But performance was terrible.&lt;/p&gt;

&lt;p&gt;Instead of tweaking page size or adding concurrency, I asked a different question:&lt;/p&gt;

&lt;p&gt;— Why is the API returning a total count?&lt;/p&gt;

&lt;p&gt;— That likely means a &lt;code&gt;COUNT(*)&lt;/code&gt; somewhere. And pagination with &lt;code&gt;LIMIT … OFFSET&lt;/code&gt; usually hides a cost:&lt;/p&gt;

&lt;p&gt;To serve page N, the database must scan and discard the previous N × page_size rows.&lt;/p&gt;

&lt;p&gt;So with broad filters over a large time range, each successive page becomes more expensive than the previous one.&lt;/p&gt;
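
&lt;p&gt;A back-of-the-envelope model makes the quadratic growth obvious (pure arithmetic, my own illustration — no database needed):&lt;/p&gt;

```python
def rows_scanned(total, page_size):
    """Rows a database touches to serve every page via LIMIT/OFFSET.

    Page k starts at offset k * page_size, and the database must walk
    past all skipped rows before returning the page, so the total work
    grows quadratically with the number of pages.
    """
    pages = (total + page_size - 1) // page_size
    return sum(k * page_size + page_size for k in range(pages))

# 100,000 rows in pages of 1,000: about 5 million rows scanned
# just to return 100,000.
cost = rows_scanned(100_000, 1_000)
```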

&lt;p&gt;At the REST layer, everything looked clean.&lt;/p&gt;

&lt;p&gt;Underneath, it was probably:&lt;/p&gt;

&lt;p&gt;REST&lt;br&gt;
→ controller&lt;br&gt;
→ SQL with &lt;code&gt;COUNT(*)&lt;/code&gt;&lt;br&gt;
→ &lt;code&gt;LIMIT/OFFSET&lt;/code&gt;&lt;br&gt;
→ large scans and discarded rows&lt;/p&gt;

&lt;p&gt;So instead of fetching everything broadly, I changed the access pattern.&lt;/p&gt;

&lt;p&gt;Fetch day by day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller slices.&lt;/li&gt;
&lt;li&gt;Better index selectivity.&lt;/li&gt;
&lt;li&gt;Much smaller offsets.&lt;/li&gt;
&lt;li&gt;Less discarded work.&lt;/li&gt;
&lt;/ul&gt;
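
&lt;p&gt;The reshaped access pattern is just a loop over days. A sketch — &lt;code&gt;fetch_page&lt;/code&gt; and its parameters are stand-ins for whatever the real client exposes:&lt;/p&gt;

```python
from datetime import date, timedelta

def fetch_all(fetch_page, start: date, end: date):
    """Fetch every record one day at a time instead of one huge range."""
    records = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        page = 0
        while True:
            batch = fetch_page(day=day, page=page)  # hypothetical client call
            if not batch:
                break                # no more pages for this day
            records.extend(batch)
            page += 1
    return records
```

&lt;p&gt;Each day's query hits a narrow, index-friendly slice, and the offsets within a day stay tiny.&lt;/p&gt;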

&lt;p&gt;Result: roughly a 10× speedup.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No access to their database.&lt;/li&gt;
&lt;li&gt;No vendor escalation.&lt;/li&gt;
&lt;li&gt;No architectural rewrite.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just switching layers mentally, adjusting the shape of the query, and moving back up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most of our daily work lives at high levels of abstraction: frameworks, APIs, cloud services.&lt;/p&gt;

&lt;p&gt;And that’s good — that’s where productivity lives.&lt;/p&gt;

&lt;p&gt;But performance issues, strange behavior, and cost explosions rarely originate at the same layer where they surface.&lt;/p&gt;

&lt;p&gt;The engineers who consistently deliver under pressure are usually the ones who can descend a few layers, reason about what’s actually happening, and then come back with a pragmatic fix.&lt;/p&gt;

&lt;p&gt;Not because they love low-level code.&lt;/p&gt;

&lt;p&gt;But because they’re not confined to a single abstraction.&lt;/p&gt;

&lt;p&gt;And this becomes even more important as more code is generated for us and more infrastructure is abstracted away. When AI writes the boilerplate and platforms hide the machinery, the differentiator isn’t how quickly you can produce code — it’s how well you understand what happens underneath it.&lt;/p&gt;

&lt;p&gt;That ability to move across layers becomes rarer.&lt;/p&gt;

&lt;p&gt;And that skill compounds.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>performance</category>
      <category>architecture</category>
      <category>career</category>
    </item>
    <item>
<title>Rediscovering Unix Pipelines: Two Backup Problems, One Mindset</title>
      <dc:creator>Sergey Dobrov</dc:creator>
      <pubDate>Mon, 08 Dec 2025 11:31:08 +0000</pubDate>
      <link>https://dev.to/jbinary/rediscovering-unix-pipelines-two-backup-problems-one-mindset-59n7</link>
      <guid>https://dev.to/jbinary/rediscovering-unix-pipelines-two-backup-problems-one-mindset-59n7</guid>
      <description>&lt;p&gt;Modern engineers often reach for JSON parsing, temporary files, or orchestration tools. Unix pipelines still outperform them more often than you'd expect. Two recent backup tasks reminded me how often we over-engineer simple data-movement problems.&lt;/p&gt;

&lt;p&gt;Two backup problems, same pattern: in both cases the first instinct was a complex solution involving temporary files, JSON parsing, and extra dependencies. Both collapsed into simple pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 1: Pruning Old Backups
&lt;/h2&gt;

&lt;p&gt;I needed to keep the latest seven backups in object storage and delete everything older.&lt;/p&gt;

&lt;p&gt;First instinct: treat it like an application problem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mc &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; bucket/ &lt;span class="se"&gt;\&lt;/span&gt;
  | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'. | "\(.lastModified) \(.key)"'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | … &lt;span class="c"&gt;# extract keys, build array, loop, delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parse JSON, build arrays, extract timestamps, loop through objects.&lt;/p&gt;

&lt;p&gt;But backups already encode timestamps in filenames. &lt;code&gt;mc find&lt;/code&gt; already prints one file per line. &lt;code&gt;sort&lt;/code&gt; already orders strings. &lt;code&gt;head&lt;/code&gt; already drops lines.&lt;/p&gt;

&lt;p&gt;The whole task collapses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mc find bucket/ &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"backup-*.gz"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-7&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; file&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;mc &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No JSON. No arrays. No parsing. Just text flowing through composable tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 2: Creating Backups
&lt;/h2&gt;

&lt;p&gt;A teammate needed to dump PostgreSQL, compress it, and upload to object storage.&lt;/p&gt;

&lt;p&gt;His approach (roughly):&lt;/p&gt;

&lt;p&gt;First, upload a script to the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scp backup.sh db-server:/tmp/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh db-server /tmp/backup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
pg_dump mydb &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/backup.sql
&lt;span class="nb"&gt;gzip&lt;/span&gt; /tmp/backup.sql
mc &lt;span class="nb"&gt;cp&lt;/span&gt; /tmp/backup.sql.gz s3://bucket/backup-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;.sql.gz
&lt;span class="nb"&gt;rm&lt;/span&gt; /tmp/backup.sql.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uploading the script first&lt;/li&gt;
&lt;li&gt;Installing &lt;code&gt;mc&lt;/code&gt; on the database server&lt;/li&gt;
&lt;li&gt;Managing temporary files&lt;/li&gt;
&lt;li&gt;Cleanup logic&lt;/li&gt;
&lt;li&gt;Disk space for both uncompressed and compressed data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these steps adds friction, state, and failure modes.&lt;br&gt;
But none of them are actually required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh db-server &lt;span class="s2"&gt;"pg_dump mydb | gzip"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | mc pipe s3://bucket/backup-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;.sql.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From two commands plus a bash script to one line, with nothing to install or clean up on the database server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Free Parallelism
&lt;/h2&gt;

&lt;p&gt;That backup pipeline runs three processes simultaneously: &lt;code&gt;pg_dump&lt;/code&gt; generating data, &lt;code&gt;gzip&lt;/code&gt; compressing it, &lt;code&gt;mc&lt;/code&gt; uploading it. No threading code. No coordination.&lt;/p&gt;

&lt;p&gt;The temporary-file version? Strictly sequential. Each step waits for the previous to finish.&lt;/p&gt;

&lt;p&gt;Pipes give you streaming parallelism by default.&lt;/p&gt;

&lt;p&gt;Need more? &lt;code&gt;xargs&lt;/code&gt; and GNU &lt;code&gt;parallel&lt;/code&gt; scale across cores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Compress 100 files with 4 workers&lt;/span&gt;
find &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.log"&lt;/span&gt; | xargs &lt;span class="nt"&gt;-P&lt;/span&gt; 4 &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt; &lt;span class="nb"&gt;gzip&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pipeline thinking, multiplied across CPUs.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use Pipes
&lt;/h2&gt;

&lt;p&gt;Pipelines aren't always the answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex state management&lt;/strong&gt; - multiple passes over data, tracking relationships&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit error handling&lt;/strong&gt; - pipelines can fail silently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extreme scale&lt;/strong&gt; - terabytes need Spark/Hadoop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bash quirkiness&lt;/strong&gt; - arcane quoting, clunky error handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework requirements&lt;/strong&gt; - ETL tools gain orchestration but lose composability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team familiarity&lt;/strong&gt; - if pipelines are cryptic to your team, write Python&lt;/li&gt;
&lt;/ul&gt;
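
&lt;p&gt;The silent-failure point deserves one concrete mitigation. By default a pipeline's exit status is the last command's, so an early failure vanishes; in bash, &lt;code&gt;set -o pipefail&lt;/code&gt; surfaces it:&lt;/p&gt;

```shell
# Default: the pipeline "succeeds" even though the first command failed.
false | cat
echo "without pipefail: $?"    # prints 0

# With pipefail the pipeline reports the first failing command.
set -o pipefail
false | cat
echo "with pipefail: $?"       # prints 1
```

&lt;p&gt;Combined with &lt;code&gt;set -e&lt;/code&gt;, this turns a quietly wrong backup job into one that fails loudly.&lt;/p&gt;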

&lt;p&gt;Unix pipes work best for glue code: moving data between systems, transforming formats, filtering streams.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern to Look For
&lt;/h2&gt;

&lt;p&gt;Whenever you find yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loading data into arrays&lt;/li&gt;
&lt;li&gt;Writing temporary files&lt;/li&gt;
&lt;li&gt;Parsing structured data just to transform it&lt;/li&gt;
&lt;li&gt;Installing tools on systems just to move data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…there's probably a pipeline waiting to be discovered.&lt;/p&gt;

&lt;p&gt;Not because pipelines are "better." Because they're simpler. Simple solutions are easier to debug, modify, and maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Fifty years later, Unix pipelines still win through elimination.&lt;/p&gt;

&lt;p&gt;They eliminate temporary state. They eliminate dependencies. They eliminate the complexity of treating data movement as a programming exercise.&lt;/p&gt;

&lt;p&gt;Your turn: What's a problem you recently solved with a pipeline instead of code? Or where you wrote code when a pipeline would have worked?&lt;/p&gt;

</description>
      <category>bash</category>
      <category>devops</category>
      <category>productivity</category>
      <category>cli</category>
    </item>
    <item>
      <title>You don't understand GIL</title>
      <dc:creator>Sergey Dobrov</dc:creator>
      <pubDate>Thu, 27 Nov 2025 16:14:07 +0000</pubDate>
      <link>https://dev.to/jbinary/you-dont-understand-gil-2ce7</link>
      <guid>https://dev.to/jbinary/you-dont-understand-gil-2ce7</guid>
      <description>&lt;p&gt;Some time ago I was chatting with a friend about programming languages, and the conversation drifted — inevitably — to why Python is “bad.”&lt;/p&gt;

&lt;p&gt;The first argument was that Python has “no types,” which makes it error-prone.&lt;br&gt;
I pushed back: Python’s modern type system is surprisingly expressive, and meanwhile Java manages to produce endless null-pointer exceptions despite all its ceremony. That point didn’t land well.&lt;/p&gt;

&lt;p&gt;So the next argument came out:&lt;/p&gt;

&lt;p&gt;“Anyway, Python can’t even use more than one CPU core because of the GIL.”&lt;/p&gt;

&lt;p&gt;I didn’t try to debate it.&lt;br&gt;
Instead, I opened a terminal, ran a data-processing script I had lying around, and showed them &lt;code&gt;top&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One Python process was using exactly 300% CPU. They did not believe it.&lt;/p&gt;

&lt;p&gt;And that moment captures something important: the GIL is one of the most confidently misunderstood ideas in all of programming.&lt;/p&gt;
&lt;h2&gt;Why the GIL Seems Simple but Isn’t&lt;/h2&gt;

&lt;p&gt;What makes this even trickier is that the GIL actually exists to make Python simple.&lt;br&gt;
It lets most of the runtime behave like a friendly, high-level environment where you never have to think about memory management, object lifetimes, or thread safety inside the interpreter.&lt;/p&gt;

&lt;p&gt;But concurrency is where that abstraction finally hits its boundary.&lt;br&gt;
The GIL doesn’t behave the same way in every situation — it takes different forms under different workloads.&lt;/p&gt;

&lt;p&gt;That’s why so many explanations are technically correct yet still misleading.&lt;/p&gt;

&lt;p&gt;So let’s peel it back one layer at a time.&lt;/p&gt;
&lt;h3&gt;Layer 1: “Threads Are Useless in Python”&lt;/h3&gt;

&lt;p&gt;A common starting point is the idea that Python threads are “useless” because of the GIL.&lt;/p&gt;

&lt;p&gt;If that were true, CPython wouldn’t bother with real OS threads — it could have simulated concurrency with simple user-level green threads.&lt;/p&gt;

&lt;p&gt;But CPython uses real &lt;code&gt;pthreads&lt;/code&gt; for a reason: threads matter in Python.&lt;/p&gt;
&lt;h3&gt;Layer 2: “Threads Only Help for I/O”&lt;/h3&gt;

&lt;p&gt;A slightly more sophisticated version of this is:&lt;/p&gt;

&lt;p&gt;“Threads are only useful when something blocks on I/O. Otherwise the GIL stops everything.”&lt;/p&gt;

&lt;p&gt;This sounds reasonable — but it’s still not right.&lt;/p&gt;

&lt;p&gt;And we already saw that in the very beginning: our Python process was happily using 300% CPU, with no I/O involved at all.&lt;/p&gt;

&lt;p&gt;So clearly something else is going on.&lt;/p&gt;
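&lt;p&gt;You can reproduce the effect yourself with nothing but the standard library. This sketch hashes four independent buffers in four threads; &lt;code&gt;hashlib&lt;/code&gt; drops the GIL around its digest loop, so while it runs, &lt;code&gt;top&lt;/code&gt; shows the process well above 100% CPU with zero I/O involved:&lt;/p&gt;

```python
import hashlib
import threading

def hash_chunk(data: bytes, out: dict, i: int) -> None:
    # sha256 over a large buffer: pure native computation, GIL released inside
    out[i] = hashlib.sha256(data).hexdigest()

# Four independent 20 MB buffers; scale them up to watch the effect in top.
chunks = [bytes([i]) * 20_000_000 for i in range(4)]
results = {}
threads = [threading.Thread(target=hash_chunk, args=(c, results, i))
           for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The threaded run produces exactly what a serial run would.
assert [results[i] for i in range(4)] == [hashlib.sha256(c).hexdigest() for c in chunks]
```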
&lt;h3&gt;Layer 3: “Threads Help for I/O &lt;em&gt;or&lt;/em&gt; Native Code — So Problem Solved?”&lt;/h3&gt;

&lt;p&gt;Once people realize threads aren’t limited to I/O, they usually expand the model:&lt;/p&gt;

&lt;p&gt;“Okay, fine — if a thread is blocked on I/O or running C code, another thread can run. That’s the whole story.”&lt;/p&gt;

&lt;p&gt;Closer, but still not exactly. Because here’s another catch:&lt;/p&gt;

&lt;p&gt;Not &lt;em&gt;all&lt;/em&gt; C libraries release the GIL. &lt;em&gt;Some&lt;/em&gt; hold it for the entire call.&lt;/p&gt;

&lt;p&gt;And when they do, the interpreter goes right back to single-threaded behavior, even though no Python code is executing.&lt;/p&gt;

&lt;p&gt;This is why you can see one native operation scale beautifully across cores, while another operation — also written in C — completely freezes out every other thread in the process.&lt;/p&gt;
&lt;h3&gt;Layer 4: Why Some C Libraries Must Hold the GIL&lt;/h3&gt;

&lt;p&gt;So, the “I/O or C computation frees another thread” intuition is still incomplete.&lt;br&gt;
It depends entirely on what the C library is doing under the hood... and whether it needs exclusive access to Python objects or interpreter state.&lt;/p&gt;

&lt;p&gt;Every Python object — even something as small as an &lt;code&gt;int&lt;/code&gt; or a &lt;code&gt;bytes&lt;/code&gt; object — carries a reference count that tracks how many places are using it.&lt;br&gt;
Incrementing and decrementing these counts must happen one at a time.&lt;br&gt;
If two threads updated them independently, you’d get leaked objects, prematurely freed objects, or corrupted memory.&lt;/p&gt;
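&lt;p&gt;You can watch these counts from Python itself. A small illustration, assuming CPython (since &lt;code&gt;sys.getrefcount&lt;/code&gt; is an implementation detail):&lt;/p&gt;

```python
import sys

x = object()
r1 = sys.getrefcount(x)  # the count includes the temporary reference
                         # created by passing x into the call itself
y = x                    # binding a second name adds one more reference
r2 = sys.getrefcount(x)
assert r2 == r1 + 1      # this bookkeeping happens on every assignment,
                         # which is exactly what the GIL keeps consistent
```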

&lt;p&gt;But refcounts are only the beginning.&lt;/p&gt;

&lt;p&gt;Any C code that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;creates or destroys Python objects&lt;/li&gt;
&lt;li&gt;mutates a Python object (appending to a &lt;code&gt;list&lt;/code&gt;, updating a &lt;code&gt;dict&lt;/code&gt;, modifying a &lt;code&gt;set&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;raises exceptions&lt;/li&gt;
&lt;li&gt;calls back into Python code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…relies on parts of the interpreter that assume exclusive access.&lt;/p&gt;

&lt;p&gt;And the only way to guarantee that is to hold the GIL.&lt;/p&gt;

&lt;p&gt;Whether a C extension runs happily across multiple cores or blocks every thread in the process comes down to one question: does it need to interact with Python objects, or can it work on its own data structures without consulting the interpreter?&lt;/p&gt;
&lt;h3&gt;Layer 5: Why “Just Use Multiprocessing” Is an Oversimplification&lt;/h3&gt;

&lt;p&gt;When people get frustrated with the GIL, the next instinct is usually: “Okay, forget threads. Just use multiprocessing.”&lt;/p&gt;

&lt;p&gt;On paper it sounds perfect — each process has its own GIL, so you get true parallelism, and sometimes that is the right approach.&lt;/p&gt;

&lt;p&gt;But as a general rule, it’s still an oversimplification.&lt;/p&gt;

&lt;p&gt;Multiprocessing has real costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most data still has to be serialized to move between processes (shared memory only works for a limited set of data types)&lt;/li&gt;
&lt;li&gt;Even with copy-on-write on Unix, large Python objects often end up duplicated anyway — refcount updates alone are enough to break CoW&lt;/li&gt;
&lt;li&gt;Starting a process has much higher overhead than starting a thread (a new interpreter instance, new memory space, new OS structures — not just a new stack)&lt;/li&gt;
&lt;li&gt;Context switching between processes is heavier for the OS than between threads&lt;/li&gt;
&lt;li&gt;Coordinating state is harder when it can’t be freely shared&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yes — multiprocessing sidesteps the GIL for CPU-bound Python bytecode.&lt;br&gt;
But it brings fully separate runtimes, higher startup costs, heavier context switches, and more complicated data sharing.&lt;/p&gt;
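&lt;p&gt;The serialization cost in particular is easy to underestimate. A quick sketch of what crossing a process boundary actually involves, using &lt;code&gt;pickle&lt;/code&gt; directly (which is essentially what &lt;code&gt;multiprocessing&lt;/code&gt; does under the hood for most objects):&lt;/p&gt;

```python
import pickle

# A modest payload: one million Python ints.
payload = list(range(1_000_000))
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Every transfer pays this twice: dumps in the sender, loads in the receiver.
restored = pickle.loads(blob)
print(f"pickled size: {len(blob) / 1e6:.1f} MB")
assert restored == payload
```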
&lt;h3&gt;How to Tell Whether a C Library Releases the GIL&lt;/h3&gt;

&lt;p&gt;At this point a natural question comes up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“So how do I know whether a C library actually releases the GIL?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, you can’t always tell from the outside — two libraries that look identical in Python can have completely different GIL behavior.&lt;br&gt;
But there are a few practical ways to reason about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. If a C extension touches Python objects, it must hold the GIL — but only for those parts&lt;/strong&gt;&lt;br&gt;
This doesn’t mean it needs the GIL for the entire function.&lt;br&gt;
A well-written C extension typically does this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acquire GIL&lt;/li&gt;
&lt;li&gt;Inspect or convert Python inputs (refcounts, type checks, copies, etc.)&lt;/li&gt;
&lt;li&gt;Release GIL&lt;/li&gt;
&lt;li&gt;Run the heavy native computation&lt;/li&gt;
&lt;li&gt;Reacquire GIL&lt;/li&gt;
&lt;li&gt;Build Python output objects&lt;/li&gt;
&lt;li&gt;Return&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;touching Python objects → requires the GIL&lt;/li&gt;
&lt;li&gt;heavy inner computation → often does not&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why you can see “C code + threads” giving real multi-core speedups despite Python objects being involved at the boundaries.&lt;/p&gt;

&lt;p&gt;Some of the standard library’s own C extensions follow this pattern.&lt;br&gt;
For example, &lt;code&gt;zlib&lt;/code&gt;, &lt;code&gt;bz2&lt;/code&gt;, and &lt;code&gt;hashlib&lt;/code&gt; all parse Python arguments and allocate Python objects while holding the GIL, then drop the GIL around the inner compression or hashing loop, and reacquire it only to wrap up the result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Pure native computation usually releases the GIL cleanly&lt;/strong&gt;&lt;br&gt;
Libraries like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;zlib / bz2 / lzma&lt;/li&gt;
&lt;li&gt;hashing&lt;/li&gt;
&lt;li&gt;crypto&lt;/li&gt;
&lt;li&gt;NumPy (when you do &lt;code&gt;array1 + array2&lt;/code&gt;, NumPy releases the GIL around the actual arithmetic loop. But when you do something like &lt;code&gt;array.tolist()&lt;/code&gt;, it can't because it's creating Python objects)&lt;/li&gt;
&lt;li&gt;many image codecs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;operate on raw buffers or on their own internal data structures.&lt;br&gt;
They typically wrap the expensive part with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;Py_BEGIN_ALLOW_THREADS&lt;/span&gt;
    &lt;span class="cm"&gt;/* heavy computation */&lt;/span&gt;
&lt;span class="n"&gt;Py_END_ALLOW_THREADS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how you get things like “Python using 300% CPU” from a single process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Some native libraries can’t release the GIL much — because their core work is Python interaction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;regex engines working on Python string objects (the standard &lt;code&gt;re&lt;/code&gt; module &lt;a href="https://bugs.python.org/issue1366311" rel="noopener noreferrer"&gt;doesn't release the GIL&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Python-level parsing loops (e.g. &lt;code&gt;json&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;libraries that mutate lists or dicts internally (e.g. &lt;code&gt;py-radix&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;things that allocate many intermediate Python objects (e.g. &lt;code&gt;csv&lt;/code&gt;, &lt;code&gt;json&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These must hold the GIL almost the whole time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Docs might mention it — but inconsistently&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some libraries explicitly document their GIL behavior (&lt;a href="https://numpy.org/doc/stable/reference/thread_safety.html" rel="noopener noreferrer"&gt;NumPy&lt;/a&gt; and &lt;a href="https://docs.scipy.org/doc/scipy-1.15.0/tutorial/thread_safety.html" rel="noopener noreferrer"&gt;SciPy&lt;/a&gt;, for example), but many don’t.&lt;/p&gt;

&lt;p&gt;If documentation does say it, you can trust it.&lt;br&gt;
If it doesn’t, no conclusion can be drawn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. You can measure it&lt;/strong&gt;&lt;br&gt;
A simple test:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;run the operation in many threads&lt;/li&gt;
&lt;li&gt;watch CPU usage in top or htop&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you see a single core saturated → it's likely holding the GIL&lt;br&gt;
If you see multiple cores fully used → it's releasing the GIL&lt;br&gt;
If you see partial scaling → it's releasing the GIL only part of the time (very common)&lt;/p&gt;

&lt;h3&gt;With Great Power Comes Great Responsibility&lt;/h3&gt;

&lt;p&gt;Releasing the GIL — or running code that doesn’t use it — doesn’t magically make concurrency “safe.” It just gives you more freedom and more ways to shoot yourself in the foot.&lt;/p&gt;

&lt;p&gt;If two threads start modifying the same NumPy array, or the same shared buffer, or the same Python object through a C extension, nothing protects you anymore. You can get classic race conditions, torn writes, and inconsistent data just like in any other multithreaded language.&lt;/p&gt;

&lt;p&gt;The GIL wasn’t only a limitation — it was also a guardrail.&lt;br&gt;
Once it’s out of the way, you have to be just as careful as you would be in C++ or Java.&lt;/p&gt;
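&lt;p&gt;A minimal sketch of what that care looks like in practice: a plain &lt;code&gt;counter += 1&lt;/code&gt; is a read-modify-write sequence, not an atomic step, so shared mutation needs an explicit lock even in ordinary threaded Python:&lt;/p&gt;

```python
import threading

counter = 0
lock = threading.Lock()

def bump(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:           # serialize the read-modify-write explicitly
            counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exact only because of the lock; without it, interleaved threads
# can overwrite each other's updates and silently lose increments.
assert counter == 400_000
```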

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The GIL has sharp edges and real limitations — but it also hides a lot of complexity and keeps Python usable without turning every piece of code into a concurrency puzzle.&lt;/p&gt;

&lt;p&gt;Once you understand the layers behind it, the whole picture becomes much clearer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;threads aren’t “useless,”&lt;/li&gt;
&lt;li&gt;I/O isn’t the whole story,&lt;/li&gt;
&lt;li&gt;native code can run in parallel,&lt;/li&gt;
&lt;li&gt;native code sometimes can’t,&lt;/li&gt;
&lt;li&gt;and “just use multiprocessing” solves one problem while introducing several others.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, Python can make excellent use of multiple cores — it just depends on what kind of work you’re doing and which libraries you’re using.&lt;/p&gt;

&lt;p&gt;I’d love to hear your own experiences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;times when &lt;code&gt;multiprocessing&lt;/code&gt; was total overkill,&lt;/li&gt;
&lt;li&gt;or when a single Python process maxed out all your cores,&lt;/li&gt;
&lt;li&gt;or situations where threads surprised you (for better or worse)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leave a comment, share a story, or correct me if I got something wrong — the whole point of this post is to make the conversation around the GIL more honest and less magical.&lt;/p&gt;

&lt;h2&gt;tl;dr&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your workload is:
├─ Pure Python computation → multiprocessing
├─ I/O bound (network, disk) → threading or asyncio
├─ Calling C libraries
│  ├─ Don't know if GIL-safe → measure with top/htop or check docs
│  ├─ Releases GIL → threading is fine
│  └─ Requests GIL → multiprocessing or asyncio
└─ Many small tasks → consider overhead cost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>python</category>
      <category>gil</category>
      <category>performance</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
