Wes

Why Your SFTP Transfer Is Stuck at 2 MB/s (and the Fix Is a Protocol from 1983)

Two minutes to copy a 274 MB file to a VM running on localhost. Not over the internet. Not to a cloud instance across the country. Localhost. The same machine, loopback, zero network latency.

That was the experience a user reported in issue #290 on cubic, a lightweight CLI for managing QEMU/KVM virtual machines. The maintainer reproduced it, traced the problem to the upstream russh-sftp crate, and posted a comment asking if anyone had ideas about where the bottleneck was. I did. The answer turned out to be an implementation decision in the crate's write path that limits every Rust project using it to about 2 MB/s on file transfers, no matter how fast the link is.

The fix was to stop using SFTP entirely and fall back to a simpler, older protocol.

What Is cubic?

cubic is a CLI tool for creating and managing lightweight virtual machines on Linux and macOS. Think of it as the middle ground between running Docker containers and spinning up full VMs in libvirt. You run cubic create myvm --image debian and get a cloud-init provisioned VM with SSH access, a dedicated disk, and port forwarding. cubic ssh myvm drops you into a shell. cubic scp file.tar.gz myvm:~/ copies files in. It's about 7,000 lines of Rust, built on QEMU/KVM with cloud-init for provisioning.

Under 40 stars. The maintainer (rogkne) commits daily and reviews external PRs within hours.

The Snapshot

Project: cubic
Stars: ~37 at time of writing
Maintainer: Solo developer, committing daily
Code health: ~7,000 lines of clean Rust, 104 unit tests, clap + thiserror + tokio
Docs: Good README, CONTRIBUTING.md with conventional commit rules
Contributor UX: Fast reviews, specific feedback, merged shell completions PR in multi-round review
Worth using: Yes, if you want lightweight VMs without libvirt's complexity

Under the Hood

cubic has a clean layered architecture. CLI commands live in src/commands/ (one file per subcommand, clap with derive macros). Business logic lives in src/actions/. The instance model, serialization (TOML and YAML), and storage live in src/instance/. Image fetching and distro definitions live in src/image/. SSH and file transfer live in src/ssh_cmd/.

The dependency list is lean. Four crates handle the heavy lifting: russh for SSH connections, russh-sftp for file transfers, clap for CLI parsing, and reqwest for image downloads. Everything else is standard library or small utility crates. The Cargo.toml is not trying to be clever.

One pattern that caught my eye: the project is async internally (tokio, russh) but sync at the CLI boundary. An AsyncCaller struct wraps a tokio multi-threaded runtime and exposes a call() method that blocks on a future. Every command creates one, runs its async work through it, and returns a sync result. It's simple and it works. No async bleeding into the CLI layer.
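cubic's AsyncCaller wraps a full tokio multi-threaded runtime, but the sync-over-async bridge itself is easy to illustrate without any dependencies. Here is a minimal sketch of a call() that blocks the current thread until a future completes, using only the standard library; the struct name and method mirror the post's description, the internals are mine, not cubic's actual tokio-based code:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the thread blocked inside call().
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Hypothetical stand-in for cubic's AsyncCaller: the sync CLI boundary
// drives the future to completion on the calling thread.
struct AsyncCaller;

impl AsyncCaller {
    fn call<F: Future>(&self, fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
        let mut cx = Context::from_waker(&waker);
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(out) => return out,
                Poll::Pending => thread::park(), // sleep until woken
            }
        }
    }
}

fn main() {
    let n = AsyncCaller.call(async { 21 * 2 });
    println!("{n}"); // prints 42
}
```

The real thing delegates to tokio's Runtime::block_on, which does the same job with a proper reactor behind it; the point is the shape of the boundary, not the executor.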

The image pipeline is solid. cubic fetches cloud images from distro mirrors, verifies SHA-256/SHA-512 checksums against the upstream checksum file, shows a progress bar during download, and caches images locally. Adding a new distro means adding one entry to the DISTROS static in image_factory.rs. Rocky Linux was added in a recent PR following this exact pattern.

The rough edges are in the SSH layer. The SFTP implementation delegates to russh-sftp, which turned out to be the source of the performance bug. The progress bar during file transfers is coupled to the async read wrapper (AsyncTransferView), which works but makes it hard to swap the underlying transfer mechanism without touching the view layer. The test coverage is good for models and serialization but thin for the SSH and QEMU interaction code, which is typical for tools that depend on external services.

The Contribution

The performance issue (#290) reported that cubic scp transferred files at roughly 2 MB/s on loopback. I dug into the russh-sftp internals to find out why.

The answer is in how russh-sftp implements AsyncWrite. Every call to poll_write() creates a one-shot channel, sends an SFTP write request, and waits until the server responds with an acknowledgment. One write in flight at a time. No pipelining. The SFTP protocol (the draft-ietf-secsh-filexfer spec; SFTP was never published as an RFC) explicitly supports pipelining: clients can send many write requests with different IDs and collect the responses asynchronously. OpenSSH's sftp client does exactly this, with 64 outstanding requests by default. russh-sftp doesn't. The upstream issue (#70) has been open since June 2025 with no fix.

For a 274 MB file at the default 255 KB max write size, that's roughly 1,075 round-trips, each waiting for an ACK. Even on loopback, the per-request overhead adds up to minutes.
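The arithmetic is worth spelling out. A back-of-envelope in Rust, assuming decimal MB/KB as in the numbers above; the ~110 ms per-request overhead is my inference from the two-minute report, not a measured figure:

```rust
fn main() {
    // Numbers from the issue (decimal units assumed).
    let file_bytes: u64 = 274_000_000; // 274 MB file
    let chunk_bytes: u64 = 255_000; // 255 KB max write size
    let round_trips = file_bytes.div_ceil(chunk_bytes);
    assert_eq!(round_trips, 1075);

    // With one write in flight, every chunk pays the full per-request
    // overhead serially. ~110 ms per request is implied by the report:
    let serialized_secs = round_trips as f64 * 0.110; // ≈ 118 s, i.e. ~2 min
    // With 64 outstanding requests (OpenSSH's default window), that
    // overhead is amortized across the window:
    let pipelined_secs = serialized_secs / 64.0; // ≈ 1.8 s
    println!("{round_trips} round trips: {serialized_secs:.0} s serialized, {pipelined_secs:.1} s pipelined");
}
```

Same protocol, same chunk size; only the number of requests in flight changes, and the transfer time drops by the window factor.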

Wrapping the writer in a BufWriter wouldn't help. It coalesces small writes into larger ones, but each poll_write() still blocks on the ACK. You'd go from many small round-trips to fewer large ones, but the bottleneck is the same.

The fix was to bypass SFTP for single-file transfers and use SCP instead. SCP is a much simpler protocol: open an SSH exec channel with scp -t <path>, send a one-line header (C0644 <size> <filename>\n), stream the raw bytes, send a null byte, done. No request IDs, no per-packet ACKs during data transfer. Just a header and a byte stream.
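The framing is simple enough to sketch in a few lines. This is a hypothetical helper showing the wire format described above, not cubic's actual scp.rs, which streams the file instead of buffering it and reads the server's ACK bytes between stages:

```rust
// Build the byte stream an SCP upload sends to the remote `scp -t <path>`.
// Illustrative only; a real client also reads a 0x00 ACK byte from the
// server after the header and again after the trailing NUL.
fn scp_upload_frame(filename: &str, contents: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    // One-line header: permissions, size in bytes, target filename.
    out.extend_from_slice(format!("C0644 {} {}\n", contents.len(), filename).as_bytes());
    // The raw file bytes follow, with no per-packet acknowledgments.
    out.extend_from_slice(contents);
    // A single NUL byte marks end of transfer.
    out.push(0);
    out
}

fn main() {
    let frame = scp_upload_frame("hello.txt", b"hi there");
    assert!(frame.starts_with(b"C0644 8 hello.txt\n"));
    assert_eq!(*frame.last().unwrap(), 0);
}
```

Compare that to SFTP's per-chunk request/response cycle: once the header is acknowledged, the data portion is a single uninterrupted stream, which is why loopback throughput jumps.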

I added a new scp.rs module (~170 lines) that implements SCP upload and download over a raw russh channel via channel.into_stream(). The async_copy function in russh.rs now detects single-file host-to-guest transfers and routes them through SCP. Directory copies and guest-to-guest transfers still use SFTP. Guest-to-host tries SCP first and falls back to SFTP if it fails (which it will for directories).
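The routing rule reads roughly like this. A simplified sketch of the decision described above; the names are illustrative, not the actual async_copy code in russh.rs:

```rust
#[derive(Clone, Copy)]
enum Direction {
    HostToGuest,
    GuestToHost,
    GuestToGuest,
}

#[derive(Debug, PartialEq)]
enum Method {
    Scp,
    ScpWithSftpFallback, // try SCP first, fall back to SFTP if it fails
    Sftp,
}

// Hypothetical routing helper mirroring the behavior described above.
fn pick_method(dir: Direction, is_single_file: bool) -> Method {
    match dir {
        // Single-file uploads to the guest take the fast path.
        Direction::HostToGuest if is_single_file => Method::Scp,
        // Downloads try SCP and fall back to SFTP (e.g. for directories).
        Direction::GuestToHost => Method::ScpWithSftpFallback,
        // Directory copies and guest-to-guest transfers stay on SFTP.
        _ => Method::Sftp,
    }
}

fn main() {
    assert_eq!(pick_method(Direction::HostToGuest, true), Method::Scp);
    assert_eq!(pick_method(Direction::HostToGuest, false), Method::Sftp);
    assert_eq!(pick_method(Direction::GuestToHost, true), Method::ScpWithSftpFallback);
}
```

Keeping SFTP for the recursive cases means the slow path still works everywhere; SCP only takes over where its simplicity wins.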

The review was thorough. The maintainer requested eight changes, all cleanups: use BufReader::read_line() instead of byte-by-byte loops, add error message prefixes, reuse the ack-reading function in the download path, validate the end-of-transfer marker byte. All reasonable, all addressed. He also asked (politely) whether the PR was AI-generated. I explained my workflow and he was satisfied. The PR went through two review rounds over 12 days and merged. PR #311.

The Verdict

cubic is for developers who want lightweight VMs without the weight of libvirt or the constraints of Docker. If you're testing deployment scripts, need an isolated Linux environment for a project, or just want to spin up a Debian box and SSH into it without thinking about Vagrant files, this does the job.

The project is young (v0.19.0, solo maintainer) but the trajectory is good. New distros get added regularly. The contributor experience is above average: specific review feedback, no ego, merged with thanks. The maintainer is clearly using this tool daily and fixing things as they surface.

What would push cubic to the next level? The SFTP performance fix helps, but the bigger opportunity is user experience. A cubic init that scaffolds a project config file, better error messages when QEMU isn't installed, and a Homebrew formula for macOS users would all lower the barrier. The foundation is clean. It just needs more people kicking the tires.

Go Look At This

If you manage VMs from the command line, try cubic. cubic create myvm --image debian and you're running in under a minute. If you've been burned by slow file transfers to VMs before, the SCP fix in PR #311 is worth a look for the protocol analysis alone.

Star the repo. The codebase is small enough to read in a sitting, and there are open issues at every difficulty level.

This is Review Bomb #8, a series where I find under-the-radar projects on GitHub, read the code, contribute something, and write it up. If you know a project that deserves more eyeballs, drop it in the comments.
