David Berry

Posted on Jun 2

Goroutines in Rust

#go #rust #goroutines #claude

Building go-lib: A Chronicle

How a faithful port of Go's M:N scheduler came to life in Rust.

The Idea

The Rust language doesn't pick a single concurrency model; instead, it provides primitives and ownership rules that make any model safe to implement. Unlike Java (virtual threads), Go (goroutines + channels), or Erlang (the actor model), Rust ships without a runtime or a preferred style. The async working group and the async-std library were unsuccessful attempts to establish a standard concurrency model for the language. async-std has since been abandoned in favor of smol, and the async working group has not met in years.

The standard library gives you:

std::thread — OS threads, nothing more
std::sync::{Mutex, RwLock, Condvar, Barrier} — shared-state primitives
std::sync::mpsc — a multi-producer/single-consumer channel

That's it. Everything else — async/await, actors, work-stealing executors, lock-free data structures — lives in the ecosystem (tokio, rayon, crossbeam, actix, etc.).

I needed a concurrency model for another project and started researching smol and tokio. While researching tokio, I found that its scheduler is based on Go's scheduler. That prompted me to look for a goroutines crate, and ultimately to write my own implementation.

Why goroutines

Go has one of the most elegant concurrency models in any systems language: goroutines that start tiny and grow, channels that block without burning threads, a work-stealing scheduler that squeezes every CPU core. The question was whether that model could be brought to Rust — not via async/await, not by wrapping tokio, but by porting Go's actual runtime, source file by source file, into safe-ish Rust.

The answer became go-lib: a crate that lets you write:

go_lib::run(|| {
    let (tx, rx) = go_lib::chan::chan::<String>(0);
    for i in 0..5 {
        let tx = tx.clone();
        go!(move || tx.send(format!("hello from goroutine {i}")));
    }
    drop(tx);
    while let Some(msg) = rx.recv() { println!("{msg}"); }
});

No async, no executor, no tokio — just goroutines.
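For contrast, here is a sketch of roughly the same fan-in program using only the standard-library primitives listed earlier. It is an illustrative comparison, not part of go-lib: every task is a full OS thread, and `std::sync::mpsc::sync_channel(0)` stands in for the unbuffered channel because each send blocks until the receiver takes the value.

```rust
use std::sync::mpsc;
use std::thread;

/// Fan-in using only std: one OS thread per task plus a rendezvous channel.
/// Returns the messages in the order they were received.
fn fan_in(n: usize) -> Vec<String> {
    // Capacity 0: every send blocks until a receiver takes the value,
    // similar to an unbuffered Go channel.
    let (tx, rx) = mpsc::sync_channel::<String>(0);
    let mut handles = Vec::new();
    for i in 0..n {
        let tx = tx.clone();
        // Each task is a real OS thread (2 MiB default stack), not a goroutine.
        handles.push(thread::spawn(move || {
            tx.send(format!("hello from thread {i}")).unwrap();
        }));
    }
    // Drop the original sender so the iterator ends once all clones are gone.
    drop(tx);
    let msgs: Vec<String> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    msgs
}

fn main() {
    for msg in fan_in(5) {
        println!("{msg}");
    }
}
```

The program shape is the same; the difference is what each task costs, which matters once the count reaches the tens of thousands.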

The Implementation

I chose Claude to be the primary implementer of go-lib. This complex project seemed like a perfect candidate: I had not yet used Claude to develop a project end to end and wanted the experience. Spoiler alert: I had to do more work than I imagined.

The Plan

I like to iterate on a plan with Claude before starting large initiatives. I can get my thoughts written down and refine them into a solid working plan. I chose to use Opus for creating the plan and started with this prompt:

Please create a step by step plan to port goroutines from go to rust. It should utilize as many of the native rust constructs as possible, eg Channels.

Claude created a plan to expose tokio::spawn(async move { … }) as a go! macro and to implement channels using flume.

This was not what I wanted. Claude did not understand what I meant by "port" so I needed to refine the plan. As it turns out, the entire project was a series of me asking Claude to do something, and then making course corrections as we went along.

My second prompt:

update the steps to remove the use of tokio, and include porting the scheduler and other missing parts from the go repo: https://github.com/golang/go

This time Claude understood and created an acceptable plan:

Phase A — Scaffolding and low-level primitives

1. Crate layout.
2. Port the Gobuf / register save area (runtime/runtime2.go).
3. Implement gogo and mcall in src/runtime/asm_arm64.rs (and x86_64 sibling).
4. Stack allocator (runtime/stack.go → src/runtime/stack.rs).
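To make the "register save area" step concrete, here is a sketch of what a Rust port of Go's `gobuf` might look like. The field set loosely follows `gobuf` in Go's runtime2.go (it varies across Go versions), and `G` is a hypothetical stand-in; this is not go-lib's actual type. The real `gogo` and `mcall` need inline assembly (`core::arch::asm!`) and are not shown.

```rust
use std::ptr;

/// Hypothetical stand-in for the real goroutine descriptor `G`.
pub struct G {
    pub id: u64,
}

/// Saved execution context of a parked goroutine, loosely modeled on Go's
/// `gobuf`. `gogo` would load these values into registers to resume the
/// goroutine; `mcall` would store the current ones here before switching.
#[repr(C)]
#[derive(Debug)]
pub struct Gobuf {
    pub sp: usize,     // saved stack pointer
    pub pc: usize,     // instruction address to resume at
    pub g: *mut G,     // goroutine that owns this context
    pub ctxt: *mut u8, // closure context pointer
    pub lr: usize,     // link register (arm64); unused on x86-64
    pub bp: usize,     // frame pointer, for frame-pointer unwinding
}

impl Gobuf {
    /// Prepare a fresh context so the first `gogo` jumps to `entry`
    /// with `sp` at the top of a newly allocated stack.
    pub fn new(stack_top: usize, entry: usize, g: *mut G) -> Self {
        // Round down to 16 bytes: x86-64 and AArch64 both require
        // 16-byte stack alignment at call boundaries.
        let sp = stack_top & !0xF;
        Gobuf { sp, pc: entry, g, ctxt: ptr::null_mut(), lr: 0, bp: 0 }
    }
}

fn main() {
    let mut g = G { id: 1 };
    let buf = Gobuf::new(0x7fff_0008, 0x1000, &mut g);
    println!("{buf:?}");
}
```

`#[repr(C)]` fixes the field order, which matters because hand-written assembly addresses these fields by byte offset.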

Phase B — The G/M/P scheduler

5. Port G (src/runtime/g.rs) from runtime2.go.
6. Port M (src/runtime/m.rs).
7. Port P (src/runtime/p.rs) and its runqueue.
8. Port the scheduler core (src/runtime/sched.rs) from proc.go.
9. Bootstrap (src/runtime/sched.rs: schedinit/main).
10. Port sysmon (src/runtime/sysmon.rs).
11. Cooperative preemption only.
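The P runqueue is the heart of this phase. Below is a single-threaded sketch of its behavior as it appears in Go's proc.go: a fixed 256-slot ring, a `runnext` slot that jumps the queue, and overflow that moves half the queue to the global run queue. The names are illustrative rather than go-lib's, and the atomics Go uses so other Ps can steal from the head are deliberately left out.

```rust
use std::collections::VecDeque;

/// Capacity of each P's local run queue (Go also uses 256).
const RUNQ_SIZE: usize = 256;

/// Hypothetical goroutine handle; the real queue holds pointers to G.
type GId = u64;

/// Global run queue shared by all Ps (normally behind the scheduler lock).
struct Sched {
    global: VecDeque<GId>,
}

/// One P's local queue. Go makes head/tail atomic so idle Ps can steal
/// half the queue from the head; this sketch is single-threaded.
struct P {
    runq: [GId; RUNQ_SIZE],
    head: usize,          // items dequeued so far; slot = head % RUNQ_SIZE
    tail: usize,          // items enqueued so far; slot = tail % RUNQ_SIZE
    runnext: Option<GId>, // goroutine to run next, ahead of the ring
}

impl P {
    fn new() -> Self {
        P { runq: [0; RUNQ_SIZE], head: 0, tail: 0, runnext: None }
    }

    fn len(&self) -> usize {
        self.tail - self.head
    }

    /// Enqueue `g`. With `next`, `g` takes the runnext slot and any previous
    /// occupant is demoted to the tail of the ring.
    fn put(&mut self, g: GId, next: bool, sched: &mut Sched) {
        let mut g = g;
        if next {
            match self.runnext.replace(g) {
                Some(old) => g = old,
                None => return,
            }
        }
        if self.len() < RUNQ_SIZE {
            self.runq[self.tail % RUNQ_SIZE] = g;
            self.tail += 1;
        } else {
            // Ring is full: move half of it, plus `g`, to the global queue.
            for _ in 0..RUNQ_SIZE / 2 {
                sched.global.push_back(self.runq[self.head % RUNQ_SIZE]);
                self.head += 1;
            }
            sched.global.push_back(g);
        }
    }

    /// Dequeue: runnext first, then FIFO order from the ring.
    fn get(&mut self) -> Option<GId> {
        if let Some(g) = self.runnext.take() {
            return Some(g);
        }
        if self.head == self.tail {
            return None;
        }
        let g = self.runq[self.head % RUNQ_SIZE];
        self.head += 1;
        Some(g)
    }
}

fn main() {
    let mut sched = Sched { global: VecDeque::new() };
    let mut p = P::new();
    p.put(1, false, &mut sched);
    p.put(2, true, &mut sched); // 2 takes the runnext slot
    while let Some(g) = p.get() {
        println!("run g{g}");
    }
}
```

Spilling half the queue at once, instead of one goroutine at a time, keeps trips to the lock-protected global queue rare.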

Phase C — Channels, synchronization

12. Port sudog (src/runtime/sudog.rs).
13. Port hchan (src/chan.rs) from chan.go.
14. Port select (src/select.rs) from select.go.
15. Port WaitGroup (src/sync/waitgroup.rs) from sync/waitgroup.go.
16. Port sync.Mutex (src/sync/mutex.rs) from sync/mutex.go.
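For the hchan step, the buffered part of a Go channel is a ring buffer with a few counters. The sketch below loosely follows the `buf`/`sendx`/`recvx`/`qcount`/`closed` fields of Go's `hchan`, using non-blocking operations. It omits the lock and the `sendq`/`recvq` queues of sudogs, so an unbuffered (capacity 0) channel here always reports full or empty, whereas Go hands values directly between parked goroutines. The type and method names are hypothetical.

```rust
/// Outcome of a non-blocking receive.
#[derive(Debug, PartialEq)]
enum Recv<T> {
    Value(T),
    Empty,  // chanrecv would park the goroutine on recvq here
    Closed, // closed and drained: Go yields the zero value with ok == false
}

/// Buffered-channel core, loosely following Go's hchan ring buffer.
struct HChan<T> {
    buf: Vec<Option<T>>, // dataqsiz slots
    sendx: usize,        // next slot to write
    recvx: usize,        // next slot to read
    qcount: usize,       // values currently buffered
    closed: bool,
}

impl<T> HChan<T> {
    fn new(cap: usize) -> Self {
        HChan { buf: (0..cap).map(|_| None).collect(), sendx: 0, recvx: 0, qcount: 0, closed: false }
    }

    /// Buffers `v`, or hands it back when full (chansend would park on sendq).
    fn try_send(&mut self, v: T) -> Result<(), T> {
        assert!(!self.closed, "send on closed channel"); // Go panics here too
        if self.qcount == self.buf.len() {
            return Err(v);
        }
        self.buf[self.sendx] = Some(v);
        self.sendx = (self.sendx + 1) % self.buf.len();
        self.qcount += 1;
        Ok(())
    }

    /// Buffered values are drained before a close is reported, as in Go.
    fn try_recv(&mut self) -> Recv<T> {
        if self.qcount > 0 {
            let v = self.buf[self.recvx].take().expect("buffered slot is filled");
            self.recvx = (self.recvx + 1) % self.buf.len();
            self.qcount -= 1;
            Recv::Value(v)
        } else if self.closed {
            Recv::Closed
        } else {
            Recv::Empty
        }
    }

    fn close(&mut self) {
        assert!(!self.closed, "close of closed channel");
        self.closed = true;
    }
}

fn main() {
    let mut c = HChan::new(2);
    c.try_send("a").unwrap();
    c.close();
    println!("{:?} {:?}", c.try_recv(), c.try_recv());
}
```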

Phase D — Timers, polish, ergonomics

17. Port timer heap (src/runtime/time.rs) from time.go.
18. Public API and macros (src/lib.rs, src/go_macro.rs).
19. Tests and examples.
20. Known-skipped items, called out in docs: GC integration, stack growth/copying, async preemption via signals, netpoll, race detector, cgo, defer/recover, and GOMAXPROCS at runtime. Each is a future phase.
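The timer heap step amounts to a min-heap keyed by each timer's deadline. Go's runtime uses a 4-ary heap; the sketch below swaps in std's `BinaryHeap` (a binary max-heap, inverted with `Reverse`) to show the idea. Times are plain nanosecond integers, timer ids are hypothetical, and waking the goroutines is left to the caller.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Per-P timer queue sketch: earliest deadline on top.
struct Timers {
    heap: BinaryHeap<Reverse<(u64, u64)>>, // (when in ns, timer id)
}

impl Timers {
    fn new() -> Self {
        Timers { heap: BinaryHeap::new() }
    }

    fn add(&mut self, when: u64, id: u64) {
        self.heap.push(Reverse((when, id)));
    }

    /// Pop every timer due at or before `now`, earliest first. A scheduler
    /// would call this from its loop and make the waiting goroutines runnable.
    fn run_due(&mut self, now: u64) -> Vec<u64> {
        let mut fired = Vec::new();
        while self.heap.peek().map_or(false, |r| (r.0).0 <= now) {
            let Reverse((_, id)) = self.heap.pop().unwrap();
            fired.push(id);
        }
        fired
    }

    /// Deadline of the next timer, which bounds how long an idle M may sleep.
    fn next_when(&self) -> Option<u64> {
        self.heap.peek().map(|r| (r.0).0)
    }
}

fn main() {
    let mut timers = Timers::new();
    timers.add(30, 3);
    timers.add(10, 1);
    timers.add(20, 2);
    println!("due at t=25: {:?}", timers.run_due(25));
    println!("next timer at: {:?}", timers.next_when());
}
```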

One thing I noticed is that channels and mutexes were being ported over from Go, even though Rust already has both. I asked whether Rust's channel and mutex could be reused instead. Mutexes could be reused; channels could not, because Rust's channels do not align with Go's. Claude updated the plan to use Rust's mutex. Another thing to note is step 20: the features Claude was leaving out for future phases. I chose to implement them right away, but that may not have been the best decision.
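The post doesn't spell out which mismatches ruled out std channels, so the sketch below only illustrates two plausible ones. `std::sync::mpsc::Receiver` is not `Clone`, so several consumers must share it behind `Arc<Mutex<_>>`, whereas any number of goroutines can receive from one Go channel. And a blocking `recv()` parks the whole OS thread rather than just a goroutine. The function name is hypothetical.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Two consumers draining one std channel; returns the total they summed.
fn two_consumers(items: Vec<u32>) -> u32 {
    let (tx, rx) = mpsc::channel::<u32>();
    // Receiver is not Clone, so sharing it needs a lock around it.
    let rx = Arc::new(Mutex::new(rx));
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut sum = 0;
                loop {
                    // While one consumer blocks in recv() holding the lock, its OS
                    // thread sleeps and the other consumer waits on the mutex.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(v) => sum += v,
                        Err(_) => break, // every sender has been dropped
                    }
                }
                sum
            })
        })
        .collect();
    for v in items {
        tx.send(v).unwrap();
    }
    drop(tx);
    workers.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", two_consumers((1..=10).collect()));
}
```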

The Code

I switched to Sonnet and had Claude proceed with step 1. When it was done, I had it proceed with step 2, and so on through all 20 steps. In hindsight, I should have tested the working code after step 19. Testing only after step 20 surfaced multiple layers of errors that were harder to track down; in fact, Claude had to disable some step 20 features to isolate problems from the first 19 steps.

Claude did a great job porting the code from Go and assembly to Rust and assembly. I certainly could not have done it in the same timeframe. Claude also included doc comments, doc tests, unit tests, and integration tests in the port. When Claude got the project to the point where all tests passed, it said it was done. Little did I know the work was just beginning.

It took about 3 days to get to this point, mostly because I kept running out of tokens and had to wait 3 hours for my session to refresh. I also learned that after running out of tokens, it is better to tell Claude how to proceed than to hit "try again" on the error screen. A prompt such as

Continue where you left off, implementing xxx

works well.

Testing

Initially the tests were flaky: sometimes they passed and sometimes they failed. You can waste a lot of time having Claude hunt for the errors behind flaky tests, and I did. It is always better to have a test that reproduces the error consistently, or at least more often.
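One simple way to get a test that fails more often is to loop the flaky scenario and record when, and how often, it fails. The helpers below are a hypothetical sketch: in practice `attempt` would run one iteration of the real scenario (for example, spawning many goroutines and checking the result), and catching hangs would also need a timeout, which is not shown.

```rust
/// Runs `attempt` up to `max_runs` times; returns the 1-based index of the
/// first failing run, or None if every run passed.
fn first_failure<F: FnMut() -> bool>(max_runs: usize, mut attempt: F) -> Option<usize> {
    (1..=max_runs).find(|_| !attempt())
}

/// Fraction of `runs` attempts that failed; useful for checking that a fix
/// actually lowers the failure rate rather than just hiding the bug.
fn failure_rate<F: FnMut() -> bool>(runs: usize, mut attempt: F) -> f64 {
    let failures = (0..runs).filter(|_| !attempt()).count();
    failures as f64 / runs as f64
}

fn main() {
    // Hypothetical stand-in for a flaky scenario: fails on every 10th run.
    let mut calls = 0;
    let flaky = || {
        calls += 1;
        calls % 10 != 0
    };
    println!("first failure at run {:?}", first_failure(1_000, flaky));
}
```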

The best way to test a library is to use it. Once I felt that go_lib was reasonably stable, I put it to use in the project that had inspired me to look for a better concurrency model. I immediately found two features needed to make the library more usable: a runtime attribute and scope.

The original API required wrapping the body of main() in a go_lib::run(|| {}); block. I thought a runtime attribute would be cleaner, so I had Claude add the #[go_lib::run] attribute. The second missing feature was scope. Migrating my test project from std::thread::scope to plain go! calls was complicated, so I had Claude add go_lib::scope to the library. Now using scoped goroutines is as simple as:

let data = vec![1_i64, 2, 3, 4, 5];

let sum = go_lib::scope(|s| {
    let h1 = s.go(|| data[..3].iter().sum::<i64>());
    let h2 = s.go(|| data[3..].iter().sum::<i64>());
    h1.join().unwrap() + h2.join().unwrap()
});
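For comparison, here is the same computation written with `std::thread::scope`, the API the test project was originally using. Judging only from the snippet above, `go_lib::scope` keeps the same shape, with `s.go` in place of `s.spawn`. In both versions the scope guarantees the spawned tasks finish before it returns, which is what allows them to borrow `data`.

```rust
use std::thread;

/// Split-and-sum over a borrowed slice using std scoped threads.
fn scoped_sum(data: &[i64]) -> i64 {
    thread::scope(|s| {
        // Both closures borrow `data`; the scope joins them before returning.
        let h1 = s.spawn(|| data[..3].iter().sum::<i64>());
        let h2 = s.spawn(|| data[3..].iter().sum::<i64>());
        h1.join().unwrap() + h2.join().unwrap()
    })
}

fn main() {
    let data = vec![1_i64, 2, 3, 4, 5];
    println!("{}", scoped_sum(&data)); // prints 15
}
```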

Once these features were implemented, I had everything I needed to start using go_lib. The first attempt exposed errors and hangs: my test project creates tens of thousands of goroutines, and go_lib could not handle it. I tried having Claude debug go_lib through the test project, but it was too confusing for Claude. It kept telling me that my test project was creating too many goroutines and that I should split the work into smaller groups of goroutines.

The solution was to create a test, many_goroutines, with the worker count set to a large number. That gave Claude an in-project integration test it could use to resolve the issues. This was the longest phase of the project: helping Claude find the most subtle hangs and panics. Claude would sometimes increase memory or stack sizes to make problems go away, and I had to restate requirements such as goroutines using a 2KB stack.
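Rough arithmetic shows why the 2KB requirement is worth defending. Treating 2KB as 2,048 bytes, using the 75,000-goroutine count the runtime is tested at, and assuming Rust's current 2 MiB default stack for spawned std threads as the comparison point (reserved address space, not necessarily resident memory):

$$
\begin{aligned}
75\,000 \times 2\ \text{KiB} &= 150\,000\ \text{KiB} \approx 146\ \text{MiB} \\
75\,000 \times 2\ \text{MiB} &= 150\,000\ \text{MiB} \approx 146\ \text{GiB}
\end{aligned}
$$

Any increase to the initial stack size is multiplied across every goroutine, so quietly raising it to make a bug disappear has a real cost at this scale.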

Humorously, when Claude was investigating one bug and found a different one that came from earlier commits, it would call it a "pre-existing bug not from my changes".

Documentation

Just a note on documentation. I had to prompt Claude to go back and check for stale documentation, comments, and README.md. I was okay with this, as I would do it once before tagging and pushing out a new version.

Where It Stands

go-lib v0.5.0 is a working, tested implementation of Go-style concurrency in Rust with no async machinery. The runtime passes CI across Ubuntu x86-64, macOS AArch64, and Windows x86-64 in both standard and loom model-check configurations.

Async preemption via SIGURG is fully operational. The full chain is correct, tested at 75,000 concurrent goroutines, and free of known crash modes: sysmon fires, the signal handler redirects RIP, the trampoline saves RFLAGS, all GPRs, and the XMM registers, the scheduler yields, the trampoline restores them, and the goroutine resumes.
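The "sysmon fires" step is a time check. In Go's proc.go, sysmon remembers each P's schedule counter (`schedtick`) and when it last changed; if the same goroutine has kept running for `forcePreemptNS` (10 ms), sysmon asks it to yield, and with async preemption that request arrives as SIGURG. Below is a simplified sketch of that decision only. The names mirror Go's rather than go-lib's, the clock is a plain `Duration`, and the signal and trampoline are not modeled.

```rust
use std::time::Duration;

/// Go's forcePreemptNS: how long one goroutine may run before sysmon
/// asks it to yield.
const FORCE_PREEMPT: Duration = Duration::from_millis(10);

/// What sysmon remembers about one P between passes (like Go's sysmontick).
struct SysmonTick {
    schedtick: u32,      // P's schedule counter as of the last change seen
    schedwhen: Duration, // monotonic time when that change was observed
}

/// One sysmon pass over a running P. `p_schedtick` is incremented by the
/// scheduler each time it picks a goroutine. Returns true when the current
/// goroutine should be preempted (flag set, SIGURG sent to its M).
fn should_preempt(tick: &mut SysmonTick, p_schedtick: u32, now: Duration) -> bool {
    if tick.schedtick != p_schedtick {
        // The P switched goroutines since the last pass: restart the clock.
        tick.schedtick = p_schedtick;
        tick.schedwhen = now;
        false
    } else {
        // Same goroutine still running: preempt once its time slice is used up.
        tick.schedwhen + FORCE_PREEMPT <= now
    }
}

fn main() {
    let ms = Duration::from_millis;
    let mut tick = SysmonTick { schedtick: 0, schedwhen: ms(0) };
    // Sysmon polls periodically while the same goroutine keeps running.
    for now in [1, 5, 9, 13] {
        println!("t={now}ms preempt={}", should_preempt(&mut tick, 1, ms(now)));
    }
}
```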

The stack growth path works correctly for the common case: frames smaller than the initial stack size. The current guard-page approach cannot intercept a single frame larger than the initial stack size; the correct long-term fix, morestack-style compiler-generated stack checks at every function entry, remains future work.
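A morestack-style check differs from a guard page because the frame size is known before the frame is touched. The function compares the stack pointer, minus the space it is about to use, against a limit just above the bottom of the stack, and grows the stack first if needed, however large the frame is. The sketch below models that comparison with plain integers. It is a simplification of Go's prologue rather than its exact instruction sequence, and the 928-byte guard value is an assumption modeled on Go's StackGuard.

```rust
/// A goroutine stack as an address range; stacks grow down from `hi`.
#[derive(Debug, Clone, Copy)]
struct Stack {
    lo: usize, // lowest usable address
    hi: usize, // initial stack pointer
}

/// Headroom kept above `lo` (assumed value, modeled on Go's StackGuard).
const GUARD: usize = 928;

/// The limit a prologue compares against (Go calls it stackguard0).
fn stackguard0(s: Stack) -> usize {
    s.lo + GUARD
}

/// True if a frame of `frame_size` bytes at `sp` would dip below the guard,
/// meaning the function must grow (allocate and copy) the stack first.
fn needs_morestack(sp: usize, frame_size: usize, s: Stack) -> bool {
    // saturating_sub keeps frames larger than the whole address range
    // from wrapping around and looking small.
    sp.saturating_sub(frame_size) < stackguard0(s)
}

fn main() {
    let s = Stack { lo: 0x1000, hi: 0x1000 + 2048 }; // a 2 KiB stack
    for frame in [64, 1200, 1 << 20] {
        println!("frame {frame:>8} bytes -> grow first: {}", needs_morestack(s.hi, frame, s));
    }
}
```

A guard page, by contrast, only traps when a write happens to land inside it, which is how a single oversized frame can slip past.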

Goroutines work. Channels work. Select works. Scope works. The scheduler steals work. Goroutines preempt. Stacks grow. RFLAGS round-trips correctly across a yield.

It runs.

DEV Community