Leapcell

Posted on Mar 27

Concurrency in Go vs Rust/C++: Goroutines vs Coroutines

#go #rust #gin #cpp

Leapcell: The Best of Serverless Web Hosting

In-depth Analysis of Coroutines in Golang, Rust, and C++

Nowadays, coroutines have become an important part of modern programming languages and are widely used in scenarios such as asynchronous programming and concurrent control. Many mainstream programming languages, such as goroutines in Golang and async/await in JavaScript, provide support for coroutines. Although the names and implementation methods of coroutines vary among different languages, essentially, coroutines are mainly divided into two categories: stackful coroutines and stackless coroutines. The former is represented by goroutines, and the latter is typified by async/await.

1. The Differences between Stackful and Stackless Coroutines

The terms "stackful" and "stackless" mentioned here do not mean whether a stack space is required when the coroutine is running. In fact, in most programming languages, function calls inevitably involve the call stack. The core difference lies in whether a coroutine can be suspended in any nested function (such as a sub-function, anonymous function, etc.). Stackful coroutines have this ability, while stackless coroutines do not. To deeply understand this difference, we need to start with the working mechanism of the function call stack.

1.1 The Working Mechanism of the Function Call Stack

This article's discussion is based on the x86 platform and takes a 32-bit system as the object. In the x86 platform, the address of the call stack grows from a high address to a low address. The call stack is a contiguous address space, and both the caller and the callee are located within it. The address space occupied by each function in the call stack is called a "stack frame", and the entire call stack is composed of multiple stack frames. The following is a typical call stack model, sourced from Wikipedia:

Through the Compiler Explorer, it is convenient to convert C code into assembly code to understand the underlying execution process. The following is the AT&T syntax assembly code generated by x86_64 gcc 9.3 with the compilation parameter -m32:

int callee() { 
    int x = 0; 
    return x;  
}

int caller() { 
    callee(); 
    return 0; 
}

The corresponding assembly code is as follows:

callee:
    pushl %ebp
    movl  %esp, %ebp
    subl  $16, %esp
    movl  $0, -4(%ebp)
    movl -4(%ebp), %eax
    leave
    ret

caller:
    pushl %ebp
    movl  %esp, %ebp
    call  callee
    movl  $0, %eax
    popl  %ebp
    ret

When caller calls callee, the execution steps are as follows:

Push the instruction address stored in eip (that is, the return address of caller, the address of the movl $0, %eax instruction in caller) onto the stack for preservation.
Jump to callee.
Push the bottom address of the caller's stack frame onto the stack for preservation.
Use the top address of the call stack at this time as the bottom address of the callee's stack frame.
Extend the top of the call stack by 16 bytes as the stack frame space of callee. Since the call stack address in the x86 platform grows from a high address to a low address, the subl instruction is used.

When callee returns to caller, the execution steps are as follows:

Align the top of the call stack with the bottom of the callee stack frame and release the callee stack frame space.
Pop the previously saved bottom address of the caller's stack frame from the stack and assign it to ebp.
Pop the previously saved return address of caller from the stack and assign it to eip, that is, the address of the movl $0, %eax instruction of caller.
caller returns from callee and continues to execute the subsequent instructions.

The actual operation process of the call stack is more complex. In order to simplify the discussion in this article, details such as function parameter passing are ignored.

2. The Implementation and Principle of Stackful Coroutines (Goroutine)

The key to implementing coroutines lies in saving, restoring, and switching the context. Since functions run on the call stack, it is natural to think that saving the context means saving the values in the continuous stack frames of the function and its nested functions, as well as the values of the current registers; restoring the context means rewriting these values back into the corresponding stack frames and registers; switching the context means saving the context of the currently running function and restoring the context of the next function to be run. Stackful coroutines are exactly implemented based on this idea.

2.1 The Implementation of Stackful Coroutines

To implement stackful coroutines, first, a piece of memory space needs to be allocated to store the context. One can choose to copy the context to this piece of memory or directly use this piece of memory as the stack frame space when the coroutine is running, in order to avoid the performance loss caused by copying. However, it should be noted that the size of the memory space needs to be allocated reasonably. If it is too small, it may cause a stack overflow when the coroutine is running, and if it is too large, it will waste memory.

At the same time, the values of the registers also need to be saved. In the function call stack, according to the convention, registers such as eax, ecx, and edx are saved by the caller, while registers such as ebx, edi, and esi are saved by the callee. For the called coroutine, it is necessary to save the register values related to the callee, the values of ebp and esp related to the call stack, and the return address stored in eip.

// *(ctx + CTX_SIZE - 1) stores the return address
// *(ctx + CTX_SIZE - 2) stores ebx
// *(ctx + CTX_SIZE - 3) stores edi
// *(ctx + CTX_SIZE - 4) stores esi
// *(ctx + CTX_SIZE - 5) stores ebp
// *(ctx + CTX_SIZE - 6) stores esp

// Note that the stack growth direction of x86 is from high address to low address, so the addressing is an offset downward
char **init_ctx(char *func) {
    size_t size = sizeof(char *) * CTX_SIZE;
    char **ctx = malloc(size);
    memset(ctx, 0, size);

    *(ctx + CTX_SIZE - 1) = (char *) func;

    *(ctx + CTX_SIZE - 6) = (char *) (ctx + CTX_SIZE - 7);
    return ctx + CTX_SIZE;
}

In order to save and restore the values of the registers, assembly code needs to be written. Assuming that the memory address storing the context has been assigned to eax, the saving logic is as follows:

movl %ebx,  -8(%eax)
movl %edi, -12(%eax)
movl %esi, -16(%eax)
movl %ebp, -20(%eax)
movl %esp, -24(%eax)
movl (%esp), %ecx
movl %ecx, -4(%eax)

The restoration logic is as follows:

movl  -8(%eax), %ebx
movl -12(%eax), %edi
movl -16(%eax), %esi
movl -20(%eax), %ebp
movl -24(%eax), %esp
movl -4(%eax), %ecx
movl %ecx, (%esp)

Based on the above assembly code, the void swap_ctx(char **current, char **next) function can be constructed. By passing in the context constructed by char **init_ctx(char *func), the context switch can be realized. For convenience of use, the swap_ctx() function can also be encapsulated into the yield() function to implement the function scheduling logic. The following is a complete example:

char **MAIN_CTX;
char **NEST_CTX;
char **FUNC_CTX_1;
char **FUNC_CTX_2;

void nest_yield() {
    yield();
}

void nest() {
    int tag = rand() % 100;
    for (int i = 0; i < 3; i++) {
        printf("nest, tag: %d, index: %d\n", tag, i);
        nest_yield();
    }
}

void func() {
    int tag = rand() % 100;
    for (int i = 0; i < 3; i++) {
        printf("func, tag: %d, index: %d\n", tag, i);
        yield();
    }
}

int main() {
    MAIN_CTX = init_ctx((char *) main);
    NEST_CTX = init_ctx((char *) nest);
    FUNC_CTX_1 = init_ctx((char *) func);
    FUNC_CTX_2 = init_ctx((char *) func);

    int tag = rand() % 100;
    for (int i = 0; i < 3; i++) {
        printf("main, tag: %d, index: %d\n", tag, i);
        yield();
    }

    free(MAIN_CTX - CTX_SIZE);
    free(NEST_CTX - CTX_SIZE);
    free(FUNC_CTX_1 - CTX_SIZE);
    free(FUNC_CTX_2 - CTX_SIZE);
    return 0;
}

The complete code can be obtained through the link. Compile it using gcc -m32 stackful.c stackful.s and run ./a.out. The running results show that the nest() function can indeed be suspended in any of its nested functions, and the same function runs on different stack frame spaces when called multiple times.

3. The Implementation and Principle of Stackless Coroutines

Unlike stackful coroutines, which directly switch stack frames, stackless coroutines implement context switching in a way similar to a generator without changing the function call stack.

Since stackless coroutines do not change the function call stack, it is almost impossible to suspend the coroutine in any nested function. However, precisely because there is no need to switch stack frames, stackless coroutines usually have higher performance than stackful coroutines. In addition, as can be seen from the coroutine.h in the above article, the author encapsulates all the variables of the coroutine into a structure through C language macros and allocates memory space for this structure, thus avoiding memory waste, which is difficult for stackful coroutines to achieve.

4. Coroutines in Rust and C++ (Stackless Coroutines)

4.1 Coroutines in Rust

Rust supports asynchronous programming through the async and await keywords, which are essentially stackless coroutines. Rust's asynchronous runtime (such as Tokio) is responsible for scheduling and managing these coroutines. For example:

async fn fetch_data() -> Result<String, reqwest::Error> {
    let client = reqwest::Client::new();
    let response = client.get("https://example.com").send().await?;
    response.text().await
}

In Rust, an async function returns an object that implements the Future trait, and the await keyword is used to pause the current coroutine and wait for the Future to complete.

4.2 Coroutines in C++

C++20 introduced support for coroutines and implemented coroutine functions through keywords such as co_await, co_yield, and co_return. C++'s coroutine model is more flexible and can implement stackful or stackless coroutines as needed. For example:

#include <iostream>
#include <experimental/coroutine>

struct task {
    struct promise_type {
        task get_return_object() { return task{this}; }
        auto initial_suspend() { return std::experimental::suspend_always{}; }
        auto final_suspend() noexcept { return std::experimental::suspend_always{}; }
        void return_void() {}
        void unhandled_exception() {}
    };
    task(promise_type* p) : coro(std::experimental::coroutine_handle<promise_type>::from_promise(*p)) {}
    ~task() { coro.destroy(); }
    void resume() { coro.resume(); }
    bool done() { return coro.done(); }

private:
    std::experimental::coroutine_handle<promise_type> coro;
};

task async_function() {
    std::cout << "Start" << std::endl;
    co_await std::experimental::suspend_always{};
    std::cout << "Resume" << std::endl;
}

5. Conclusion

Through an in-depth analysis of stackful and stackless coroutines, we have a clearer understanding of their underlying implementations. Although stackful and stackless coroutines are named according to the context storage mechanism, their essential difference lies in whether they can be suspended in any nested function. This difference determines that stackful coroutines have higher freedom when suspended and are more convenient in terms of being compatible with existing synchronous code; while stackless coroutines, although limited in the freedom of suspension, have higher performance and better memory management capabilities. In practical applications, the appropriate type of coroutine should be selected according to specific needs.

Leapcell: The Best of Serverless Web Hosting

Finally, I would like to recommend a platform Leapcell that is most suitable for deploying Go/Rust services.

🚀 Build with Your Favorite Language

Develop effortlessly in JavaScript, Python, Go, or Rust.

🌍 Deploy Unlimited Projects for Free

Only pay for what you use—no requests, no charges.

⚡ Pay-as-You-Go, No Hidden Costs

No idle fees, just seamless scalability.

📖 Explore Our Documentation

🔹 Follow us on Twitter: @LeapcellHQ

Get n8n VPS hosting 3x cheaper than a cloud solution

Get fast, easy, secure n8n VPS hosting from $4.99/mo at Hostinger. Automate any workflow using a pre-installed n8n application and no-code customization.

Start now

DEV Community