Shrestha Pandey
Concurrency is Not Parallelism — And Most Developers Conflate Them

There's a quote I keep returning to whenever this topic comes up.

Rob Pike, one of Go's creators, said it at Heroku's Waza conference back in 2012: "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."

That was one sentence, fourteen years ago. And I still see developers confusing the two all the time.

I'm not saying that as a criticism — I've mixed them up before too. These two ideas are very close to each other, and in a lot of languages they even use the same tools. On top of that, documentation across the industry has been using the terms interchangeably for years. When languages like Go, JavaScript, Python, Java, and Rust all handle things a little differently and each uses slightly different wording, it’s pretty easy to see why people mix them up.

So here's my attempt at a proper technical explanation.

Start Here: The Actual Definitions

Concurrency is structural. A program is concurrent if it is written so that many operations can be in progress at the same time. That does not mean they are all executing at exactly the same instant; it means the operations can take turns making progress.

Parallelism, on the other hand, is about actual execution: the tasks really are being performed at the same time. True parallelism is impossible on a single-core processor.

The difference between these two terms has been described in the literature of computer science as follows: "Concurrency means that two or more actions are in progress at the same time. Parallelism means that two or more actions are executed at the same moment."

The phrase in progress matters. A task can be in progress without actively executing: one task may be in progress while waiting for a result from the database, while another is in progress actually running on the CPU. This is why tasks can be concurrent without ever executing simultaneously.

Pike's full quote is worth reading once more:

"Concurrency is about structure, parallelism is about execution. Concurrency provides a way to structure a solution to solve a problem that may — but not necessarily — be parallelizable."

One thing people commonly misunderstand: concurrent code is not necessarily parallel. You can write code with many things in progress and still run it on a single-core CPU, where only one thing is actually happening at any moment. The tasks simply take turns making progress.
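To make that concrete, here is a minimal Python sketch using asyncio (standing in for any single-threaded runtime): two tasks are both in progress on one thread, and the interleaved log shows them taking turns rather than running simultaneously.

```python
import asyncio

order = []

async def task(name, delay):
    order.append(f"{name} start")
    await asyncio.sleep(delay)  # the task parks here; the other task runs
    order.append(f"{name} done")

async def main():
    # both tasks are "in progress" at once, on a single thread
    await asyncio.gather(task("a", 0.02), task("b", 0.01))

asyncio.run(main())
print(order)  # interleaved: ['a start', 'b start', 'b done', 'a done']
```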

The reverse also holds: you can have parallelism without concurrency. For instance, you could split one large calculation across many CPU processes. The pieces execute simultaneously, but there are no independent tasks interacting with each other.
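A small Python sketch of that shape (the function and slice sizes are illustrative): one calculation chopped into slices with multiprocessing.Pool. The slices run in parallel, but there is only one job, with no interacting tasks.

```python
from multiprocessing import Pool

def partial_sum(bounds):
    # one slice of a single large calculation: sum of squares over [lo, hi)
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    # split [0, 1_000_000) into four independent slices, one per process
    chunks = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    # the parallel result matches the sequential one
    assert total == sum(i * i for i in range(1_000_000))
    print(total)
```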

In other words, concurrency and parallelism are independent concepts. While concurrency and parallelism often occur together in practice, they are not the same thing.

Why Everyone Mixes These Up

First, threads blur the boundary. Both concurrency and parallelism can be implemented with threads. The same Thread object might represent tasks that alternate on a single processor core, or tasks that run in parallel on different cores. The syntax is identical in both cases; what you actually get depends on the environment the code runs in. You can't tell which one you're getting just from the API.

Second, documentation is often imprecise. If you glance at many references dealing with programming languages, you will find that terms like concurrent execution are used when, in fact, the author means parallel execution, and vice versa. Again, this is not specific to any particular ecosystem.

Third, async and await have made the confusion even stronger. The keywords have spread across JavaScript, Python, C#, Rust, Swift, and Kotlin as a pattern for handling multiple tasks. The trouble is that the syntax looks the same whether you are doing I/O-bound concurrency or trying to speed up CPU-heavy code, which actually requires parallelism. The code can look identical even though the runtime behavior is completely different.

The four actual combinations

It helps to think of concurrency and parallelism as more than a true/false pair. There are four combinations, and it is worth knowing which one your system actually falls into.

Concurrent but not parallel: many tasks exist at the same time, but they share a single CPU core, so only one is ever executing at any instant. The system switches between them so they all make progress. A classic example is Node.js handling thousands of HTTP connections on a single thread.

Parallel but not concurrent: a single task is split across many CPU cores, so different parts of it execute at the same time. A classic example is image rendering: one task, simply divided into independent pieces.

Concurrent and parallel: many independent tasks, and the system can execute several of them at the same moment. A good example is a Go web server that spreads goroutines across CPU cores: each request is an independent task, and several of them really do run simultaneously.

Neither concurrent nor parallel: This is simple sequential execution. One task runs to completion before the next begins. Many programs start this way, and some remain this way because it keeps the system simple and predictable.

Most web services fall into either the concurrent-only group or the concurrent-and-parallel group. Knowing which one your system actually belongs to can make a big difference in how you reason about it.

How different runtimes actually do this

These abstract concepts become clearer when you see how real runtimes implement them. Each ecosystem made its own design choices, so concurrency looks quite different from language to language.

JavaScript / Node.js:

JavaScript is single-threaded: the V8 engine runs your code on a single call stack, one frame at a time. What makes concurrency possible anyway is the event loop.

When you start an async operation such as an HTTP request, it is handed off to libuv, which uses the OS's async I/O facilities. While the operation is in flight, your JavaScript keeps running on the call stack. When the operation completes, its callback is queued, and the event loop runs it once the call stack is empty.

Another thing that often catches developers out is how microtasks and macrotasks are handled. Microtasks (settled Promises, queueMicrotask) are fully drained before any macrotask (setTimeout, setInterval, I/O callbacks) runs. That is exactly what the specification says, not a bug, but it is often not what developers expect, and you need to understand it to use JavaScript properly.

MDN describes the event loop in simple terms: "The event loop enables asynchronous programming in JavaScript while remaining single threaded."

Because of this design, Node.js is concurrent rather than parallel. It can handle thousands of connections at once because most of those connections are simply waiting. The event loop is very good at juggling thousands of waiting connections, but that does not mean thousands of things are happening at once.

So, if you need parallelism, you use Web Workers in browsers or Worker Threads in Node.js. They provide separate execution contexts on separate OS threads. Workers cannot share the DOM or object references; they communicate through structured cloning, or through SharedArrayBuffer with Atomics. In JavaScript, parallelism is something you have to ask for explicitly.

This means that CPU-intensive code on the main thread blocks the event loop: nothing else in Node.js runs while it runs. async/await does not change this. Worker Threads do.

// inside an async function: still blocking the main thread, async doesn't help
const result = await heavyComputation();

// this runs on a separate OS thread
const { Worker } = require('worker_threads');
const worker = new Worker('./heavy-task.js');
worker.on('message', (result) => console.log(result));

Go:

Go treats concurrency as part of the language itself. The basic construct is the goroutine, a lightweight function that runs concurrently with other goroutines. Goroutines are not OS threads: the Go runtime multiplexes many goroutines onto a small number of OS threads, an M:N threading model.

The payoff is memory. A goroutine starts with a stack of only about 2 KB, which grows and shrinks dynamically, whereas an OS thread typically reserves 1 to 2 MB up front. That is why you can run hundreds of thousands of goroutines without trouble; running that many threads simply would not work.

The Go scheduler uses work stealing: each processor (P) has its own run queue of goroutines, and a processor that runs out of work steals from another processor's queue. Since Go 1.14 the scheduler is also preemptive: the runtime can stop a goroutine at safe points, such as function calls or loop back edges, so no single goroutine can hold an OS thread indefinitely.

When a goroutine blocks on I/O or a channel receive, it parks, and another goroutine runs on the same OS thread. The code reads like a simple sequence of operations, but the scheduling is happening continuously in the background.

package main

import (
    "fmt"
    "io"
    "net/http"
)

func fetchData(url string, ch chan string) {
    resp, err := http.Get(url) // goroutine parks here, OS thread stays free
    if err != nil {
        ch <- ""
        return
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    ch <- string(body)
}

func main() {
    ch := make(chan string, 3)
    go fetchData("https://api.example.com/a", ch)
    go fetchData("https://api.example.com/b", ch)
    go fetchData("https://api.example.com/c", ch)
    for i := 0; i < 3; i++ {
        fmt.Println(<-ch)
    }
}

GOMAXPROCS controls how many OS threads can run Go code at the same time. Since Go 1.5 it defaults to the number of CPU cores. Goroutines are then distributed across those threads automatically.

The Go team also pushes a specific coordination philosophy: “Do not communicate by sharing memory; instead, share memory by communicating.” Data moves through channels instead of shared variables. The goal is fewer race conditions by design, not just by careful discipline.

Python:

Python's history here is almost a cautionary tale. Not because Python is bad, but because it shows how one design decision, made early on, can shape an ecosystem for decades, and how hard that decision is to undo.

The GIL (Global Interpreter Lock) is a mutex in the CPython interpreter that ensures only one thread executes Python code at a time. It does not matter how many CPU cores you have: under the GIL, CPython never runs Python code on more than one thread simultaneously.

The GIL exists because CPython uses reference counting for memory management. Without it, multiple threads could update an object's reference count at the same time and leave the interpreter's memory state inconsistent. The GIL kept the interpreter simple, stable, and easy to integrate with C code. The cost of that simplicity was that Python threads were only ever useful for I/O-bound concurrency; for CPU-bound work they did not help, and could even hurt.

So the ecosystem developed workarounds. asyncio became the de facto standard for I/O-bound concurrent code, with an event loop much like Node.js's, and multiprocessing became the standard for CPU-bound code, using separate processes that each carry their own interpreter and GIL.

However, all of this is changing with the advent of Python 3.13 in October 2024. PEP 703 has introduced an optional "free-threaded build" of CPython, which can run without the GIL. Python 3.14 in October 2025 has taken this further by including an optional thread-safe incremental garbage collector, which solves the latency issue in the 3.13 build.

Removing the GIL wasn’t simple. The approach uses biased reference counting. Each object tracks an owning thread. The owning thread can use fast non-atomic operations, while other threads must use slower atomic ones. This prevents the cache thrashing that would occur if all reference count operations had to be atomic.

CPU-bound multithreaded Python code can finally scale across cores in the free-threaded build, at times even approaching linear speedups. This is just not possible in the original CPython.
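As a rough illustration (the workload and thread split are arbitrary), here is CPU-bound work divided across two threads. The result is identical on any build; the difference is that only a free-threaded interpreter can actually run the two loops on two cores at once.

```python
import threading

def count_primes(lo, hi, out, idx):
    # naive, purely CPU-bound work: no I/O for the GIL to hide behind
    total = 0
    for n in range(lo, hi):
        if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    out[idx] = total

results = [0, 0]
threads = [
    threading.Thread(target=count_primes, args=(2, 5_000, results, 0)),
    threading.Thread(target=count_primes, args=(5_000, 10_000, results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))  # 1229 primes below 10,000 on any build; wall time differs
```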

There is, however, a cost.

The free-threaded build will be slightly slower for single-threaded code. This is shown by the pyperformance benchmarks, where the free-threaded build is 1% slower on ARM (macOS aarch64) and up to 8% slower on x86-64 Linux compared to the normal GIL build.

The GIL can also quietly return. If you import a C extension module that was not built with free-threading support and is missing the Py_mod_gil slot, the interpreter re-enables the GIL rather than crashing. Your code still works, but it is serialized again.
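Because of that, it is worth checking at runtime what you actually got. A small sketch (sys._is_gil_enabled() exists only on CPython 3.13+, hence the guard):

```python
import sys
import sysconfig

# Py_GIL_DISABLED is set for free-threaded *builds*; None/0 on regular builds
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("Free-threaded build:", free_threaded_build)

if hasattr(sys, "_is_gil_enabled"):
    # even on a free-threaded build, a C extension may have re-enabled the GIL
    print("GIL enabled right now:", sys._is_gil_enabled())
else:
    print("Pre-3.13 interpreter: the GIL is always on")
```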

It also requires explicit opt-in. The default Python build still has the GIL; for free threading you need the 3.13t or 3.14t builds, and library support is still uneven.

The long-term plan is to remove the GIL entirely. This is expected to occur in Python 3.20.

Java:

Java's original approach was simple: one Java thread equals one operating system thread. It was solid and it worked, but it did not scale. A server with 10,000 simultaneous connections meant 10,000 OS threads and roughly 10 to 20 GB of pre-allocated stack memory before doing any real work. The workaround was reactive programming: non-blocking frameworks, chains of CompletableFuture, and reactive streams. They work, but writing and debugging them can be genuinely painful.

Virtual threads in Java 21 (JEP 444, September 2023) are the real fix. They are lightweight threads managed by the JVM and multiplexed onto a pool of platform threads. When a virtual thread blocks on I/O, it yields its platform thread, and the JVM immediately runs another virtual thread on it. You write ordinary sequential blocking code, and the JVM does the juggling behind the scenes.

One detail worth knowing: virtual thread scheduling is cooperative by default. A virtual thread yields when it reaches a blocking operation; a tight CPU loop will not yield until something blocks. Go's preemptive scheduler behaves differently here, and the difference matters for CPU-bound code.

For CPU-bound parallelism, Java uses ForkJoinPool and StructuredTaskScope (preview in Java 21, evolving in later versions) for the distribution of computation over multiple CPUs.

I/O-bound vs CPU-bound: the actual decision

When you want to make a program faster, the real question is not “should I use threads?” The real question is what is actually slow.

If the task is I/O bound, the program spends most of its time waiting. It could be waiting for a network response, a database query, disk access, or an external API. During that time the CPU is often idle. In these cases concurrency helps because the program can work on other tasks while one task is waiting. This is why platforms like Node.js can handle many connections even with a single thread.
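You can see this with nothing but asyncio.sleep standing in for network waits: three "requests" of 0.1 s each finish in about 0.1 s total, because the waits overlap.

```python
import asyncio
import time

async def io_task(delay):
    # stand-in for a network call: the task is in progress, but just waiting
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.monotonic()
    results = await asyncio.gather(io_task(0.1), io_task(0.1), io_task(0.1))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
print(f"3 tasks of 0.1s each took {elapsed:.2f}s")  # ~0.10s, not 0.30s
```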

But if the task is CPU bound, the CPU is already busy doing calculations. Adding more async or concurrency will not make it faster. It mostly helps organize the code. What actually helps in this case is parallelism, where multiple CPU cores work on the problem at the same time.

| What you're building | Right tool | Won't help |
| --- | --- | --- |
| API handling many concurrent requests | Concurrency — async, event loop, goroutines | Blocking thread per request |
| Batch image or video processing | Parallelism — Worker Threads, multiprocessing | Sequential processing |
| Fanning out multiple DB queries | Concurrency — Promise.all, asyncio.gather | Querying one at a time |
| ML inference on CPU | Parallelism — multiprocessing, native thread pools | Python threads pre-3.13 |
| Chat server with thousands of idle connections | Concurrency | One OS thread per connection |
| Video encoding | Parallelism — frames are independent, distribute them | Single-threaded encoding |

A server with 10,000 concurrent connections does not need 10,000 cores. Most of the connections are just waiting for the server to respond. Concurrency is enough for this problem. A video encoder trying to encode 10,000 videos as fast as possible is a very different problem.

Same number of tasks on paper, but completely different requirements in practice.

What async/await actually does

This is probably where the most confusion rears its ugly head in real-world development.

async/await is just syntax for writing concurrent I/O-bound code that reads like sequential code. When you await something, the current task suspends and other tasks run; when the awaited operation completes, the original task resumes. That is it.

What async/await does not do is give you more CPU cores or allow you to run code in parallel. If you await a CPU-heavy function, the function runs on the same thread and blocks everything else on that thread until the function is done.

# concurrent — total time ≈ the slowest single call
import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            fetch(session, "https://api.example.com/a"),
            fetch(session, "https://api.example.com/b"),
            fetch(session, "https://api.example.com/c"),
        )
        return results

# still blocking the event loop for the entire duration
async def slow():
    result = sum(i * i for i in range(10_000_000))
    return result

However, simply marking functions async and sprinkling await keywords through CPU-bound code does not help; it is just decoration. For CPU-bound work in Python, ProcessPoolExecutor is the way to go: the work runs in separate processes, and in another process there is no shared GIL. On the free-threaded build of Python 3.14, ThreadPoolExecutor works too, giving true parallelism with threads.
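A minimal sketch of that switch (the job sizes are arbitrary): the same function, fanned out to worker processes so each call runs under its own interpreter.

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # pure computation; in a worker process, the parent's GIL is irrelevant
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [100_000, 200_000, 300_000]
    with ProcessPoolExecutor(max_workers=3) as pool:
        # map blocks until all workers finish; results come back in job order
        totals = list(pool.map(cpu_heavy, jobs))
    assert totals == [cpu_heavy(n) for n in jobs]
    print(totals)
```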

What’s changed recently that matters

A few things changed over the last couple of years worth being aware of.

Python has finally killed the GIL, at least optionally. Python 3.13 (October 2024) shipped a free-threaded interpreter with the GIL off. Python 3.14 (October 2025) went further: free threading is now officially supported, no longer just an experiment you test carefully before shipping. For the first time in Python's history, threads can run CPU-intensive code in parallel. The path from optional to default GIL removal is planned through Python 3.20.

Java 21 virtual threads make most reactive Java code for I/O-intensive services unnecessary. A thread per connection is no longer a scalability concern: just write sequential blocking code and let the JVM schedule it. Web API services are almost always I/O bound, so this is a big win.

Rust's async story is settled, and Tokio is the dominant runtime with a mature ecosystem. Rust's ownership model means data races are a compile-time error, not a runtime surprise. If your systems need to be both correct and fast, Rust is now a viable option.

Edge runtimes have fundamentally changed how we think about scaling. Cloudflare Workers and Deno Deploy run isolated event-loop instances all over the world. You scale by deploying to more machines, not by adding threads to one machine. Thinking of your server as a function running in 200 locations at once is a real mental shift.

References

  • Rob Pike — "Concurrency is Not Parallelism" — Heroku Waza Conference, January 2012. go.dev/talks/2012/waza.slide
  • Rob Pike — "Go Concurrency Patterns" — Google I/O, June 2012. go.dev/talks/2012/concurrency.slide
  • MDN Web Docs — "JavaScript execution model" — developer.mozilla.org, updated 2025
  • Node.js Documentation — "The Node.js Event Loop" — nodejs.org, 2026
  • Go Documentation — Goroutines and Channels — go.dev
  • Python PEP 703 — "Making the GIL Optional in CPython" — peps.python.org
  • Python 3.14 Docs — "Free Threading" — docs.python.org/3.14
  • JetBrains Blog — "Faster Python: Unlocking the GIL" — blog.jetbrains.com, December 2025
  • JEP 444 — "Virtual Threads" — openjdk.org, Java 21
  • Manning — "Concurrency vs Parallelism" — freecontent.manning.com

For more such developer content, visit:
https://vickybytes.com

Note: Edited with AI Assistance
