<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AZ</title>
    <description>The latest articles on DEV Community by AZ (@tjiaz).</description>
    <link>https://dev.to/tjiaz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1135813%2F66de465e-4c4d-4101-8712-49de1c836c63.jpeg</url>
      <title>DEV Community: AZ</title>
      <link>https://dev.to/tjiaz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tjiaz"/>
    <language>en</language>
    <item>
      <title>Speeding Up Your Python Programs with Concurrency</title>
      <dc:creator>AZ</dc:creator>
      <pubDate>Wed, 01 Apr 2026 22:18:46 +0000</pubDate>
      <link>https://dev.to/tjiaz/speeding-up-your-python-programs-with-concurrency-3m2h</link>
      <guid>https://dev.to/tjiaz/speeding-up-your-python-programs-with-concurrency-3m2h</guid>
      <description>&lt;p&gt;What Is Concurrency?&lt;/p&gt;

&lt;p&gt;At its core, concurrency means a program can juggle multiple sequences of work. In Python, these sequences go by different names — threads, tasks, and processes — but they all share the same basic idea: each one represents a line of execution that can be paused and resumed.&lt;/p&gt;

&lt;p&gt;The important distinction is that threads and asynchronous tasks run on a single processor, switching between each other cleverly rather than truly running side by side. Processes, on the other hand, can run on separate CPU cores simultaneously — that's true parallelism.&lt;/p&gt;

&lt;p&gt;Python offers three main tools for concurrency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;threading&lt;/code&gt; — pre-emptive multitasking on a single core&lt;/li&gt;
&lt;li&gt;&lt;code&gt;asyncio&lt;/code&gt; — cooperative multitasking on a single core&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multiprocessing&lt;/code&gt; — true parallelism across multiple cores&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I/O-Bound vs CPU-Bound Problems&lt;/p&gt;

&lt;p&gt;Before choosing a concurrency approach, it's important to understand what kind of problem you're solving.&lt;/p&gt;

&lt;p&gt;I/O-bound problems are those where your program spends most of its time waiting — for a network response, a file to load, or a database query to return. The CPU sits idle during these waits, so overlapping them with other work can bring big gains.&lt;/p&gt;

&lt;p&gt;CPU-bound problems are the opposite — the bottleneck is the processor itself, working flat out on heavy computation like number crunching or data processing.&lt;/p&gt;

&lt;p&gt;Speeding Up I/O-Bound Tasks&lt;/p&gt;

&lt;p&gt;The Synchronous Approach&lt;/p&gt;

&lt;p&gt;A simple synchronous script downloads web pages one by one, waiting for each to finish before moving on. It's easy to write and understand, but slow — all that waiting time is wasted.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# sync_downloader.py
import time
import requests

def fetch_page(url, session):
    with session.get(url) as resp:
        print(f"Fetched {len(resp.content)} bytes from {url}")

def fetch_all(urls):
    with requests.Session() as session:
        for url in urls:
            fetch_page(url, session)

def main():
    targets = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    t0 = time.perf_counter()
    fetch_all(targets)
    elapsed = time.perf_counter() - t0
    print(f"Fetched {len(targets)} pages in {elapsed:.2f}s")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On a typical connection, this might take around 14 seconds for 160 pages. Fine for a one-off script, but painful if you run it regularly.&lt;/p&gt;

&lt;p&gt;Using Threads&lt;/p&gt;

&lt;p&gt;The ThreadPoolExecutor from concurrent.futures makes it easy to distribute work across multiple threads. With a small pool, several downloads can run at the same time.&lt;/p&gt;

&lt;p&gt;One key requirement: requests.Session is not thread-safe, so each thread must maintain its own session using threading.local().&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# threaded_downloader.py
import threading
import time
from concurrent.futures import ThreadPoolExecutor
import requests

_thread_local = threading.local()

def get_session():
    if not hasattr(_thread_local, "session"):
        _thread_local.session = requests.Session()
    return _thread_local.session

def fetch_page(url):
    session = get_session()
    with session.get(url) as resp:
        print(f"Fetched {len(resp.content)} bytes from {url}")

def fetch_all(urls):
    with ThreadPoolExecutor(max_workers=5) as pool:
        pool.map(fetch_page, urls)

def main():
    targets = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    t0 = time.perf_counter()
    fetch_all(targets)
    elapsed = time.perf_counter() - t0
    print(f"Fetched {len(targets)} pages in {elapsed:.2f}s")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This typically completes in around 3 seconds — roughly 4–5x faster than the synchronous version. The thread pool keeps a fixed number of workers alive and reuses them across all tasks, avoiding the overhead of creating a fresh thread for every request.&lt;/p&gt;

&lt;p&gt;Using asyncio&lt;/p&gt;

&lt;p&gt;The asyncio approach uses a single-threaded event loop with coroutines — lightweight, pausable functions. When a coroutine hits an await, it hands control back to the loop, which immediately starts another task. You'll need aiohttp instead of requests since the latter doesn't support async:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install aiohttp&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# async_downloader.py
import asyncio
import time
import aiohttp

async def fetch_page(url, session):
    async with session.get(url) as resp:
        data = await resp.read()
        print(f"Fetched {len(data)} bytes from {url}")

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        jobs = [fetch_page(url, session) for url in urls]
        await asyncio.gather(*jobs, return_exceptions=True)

async def main():
    targets = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    t0 = time.perf_counter()
    await fetch_all(targets)
    elapsed = time.perf_counter() - t0
    print(f"Fetched {len(targets)} pages in {elapsed:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This version can finish in under 0.5 seconds — over 30x faster than the synchronous version and noticeably quicker than threads. Because all tasks share a single session on one thread, there are no thread-safety concerns. The trade-off is that any blocking call inside a coroutine will stall the entire event loop.&lt;/p&gt;
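&lt;p&gt;To see that trade-off in isolation, here is a minimal sketch (separate from the benchmarks above) that uses time.sleep as a stand-in for any blocking call and contrasts it with the awaitable asyncio.sleep:&lt;/p&gt;

```python
# blocking_demo.py -- why a blocking call stalls the event loop
import asyncio
import time

async def blocking_task():
    # time.sleep() never yields to the event loop, so while it runs,
    # every other coroutine is frozen.
    time.sleep(0.5)

async def friendly_task():
    # asyncio.sleep() suspends this coroutine and hands control back
    # to the loop, letting other tasks make progress in the meantime.
    await asyncio.sleep(0.5)

async def main():
    t0 = time.perf_counter()
    await asyncio.gather(*(blocking_task() for _ in range(4)))
    print(f"blocking sleeps:  {time.perf_counter() - t0:.1f}s")

    t0 = time.perf_counter()
    await asyncio.gather(*(friendly_task() for _ in range(4)))
    print(f"awaitable sleeps: {time.perf_counter() - t0:.1f}s")

if __name__ == "__main__":
    asyncio.run(main())
```

&lt;p&gt;The four blocking sleeps run back to back (about two seconds in total), while the four awaitable sleeps overlap (about half a second in total).&lt;/p&gt;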

&lt;p&gt;Multiprocessing for I/O (Not Recommended)&lt;/p&gt;

&lt;p&gt;You can technically use ProcessPoolExecutor for I/O-bound work, but it's generally a poor fit. Launching separate Python processes adds significant startup overhead that outweighs any benefit when the real bottleneck is network latency. It ends up slower than the threaded version.&lt;/p&gt;

&lt;p&gt;Speeding Up CPU-Bound Tasks&lt;/p&gt;

&lt;p&gt;For this section, a recursive Fibonacci function serves as a stand-in for any heavy computation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def fib(n):
    return n if n &amp;lt; 2 else fib(n - 2) + fib(n - 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Its running time grows exponentially with the input, making it a useful placeholder for real CPU-intensive work.&lt;/p&gt;

&lt;p&gt;Synchronous Version&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# sync_cpu.py
import time

def fib(n):
    return n if n &amp;lt; 2 else fib(n - 2) + fib(n - 1)

def main():
    t0 = time.perf_counter()
    for _ in range(20):
        fib(35)
    elapsed = time.perf_counter() - t0
    print(f"Completed in {elapsed:.2f}s")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running this takes around 35 seconds. All work happens on a single CPU core.&lt;/p&gt;

&lt;p&gt;Why Threads and asyncio Won't Help Here&lt;/p&gt;

&lt;p&gt;Because of Python's Global Interpreter Lock (GIL), only one thread can execute Python code at a time. Adding threads to a CPU-bound task doesn't parallelise anything — it just adds overhead. The threaded version runs in roughly 40 seconds, slightly slower than the synchronous one.&lt;/p&gt;
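&lt;p&gt;As a sketch of what that threaded attempt looks like (with a smaller workload than the article's 20 runs of fib(35), so it finishes quickly), it is simply the downloader pattern pointed at fib:&lt;/p&gt;

```python
# threaded_cpu.py -- threads give no speedup on CPU-bound work
import time
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Same exponential-time Fibonacci as before; the condition is
    # flipped (n > 1 instead of n below 2) but the result is identical.
    return fib(n - 1) + fib(n - 2) if n > 1 else n

def main():
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The GIL allows only one thread to run Python bytecode at a
        # time, so these calls still execute one after another, with
        # lock contention and context-switch overhead added on top.
        list(pool.map(fib, [28] * 8))
    elapsed = time.perf_counter() - t0
    print(f"Completed in {elapsed:.2f}s")

if __name__ == "__main__":
    main()
```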

&lt;p&gt;The async version fares even worse at around 86 seconds, because the overhead of suspending and resuming coroutines at every single await compounds massively over millions of recursive calls.&lt;/p&gt;

&lt;p&gt;Multiprocessing — The Right Tool for CPU Work&lt;/p&gt;

&lt;p&gt;ProcessPoolExecutor launches a pool of worker processes (by default, one per CPU core), each with its own Python interpreter and its own GIL. The processes run independently and in true parallel:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# parallel_cpu.py
import time
from concurrent.futures import ProcessPoolExecutor

def fib(n):
    return n if n &amp;lt; 2 else fib(n - 2) + fib(n - 1)

def main():
    t0 = time.perf_counter()
    with ProcessPoolExecutor() as pool:
        pool.map(fib, [35] * 20)
    elapsed = time.perf_counter() - t0
    print(f"Completed in {elapsed:.2f}s")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On a four-core machine this finishes in about 10 seconds — less than a third of the synchronous time. Notice the code is almost identical to the threaded version; you only swapped ThreadPoolExecutor for ProcessPoolExecutor.&lt;/p&gt;

&lt;p&gt;One thing to keep in mind: since each process has its own memory space, shared objects like database connections or sessions need to be initialised per process using the &lt;code&gt;initializer&lt;/code&gt; parameter.&lt;/p&gt;
&lt;p&gt;Choosing the Right Tool&lt;/p&gt;

&lt;p&gt;Here's a simple decision framework:&lt;/p&gt;

&lt;p&gt;Step 1 — Don't add concurrency prematurely. Measure first and optimise only where it's actually slow.&lt;/p&gt;

&lt;p&gt;Step 2 — Identify whether your bottleneck is I/O or CPU.&lt;/p&gt;

&lt;p&gt;Step 3 — Pick the right tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For I/O-bound work with async-capable libraries, use &lt;code&gt;asyncio&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For I/O-bound work that depends on blocking libraries, use threads via &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For CPU-bound work, use &lt;code&gt;ProcessPoolExecutor&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
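&lt;p&gt;For the measurement in Step 1, the standard library's cProfile gives a quick read on where time actually goes. In this sketch the two slow functions are hypothetical stand-ins for your own workload:&lt;/p&gt;

```python
# profile_first.py -- find out whether you're I/O-bound or CPU-bound
import cProfile
import pstats
import time

def slow_io():
    time.sleep(0.2)  # hypothetical stand-in for a network call

def slow_cpu():
    return sum(i * i for i in range(200_000))  # hypothetical number crunching

def work():
    slow_io()
    slow_cpu()

def main():
    profiler = cProfile.Profile()
    profiler.enable()
    work()
    profiler.disable()
    # Sort by cumulative time. If sleep/socket calls dominate, the
    # job is I/O-bound; if your own functions dominate, it's CPU-bound.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)

if __name__ == "__main__":
    main()
```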

&lt;p&gt;Summary&lt;/p&gt;

&lt;p&gt;Concurrency is a powerful tool, but it's not free — it adds complexity and potential for subtle bugs. Applied to the right problem with the right tool, though, it can turn a 35-second script into a 10-second one, or a 14-second downloader into something that finishes in half a second. Understand your bottleneck first, then reach for the appropriate model.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>performance</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
