Timothy was optimizing a web scraper when he hit a wall. "Margaret, I don't understand threading in Python. I rewrote my scraper to use 4 threads thinking it would be 4x faster, but it's actually slower than the single-threaded version! My CPU monitor shows only one core is being used. Everyone says 'it's the GIL,' but what is the GIL? And why does Python have this limitation?"
Margaret leaned back with a knowing smile. "The GIL - the Global Interpreter Lock. It's Python's most misunderstood feature and the source of endless debate. But here's the secret: the GIL isn't a bug, it's a design tradeoff that shaped Python's entire ecosystem. Understanding it will completely change how you write concurrent Python code."
"A design tradeoff?" Timothy looked skeptical. "It sounds like a limitation."
"It is a limitation - but one that makes other things possible," Margaret said. "The real mystery isn't what the GIL is, it's why it exists and when it actually matters. Let me show you the puzzle first, then we'll uncover the truth behind Python's threading."
She leaned forward. "This isn't just about locks. It's about choosing the right tool for the job. We'll start with why your scraper slowed down, then we'll master the three concurrency approaches—Threading for I/O, Multiprocessing for CPU, and Async/Await for modern web servers. Finally, we'll look at how Python 3.13 is making the GIL optional. By the end, you'll know exactly when the GIL matters and when it doesn't."
The Puzzle: Threads That Don't Speed Things Up
Timothy showed Margaret his confusing benchmark:
import time
import threading
def cpu_bound_task(n):
"""CPU-intensive work"""
count = 0
for i in range(n):
count += i * i
return count
def benchmark_single_thread():
"""Run task in single thread"""
start = time.time()
result = cpu_bound_task(10_000_000)
elapsed = time.time() - start
print(f"Single thread: {elapsed:.2f} seconds")
return elapsed
def benchmark_multi_thread():
    """Split the same work across 4 threads"""
    start = time.time()
    threads = []
    for _ in range(4):
        thread = threading.Thread(target=cpu_bound_task, args=(2_500_000,))  # 10M iterations split 4 ways
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
elapsed = time.time() - start
print(f"Four threads: {elapsed:.2f} seconds")
return elapsed
print("CPU-bound task benchmark:")
single = benchmark_single_thread()
multi = benchmark_multi_thread()
print(f"Speedup: {single/multi:.2f}x")
Output:
CPU-bound task benchmark:
Single thread: 0.85 seconds
Four threads: 1.12 seconds
Speedup: 0.76x
"See?" Timothy pointed at the output. "Four threads should be 4x faster, but it's actually slower! What's going on?"
"That's the GIL in action," Margaret said. "Only one thread can execute Python bytecode at a time. When you use multiple threads for CPU-bound work, they fight over the GIL, creating overhead without parallelism."
"Wait," Timothy said slowly. "Only one thread at a time? Then what's the point of threading in Python at all?"
"Perfect question. Let me show you what the GIL actually is - and more importantly, when it matters."
What Is the GIL?
Margaret pulled up a comprehensive explanation:
"""
THE GIL (GLOBAL INTERPRETER LOCK): A mutex that protects Python objects
KEY CONCEPTS:
- Only ONE thread executes Python bytecode at a time
- The GIL is a single lock protecting the entire Python interpreter
- Threads take turns holding the GIL (time-based switching, ~5ms default)
- CPU-bound threads compete for the GIL (overhead, no speedup)
- I/O-bound threads release the GIL while waiting (parallelism!)
THE GIL IS:
- A mutex (mutual exclusion lock)
- Global (one lock for entire interpreter)
- Protecting access to Python objects
- Released during I/O operations
- Specific to CPython (not Python the language)
WHY IT EXISTS:
- Simplifies CPython's memory management
- Makes C extensions easier to write
- Protects reference counts from race conditions
- Historical decision (1990s single-core era)
THE TRADEOFF:
✓ Simpler implementation
✓ Faster single-threaded performance
✓ Easy C extension integration
✗ No multi-core CPU parallelism for pure Python
✗ Threading doesn't speed up CPU-bound code
"""
import sys
def demonstrate_gil_concept():
"""Show GIL behavior conceptually"""
import threading
import time
print("GIL Demonstration:")
print("Thread A wants to execute Python code")
print(" 1. Thread A acquires GIL")
print(" 2. Thread A executes Python code for ~5ms (default)")
print(" 3. Thread A releases GIL (or forced to release)")
print(" 4. Thread B acquires GIL")
print(" 5. Thread B executes Python code for ~5ms")
print(" 6. Thread B releases GIL")
print(" 7. Repeat...\n")
print("Result: Only ONE thread runs Python code at any instant")
print("The threads take turns, creating the illusion of concurrency")
print(f"Switch interval: {sys.getswitchinterval()} seconds\n")
# Show actual thread switching
counter = {'value': 0}
def increment():
for _ in range(3):
current = counter['value']
print(f" Thread {threading.current_thread().name}: read {current}")
time.sleep(0.01) # Force context switch
counter['value'] = current + 1
print(f" Thread {threading.current_thread().name}: wrote {counter['value']}")
print("Thread interleaving (with forced context switches):")
t1 = threading.Thread(target=increment, name='A')
t2 = threading.Thread(target=increment, name='B')
t1.start()
t2.start()
t1.join()
t2.join()
print(f"\nFinal value: {counter['value']}")
print("✓ Threads interleave, but only one executes at a time")
print("✓ Note: This demonstrates interleaving, not race-free execution")
demonstrate_gil_concept()
Output (one possible run - with the forced context switches, the interleaving and even the final value can vary between runs):
GIL Demonstration:
Thread A wants to execute Python code
1. Thread A acquires GIL
2. Thread A executes Python code for ~5ms (default)
3. Thread A releases GIL (or forced to release)
4. Thread B acquires GIL
5. Thread B executes Python code for ~5ms
6. Thread B releases GIL
7. Repeat...
Result: Only ONE thread runs Python code at any instant
The threads take turns, creating the illusion of concurrency
Switch interval: 0.005 seconds
Thread interleaving (with forced context switches):
Thread A: read 0
Thread A: wrote 1
Thread B: read 1
Thread B: wrote 2
Thread A: read 2
Thread A: wrote 3
Thread B: read 3
Thread B: wrote 4
Thread A: read 4
Thread A: wrote 5
Thread B: read 5
Thread B: wrote 6
Final value: 6
✓ Threads interleave, but only one executes at a time
✓ Note: This demonstrates interleaving, not race-free execution
Timothy studied the output carefully. "So the GIL is like a single microphone at a debate - only one person can speak at a time. The threads pass the microphone back and forth every few milliseconds, but they can't all talk simultaneously."
"Perfect analogy! And this is why your CPU-bound code didn't get faster with threads. All four threads were fighting over the same microphone, with constant handoffs creating overhead."
"But then why does Python have threading at all?" Timothy asked.
"Because not all work is CPU-bound. The GIL has a secret: it gets released during I/O operations. Let me show you where threading actually helps."
CPU-Bound vs I/O-Bound: The Critical Distinction
Margaret opened a revealing comparison:
import time
import threading
def demonstrate_io_bound():
"""Show that threading DOES help with I/O-bound work"""
def download_file(url):
"""Simulate downloading a file (I/O-bound)"""
print(f" Downloading {url}...")
time.sleep(1) # Simulates network I/O - GIL is released here!
print(f" Finished {url}")
return f"data from {url}"
urls = ['url1', 'url2', 'url3', 'url4']
# Single-threaded approach
print("Single-threaded downloads:")
start = time.time()
for url in urls:
download_file(url)
single_time = time.time() - start
print(f"Total time: {single_time:.2f} seconds\n")
# Multi-threaded approach
print("Multi-threaded downloads:")
start = time.time()
threads = []
for url in urls:
thread = threading.Thread(target=download_file, args=(url,))
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
multi_time = time.time() - start
print(f"Total time: {multi_time:.2f} seconds")
print(f"Speedup: {single_time/multi_time:.2f}x")
print("\n✓ Threading speeds up I/O-bound work!")
print("✓ The GIL is released during I/O operations")
print("✓ While one thread waits for I/O, others can run")
demonstrate_io_bound()
Output:
Single-threaded downloads:
Downloading url1...
Finished url1
Downloading url2...
Finished url2
Downloading url3...
Finished url3
Downloading url4...
Finished url4
Total time: 4.00 seconds
Multi-threaded downloads:
Downloading url1...
Downloading url2...
Downloading url3...
Downloading url4...
Finished url1
Finished url2
Finished url3
Finished url4
Total time: 1.01 seconds
Speedup: 3.96x
✓ Threading speeds up I/O-bound work!
✓ The GIL is released during I/O operations
✓ While one thread waits for I/O, others can run
"Whoa!" Timothy exclaimed. "The same threading approach that made CPU-bound code slower made I/O-bound code 4x faster! What's the difference?"
"The key is what happens to the GIL," Margaret explained. "During I/O operations - network requests, file reads, database queries, even sleep() - Python releases the GIL. While one thread waits for I/O, other threads can acquire the GIL and run. This is why web scrapers, API clients, and database applications benefit from threading."
"So threading works when threads are mostly waiting, not computing," Timothy said.
"Exactly. But for real-world I/O work, there's an even better approach. Let me show you the modern way."
ThreadPoolExecutor: The Modern Approach
Margaret demonstrated the recommended pattern:
from concurrent.futures import ThreadPoolExecutor
import threading
import time
def demonstrate_threadpool():
"""Show the modern threading approach"""
def fetch_url(url):
"""Simulate fetching a URL"""
time.sleep(1) # Network I/O
return f"Content from {url}"
urls = [f"https://example.com/page{i}" for i in range(10)]
# Old way: manual thread management
print("Manual thread management:")
start = time.time()
threads = []
for url in urls:
t = threading.Thread(target=fetch_url, args=(url,))
threads.append(t)
t.start()
for t in threads:
t.join()
manual_time = time.time() - start
print(f" Time: {manual_time:.2f} seconds\n")
# Modern way: ThreadPoolExecutor
print("ThreadPoolExecutor (recommended):")
start = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
results = list(executor.map(fetch_url, urls))
pool_time = time.time() - start
print(f" Time: {pool_time:.2f} seconds")
print(f" Results: {len(results)} pages fetched")
print("\n✓ ThreadPoolExecutor is cleaner and safer")
print("✓ Automatically manages thread lifecycle")
print("✓ Built-in support for results and exceptions")
demonstrate_threadpool()
Output:
Manual thread management:
Time: 1.01 seconds
ThreadPoolExecutor (recommended):
Time: 1.01 seconds
Results: 10 pages fetched
✓ ThreadPoolExecutor is cleaner and safer
✓ Automatically manages thread lifecycle
✓ Built-in support for results and exceptions
"So for I/O-bound work, use ThreadPoolExecutor instead of raw threads," Timothy noted.
"Right. It's in the standard library, handles edge cases, and makes your code much cleaner. But there's another approach that's even more efficient for I/O..."
Async/Await: Single-Threaded Concurrency
Margaret showed the modern alternative:
import asyncio
import time
async def demonstrate_async():
"""Show async/await for I/O-bound work"""
async def download_file(url):
"""Async download simulation"""
print(f" Starting {url}")
await asyncio.sleep(1) # Async I/O
print(f" Finished {url}")
return f"data from {url}"
urls = ['url1', 'url2', 'url3', 'url4']
print("Async/await approach:")
start = time.time()
# Run all downloads concurrently
results = await asyncio.gather(*[download_file(url) for url in urls])
elapsed = time.time() - start
print(f"\nTotal time: {elapsed:.2f} seconds")
print(f"Results: {len(results)} files downloaded")
print("\n✓ Async/await provides concurrency without threads!")
print("✓ Single-threaded cooperative multitasking")
print("✓ More efficient than threading for I/O-bound work")
print("✓ No GIL contention because it's single-threaded")
# Run the async function
print("Async demonstration:")
asyncio.run(demonstrate_async())
Output:
Async demonstration:
Async/await approach:
Starting url1
Starting url2
Starting url3
Starting url4
Finished url1
Finished url2
Finished url3
Finished url4
Total time: 1.01 seconds
Results: 4 files downloaded
✓ Async/await provides concurrency without threads!
✓ Single-threaded cooperative multitasking
✓ More efficient than threading for I/O-bound work
✓ No GIL contention because it's single-threaded
"So async/await is like threading for I/O-bound work, but more efficient because it's single-threaded," Timothy observed.
"Exactly. Async/await uses cooperative multitasking - tasks voluntarily yield control at await points. No thread overhead, no GIL contention, perfect for I/O-bound workloads like web servers. Libraries like aiohttp and httpx provide async HTTP clients that work beautifully with this model."
"But what about CPU-bound work?" Timothy asked. "What do I use for that?"
The Solution: Multiprocessing for CPU-Bound Work
Margaret demonstrated the alternative:
import multiprocessing
import time
def cpu_intensive(n):
"""CPU-bound work"""
count = 0
for i in range(n):
count += i * i
return count
def demonstrate_multiprocessing():
"""Show multiprocessing for CPU-bound work"""
# Single process
print("Single process:")
start = time.time()
result = cpu_intensive(10_000_000)
single_time = time.time() - start
print(f" Time: {single_time:.2f} seconds\n")
# Multiple processes
print("Multiple processes (4 cores):")
start = time.time()
with multiprocessing.Pool(processes=4) as pool:
results = pool.map(cpu_intensive, [10_000_000] * 4)
multi_time = time.time() - start
print(f" Time: {multi_time:.2f} seconds")
print(f" Speedup: {(single_time * 4)/multi_time:.2f}x")
print("\n✓ Multiprocessing bypasses the GIL!")
print("✓ Each process has its own Python interpreter and GIL")
print("✓ True parallel execution on multiple CPU cores")
print("✓ Use for CPU-bound work like data processing, computation")
if __name__ == '__main__':
demonstrate_multiprocessing()
"So instead of threads sharing one GIL, each process has its own Python interpreter with its own GIL," Timothy said. "That means true parallelism on multiple CPU cores."
"Exactly. Multiprocessing is heavier - starting processes takes time, and you can't share memory easily - but it's the solution for CPU-bound parallelism in Python. The if __name__ == '__main__' guard is essential to prevent infinite process spawning on Windows."
"What about NumPy and Pandas?" Timothy asked. "I've heard they can use multiple cores even with threads."
When C Extensions Release the GIL
Margaret showed a critical detail:
"""
C EXTENSIONS AND THE GIL:
Well-written C extensions release the GIL during computation.
This allows threading to provide speedup even for CPU-bound work!
LIBRARIES THAT RELEASE THE GIL:
✓ NumPy (during array operations)
✓ Pandas (during many operations)
✓ Pillow (image processing)
✓ Cryptography operations
✓ Compression libraries (zlib, lzma)
✓ Some database drivers
IMPORTANT CAVEATS:
✗ Python-level operations don't release GIL (indexing, slicing)
✗ Only the underlying C/Fortran/CUDA code releases GIL
✗ Not all operations in these libraries release GIL
✗ Element-wise Python operations are still GIL-bound
EXAMPLE:
arr1 @ arr2 # Matrix multiply - releases GIL ✓
arr1[0] # Indexing - keeps GIL ✗
arr1 + arr2 # Array addition - releases GIL ✓
[x*2 for x in arr1] # List comp - keeps GIL ✗
"""
import numpy as np
import time
import threading
def demonstrate_numpy_gil_release():
"""Show that NumPy operations can benefit from threading"""
print("NumPy with threading:")
print("(Results may vary based on BLAS library and CPU)")
def matrix_multiply():
"""CPU-intensive NumPy operation"""
arr = np.random.rand(1000, 1000)
result = arr @ arr # Matrix multiply releases GIL
return result
# Single thread
start = time.time()
matrix_multiply()
single_time = time.time() - start
print(f"\nSingle thread: {single_time:.3f} seconds")
# Multiple threads (may not show 4x speedup due to BLAS threading)
start = time.time()
threads = [threading.Thread(target=matrix_multiply) for _ in range(4)]
for t in threads:
t.start()
for t in threads:
t.join()
multi_time = time.time() - start
print(f"Four threads: {multi_time:.3f} seconds")
print(f"Speedup: {(single_time * 4)/multi_time:.2f}x")
print("\n✓ NumPy releases GIL during computation")
print("✓ Threading can provide speedup for NumPy operations")
print("⚠ Actual speedup depends on BLAS library configuration")
print("⚠ BLAS may already use multiple threads internally")
demonstrate_numpy_gil_release()
"So the GIL only blocks pure Python code," Timothy realized. "Libraries like NumPy that do heavy lifting in C can release the GIL and use multiple cores."
"Right, but with a caveat: you're only getting parallelism during the C-level operations. Python-level operations like indexing or list comprehensions are still GIL-bound. And some libraries like NumPy use BLAS libraries that already multithread internally, so adding more Python threads might not help."
"This is getting clearer," Timothy said. "But why does Python have the GIL in the first place? Why not just remove it?"
Why Python Has the GIL: Reference Counting
Margaret pulled up the technical explanation:
"""
WHY THE GIL EXISTS: Reference Counting
CPython uses reference counting for memory management.
Every Python object tracks how many references point to it.
When the count reaches zero, the object is freed.
THE PROBLEM WITHOUT GIL:
Without a global lock, every reference count operation needs its own lock.
That means a lock on EVERY Python object!
Example operations that change reference counts:
- Assignment: x = y
- Function calls: func(x)
- Container operations: list.append(x)
- Attribute access: obj.attr
- Variable deletion: del x
These happen CONSTANTLY in Python code.
Per-object locks would mean:
✗ Millions of lock/unlock operations
✗ Lock contention on popular objects
✗ Massive memory overhead (lock per object)
✗ Deadlock potential
✗ Cache thrashing
THE GIL SOLUTION:
✓ One global lock protects all reference counts
✓ Simple: acquire GIL, modify any object, release GIL
✓ Fast for single-threaded code (the common case)
✓ No per-object lock overhead
HISTORICAL CONTEXT:
- Created in 1991 when CPUs had one core
- Multi-core CPUs weren't common until ~2005
- By then, too much code depended on GIL
- C extensions assumed GIL protection
- Removing it would break the ecosystem
"""
import sys
def demonstrate_reference_counting():
"""Show reference counting in action"""
x = []
print(f"Reference count of x: {sys.getrefcount(x) - 1}") # -1 for getrefcount's arg
y = x
print(f"After y = x: {sys.getrefcount(x) - 1}")
z = [x, x, x] # Three references in list
print(f"After z = [x,x,x]: {sys.getrefcount(x) - 1}")
del y
print(f"After del y: {sys.getrefcount(x) - 1}")
del z
print(f"After del z: {sys.getrefcount(x) - 1}")
print("\n✓ Every assignment changes reference count")
print("✓ Without GIL, each change needs a lock")
print("✓ That's a lock on EVERY Python object!")
print("✓ GIL is simpler: one lock protects everything")
demonstrate_reference_counting()
Output:
Reference count of x: 1
After y = x: 2
After z = [x,x,x]: 5
After del y: 4
After del z: 1
✓ Every assignment changes reference count
✓ Without GIL, each change needs a lock
✓ That's a lock on EVERY Python object!
✓ GIL is simpler: one lock protects everything
"So the GIL is a performance optimization," Timothy said slowly. "One global lock is faster than millions of per-object locks. It makes single-threaded code faster at the cost of multi-threaded parallelism."
"Exactly. And it made C extensions trivial to write. NumPy, Pandas, Pillow - they all rely on the GIL for thread safety. Without it, every C extension would need complex thread safety code."
Thread Safety: When You Still Need Locks
Margaret showed an important caveat:
import threading
import time
def demonstrate_thread_safety():
"""Show that GIL doesn't make your code thread-safe"""
# Shared counter
counter = 0
def increment_unsafe():
"""Not thread-safe even with GIL"""
nonlocal counter
for _ in range(100000):
counter += 1 # This is NOT atomic!
print("Without locks (race condition possible):")
counter = 0
threads = [threading.Thread(target=increment_unsafe) for _ in range(4)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f" Expected: 400000")
print(f" Got: {counter}")
print(f" Lost updates: {400000 - counter}")
# With proper locking
counter = 0
lock = threading.Lock()
def increment_safe():
"""Thread-safe with explicit lock"""
nonlocal counter
for _ in range(100000):
with lock:
counter += 1
print("\nWith threading.Lock (thread-safe):")
threads = [threading.Thread(target=increment_safe) for _ in range(4)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f" Expected: 400000")
print(f" Got: {counter}")
print("\n✓ GIL protects Python internals, not your data!")
print("✓ You still need locks for shared mutable state")
print("✓ The GIL can be released between Python operations")
demonstrate_thread_safety()
Output (the number of lost updates varies by run and Python version):
Without locks (race condition possible):
Expected: 400000
Got: 387429
Lost updates: 12571
With threading.Lock (thread-safe):
Expected: 400000
Got: 400000
✓ GIL protects Python internals, not your data!
✓ You still need locks for shared mutable state
✓ The GIL can be released between Python operations
"Wait," Timothy said, "I thought the GIL meant only one thread runs at a time. How can there be race conditions?"
"Because counter += 1 is actually multiple operations: load counter, add 1, store counter. The GIL can be released between these operations. The GIL protects Python's internal structures - reference counts, memory allocator - but not your application data. You still need explicit locks for shared state."
Python 3.13: The Free-Threaded Future
Margaret showed the latest development:
"""
PYTHON 3.13 FREE-THREADED MODE (October 2024):
PEP 703: Making the Global Interpreter Lock Optional
WHAT'S NEW:
- Can compile CPython with --disable-gil
- No GIL = true multi-threaded parallelism
- Experimental in Python 3.13, stable in later versions
- Different binary: python3.13t (the 't' means free-threaded)
THE TRADEOFF:
✓ Multi-threaded CPU-bound code runs in parallel
✓ All CPU cores utilized for pure Python
✓ Better for specific workloads (see below)
✗ Single-threaded code can be slower (varies by workload)
✗ C extensions need updates for thread safety
✗ Higher memory usage
✗ More complex runtime behavior
HOW IT WORKS:
- Replaces global lock with per-object locks
- Uses biased reference counting
- Immortal objects for frequently-used objects
- Deferred reference counting optimizations
WHEN TO USE FREE-THREADED MODE:
✓ CPU-bound multi-threaded applications
✓ When you can't use multiprocessing (shared memory needed)
✓ Specific computational workloads
✗ Most web servers (async is better)
✗ I/O-bound applications (threading already works)
✗ Single-threaded scripts (may be slower)
EARLY PERFORMANCE RESULTS (as of 3.13.0):
- Single-threaded: 0-40% slower depending on workload
- Multi-threaded CPU-bound: Can see near-linear speedup
- I/O-bound: Similar to regular Python (GIL already released)
- Numbers are improving with each release
THE FUTURE:
- Python 3.13-3.15: Experimental, opt-in builds
- Performance gap will narrow over time
- Ecosystem needs time to adapt (C extensions)
- May become default in Python 4.0+, but not certain
CHECKING GIL STATUS:
import sys
print(sys._is_gil_enabled()) # False in free-threaded build
"""
import sys
def check_gil_status():
"""Check if GIL is enabled"""
print("GIL Status Check:")
# Check if we're in free-threaded mode
if hasattr(sys, '_is_gil_enabled'):
if sys._is_gil_enabled():
print(" ✓ GIL is ENABLED (standard CPython)")
print(f" ✓ Switch interval: {sys.getswitchinterval()} seconds")
else:
print(" ✓ GIL is DISABLED (free-threaded mode)")
print(" ✓ True multi-threaded parallelism available")
else:
print(" ✓ Python version < 3.13 (GIL is enabled)")
print(f"\nPython version: {sys.version}")
print(f"Thread switch interval: {sys.getswitchinterval()} seconds")
check_gil_status()
"So Python is finally removing the GIL," Timothy said.
"Not quite," Margaret corrected. "Python 3.13 makes the GIL optional. You can compile a special free-threaded build if you need true multi-threaded parallelism and accept the tradeoffs. But the standard Python build still has the GIL, and that's not changing anytime soon. The ecosystem needs years to adapt."
Subinterpreters (PEP 554 and PEP 684): Another Approach
Margaret showed an alternative future direction:
"""
SUBINTERPRETERS (PEP 554 / PEP 684)
Another approach to parallelism: multiple isolated interpreters
in the same process. PEP 684 (Python 3.12) gives each subinterpreter
its own GIL; PEP 554 proposes exposing them through a stdlib API.
CONCEPT:
- Each subinterpreter has its own GIL (per-interpreter GIL, Python 3.12+)
- Subinterpreters share the same process
- Lighter than multiprocessing
- More isolated than threading
BENEFITS:
✓ Each subinterpreter can run Python in parallel
✓ Lighter than separate processes
✓ Better than multiprocessing for some workloads
✓ Doesn't require removing the GIL
STATUS:
- Experimental in Python 3.12+
- API still evolving
- Not yet recommended for production
- Future alternative to multiprocessing
This is another way Python is evolving to handle parallelism
without removing the GIL.
"""
print("Subinterpreters (PEP 554):")
print(" - Multiple interpreters in one process")
print(" - Each with its own GIL")
print(" - Status: Experimental")
print(" - Future alternative to multiprocessing")
The Traffic Light Metaphor
Margaret brought it all together with a metaphor:
"Think of the GIL like a traffic light at a busy intersection.
"Without the light (no GIL): Cars (threads) can all try to go at once. This creates chaos - they crash into each other (race conditions), resources are wasted on collision avoidance (per-object locks), and traffic actually moves slower overall. Every car needs its own traffic coordinator (per-object lock overhead).
"With the light (GIL): Cars take turns. Only one direction moves at a time. This seems limiting - cars could theoretically all go if they had perfect coordination. But in practice, the simple traffic light makes traffic flow faster and safer for the common case (single-threaded code).
"The light's secret (GIL release during I/O): The light turns green for cross-traffic when the main road is empty. If cars are just passing through to side streets (I/O operations), they leave the intersection quickly, and other directions get their turn. This is why threading works great for I/O-bound work.
"The highway problem (CPU-bound work): If you have a highway with 8 lanes of continuous traffic (CPU-bound work on 8 cores), a single traffic light becomes a bottleneck. Every lane has to wait for the light.
"The solutions:
- Separate roads (multiprocessing): Each process has its own intersection and traffic light. True parallelism, but more infrastructure overhead.
- Efficient merging (async/await): Instead of 8 lanes fighting for a light, use cooperative merging. Single-lane traffic that flows smoothly by design.
- Smart intersections (C extensions): Some traffic (NumPy, Pandas) uses special lanes that bypass the main light, allowing parallelism.
- Remove the light (free-threaded Python): Makes multi-lane traffic possible, but now every car needs complex coordination (per-object locks), making single-lane traffic slower.
"The GIL is like the traffic light: a simple solution that works well for common cases, with known workarounds for special cases."
Practical Guidelines: The Decision Tree
Margaret provided a comprehensive guide:
"""
CONCURRENCY DECISION TREE:
START HERE: What type of work are you doing?
┌─────────────────────────────────────────────┐
│ Is your work I/O-BOUND? │
│ (network, files, databases, waiting) │
└─────────────────────────────────────────────┘
│
├─YES → Use threading or async/await
│ ├─ Simple tasks → ThreadPoolExecutor
│ ├─ Web server/APIs → async/await (FastAPI, aiohttp)
│ ├─ Mixed I/O → ThreadPoolExecutor
│ └─ Need shared state → threading + locks
│
├─NO → Is it CPU-BOUND?
│ (computation, loops, data processing)
│
├─YES → Use multiprocessing or specialized libraries
│ ├─ Pure Python → multiprocessing.Pool
│ ├─ NumPy/Pandas → Use their operations (may release GIL)
│ ├─ Need shared memory → multiprocessing.shared_memory
│ ├─ Python 3.13+ → Consider free-threaded build
│ └─ High-level → ProcessPoolExecutor
│
└─MIXED (some I/O, some CPU) → Combine approaches
├─ ThreadPoolExecutor for I/O
├─ ProcessPoolExecutor for CPU
└─ Use both with concurrent.futures
REAL-WORLD EXAMPLES:
Web Scraping:
✓ Mostly I/O (network requests)
→ Use: ThreadPoolExecutor or async/await
→ Libraries: aiohttp, httpx
Data Analysis:
✓ CPU-bound (processing data)
→ Use: Pandas/NumPy (releases GIL) or multiprocessing
→ Libraries: pandas, numpy, dask
Web Server:
✓ Mostly I/O (handling requests)
→ Use: async/await
→ Libraries: FastAPI, aiohttp, uvicorn
Machine Learning Training:
✓ CPU/GPU-bound
→ Use: PyTorch/TensorFlow (releases GIL for GPU ops)
→ The GIL doesn't affect GPU computation
Image Processing:
✓ CPU-bound
→ Use: Pillow (releases GIL) or multiprocessing
→ Libraries: PIL, OpenCV
File Processing (reading many files):
✓ I/O-bound
→ Use: ThreadPoolExecutor
→ Process files in parallel
Complex Calculation (pure Python):
✓ CPU-bound
→ Use: multiprocessing.Pool
→ Each process on separate core
API Client (calling multiple APIs):
✓ I/O-bound
→ Use: async/await with aiohttp
→ Concurrent API calls
"""
def print_decision_tree():
"""Visual decision tree"""
print("QUICK REFERENCE:")
print()
print("I/O-BOUND (network, files, DB):")
print(" → ThreadPoolExecutor or async/await")
print(" → GIL released during I/O")
print(" → Threading works great!")
print()
print("CPU-BOUND (computation, loops):")
print(" → multiprocessing.Pool")
print(" → Or use NumPy/Pandas (releases GIL)")
print(" → Each process gets own GIL")
print()
print("NEED SHARED MEMORY:")
print(" → threading.Lock + threading")
print(" → Or multiprocessing.shared_memory")
print()
print("WEB SERVER:")
print(" → async/await (FastAPI, aiohttp)")
print(" → Most efficient for I/O")
print()
print("KEY RULE:")
print(" Threading for I/O, multiprocessing for CPU")
print_decision_tree()
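For the MIXED branch, Margaret sketched how the two executors combine - a minimal illustration with placeholder download and crunch functions:
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def download(url):
    """I/O-bound stage: the GIL is released while waiting"""
    time.sleep(0.5)  # stands in for a network request
    return f"payload-from-{url}"
def crunch(payload):
    """CPU-bound stage: runs in a separate process with its own GIL"""
    return len(payload) + sum(i * i for i in range(1_000_000))
if __name__ == '__main__':
    urls = [f"url{i}" for i in range(8)]
    with ThreadPoolExecutor(max_workers=8) as tpool:
        payloads = list(tpool.map(download, urls))   # concurrent I/O
    with ProcessPoolExecutor(max_workers=4) as ppool:
        results = list(ppool.map(crunch, payloads))  # parallel CPU
    print(f"Processed {len(results)} payloads")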
Common Misconceptions
Margaret addressed the myths:
"""
COMMON GIL MISCONCEPTIONS:
MYTH 1: "Threading is useless in Python"
✗ FALSE: Threading is excellent for I/O-bound work
✓ TRUTH: Threading doesn't help CPU-bound work
MYTH 2: "The GIL makes Python slow"
✗ FALSE: The GIL makes single-threaded Python FASTER
✓ TRUTH: The GIL prevents multi-threaded CPU parallelism
MYTH 3: "You can't do parallelism in Python"
✗ FALSE: Multiprocessing provides true parallelism
✓ TRUTH: Pure Python threading can't use multiple cores for CPU work
MYTH 4: "The GIL is a bug or mistake"
✗ FALSE: The GIL is a deliberate design tradeoff
✓ TRUTH: It prioritizes single-thread speed and C extension simplicity
MYTH 5: "NumPy/Pandas are slow because of the GIL"
✗ FALSE: They release the GIL during computation
✓ TRUTH: NumPy/Pandas can use multiple cores
MYTH 6: "The GIL will be removed in Python 4"
✗ FALSE: No such plan exists
✓ TRUTH: Python 3.13+ makes it optional, not removed
MYTH 7: "Only CPython has the GIL"
✗ PARTIALLY TRUE: Jython and IronPython have no GIL, but PyPy does
✓ BUT: CPython is 95%+ of Python usage
MYTH 8: "The GIL makes Python thread-safe"
✗ FALSE: You still need locks for shared mutable state
✓ TRUTH: GIL protects Python internals, not your data
MYTH 9: "async/await removes the GIL"
✗ FALSE: async is single-threaded, GIL is irrelevant
✓ TRUTH: async provides concurrency without parallelism
MYTH 10: "All Python code is affected by the GIL"
✗ FALSE: C extensions can release the GIL
✓ TRUTH: Only pure Python code is GIL-bound
"""
Key Takeaways
Margaret summarized everything:
"""
GIL MASTER SUMMARY:
═══════════════════════════════════════════════════════════
1. WHAT IS THE GIL
═══════════════════════════════════════════════════════════
- Global Interpreter Lock (mutex)
- Only one thread executes Python bytecode at a time
- Threads take turns (~5ms intervals by default)
- Specific to CPython, not Python the language
- Protects Python's reference counting
═══════════════════════════════════════════════════════════
2. WHY IT EXISTS
═══════════════════════════════════════════════════════════
- Simplifies reference counting (one lock vs millions)
- Makes C extensions easier and safer
- Historical decision (1991, single-core era)
- Performance optimization for single-threaded code
- Removing it would break the ecosystem
═══════════════════════════════════════════════════════════
3. THE IMPACT
═══════════════════════════════════════════════════════════
CPU-BOUND: Threading doesn't help (use multiprocessing)
I/O-BOUND: Threading helps (GIL released during I/O)
SINGLE: Faster than without GIL
═══════════════════════════════════════════════════════════
4. WHEN GIL IS RELEASED
═══════════════════════════════════════════════════════════
✓ I/O operations (network, files, databases)
✓ time.sleep() and blocking calls
✓ C extension computations (NumPy, Pandas, etc.)
✓ Explicitly via C API (Py_BEGIN_ALLOW_THREADS)
═══════════════════════════════════════════════════════════
5. CONCURRENCY STRATEGIES
═══════════════════════════════════════════════════════════
I/O-BOUND:
→ ThreadPoolExecutor (simple)
→ async/await (modern, efficient)
CPU-BOUND:
→ multiprocessing.Pool (pure Python)
→ NumPy/Pandas operations (release GIL)
MIXED:
→ Combine both approaches
═══════════════════════════════════════════════════════════
6. THE FUTURE
═══════════════════════════════════════════════════════════
- Python 3.13+: Optional free-threaded mode
- Trade-off: parallelism vs single-thread speed
- GIL isn't being removed, it's becoming optional
- Ecosystem needs years to adapt
- Subinterpreters (PEP 554) as alternative
═══════════════════════════════════════════════════════════
7. PRACTICAL RULES
═══════════════════════════════════════════════════════════
1. Know if your work is I/O or CPU bound
2. Threading for I/O, multiprocessing for CPU
3. Use libraries that release GIL when possible
4. Don't fight the GIL, work with it
5. GIL doesn't make your code thread-safe
6. Most Python code is I/O-bound anyway
═══════════════════════════════════════════════════════════
8. THE BIG PICTURE
═══════════════════════════════════════════════════════════
- GIL is a pragmatic tradeoff, not a flaw
- Enabled Python's success and ecosystem
- Not unique to Python (Ruby MRI has similar)
- Modern Python offers solutions for all use cases
- Understanding GIL makes you a better Python programmer
═══════════════════════════════════════════════════════════
9. BOTTOM LINE
═══════════════════════════════════════════════════════════
The GIL is a feature, not a bug. It's a conscious design
decision that prioritizes:
✓ Single-threaded performance (the common case)
✓ Simple C extension API (enabled NumPy, Pandas, etc.)
✓ Easier CPython implementation
At the cost of:
✗ No multi-core CPU parallelism for pure Python
But with solutions:
✓ Threading/async for I/O (works great!)
✓ Multiprocessing for CPU (true parallelism)
✓ C extensions that release GIL (NumPy, etc.)
✓ Python 3.13+ free-threaded mode (optional)
Choose the right tool for your workload, and the GIL
won't hold you back.
"""
Timothy leaned back, finally getting the complete picture. "So the GIL isn't Python's limitation - it's CPython's tradeoff. It's a mutex that protects Python's internal structures and makes single-threaded code fast. For I/O-bound work, threading and async work great because the GIL is released during I/O. For CPU-bound work, I should use multiprocessing or libraries like NumPy that release the GIL. And Python 3.13 is making the GIL optional for those who need multi-threaded CPU parallelism and accept slower single-threaded performance."
"Perfect understanding," Margaret confirmed. "The GIL is one of Python's most debated features, but it's not a flaw - it's a conscious design decision with clear tradeoffs. It enabled Python's explosive growth by making the interpreter simpler and C extensions easier. Understanding those tradeoffs lets you write efficient concurrent Python code.
"The secret of the GIL is this: it's not about what it prevents, it's about what it enables. It enabled fast single-threaded execution, a simple C API that gave us NumPy and Pandas, and an easier-to-maintain interpreter. The 'limitation' only matters for pure Python CPU-bound code - and there are good solutions for that.
"Most Python code is I/O-bound anyway - web servers, APIs, databases, file processing. For those use cases, threading and async work beautifully. When you do need CPU parallelism, multiprocessing gives you true parallel execution. And as libraries like NumPy and PyTorch show, you can release the GIL in C extensions and get the best of both worlds.
"The GIL shaped Python's ecosystem. Without it, we might not have NumPy, Pandas, or thousands of other C extensions that assume GIL protection. The tradeoff worked out pretty well, wouldn't you say?"
Timothy nodded. "And now with Python 3.13's free-threaded mode and subinterpreters, Python is evolving to give developers more choices while keeping backward compatibility."
"Exactly. The GIL isn't going away, but it's becoming optional. And that's the right approach - let developers choose the tradeoff that fits their use case."
With that knowledge, Timothy could:
- Write properly concurrent Python code
- Choose threading/multiprocessing/async appropriately
- Understand why threading speeds up web scrapers but not number crunching
- Explain the GIL accurately without spreading misconceptions
- Use libraries effectively based on their GIL behavior
- Make informed decisions about Python 3.13's free-threaded mode
The GIL wasn't a mystery anymore - it was a well-understood design decision with clear implications.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.