Timothy stared at his laptop screen in the library's back office, his brow furrowed. He'd spent the last hour trying to speed up a data processing script using Python's threading module. The numbers didn't make sense.
"Margaret?" he called out. "Can you look at this?"
Margaret walked over from the circulation desk. "What's up?"
Timothy turned his laptop to show her. "I wrote a script to process library catalog data. I thought using multiple threads would make it faster, but..." He pointed at the timing results. "The multi-threaded version is actually slower."
She leaned in to see his code:
import threading
import time

def count_to_million():
    count = 0
    for i in range(1_000_000):
        count += 1
    return count

# Single-threaded version
start = time.time()
count_to_million()
count_to_million()
single_time = time.time() - start

# Multi-threaded version
start = time.time()
thread1 = threading.Thread(target=count_to_million)
thread2 = threading.Thread(target=count_to_million)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
multi_time = time.time() - start

print(f"Single-threaded: {single_time:.3f}s")
print(f"Multi-threaded: {multi_time:.3f}s")
The output showed:
Single-threaded: 0.143s
Multi-threaded: 0.156s
"The threaded version is slower," Timothy said. "I have two CPU cores. Two threads. Two separate counting tasks. It should be twice as fast, right? What am I doing wrong?"
Margaret smiled knowingly. "You're not doing anything wrong. You've just discovered the GIL - the Global Interpreter Lock. Come with me to the Reference Section. I'll show you what's happening."
The Single Key to the Rare Books Room
They walked through the stacks toward the Reference Section. Margaret gestured at the rare books collection behind a locked glass door.
"See that locked room?" Margaret asked.
Timothy nodded. "The rare books collection. Only one person can access it at a time."
"Exactly. And why's that?"
"Because we only have one key," Timothy said. "And the books are too valuable to have multiple people handling them simultaneously."
"Right. Now imagine we hired ten new librarians," Margaret continued, "and we told them all to catalog rare books at the same time. What would happen?"
Timothy thought for a moment. "They'd all want the key at once. But there's only one key, so they'd have to take turns. Even with ten workers, only one could actually work at any given time."
"Precisely!" Margaret pulled out her laptop. "That's exactly what's happening in your Python code. The GIL is Python's single key to executing code."
She opened a Python interpreter. "Watch this. I'm going to show you what Python does internally."
# Conceptually, every Python thread does this:
def execute_thread():
    acquire_gil()    # Grab the key
    # ... execute some Python bytecode ...
    release_gil()    # Give back the key
    # Wait while another thread uses the key
"Even if you create 100 threads," Margaret explained, "only one thread can execute Python code at any given moment. The others must wait their turn, just like our librarians waiting for the rare books key."
"So my two threads are just... taking turns?" Timothy asked.
"Exactly. They're not running simultaneously. They're passing the key back and forth."
Why Python Has Only One Key
Timothy frowned. "But that seems insane. Why would Python deliberately make threading slower?"
"Great question," Margaret said. She led him over to the library's old card catalog. "Let me show you the problem Python was trying to solve."
She pulled out one of the drawers and selected a card: "Moby Dick" - Times borrowed: 47
"Every time someone borrows this book, we update this number," Margaret explained. "Now imagine two librarians both checking out copies of Moby Dick at exactly the same time."
She held up her hands like she was acting out two people:
"Librarian 1 reads: 47. Adds one. Writes 48.
Librarian 2 reads: 47. Adds one. Writes 48."
Timothy's eyes widened. "They both read 47, so they both wrote 48. The count only went up by one instead of two!"
"Exactly!" Margaret opened her laptop. "Python has the same problem with memory management. It's called reference counting."
She typed:
x = [] # Reference count: 1
y = x # Reference count: 2
z = x # Reference count: 3
del y # Reference count: 2
"Every Python object tracks how many references point to it. When the count hits zero, Python immediately frees the memory. Simple and efficient."
"But if two threads try to update the same reference count at the same time..." Timothy said slowly.
"Same problem as our card catalog!" Margaret sketched it out:
# Thread 1                      # Thread 2
current = obj.refcount          current = obj.refcount
obj.refcount = current + 1      obj.refcount = current + 1

# Both read 2, both write 3
# Actual references: 4
# Python thinks:     3
"The object now has four references but thinks it only has three," Margaret explained. "When one reference disappears, Python frees the memory... while other references are still using it."
"Crash," Timothy said quietly.
"Crash, corruption, chaos," Margaret agreed. "So Python's creators had a choice. They could put a lock on every single object—millions of lock operations per second, huge overhead, deadlock risks everywhere. Or they could put one big lock around the entire interpreter."
"And they chose one big lock," Timothy said.
"They chose simplicity and speed for the common case. This was the early 1990s. Multi-core processors were rare. Threading was mostly for I/O, not CPU parallelism. For those use cases, it was the right choice."
Seeing the Problem More Clearly
Timothy was still processing this. "So threading is just... useless for CPU work in Python?"
"Let me show you something more dramatic," Margaret said. She typed a new example:
import threading
import time

def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

def compute():
    result = fibonacci(35)
    return result

# Test 1: Single thread
start = time.time()
compute()
compute()
single_time = time.time() - start
print(f"Single-threaded: {single_time:.2f}s")

# Test 2: Two threads
start = time.time()
t1 = threading.Thread(target=compute)
t2 = threading.Thread(target=compute)
t1.start()
t2.start()
t1.join()
t2.join()
multi_time = time.time() - start
print(f"Multi-threaded: {multi_time:.2f}s")
print(f"Speedup: {single_time / multi_time:.2f}x")
She ran it. The output appeared:
Single-threaded: 3.45s
Multi-threaded: 3.52s
Speedup: 0.98x
"No speedup at all," Timothy observed. "Actually slightly slower."
"The slight slowdown is from managing the threads. The GIL forces them to run one at a time, taking turns holding the lock."
Timothy leaned back in his chair. "So I can't use threading to speed up Python computation at all?"
"Not pure Python computation, no," Margaret said. "But here's where it gets interesting. The GIL only prevents parallel execution of Python bytecode."
When Threading Actually Works
Margaret pulled up a chair. "Let me show you when threading does help."
She typed a new example:
import threading
import requests
import time

def fetch_url(url):
    response = requests.get(url)
    return len(response.content)

urls = [
    "https://python.org",
    "https://github.com",
    "https://stackoverflow.com",
    "https://pypi.org",
    "https://docs.python.org",
]

# Single-threaded
start = time.time()
for url in urls:
    fetch_url(url)
single_time = time.time() - start
print(f"Single-threaded: {single_time:.2f}s")

# Multi-threaded
start = time.time()
threads = []
for url in urls:
    t = threading.Thread(target=fetch_url, args=(url,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
multi_time = time.time() - start
print(f"Multi-threaded: {multi_time:.2f}s")
print(f"Speedup: {single_time / multi_time:.1f}x")
Timothy watched her run it:
Single-threaded: 2.8s
Multi-threaded: 0.6s
Speedup: 4.7x
His jaw dropped. "4.7x faster! How?"
"Because while one thread waits for a network response," Margaret explained, "it releases the GIL. Other threads can run. During I/O operations, threads truly run concurrently."
She drew a quick diagram on a piece of paper:
Thread 1: [Running Python] → [Waiting for network - GIL released] → [Running Python]
Thread 2: [Running Python while T1 waits]
Thread 3: [Running Python while T1 waits]
"It's like our librarians again," Timothy said, understanding dawning. "If a librarian needs to wait for a book delivery from storage, they give back the key. Another librarian can work in the rare books room while they wait."
"Perfect!" Margaret beamed. "That's exactly it. I/O operations release the GIL automatically. Network requests, file reads, database queries—all of these release the lock while they wait."
The CPU-Bound vs I/O-Bound Distinction
Margaret opened a blank document. "Let me show you the critical distinction."
She sketched out a side-by-side comparison:
# CPU-Bound Tasks (GIL hurts):
#   - Pure Python computation
#   - Math and number crunching
#   - String processing in loops
#   - Data manipulation
#   - Algorithm implementations

def process_data(items):
    total = 0
    for item in items:
        total += item ** 2
    return total

# Threading: NO benefit

# I/O-Bound Tasks (GIL doesn't matter):
#   - Network requests
#   - File reading/writing
#   - Database queries
#   - User input
#   - API calls

def fetch_data(url):
    response = requests.get(url)
    return response.json()

# Threading: HUGE benefit
"When your code is waiting," Margaret explained, "the GIL gets released. When your code is computing, the GIL stays locked."
Timothy typed something on his own laptop. "What about libraries like NumPy? I've heard they can use multiple cores."
"Excellent question!" Margaret said. She typed:
import numpy as np
import threading
import time

def matrix_multiply():
    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)
    c = np.dot(a, b)
    return c

# Single-threaded
start = time.time()
matrix_multiply()
matrix_multiply()
single_time = time.time() - start

# Multi-threaded
start = time.time()
t1 = threading.Thread(target=matrix_multiply)
t2 = threading.Thread(target=matrix_multiply)
t1.start()
t2.start()
t1.join()
t2.join()
multi_time = time.time() - start

print(f"Speedup: {single_time / multi_time:.1f}x")
She ran it:
Speedup: 1.9x
"Nearly 2x faster!" Timothy exclaimed.
"Because NumPy is written in C," Margaret explained. "The C code explicitly releases the GIL while it's doing matrix operations. Pandas, Pillow, many scientific libraries do this. They drop the key while they work, allowing other threads to run."
"So some libraries know how to work around it," Timothy said.
"They don't work around it—they work with it. They release the lock when they don't need it."
Why the GIL Still Exists
They walked back toward the circulation desk. Timothy was still processing everything.
"But this is 2025," he said. "Why hasn't anyone removed the GIL? It seems like such a limitation."
Margaret leaned against the desk. "Several reasons, actually. First, single-threaded Python is fast because of the GIL. No lock overhead on every operation. Most Python programs are single-threaded or I/O-bound anyway."
"So removing it would make the common case slower?" Timothy asked.
"Potentially, yes. Second, it makes writing C extensions much simpler. Extension authors can assume they're the only thread running. The entire Python C API is built around the GIL existing. Removing it would break thousands of extensions."
"Backward compatibility," Timothy nodded.
"Third, for most real-world programs, I/O is the bottleneck, not CPU. Your web server is waiting on database queries. Your data pipeline is waiting on API responses. Your script is waiting for file reads. The GIL doesn't matter for those."
Margaret pulled out her phone and showed him something. "And fourth, for CPU-intensive work, Python has other solutions. That's what we'll cover in Part 2." (Recent CPython releases do ship an experimental free-threaded build without the GIL, but as of 2025 it's opt-in, and the standard interpreter still has the lock.)
Understanding the Pattern
Timothy opened his laptop again. "So let me see if I understand. I should use threading when..."
He started typing test cases:
import requests

# Test 1: CPU-bound task
def cpu_task():
    count = 0
    for i in range(10_000_000):
        count += i
    return count

# Test 2: I/O-bound task
def io_task():
    response = requests.get("https://httpbin.org/delay/1")
    return response.status_code
"Run them both with and without threading," Margaret suggested. "See what happens."
Timothy wrote the test harness:
import threading
import time

def test_threading(task, description):
    # Single execution
    start = time.time()
    task()
    task()
    single_time = time.time() - start

    # Multi-threaded execution
    start = time.time()
    t1 = threading.Thread(target=task)
    t2 = threading.Thread(target=task)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    multi_time = time.time() - start

    print(f"\n{description}")
    print(f"  Single:  {single_time:.2f}s")
    print(f"  Multi:   {multi_time:.2f}s")
    print(f"  Speedup: {single_time / multi_time:.2f}x")

test_threading(cpu_task, "CPU-bound task")
test_threading(io_task, "I/O-bound task")
He ran it and watched the results:
CPU-bound task
  Single:  1.23s
  Multi:   1.26s
  Speedup: 0.98x

I/O-bound task
  Single:  4.05s
  Multi:   2.03s
  Speedup: 2.00x
"Perfect!" Margaret said. "The CPU task gets no speedup—actually slightly slower from thread overhead. The I/O task gets 2x speedup because both threads can wait simultaneously."
Timothy looked up from his laptop. "So the rule is: threading for I/O, something else for CPU?"
"Exactly. And that 'something else' is what we'll cover next time." Margaret glanced at the clock on the wall. "We've got about an hour before the evening rush. Want to grab coffee and I'll explain multiprocessing and async/await?"
Timothy grabbed his laptop. "Definitely. But first—" he pointed at his original code, "now I know why this was slow. I was doing pure computation."
"And threading won't help with that," Margaret confirmed. "The GIL makes sure of it."
"The GIL isn't a bug," Timothy said slowly. "It's a design choice."
"A design choice with clear trade-offs," Margaret agreed. "Once you understand those trade-offs, you can choose the right tool for each job."
The Takeaway
Timothy closed his laptop, the GIL finally making sense.
The GIL is a single lock on the Python interpreter: Only one thread executes Python bytecode at a time.
It exists to protect reference counting: Python's memory management isn't thread-safe without it.
Fine-grained locking would be slower: Locking every object would hurt single-threaded performance.
Threading doesn't help CPU-bound Python code: Pure Python computation runs one thread at a time.
Threading does help I/O-bound code: The GIL releases during network, disk, and database operations.
I/O operations automatically release the GIL: While waiting, other threads can run.
C extensions can release the GIL: NumPy, Pandas, and similar libraries achieve parallelism this way.
The GIL makes single-threaded code fast: No lock overhead for the common case.
Most real programs are I/O-bound: Waiting for network/disk is usually the bottleneck, not CPU.
The rare books room analogy: One key means one worker at a time, even with many workers.
Card catalog race condition: Two simultaneous updates can corrupt the count.
Reference counting needs protection: Without the GIL, memory corruption occurs.
Passing the key back and forth: Threads take turns executing, creating overhead without benefit.
Check your task type first: Know if you're CPU-bound or I/O-bound before choosing concurrency.
The GIL persists for backward compatibility: Removing it would break the C API and existing extensions.
Libraries that know about the GIL work with it: They release it when doing non-Python work.
Python 2.x to 3.x kept the GIL: Even major version changes didn't remove it.
Understanding trade-offs enables good choices: The GIL isn't a limitation once you know the alternatives.
Understanding the Lock
Timothy had discovered Python's most famous constraint and why it exists.
The GIL revealed that Python trades parallel execution for memory safety, that protecting reference counts required either many locks or one lock, and that Guido chose simplicity and single-threaded speed over parallelism.
He learned that the GIL only prevents parallel Python bytecode execution, that I/O operations automatically release the lock, and that C extensions can explicitly release it during computation, which explained why NumPy achieves parallelism despite the GIL.
Moreover, Timothy understood that threading still helps for I/O-bound tasks because waiting doesn't require holding the lock, that CPU-bound pure Python code gets no benefit from threading because only one thread can execute at a time, and that the slight slowdown in his original code came from thread management overhead without any parallel execution benefit.
He learned about the reference counting problem that necessitated the GIL, why concurrent updates to reference counts cause memory corruption, and why fine-grained locking on every object would be slower than a single interpreter lock for most programs.
He understood that the GIL persists for pragmatic reasons including single-threaded performance, C extension simplicity, and the enormous backward compatibility cost of removal, and that most Python programs are I/O-bound anyway so the GIL rarely matters in production.
Most importantly, Timothy understood that the GIL isn't a flaw to work around—it's a design choice with clear trade-offs, that knowing whether code is CPU-bound or I/O-bound determines whether threading helps, and that choosing the right concurrency model starts with understanding this fundamental constraint.
The library was quiet as they walked toward the coffee shop. Margaret had one more thing to show him—how to actually achieve true parallelism in Python, even with the GIL in place.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of Think Like a Genius.