Deepak Mishra

The Async Core: Understanding Eventlet and Gevent in Flask-SocketIO

The integration of real-time capabilities into Python web applications often presents an architectural paradox. Flask, a framework designed fundamentally around the synchronous WSGI (Web Server Gateway Interface) standard, assumes a simple lifecycle: a request arrives, a thread processes it, and a response is returned. This model collapses when introduced to WebSockets, which require persistent, stateful connections that may remain open for hours.

To bridge the gap between Flask’s synchronous nature and the asynchronous demands of WebSockets, Flask-SocketIO relies on greenlets, typically provided by libraries like Eventlet or Gevent. These libraries allow a standard Flask application to handle thousands of concurrent connections on a single operating system thread.

This article dissects the internal mechanics of this transformation. We will explore how greenlets implement cooperative multitasking through stack slicing, the "black magic" of monkey patching, and the engineering trade-offs between Eventlet and Gevent in production environments.

The Blocking Problem

To understand the necessity of Eventlet and Gevent, one must first analyze why standard threading fails at scale. In a traditional WSGI deployment (e.g., Gunicorn with the sync or gthread worker), concurrency is mapped 1:1 with OS-level threads or processes.

If a Flask-SocketIO server were to use standard OS threads to manage WebSocket connections, it would encounter two primary bottlenecks:

  1. Memory Overhead: A standard Linux thread typically reserves a stack size of 8MB. While virtual memory management mitigates the immediate physical cost, the commit charge and kernel structures (Thread Control Blocks) still impose a heavy footprint. Spawning 10,000 threads for 10,000 idle WebSocket clients would theoretically require ~80GB of addressable memory space, leading to resource exhaustion long before CPU limits are reached.
  2. Context Switching Latency: The OS kernel scheduler manages thread execution using preemptive multitasking. As the number of threads rises, the scheduler spends an increasing percentage of CPU cycles simply deciding which thread to run next (context switching). This "thrashing" degrades throughput significantly.

Furthermore, Python’s Global Interpreter Lock (GIL) ensures that only one thread executes Python bytecode at a time. While I/O operations (like waiting for a socket message) release the GIL, the overhead of managing thousands of threads remains prohibitive.
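
To put rough numbers on the memory argument, the sketch below works through the arithmetic: the 8MB per-thread stack reservation from the list above, against the few-kilobyte initial stacks that greenlets use (discussed in the next section). These are assumed ballpark figures, not measurements.

# Back-of-the-envelope arithmetic, not a benchmark.
STACK_PER_THREAD_MIB = 8        # typical default Linux thread stack reservation
GREENLET_STACK_KIB = 4          # rough initial greenlet footprint (a few KiB)
CLIENTS = 10_000

thread_gib = STACK_PER_THREAD_MIB * CLIENTS / 1024
greenlet_mib = GREENLET_STACK_KIB * CLIENTS / 1024

print(f"OS threads: ~{thread_gib:.0f} GiB of reserved stack address space")
print(f"Greenlets:  ~{greenlet_mib:.0f} MiB of heap")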

Greenlets Explained

Eventlet and Gevent solve the blocking problem by implementing coroutines (cooperative user-space threads) via the greenlet C-extension library. Unlike OS threads, greenlets are managed entirely in user space without kernel intervention.

The Mechanism: Stack Slicing

The technical brilliance of greenlets lies in how they manage the call stack. The CPython interpreter uses the standard C stack for function calls. To pause a function in the middle of execution (which is necessary when a function blocks on I/O), the state of the stack must be preserved.
When a greenlet switches context (yields):

  1. Stack Slicing: The library copies the current greenlet's portion of the C stack from the CPU's stack pointer into a buffer on the heap.
  2. Stack Restoration: It copies the target greenlet's saved stack from the heap back onto the C stack.
  3. Instruction Pointer Update: It updates the instruction pointer to resume execution where the target greenlet left off.

This "trampoline" mechanism allows Python to pause execution deep inside nested function calls—even across C-extension boundaries—without the C stack growing indefinitely.

Efficiency

Because greenlets share the same OS thread and process memory, a context switch involves only a memcpy operation (copying memory) rather than a system call. This reduces the context switch time from microseconds (OS threads) to nanoseconds. Additionally, a greenlet's initial stack size is minuscule (often just a few kilobytes), allowing a single process to host tens of thousands of concurrent greenlets.
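
As an illustration of that footprint, the sketch below spawns ten thousand idle greenlets in a single process. It is a toy, not a benchmark:

import gevent

def idle_client(i):
    # Cooperative sleep: registers a timer with the hub and yields,
    # so all 10,000 "connections" wait concurrently on one OS thread.
    gevent.sleep(1)

greenlets = [gevent.spawn(idle_client, i) for i in range(10_000)]
gevent.joinall(greenlets)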

Monkey Patching: The "Magic" Integration

Standard Python libraries (like socket and time) are blocking. If you call time.sleep(10) or socket.recv() in a standard Flask route, the entire OS thread freezes. Since Eventlet/Gevent run on a single OS thread, one blocking call would halt the entire server, freezing all 10,000 connected clients.
To prevent this, these libraries utilize Monkey Patching.

How It Works

Monkey patching dynamically modifies the standard library at runtime. When you execute eventlet.monkey_patch() or gevent.monkey.patch_all(), the library swaps out objects inside the already-imported standard library modules: the standard socket class is replaced with a cooperative "green" socket, threading.Thread with a greenlet-based equivalent, time.sleep with a cooperative sleep, and so on.
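
The swap is easy to observe. A small sketch (Gevent shown; Eventlet behaves analogously):

from gevent import monkey
monkey.patch_all()     # must run before anything else imports socket, threading, time, ...

import socket
import time

# After patching, the familiar names resolve to cooperative implementations.
print(socket.socket)   # gevent's green socket class, not CPython's built-in
print(time.sleep)      # gevent's sleep, which yields to the hub instead of blocking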

The Execution Flow of a "Green" Socket:

  1. Intercept: User code calls socket.recv(). Because of monkey patching, this invokes the Gevent/Eventlet version, not the OS version.
  2. Register: The green socket registers a read watcher (a callback) with the Hub, the central event loop. The watcher tells the Hub: "Wake me up when file descriptor X has data to read."
  3. Yield: The green socket calls greenlet.switch(), pausing the current request's execution and yielding control to the Hub.
  4. Wait: The Hub uses a high-performance, non-blocking polling mechanism (typically epoll on Linux or kqueue on macOS) to check for I/O events across all file descriptors.
  5. Resume: When data arrives on the socket, the Hub sees the event, triggers the callback, and switches execution back to the original greenlet.

To the Flask developer, the code looks synchronous (data = sock.recv(1024)). Under the hood, the execution is asynchronous and non-blocking.
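
The same flow, written as a bare echo server over monkey-patched sockets, makes the pattern concrete. This is a minimal sketch of the technique; Flask-SocketIO performs the equivalent wiring internally:

from gevent import monkey
monkey.patch_all()

import socket
import gevent

def handle(conn):
    # Looks blocking, but each recv() registers the fd with the Hub and yields;
    # other greenlets run until data arrives on this connection.
    while True:
        data = conn.recv(1024)
        if not data:
            break
        conn.sendall(data)
    conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 9000))
server.listen(128)

while True:
    conn, _addr = server.accept()   # accept() is also cooperative after patching
    gevent.spawn(handle, conn)      # one cheap greenlet per connection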

The Risks of Monkey Patching

While powerful, monkey patching introduces significant engineering risks:

  • C-Extension Incompatibility: Libraries written in C that bypass the Python socket API (e.g., certain database drivers or old gRPC versions) perform direct OS system calls. These cannot be patched. If such a library blocks, it blocks the entire loop.
  • Order of Operations: Patching must occur before any other modules import socket or threading. Late patching can result in a "split brain" scenario where some parts of the app use green sockets and others use blocking OS sockets, leading to deadlocks.
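
A minimal sketch of the first failure mode: here time is deliberately left unpatched to stand in for an unpatchable blocking call. While it holds the OS thread, the Hub cannot schedule anything, so the heartbeat greenlet stalls for the full three seconds:

from gevent import monkey
monkey.patch_all(time=False)     # leave time unpatched on purpose for this demo

import time
import gevent

def blocking_call():
    time.sleep(3)                # unpatched: holds the OS thread; the Hub is frozen

def heartbeat():
    for _ in range(5):
        gevent.sleep(1)          # cooperative: yields to the Hub
        print("heartbeat")       # no output while blocking_call() holds the thread

gevent.joinall([gevent.spawn(heartbeat), gevent.spawn(blocking_call)])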

Choosing Your Fighter: Eventlet vs. Gevent vs. Threading

When configuring Flask-SocketIO, you must choose an async_mode.
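
The choice is a single constructor argument. A minimal sketch; if async_mode is omitted, Flask-SocketIO auto-detects the best installed backend, and socketio.async_mode reports which one was selected:

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)

# One of 'threading', 'eventlet', or 'gevent'.
socketio = SocketIO(app, async_mode='threading')

print(socketio.async_mode)   # confirms the backend actually in use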

Threading

  • Concurrency Model: Standard OS Threads.
  • Pros: Maximum compatibility. No monkey patching required. Works with all third-party libraries.
  • Cons: Poor scalability. Capable of handling hundreds of clients, but fails at thousands due to memory and context switching overhead.
  • Use Case: Development, debugging, or low-traffic internal tools.

Eventlet

  • Concurrency Model: Greenlets.
  • Architecture: Historically the default for Flask-SocketIO. Its hub is implemented mostly in pure Python, wrapping OS polling primitives such as epoll.
  • Status (2024/2025): Deprecated. The Eventlet project is currently in maintenance mode ("life support"). New feature development has stalled, and compatibility with newer Python versions (3.10+) has historically lagged.
  • Performance: High, but generally slightly slower than Gevent in raw throughput benchmarks.
  • Use Case: Legacy applications. New projects should avoid Eventlet.

Gevent

  • Concurrency Model: Greenlets.
  • Architecture: Built on top of libev, a highly optimized C event-loop library, with its performance-critical paths compiled via Cython.
  • Status: Active. Gevent remains well-maintained and robust.
  • Performance: Very High. The C-based hub and loop provide superior performance and lower latency compared to Eventlet.
  • Use Case: The recommended choice for production Flask-SocketIO deployments requiring high concurrency.

Conceptual Benchmark Comparison:
Under a workload of 5,000 concurrent WebSocket connections sending "heartbeat" messages:

  • Threading: Likely crashes or becomes unresponsive due to thread exhaustion.
  • Eventlet: Handles the load but with higher CPU usage due to Python-side loop overhead.
  • Gevent: Handles the load with the lowest CPU/Memory footprint due to libev efficiency.

Code Example: Minimal Gevent Setup

Given the deprecation status of Eventlet, the following example demonstrates a production-ready Gevent setup. Note the critical placement of the monkey patch.

# standard_library_patch.py
from gevent import monkey
# CRITICAL: Must be called before importing Flask, SocketIO, or any other library
# that imports socket, ssl, threading, or time.
monkey.patch_all()

from flask import Flask, render_template
from flask_socketio import SocketIO, emit

app = Flask(__name__)
app.config['SECRET_KEY'] = 'secret!'

# Initialize SocketIO with gevent as the async_mode
# message_queue is required for horizontal scaling (e.g., Redis)
socketio = SocketIO(app, async_mode='gevent', message_queue='redis://localhost:6379')

@socketio.on('connect')
def handle_connect():
    print('Client connected')

@socketio.on('message')
def handle_message(data):
    # This looks synchronous, but 'emit' yields to the Hub
    # allowing other clients to be processed while waiting for network I/O
    print('received message: ' + str(data))
    emit('response', {'data': 'Message received'})

# For local development / running directly
if __name__ == '__main__':
    # socketio.run wraps the application in a gevent WSGI server
    socketio.run(app, host='0.0.0.0', port=5000)

Running in Production (Gunicorn):
Do not use python app.py. Use Gunicorn with the specific Gevent worker class to ensure the environment is correctly set up.

gunicorn -k gevent -w 1 module:app

Note: We use -w 1 (one worker) because a single Gevent worker can handle thousands of connections. To use multiple cores you can increase the worker count, but you must then add a message queue (e.g., Redis) so the workers can coordinate. Also note that the plain gevent worker serves Socket.IO traffic over long-polling only; for native WebSocket transport, install the gevent-websocket package and run Gunicorn with its geventwebsocket.gunicorn.workers.GeventWebSocketWorker worker class.
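
The message queue also lets processes outside the web workers push events to connected clients. A minimal sketch of Flask-SocketIO's external-process pattern, assuming the same Redis URL as configured above:

# emitter.py - run from e.g. a Celery task or a cron job
from flask_socketio import SocketIO

# No Flask app is needed here; connecting to the same message queue is enough.
external = SocketIO(message_queue='redis://localhost:6379')
external.emit('response', {'data': 'pushed from outside the web process'})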

Conclusion

Flask-SocketIO achieves asynchronous real-time communication by fundamentally altering the execution model of Python via greenlets. By swapping heavy OS threads for lightweight, user-space coroutines, libraries like Gevent allow synchronous Flask code to scale to thousands of concurrent connections.

For the architect, the decision tree is clear:

  1. Development: Use threading for simplicity and debugger compatibility.
  2. Legacy Production: Continue using Eventlet if already integrated, but plan a migration.
  3. New Production: Use Gevent. It offers the stability of libev and superior performance.
  4. Greenfield Projects: If the project is purely real-time and does not require Flask's ecosystem, consider FastAPI or Quart. These frameworks use Python's native asyncio, eliminating the need for monkey patching and its associated risks.

Understanding the "trampoline" nature of the Hub and the invasiveness of monkey patching is essential for debugging hangs and performance issues. When a Flask-SocketIO server stalls, it is almost always because a blocking call slipped past the monkey patch, halting the event loop and pausing the universe for every connected client.
