Werner Smit

Python's Hidden Bottleneck: How Startup Time Could Impact Your DevOps Pipelines

Python is often criticized for being slow, but in most real-world applications, its performance is more than adequate. The language’s flexibility, readability, and rich ecosystem usually outweigh the cost of a few extra milliseconds in execution.

That said, Python’s performance does become a concern in one specific area: startup time.

Python has a noticeable startup overhead, and this can become a real bottleneck in scenarios like:

  • High-frequency, short-lived scripts — such as those used in DevOps pipelines to query APIs, interact with databases, or perform quick checks.
  • Lightweight OS-level tasks — like verifying the status of a process or checking for the existence of a file.
  • Workflows that repeatedly invoke Python scripts in rapid succession — where the cumulative startup time can dwarf the time spent doing actual work.

In these cases, Python’s overhead can add up quickly, making alternative tools or persistent Python processes worth considering.

This article will take a closer look at Python’s startup behavior, explore when it becomes a problem, and discuss practical ways to work around it.

Python Requests vs. curl: A Startup Time Comparison

To demonstrate Python’s startup overhead, I built a simple HTTP server with Flask and wrote a basic client using Python’s requests library. Here’s how it compares to curl in terms of execution time. All timings use hyperfine, where -w 1 performs one discarded warmup run so caches are primed before measurement:

Note: see server.py and client-requests.py for the code used to run the server and client.

Python (requests):

$ hyperfine -w 1 'python client-requests.py'
Benchmark 1: python client-requests.py
  Time (mean ± σ):     153.6 ms ±   3.8 ms    [User: 134.6 ms, System: 17.0 ms]
  Range (min … max):   147.1 ms … 162.4 ms    19 runs

curl:

$ hyperfine -w 1 'curl 127.0.0.1:5001/api'
Benchmark 1: curl 127.0.0.1:5001/api
  Time (mean ± σ):       6.2 ms ±   1.0 ms    [User: 2.6 ms, System: 2.9 ms]
  Range (min … max):     4.8 ms …   9.8 ms    353 runs

Result: 153 ms vs 6 ms.

That is a massive 25x difference.

Python's Libraries: How Much Do Imports Affect Startup?

While requests is convenient, it carries significant import overhead. Switching to Python's built-in urllib demonstrates how lighter dependencies can improve startup times:

Note: see client-urllib.py for the code used to run the client.

Startup Time Comparison

| Method            | Mean Time | Relative Speed | Notes                       |
|-------------------|-----------|----------------|-----------------------------|
| curl              | 6.2 ms    | 1x (baseline)  | Compiled binary, no imports |
| python (urllib)   | 81.3 ms   | ~13x slower    | Python stdlib, no deps      |
| python (requests) | 153.6 ms  | ~25x slower    | Heavy dependency chain      |

Benchmark Details:

$ hyperfine -w 1 'python client-urllib.py'
  Time (mean ± σ):      81.3 ms ±   8.9 ms
  Range (min … max):    71.7 ms … 111.6 ms    41 runs
# requests (from earlier)
$ hyperfine -w 1 'python client-requests.py'
  Time (mean ± σ):     153.6 ms ±   3.8 ms
  Range (min … max):   147.1 ms … 162.4 ms    19 runs


Why Is the Python Script So Slow?

In short, Python is an interpreted language, so running a script doesn’t just execute your HTTP request; it spins up an entire interpreter. Before your code runs, Python performs significant groundwork, which introduces substantial overhead for short-lived tasks. The size of the overhead depends on the number of modules imported and the complexity of the script.
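To see how much of this is the interpreter itself, benchmark a script that does nothing. A quick sketch (numbers will vary by machine; -S is a standard CPython flag that skips the automatic import of the site module during startup):

$ hyperfine -w 1 'python -c "pass"'
$ hyperfine -w 1 'python -S -c "pass"'

The gap between these two runs shows how much of the baseline cost is site/path initialization rather than the interpreter core.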

Dependency Bloat Matters:
Heavyweight libraries like requests exacerbate Python’s startup penalty.
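To see exactly where the import time goes, CPython has a built-in diagnostic for this: the -X importtime flag prints a per-module import-time breakdown (self and cumulative microseconds) to stderr. For example:

$ python -X importtime -c 'import requests' 2> import-times.log

Each line of the log shows how long one module took to import, which makes the heavy links in requests' dependency chain easy to spot.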

Breakdown of Python Startup Time

The table below dissects the time taken at each stage of running the Python script:

| Test                                           | Time (ms) |
|------------------------------------------------|-----------|
| python -c 'print("hello world")'               | 54 ms     |
| python -c 'from urllib.request import Request' | 64 ms     |
| python -c 'import requests'                    | 150 ms    |
| python client-urllib.py                        | 77 ms     |
| python client-requests.py                      | 153 ms    |
| requests.get() (request only)                  | 1.3 ms    |

Notice how the actual HTTP request (requests.get()) takes just 1.3 ms; the rest is consumed by interpreter startup and module imports, which are unavoidable in standard Python.
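The per-stage numbers above can be reproduced with the same hyperfine approach, for example:

$ hyperfine -w 1 "python -c 'import requests'"
$ hyperfine -w 1 "python -c 'from urllib.request import Request'"

The request-only figure is different in kind: it comes from timeit inside an already-warm interpreter (see Measurement and setup in the references below).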

Does Pre-Compiling Help?

One might assume pre-compiling the script (e.g., using .pyc files or tools like Cython) would reduce startup time. Unfortunately, this doesn’t address the core issue: the interpreter must still initialize, and imported modules (like requests) must be loaded. Pre-compilation optimizes runtime execution, not startup.

Standard bytecode compiler (compileall)

$ python -m compileall client-requests.py
Compiling 'client-requests.py'...

Nuitka

Nuitka is a compiler that translates Python scripts into C and builds them into standalone binary executables, which run without requiring a separately installed Python interpreter. This can potentially improve performance.

Compiling the script:

$ python -m nuitka --standalone --onefile client-requests.py
..
Nuitka: Successfully created 'client-requests.bin'.

Benchmarking the compiled Python scripts:

$ hyperfine -w 1 'python client-requests.py'
Benchmark 1: python client-requests.py
  Time (mean ± σ):     152.7 ms ±  10.2 ms    [User: 131.3 ms, System: 19.7 ms]
  Range (min … max):   141.7 ms … 170.0 ms    20 runs
$ hyperfine -w 1 'python __pycache__/client-requests.cpython-313.pyc'
Benchmark 1: python __pycache__/client-requests.cpython-313.pyc
  Time (mean ± σ):     160.9 ms ±  17.4 ms    [User: 140.1 ms, System: 18.9 ms]
  Range (min … max):   143.3 ms … 194.1 ms    20 runs
$ hyperfine -w 1 './client-requests.bin'
Benchmark 1: ./client-requests.bin
  Time (mean ± σ):     257.9 ms ±   9.2 ms    [User: 211.9 ms, System: 41.5 ms]
  Range (min … max):   246.9 ms … 279.5 ms    10 runs


Compiled vs. Uncompiled Startup Time Results

In a surprising twist, the Nuitka-compiled script was 68% slower than the original, taking 257 ms versus 152 ms for pure Python. The compileall approach, as expected, showed no meaningful difference (~160 ms), since benchmarking with a warmup phase already leverages cached bytecode (.pyc files).

| Test                | Time (ms) |
|---------------------|-----------|
| Pure Python         | 152 ms    |
| Python + compileall | 160 ms    |
| Python + Nuitka     | 257 ms    |

Why Nuitka Slows Startup:
Nuitka adds overhead by bundling the CPython runtime and dependencies into a binary. While this helps distribution, it increases startup time due to:

  1. Binary unpacking and linking.
  2. Retained CPython interpreter initialization.
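If the unpacking step is the dominant cost, one option worth testing (not benchmarked here) is to drop --onefile, which keeps the compiled output as a directory and removes the self-extraction on every run; the retained interpreter initialization still applies:

$ python -m nuitka --standalone client-requests.py
$ ./client-requests.dist/client-requests.bin

(The exact output directory and binary name may differ between Nuitka versions.)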

Conclusion

Python’s startup penalty is unavoidable for short-lived processes. While acceptable for one-off tasks (e.g., starting a web server), it becomes problematic when:

  • Called repeatedly in loops (e.g., processing each file in a directory separately)
  • Used for trivial operations (e.g., single API calls, checking if a file exists)
  • Run as frequent cron jobs (where milliseconds add up)
  • Used in shell pipelines (where Python is called for each input line)

Alternatives

  1. Use lighter dependencies:
    • Replace requests with the built-in urllib when startup is critical (httpx carries a similarly heavy import chain, so it won't help here).
    • Avoid large frameworks (e.g., pandas, tensorflow) for trivial tasks.
  2. Reduce Invocations:
    • Batch work (e.g., process 100 files in one script call instead of 100 calls); see the sketch after this list.
  3. Use Shell Tools:
    • Replace Python with curl/jq for HTTP or grep/awk for text processing.
  4. Switch to Compiled Languages:
    • For sub-10ms startup, use Go (simple) or Rust (performance-critical).
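As a minimal sketch of option 2, here is a hypothetical batch-client.py (standard library only) that reads URLs from stdin, so the interpreter and imports are paid for once per batch rather than once per request:

import sys
import json
from urllib.request import urlopen

def get_data(url):
    # One HTTP GET; the expensive startup work has already happened.
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))

if __name__ == '__main__':
    # One URL per line on stdin: the interpreter starts once for the
    # whole batch instead of once per request.
    for line in sys.stdin:
        url = line.strip()
        if url:
            print("GET Response:", get_data(url))

Invoked from a pipeline, a hundred requests now cost one startup instead of a hundred:

$ yes http://localhost:5001/api | head -100 | python batch-client.py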

Key Insight: Python trades startup speed for runtime flexibility. Choose the right tool for the job.


References

System information

 OS: Arch Linux
 Kernel: x86_64 Linux 6.12.20-1-lts
 Uptime: 4d 18h 26m
 Shell: zsh 5.9
 CPU: Intel Core i5-8350U @ 8x 3.6GHz [59.0°C]
 GPU: Mesa Intel(R) UHD Graphics 620 (KBL GT2)
 RAM: 5986MiB / 15734MiB

Python version

$ python -V
Python 3.13.2

Measurement and setup

Here's how the requests.get call was measured:

python -m timeit -n 10 -r 3 -s "import requests" "requests.get('http://127.0.0.1:5001/api').json()"
10 loops, best of 3: 1.37 msec per loop

server.py

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api', methods=['GET'])
def handle_data():
    if request.method == 'GET':
        return jsonify({"message": "GET request received", "data": [1, 2, 3]})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5001, debug=False)

Run the server using:

python server.py
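Before benchmarking, you can smoke-test the endpoint with curl; it should return the same JSON payload shown in the client output below:

$ curl 127.0.0.1:5001/api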

client-requests.py

import requests

# Base URL of the server
BASE_URL = 'http://localhost:5001/api'

def get_data():
    response = requests.get(BASE_URL)
    print("GET Response:", response.json())

if __name__ == '__main__':
    get_data()

Run the client using:

python client-requests.py
GET Response: {'data': [1, 2, 3], 'message': 'GET request received'}

client-urllib.py

from urllib.request import Request, urlopen
import json

BASE_URL = 'http://localhost:5001/api'

def get_data():
    req = Request(BASE_URL)
    with urlopen(req) as response:
        data = json.loads(response.read().decode('utf-8'))
        print("GET Response:", data)

if __name__ == '__main__':
    get_data()

Run the client using:

python client-urllib.py
GET Response: {'data': [1, 2, 3], 'message': 'GET request received'}
