Igor Benav

Yes, Python is Slow, but it doesn’t matter for AI SaaS

Very specific performance comparison between Python, Rust, Go, and C++

Python gets criticized for being slow. Benchmarks show Rust and C++ running circles around it. Go handles thousands more requests per second. The critics aren't wrong about raw performance numbers (setting aside the pedantic point that implementations, not languages, are slow or fast).

But for most applications, especially AI SaaS, there's a lot of context being left out. The other day, a CEO I know told me he wanted to go with Rust for his startup. What does the startup do? A bunch of OpenAI requests, like most new AI startups.

Software people like to say they're different, but the earlier you learn this, the better: people follow trends. First code had to be performant, then it had to be easy to write, then performant again… all of this misses the point.

Software engineering is not about writing code, it's about analyzing tradeoffs and making the best possible decisions based on all the data, and, as Knuth said, premature optimization is the root of all evil (97% of the time). So let's actually make this decision (Python or a faster language like Rust) the way a software engineer should: by analyzing the tradeoffs.

The Structured Process

What even is premature optimization? We could think of it in two ways:

  1. Improving what already works well (and, therefore, doesn't need improvement)
  2. Trying to improve without proper profiling, that is, without knowing exactly how well you're doing

The first one is relevant, and you should always take it into account. Time is finite, and the opportunity cost of improving something that works instead of fixing something that doesn't (or shipping new features) is way too high in a startup to be ignored.

Here we'll talk more about the second one though, since I want to show how to actually make a decision using the whole context.

Let's start with profiling, that is, measuring how well we're doing and which specific metrics we need to improve. For an API, this usually means measuring response time, throughput (successful requests per second), latency, and a few other metrics. But we don't have an actual API yet (we're still deciding what to build it with), so we need to work around that.

Measuring what we can't measure

We need a proxy variable: something that isn't directly what we need to measure, but that we have good reason to believe is closely related to it, and will therefore give us a good approximation.

Since our application mostly makes requests to AI services, we can start by assuming that improving this flow will yield the biggest immediate win, so let's break down what a request to an AI service actually involves.

Request Steps

Effort vs. performance impact, in a very non-scientific and anecdotal way

In simplified terms, here's what happens inside our API when a user does something (like generating text):

  1. Our API receives an HTTP request (via the network)
  2. Python orchestrates the request and decides what needs to be done
  3. We query the database (possibly multiple times) for things like the user making the request, their permissions, their usage
  4. We make a request to OpenAI or any other provider for text generation (and we need to wait for this)
  5. We store results in the database so the user can access it
  6. We return a response to the client (another network request)

Ok, now that we know the steps, we can actually think how long each of these would take in a typical request. By doing this, we can properly understand what optimization would make the most difference (even better if it's for the smallest possible effort).

Let's time each of these steps (approximated) and see where our performance bottleneck is:

  • Our API receives an HTTP request: This is network I/O (input/output). Typically 1-20ms depending on the client's connection and geographic distance. Nothing we can do here - the data has to travel through the network.
  • Python orchestrates the request: This is CPU bound (our processor is actually doing something, not just waiting), but minimal. Parsing JSON, validating inputs, basic business logic. Maybe 1-5ms total for typical AI SaaS operations if done correctly.
  • Database queries: Classic I/O bound operation. Even with proper indexing, you're looking at 10-200ms per query depending on query complexity and database location. Multiple queries compound this.
  • OpenAI API call: The big one. Text generation takes 500ms to 3 seconds depending on model and prompt length. Image generation? 10-30 seconds. This is pure waiting time.
  • Store results back: Another database write, 10-100ms typically.
  • Return response: Network I/O again, similar to receiving the request.

Let me be concrete here. In a typical AI request:

  • Total Python execution time: ~5ms
  • Total I/O waiting time: 1000-5000ms (or a lot more depending on the AI generation)

Python execution represents 0.1-0.5% of total request time. Even if we magically made Python infinitely fast, users wouldn't notice the difference. The language we pick for our backend isn't even close to being the bottleneck here, so what is?
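
If you want to sanity-check these numbers for your own stack, a rough sketch like the one below will do. This is purely illustrative: the asyncio.sleep calls stand in for the real database and OpenAI calls, using the approximate timings above.

import asyncio
import time

async def timed(label: str, coro):
    # Await any coroutine and report its wall-clock time in ms
    start = time.perf_counter()
    result = await coro
    print(f"{label}: {(time.perf_counter() - start) * 1000:.0f}ms")
    return result

async def handle_request():
    # Stand-ins for the real steps; swap in actual DB and OpenAI calls
    await timed("db query", asyncio.sleep(0.05))    # ~50ms read
    await timed("openai call", asyncio.sleep(1.5))  # ~1.5s generation
    await timed("db write", asyncio.sleep(0.03))    # ~30ms insert

asyncio.run(handle_request())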

The Real Bottleneck: I/O vs CPU Operations

CPU and I/O bound operations

The speed problem in AI applications isn't your Python code - it's waiting for everything else. This is the key insight: most AI SaaS applications are I/O bound, not CPU bound.

I/O Bound Operations wait for input/output to complete. Network requests, database queries, file operations. The bottleneck isn't computation speed - it's waiting for data.

CPU Bound Operations are limited by computational power. Heavy math, data processing, complex algorithms. These actually benefit from faster languages.

But here's the thing - if your AI SaaS is making OpenAI requests, you're overwhelmingly I/O bound. The language choice becomes irrelevant when 99% of your time is spent waiting for external services. Even if you manage to make your orchestration code 10x faster, you'd save maybe 4ms out of a 2000ms request. That's 0.2% improvement.

Meanwhile, if you cache responses properly, use connection pooling, and optimize your database queries, you're improving what actually makes a difference. Plus, these optimizations are language-agnostic, easily providing 10-100x more impact than switching to Rust.
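
To make "connection pooling" concrete: in SQLAlchemy it's a couple of parameters on the engine. A minimal sketch, with a hypothetical database URL and illustrative pool sizes rather than recommendations:

from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

# Reuse connections across requests instead of opening a fresh one per query
engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost/app",  # hypothetical URL
    pool_size=10,        # persistent connections kept open
    max_overflow=20,     # extra connections allowed under bursts
    pool_pre_ping=True,  # discard dead connections before use
)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)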

Ok, but you might be thinking: "even if Python's speed doesn't matter for orchestration, it's still handling one request at a time". That leads us to the next thing: async.

Async Changes Everything

Modern Python isn't the blocking, single-threaded Python of 2010.

With async/await and frameworks like FastAPI, Python handles I/O bound workloads brilliantly. While one request waits for OpenAI to respond, your Python server processes hundreds of other requests.

from fastapi import Depends, FastAPI
from sqlalchemy import insert, select
from sqlalchemy.ext.asyncio import AsyncSession

app = FastAPI()

# get_db, User, Generation, and openai_client are assumed to be
# defined elsewhere in the application
@app.get("/generate")
async def generate_content(prompt: str, db: AsyncSession = Depends(get_db)):
    # While this waits for the database...
    user_data = await db.execute(select(User).where(...))

    # ...and this waits for OpenAI...
    response = await openai_client.chat.completions.create(...)

    # ...Python handles other requests concurrently
    await db.execute(insert(Generation).values(...))

    return response

This concurrency model is perfect for AI SaaS. You're not CPU bound anyway, so async Python easily handles thousands of concurrent requests. Your bottleneck becomes API rate limits, database connections, or memory usage - not language speed.
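
You can see this concurrency model in action with a toy benchmark. Here asyncio.sleep stands in for the OpenAI round trip; a thousand concurrent "requests" finish in roughly the time of one, because the waiting overlaps:

import asyncio
import time

async def fake_openai_call(i: int) -> str:
    await asyncio.sleep(1.0)  # pretend we're waiting on the network
    return f"response {i}"

async def main():
    start = time.perf_counter()
    # 1000 concurrent calls share one thread while they wait
    results = await asyncio.gather(*(fake_openai_call(i) for i in range(1000)))
    print(f"{len(results)} responses in {time.perf_counter() - start:.2f}s")
    # Roughly 1 second total, not 1000: the event loop interleaves the waits

asyncio.run(main())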

When Python Speed Actually Matters

Even though your case probably isn't one of these, there are situations where Python's lack of speed is a genuine constraint:

  • Running ML models locally: If you're doing inference in Python rather than calling APIs, computation speed matters. But most AI SaaS companies use external APIs (OpenAI, Anthropic, etc.), not local models.
  • Heavy data preprocessing: Transforming massive datasets before feeding them to models. But again, most startups don't have massive datasets initially, and when they do, they use specialized tools like Spark or Dask.
  • Real-time applications: High-frequency trading, gaming servers, real-time video processing. These are CPU bound and benefit from faster languages. But if you're building ChatGPT for lawyers, this isn't you.
  • Mathematical computation: Complex algorithms that don't rely on external services. Signal processing, scientific computing, etc. Again, not typical for AI SaaS startups.

The thing is, these represent edge cases for most AI SaaS applications. If your business model is "call OpenAI API and add some business logic around it" (which describes most AI startups), you're I/O bound by definition.

If your business gets to a point where Python isn't enough, you can migrate just the parts that need to be performant - most Python libraries in performance-sensitive domains are already written in another language like Rust or C++ and merely called from Python.
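
numpy is the classic example of this pattern: the array math runs in compiled C while Python just orchestrates. A toy comparison (exact timings depend on your machine):

import time

import numpy as np

data = list(range(10_000_000))
arr = np.array(data)

start = time.perf_counter()
total = sum(x * x for x in data)  # pure Python: the loop runs in the interpreter
print(f"python: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
total = int((arr * arr).sum())    # numpy: the same loop runs in C
print(f"numpy:  {time.perf_counter() - start:.2f}s")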

What Actually Kills Your Performance

For 99% (heck, even a lot more) of AI SaaS, the performance killers are completely different:

N+1 database queries: Making separate database calls in loops instead of batch operations. I've seen codebases where getting 100 users takes 101 database queries instead of 1. That's not a language problem - that's a classic architecture problem.

# This kills performance - 100 database queries
for user_id in user_ids:
    user = await db.get(User, user_id)

# This is fast - 1 database query
result = await db.execute(select(User).where(User.id.in_(user_ids)))
users = result.scalars().all()

Missing database indexes: Forcing full table scans on queries that should be instant. Adding an index can turn a 5-second query into a 5-millisecond query.
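
In SQLAlchemy, for instance, that's usually one argument on the column you filter by. A sketch with a hypothetical users table:

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    # index=True turns "scan every row" into a B-tree lookup
    # for queries that filter on email (hypothetical column)
    email: Mapped[str] = mapped_column(index=True, unique=True)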

No caching: Repeatedly calling the same endpoints with the same parameters. Cache that response and serve it instantly next time.
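
A minimal sketch of the idea, assuming a single server instance where an in-process TTL cache is enough (with multiple instances you'd reach for something shared like Redis). The call_openai function here is a stand-in for your real generation call:

import asyncio
import time

_cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300  # how long a cached response stays fresh

async def call_openai(prompt: str) -> str:
    await asyncio.sleep(1.0)  # stand-in for the real API call
    return f"generated text for: {prompt}"

async def generate_cached(prompt: str) -> str:
    now = time.time()
    hit = _cache.get(prompt)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: served instantly, no API call
    response = await call_openai(prompt)
    _cache[prompt] = (now, response)  # cache miss: store for next time
    return response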

Synchronous operations: Blocking your entire request handler while waiting for I/O. Use async/await properly.
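
The classic version of this mistake is a blocking HTTP client inside an async handler: it stalls the whole event loop, not just the current request. A sketch of the fix, with a hypothetical URL:

import httpx

# Bad: requests.get(...) inside an async handler blocks the event loop,
# so every other in-flight request stalls until it returns.

# Good: an async client yields control while waiting on the network,
# letting the event loop serve other requests in the meantime.
async def fetch_data() -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.example.com/data")  # hypothetical URL
        return response.json()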

Poor API usage: Making 5 separate API calls when you could batch them into 1.
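
The OpenAI embeddings endpoint, for example, accepts a list of inputs, so one round trip can replace five. A sketch (model name may change; check the current docs):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
texts = ["doc one", "doc two", "doc three", "doc four", "doc five"]

# One batched call instead of five separate round trips
response = client.embeddings.create(model="text-embedding-3-small", input=texts)
vectors = [item.embedding for item in response.data]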

These problems cause 10x to 100x performance degradation. Fix the architecture problems first. They'll give you way more bang for your buck.

The Real Tradeoffs

Now let's think about this problem like actual software engineers.

Python advantages for AI SaaS:

  • Ship features fast: Python's ecosystem is unmatched for AI and integrations
  • Find developers easily: Way larger talent pool than Rust or Golang
  • Iterate quickly: Write, test, deploy cycles are faster
  • Proven ecosystem: FastAPI, SQLAlchemy, OpenAI SDK - everything just works

Rust advantages:

  • Raw performance: 2-5x faster for CPU-bound operations
  • Memory safety: Harder to write buggy code (once you get past the learning curve)
  • Lower resource usage: Matters if you're processing millions of requests

The real question: What's your bottleneck?

If you're doing heavy computation, languages like Rust might make sense. But if you're making API calls to OpenAI (like most AI startups), your bottleneck is database and network I/O, not CPU performance.

Optimizing Python won't help when 99% of your request time is waiting for external services. And rewriting everything in Rust will take months while your competitors ship new features.

The CEO I mentioned earlier? He ended up picking Python and is getting traction with the solution in the hands of actual customers. That's the real cost of premature optimization: opportunity cost.

Resource Usage

Something Python is genuinely inefficient about is resource usage. Python objects carry more overhead, and garbage collection keeps objects alive longer than manual memory management would.

We can see this from Lovable's example - they went from 200 server instances down to just 10 when they migrated from Python to Go. That's a 20x reduction in infrastructure costs. Their deployment times dropped from 15 minutes to 3 minutes, and average request times improved by 12%.

But here's the key context: Lovable was handling 50+ concurrent HTTP requests per chat request with heavy parallelism. That's not a typical AI SaaS workload - most startups are making sequential API calls to OpenAI, not orchestrating dozens of parallel operations. For most AI SaaS applications, resource efficiency doesn't matter much until you're big. Really big.

Your OpenAI API calls don't care if your web server uses 100MB or 500MB of RAM. Your database doesn't run faster because your API server has lower memory usage. If you're doing 1000 requests per day, the difference between running 1 server instance vs 2 is maybe $50/month - not exactly a startup killer.

What actually constrains resources in typical AI SaaS:

  • API rate limits: Hit OpenAI's rate limit and your users wait regardless of how efficient your memory usage is.
  • Database connections: Running out of connection pool slots will kill performance. This has nothing to do with your programming language.
  • Developer bandwidth: The most expensive resource in a startup. Spending 3 months optimizing memory usage while competitors ship features is the real waste.
  • AI tokens: You're paying for OpenAI/Claude tokens, not server RAM.

The main takeaways from Lovable's migration: they got to almost a $2 billion valuation with Python before having to move. They knew exactly what they needed to optimize and why - specific concurrency patterns that were causing real operational pain. They measured concrete improvements: deployment speed, request performance, infrastructure costs.

That's engineering-driven optimization based on actual constraints, not premature optimization based on theoretical performance concerns.

Most AI SaaS startups will never hit the scale where Python's resource usage becomes the bottleneck. And if you do hit that scale, congratulations - you have a successful business and can afford to hire engineers to optimize the parts that actually matter.

The Bottom Line

Python's slowness and higher resource usage are real. But for AI SaaS applications that spend 99% of their time waiting for external APIs and databases, these limitations don't matter.

The path to success isn't picking the "fastest" language - it's understanding your actual constraints and optimizing what matters. Build your MVP in Python. Measure where the real bottlenecks are. Fix those first. When you hit Python's actual limits (not theoretical ones), then consider alternatives.

The CEO I mentioned? His startup shipped their MVP in Python and is now getting customer traction. Meanwhile, his competitors who chose Rust for "performance reasons" without actually needing it might still be rewriting HTTP handlers.

Software engineering is about tradeoffs, not trends. Make decisions based on your actual constraints, not benchmark blog posts. Your users don't care if your backend runs on Python or Rust. They care if your product solves their problems.


Want to skip months of setup and ship your AI SaaS today? Check out FastroAI - a production-ready FastAPI template with authentication, payments, AI integration, and everything else you need to launch. Stop building infrastructure, start building solutions.

Originally posted at FastroAI.
