DEV Community

Muhammad Faizan Asghar
Why is My Multi-Threaded API Still Slow?

I'm facing an issue with my API, and I'm hoping someone can help. Despite adding multi-threading, the performance gains are far from what I expected. Ideally, if one thread takes 1 second to complete a task, then 10 threads running concurrently should also take about 1 second (that's my understanding). However, my API response times are still very slow.

The Problem

I'm using FastAPI along with libraries like Playwright, MongoDB, and ThreadPoolExecutor. The goal was to use threading for CPU-bound tasks and async-await for IO-bound tasks. Still, my response times are not improving as expected.
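To see where the "10 threads should take about as long as 1 thread" intuition holds, here's a minimal sketch (standalone, no FastAPI or Playwright) of CPython's GIL behavior: threads overlap nicely on IO-bound work, because waiting releases the GIL, but pure-Python CPU work is serialized, so a thread pool gives little speedup there.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task():
    time.sleep(0.2)  # sleeping (like network/disk waits) releases the GIL

def cpu_task():
    sum(i * i for i in range(10**6))  # pure-Python computation holds the GIL

def timed(fn, n_workers=4):
    """Run fn once per worker on a thread pool and return the wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        list(ex.map(lambda _: fn(), range(n_workers)))
    return time.perf_counter() - start

print(f"IO-bound with 4 threads:  {timed(io_task):.2f}s")   # close to one sleep (0.2s)
print(f"CPU-bound with 4 threads: {timed(cpu_task):.2f}s")  # closer to 4x one task
```

So the "1 thread ≈ 10 threads" expectation is only reasonable when the per-task time is dominated by waiting, not computing.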

Book Automation Example

One part of my project involves automating book queries using Playwright to interact with an EPUB viewer. The following function uses Playwright to open a browser, navigate to a book's page, and perform searches:

```python
from playwright.async_api import async_playwright

async def search_with_playwright(search_text: str, book_id: str):
    async with async_playwright() as p:
        # A fresh browser is launched (and torn down) on every call.
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        book_id = book_id.replace("-1", "")
        book_url = f"http://localhost:8002/book/{book_id}"
        await page.goto(book_url)
        await page.fill("#searchInput", search_text)
        await page.click("#searchButton")
        await page.wait_for_selector("#searchResults")
        # Collect excerpt/CFI pairs from the rendered result list.
        search_results = await page.evaluate('''
            () => {
                let results = [];
                document.querySelectorAll("#searchResults ul li").forEach(item => {
                    let excerptElement = item.querySelector("strong:nth-of-type(1)");
                    let cfiElement = item.querySelector("strong:nth-of-type(2)");

                    if (excerptElement && cfiElement) {
                        let excerpt = excerptElement.nextSibling ? excerptElement.nextSibling.nodeValue.trim() : "";
                        let cfi = cfiElement.nextSibling ? cfiElement.nextSibling.nodeValue.trim() : "";
                        results.push({ excerpt, cfi });
                    }
                });
                return results;
            }
        ''')
        await browser.close()
        return search_results
```

The function above is async so it doesn't block other tasks while waiting on the browser. However, even with this async setup, performance is still not what I expected.
Note: I've measured that opening a book and running a single query takes approximately 0.0028s.
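If the searches really are IO-bound, a single event loop can run many of them concurrently without any threads at all. Here's a hedged sketch of that pattern using `asyncio.gather`; `fake_search` is a hypothetical stand-in for `search_with_playwright` (just simulated IO, no browser) so the example runs anywhere:

```python
import asyncio

# Hypothetical stand-in for search_with_playwright: no browser,
# just ~0.1s of simulated page/network IO per call.
async def fake_search(search_text: str, book_id: str) -> dict:
    await asyncio.sleep(0.1)
    return {"query": search_text, "book": book_id}

async def search_many(queries, book_id):
    # One event loop, many concurrent coroutines: the waits overlap,
    # so ten searches take roughly as long as one.
    tasks = [fake_search(q, book_id) for q in queries]
    return await asyncio.gather(*tasks)

results = asyncio.run(search_many([f"query {i}" for i in range(10)], "book-42"))
print(len(results))  # 10
```

The key difference from spawning threads that each call `asyncio.run(...)` is that here everything shares one loop, and `gather` preserves result order.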

Refactor Example

I tried offloading the per-document work to executors — run_in_executor() with a ProcessPoolExecutor in one attempt, and the ThreadPoolExecutor shown below — trying to work around the GIL and manage the workload properly.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor, as_completed

async def query_mongo(query: str, id: str):
    query_vector = generate_embedding(query)

    # Vector search against the per-book collection.
    results = db[id].aggregate([
        {
            "$vectorSearch": {
                "queryVector": query_vector,
                "path": "embedding",
                "numCandidates": 2100,
                "limit": 50,
                "index": id
            }
        }
    ])

    # Helper function for processing each document
    def process_document(document):
        try:
            chunk = document["chunk"]
            chapter = document["chapter"]
            number = document["chapter_number"]
            book_id = id

            # Each worker thread spins up its own event loop (and browser)
            # for this single call.
            search_results = asyncio.run(search_with_playwright(chunk, book_id))
            return {
                "content": chunk,
                "chapter": chapter,
                "number": number,
                "results": search_results,
            }
        except Exception as e:
            print(f"Error processing document: {e}")
            return None

    # Using ThreadPoolExecutor for concurrency
    all_data = []
    with ThreadPoolExecutor() as executor:
        futures = {executor.submit(process_document, doc): doc for doc in results}

        for future in as_completed(futures):
            try:
                result = future.result()
                if result:  # Append result if it's not None
                    all_data.append(result)
            except Exception as e:
                print(f"Error in future processing: {e}")

    return all_data
```
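For reference, here's a minimal sketch of how `run_in_executor()` is typically wired up inside an async function: the blocking call runs on a pool thread while the event loop stays free. The names `blocking_fetch` and `process_all` are hypothetical placeholders, not part of my actual code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(doc_id: int) -> dict:
    # Hypothetical stand-in for a blocking call (e.g. a synchronous DB read).
    return {"id": doc_id, "ok": True}

async def process_all(doc_ids):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # run_in_executor schedules the blocking call on the pool and returns
        # an awaitable, so the event loop itself is never blocked.
        futures = [loop.run_in_executor(pool, blocking_fetch, d) for d in doc_ids]
        return await asyncio.gather(*futures)

processed = asyncio.run(process_all(range(5)))
print(len(processed))  # 5
```

Unlike the `ThreadPoolExecutor` + `asyncio.run()` combination above, this keeps a single event loop in charge, which is usually the intended shape for mixing async code with blocking calls.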

Question

Even after these changes, my API is still slow. What am I missing? Has anyone faced similar issues with Python's GIL, threading, or async setups? Any advice would be greatly appreciated!
