Diptanu Gon Choudhury for Tensorlake

Originally published at tensorlake.ai

The End of Database-Backed Workflow Engines: Building GraphRAG on Object Storage

GraphRAG sounds elegant in theory: build a knowledge graph from your documents, traverse it intelligently, and get better answers than vanilla RAG.

Then you look at the compute requirements.

To build a GraphRAG system, you need to: parse documents, chunk text, generate embeddings for every chunk, extract concepts from every chunk, compute pairwise similarities, build graph edges, and store everything in a queryable format. For a single 100-page PDF, that’s thousands of API calls, millions of similarity computations, and hours of processing.

Now imagine doing this for 10,000 documents. Or 100,000.


What GraphRAG Actually Needs from Infrastructure

The algorithm is straightforward: chunk, embed, extract concepts, build edges, traverse. The infrastructure requirements are not.
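In a notebook, the whole thing is a short loop. Here is a minimal single-machine sketch; the chunking, embedding, and concept-extraction helpers are stand-ins you would supply, not any particular library's API:

```python
import itertools
import numpy as np

def build_graph(documents, chunk_fn, embed_fn, extract_fn, sim_threshold=0.8):
    """Naive single-machine GraphRAG build: chunk, embed, extract, connect."""
    chunks, vectors, concepts = [], [], []
    for doc in documents:
        for chunk in chunk_fn(doc):                      # 1. chunk text
            chunks.append(chunk)
            vectors.append(embed_fn(chunk))              # 2. one embedding call per chunk
            concepts.append(extract_fn(chunk))           # 3. one extraction call per chunk

    # 4. pairwise cosine similarity over every chunk pair: O(n^2) in chunk count
    v = np.array(vectors, dtype=np.float32)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    edges = [
        (i, j, float(v[i] @ v[j]))
        for i, j in itertools.combinations(range(len(chunks)), 2)
        if float(v[i] @ v[j]) >= sim_threshold
    ]
    return chunks, concepts, edges                       # 5. load into a graph store, e.g. Neo4j
```

Every line of that loop hides a scaling problem, which is where the requirements below come from.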

Parallel execution

Documents are independent. Processing them sequentially wastes time. You need a system that can spin up workers on demand and distribute work across them.

Heterogeneous compute

PDF parsing needs memory. Embedding generation is I/O-bound waiting on API calls. Concept extraction needs CPU for NLP models. Running all of these on the same machine means over-provisioning for the hungriest step.

Durable execution

A 10-hour ingestion job will fail somewhere. Network timeout. Rate limit. OOM. When step 3 fails, it needs to read step 2’s output from somewhere durable. Without checkpointing, you start over from zero.

Job orchestration

You need something that spins up workers, tracks dependencies, retries failures, aggregates partial results, and decides whether to proceed or abort.


The DIY Stack

Building this yourself means assembling:


Kubernetes

Kubernetes is used for container orchestration. But Kubernetes doesn’t know anything about your jobs. It manages containers, not computations. It won’t schedule your tasks, track dependencies, or handle fan-out.

Celery + Redis

Celery and Redis are typically used for task queuing. Note: queuing, not parallel execution. Celery distributes tasks to workers, but it is fundamentally a message broker with worker processes attached. It doesn’t understand data locality, can’t optimize task placement, and treats every task as independent. When you need real parallelism (map-reduce over ten thousand chunks, aggregating partial results, handling stragglers), Celery gets you partway there. For the rest, you end up writing glue code or reaching for Spark.
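For a concrete sense of where the line is, here is roughly what the fan-out looks like with Celery (broker URLs and task bodies are placeholders): a chord gives you scatter/gather, but placement, stragglers, and resuming a half-finished aggregation are still your problem.

```python
from celery import Celery, chord

app = Celery("graphrag",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def embed_chunk(chunk: str) -> list[float]:
    # Placeholder: in reality an embedding API call, with retries and rate-limit handling
    return [0.0] * 8

@app.task
def build_edges(embeddings: list[list[float]]) -> int:
    # Placeholder: pairwise similarity and edge writes would go here
    return len(embeddings)

chunks = [f"chunk {i}" for i in range(10_000)]

# Scatter 10,000 tasks, gather the results into one callback.
# Celery queues and delivers; locality, stragglers, and mid-run recovery are on you.
result = chord(embed_chunk.s(c) for c in chunks)(build_edges.s())
```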

Spark

Spark is brought in for actual parallel compute. Now you are running a third system. Spark knows how to partition data, schedule parallel tasks, and aggregate results. But Spark wants to own the entire pipeline. Mixing Spark jobs with Celery tasks means shuffling data between systems, managing two job lifecycles, and debugging failures that span both.

Postgres

Postgres is used for job metadata and durability. This is the state that workflow engines like Airflow and Temporal manage, except now you are building it yourself.

The glue code

You have a container orchestrator that doesn’t understand jobs, a task queue that doesn’t understand parallelism, and a compute engine that doesn’t integrate cleanly with either. You end up writing hundreds of lines to bridge these systems, and every bridge is a place where failures hide.

Setup time and maintenance for different components

And this assumes you get it right the first time. You won't.

Kubernetes was built for orchestrating long-running microservices, not bursty batch jobs. The Cluster Autoscaler checks for unschedulable pods every 10 seconds, then provisions nodes that take 30-60 seconds to come online. For a GraphRAG pipeline that needs to fan out to 500 workers immediately, that's minutes of latency before work even starts. The autoscaler prioritizes stability over speed, a reasonable tradeoff for web services but a painful one for batch processing.

This is why most GraphRAG implementations stay as notebooks. The infrastructure tax is too high.


A Different Approach: Object-Store-Native Compute

For the past two years, we've been quietly building a new serverless compute stack for AI workloads at Tensorlake.

It powers our Document Ingestion API, which processes millions of documents every month across a heterogeneous fleet of machines, fully distributed and fully managed. Document processing was our testbed: OCR, layout detection, table extraction, entity recognition. Every document touches multiple models, multiple machines, multiple failure modes. If the infrastructure couldn't handle that, it couldn't handle anything.

But the compute stack itself is general purpose. It replaces the entire Kubernetes + Celery + Spark + Postgres stack with a single abstraction:

Write your workflow as if it runs on a single machine. In production, it gets transparently distributed across CPUs and GPUs, and scales to whatever the workload demands.

No queues to configure. No job schedulers to manage. No Spark clusters to provision. No glue code bridging systems that weren't designed to work together.

The key insight: use S3 as the backbone for durable execution instead of databases. AI workloads deal in unstructured data—documents, images, embeddings, model outputs. This data already lives in object storage. By building the execution engine around S3 rather than Postgres or Cassandra, we eliminated an entire class of serialization problems and made checkpointing nearly free.
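The shape of the idea is simple, even if the sketch below is illustrative rather than Tensorlake's actual internals (the bucket name and key layout are made up): every step writes its output to object storage under a deterministic key, and a retry checks for that key before redoing any work.

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "pipeline-checkpoints"  # illustrative bucket name

def run_step(invocation_id: str, step_name: str, fn, *args):
    """Execute a step at most once per invocation; retries read the S3 checkpoint."""
    key = f"{invocation_id}/{step_name}.json"
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        return json.loads(obj["Body"].read())            # checkpoint hit: skip re-execution
    except s3.exceptions.NoSuchKey:
        result = fn(*args)                               # checkpoint miss: do the work
        s3.put_object(Bucket=BUCKET, Key=key,
                      Body=json.dumps(result).encode())
        return result
```

JSON is used here only for brevity; parsed pages, embeddings, and extracted concepts are blobs already, so the checkpoint lives next to the data instead of being squeezed through a database.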


GraphRAG on Tensorlake

Each step runs as an isolated function with its own compute requirements.

Step-level functions

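The step definitions appear as code screenshots in the original post. The sketch below conveys their shape; the decorator, its resource parameters, and the helper calls are assumptions for illustration, not the exact Tensorlake SDK, so refer to the example repository linked at the end for the real code.

```python
# Illustrative only: decorator names, resource parameters, and helpers are assumed,
# not the verbatim Tensorlake SDK API.
from tensorlake import function  # hypothetical import path

@function(cpu=2, memory="8GB")            # PDF parsing: memory-hungry
def parse_document(pdf_url: str) -> list[str]:
    return parse_pdf_pages(pdf_url)       # hypothetical parsing helper

@function(cpu=1)                          # chunking: cheap and CPU-only
def chunk_text(pages: list[str]) -> list[str]:
    return [p[i:i + 1000] for p in pages for i in range(0, len(p), 1000)]

@function(cpu=1, timeout="10m")           # embedding: I/O-bound on the model API
def embed_chunk(chunk: str) -> list[float]:
    return embedding_client.embed(chunk)  # hypothetical API client

@function(cpu=4)                          # concept extraction: CPU-bound NLP
def extract_concepts(chunk: str) -> list[str]:
    return nlp_pipeline(chunk)            # hypothetical NLP helper
```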

The magic is in .map(). Fan out to thousands of workers with one line:

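Continuing the assumed names from the sketch above (again illustrative, not the verbatim SDK call):

```python
# Fan out one call per chunk; the platform schedules the parallel workers.
chunks = chunk_text(parse_document(pdf_url))
embeddings = embed_chunk.map(chunks)
concepts = extract_concepts.map(chunks)
```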

Execution Flow


When a function fails, Tensorlake doesn't re-execute successful steps - it reads the checkpointed output from S3 and continues. If the pipeline dies at chunk 847, the retry resumes from the last checkpoint, not from zero.

This isn't a batch job you run manually; it's a live HTTP endpoint. Deploy once, and it's available on-demand whenever someone wants to add a document to the knowledge graph:
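The endpoint URL and payload below are illustrative (the deployed app defines the actual route and fields), but the request is roughly this shape:

```bash
# Illustrative request; substitute the URL and JSON fields your deployed app exposes.
curl -X POST https://api.tensorlake.ai/applications/graph-rag-pipeline/invoke \
  -H "Authorization: Bearer $TENSORLAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"document_url": "https://example.com/annual-report.pdf"}'
```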

No documents in the queue? The system scales to zero. A thousand PDFs arrive at once? Tensorlake spins up workers to handle them in parallel. You're not paying for idle clusters or babysitting Spark jobs. The infrastructure responds to the workload, not the other way around.


The Results

Results of data ingestion



Try It

git clone https://github.com/tensorlakeai/examples

cd examples/graph-rag-pipeline

tensorlake secrets set OPENAI_API_KEY <your-key>
tensorlake secrets set NEO4J_URI neo4j+s://xxx.databases.neo4j.io
tensorlake secrets set NEO4J_PASSWORD <password>

tensorlake deploy app.py

For a small proof of concept, a notebook is fine. For production GraphRAG with retries, scale, and real users, you need infrastructure that doesn’t become the bottleneck.

Built with Tensorlake and Neo4j. See the GraphRAG paper for the original algorithm.
