DEV Community

Xianqiu Li

Why pandas cannot be used in TorchScript — and how xpandas fixes it

The problem I kept running into

I often deploy PyTorch models using TorchScript and LibTorch (C++) for inference.

The model itself is not the hard part.

The problem is data processing.

In many real pipelines — especially in quantitative finance, feature engineering, or low-latency systems — you still need to do things like:

  • groupby + aggregation
  • rolling window operations
  • column-wise transformations

In Python, this is trivial with pandas.

But the moment you try to move the whole pipeline into TorchScript or C++ inference, the pandas code can no longer be compiled or run.


Why pandas (and friends) don’t work in TorchScript

This is not a pandas problem. It’s a runtime boundary problem.

1. pandas is Python-runtime dependent

TorchScript compiles a statically typed subset of Python into a graph that runs without the Python interpreter. That means:

  • no Python objects
  • no dynamic typing
  • no CPython extensions

pandas relies heavily on:

  • Python objects
  • NumPy internals
  • CPython C-API

So torch.jit.script() simply cannot compile pandas code.
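
A quick way to see this for yourself, assuming both pandas and PyTorch are installed. The function name `rolling_mean_pandas` is made up for illustration. The sketch only checks that compilation fails and makes no assumption about the exact exception type or message, which depend on the PyTorch version:

```python
import pandas as pd
import torch


def rolling_mean_pandas(x: torch.Tensor) -> torch.Tensor:
    # Fine in eager Python: tensor -> pandas Series -> tensor.
    series = pd.Series(x.numpy())
    return torch.from_numpy(series.rolling(2).mean().to_numpy())


compile_error = None
try:
    # TorchScript has to compile the function body, including the pandas calls.
    torch.jit.script(rolling_mean_pandas)
except Exception as exc:  # exact type and message are not assumed here
    compile_error = exc

print("scripting failed:", type(compile_error).__name__)
```

The function runs normally when called directly; it is only the compilation step that rejects it.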


2. Polars / Arrow don’t solve this either

You might think:

“What about Polars? It’s fast and written in Rust.”

Still no.

  • Polars is not TorchScript-aware
  • Arrow execution does not integrate with PyTorch JIT graphs
  • You cannot inline Polars logic inside a TorchScript model

They are great libraries — just solving a different problem.


3. Python preprocessing + C++ inference is often unacceptable

The usual workaround is:

Python (pandas) → Tensor → TorchScript → C++

The pandas step still needs a Python interpreter at inference time, so this fails when you need:

  • a single deployable artifact
  • low-latency inference
  • no Python dependency in production

At that point, you either:

  • re-implement everything manually in C++
  • or give up on pandas-like logic altogether

That’s the gap I kept hitting.


The core idea: pandas-like ops as Torch custom operators

Instead of trying to make pandas work in TorchScript, I flipped the approach:

What if pandas-like operations were implemented directly as Torch custom ops?

That means:

  • inputs are torch::Tensor
  • logic lives in C++ (LibTorch)
  • everything is TorchScript-compatible
  • the entire pipeline can be exported and run in C++

This is what xpandas is.
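
To make the tensor-in, tensor-out idea concrete, here is a minimal sketch that uses only built-in PyTorch tensor ops. It is not the xpandas API: xpandas implements its operations as C++ custom operators, and the names `groupby_sum`, `rolling_mean`, and `Pipeline` below are hypothetical. The sketch assumes group keys are integer ids in `[0, num_groups)`.

```python
import io

import torch


@torch.jit.script
def groupby_sum(keys: torch.Tensor, values: torch.Tensor, num_groups: int) -> torch.Tensor:
    # Sum `values` per integer group id; keys must lie in [0, num_groups).
    out = torch.zeros([num_groups], dtype=values.dtype, device=values.device)
    out.index_add_(0, keys, values)
    return out


@torch.jit.script
def rolling_mean(values: torch.Tensor, window: int) -> torch.Tensor:
    # Mean over each full window. Unlike pandas, there is no NaN padding,
    # so the output has len(values) - window + 1 elements.
    return values.unfold(0, window, 1).mean(dim=1)


class Pipeline(torch.nn.Module):
    # Hypothetical preprocessing stage; a real model would follow it.
    def forward(self, keys: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        return rolling_mean(groupby_sum(keys, values, 3), 2)


keys = torch.tensor([0, 1, 0, 2])
values = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Script the module and round-trip it through a single serialized artifact.
scripted = torch.jit.script(Pipeline())
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

result = restored(keys, values)
print(result)
```

Saved to a file instead of an in-memory buffer, the same artifact could be loaded from C++ with `torch::jit::load`, with no Python in the serving process.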


What xpandas is (and what it is not)

What it is

  • A small, opinionated subset of pandas-like DataFrame operations
  • Implemented as Torch C++ custom operators
  • Fully compatible with torch.jit.script
  • Designed for inference pipelines, not exploratory analysis

What it is not

  • A full pandas replacement
  • A dataframe library for interactive data science
  • A competitor to Polars or Arrow

Repository:

https://github.com/CVPaul/xpandas
