The problem I kept running into
I often deploy PyTorch models using TorchScript and LibTorch (C++) for inference.
The model itself is not the hard part.
The problem is data processing.
In many real pipelines — especially in quantitative finance, feature engineering, or low-latency systems — you still need to do things like:
- groupby + aggregation
- rolling window operations
- column-wise transformations
In Python, this is trivial with pandas.
But the moment you try to move the whole pipeline into TorchScript or C++ inference, the pandas part cannot come along.
Why pandas (and friends) don’t work in TorchScript
This is not a pandas problem. It’s a runtime boundary problem.
1. pandas is Python-runtime dependent
TorchScript compiles a statically typed subset of Python so that the result can run without the Python runtime:
- no Python objects
- no dynamic typing
- no CPython extensions
pandas relies heavily on:
- Python objects
- NumPy internals
- CPython C-API
So torch.jit.script() simply cannot compile pandas code.
2. Polars / Arrow don’t solve this either
You might think:
“What about Polars? It’s fast and written in Rust.”
Still no.
- Polars is not TorchScript-aware
- Arrow execution does not integrate with PyTorch JIT graphs
- You cannot inline Polars logic inside a TorchScript model
They are great libraries — just solving a different problem.
3. Python preprocessing + C++ inference is often unacceptable
The usual workaround is:
Python (pandas) → Tensor → TorchScript → C++
This fails when you need:
- a single deployable artifact
- low-latency inference
- no Python dependency in production
At that point, you either:
- re-implement everything manually in C++
- or give up on pandas-like logic altogether
That’s the gap I kept hitting.
The core idea: pandas-like ops as Torch custom operators
Instead of trying to make pandas work in TorchScript, I flipped the approach:
What if pandas-like operations were implemented directly as Torch custom ops?
That means:
- inputs are torch::Tensor
- logic lives in C++ (LibTorch)
- everything is TorchScript-compatible
- the entire pipeline can be exported and run in C++
This is what xpandas is.
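To make the tensor-in, tensor-out idea concrete, here is a minimal sketch of two pandas-like operations written so that TorchScript can compile them. It is not the xpandas API: the names `groupby_sum` and `rolling_mean` are hypothetical, and the sketch uses built-in tensor ops from Python rather than C++ custom operators. It also assumes group keys are already encoded as integers in `[0, num_groups)`.

```python
import torch


@torch.jit.script
def groupby_sum(keys: torch.Tensor, values: torch.Tensor, num_groups: int) -> torch.Tensor:
    # keys: int64 group ids in [0, num_groups); values: one value per row.
    out = torch.zeros([num_groups], dtype=values.dtype, device=values.device)
    # index_add_ accumulates values[i] into out[keys[i]], i.e. a group-wise sum.
    return out.index_add_(0, keys, values)


@torch.jit.script
def rolling_mean(values: torch.Tensor, window: int) -> torch.Tensor:
    # unfold yields every full window of length `window` with stride 1,
    # so the result has len(values) - window + 1 entries (no NaN padding).
    return values.unfold(0, window, 1).mean(dim=1)


keys = torch.tensor([0, 1, 0, 2, 1])
values = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])

sums = groupby_sum(keys, values, 3)   # tensor([4., 7., 4.])
means = rolling_mean(values, 2)       # tensor([1.5000, 2.5000, 3.5000, 4.5000])
```

Unlike pandas' `rolling(...).mean()`, this sketch drops the incomplete leading windows instead of filling them with NaN. String or categorical keys would need to be mapped to integer ids before reaching the tensor code.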
What xpandas is (and what it is not)
What it is
- A small, opinionated subset of pandas-like DataFrame operations
- Implemented as Torch C++ custom operators
- Fully compatible with torch.jit.script
- Designed for inference pipelines, not exploratory analysis
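As a rough illustration of what "inference pipeline" means here, the sketch below bundles a feature step and a model into one scripted module and saves it as a single artifact. Everything in it is an assumption made for the example: the `Pipeline` class, the rolling-mean feature, and the one-unit linear model are hypothetical and do not come from xpandas.

```python
import io

import torch


class Pipeline(torch.nn.Module):
    """Hypothetical pipeline: rolling-mean feature followed by a linear model."""

    def __init__(self, window: int):
        super().__init__()
        self.window = window
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, prices: torch.Tensor) -> torch.Tensor:
        # Feature step: rolling mean over full windows, as a pure tensor op.
        feats = prices.unfold(0, self.window, 1).mean(dim=1)
        # Model step: treat each rolling mean as a single-feature row.
        return self.linear(feats.unsqueeze(1)).squeeze(1)


# Script preprocessing and model together, then serialize them as one artifact.
scripted = torch.jit.script(Pipeline(window=3))
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Reload from the serialized bytes; no Python class definition is needed to run it.
buffer.seek(0)
restored = torch.jit.load(buffer)

prices = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
predictions = restored(prices)  # one prediction per full window
```

In a real deployment the artifact would be written to a file and loaded from C++ with `torch::jit::load`. If the pipeline uses custom operators, the library that registers them has to be loaded in that process as well.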
What it is not
- A full pandas replacement
- A dataframe library for interactive data science
- A competitor to Polars or Arrow
Repository:
https://github.com/CVPaul/xpandas