Borivoj Grujicic

Posted on Sep 17

Elusion Celebrates 50K+ Downloads: A Modern Alternative to Pandas and Polars for Data Engineering

The Rust data ecosystem has reached another milestone: the Elusion DataFrame Library has surpassed 50,000 downloads on crates.io. As data engineers and analysts who love SQL syntax continue seeking alternatives to Pandas and Polars, Elusion has emerged as a compelling option that combines the familiarity of DataFrame operations with capabilities that set it apart from the competition.

What Makes Elusion Different

While Pandas and Polars excel in their respective domains, Elusion brings several distinctive features that address gaps in the current data processing landscape:

1. Native Multi-Format File Support Including XML

While Pandas and Polars support common formats like CSV, Excel, Parquet, and JSON, Elusion goes further by offering native XML parsing capabilities. Unlike Pandas and Polars, which require external libraries and manual parsing logic for XML files, Elusion automatically analyzes XML file structure and chooses the optimal processing strategy:

`// XML files work just like any other format

let xml_path = r"C:\path\to\sales.xml"; // raw string, so backslashes are not treated as escapes

let df = CustomDataFrame::new(xml_path, "xml_data").await?;`
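
To illustrate how a single constructor can accept many file formats, here is a minimal sketch that picks a format from the file extension. The `FileFormat` enum and `detect_format` function are hypothetical names invented for this example; the sketch does not show how Elusion analyzes files internally.

```rust
use std::path::Path;

// Hypothetical format tag; not a type from the Elusion crate.
#[derive(Debug, PartialEq)]
enum FileFormat {
    Csv,
    Excel,
    Json,
    Parquet,
    Xml,
}

// Picks a format from the file extension, ignoring case.
// Returns None when the extension is missing or unknown.
fn detect_format(path: &str) -> Option<FileFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(FileFormat::Csv),
        "xlsx" | "xls" => Some(FileFormat::Excel),
        "json" => Some(FileFormat::Json),
        "parquet" => Some(FileFormat::Parquet),
        "xml" => Some(FileFormat::Xml),
        _ => None,
    }
}

fn main() {
    for path in ["data/sales.xml", "data/sales.CSV", "data/notes.txt"] {
        println!("{path}: {:?}", detect_format(path));
    }
}
```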

2. Flexible Query Construction Without Strict Ordering

Unlike DataFrame libraries that enforce a specific operation sequence, Elusion lets you build queries in any order that fits your logic. Whether you filter before selecting or call agg before group_by, Elusion ensures consistent results regardless of function call order.

`// Write operations in the order that makes sense to you

sales_df

.filter("amount > 1000")

.join(customers_df, ["s.CustomerKey = c.CustomerKey"], "INNER")

.select(["c.name", "s.amount"])

.agg(["SUM(s.amount) AS total"])

.group_by(["c.region"])`

The same result is achieved with a different function order:
`sales_df

.join(customers_df, ["s.CustomerKey = c.CustomerKey"], "INNER")

.select(["c.name", "s.amount"])

.agg(["SUM(s.amount) AS total"])

.group_by(["c.region"])

.filter("amount > 1000")`
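
One way a builder can make call order irrelevant is to store each clause in its own slot and assemble the SQL in a fixed order at the end. The sketch below shows that idea with a hypothetical `QueryBuilder` type; it is an assumption about the general technique, not Elusion's actual implementation.

```rust
// Hypothetical builder: each method only records its clause.
#[derive(Default)]
struct QueryBuilder {
    table: String,
    columns: Vec<String>,
    aggregates: Vec<String>,
    filters: Vec<String>,
    groups: Vec<String>,
}

impl QueryBuilder {
    fn from_table(table: &str) -> Self {
        QueryBuilder { table: table.to_string(), ..Default::default() }
    }

    fn select(mut self, cols: &[&str]) -> Self {
        self.columns.extend(cols.iter().map(|c| c.to_string()));
        self
    }

    fn agg(mut self, exprs: &[&str]) -> Self {
        self.aggregates.extend(exprs.iter().map(|e| e.to_string()));
        self
    }

    fn filter(mut self, condition: &str) -> Self {
        self.filters.push(condition.to_string());
        self
    }

    fn group_by(mut self, cols: &[&str]) -> Self {
        self.groups.extend(cols.iter().map(|c| c.to_string()));
        self
    }

    // Clauses are emitted in SQL's canonical order, whatever the call order was.
    fn to_sql(&self) -> String {
        let mut projection = self.columns.clone();
        projection.extend(self.aggregates.iter().cloned());
        let mut sql = format!("SELECT {} FROM {}", projection.join(", "), self.table);
        if !self.filters.is_empty() {
            sql.push_str(&format!(" WHERE {}", self.filters.join(" AND ")));
        }
        if !self.groups.is_empty() {
            sql.push_str(&format!(" GROUP BY {}", self.groups.join(", ")));
        }
        sql
    }
}

fn main() {
    let a = QueryBuilder::from_table("sales")
        .filter("amount > 1000")
        .select(&["region"])
        .agg(&["SUM(amount) AS total"])
        .group_by(&["region"]);
    let b = QueryBuilder::from_table("sales")
        .group_by(&["region"])
        .agg(&["SUM(amount) AS total"])
        .select(&["region"])
        .filter("amount > 1000");
    // Both call orders produce the same SQL text.
    assert_eq!(a.to_sql(), b.to_sql());
    println!("{}", a.to_sql());
}
```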

2. Built-in External Data Source Integration

While Pandas and Polars require additional libraries for cloud storage and database connectivity, Elusion provides native support for:

Azure Blob Storage with SAS token authentication
SharePoint integration for enterprise environments
PostgreSQL and MySQL database connections
REST API data ingestion with customizable headers and pagination
Multi-format file loading from folders with automatic schema merging

3. Advanced Caching Architecture

Elusion offers sophisticated caching capabilities that go beyond what's available in Pandas or Polars:

Native caching for local development and single-instance applications

Redis caching for distributed systems and production environments

Materialized views with TTL management

Query result caching with automatic invalidation
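
To make the TTL idea concrete, here is a minimal in-memory sketch of a cache whose entries expire after a time-to-live. The `TtlCache` type is hypothetical and uses only the standard library; it illustrates the concept and is not Elusion's native or Redis cache.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical cache: each value is stored with its expiry time.
struct TtlCache {
    entries: HashMap<String, (String, Instant)>,
}

impl TtlCache {
    fn new() -> Self {
        TtlCache { entries: HashMap::new() }
    }

    // `now` is passed in explicitly so that expiry is easy to test.
    fn put(&mut self, key: &str, value: &str, ttl: Duration, now: Instant) {
        self.entries
            .insert(key.to_string(), (value.to_string(), now + ttl));
    }

    // Returns the cached value, or None if it is missing or expired.
    // Expired entries are removed on access.
    fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some((_, expires_at)) => now >= *expires_at,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(value, _)| value.clone())
    }
}

fn main() {
    let start = Instant::now();
    let mut cache = TtlCache::new();
    cache.put("sales_join", "cached rows", Duration::from_secs(3600), start);

    // Ten seconds in, the entry is still valid.
    println!("{:?}", cache.get("sales_join", start + Duration::from_secs(10)));
    // After the one-hour TTL has passed, the entry is gone.
    println!("{:?}", cache.get("sales_join", start + Duration::from_secs(3600)));
}
```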

4. Production-Ready Pipeline Scheduling

Unlike Pandas and Polars which focus primarily on data manipulation, Elusion includes a built-in pipeline scheduler for automated data engineering workflows:

`let scheduler = PipelineScheduler::new("5min", || async {

// Your data pipeline logic here

let df = CustomDataFrame::from_azure_with_sas_token(url, token, None, "data").await?;

df.select(["*"]).write_to_parquet("overwrite", "output.parquet", None).await?;

Ok(())

}).await?;`
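
The scheduler above takes its interval as a string such as "5min". As an illustration only, the sketch below shows one way such a string could be turned into a `Duration`. The `parse_interval` function and the accepted suffixes ("s", "min", "h") are assumptions made for this example; check the Elusion documentation for the interval values it actually supports.

```rust
use std::time::Duration;

// Hypothetical helper: parses strings like "30s", "5min" or "2h".
// Returns None for anything it does not recognize.
fn parse_interval(spec: &str) -> Option<Duration> {
    let spec = spec.trim();
    // Split at the first non-digit character: "5min" -> ("5", "min").
    let digits_end = spec.find(|c: char| !c.is_ascii_digit())?;
    let (number, unit) = spec.split_at(digits_end);
    let n: u64 = number.parse().ok()?;
    let seconds = match unit {
        "s" => n,
        "min" => n.checked_mul(60)?,
        "h" => n.checked_mul(3600)?,
        _ => return None,
    };
    Some(Duration::from_secs(seconds))
}

fn main() {
    for spec in ["5min", "2h", "soon"] {
        println!("{spec}: {:?}", parse_interval(spec));
    }
}
```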

5. Interactive Dashboard Generation

While Pandas requires additional libraries like Plotly or Matplotlib for visualization, Elusion includes built-in interactive dashboard creation:

Generate HTML reports with interactive plots (TimeSeries, Bar, Pie, Scatter, etc.)

Create paginated, filterable tables with export capabilities

Combine multiple visualizations in customizable layouts

No additional dependencies required

6. Streaming Processing Capabilities

Elusion provides streaming options that improve performance when reading and writing large datasets:

`// Stream processing for large files

big_file_df

.select(["column1", "column2"])

.filter("value > threshold")

.elusion_streaming("results").await?;

// Stream writing directly to files

df.elusion_streaming_write("data", "output.parquet", "overwrite").await?;`
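
The general idea behind streaming is to process data piece by piece instead of loading it all into memory. The standard-library sketch below filters and sums a CSV column one line at a time; `sum_over_threshold` is a hypothetical function written for this illustration and says nothing about how Elusion streams internally.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// Reads CSV text line by line and sums the second column for rows
// whose value exceeds `threshold`. Only one line is held in memory at a time.
fn sum_over_threshold<R: Read>(input: R, threshold: f64) -> std::io::Result<f64> {
    let reader = BufReader::new(input);
    let mut total = 0.0;
    // Skip the header row, then handle each record as it arrives.
    for line in reader.lines().skip(1) {
        let line = line?;
        if let Some(field) = line.split(',').nth(1) {
            // Rows whose value does not parse as a number are ignored.
            if let Ok(value) = field.trim().parse::<f64>() {
                if value > threshold {
                    total += value;
                }
            }
        }
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // An in-memory buffer stands in for a large file here.
    let csv = "id,value\n1,5.0\n2,20.0\n3,12.5\n";
    let total = sum_over_threshold(Cursor::new(csv), 10.0)?;
    println!("{total}");
    Ok(())
}
```

The same function accepts a `std::fs::File`, since it only requires the `Read` trait.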

7. Advanced JSON Handling

Elusion offers specialized JSON functions for columns that hold JSON values, which simplify working with complex nested structures:

Extract values from JSON arrays with pattern matching
Handle multiple JSON formats automatically
Convert REST API responses to JSON files, then to DataFrames

`let path = r"C:\RUST\Elusion\jsonFile.csv";

let json_df = CustomDataFrame::new(path, "j").await?;

let df_extracted = json_df.json([

"ColumnName.'$Key1' AS column_name_1",

"ColumnName.'$Key2' AS column_name_2",

"ColumnName.'$Key3' AS column_name_3"

])

.select(["some_column1", "some_column2"])

.elusion("json_extract").await?;`

Performance and Memory Management

Elusion is built on Apache Arrow and DataFusion, providing:

Memory-efficient operations through columnar storage
Redis caching for optimized query execution
Automatic schema inference across multiple file formats
Parallel processing capabilities through Rust's concurrency model

`let sales = r"C:\RUST\Elusion\SalesData2022.csv";

let products = r"C:\RUST\Elusion\Products.csv";

let customers = r"C:\RUST\Elusion\Customers.csv";

let sales_df = CustomDataFrame::new(sales, "s").await?;

let customers_df = CustomDataFrame::new(customers, "c").await?;

let products_df = CustomDataFrame::new(products, "p").await?;

// Connect to Redis (requires Redis server running)

let redis_conn = CustomDataFrame::create_redis_cache_connection().await?;

// Use Redis caching for high-performance distributed caching

let redis_cached_result = sales_df

.join_many([

    (customers_df, ["s.CustomerKey = c.CustomerKey"], "RIGHT"),

    (products_df, ["s.ProductKey = p.ProductKey"], "LEFT OUTER"),

])

.select(["c.CustomerKey", "c.FirstName", "c.LastName", "p.ProductName"])

.agg([

    "SUM(s.OrderQuantity) AS total_quantity",

    "AVG(s.OrderQuantity) AS avg_quantity"

])

.group_by(["c.CustomerKey", "c.FirstName", "c.LastName", "p.ProductName"])

.having_many([

    ("total_quantity > 10"),

    ("avg_quantity < 100")

])

.order_by_many([

    ("total_quantity", "ASC"),

    ("p.ProductName", "DESC")

])

.elusion_with_redis_cache(&redis_conn, "sales_join_redis", Some(3600)) // Redis caching with 1-hour TTL

.await?;

redis_cached_result.display().await?;`

Getting Started with Elusion: Easier Than You Think

For SQL Developers

If you write SQL queries, you already have 80% of the skills needed for Elusion. The mental model is identical - you're just expressing the same logical operations in Rust syntax:

`// Your SQL thinking translates directly:

df.select(["customer_name", "order_total"]) // SELECT

.join(customers, ["id = customer_id"], "INNER") // JOIN

.filter("order_total > 1000") // WHERE

.group_by(["customer_name"]) // GROUP BY

.agg(["SUM(order_total) AS total"]) // Aggregation

.order_by(["total"], ["DESC"]) // ORDER BY
`

For Python/Pandas Users

Elusion feels familiar if you're coming from Pandas:

`sales_df

.join_many([

    (customers_df, ["s.CustomerKey = c.CustomerKey"], "INNER"),

    (products_df, ["s.ProductKey = p.ProductKey"], "INNER"),

])

.select(["c.name", "p.category", "s.amount"])

.filter("s.amount > 1000")

.agg(["SUM(s.amount) AS total_revenue"])

.group_by(["c.region", "p.category"]) 

.order_by(["total_revenue"], ["DESC"])

.elusion("quarterly_report")

.await?`

Installation and Setup

Adding Elusion to your Rust project takes just two lines:

`[dependencies]

elusion = "6.2.0"

tokio = { version = "1.45.0", features = ["rt-multi-thread"] }
`

Enable only the features you need to keep dependencies minimal:

`# Start simple, add features as needed

elusion = { version = "6.2.0", features = ["postgres", "azure"] }`

Then, your first Elusion program would look like this:
`use elusion::prelude::*;

#[tokio::main]

async fn main() -> ElusionResult<()> {

// Load any file format - CSV, Excel, JSON, XML, Parquet

let df = CustomDataFrame::new("data.csv", "sales").await?;

// Write operations that make sense to you

let result = df

    .select(["customer", "amount"])

    .filter("amount > 100")

    .agg(["SUM(amount) AS total"])

    .group_by(["customer"])

    .elusion("analysis").await?;

result.display().await?;

Ok(())

}
`

Perfect for SQL Developers and Python Users Ready to Embrace Rust

If you know SQL, you already understand most of Elusion's power. The library's approach mirrors SQL's flexibility - you can write operations in the order that makes logical sense to you, just like constructing SQL queries. Consider this familiar pattern:

SQL Query:

`SELECT c.name, SUM(s.amount) as total

FROM sales s

JOIN customers c ON s.customer_id = c.id

WHERE s.amount > 1000

GROUP BY c.name

ORDER BY total DESC;`

Elusion equivalent:

`sales_df

.join(customers_df, ["s.customer_id = c.id"], "INNER")

.select(["c.name"])

.agg(["SUM(s.amount) AS total"])

.filter("s.amount > 1000")

.group_by(["c.name"])

.order_by(["total"], ["DESC"])`

The 50,000 download milestone reflects growing recognition that modern data processing needs tools designed for today's distributed, cloud-native environments. SQL developers and Python users are discovering that Rust doesn't have to mean starting from scratch - it can mean taking your existing knowledge and supercharging it.
