Elusion v8.0.0 is the best END-TO-END Data Engineering library written in RUST

#dataengineering #rust #python

Elusion v8.0.0 just dropped with two things I'm genuinely excited about: native SQL execution and a CopyData feature.

Functional API still going strong:

Write queries however you want - Unlike SQL, PySpark, or Polars, you can chain operations in ANY order. No more "wait, does filter go before group_by or after?" Just write what makes sense:

use elusion::prelude::*;
#[tokio::main]
async fn main() -> ElusionResult<()> {
    let sales = CustomDataFrame::new("sales.csv", "sales").await?;

    let result = sales
        .select(["customer_id", "amount", "order_date"])
        .filter("amount > 1000")
        .agg(["SUM(amount) AS total", "COUNT(*) AS orders"])
        .group_by(["customer_id"])
        .having("total > 50000")
        .order_by(["total"], ["DESC"])
        .limit(10)
        .elusion("top_customers")
        .await?;

    result.display().await?;
    Ok(())
}
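
One way a chaining API can accept calls in any order is to store each clause in its own slot and only arrange them into canonical SQL order when the query is built. The sketch below is a hypothetical, std-only illustration of that idea. QueryBuilder and its methods are made-up names for this post, not Elusion's actual types or internals.

```rust
// Hypothetical sketch of order-independent chaining.
// This is NOT Elusion's implementation or API.
#[derive(Default)]
struct QueryBuilder {
    table: String,
    select: Vec<String>,
    filter: Option<String>,
    group_by: Vec<String>,
    having: Option<String>,
    order_by: Vec<String>,
    limit: Option<usize>,
}

impl QueryBuilder {
    fn new(table: &str) -> Self {
        QueryBuilder { table: table.to_string(), ..Default::default() }
    }
    // Each method only records its clause; nothing is ordered yet.
    fn select(mut self, cols: &[&str]) -> Self {
        self.select.extend(cols.iter().map(|c| c.to_string()));
        self
    }
    fn filter(mut self, cond: &str) -> Self {
        self.filter = Some(cond.to_string());
        self
    }
    fn group_by(mut self, cols: &[&str]) -> Self {
        self.group_by.extend(cols.iter().map(|c| c.to_string()));
        self
    }
    fn having(mut self, cond: &str) -> Self {
        self.having = Some(cond.to_string());
        self
    }
    fn order_by(mut self, exprs: &[&str]) -> Self {
        self.order_by.extend(exprs.iter().map(|e| e.to_string()));
        self
    }
    fn limit(mut self, n: usize) -> Self {
        self.limit = Some(n);
        self
    }
    // Clauses are emitted in canonical SQL order, regardless of call order.
    fn to_sql(&self) -> String {
        let cols = if self.select.is_empty() {
            "*".to_string()
        } else {
            self.select.join(", ")
        };
        let mut sql = format!("SELECT {} FROM {}", cols, self.table);
        if let Some(f) = &self.filter {
            sql.push_str(&format!(" WHERE {}", f));
        }
        if !self.group_by.is_empty() {
            sql.push_str(&format!(" GROUP BY {}", self.group_by.join(", ")));
        }
        if let Some(h) = &self.having {
            sql.push_str(&format!(" HAVING {}", h));
        }
        if !self.order_by.is_empty() {
            sql.push_str(&format!(" ORDER BY {}", self.order_by.join(", ")));
        }
        if let Some(n) = self.limit {
            sql.push_str(&format!(" LIMIT {}", n));
        }
        sql
    }
}

fn main() {
    let a = QueryBuilder::new("sales")
        .select(&["customer_id", "SUM(amount) AS total"])
        .filter("amount > 1000")
        .group_by(&["customer_id"])
        .having("SUM(amount) > 50000")
        .order_by(&["total DESC"])
        .limit(10)
        .to_sql();
    // Same clauses, reversed call order.
    let b = QueryBuilder::new("sales")
        .limit(10)
        .order_by(&["total DESC"])
        .having("SUM(amount) > 50000")
        .group_by(&["customer_id"])
        .filter("amount > 1000")
        .select(&["customer_id", "SUM(amount) AS total"])
        .to_sql();
    assert_eq!(a, b);
    println!("{}", a);
}
```

Both chains produce the same SQL string, because ordering is decided at build time rather than at call time.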

Raw SQL when you need it - Sometimes you just want to write SQL. Now you can:

There is a small macro, sql!, that simplifies usage and avoids having to pass &[&df] for each DataFrame included in the query.

use elusion::prelude::*;
#[tokio::main]
async fn main() -> ElusionResult<()> {
    let sales = CustomDataFrame::new("sales.csv", "sales").await?;
    let customers = CustomDataFrame::new("customers.csv", "customers").await?;
    let products = CustomDataFrame::new("products.csv", "products").await?;

    let result = sql!(
        r#"
        WITH monthly_totals AS (
            SELECT 
                DATE_TRUNC('month', s.order_date) as month,
                c.region,
                p.category,
                SUM(s.amount) as total
            FROM sales s
            JOIN customers c ON s.customer_id = c.id
            JOIN products p ON s.product_id = p.id
            GROUP BY month, c.region, p.category
        )
        SELECT 
            month,
            region,
            category,
            total,
            SUM(total) OVER (
                PARTITION BY region, category 
                ORDER BY month
            ) as running_total
        FROM monthly_totals
        ORDER BY month DESC, total DESC
        LIMIT 100
        "#,
        "monthly_analysis",
        sales,
        customers,
        products
    ).await?;

    result.display().await?;
    Ok(())
}
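
For readers curious how a macro can spare you from writing &[&df] by hand, here is a minimal sketch of the general technique: a macro_rules! pattern that takes a variable number of arguments and expands them into a slice of references. Frame, run_sql and sql_sketch! are hypothetical names invented for this illustration. I am not claiming this is how Elusion's sql! macro is written.

```rust
// Hypothetical stand-in for a DataFrame; it only carries a table alias.
struct Frame {
    alias: &'static str,
}

// Hypothetical function that wants an explicit slice of frame references.
fn run_sql(query: &str, result_alias: &str, frames: &[&Frame]) -> String {
    let names: Vec<&str> = frames.iter().map(|f| f.alias).collect();
    format!("{}: {} [{}]", result_alias, query.trim(), names.join(", "))
}

// The macro accepts any number of frames and builds the &[&frame, ...] slice.
macro_rules! sql_sketch {
    ($query:expr, $alias:expr, $($df:expr),+ $(,)?) => {
        run_sql($query, $alias, &[$(&$df),+])
    };
}

fn main() {
    let sales = Frame { alias: "sales" };
    let customers = Frame { alias: "customers" };

    // Without the macro: the caller builds the slice by hand.
    let manual = run_sql("SELECT 1", "result", &[&sales, &customers]);
    // With the macro: frames are listed as plain arguments.
    let via_macro = sql_sketch!("SELECT 1", "result", sales, customers);

    assert_eq!(manual, via_macro);
    println!("{}", via_macro);
}
```

The frames are only borrowed inside the expansion, so they stay usable after the call.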

COPY DATA:
Now you can read and write between files in true streaming fashion:

There are two ways to do it: 1. Custom configuration, 2. Simplified file conversion

// Custom Configuration
copy_data(
    CopySource::File {
        path: "C:\\Borivoj\\RUST\\Elusion\\bigdata\\test.json",
        csv_delimiter: None,
    },
    CopyDestination::File {
        path: "C:\\Borivoj\\RUST\\Elusion\\CopyData\\test.csv",
    },
    Some(CopyConfig {
        batch_size: 500_000,
        compression: None,
        csv_delimiter: Some(b','),
        infer_schema: true,
        output_format: OutputFormat::Csv,
    }),
).await?;

// Simplified file conversion
copy_file_to_parquet(
    "input.json",
    "output.parquet",
    Some(ParquetCompression::Uncompressed), // or Snappy
).await?;
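
To give a feel for what copying in batches means, here is a small std-only sketch: rows are read from the source a batch at a time and written out before the next batch is read, so memory use is bounded by the batch size rather than the file size. copy_in_batches and flush_batch are hypothetical helpers for illustration only. This works on plain lines and says nothing about how Elusion's CopyData handles formats, schemas or compression internally.

```rust
use std::io::{self, BufRead, Write};

// Hypothetical helper: write out the buffered rows and empty the buffer.
fn flush_batch<W: Write>(writer: &mut W, batch: &mut Vec<String>) -> io::Result<()> {
    for row in batch.drain(..) {
        writeln!(writer, "{}", row)?;
    }
    Ok(())
}

// Hypothetical helper: copy line-oriented data in fixed-size batches.
// Returns (rows copied, batches written).
fn copy_in_batches<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    batch_size: usize,
) -> io::Result<(usize, usize)> {
    let batch_size = batch_size.max(1); // guard against a zero batch size
    let mut batch: Vec<String> = Vec::with_capacity(batch_size);
    let mut rows = 0;
    let mut batches = 0;

    for line in reader.lines() {
        batch.push(line?);
        rows += 1;
        // Only one batch is ever held in memory at a time.
        if batch.len() == batch_size {
            flush_batch(&mut writer, &mut batch)?;
            batches += 1;
        }
    }
    // Write whatever is left over in the final, partial batch.
    if !batch.is_empty() {
        flush_batch(&mut writer, &mut batch)?;
        batches += 1;
    }
    writer.flush()?;
    Ok((rows, batches))
}

fn main() -> io::Result<()> {
    // In-memory source and destination keep the example self-contained;
    // in practice these would be buffered file handles.
    let source = "id,amount\n1,100\n2,250\n3,75\n";
    let mut dest: Vec<u8> = Vec::new();

    let (rows, batches) = copy_in_batches(source.as_bytes(), &mut dest, 2)?;
    println!("copied {} rows in {} batches", rows, batches);
    Ok(())
}
```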

If you are hearing about Elusion for the first time, below are some core features:

🏢 Microsoft Fabric - OneLake connectivity

☁️ Azure Blob Storage connectivity

📁 SharePoint connectivity

📡 FTP/FTPS connectivity

📊 Excel file operations

🐘 PostgreSQL database connectivity

🐬 MySQL database connectivity

🌐 HTTP API integration

📈 Dashboard Data visualization

⚡ CopyData High-performance streaming operations

Built-in formats: CSV, JSON, Parquet, Delta Lake, XML, Excel

Plus:

Redis caching + in-memory query cache

Pipeline scheduling with tokio-cron-scheduler

Materialized views

To learn more about the crate, visit: https://github.com/DataBora/elusion
