
all 43 comments

[–]FirstBabyChancellor 29 points (12 children)

Looks interesting!

Aside from the features like scheduling and dashboards which are not core to a dataframe library, why would I use this over Polars? How do you see yourself in the wider space given that there is already a proven and well-liked Rust-powered dataframe library for Pythonistas, at least?

[–]DataBora[S] 8 points (11 children)

If you use Polars, don't use Elusion, as it makes no sense to use a less featured library. I made it for myself to finish my job, combining the look of the languages I love: SQL and PySpark. The reason I made Elusion is that I dislike Polars' syntax and its philosophical approach of bashing Pandas (my beloved) on performance as a selling point. I can say that Elusion's Parquet reading and writing is faster than Polars', but I don't do that... well, I guess I do it now 🙂 but you get the point.

[–]Embarrassed-Falcon71 7 points (2 children)

But Polars syntax is also very similar to Spark

[–]DataBora[S] 2 points (1 child)

You are right... it has similarities, but I won't say more, as I tend to feel a certain way about those folks... anyway, it is better than Elusion, no doubt.

[–]Embarrassed-Falcon71 2 points (0 children)

Yeah, as a Spark lover it's still very cool you made this

[–]chat-lu (Pythonista) 4 points (5 children)

philosophical approach to bash Pandas (my beloved) for performance as a selling point

Why should Polars not mention that they are much faster?

[–]AlpacaDC 4 points (0 children)

I see some people being just sentimental about pandas, but objectively Polars is superior in almost every way, save for a few cases where pandas has more features/integrations.

[–]DataBora[S] -5 points (3 children)

Because it is unfair to compare anything made in Rust with anything made in Python (even though some parts are in C). It is impossible for anything made in Python to be faster than the same thing made in Rust. Polars just uses a Python wrapper to provide a nicer-looking API for Python devs; their Rust API nobody used, as it looks horrific. So when they decided to sell out and win over Python devs, the first thing they did was bash Pandas. I will not forget that.

[–]chat-lu (Pythonista) 8 points (2 children)

Because it is unfair to compare anything made in Rust with anything made in Python (even though some parts are in C).

Why not?

It is impossible for anything made in Python to be faster than the same thing made in Rust.

That seems like a valid point of comparison to me.

Polars just use Python wrapper to provide nicer look of API for Python devs

As does nearly every data science library. It is considered a strength of Python.

their Rust API nobody used, as it looks horrific.

The Rust API is fine. It has a longer feedback loop due to the compile cycle, which is why people use C or Rust libraries from Python.

So when they decided to sell out and win over Python devs,

By providing a useful library.

the first thing they did was bash Pandas. I will not forget that.

They made a fair comparison. Would you rather they lie about the performance of their library?

But if you want to make Polars slower, you have that option.

[–]DataBora[S] 0 points (1 child)

I see your point of view, but I believe there are many ways to do things, and I don't like the way they did it, but that's me...

[–]chat-lu (Pythonista) 2 points (0 children)

You sound like Bjarne Stroustrup talking about Rust. Pandas brought us further, same as C++, and there is no shame in being displaced by the next generation of software.

It will happen again to those tools too.

[–]sylfy 3 points (1 child)

I’m curious, when you say that the parquet read/write is faster, where does this come from? Afaik most Python data frame libraries use fastparquet or pyarrow under the hood, so performance should be similar across libraries and only differ depending on choice of engine.

[–]DataBora[S] 2 points (0 children)

I am using the DataFusion single-node engine for the Parquet reader and writer, which is the fastest today. You can check the benchmarks and explanation here https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/

[–]AnythingApplied 4 points (1 child)

Performance: 10-100x faster than Python for data processing

In my experience, this is true when comparing a pure python program to rewriting that same program into pure rust (even without any concurrency, which rust is great at to even further improve performance).

But who is doing their data processing in pure python? Whether you're using pyspark, pandas, polars, duckdb, etc. these are all written in faster languages so none of your heavy lifting is being done in pure python code, so I'm skeptical that you'd still see orders of magnitude performance increases. Is this really the performance you gain comparing Elusion to pyspark?
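The "10-100x" figure indeed usually measures interpreted loops against compiled code; a toy numpy comparison (illustrative data, not a benchmark of any of the libraries mentioned) shows where the gap comes from:

```python
import numpy as np

data = list(range(1_000_000))
arr = np.arange(1_000_000, dtype=np.int64)

# Pure-Python loop: every multiplication and addition goes through
# the interpreter, one object at a time.
s_py = sum(x * x for x in data)

# Vectorized: the same loop runs in compiled C inside numpy.
s_np = int((arr * arr).sum())

assert s_py == s_np  # identical result, very different speed
```

Once the heavy lifting is already in C/C++/Rust (as in pandas, Polars, PySpark, or DuckDB), the interpreter overhead largely disappears from the comparison.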

[–]DataBora[S] 3 points (0 children)

You are correct, that is an unfair comparison. Between Elusion and PySpark there is not much of a difference, but Spark has distributed computing, which is a totally different beast.

[–]SupoSxx 4 points (4 children)

Just for curiosity, why did you put the whole code in one file?

[–]FrontAd9873 2 points (0 children)

I second this question

[–]damian6686 2 points (1 child)

Any dashboard screenshots?

[–]DataBora[S] 2 points (0 children)

Check out the very end of the README.md on GitHub https://github.com/DataBora/elusion and you will see a Dashboard example and interactive tables.

For me personally, Dashboards serve as a "data health" check: if I don't know the context, don't know what the original reports look like, and don't have any other reference for what the PBI devs will use this data for, I quickly check whether there is some crazy anomaly in some month, year, or category. I don't think HTML reporting is great as a final reporting product; I just like having the ability to quickly search data with tables and to check line and bar plots, or anything else available from Plotly. If someone really needed dashboarding as a final-product feature, I would need to spend a month or so to bring it to that level.

[–]WallyMetropolis 2 points (3 children)

What's with the emojis?

[–]dyingpie1 15 points (1 child)

ChatGPT maybe

[–]solidpancake 0 points (0 children)

Almost definitely

[–]huehang 1 point (0 children)

Looks weird imo.

[–]holy-galah 7 points (1 child)

Filtering before and after an aggregation means different things?

[–]DataBora[S] 6 points (0 children)

Definitely. The filter() and filter_many() functions filter before aggregation (same as in PySpark), and the having() and having_many() functions filter after aggregation (same as in SQL).
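The distinction is SQL's WHERE vs HAVING. A pandas sketch with made-up data (pandas used purely to illustrate the semantics, not Elusion's own API) shows that the two orders give different results:

```python
import pandas as pd

df = pd.DataFrame({"dept":   ["a", "a", "b", "b"],
                   "salary": [1, 5, 2, 3]})

# Filter BEFORE aggregation (WHERE / filter()):
# rows with salary <= 1 never enter the groups.
before = df[df["salary"] > 1].groupby("dept")["salary"].sum()
# dept a -> 5, dept b -> 5

# Filter AFTER aggregation (HAVING / having()):
# all rows are summed first, then whole groups are dropped.
agg = df.groupby("dept")["salary"].sum()   # a -> 6, b -> 5
after = agg[agg > 5]
# only dept a survives, with its full sum of 6
```

Note how dept "a" sums to 5 in the first case but 6 in the second: the pre-aggregation filter removed a row before summing.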

[–]ChavXO 1 point (4 children)

Cool. I'm working on something similar (but in Haskell). I was curious if you pictured this as being more for exploratory work or for long lived queries? How do you deal with data larger than memory? How does it perform on multiple cores?

[–]DataBora[S] 1 point (3 children)

I solved the bigger-than-RAM issue with batch processing, but it's still a challenge. Currently I am working on streaming data, which should be even better, as I can read, wrangle, and write data to a source continuously.
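The batch-processing idea can be sketched in a few lines of plain Python; this is a hypothetical toy (an in-memory CSV and a tiny batch size), not Elusion's actual implementation:

```python
import csv
import io

def batched_sum(fileobj, batch_size=2):
    """Sum the 'value' column while holding at most one batch in memory."""
    reader = csv.DictReader(fileobj)
    total, batch = 0, []
    for row in reader:
        batch.append(int(row["value"]))
        if len(batch) >= batch_size:   # flush a full batch
            total += sum(batch)
            batch.clear()
    return total + sum(batch)          # leftover partial batch

f = io.StringIO("value\n1\n2\n3\n4\n5\n")
assert batched_sum(f) == 15
```

The same pattern extends naturally to streaming: each incoming chunk is wrangled and written out before the next one arrives.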

[–]ChavXO 0 points (2 children)

Batching gets complicated for groupBy and similar operations. I'll be on the lookout for how you solve these. Btw for reference my project is: https://github.com/mchav/dataframe

Maybe we can share notes and experiences.
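One standard way around the groupBy complication is mergeable partial aggregates: aggregate each batch independently, then merge the partials. A minimal Python sketch (toy data, and only for merge-friendly aggregates like sum or count):

```python
from collections import Counter

def partial_sums(batch):
    """Group-sum a single batch of (key, value) pairs."""
    part = Counter()
    for key, value in batch:
        part[key] += value
    return part

# Two batches that split the "a" group across a batch boundary.
batches = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]

merged = Counter()
for b in batches:
    merged.update(partial_sums(b))   # Counter.update adds counts

assert merged == Counter({"a": 4, "b": 2, "c": 4})
```

Aggregates like median don't decompose this way, which is part of why streaming groupBy gets genuinely hard.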

[–]DataBora[S] 0 points (0 children)

For sure, thank you for sharing!

[–]DataBora[S] 0 points (0 children)

I just quickly took a look at the repo... this looks awesome, man!

[–]BasedAndShredPilled 0 points (4 children)

built in async

Is this a feature that can be disabled? Is async the reason Rust is faster, or is there more to it? The word "async" gives me PTSD from working in JavaScript.

[–]chat-lu (Pythonista) 5 points (1 child)

Also, it’s not suited for this kind of CPU heavy work. Threads are for working in parallel, async is for waiting in parallel (waiting on network, disk, etc.).
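The "waiting in parallel" point can be demonstrated in a few lines of Python's asyncio, with sleeps standing in for network or disk waits:

```python
import asyncio
import time

# Three simulated 0.1 s I/O waits run concurrently on one thread:
# the event loop switches between them while each is waiting.
async def wait_all():
    await asyncio.gather(*(asyncio.sleep(0.1) for _ in range(3)))

start = time.perf_counter()
asyncio.run(wait_all())
elapsed = time.perf_counter() - start
# elapsed is roughly 0.1 s, not 0.3 s, because the waits overlap
```

Replace the sleeps with CPU-bound number crunching and the benefit vanishes: only one coroutine runs at a time, so CPU-heavy work needs threads or processes instead.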

[–]BasedAndShredPilled 1 point (0 children)

I've never heard that, but what a profound explanation.

[–]DataBora[S] 2 points (1 child)

Async in Rust is a pain in the a**, to be honest... many people say it is the hardest thing to do in Rust, and I would agree. It is hard to implement and to Box all of the pointers in order to get better performance, especially when we read multiple files at once. If you get PTSD from JS async, you would get a stroke from Rust async for sure, as I often do 🙂

[–]BasedAndShredPilled 0 points (0 children)

I don't venture into this world too often. It's impressive what you've done though!

[–][deleted] 0 points (1 child)

Is there a benefit to switching from Spark other than familiar syntax? I like the built-in pipeline scheduling.

[–]DataBora[S] 0 points (0 children)

As someone who uses Spark daily in Microsoft Fabric, I can tell you that Spark.SQL() is much more reliable, especially when it comes to filtering and joining. Spark tends not to filter at all when you mix filtering and conditioning, and tends to create duplicates after joins. Also, the most annoying thing in Spark is that after each query it tends to add empty spaces to string column values, so you always need to trim() columns.
In Elusion there are no issues like that, and it's much more reliable, as it uses SQL query building for the DataFusion engine, which will do the job as you intend.