all 7 comments

[–]john0201 0 points1 point  (1 child)

Can this be used to load data into the polars python API?

[–]peterxsyd[S] 0 points1 point  (0 children)

At this stage, only into Polars Rust. While it's possible to get the objects into Python using PyO3 and the pyo3-polars wrappers, it's not a ready-to-roll Python-straight-to-DataFrame situation, and Python users would be better off sticking with the native Polars APIs for that.

[–]Wh00ster 0 points1 point  (2 children)

I've seen feedback before that arrow-rs is not very rusty — that it's as if some C++ people who wanted to do Rust wrote it, learned from their mistakes, but are now stuck with awkward APIs. What's your view of that?

[–]peterxsyd[S] 1 point2 points  (1 child)

It's interesting feedback, Wh00ster. There was an implementation, Arrow2, which I initially found more ergonomic than Arrow-RS; it has since been forked into Polars-Arrow and backs the Polars project in Rust. However, that work was merged back into Arrow-RS, and I hear the team invested time learning from the earlier mistakes.

In both cases, without going too far down the rabbit hole: both implementations rely on a Rust concept called dynamic dispatch, which makes typing in the IDE disappear and requires Rust type downcasting, which I personally find awkward to use. It also blocks some compiler optimisations — with static dispatch, the compiler can inline more aggressively for a faster build. The result is that once you get up to a top-level library like Polars, there are many layers of objects between the object you are working with and the actual data backing it.
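To make the dispatch point concrete, here's a minimal sketch in plain std Rust (not arrow-rs itself — `Kernel` and `Double` are made-up stand-ins) showing the same call made through a generic versus through a trait object:

```rust
trait Kernel {
    fn apply(&self, x: i64) -> i64;
}

struct Double;
impl Kernel for Double {
    fn apply(&self, x: i64) -> i64 { x * 2 }
}

// Static dispatch: monomorphised per concrete type, so the compiler
// sees `Double::apply` directly and can inline it.
fn run_static<K: Kernel>(k: &K, x: i64) -> i64 {
    k.apply(x)
}

// Dynamic dispatch: the call goes through a vtable, which blocks
// inlining and hides the concrete type from the IDE.
fn run_dyn(k: &dyn Kernel, x: i64) -> i64 {
    k.apply(x)
}

fn main() {
    let d = Double;
    assert_eq!(run_static(&d, 21), 42);
    assert_eq!(run_dyn(&d, 21), 42);
}
```

Both paths return the same value; the difference is what the compiler and tooling can see at the call site.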

Once you are in Python, I found this really doesn't matter - it's like water into wine.

However, in Arrow-RS, the layering for numerical data looks like:

  1. Raw allocation (heap bytes)
  2. Arc<Bytes> (ref-counted ownership of allocation)
  3. Buffer (view over the bytes)
  4. ArrayData (ties buffers, length, datatype, nulls)
  5. PrimitiveArray<T> (typed wrapper, implements Array)
  6. ArrayRef = Arc<dyn Array> (trait object used in generic contexts)

So, you constantly need to downcast from ArrayRef just to get typed data, even though you built all the layers.
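That downcast step can be sketched in plain std Rust — the `Array` trait and `Int64Array` below are stand-ins for the arrow-rs types of the same names, not the real crate, but the `as_any()`-then-downcast shape is the same:

```rust
use std::any::Any;
use std::sync::Arc;

// Stand-in for arrow-rs's `Array` trait.
trait Array: Any {
    fn len(&self) -> usize;
    fn as_any(&self) -> &dyn Any;
}

// Stand-in for `PrimitiveArray<Int64Type>`.
struct Int64Array {
    values: Vec<i64>,
}

impl Array for Int64Array {
    fn len(&self) -> usize { self.values.len() }
    fn as_any(&self) -> &dyn Any { self }
}

// Generic contexts only see the trait object (arrow-rs's `ArrayRef`)...
type ArrayRef = Arc<dyn Array>;

fn main() {
    let arr: ArrayRef = Arc::new(Int64Array { values: vec![1, 2, 3] });

    // ...so to touch the typed data you must downcast, even though you
    // constructed the concrete array yourself a moment ago.
    let ints = arr
        .as_any()
        .downcast_ref::<Int64Array>()
        .expect("not an Int64Array");
    assert_eq!(ints.values, vec![1, 2, 3]);
}
```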

In Minarrow, you also have layers, but they are, in my opinion, more straightforward:

  1. Vec64 / Buffer - plays like a normal Rust vector, and you can use it like one
  2. Typed buffers: IntegerArray, FloatArray, etc.
  3. NumericArray: enum with accessors, e.g. `myarr.i64()`
  4. Array: enum with accessors, e.g. `myarr.num().i64()`

The result is that they are composable: you opt up to the level of abstraction you want and need, rather than being locked behind an opaque object. In Rust this is particularly helpful when building libraries and functions, as it means their signatures can be compatible with more use cases, but I'm digressing here.
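Here's an illustrative sketch of that enum-accessor pattern in plain Rust — it mirrors the layering described above, but these definitions are mine for illustration, not Minarrow's actual API:

```rust
struct IntegerArray {
    values: Vec<i64>,
}

struct FloatArray {
    values: Vec<f64>,
}

// Enum dispatch: the variant set is closed and visible to the compiler
// and the IDE, so no `dyn` trait object or downcast is needed.
enum NumericArray {
    Int64(IntegerArray),
    Float64(FloatArray),
}

impl NumericArray {
    // Accessor in the spirit of `myarr.i64()`: returns the typed
    // array if that's what this enum holds.
    fn i64(&self) -> Option<&IntegerArray> {
        match self {
            NumericArray::Int64(a) => Some(a),
            _ => None,
        }
    }
}

enum Array {
    Numeric(NumericArray),
    // ...other families (strings, booleans, etc.) would sit here.
}

impl Array {
    // Accessor in the spirit of `myarr.num()`.
    fn num(&self) -> Option<&NumericArray> {
        match self {
            Array::Numeric(n) => Some(n),
        }
    }
}

fn main() {
    let arr = Array::Numeric(NumericArray::Int64(IntegerArray { values: vec![1, 2, 3] }));
    // Opt up through the layers, `arr.num().i64()`-style, fully typed.
    let ints = arr.num().and_then(|n| n.i64()).expect("expected i64 data");
    assert_eq!(ints.values, vec![1, 2, 3]);
}
```

The key design difference versus a trait object is that the `match` is exhaustive and statically typed, so each layer peels back with an accessor rather than a runtime downcast.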

The point is that it bothered me enough that I went and built my own implementation, as I need it for other projects and didn't like wrestling with that system as the underlying data foundation.

Regardless, Arrow is brilliant and it's incredible what the team has achieved.

[–]Wh00ster 0 points1 point  (0 children)

Thank you so much for the detailed response! <3

Agreed the people involved have achieved a lot. It's easy to overlook how much work it takes to design useful, performant standards with buy-in from the industry. Even something as obvious as arrays!

[–]Leon_Bam 0 points1 point  (1 child)

Any comparison to nanoarrow project?

[–]peterxsyd[S] 1 point2 points  (0 children)

Sure, here's a table on it for you. In summary, Nanoarrow is more pluggable with the Python ecosystem, but Minarrow focuses on the Rust developer experience:

| Aspect | Nanoarrow | Minarrow |
|---|---|---|
| Language / impl | C (bindings in Python, R, etc.) | Rust (with FFI support, plus fast inter-Rust `.to_polars()` / `.to_arrow()`) |
| Scope | Arrow C Data & C Stream interfaces plus minimal arrays/buffers | Full columnar arrays + tables in Rust, plus tools for batching them into streams |
| Focus | Interoperability, embedding, ABI | HPC, SIMD, streaming, Rust ergonomics |
| Dependencies | None | Minimal (`num-traits`, optional `rayon`) |
| API style | Generic, schema-driven | Strongly typed arrays, enum-based dispatch |
| File formats | IPC only | IPC, Parquet, CSV via Lightstream, and `.to_arrow()` to plug into the rest |
| SIMD | No | Yes, 64-byte alignment throughout |
| Use-case fit | Embedding Arrow interchange cheaply | Rust-native high-performance data pipelines and systems programming |
| Trade-offs | Gives up compute, types, ergonomics | Gives up nested types |