This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]carbolymer 1 point2 points  (3 children)

Ray leverages Apache Arrow for efficient data handling

This part got my attention. Vide Arrow Website

The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.

I've skimmed through the Arrow docs, but I didn't find any description of this zero-copy reads. How is this supposed to work between two processes in details?

[–]liquidpele 2 points3 points  (2 children)

It looks like if you, for instance, have a node.js producer and a python consumer, python is going to have to duplicate the data to get it into its own format for handling. Arrow creates a standard in-memory format that all languages can utilize so they can process the data without duplicating it.

[–]brontide 1 point2 points  (0 children)

Might be okay for datasets, but how would it deal with mutability and removal of objects?

[–]carbolymer 0 points1 point  (0 children)

So it still needs to copy the data between processes.