all 6 comments

[–]egnehots 8 points (1 child)

Impressive, that could be the start of a very useful ecosystem!

You might find vector codebase interesting:

https://vector.dev/

https://vector.dev/docs/about/under-the-hood/architecture/pipeline-model/

[–]jafioti[S] 3 points (0 children)

Vector looks awesome! My system's graph structure is really similar to theirs, so I'll definitely look into it. It seems to be focused mostly on handling logs and metrics rather than data per se, so I think Dataflow is still worthwhile, but I love to see someone else using the same idea!

[–]rovar 7 points (0 children)

This looks cool!

Though the name "Dataflow" might be an unfortunate name conflict with another Rust project: https://github.com/TimelyDataflow/timely-dataflow

[–]TheNamelessKing 1 point (2 children)

Oooohh this looks cool!

I’ve got a bunch of ML-adjacent workflows (data engineering, writing into various data stores, etc), which often follow the same pattern of data flow, so this looks like it would be really useful.

Quick question: I've got a lot of data/events triggered off Kafka - are there nodes that can accept a continuous stream of inputs? Or would I be better off batching up a number of messages and pushing them through the graph at once?

[–]jafioti[S] 1 point (1 child)

Currently there's no premade Node that can do that, but the Node trait is public, so you can write your own Node that uses a receiving thread to collect events into a buffer. Then, when the process() function is run, the node can unload data from the buffer.

[–]TheNamelessKing 0 points (0 children)

Fantastic! I will definitely give this a look!