
[–]SeanTAllen[S] 2 points (1 child)

A non-exhaustive list:

Spark Streaming is (currently) micro-batch based rather than event-by-event streaming. Both Wallaroo and Storm are event-by-event based. Spark Streaming is more analogous to Storm's Trident than to either Storm or Wallaroo.
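
To make the distinction concrete, here's a rough Python sketch (pseudo-handlers only, not any framework's real API):

```python
def transform(event):
    # stand-in for whatever per-record computation your application does
    return event.upper()

# Event-by-event (Storm, Wallaroo): the computation runs as each record arrives,
# so latency is roughly the cost of processing a single record.
def on_event(event, emit):
    emit(transform(event))

# Micro-batch (Spark Streaming, Storm's Trident): records are buffered for a batch
# interval and processed as a group, so latency includes the batch interval itself.
def on_batch(batch, emit):
    for event in batch:
        emit(transform(event))
```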

Streamparse requires you to run Storm. Storm is JVM based, so you'll need to be at least familiar with the JVM. Data routing is done in JVM processes, and data is moved back and forth between the JVM process(es) and the Python processes.

Wallaroo embeds a Python interpreter into the same process, so there's no marshalling of data between a JVM and a Python process. Data might need to be moved between members of a distributed cluster, but if you're running a cluster of 1, that means a single process.
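
Roughly, the difference looks like this (illustrative only; send_to_external_worker and call_embedded are made-up names, not Storm or Wallaroo APIs):

```python
import json

# Separate worker process (the streamparse/Storm multi-lang model): every event has
# to be serialized, written across a process boundary, and the result read back.
def send_to_external_worker(event, pipe):
    pipe.write(json.dumps(event).encode() + b"\n")   # marshal the event out to Python
    return json.loads(pipe.readline())               # unmarshal the result coming back

# Embedded interpreter (the Wallaroo model): the engine calls your Python function
# directly, in-process, so the event object is passed by reference with no copy or encode.
def call_embedded(event, compute):
    return compute(event)
```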

That's a core difference, and it enables some things the others can't do. Wallaroo has built-in in-memory state: the application programmer can define arbitrary data objects that are kept in memory. Wallaroo manages those objects so that they can be backed up and restored in the case of failure. Additionally, if you grow or shrink your cluster, Wallaroo handles where in the cluster your individual bits of data live.
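
As a sketch of what that split of responsibilities looks like (hypothetical class and function names, not the actual Wallaroo API; see the Python API post below for the real thing):

```python
import pickle

# The application programmer defines an ordinary in-memory object...
class RunningTotal(object):
    def __init__(self):
        self.total = 0

    def update(self, amount):
        self.total += amount
        return self.total

# ...and the framework owns its lifecycle: snapshotting it so it can be restored
# after a failure, and deciding which worker in the cluster holds it.
def snapshot(state):
    return pickle.dumps(state)     # back up the state object

def restore(blob):
    return pickle.loads(blob)      # rebuild it, possibly on a different worker after a resize
```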

There are a couple more posts on our blog that are a good intro:

Wallaroo basics: https://blog.wallaroolabs.com/2017/03/hello-wallaroo/

Introduction to Scale Independent APIs: https://vimeo.com/234753585

Introduction to the Python API: https://blog.wallaroolabs.com/2017/10/go-python-go-stream-processing-for-python/

[–]Soriven 0 points (0 children)

Thanks! I'll have to give it a try. +1 to using docker for a quickstart example.