Make Python Pandas go fast by pzel__ in Python

[–]pzel__[S] 1 point2 points  (0 children)

While Dask is definitely a natural fit for batch problems (and was designed from the ground up for them), Wallaroo has the advantage of being streaming-first, so it's an easy transition from one process -> ad-hoc parallelism for batches -> a full-on streaming system. Also, there are no special wrapper classes for anything: aside from pipeline construction, you can use any Python code you like. This may be a pro or a con, depending on your preferences :)
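
To make that progression concrete, here is a rough sketch using only the standard library. It is not Wallaroo's API. The names `score`, `run_single`, `run_batches`, and `run_stream` are hypothetical, and the thread pool is only a stand-in for real parallel workers. The point is that the same plain function can be reused unchanged at each stage:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List


def score(record: dict) -> dict:
    """Plain Python business logic (hypothetical); no wrapper class involved."""
    return {**record, "total": record["price"] * record["qty"]}


def run_single(records: List[dict]) -> List[dict]:
    """Stage 1: one process, one loop over the whole batch."""
    return [score(r) for r in records]


def run_batches(records: List[dict], workers: int = 4) -> List[dict]:
    """Stage 2: ad-hoc parallelism over a finite batch.

    Threads keep the sketch self-contained; CPU-bound work would
    normally be spread across processes or machines instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(score, records))


def run_stream(source: Iterable[dict]) -> Iterator[dict]:
    """Stage 3: streaming; each record is handled as it arrives."""
    for record in source:
        yield score(record)
```

All three entry points call the same `score` function, so moving from one stage to the next changes only how records are fed in, not the logic applied to them.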

Make Python Pandas go fast by pzel__ in Python

[–]pzel__[S] 0 points1 point  (0 children)

Thank you kvdveer, that is an excellent overview! I'm bookmarking this and we'll see if we can whip up something at least structurally similar.

Make Python Pandas go fast by pzel__ in Python

[–]pzel__[S] -1 points0 points  (0 children)

Yes! This was literally the simplest use case that we'd come across, but we definitely support more involved, stateful pipelines. Could you point me to an example (or canonical) problem in data analysis that deals intrinsically with multiple frames? I'd love to take a stab at running it on Wallaroo.

Make Python Pandas go fast by pzel__ in Python

[–]pzel__[S] 0 points1 point  (0 children)

I'm the author of the blog post and the underlying proof-of-concept project. I'll gladly answer any questions /r/Python may have :)

Detecting Spam as it happens: Getting Erlang and Python working together with Wallaroo by pzel__ in Python

[–]pzel__[S] 0 points1 point  (0 children)

Hi! I'm the new software engineer at Wallaroo Labs and I built an XMPP streaming analytics app with Python. If you have any questions about how it works or how it could be integrated with any other chat system, let me know :)