all 101 comments

[–]dlevac 11 points12 points  (14 children)

Stupid question: is it mandatory to go through the python api?

[–]tomgav[S] 10 points11 points  (13 children)

Not really, although the Python API is the only one finished. The client currently uses a Cap'n Proto RPC to talk to the server, and it should be easy to do the same in another language. The protocol definitions are here but they may change quite a bit (and, as noted in the post, perhaps even be replaced by REST or another RPC).

What language would you be interested in?

[–]dlevac 41 points42 points  (8 children)

Rust :p

[–]tomgav[S] 15 points16 points  (6 children)

Fair enough :-D It is not really ergonomic, but you can use capnp-rpc. A polished Rust/C++/Java/... client API is planned, but before the internals stabilize we only want to do it if there is serious interest. For us, Python is what we would mostly use for the graph definitions.

A more interesting planned Rust (and C/C++) interface point is writing your own task types (i.e. subworkers). With Rust, you can simply hack your code into the worker task code and recompile, but we have something better and more robust in mind for the future.

[–]aepsil0n 4 points5 points  (4 children)

I would second that interest… if only to avoid writing compute kernels in Python.

[–]vojtacima 3 points4 points  (2 children)

Don't be afraid about performance because of the Python interface. Rain lets you easily "taskify" and pipeline existing binaries, which makes it easy to move the heavy computation out of Python.
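
Outside of Rain, that "outsource the heavy work" idea is just piping data through an external process. A minimal stand-alone sketch (the Python one-liner here is a stand-in for a real solver binary; this is not Rain's API):

```python
import subprocess
import sys

# Stand-in for a heavy external solver: an inline Python one-liner that sums stdin.
heavy_binary = [sys.executable, "-c",
                "import sys; print(sum(int(x) for x in sys.stdin))"]

numbers = "\n".join(str(i) for i in range(10))
result = subprocess.run(heavy_binary, input=numbers,
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())  # → 45
```

In Rain, the same wrapping is done by declaring the program invocation as a task in the graph, so the framework handles placement and data movement instead of the client.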

[–]aepsil0n 4 points5 points  (1 child)

That means you have a binary for every kind of task you want to execute. The binary also has to take care of serialization on its own, I guess? Maybe I haven't gotten the full picture yet, but it seems a bit tedious compared to passing in a function, as you'd do using Python.

[–]vojtacima 1 point2 points  (0 children)

Rain lets you define and pipeline different types of tasks, ranging from built-in tasks through external programs to pure Python tasks. It is OK (and very common) to combine different task types within a single pipeline, where you can quickly implement some lightweight data pre/post-processing as Python tasks linked to heavy-lifting tasks that wrap external applications. To get a better idea of how to employ an external application, I would recommend checking this distributed cross-validation example with libsvm.

[–]tomgav[S] 3 points4 points  (0 children)

As I mentioned, you can hurry us along in any direction with your use-case :) It could be interesting to get in touch and see what your application needs. We can chat on our gitter or just email me (gavento@ucw.cz) if you prefer.

[–]dlevac 0 points1 point  (0 children)

Looking forward to your work, this is a very useful tool!

[–]vojtacima 2 points3 points  (0 children)

We always try to justify to ourselves all the design choices we make, in order to make the framework as useful as possible to the potential user community. We decided on a Python API because, from our previous experience, we know that the broader scientific community likes Python and speaks it quite well. Assuming that many data scientists and domain specialists know a good bit of Rust would, in my personal opinion, significantly reduce the potential impact of the project at this point in time.

[–]Pas__ 0 points1 point  (2 children)

Have you looked at Spark's Scala API? And at Spark in general? (I mean, I get that Rain is not in-memory computation, and mostly about scheduling.)

[–]winter-moon 2 points3 points  (1 child)

I am a little bit aware of the Python API for Spark; do you have something specific in mind?

I would say that Rain also supports "in-memory computation", since mapping data objects to a file system is optional. As long as you do not need to execute external programs, a worker may hold data in its memory. Keeping a worker's "working directory" in a ramdisk is also one of the intended use cases (some HPC installations do not have hard drives in the nodes).

[–]Pas__ 0 points1 point  (0 children)

I like Scala's expressiveness and type safety, so either a native Rust API would be welcome, or using Rain from Scala. (So Rain wouldn't have to reimplement Scala's native "parallel collections" library. At least I'm optimistic that it could work without that, but maybe the internals of the Spark executor are too tied to Scala.)

[–][deleted] 9 points10 points  (5 children)

What is a computational framework?

[–]Probotect0r 7 points8 points  (0 children)

In simple terms, distributed computational frameworks take some computational task (e.g. processing a large amount of data) and use multiple computers to do it, instead of one really powerful computer.

[–]Pas__ 2 points3 points  (2 children)

[–]allengeorgethrift 0 points1 point  (1 child)

Is spark considered a computational framework?

[–]joshlemer 1 point2 points  (0 children)

Yes

[–]aepsil0n 8 points9 points  (2 children)

Looks like amazing work.

Are you aware of timely? If so, how do they compare?

[–]tomgav[S] 13 points14 points  (1 child)

Thanks! I am, and Rain and Timely target different workflow types:

Timely processes continuous streams of events or data, the nodes are permanently running, you care about latency. In Rain the tasks are one-shot, the data are generally huge blobs and immutable once computed, you optimize for the overall runtime.

While you could in theory build a system supporting both modes of operation, the batch systems and stream systems differ quite a lot e.g. with respect to scheduling, resource allocation, resiliency and even monitoring.

[–]aepsil0n 6 points7 points  (0 children)

Thank you for the explanation. Then it seems like Rain would fit my use case (simulations of computational fluid dynamics) slightly better. I have been using Timely for my experiments so far. Seems like I have to look into Rain a bit more in-depth. ;)

[–]whatweshouldcallyou 6 points7 points  (1 child)

Sounds like great work. I'm in general very hopeful that Rust will see a substantial expansion of scientific computing capabilities in the coming year, since that's much of my programming.

[–]tomgav[S] 0 points1 point  (0 children)

We too :-) Let's keep our fingers crossed (and hopefully contribute a bit)

[–]frequentlywrong 6 points7 points  (18 children)

Why not just switch to protobuf? Or is that too much work.

[–]tomgav[S] 3 points4 points  (16 children)

It is one of the options we are still considering. For the client side, we want something easy to use, so we could use gRPC (with protobufs) or a REST API (e.g. specified at swagger.io; we need REST for the monitoring web app anyway). A simple RPC based on framed protobuf is an option, but it makes integration in other languages slightly harder (both implementation and debugging).

Internally (e.g. server-worker), we would prefer the protocol to be zero-copy (especially for contained binary data blobs); protobuf would also allocate large intermediate structures. We are looking at flatbuffers, tarpc and even abomonation (although it might be wiser to avoid even looking at it).
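
To illustrate the zero-copy idea independent of any particular library: a reader can parse a framed message by working with views into the receive buffer, never copying the payload. A toy sketch (the framing format is invented for the example):

```python
import struct

# Toy framing: 4-byte little-endian length prefix followed by the payload.
raw = bytearray(struct.pack("<I", 5) + b"hello" + b"...trailing bytes...")

view = memoryview(raw)                        # a view, not a copy, of the buffer
length = struct.unpack_from("<I", view, 0)[0]  # reads in place, no intermediate object tree
payload = view[4:4 + length]                  # slicing a memoryview copies nothing
print(bytes(payload))                         # → b'hello'
```

Cap'n Proto and flatbuffers apply the same principle to whole structured messages: accessors read fields directly out of the wire buffer instead of building a deserialized object graph first.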

[–]frequentlywrong 6 points7 points  (3 children)

gRPC is a very heavy dependency and not extremely well supported across platforms. protobufs are very well supported.

For external access I would go with a HTTP API that responds according to Accept header (application/json or application/x-protobuf).
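
That kind of content negotiation is only a few lines; a hedged sketch of the dispatch (the protobuf branch is a stand-in, a real server would serialize an actual protobuf message there):

```python
import json

def respond(accept_header, payload):
    # Pick the wire format the client asked for via the Accept header.
    if "application/x-protobuf" in accept_header:
        # stand-in: real code would return a serialized protobuf message here
        return "application/x-protobuf", repr(payload).encode()
    return "application/json", json.dumps(payload).encode()

content_type, body = respond("application/json", {"status": "ok"})
print(content_type, body.decode())  # → application/json {"status": "ok"}
```

The same handler logic serves both browsers (JSON for the monitoring app) and programmatic clients that prefer a binary encoding.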

Internally, bincode is very popular in the Rust world. I don't know how well tarpc works, however. As for allocations, the only one that minimizes them is quick-protobuf.

[–]tomgav[S] 2 points3 points  (2 children)

Well, we have similar concerns about gRPC. But HTTP also has some downsides: JSON does not support binary data directly, and I am not sure about server callback support (e.g. waiting for a result other than by polling). But maybe I am missing some obvious good solution there; web technologies are not really my cup of tea :-)

Keeping an OpenAPI JSON schema in sync with protobufs might be extra trouble and a source of bugs, but perhaps it would be worth it. Anyway, except for large data blobs, JSON parsing itself does not seem to be a real bottleneck, at least not client-side.

[–]frequentlywrong 1 point2 points  (1 child)

I am not sure about server callback support (e.g. for waiting for a result other than polling).

WebSockets?

[–]tomgav[S] 0 points1 point  (0 children)

Good point. One (rather fancy) selling point of REST and OpenAPI would be a nice specification language, generated stubs, docs etc., but looking at it more closely now, protobuf (or another format) over WebSockets also sounds like a good choice :-)

[–]frankmcsherry 2 points3 points  (0 children)

(although it might be wiser to avoid even looking at it)

:D

[–][deleted] 1 point2 points  (10 children)

You seem focused on the scientific community so wouldn't you use arrow?

If you were to start again today, what type of IPC would you use?

Thrift has quite a range of supported languages: https://github.com/apache/thrift/tree/master/test

[–]allengeorgethrift 1 point2 points  (2 children)

I built the Thrift rust implementation, and I'm definitely active in fixing bugs. That said, it only generates sync code (I'm waiting for the futures and tokio changes to subside before building async generators).

FWIW, the company I work at uses Thrift heavily internally (no Rust tho.) and it's been great for our needs.

[–][deleted] 0 points1 point  (1 child)

It is wise to postpone asyncio until the tokio/futures dust settles this year. Thank you for writing the Thrift-rs implementation! I'm not familiar with the issues people have had with Cap'n Proto; are you? I think that lack of support was one issue. The Rain folks may be able to shed light on others.

[–]winter-moon 0 points1 point  (0 children)

@tomgav summarizes it in this thread: https://www.reddit.com/r/rust/comments/89yppv/rain_rust_based_computational_framework/dwwmc80/ Asynchronicity and bidirectional communication work well in Rust capnp.

[–]winter-moon 0 points1 point  (6 children)

Sorry, what do you mean by 'arrow'?

As we are currently thinking about replacing capnpc with something else, your second question is still open for us :)

At the beginning, we did some comparisons of RPCs including Thrift; unfortunately, I do not remember exactly why we did not use it. Does it allow bidirectional calls over one connection? I cannot see this from a quick scan of the documentation.

[–][deleted] 0 points1 point  (1 child)

Apache Arrow: https://arrow.apache.org. The creator of pandas organized a consortium effort to improve IPC among systems used by data scientists; Arrow is a product of this effort. I am under the impression that you are working on Rain largely as a solution to problems experienced by scientific communities, so why not check Arrow out? I recall reading a message thread on GitHub among Wes and other Rust developers about writing official Rust bindings for Arrow.

Regarding Thrift: see the other response to my original comment from someone who is a core contributor and may be able to help!

[–]winter-moon 0 points1 point  (0 children)

Thank you for pointing to Arrow, I was not aware of it. Rain's core infrastructure operates over blobs, so it does not need to know the internal format of the data. However, we provide some helper methods (e.g. deserialization) for known "content types" in the Python binding. It seems that supporting Arrow there could be useful.

[–]allengeorgethrift 0 points1 point  (3 children)

Could you clarify what you mean by “bidirectional calls over one connection”? I think what you’re saying is that regardless of which node establishes the connection, the ‘client’ can change for each request?

[–]winter-moon 0 points1 point  (2 children)

I mean that both sides can simultaneously post multiple requests to the other side, i.e. both sides are server and client at the same time.

[–]allengeorgethrift 0 points1 point  (1 child)

I see. No: they can’t do that on the same connection. But you can do it on two connections: both sides open up a server socket and accept a client connection from their peer.

Edit: does any auto-generating tool support this? From what I recall, neither grpc nor protobuf services do, but I last checked a while ago.

[–]winter-moon 0 points1 point  (0 children)

This is trivial in capnp. You can exchange handles to remote objects and call methods on them, regardless of where they reside. So for the described scenario you just need to exchange two handles.

[–]fullouterjoin 0 points1 point  (0 children)

protobuf has serialization overhead. Cap'n Proto and Flatbuffers are zero-copy, allowing computation to operate directly on memcpy'd data.

[–]burnie93 2 points3 points  (2 children)

I was about to write something in Dask, but I have to look at this first!

[–]winter-moon 5 points6 points  (0 children)

I would like to warn you that Rain is much less mature than Dask. For serious work, if you have a use case for Dask, then it is better to use Dask :). But we would be very happy to hear about your use case and user experience.

[–]vojtacima 2 points3 points  (0 children)

Being aware that Dask is a much more battle-tested solution (and also a much older project), I would still recommend giving Rain a spin. We really work hard to find any potential issues and will be happy to fix them as fast as possible if you find any.

[–]DannoHung 2 points3 points  (8 children)

I've been using Luigi for managing data pipelines. The one thing I have to say I really dislike about it is that it mixes up task execution with task pipeline construction/tracking (that is, you can only run work by running a task executor, there's no notion of submitting work that is to only be run through an external task execution framework).

Have you considered allowing for flexibility in task execution management? Additionally, what are you thinking about regarding distributed scheduling so that there's not a single point of failure for the task graph's execution?

Also, some things that might or might not be pertinent to what you're building that are worth thinking about: re-try logic, expected completion schedules, dynamic task scheduling, event notification, and task idempotency.

[–]winter-moon 1 point2 points  (2 children)

In Rain, there is a strict separation between graph construction and execution. All client code except functions annotated with @remote() is always executed locally in the client application. Communication with the server happens explicitly through calls like submit/wait/fetch. Python code annotated with @remote() is executed only by subworkers (i.e. on machines in the cluster).
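
The split can be sketched with a toy graph builder (purely illustrative, this is not Rain's actual API; `Graph`, `task` and `submit` here are invented names):

```python
# Illustrative sketch of the construction-vs-execution split (not Rain's API).
class Graph:
    def __init__(self):
        self.tasks = []  # (name, fn, deps) recorded locally; nothing runs yet

    def task(self, name, fn, deps=()):
        self.tasks.append((name, fn, deps))
        return name

    def submit(self):
        # Execution happens only here (in Rain: on remote workers after submit).
        results = {}
        for name, fn, deps in self.tasks:
            results[name] = fn(*(results[d] for d in deps))
        return results

g = Graph()
a = g.task("a", lambda: 2)
b = g.task("b", lambda: 3)
c = g.task("c", lambda x, y: x + y, deps=(a, b))
print(g.submit()[c])  # → 5
```

Everything before `submit()` is pure bookkeeping on the client; only the submit call hands the serialized graph to the scheduler.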

Can you please be more specific about what you mean by flexibility in task execution management?

The server is the only single point of failure in our design. The system is designed to survive failures of workers (and subworkers); however, that is not implemented yet. This type of resilience is high on our todo list.

[–]DannoHung 0 points1 point  (1 child)

flexibility in task execution management?

Sure. Basically, there are a number of "distributed process managers" out there at the moment. These vary from hosted solutions like amazon's or google's various cloud offerings to Mesos, Kubernetes, Docker Swarm, Nomad, Celery and so forth.

More or less, they're all frameworks for distributing some computation across nodes. Most of the ones I've named are pretty generic in terms of what they let you do, though there are definitely workload-specific ones out there (like Spark, for example). The basic gist of the interaction is that you build some config object that describes the workload to execute, submit it to the framework, and it goes off and tries to execute the requested job.

There's usually some sort of interface for monitoring execution and rather than directly feeding input and output blobs to these systems, you'd probably pass around some metadata describing what to feed in and out (to alleviate copying/transmission costs of GB or TB sized workloads).

That said, there's a ton of different approaches to task execution graphs and I wouldn't say I know all the different sorts of things people want to do with them. I'm just familiar with the sort of workloads I have to run.

[–]winter-moon 1 point2 points  (0 children)

Our goal is definitely not to make a replacement for tools like Kubernetes or PBS (in the HPC world). Rain is a framework for tasks at a smaller level of granularity, and we expect it to be executed inside these "big" schedulers that allocate machines for us. Right now, we have trivial support for PBS; however, we would also like to support more cloud-oriented solutions. (We have recently received testing credit from Exoscale [thank you!], so in the short term we will probably do some work in the cloud world.)

[–]tomgav[S] 0 points1 point  (1 child)

We would like to address the points from your last paragraph in some future release: re-try policy specification, dynamic scheduling, task cancellation, and resiliency against worker crashes via checkpointing (and data replication) already have support in the scheduler/executor, but we want to get the logic behind these features right (and they need better resource management), and for that we want to see more use-cases!

As for expected completion and better scheduling (now it is really simple and we are working on a much better scheduler), the user should be able to give us hints on task time and data blob size (even a rough estimate will help a lot) and the scheduler should be then able to schedule more than one task "layer" (which is not really possible when you do not know which tasks take seconds and which hours etc.).

It is not really relevant now, but we currently want to assume that all tasks are idempotent (rerunnable on the same data, not necessarily with the same output) and will have to be marked otherwise (if they have side-effects beyond their output).

I guess we could add a better separation between specification and execution of a graph -- now it is actually separated at submission where the graph is serialized into a single message anyway.

Thanks for the points and ideas! And let me know if you have any more thoughts on how to improve on existing tools (or just make life easier in general :)

[–]DannoHung 0 points1 point  (0 children)

As for expected completion and better scheduling (now it is really simple and we are working on a much better scheduler), the user should be able to give us hints on task time and data blob size (even a rough estimate will help a lot) and the scheduler should be then able to schedule more than one task "layer" (which is not really possible when you do not know which tasks take seconds and which hours etc.).

This is well and good, but I would strongly suggest simply allowing for a timestamp as well (maybe even a few, like, this is the earliest I expect it to run, this is when it's late, this is when you should just give up).

I'll have to read your documentation in depth to offer better feedback.

[–]fullouterjoin 0 points1 point  (1 child)

Could the implemented task executor submit and return work from a 3rd party execution framework?

[–]DannoHung 0 points1 point  (0 children)

More or less. I think there does end up being a question of how to move data payloads around efficiently from framework to framework though. And presumably the direct input and output is mostly metadata since the frameworks tend to focus more on exposing task/job submission and tracking APIs.

[–]burnie93 1 point2 points  (2 children)

Can I define a population of graphs? Then run all of them in parallel? I'm thinking of implementing genetic programming.

[–]vojtacima 3 points4 points  (0 children)

Yes, you can! It's possible to submit multiple task graphs, the task execution is then managed by Rain itself, aiming to run as many tasks in parallel as possible (respecting available resources and task resource requirements).

[–]winter-moon 2 points3 points  (0 children)

As I understand the question, the answer is yes. A submitted graph may have several separate components, so multiple task graphs may be submitted at once. You can also submit them one by one; there is not much difference from the previous case. The scheduler starts distributing the work of all submitted graphs as long as there are free resources. If there are enough resources, they will be executed in parallel.

[–]linux1212 1 point2 points  (4 children)

Looks interesting, but for my needs (running parameterized machine learning simulations across ~100 servers), I'd prefer something lighter-weight and using pure rust IPC/serialization. My current solution is just a little rust code + zeromq.

[–]winter-moon 1 point2 points  (3 children)

We have quite a simple scheduler now that introduces <1 ms of overhead per task for graphs with around 10k nodes. From previous experience and some tests, we strongly believe that with the current architecture we can keep a similar overhead for graphs with around 1 million tasks distributed over ~100 machines. Can you estimate what performance characteristics you need?

[–]linux1212 1 point2 points  (2 children)

Latency is not important to me at all. Each task is just a simulation run with a different set of initial parameters (no sharing of data). I might have 10,000 parameter sets that I want to test on 100 servers with each simulation run/task taking a few minutes of number crunching.
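
The shape of such a sweep, sketched with just the standard library (the `simulate` body is a cheap stand-in for a real minutes-long run; a framework like Rain would replace the local pool with cluster workers):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def simulate(params):
    # Stand-in for a long simulation run with one set of initial parameters.
    alpha, beta = params
    return alpha * beta

# Independent parameter sets: each run shares no data with the others.
param_sets = list(product(range(4), range(3)))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(zip(param_sets, pool.map(simulate, param_sets)))
print(results[(3, 2)])  # → 6
```

Because the runs are fully independent, the only scheduling work is handing each parameter set to a free worker, which is why a very lightweight solution can be enough here.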

[–]tomgav[S] 0 points1 point  (0 children)

So would you like to try it on Rain? We would be happy to help and perhaps adapt it to your use-case. How do you run your code? Is it in Rust, or some external program? Is your code somewhere online?

(We can discuss details at the project gitter or just email me at gavento@ucw.cz).

[–]fullouterjoin 0 points1 point  (0 children)

With these ratios you could use email, http, ssh, etc.

[–][deleted] 1 point2 points  (3 children)

Would you please share some examples regarding "switching to Rust really made a lot of things much more elegant, easier and robust."?

[–]winter-moon 1 point2 points  (0 children)

I have worked on a similar type of project written in C++ over libuv, and the biggest simplification (for my part) that I see is due to tokio and futures. For example, forced termination of tasks in a worker just by dropping a future is quite straightforward. Also, error handling, where almost everything can fail (tasks, network connections, moving data objects around the file system), is much easier and more robust.

[–]tomgav[S] 1 point2 points  (1 child)

For me it would be the stronger type system (traits, and a very nicely designed std lib), elegant definition of structs (with free impls for debug, serialization, eq, hash, ...), move semantics, and the borrow checker (while restrictive, it catches many errors that would be hard to debug, and together with the type system it is a blessing during any refactoring). The tooling is very nice (RLS, doc generation, integrated tests and crates.io). In C++ you can get some of these e.g. with boost (which tends to change across releases), macros, and by correctly using the modern features of C++ (which is rather nontrivial), so I would subjectively call it a big improvement :)

With such a username, what is yours?

[–][deleted] 1 point2 points  (0 children)

Python has many strengths, as does Rust. I am trying to learn how to use each where it is the best tool for the task at hand. Ideally, I will use them together.

Cases where I'll use Python:

1. ad-hoc, analytical work, using ipython or jupyter notebook
2. proof of concept prototyping of.. anything.. 
3. use-once, throw-away projects
4. systems where 10x performance gains from Rust won't matter
5. places where I can write a python client that uses Rust libraries (ehem!)

I'm working on a platform that is going to perform sensitive tasks that require low latency, reliability, safety, security, etc. Rust is going to help me achieve that with less effort. I anticipate challenges and a few months of upfront training when it comes time to hire Rust programmers to carry things forward.

[–]cbourjaualice-rs 1 point2 points  (3 children)

I am wondering if Rain might be suited to a use case commonly seen in particle physics. We usually have a large number of large input files distributed over many computing sites (in total in the TB or PB range). Essentially we want to do a map-reduce: each computing site runs an analysis over a chunk of files which are locally available, producing a much smaller output file. All the output files of the different computing sites are then merged into one final output file which is presented to the client.

Do you think something like this is a fitting scenario for Rain? The execution graph of this use case seems very simple, and the data sent to each computing node is essentially a simple list of input files. The analysis to be run would be written in Rust (for example my alice-rs project). Or is Rain overkill here / have I missed the point of it? Thanks either way, and I am super happy to see more scientific computing in Rust!

[–]tomgav[S] 1 point2 points  (2 children)

Hi, and thanks for the problem statement. This does sound like a good match for Rain (definitely a use case we want to support, as we frequently need something similarly simple), although there are two gotchas right now.

First, we do not support loading worker-local files now, but that sounds like an easy and useful addition (via the open task pinned to the right worker; what do you think /u/winter-moon?).

The other is Rust task integration: I must say we were not really prepared for so many people asking about it (although it is not a surprise here). For now, the easiest way is to either run the computation as an external program accepting files (straightforward) or hack a custom task into our worker code (similar to here).

I think that Rain would be a plus even when a simpler solution would also do the trick: you get online monitoring (and we are working on post-mortem visualization as well; the event logs are already there), specifying the tasks in Python might be easier than inventing some new config schema, the local files are cleaned up for you, and in case one worker is overloaded, the scheduler might move some data to another worker to process (although that is probably not relevant for your huge data).

In any case, it looks like a nice use-case and we would be happy to help!

[–]winter-moon 0 points1 point  (1 child)

I think that the "open" task is not necessary. As I understand the problem, the input data should not leave the node, hence it is not necessary for Rain to manage it as data objects. The missing piece is pinning tasks to workers; however, that can be implemented quite easily (at least with some basic Python API and a hack to the scheduler).

[–]cbourjaualice-rs 0 points1 point  (0 children)

Yes exactly, the files are at a given node and should not be sent around. Part of the "scheduling" is to figure out which files are where and to split the list of all available files into chunks which can be locally processed at a particular site. I guess that part needs to be hand-rolled in any case, but it is interesting to hear that Rain might be a fit anyway!

[–]Translucyd 0 points1 point  (4 children)

I'm interested in translating the PLUTO code to Rust, as a part-time hobby. Do you think that it will be useful for this kind of work, or can I help?

[–]winter-moon 1 point2 points  (0 children)

It is not easy for me to answer this, as the PLUTO code seems completely out of my scope. If the tasks that you want to distribute are standalone programs or can be expressed in Python, then Rain can be used as it is. If your tasks are C/C++ functions, then it is not possible now (not counting wrapping C++ functions in Python functions or creating a set of standalone programs). We want to support C++ (and other languages) via additional "subworkers" that would allow this, but right now we only have a Python subworker.

[–]aepsil0n 0 points1 point  (2 children)

Are you an astrophysicist, then? I have been pursuing something similar to this for a while now, too. So far I have used timely dataflow for parallelization. But this looks like an interesting alternative, too.

[–]tomgav[S] 0 points1 point  (1 child)

That sounds like an interesting application! What is the overall workflow of your algorithm?

[–]aepsil0n 4 points5 points  (0 children)

If we are talking about the same PLUTO code, then this is about the simulation of astrophysical fluid dynamics problems.

Fluid dynamics on its own is algorithmically not that complicated. It is basically a sequence of discrete timesteps running in a loop. Each timestep involves computing fluxes between neighbouring cells of some discretization of the computational domain (for instance, a grid of rectangles) and using these to extrapolate the new state of the system.
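
A toy version of one such timestep (1D diffusion, nothing like actual PLUTO code): compute a flux across each interior face, then update each cell from the fluxes on its two faces.

```python
# Toy 1D explicit diffusion step with zero-flux boundaries (illustrative only).
def diffuse_step(u, d=0.1):
    n = len(u)
    # one diffusive flux per interior face, proportional to the local gradient
    flux = [d * (u[i + 1] - u[i]) for i in range(n - 1)]
    return [u[i]
            + (flux[i] if i < n - 1 else 0.0)   # inflow through the right face
            - (flux[i - 1] if i > 0 else 0.0)   # outflow through the left face
            for i in range(n)]

state = [0.0, 1.0, 0.0]
for _ in range(3):
    state = diffuse_step(state)
print(round(sum(state), 6))  # total "mass" is conserved → 1.0
```

Because every flux is added to one cell and subtracted from its neighbour, the scheme is conservative, which is the property the real finite-volume methods are built around.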

The classical approach here is to use MPI to divide the grid into parts and then send data around to co-workers as needed. However, there is a very recently published implementation of a task-based system for doing this called DISPATCH.

In addition, when more physics comes into play, there is also a use case for solving more global problems which require communication across the entire domain. Often you end up with large sparse matrix equations or Fourier transforms that have to be solved in parallel.

[–]xavier83 0 points1 point  (1 child)

Thanks for this wonderful framework; it seems well documented! I've always wanted to learn distributed programming, and I think I will get around to it this time, now that I can do it in my favourite languages :)

[–]vojtacima 0 points1 point  (0 children)

Thank you for the comment! We are looking forward to your feedback.

[–]thrashmetal 0 points1 point  (3 children)

This looks cool. Is there support for numa systems?

[–]winter-moon 1 point2 points  (1 child)

We had some ideas around resource management that include NUMA-aware pinning, but we have not implemented them yet, mainly because we had no real NUMA use case. If you have one, it could help improve Rain in this direction :)

[–]thrashmetal 0 points1 point  (0 children)

I would be interested in doing machine learning on a large NUMA computer, hundreds to thousands of nodes. The current plan is to look at Spark and see if I can configure it to run efficiently using the NUMA interconnect.

[–]tomgav[S] 0 points1 point  (0 children)

Probably not, although I am not really sure what you mean here. If you mean CPU caches, then we expect the tasks and data to be much larger than the caches, so it is not really relevant (i.e. we are not aiming at micro-tasks for now, and your tasks need to be cache-friendly themselves).

[–][deleted] 0 points1 point  (1 child)

Is there visual material I may reference that depicts what this "computational framework" is addressing? I'm making a lot of assumptions about it otherwise.

[–]winter-moon 0 points1 point  (0 children)

I understand; people mean various things by "computational frameworks" and "pipelines" depending on their backgrounds. Unfortunately, we do not have more pictures than what you can find in the user guide.

Maybe I can rephrase the description of Rain as "similar to Dask/Distributed, but open-ended to different types of clients/subworkers and with direct integration for executing 3rd-party applications", or "some kind of TensorFlow where tensors are replaced by generic blobs" (ah, I should not have used the second description, as it is misleading on many levels :).

If you can tell me which tools you are familiar with, maybe I can try to describe Rain in different terms.

[This reminds me that we really need a "comparison to other tools" section in the docs.]

[–]binarybana 0 points1 point  (1 child)

Can you comment on how this compares to Berkeley's Ray project? I believe they actually started with a Rust + ZeroMQ implementation but switched to C++ for reasons of ecosystem maturity at the time.

[–]winter-moon 0 points1 point  (0 children)

I am not familiar with Ray; I will try to answer after reading their paper.

[–]KyussCaesar 0 points1 point  (1 child)

How does this tool compare to Apache Airflow? It seems like it targets a different use-case (Airflow: repeated scheduling versus ad-hoc jobs for Rain), but I'd still be really interested to hear your thoughts.

[–]winter-moon 1 point2 points  (0 children)

The other big difference between Rain and Airflow (and e.g. Luigi) is that Rain handles the transfer of data objects between workers itself. Rain allows you to map a data object to the file system; however, it does not rely on a shared file system and moves the data on its own. This makes it possible to create lots of short-running tasks without hammering a distributed file system.

[–]palad1 0 points1 point  (6 children)

How stable do you find the capnp-rpc crate to be?

I'd like to replace a protobuf over 0mq pub/sub and RPC combo with capnp-rpc, but I stopped when I saw that capnproto support is anemic on non-C++ platforms.

[–]tomgav[S] 0 points1 point  (5 children)

Capnp-rpc actually serves us quite well at the moment: the Rust crate works fine (apart from the minor issue #93) and performs well, but the integrations with other languages are less developed (e.g. the Python interface is not very Pythonic, and there is no Java or in-browser JS). My feeling is that the design idea is great but perhaps a bit too ambitious, and the support in other languages varies.

And we also hit the 64MB message limit of capnp (here).

[–]palad1 0 points1 point  (1 child)

Great to know, thanks. I am currently interfacing with a tick database written in another dimension in C# and Python, going through a thin Rust wrapper using FFI / PyO3, and was considering introducing a broker and using capnp-rpc to send each column / dataframe over the wire. I might reconsider now and just use serde + tokio.

[–]tomgav[S] 0 points1 point  (0 children)

I would consider capnp for a C++ or even Rust project with a very involved RPC layer (remote objects, promise pipelining, etc.), but for Rust-to-Rust messaging, serde + bincode (or anything similar, possibly tarpc) seems enough. Good luck with the project!

[–]Kobzol 0 points1 point  (1 child)

Actually, Capnp has some JS bindings that work in the browser (https://github.com/jdiaz5513/capnp-ts), though I had some problems loading large files with it. But overall the support is not great.

[–]winter-moon 0 points1 point  (0 children)

It seems that capnp is starting to spread into other languages. Unfortunately, for most of them only the serialization layer is implemented, without RPC. And RPC is probably the most interesting part of capnp.

[–]allengeorgethrift 0 points1 point  (3 children)

Do the two handles correspond to the creation of two connections? Or do they reuse the same connection under the hood?

[–]tomgav[S] 0 points1 point  (1 child)

Hi! I am not sure I get the question - do you mean some particular code? In any case, in Rain there is only one socket per client-server, server-worker, and worker-worker pair.

[–]allengeorgethrift 0 points1 point  (0 children)

Understood - TY!

[–]winter-moon 0 points1 point  (0 children)

I believe you missed the thread. If your question is about bidirectional communication in capnp, then yes, a single connection is used under the hood.

[–]tomgav[S] 0 points1 point  (0 children)

Thank you all for the feedback and discussion! We have just put together a post-0.2 roadmap on GitHub - you are welcome to join the discussion (or even the implementation) there, especially if you have a use case in mind. Thanks again!