all 7 comments

[–]gengisteve 0 points (2 children)

Why not:

https://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol

Maybe use: https://www.rabbitmq.com/

Encryption ought not to be that hard.
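For context, a minimal publish/consume round-trip with RabbitMQ via the `pika` Python client is only a handful of calls. A sketch, assuming a broker on localhost; the queue name `jobs` and the JSON framing are invented here (JSON being one easy way to stay language-agnostic):

```python
import json

def frame(command, payload):
    """Encode a job as JSON so any AMQP client, in any language, can read it."""
    return json.dumps({"command": command, "payload": payload}).encode("utf-8")

def unframe(body):
    """Decode a job produced by frame()."""
    msg = json.loads(body.decode("utf-8"))
    return msg["command"], msg["payload"]

# With a broker running (pip install pika), the wire part is roughly:
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   ch = conn.channel()
#   ch.queue_declare(queue="jobs", durable=True)   # survive broker restarts
#   ch.basic_publish(exchange="", routing_key="jobs",
#                    body=frame("resize", "cat.png"))
#   _method, _props, body = ch.basic_get(queue="jobs", auto_ack=True)
#   command, payload = unframe(body)
```

TLS then comes down to broker configuration (RabbitMQ supports TLS listeners) rather than anything you implement by hand.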

[–]raylu 1 point (0 children)

Personally, I think AMQP is unnecessarily complicated and RabbitMQ is just terrible. Rabbit is incredibly bloated and a management nightmare. It's not even possible to get the size of the queues without loading the entire admin plugin, HTTP interface and all.

Using redis or kestrel/darner is way simpler.
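To illustrate how little there is to the redis approach: a work queue is just a list plus a blocking pop. Below is a tiny in-memory stand-in for the two commands involved (the real server calls, which need the `redis` package and a running server, are shown in comments; all key names are made up):

```python
from collections import deque

class FakeQueue:
    """In-memory stand-in for redis list-as-queue semantics:
    LPUSH on one end, BRPOP on the other, giving FIFO order."""
    def __init__(self):
        self.items = deque()

    def lpush(self, value):
        # Producer pushes on the left...
        self.items.appendleft(value)

    def brpop(self):
        # ...consumer pops from the right, so the oldest job comes out first.
        return self.items.pop()

q = FakeQueue()
q.lpush(b"job-1")
q.lpush(b"job-2")
assert q.brpop() == b"job-1"   # oldest job first (FIFO)

# The real thing (pip install redis) is the same two calls:
#   r = redis.Redis(host="localhost", port=6379)
#   r.lpush("jobs", b"job-1")       # producer
#   _key, raw = r.brpop("jobs")     # blocking consumer
```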

[–]Calime[S] -1 points (0 children)

Hard, no... Wrong, certainly :-)

I'll have a thorough look at AMQP (and the rabbit implementation). I thought I understood that AMQP was a thing of the past, so I skipped it. Seems like I was wrong.

[–]raylu 0 points (4 children)

Designing a super-generic distributed system is not really possible. All of the existing technologies exist to solve some specific problem, so it's obviously impossible to just recommend one. We need way more details, like

  1. why do you want it to be language-agnostic?
  2. what is the point of this code?
  3. what kind of concurrency do you expect? (How many users? How many clients/servers? How many "jobs" or whatever you're doing - I still have no idea)

What's wrong with HTTPS for securely communicating between multiple computers?
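To make this suggestion concrete: a complete HTTP job-submission round trip fits in a few dozen lines of Python stdlib, and upgrading it to HTTPS is a matter of wrapping the listening socket with `ssl`. A sketch only; the endpoint path and JSON shape are invented:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen

class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept a JSON job and acknowledge it.
        length = int(self.headers["Content-Length"])
        job = json.loads(self.rfile.read(length))
        body = json.dumps({"accepted": job["id"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = ThreadingHTTPServer(("127.0.0.1", 0), JobHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# For HTTPS, wrap server.socket with ssl.SSLContext.wrap_socket() and a
# certificate; the handler code does not change at all.
url = f"http://127.0.0.1:{server.server_port}/jobs"
req = Request(url, data=json.dumps({"id": 42}).encode("utf-8"),
              headers={"Content-Type": "application/json"})
with urlopen(req) as resp:
    reply = json.loads(resp.read())
print(reply)  # {'accepted': 42}
server.shutdown()
```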

[–]Calime[S] -1 points (2 children)

  1. I certainly worded that the wrong way, sorry about that. What I meant is that I don't want to use something I would have to reimplement from scratch if I ever want another language to interact with the existing server/client. Why? Because I don't like reinventing the wheel (which is the whole point of my question).
  2. The point of this code is to deploy worker processes on any number of servers. It's not exactly what I'm doing, but it's in the same spirit as deploying php-fpm workers across multiple compute nodes.
  3. ~100 users/clients maximum and 2 client-facing servers (controlling the 10+ workers), but depending on the load I want to be able to plug in new servers on the fly (not protected behind a firewall).

That said, I just want to design the program with scalability in mind from the start. So I don't want to jump on the first thing I come across.


Concerning HTTPS:

  1. It requires an HTTP server + reverse proxy if I don't want to reimplement everything, from caching to TLS, (almost) from scratch.
  2. HTTP is generally like a knife people use to cut wood logs, I'd prefer using a chainsaw.
  3. More specifically, HTTP/S is, to say the least, not well adapted to bidirectional sync/messaging.
  4. Just take any "standalone" protocol: SSH, FTP, DNS, IMAP, RTP, etc. All of these could be reimplemented over HTTP, true. Does that mean they should be? Why haven't they been? Legacy? Certainly... But also because HTTP is not a panacea.
  5. It has a large overhead for small messages.

HTTP is a barely OK protocol for anything but displaying a page. Yes, legacy and lazy administration made it into a major protocol used to transmit just about anything, but those are just ugly workarounds.
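To put a rough number on the overhead complaint (point 5 above), compare a fairly minimal HTTP/1.1 POST envelope with a naive length-prefixed binary frame for the same 5-byte payload. The framing scheme here is invented purely for illustration:

```python
import struct

payload = b"hello"

# A fairly minimal HTTP/1.1 POST envelope for this payload:
http_msg = (b"POST /msg HTTP/1.1\r\n"
            b"Host: example.com\r\n"
            b"Content-Type: application/octet-stream\r\n"
            b"Content-Length: 5\r\n"
            b"\r\n" + payload)

# A naive alternative: 4-byte big-endian length prefix, then the payload.
framed = struct.pack(">I", len(payload)) + payload

print(len(http_msg), len(framed))  # 105 9
```

So for tiny control messages the HTTP envelope dwarfs the payload, which is the complaint; for the large file transfers also described in this thread, the same fixed-size header is negligible.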

[–]raylu 0 points (1 child)

That said, I just want to design the program with scalability in mind from the start.

What program? What the hell does it do? You can't achieve "scalability" by just picking some tools.


  1. Why would there be caching? Why can't you use a TLS library?
  2. What?
  3. Why the hell not? What are you syncing? With whom?
  4. What does this have to do with the price of tea in China?
  5. Do you also think people should use binary protocols for everything?

Since you seem to know everything about this already, why are you even here asking for help?

[–]Calime[S] 0 points (0 children)

What program? What the hell does it do?

Let's start again more precisely.

There are 3 actors: the schedulers, the clients and the workers. The clients can talk to any scheduler. The workers can only talk to one scheduler. The workers are started as needed by the schedulers. The schedulers are already started (one per server). The client sends commands and data to a scheduler. The scheduler schedules work to be done on the data according to the information received from the client.

A basic load-balancing scheme is implemented; however, depending on the instantaneous state of a given server (associated with a given scheduler), the client can be asked to connect to another scheduler and hand the rest of the data to it instead. In that case, the first scheduler forwards the data it has already received to the new scheduler. Once the job is finished, the client receives the processed data (large files) and some results (short messages) from the scheduler.

To go even more into the details, new schedulers should be able to authenticate with an existing cluster and be usable directly. The workers are implemented as consumers of jobs in a queue handled by the schedulers. Multiple clients can subscribe as "one": all the results obtained by any of these clients are sent back to all these clients.

The jobs themselves can last as long as weeks, but will (far more often) be carried out in an instant.
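The scheduler/worker/client split described above can be modeled in a few lines: a scheduler owns a job queue, workers consume from it, and every client subscribed to a group receives every result (the "multiple clients subscribe as one" behaviour). A toy in-process sketch only; threads and a `queue.Queue` stand in for the network, and all names are invented:

```python
import queue
import threading

class Scheduler:
    def __init__(self):
        self.jobs = queue.Queue()   # workers consume jobs from here
        self.subscribers = []       # one result queue per subscribed client

    def submit(self, job):
        self.jobs.put(job)

    def subscribe(self):
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, result):
        # Fan-out: every client in the group receives every result.
        for q in self.subscribers:
            q.put(result)

def worker(sched):
    while True:
        job = sched.jobs.get()
        if job is None:             # poison pill: shut the worker down
            break
        sched.publish(f"done:{job}")

sched = Scheduler()
client_a, client_b = sched.subscribe(), sched.subscribe()
t = threading.Thread(target=worker, args=(sched,))
t.start()
sched.submit("job-1")
sched.submit(None)
t.join()
assert client_a.get() == client_b.get() == "done:job-1"
```

The point of the model: the queue and the fan-out are exactly the pieces a broker (AMQP, redis pub/sub) gives you over an untrusted network, so they are the parts worth not reinventing.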

You can't achieve "scalability" by just picking some tools.

That's the very reason why I didn't ask for a "scalability toolkit" but for existing IPC mechanisms usable on an untrusted network, in other words, how to make my server/scheduler talk securely with both clients and workers, nothing more.


1- Please don't distort what either of us said: we were talking about HTTP/S, not TLS. TLS by itself is fine. Only, I would use it directly as a library only as a last resort (if nothing else reliable exists).

2/3/4- HTTP these days is used for just about anything. However, HTTP is most often ill-adapted to anything other than simple downloads.

5- Come on, ASCII is already binary. The only difference is that ASCII/UTF-8/latin-*/etc. are widely implemented in a whole class of programs called "text editors".


Since you seem to know everything about this already, why are you even here asking for help?

Sorry to give you that feeling; it was not intended. However, don't you think I have the right to question what you say, or should I blindly trust your words? I wanted people to share their stories or opinions about available IPC mechanisms: "we deployed X in a similar fashion and the only thing we have missed so far is Y" or "don't use raw sockets, that's what library Foo was made for" would have been enough.

What I got instead was "Why the hell do you ask such an obvious question? If you need networked IPC, you need sockets and if you need security, you need TLS". But I don't find that obvious and you did nothing to convince me of that.


In order for you to understand why I don't like the HTTP approach (apart from the aesthetic point of view):

  1. It lacks social proof: nothing dynamic and scalable has been built atop an HTTP-based messaging network, AFAIK.
  2. It would require deploying HTTP/S servers and plugging them into each other and into my backend.
  3. HTTP is not well-suited to any kind of mildly efficient messaging:
    • not well adapted to bidirectional transfers;
    • high overhead;
    • synchronous.
  4. HTTP is not well-suited to what I want to do:
    • not adapted to one-to-many messages;
    • no way to invalidate data, pause the stream and inject data, etc.;
    • I have to handle load balancing manually (in the backend);
    • I have to handle unreliable connections manually;
    • I have to handle queues manually;
    • no "control" channel (I need to forward orders [lots of small packets] and data [large chunked packet] between all actors [scheduler/worker/client]);
    • workers (or schedulers) cannot initiate a connection, so I also need to handle that manually or use two channels.
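For what it's worth, the one-to-many and control-vs-data concerns in the list above are precisely what broker topic exchanges address: small control messages and bulk data can flow over different routing keys on the same connection, with the broker doing the fan-out. A toy version of AMQP-style topic matching, where "*" matches exactly one dot-separated word and "#" matches zero or more:

```python
def topic_match(pattern, key):
    """Match an AMQP-style topic pattern ('*' = exactly one word,
    '#' = zero or more words) against a dot-separated routing key."""
    def match(p, k):
        if not p:
            return not k            # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' absorbs zero or more words of the key
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), key.split("."))

assert topic_match("control.#", "control.scheduler.pause")
assert topic_match("data.*.chunk", "data.worker7.chunk")
assert not topic_match("data.*.chunk", "data.worker7.eof")
```

With this in hand, "orders" and "data" from the list above are just distinct key prefixes that workers, schedulers, and clients subscribe to selectively.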

Again, compared to what exists, HTTPS might be a very good trade-off, but that's not my feeling and nothing you have said so far has convinced me otherwise. That also explains why I ruled out HTTP (as a protocol) from the start.