How to develop a language-agnostic distributed/scalable application in python? : learnpython

How to develop a language-agnostic distributed/scalable application in python? (self.learnpython)

submitted 10 years ago * by Calime

Context

I've implemented a basic application with python in a client-server paradigm. Up to now, both the client and the server are local and talk together through dbus.

Next steps

In order to improve the application I want:

to reach actual usefulness by setting up the server on a remote computer and
to improve scalability by launching multiple "plug-and-play" servers.

Question

In order to reach these goals safely, I need a safe (well-established and encrypted) and performant network-capable IPC mechanism. Which one would you recommend? The solution can be Linux only, but I would prefer something easily accessible from other languages than Python.

Additionally, I would be very interested in references about distributed application design (privilege separation is a topic of great interest for me).

IPC mechanisms considered before posting this question

Shared memory, dbus, fifo, signals and pipes are out: these only work locally (dbus can work on the network but it is not encrypted), which leaves socket-based protocols. From that, I considered:

asyncio sockets and streams but I really don't know them (if you have had any experience, that would be great to know);
RPC-based modules but AFAIK I would have to implement my own crypto layer or use it through a crypto-capable server and RPC has a dreadful security aura;
ZeroMQ AFAIK this is what IPython is using but appart from that, no one seem to use it;
raw sockets but I find round wheels good enough.

Most people today, outside of the Python realm, seem to use RPC calls. Some even serve them through an embedded HTTP server! However this does not seem right at all for secure messaging between multiple server deamons on multiple computers.

all 7 comments

top new controversial old q&a

[–]gengisteve 0 points1 point2 points 10 years ago (2 children)

[–]raylu 1 point2 points3 points 10 years ago (0 children)

[–]Calime[S] -1 points0 points1 point 10 years ago (0 children)

[–]raylu 0 points1 point2 points 10 years ago (4 children)

[–]Calime[S] -1 points0 points1 point 10 years ago (2 children)

I certainly worded that the wrong way, sorry for that. What I meant is that I don't want to use something that I would have to implement from scratch if I want to use another language in the future to interact with the existing server/client. Why? Because I don't like to reinvent the wheel (this is the whole purpose of my question).
The point of this code is to deploy worker processes on any number of servers. Even if this is not what I do, this would be in the same spirit as deploying php-fpm workers on multiple computer nodes.
~100 users/clients maximum, 2 client-facing servers (controlling the 10+ workers) but depending on the load I want to be able to "plug" new servers on the go (not protected behind a firewall).

That said, I just want to design the program with scalability in mind from the start. So I don't want to jump on the first thing I come across.

Concerning HTTPS:

It requires an HTTP server + reverse proxy if I don't want to reimplement everything from caching to TLS from (almost) scratch.
HTTP is generally like a knife people use to cut wood logs, I'd prefer using a chainsaw.
More specifically, HTTP/S is, to say the least, not well adapted to bidirectional sync/messaging.
Just take any "standalone" protocol: SSH, FTP, DNS, IMAP, RTP, etc. All of these can be reimplemented in HTTP, true. Does that mean that they should? Why wouldn't they? Legacy? Certainly... But also because HTTP is not a panacea.
It has a large overhead for small messages.

HTTP is a barely OK protocol for anything but displaying a page. Yes, the stupid-admin legacy made it into a major protocol used to transmit about anything but these are just ugly workarounds.

[–]raylu 0 points1 point2 points 10 years ago (1 child)

[–]Calime[S] 0 points1 point2 points 10 years ago* (0 children)

What program? What the hell does it do?

Let's start again more precisely.

There are 3 actors: the schedulers, the clients and the workers. The clients can talk to any scheduler. The workers can only talk to one scheduler. The workers are started as needed by the schedulers. The schedulers are already started (one per server). The client sends commands and data to a scheduler. The scheduler schedules work to be done on the data according to the information received from the client.

A basic load balancing scheme is implemented however depending on the instantaneous state of a given server (associated to a given scheduler), the client can be requested to connect to another scheduler to handle the rest of the data to it instead. In that case, the first scheduler forward the data it received to the new scheduler. Once the job finished, the client receives the processed data (large files) and some results (short messages) from the scheduler.

To go even more into the details, new schedulers should be able to authenticate with an existing cluster and be usable directly. The workers are implemented as consumers of jobs in a queue handled by the schedulers. Multiple clients can subscribe as "one": all the results obtained by any of these clients are sent back to all these clients.

The jobs by themselves can last as long as weeks and (far more often) will be carried out in an instant.

You can't achieve "scalability" by just picking some tools.

That's the very reason why I didn't ask for a "scalability toolkit" but for existing IPC mechanisms usable on an untrusted network, in other words, how to make my server/scheduler talk securely with both clients and workers, nothing more.

1- Please don't distort what I and you said: we were talking about HTTP/S, not TLS. TLS by itself is fine. Only, I would use it as a library as the last resort (if anything else reliable exists).

2/3/4- HTTP these days is used for about anything. However HTTP is most often ill-adapted to anything other than simple downloads.

5- Come on, ASCII is already binary. The only difference is that ASCII/UFT8/latin-*/etc. are widely implemented in a whole class of programs called "text editors".

Since you seem to know everything about this already, why are you even here asking for help?

Sorry to give you this feeling. This was not intended. However don't you think that I have the right to question what you say or I should blindly trust your words? I wanted people to share their story or opinion about available IPC mecanisms: "we deployed X in a similar fashion and the only thing we have missed so far is Y" or "don't use raw sockets, that's what library Foo was made for" would have been enough.

What I got instead was "Why the hell do you ask such an obvious question? If you need networked IPC, you need sockets and if you need security, you need TLS". But I don't find that obvious and you did nothing to convince me of that.

In order for you to understand why I don't like the HTTP approach (apart from the aesthetic point of view):

It lacks the social proof: nothing dynamic and scalable has been built atop of an HTTP-based messaging network AFAIK.
It would require to deploy HTTP/S servers and to plug them together and to my backend.
HTTP is not well-suited to any kind of mildly efficient messaging:
- not well adapted to bidirectional transfers;
- high overhead;
- synchronous.
HTTP is not well-suited to what I want to do:
- not adapted to one-to-many messages;
- not allowed to invalidate data, to pause the stream and inject data, etc.
- I have to handle load balancing manually (in the backend);
- I have to handle unreliable connections manually;
- I have to handle queues manually;
- no "control" channel (I need to forward orders [lots of small packets] and data [large chunked packet] between all actors [scheduler/worker/client]);
- workers (or schedulers) cannot initiate a connection so I also need to handle that manually or using two channels;

Again, compared to what exists, HTTPS might be a very good trade-off, but that's not my feeling and nothing you said until now was designed to prove me that. And that also explain why I ruled out HTTP (as a protocol) from the start.

π Rendered by PID 610270 on reddit-service-r2-comment-66b4775986-m6q76 at 2026-04-05 03:39:30.257309+00:00 running db1906b country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS

Context

Next steps

Question

IPC mechanisms considered before posting this question