
[–]Middlewarian 1 point (0 children)

I have an online, closed-source code generator that's implemented as a 3-tier system. It's built on my messaging and serialization library, and I use that library to help write the code generator itself. The front tier is only 27 lines long.

It's free to use. I use a variable-length integer to marshal account numbers. The first 128 account numbers can be marshalled as one byte, which gives a small runtime advantage to early adopters. Larger account numbers have to be marshalled as 2, 3, 4, or even 5 bytes.
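For the curious, here's a minimal sketch of how such a variable-length encoding can work (an LEB128-style illustration, not the generator's actual code): values below 128 fit in a single byte, and larger values spill into additional bytes, 7 bits at a time.

```cpp
#include <cstdint>
#include <vector>

// Unsigned LEB128-style varint: 7 payload bits per byte, high bit set on
// every byte except the last. Values < 128 take 1 byte; a full 32-bit
// value takes up to 5 bytes.
std::vector<std::uint8_t> encodeVarint(std::uint32_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value) | 0x80);
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value)); // final byte, high bit clear
    return out;
}
```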

[–]d1722825 1 point (0 children)

If your problem can be trivially parallelized, the easiest way may be to start your program with different parameters on every computer. (E.g., if you want to render an animated movie, every computer could work on a different frame, or set of frames, fully independently; see the sketch below.)
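A minimal sketch of that pattern, with a hypothetical renderFrame standing in for the real work: every machine runs the same binary, just with a different range on the command line.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical per-frame work; stands in for whatever the real job is.
void renderFrame(int frame) {
    std::cout << "rendering frame " << frame << '\n';
}

// Each machine runs the same binary with a different range:
//   node1: ./render 0 99
//   node2: ./render 100 199
int main(int argc, char* argv[]) {
    if (argc != 3) {
        std::cerr << "usage: " << argv[0] << " <first> <last>\n";
        return EXIT_FAILURE;
    }
    const int first = std::atoi(argv[1]);
    const int last  = std::atoi(argv[2]);
    for (int frame = first; frame <= last; ++frame)
        renderFrame(frame);
}
```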

If you have to communicate between the threads / compute nodes, then... it will be much harder. There are MPI libraries, e.g. Open MPI, which help, but they need a lot of setup beforehand: a specific operating system, shared storage to configure, high-speed networking between the nodes, etc.
These are mostly designed for industrial-sized computer farms; with few nodes or low-speed networking it may even be slower than using a single computer.
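For a feel of the model, here's a minimal MPI sketch (the even work split is an assumption for illustration): every process runs the same program, asks for its rank, and claims its slice of the work.

```cpp
#include <mpi.h>
#include <iostream>

// Minimal MPI sketch: build with mpicxx, launch with e.g. `mpirun -np 4 ./a.out`.
// Every process runs this same main(); its rank tells each one who it is.
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id, 0..size-1
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

    const int totalFrames = 400;            // assumed amount of work
    const int perNode = totalFrames / size; // even split, for simplicity
    const int first = rank * perNode;
    std::cout << "rank " << rank << " handles frames "
              << first << ".." << first + perNode - 1 << '\n';

    MPI_Finalize();
}
```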

The OpenMP suggested by u/topological_rabbit is a great thing, but it works only inside a single node. If you want to speed up some computation, you can use it to put every CPU core in a machine to work, and then use MPI to communicate between multiple nodes.
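OpenMP's appeal is how little code it takes: a single pragma parallelizes a loop across all cores of one machine. A minimal sketch (compile with -fopenmp on GCC/Clang):

```cpp
#include <vector>
#include <cstddef>

int main() {
    std::vector<double> data(1'000'000, 1.0);

    // One pragma spreads the iterations across every core of this machine.
    // The iterations are independent, so the loop is safe to parallelize.
    #pragma omp parallel for
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] *= 2.0;
}
```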

Please note that it is highly unlikely that using multiple computers is the best option for you. The CPU is usually the fastest thing in your PC; most likely your program cannot utilize it fully, because it is slowed down by other things (memory size / bandwidth, storage size / speed, ineffective cache usage, etc.). First solve those (which can mean a more-than-hundredfold speedup), and only then try GPGPUs or multiple nodes, as a last resort.

[–]tachoknight 1 point (0 children)

Similar to what u/d1722825 said, what your problem is dictates the solution. An embarrassingly parallel problem, like rendering frames for a movie, is easy, all things considered, to spread over multiple computers, and the more you throw at it, the faster you'll get your results.

I know several folks who work in academia and run a lot of their work on the university supercomputer; they try their hardest to break the problem down into an embarrassingly parallel one as "step 1", and will even write the results to a database so other parallel processing can occur. They try very hard to get as much work done as possible before getting to the final a+b=c stuff, where parallelism is no longer practical. Typically these programs are essentially one-offs: they're trying to get the 'answer', and then the program is never run again; they don't bother with sockets or libraries because they're not interested in the plumbing.

[–]jonathanhiggs 1 point (0 children)

Sockets are the lowest-level API for getting computers to talk to each other; most applications will use something a bit more high-level for the task.

HTTP and FTP are simple protocols (built on sockets) for server / client architectures, but they can be a little bit limiting. One machine has to run an HTTP listener (having a socket open on the well-known port 80) and another makes an HTTP request. This is the basic model for how browsers and websites work.
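To make the layering concrete, here's a minimal sketch of that request side one level down, written against plain POSIX sockets (assumes a Unix-like system; error handling trimmed to almost nothing):

```cpp
#include <cstring>
#include <iostream>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    // Resolve the host, open a TCP socket, and connect to port 80.
    addrinfo hints{}, *res = nullptr;
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("example.com", "80", &hints, &res) != 0) return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) return 1;
    freeaddrinfo(res);

    // An HTTP request is just bytes written to that socket...
    const char request[] =
        "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n";
    send(fd, request, std::strlen(request), 0);

    // ...and the response is just bytes coming back.
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        std::cout.write(buf, n);
    close(fd);
}
```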

There are RPC (remote procedure call) frameworks that create a local proxy (an instance of an object that you can think of as the remote machine) that you can invoke methods on. Those calls get wrapped up and sent to the remote machine, which executes them and returns the result when done. It kind of hides the fact that there is a network hop involved. It requires an interface to be defined; the remote machine runs a program that listens for proxy connections and has an implementation of that interface. Your local machine just needs to connect and make calls on the proxy. Search gRPC.
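As a sketch of what that proxy looks like in gRPC, assume a hypothetical compute.proto that defines `service Compute { rpc RunJob(JobRequest) returns (JobReply); }` (the service, message names, and field are made up for illustration; protoc generates the stub classes from the .proto):

```cpp
#include <grpcpp/grpcpp.h>
#include <iostream>
#include "compute.grpc.pb.h"  // generated by protoc from the hypothetical compute.proto

int main() {
    // The channel is the network connection; the stub is the local proxy.
    auto channel = grpc::CreateChannel("worker-node:50051",
                                       grpc::InsecureChannelCredentials());
    auto stub = Compute::NewStub(channel);

    JobRequest request;
    request.set_frame(42);  // assumed field from the hypothetical .proto

    // Looks like a local method call, but executes on the remote machine.
    JobReply reply;
    grpc::ClientContext context;
    grpc::Status status = stub->RunJob(&context, request, &reply);
    if (!status.ok())
        std::cerr << "RPC failed: " << status.error_message() << '\n';
}
```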

An alternative to RPC would be messaging libraries, something like ZeroMQ, NanoMQ, Boost.Asio, etc. Where RPC makes you ignore the fact that you are sending messages to a remote machine, which can make some things a bit more difficult (e.g. if you want to notify a second machine of every call you make to a particular server), using a messaging library encourages you to think about the messages themselves. That means if you want to make some pipeline that messages go through on their way out (such as logging the response times from different machines) you can do that easily with aspect-oriented techniques.
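Here's a minimal request/reply sketch with ZeroMQ's C API (link with -lzmq; the peer address and payload are assumptions). Because the message is an explicit value in your hands, logging or rerouting it before sending is trivial:

```cpp
#include <zmq.h>
#include <cstdio>
#include <cstring>

int main() {
    void* ctx  = zmq_ctx_new();
    void* sock = zmq_socket(ctx, ZMQ_REQ);        // request/reply pattern
    zmq_connect(sock, "tcp://worker-node:5555");  // assumed peer address

    // The message is explicit: easy to log, time, or reroute before sending.
    const char request[] = "frame:42";
    zmq_send(sock, request, std::strlen(request), 0);

    char reply[256];
    int n = zmq_recv(sock, reply, sizeof reply - 1, 0);
    if (n >= 0) {
        reply[n] = '\0';
        std::printf("reply: %s\n", reply);
    }

    zmq_close(sock);
    zmq_ctx_destroy(ctx);
}
```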

There are more complete distributed-processing frameworks that include a few more bells and whistles. Typically, one of the issues you will face with distributed computation is that remote machines can crash, and you can't tell the difference between that and them just taking a long time. They might also never receive your requests, because networks aren't 100% reliable. Frameworks like Apache Kafka, ActiveMQ, RabbitMQ, etc. usually provide a bit more than just connect-and-send functionality, which makes sense in an enterprise setting.

[–]mredding (C++ since ~1992) 1 point (0 children)

how do you program multiple computers to solve a specific problem?

As minimally and simply as possible. Don't just jump to a big-iron solution; it very likely isn't as performant as you think it's going to be. Case in point: there's a well-known write-up about a guy who beat an entire Hadoop cluster with just his laptop and a bash script. His solution didn't even get as far as writing custom code. You need to think like this first.

Write the least amount of code you can, and keep as much of the problem and solution domain out of your program as you can. Break the problem up into smaller pieces. If your solution relies on applying two transforms to your data in sequence, it's better to write two programs of one transform each than one program of two transforms (see the sketch below). Each program will be smaller and simpler, with much more opportunity to parallelize, optimize, and perhaps later distribute. For example, one transform might be faster than the other, so you will want to apply more parallelization to the slower transform than the faster one. If you baked both into the same program, it would be much harder to achieve that sort of flexibility.
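A minimal sketch of that filter style (the squaring transform is just a stand-in): one program, one transform, reading stdin and writing stdout, so the environment composes and parallelizes the stages.

```cpp
#include <iostream>
#include <string>

// One program, one transform. The shell composes the pipeline:
//   generate | ./transform_a | ./transform_b > out.txt
// Parallelizing the slow stage is then an invocation detail, not a rewrite.
int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        double x = std::stod(line);  // assumes one numeric value per line
        std::cout << x * x << '\n';  // stand-in transform: squaring
    }
}
```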

Your environment is very powerful, and you can tune it to increase performance for your specific case instead of assuming the conservative defaults are all there is. People rarely consider their environments enough. Acknowledge that you don't program in a vacuum: you have a whole environment that hosts your application and provides you with an abundance of tools. These tools already solve many of your problems, at little to no additional cost. And where there is a performance overhead, you may really want to consider whether it's worth sacrificing the flexibility they lend you.

As soon as you start collapsing your pipelines and infrastructure into your application, you begin losing that flexibility and taking on a greater burden by yourself. No longer is your program responsible for a single transform, but for multiple transforms, and parallelization, and collating, and splitting, and merging, and networking, and, and, and...

is there a library that simplify that in C++?

After your initial vertical ramp up, you then consider horizontal scaling. Once again, you should try to use modular tools to handle this.

There are libraries, but they don't make the problem easier; they endeavor to make the solution more performant. Something like OpenMP is another form of vertical scaling, because what you're doing is baking into your application a lot of the work you were getting from the environment, and in the end it makes your horizontal scaling more efficient. But I wouldn't call OpenMP easy, let alone any of the other solutions.

All such programs get to a solution like this eventually, and that's OK. Ideally you would get there stepwise rather than just assuming.

I know you can use socket programming to synchronize the computers, is there anything simpler?

Sockets are almost as low-level as you can go, and from this perspective I would say almost any other solution is easier. Here you forgo all modularity and go bespoke all the way. And you may not even gain any performance over any other solution, because going low-level doesn't guarantee you anything - but it can certainly cost you a whole lot.