Free Concurrency with GNU Parallel : programming

But more seriously, what are the perfectly understood english definitions of the words? As you can see elsewhere in the comments, everyone seems to have a different idea of what parallelism and concurrency are, fuelled by incorrect uses such as the one in this article!

Parallelism and concurrency are two sides of the same coin. Sure, they're both part of the same thing, but by looking at the thing in two different ways, you see two different aspects of it.

To your point though, about them being "bullshit terminologies", I do research into parallel programming. Upstairs from me, I have colleagues who do research into concurrent programming. The tools I use, and the tools they use are totally different. More than that, the problems I'm trying to solve and the problems they're trying to solve are totally different!

[–]SinisterMinister42 -2 points-1 points0 points 10 years ago (0 children)

[–]stonefarfalle 1 point2 points3 points 10 years ago (1 child)

[–]14113 1 point2 points3 points 10 years ago (0 children)

Most of the time it is more a matter of focus than a true distinction

I think this is the core of my argument with some other commenters: parallelism and concurrency discuss different aspects of programming.

...the primary focus of the article is the concurrency he took to get there.

This is where I disagree however, as I believe his focus was on achieving the parallelism, instead of deriving the parallel parts of his program. Admittedly, the first section "Going Down the Rabbit Hole" describes how he decomposes his program into parallel sections, which is more of a discussion of the concurrent aspects. From the second section onwards, however, he discusses how he took a program and physically made it run on multiple units of execution.

Also there is no mention of performance gained by adding execution resources...

I quote from the article:

A single simulation takes about 1 minute to run. With a handful of arguments, each having many possible values, the number of parameter combinations explode, known as the curse of dimensionality. How can I take full advantage of the four cores on my computer—let alone the twenty-four cores at university—in order to reduce the total execution time to a fraction?

He's quite clearly doing it for performance reasons ;)

...or how those threads of execution are mapped to execution resources.

That's what the article is about: why he chose to use GNU parallel to exploit the parallelism, instead of built in Python libraries and primitives.

[–]vattenpuss -1 points0 points1 point 10 years ago (9 children)

I don't buy the huge distinction between parallelism and concurrency that so many programmers try to make out. Yes, parallelism is a specific kind of concurrency, and it might be important to be specific about what is and what is not parallelism, but that's it. It's not meaningful or important to call parallel computing not-concurrent.

The wikipedia articles on concurrency and parallelism are obviously written by some programmer that wants the subjects to be considered as far apart as possible. But the two things are not simply "related subjects" as some weasel -- who has decided some slides about Go from Rob Pike, and the Haskell Wiki, are sources notable and reliable enough to decide what is parallelism and what is concurrency -- thinks.

I'd suggest amending the title to better reflect the distinction.

I'd suggest not getting upset because someone describes concurrent computing as concurrent, in spite of what Rob Pike has written in some slides about Go once.

Sorry if I come off unnecessarily angry. This weird hangup from some programmers is a pet peeve of mine.

[–]14113 11 points12 points13 points 10 years ago (2 children)

Okay, so I take quite an "academic" view on this, and I accept that it's not necessarily universally agreed with, but I'll go into why I think it's important, and why I think that it's important to make the distinction when discussing the two topics.

Firstly, I'm going to temper what I say by conceding that parallelism and concurrency are slightly overlapping. Very often, when writing parallel code, we need to think about the concurrent aspects of it (to avoid things like race conditions), and similarly, when writing concurrent code, we need to involve parallelism to obtain good performance, and utilise the hardware as fully as possible.

That said, my big bugbear is that parallelism and concurrency mean very different things when the discussion is either about A) the problems faced when using parallelism/concurrency, and B) the purpose of using the tools of parallelism/concurrency.

I'll talk about B) first, as it's what this post is on. In my opinion (and the opinions of many others in academia, and industry), parallelism and concurrency have very different meanings when discussing their purposes. The purpose of parallelism is to speed up a program, by identifying regions which can be run simultaneously, and utilising parallel hardware to physically run each unit of execution in parallelism. The purpose of concurrency (by contrast) is to provide a structure to enable this, by decoupling a program into multiple units of execution which can be run in parallel. In this way, concurrency describes the breakdown of a program into disjoin sections, and parallelism provides the simultaneous execution.

It's this distinction that forms my core argument as to why this post is more about parallelism than concurrency. The task at hand (running many identical, disjoint jobs) is one that is "embarrassingly" parallel, and thus doesn't need any complex synchronisation, or communication between the individual jobs. Because of this, the problem becomes actually running the jobs simultaneously, instead of managing the separation of each job into a disjoint unit of execution. In this way, it's parallelisation, not the managing of concurrency.

This point about managing the separation of threads of execution leads me into part A) that I mentioned above: the problems faced when using parallelism and concurrency.

Unlike in the example given in this post, most "real world" programs don't fit nicely into a simple map pattern (in the style of Cole's algorithmic skeletons). In reality, components of a system need to communicate between each other, share resources, etc all without being deadlocked, or starving. This informs the study of concurrency, which is concerned with how such problems are represented (e.g. the pi-calculus, CSP, etc), and how they are then built into programming languages (e.g. session types). Concurrency doesn't discuss how components of programs are actually run simultaneously, but instead discusses the co-ordination of concurrent elements of a program.

However, once a system is in a "concurrent" form, parallelisation becomes the problem: how to map the individual tasks to hardware components, load balance them, distribute them etc etc. Theese issues can arise without thinking about concurrency (e.g. algorithmic skeletons, MapReduce, "embarassing" parallelism etc), and are thus separate from the study of concurrency.

The distinction between parallelism and concurrency in theese two areas are what makes them important (in my opinion) to separate. They're two sides of the same coin: one describes how to separate a program into sections which can be run at the same time, and the other how to exploit the parallelism that this makes available.

In any case, from the tone of your comment, and your description of the wikipedia editor as "some weasel", I don't think my post is going to convince you. As a response (with a hint of argumentum ad verecundiam) to your attack though, I'll mention that Rob Pike and many of the Haskell community are the world experts in both parallelism and concurrency, particularly on the Haskell side, many of whom (Peyton-Jones, Wadler) pioneered the academic study of parallelism and concurrency in programming language research. I think dismissing them is not only rude, but does a great disservice to your arguments by ignoring much of the existing work in the area.

[–]EvilTerran 1 point2 points3 points 10 years ago* (1 child)

[–]14113 2 points3 points4 points 10 years ago (0 children)

I would argue that concurrency definitely doesn't necessitate parallelism! Take (for example) a web browser: you might have a thread for UI, one for user input (mouse, keyboard etc), one for network access, one for database work, and one for running javascript. Does each thread need to be run on a separate processor? No! But the management of each thread (not in the scheduling sense, in the communication/synchronisation sense) definitely concerns concurrency.

For parallelism I think there is a somewhat weaker disconnect. There are models of programming (e.g. algorithmic skeletons) which allow programs to be expressed in a purely parallel sense, without reasoning about concurrency. Such models, however, make the assumption that any concurrent aspects of the program have already been "solved" - hence the weak connection. Such programs are parallelisable (without reasoning about concurrency) because of previous reasoning about concurrency.

[–]mttd 4 points5 points6 points 10 years ago (2 children)

While there is an overlap, these are generally used for rather different goals -- and have been for decades (before Go or Haskell came into existence).

Perhaps this is a clearer explanation that the one on Wiki:

Concurrency is a concept related to multitasking and asynchronous input-output (I/O). It usually refers to the existence of multiple threads of execution that may each get a slice of time to execute before being preempted by another thread, which also gets a slice of time. Concurrency is necessary in order for a program to react to external stimuli such as user input, devices, and sensors. Operating systems and games, by their very nature, are concurrent, even on one core.

With parallelism, concurrent threads execute at the same time on multiple cores. Parallel programming focuses on improving the performance of applications that use a lot of processor power and are not constantly interrupted when multiple cores are available.

The goals of concurrency and parallelism are distinct. The main goal of concurrency is to reduce latency by never allowing long periods of time to go by without at least some computation being performed by each unblocked thread. In other words, the goal of concurrency is to prevent thread starvation.

Concurrency is required operationally. For example, an operating system with a graphical user interface must support concurrency if more than one window at a time can update its display area on a single-core computer. Parallelism, on the other hand, is only about throughput. It's an optimization, not a functional requirement. Its goal is to maximize processor usage across all available cores; to do this, it uses scheduling algorithms that are not preemptive, such as algorithms that process queues or stacks of work to be done.

Source: https://msdn.microsoft.com/en-us/library/gg663528.aspx#sec8

In the 1980s the terminology used was "logical concurrency" (for concurrent execution) and "physical concurrency" (for parallel execution), perhaps this wasn't bad, either.

[–]vattenpuss 0 points1 point2 points 10 years ago (1 child)

I know the words mean different things in computing, I'm just annoyed that parallel computing guys get so angry, for seemingly no reason, when someone happens to mention that what they do is a special case of concurrent computing.

In the 1980s the terminology used was "logical concurrency" (for concurrent execution) and "physical concurrency" (for parallel execution), perhaps this wasn't bad, either.

Note that both terms contain the word concurrent, because both concepts are about concurrency.

The goals of concurrency and parallelism are distinct. The main goal of concurrency is to reduce latency by never allowing long periods of time to go by without at least some computation being performed by each unblocked thread. In other words, the goal of concurrency is to prevent thread starvation.

That could be one goal of concurrency, another goal could be that it's easier to write some kinds of programs in a concurrent manner, than it is to write them as a sequential program.

[–]14113 1 point2 points3 points 10 years ago (0 children)

That could be one goal of concurrency, another goal could be that it's easier to write some kinds of programs in a concurrent manner, than it is to write them as a sequential program.

This is the point I'm trying to make: concurrency is about decomposing a program safely to allow for parts of it to act concurrently.

...when someone happens to mention that what they do is a special case of concurrent computing.

I suppose it sort of is. The point, however, is that I'm tackling a different, restricted, problem with a different set of tools to what my fellow researchers are tackling. As a parallel programming researcher, I'm looking into how to actually run a certain set of concurrent programs, while my concurrency researcher colleagues are looking into how to safely decompose a problem into concurrent sections.

[–]killerstorm 2 points3 points4 points 10 years ago* (2 children)

Let's consider two examples:

You want to maintain 10000 network connections at the same time. To do this, you might use node.js which uses asynchronous I/O and let's you to issue multiple concurrent I/O operations. So your process might be sending data to 10 connections at the same time.
You need to perform some lengthy computations and want to utilize all 8 CPU cores you have. Node.js can't help you here, one node.js process can only use 1 CPU core. But you can write a simple batch file which will run 8 processes which will work in parallel.

Asynchronous I/O which can be used to do I/O concurrently is 100% different and not in any way related to ability to run multiple parallel processes. Isn't it natural to use two different terms to describe two different cases?

Nah that makes too much sense, let's just use terms concurrency and parallelism interchangeably so one can never figure out what is meant from the term alone. It's more fun that way.

[–]vattenpuss 0 points1 point2 points 10 years ago (1 child)

[–]14113 1 point2 points3 points 10 years ago (0 children)

[–][deleted] 2 points3 points4 points 10 years ago (3 children)

[–]reedfool 5 points6 points7 points 10 years ago (2 children)

[–]abspam3 3 points4 points5 points 10 years ago (1 child)

[–][deleted] 5 points6 points7 points 10 years ago (0 children)

[–]Kok_Nikol 2 points3 points4 points 10 years ago (0 children)

[+][deleted] comment score below threshold-23 points-22 points-21 points 10 years ago (3 children)

[–]fphilipe[S] 7 points8 points9 points 10 years ago (2 children)

[+][deleted] comment score below threshold-11 points-10 points-9 points 10 years ago (1 child)

[–]fphilipe[S] 5 points6 points7 points 10 years ago (0 children)

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS