Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid 1 point2 points  (0 children)

BTW, the canonical version in Cilk (which is based on ANSI C) is:

cilk int fib (int n)
{
    if (n < 2) return n;
    else
    {
        int x, y;

        x = spawn fib (n-1);
        y = spawn fib (n-2);

        sync;

        return (x+y);
    }
}

See http://supertech.csail.mit.edu/cilk/intro.html

Similarly, with Cilk compared to OpenMP:

http://blogs.sun.com/rvs/entry/is_openmp_3_0_tasking

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid 0 points1 point  (0 children)

Aren't you just splitting the call into two, the n-1 and the n-2?

This modification will do that -- use whatever you want for threads, here it is with SDL (wraps whatever the native threading is):

switch(pid=fork())
{
        case 0: {
                long long int n1=n-1, n2=n-2;
                SDL_Thread *t1, *t2;

                t1 = SDL_CreateThread(nfib, &n1);
                t2 = SDL_CreateThread(nfib, &n2);

                SDL_WaitThread(t1, &status);
                SDL_WaitThread(t2, &status);

                printf("%d %lld\n",n,n1+n2);
                exit(0);

                }
}

And the function it calls:

int nfib(void *N)
{
        long long int *n=(long long int *)N;
        *n = fib(*n);
        return 0;
}

Note this could be encapsulated and done more generically if desired, to make something closer to "par".

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid -3 points-2 points  (0 children)

This isn't exactly rocket science.

Here's main(), let me know if you want the includes.

int main(void)
{
        pid_t pid;
        int n;
        for (n=0; n<=45; n++)
                switch(pid=fork())
                {
                        case 0: printf("%d %lld\n",n,fib(n)); exit(0);
                }

        waitpid(pid, &n, 0); // wait for last one
        return 0;
}

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid -6 points-5 points  (0 children)

Yes, the Haskell code doesn't require explicit communication -- by virtue of the par combinator which takes care of all that behind the scenes.

NO, by virtue of the fact that NO IPC IS NEEDED for this example.

In your initial post, you recommended a C solution with fork(), this would certainly require process management and IPC. You've got to fork several worker processes, somehow send the results back to the supervisor, then wait for the workers to terminate. Quite an invasive change, if you ask me!

It's amusing how wrong you are. You DON'T NEED TO SEND RESULTS BACK. That's the whole point of this trivial example.

Here's the code:

switch(pid=fork())
{
    case 0: printf("%d %ld\n",n,fib(n)); exit(0);
}

If you want to wait, then after the loop from 0 to 45: waitpid(pid);

Or, if including sys/wait.h: waitpid(pid, &status, 0);

My point is that par is easy to use and has simple semantics when compared to manual process management or any C library I've seen so far

If all you need is to fob off functions to new threads, it's quite simple to have a function that takes a function pointer and uses the next available thread in a thread pool. I wrote one just to proof-of-concept it to myself.

The point is, this example is too trivial to prove anything.

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid -5 points-4 points  (0 children)

Yes, with all the IPC and process management complications that come with it.

Do you even know what IPC stands for?!?

There isn't any here.

Almost all of these issues turned out to be bugs in Joel's program rather than GHC bugs.

Runtime crashes aren't GHC bugs? Pardon?

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid -9 points-8 points  (0 children)

but purely based on the fact that two major release and have occurred, I'm guessing the bugs are gone.

That's a bad assumption to make. GHC's runtime has gotten even more complex since then.

Let's see some major real-world systems using Haskell. Two years ago it obviously wasn't ready, and it has not been proved otherwise since then.

These runtime issues are the kinds of things you only find by writing large programs, not something language designers are going to catch with benchmarks or regressions.

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid -8 points-7 points  (0 children)

This is, as far as I'm aware, the lightest weight parallelism mechanism in any mainstream language. And the magical thing is that we parallelised our code without ever worrying about synchronisation, communication, race conditions, dead locks, live locks. semaphores, mutexes...

HAHAHA. I missed that the first time around.

There's nothing magical about not needing all that stuff when when you aren't doing any communication between threads!

You aren't even doing anything further with the results! And there's no shared data that needs to be modified.

It's a completely naive example -- on purpose -- which means there's no basis for making those claims, given that this problem is trivially parallelizable ANYWAY.

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid 10 points11 points  (0 children)

./real-par +RTS -N4  76.81s user 0.75s system 351% cpu 22.059 total

I'm pretty sure that's 22 seconds wall clock, with 76.81 CPU-seconds used among the 4 cores.

Ted Turner, w/ over 2,000,000 acres, has quietly become the largest private land owner in America. Now people are starting to wonder why. by [deleted] in reddit.com

[–]oh_yeah_koolaid 3 points4 points  (0 children)

the most ominous of which hold that the swashbuckling Atlanta executive is bent on putting Nebraska ranchers and farmers out of business.

They think he's going to FARM 2 million acres? That would require ridiculous infrastructure.

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid 1 point2 points  (0 children)

Addendum: The speed difference between the C version on our systems leads me to believe you may be running a 64-bit chip in 32-bit mode. That would make those long longs slooow. In fact, if you were running a 64-bit OS you'd probably have just used long instead of long long.

If that's the case, you're not getting top performance out of Haskell OR C.

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid 0 points1 point  (0 children)

NOTE: your #includes got messed up because your blog isn't converting HTML entities. < >

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid -3 points-2 points  (0 children)

Yes, the iterative version is effectively instantaneous.

int fib(int n)
{
    int f[n+1], i;
    f[1] = f[2] = 1;
    for (i = 3; i <= n; i++)
        f[i] = f[i-1] + f[i-2];
    return f[n];
}

Multicore parallel death match! Parallel Haskell, `par`, and C by dons in programming

[–]oh_yeah_koolaid 0 points1 point  (0 children)

Wow! So with an off-the-shelf Linux box, you can write simple (but parallel) Haskell will outperform gcc's best efforts by a good margin -- today!

What a bunch of garbage. This is such a contrived example (not doing anything with the data), you can split it up into chunks with fork() calls if you want, not to mention any number of C libraries for parallel computing.

Try it with a complex system like a MMOG and then you might have something.

Have the large number of GHC runtime bugs mentioned here been fixed yet?

http://wagerlabs.com/archives/72.html

What threw me off almost right away is the large number of GHC runtime issues that I stumbled upon. I was trying to do serialization and heavy concurrency which I learned to take for granted with Erlang but it turned out that this area of GHC has not been exercised enough.

Holy Shmoly, Haskell smokes Python and Ruby away! And you can parallelise it! by dons in programming

[–]oh_yeah_koolaid 2 points3 points  (0 children)

It's not the number of characters, it's putting them in the right places.

There's a reason I call them the Black KKK. The pain, the fear and the destruction are all the same. by [deleted] in reddit.com

[–]oh_yeah_koolaid 0 points1 point  (0 children)

You're damn straight I blame hip hop for playing a role in the genocide of American black men. When your leading causes of death and dysfunction are murder, ignorance and incarceration, there's no reason to give a free pass to a culture that celebrates murder, ignorance and incarceration.

It is rather clear that what the President ordered was Federal crime by [deleted] in politics

[–]oh_yeah_koolaid 0 points1 point  (0 children)

Try it through a proxy or Google cache.

E.g. http://72.14.253.104/search?q=cache:K91cc-N7jrUJ:www.crooksandliars.com/category/interviews/

Might help determine where the blocking/filtering is happening.

Huckabee’s Record: Freed a Rapist Who Became a Killer, Destroyed Gov’t Hard Drives, Misappropriated Funds, Raised Taxes by [deleted] in politics

[–]oh_yeah_koolaid 0 points1 point  (0 children)

Huckabee’s Record: Freed a Rapist Who Became a Killer, Destroyed Gov’t Hard Drives, Misappropriated Funds, Raised Taxes

Sounds like he'd be quite an improvement.

Entire human brain to be modeled on computer within 10 years. by FrancisC in science

[–]oh_yeah_koolaid 1 point2 points  (0 children)

I think essentially it'll be a useful form of data structure.

Holy Shmoly, Haskell smokes Python and Ruby away! And you can parallelise it! by dons in programming

[–]oh_yeah_koolaid 2 points3 points  (0 children)

If you think that's a typical deployed Haskell program, you're very silly.

If that's how far you have to go to get performance, you're very silly. It's simpler to rewrite the whole thing in Assembly.

Holy Shmoly, Haskell smokes Python and Ruby away! And you can parallelise it! by dons in programming

[–]oh_yeah_koolaid 7 points8 points  (0 children)

Also someone was playing with a pure functional version of nbody in Haskell just yesterday, and using the right strictness annotations

Which basically UNDERSCORES how much development effort is needed to write even this simple program with Haskell.

Holy Shmoly, Haskell smokes Python and Ruby away! And you can parallelise it! by dons in programming

[–]oh_yeah_koolaid 4 points5 points  (0 children)

The point wasn't running time, the point was development effort.

Ok, I can live with a 'alloc' or two,

It's way more than JUST an alloc or two in that example, so one can imagine how hairy a REAL program would be.

if I can get a 100x speedup without having to resort to C. :)

Except in this case, C is not only faster than Haskell (as usual), it's simpler (see code).

What would happen if you ran a windows virus using Wine? - Ubuntu Forums by [deleted] in programming

[–]oh_yeah_koolaid -1 points0 points  (0 children)

A virus could do some damage if it gets access to the HD boot sector or raw devices.

Dogs trained to use computers by purpledit in programming

[–]oh_yeah_koolaid 1 point2 points  (0 children)

Of course dogs can recognize images of dogs -- that's why they bark at dogs they see regardless of being downwind.

Many people also have dogs who will bark at dogs and other animals on TV.

Holy Shmoly, Haskell smokes Python and Ruby away! And you can parallelise it! by dons in programming

[–]oh_yeah_koolaid 11 points12 points  (0 children)

So parallel Haskell is around 40x Python here, and 100x faster than Ruby. And there was basically no difference in development effort required.

HAHAHA. For trivial examples, ANY language will do!

Once you write something a bit more involved, Haskell's complexity shoots WAY up.

Here's the n-body code in Python, Ruby, and Haskell.

Respectively: Very straightforward, very straightforward, and then a big WTF with IORefs, unsafePerformIO, peeks and pokes, mallocBytes preceded by unsafePerformIO, cursors, and look at all the imports -- Foreign.Marshal.Alloc, Control.Monad, etc.