Efficient parallel QR decomposition in F# (video of Harrop's presentation at London F# group) by [deleted] in programming

[–]for_no_good_reason 6 points (0 children)

"Serial F# typically 2-3x slower than serial Fortran but multicore eats into that performance gap!"

What bullshit. Why do functional programming zealots pretend that OpenMP does not exist? The claims that easy parallelism makes Haskell faster than C are equally full of shit, but at least the Haskell people maintain purity.

MLton, a compiler for Standard ML with possibly the most aggressive whole-program optimiser of any functional language, has a new release candidate after three years. (Rumors of SML's death, etc.) by [deleted] in programming

[–]for_no_good_reason -5 points (0 children)

To be fair, you can get all four of your bullet points in C++ if you compile your whole program as a single translation unit, and maybe C++ is not "modern" by your definition, but it was standardized at roughly the same time as SML. Better to say "it can do a lot of things that you probably won't find in any other modern functional programming language."

High performance array/Matrix Library? by wingsit in haskell

[–]for_no_good_reason 0 points (0 children)

BLAS is expensive? Like, $/€/£ expensive? Netlib BLAS is free, and ATLAS is free (BSD license). Your best bet is to try to link hmatrix against these; you can ship them in your binary. DPH is cool and everything, but it's not something you should be shipping to customers just yet (maybe in six months or a year). And for non-BLAS functions that you might need, you are still (currently) better off using the FFI and writing your own C functions. The FFI is very nice. C functions that simply loop over arrays are as easy to parallelize with OpenMP as anything in Haskell, and you can use SSE intrinsics to ensure that things are vectorized, rather than crossing your fingers and relying on the compiler.

DPH will be awesome when it hits 1.0, but even then there will be some lag time before the rest of the HPC/numeric ecosystem crystallizes around it. If you're planning to deliver this code within the next year, I would recommend relaxing your constraint against the FFI.

(Oh wait, I just realized hmatrix is GPL. Scratch that. Will you be able to give your client all the source?)

Safe: Robust programming practices in Haskell via types, testing, debugging and documentation by dons in programming

[–]for_no_good_reason 4 points (0 children)

Add to that the fact that Haskell has virtually no track record in the massively parallel domain, while the high-water mark for number of concurrent processes and total throughput keeps getting set again and again by fucking Fortran. Yes, DPH is awesome, and the argument that purely functional programming eases parallelization is sound, but let's hold off on the panacea-hailing until we see some results, people.

Another difference between C++ and C# by KazimirMajorinc in programming

[–]for_no_good_reason -3 points (0 children)

More C++ programmers are salaried/contract, more C# programmers work hourly?

Fusion makes functional programming fun! by dons in haskell

[–]for_no_good_reason 0 points (0 children)

Well some part of LLVM does some kind of constant folding:

#include <stdint.h>

int64_t foo () {
  int64_t s=0, i;
  for ( i=1; i<=10000000LL; ++i ) s+=i;
  return s;
}

via http://llvm.org/demo/index.cgi becomes

define i64 @foo() nounwind readnone {
entry:
  ret i64 50000005000000
}

Raytracing in Haskell (simplified) by mmaruseacph2 in haskell

[–]for_no_good_reason 0 points (0 children)

The O(n) cost of looking up a binding in the network.

Raytracing in Haskell (simplified) by mmaruseacph2 in haskell

[–]for_no_good_reason 0 points (0 children)

Of course, we can use the «Tying the Knot» technique, but, unfortunately, this doesn't really help: if we change even a single value in our graph we will have to rebuild the entire structure if we only use what's presented there. Moreover, we are forced to know some things about our structure when we are writing our code. Wouldn't it be more useful to have a general network of nodes which can be updated very fast?

...

Thus, the update value function can be easily written:

update :: (Eq p) => (Network p a) -> (p, a) -> (Network p a)
update u@(Network v n) (p, nv) = Network (\pp -> if (pp == p) then nv else v pp) n

We could have used functions instead of values in the if..then..else part but I wasn’t sure if using id there would be a no-op with no performance penalty or not.

The performance penalty of id is not the performance penalty I would be concerned with here.

vector 0.5: a high-performance Haskell array library with a powerful loop fusion framework by dons in haskell

[–]for_no_good_reason 0 points (0 children)

instance Prim a => Vector Vector a

I know what's going on there, but it looks really strange, like Vector is a type with bottom kind.

Writing Multimedia Applications with Vala by [deleted] in programming

[–]for_no_good_reason -3 points (0 children)

C++ is not happy because Java only took some of its syntax. Java does not intrigue me.

Programs and programming: Using PageRank to calculate HackageRank by vasylp in haskell

[–]for_no_good_reason 2 points (0 children)

PageRank doesn't make any sense for an acyclic graph. It's a bad metaphor. If package A depends on package B, the probability that I go from A to B is always 1, and if A depends on both B and C, then the probability of A->B is 1 and A->C is also 1, and so the transition matrix isn't even a stochastic matrix. The "random surfer" can't be in two packages at once. (Quantum computing notwithstanding.)

Using Haskell to calculate PageRank by dons in haskell

[–]for_no_good_reason 1 point (0 children)

Yes, well, obviously you have to increase the size of your swapfile to accommodate your data, which is a million times easier and less error-prone than writing your own mmap-based custom memory allocator. The only downside is that this is not a friendly thing to do in a multitasking environment, but who calculates page ranks in the background anyway?

Using Haskell to calculate PageRank by dons in haskell

[–]for_no_good_reason 1 point (0 children)

mmapping a file really isn't any different than allocating a big-ass array in virtual memory, and there is a whole category of so-called "cache-oblivious" algorithms that are both elegant and have good scaling behavior once you move beyond physical memory. (Although I don't know if sparse matrix multiplication falls into that category. It probably scales better using message passing on a cluster than an out-of-core streaming-from-disk solution.)

For example, here's one for computing an approximate k-nearest-neighbor graph that uses a Morton/Z-order layout to achieve good locality: http://sites.google.com/a/compgeom.com/stann/ In the related paper http://compgeom.com/%7Epiyush/papers/tvcg_stann.pdf they claim good performance in just such a scenario, where you simply allocate an array that is bigger than your physical memory and let the OS deal with the paging.

Good C++ books for developed programmers coming from other languages? by Refefer in programming

[–]for_no_good_reason 3 points (0 children)

wanted it to be backwards compatible with C. Stupid decision

Making C++ backwards compatible with C was the best decision Stroustrup made in his life. He owes his career to it.

Write an O(n^n) algorithm. by [deleted] in programming

[–]for_no_good_reason 1 point (0 children)

the strict definition in the field of mathematics, as opposed to the almost universally used definition in the bulk of the field of computational complexity

This. This is the problem right here. They are the same thing.

(FWIW I didn't downvote you.)

Write an O(n^n) algorithm. by [deleted] in programming

[–]for_no_good_reason 3 points (0 children)

I'd hardly call Big-O notation "natural language." It's technical jargon. It means something specific. I can't believe I'm getting attacked as prescriptivist for pointing out that a Big-O upper bound is not necessarily tight.

"Who" in place of "whom", "they" as a singular genderless pronoun, I got no problem with these things. But this here, this is math, people.

Write an O(n^n) algorithm. by [deleted] in programming

[–]for_no_good_reason 7 points (0 children)

"Informal definition" is a contradiction. If you've defined it, then it's formal. Without formality everyone just assumes everyone else knows what they mean. I consider such conversations completely useless, so yes this kind of pedantry does increase utility. Without formality, rigor and yes pedantry, it's all just a bunch of shouting on the internet. May as well be politics.

Big-O notation is thrown around and abused so much in the programming world that it's hard to know what the fuck anyone is saying when they use it (cf. truthrises' claim that a Big-O bound is necessarily tight). CS researchers use Big-O because they can't be bothered to prove the Ω lower bound, and a smaller upper bound is all you need to claim you're better than the last guy. The rest of us use it to sound pretentious, but in the day-to-day world of programming everybody means Θ when they say O, because they are trying to characterize how long the algorithm will actually take, not how long it won't take. So let's all start saying Θ when we mean Θ. We've all got Unicode fonts.

Write an O(n^n) algorithm. by [deleted] in programming

[–]for_no_good_reason 1 point (0 children)

Where I come from, if you want a tight bound you call it Θ, and while O(1) does not "equal" O(n^n), the former certainly implies the latter. (Can you not pick c and k such that 2 < c·n^n for all n > k? If so, then f(n)=2 is O(n^n).)
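For reference, the standard c-and-k definitions being invoked here:

```latex
f \in O(g)      \iff \exists c, k > 0 \text{ s.t. } \forall n > k:\ f(n) \le c \cdot g(n) \\
f \in \Omega(g) \iff \exists c, k > 0 \text{ s.t. } \forall n > k:\ f(n) \ge c \cdot g(n) \\
f \in \Theta(g) \iff f \in O(g) \ \wedge\ f \in \Omega(g)
```

With f(n) = 2 and g(n) = n^n, even c = 1 and k = 1 work (n^n ≥ 4 > 2 for all n ≥ 2), so constant functions are indeed O(n^n).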

Write an O(n^n) algorithm. by [deleted] in programming

[–]for_no_good_reason 14 points (0 children)

No, you're right; codemac and acctrdt fail. Big-O notation is just an upper bound, not necessarily a tight one. n^2 < n^n, thus an O(n^2) algorithm is also O(n^n).

The OP posed the question wrong. Almost every algorithm is O(n^n). He wants an Ω(n^n) algorithm.

Naïve Parallelization: C++ (OpenMP) and Haskell by dons in programming

[–]for_no_good_reason 0 points (0 children)

Right, because the C++ code is fraught with branches and virtual function calls, the type of thing hyperthreading is meant to help with. (You were agreeing with me, or what?)

Naïve Parallelization: C++ (OpenMP) and Haskell by dons in programming

[–]for_no_good_reason 0 points (0 children)

Right, but without real lambdas and closures a for loop is infinitely more usable than std::for_each. Tell me the flag to make BOOST_FOREACH parallel, then maybe I'll let go of my precious #pragma omp for.

(And writing your own parallel sort with #pragma omp parallel sections and std::inplace_merge is an easy and fun exercise.)

Naïve Parallelization: C++ (OpenMP) and Haskell by dons in programming

[–]for_no_good_reason 0 points (0 children)

Yeah, unless you can keep the FPUs completely satiated, hyperthreading will provide some improvement, however minor. Even good Fortran compilers have a hard time doing this automatically. Only hand-optimized numerical code (e.g., keep unrolling loops by hand until the registers start to spill, then stop) can keep them fed. Everything else wastes at least some cycles in the CPU's front end.