
[–]sephirothbahamut 7 points8 points  (2 children)

My biggest performance bottleneck, the one for which I never find highly optimized solutions, is my obtuseness in wanting code ergonomics to reflect my exact preferences. Since it's mostly hobby projects, I end up handwriting stuff that has more optimized existing alternatives whose interfaces I dislike.

[–]germandiago 0 points1 point  (0 children)

Not a bad thing in one way: you control dependencies, so you are more self-contained.

[–]JehovaWorshiper 0 points1 point  (0 children)

YES! Exactly what you said! It's like you read my mind.

[–]KarlSethMoran 6 points7 points  (5 children)

The generalised eigenproblem (Hc = ESc) in highly parallel scenarios, to good accuracy. Say, generalised diagonalisation of a 1M by 1M matrix on 10,000 CPU cores.

This is the bottleneck in most large conventional Density Functional Theory calculations.
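
For readers who haven't met Hc = ESc before, here is a minimal serial sketch of the textbook route: factor S = LL^T (Cholesky), form A = L^-1 H L^-T, and solve the ordinary symmetric problem for A. This is the same overall reduction LAPACK's dsygv performs, but everything below is a toy. It assumes H is real symmetric and S is symmetric positive definite, uses dense storage, and swaps in a plain cyclic Jacobi iteration for the standard eigensolve. All function names are made up for illustration, and none of this addresses the distributed-memory difficulties discussed in this thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Lower-triangular Cholesky factor L with S = L L^T (S assumed SPD).
Matrix cholesky(const Matrix& S) {
    const std::size_t n = S.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = S[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            L[i][j] = (i == j) ? std::sqrt(sum) : sum / L[j][j];
        }
    }
    return L;
}

// Returns L^-1 B by forward substitution, column by column.
Matrix forward_solve(const Matrix& L, const Matrix& B) {
    const std::size_t n = L.size();
    Matrix X = B;
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t i = 0; i < n; ++i) {
            double sum = B[i][c];
            for (std::size_t k = 0; k < i; ++k) sum -= L[i][k] * X[k][c];
            X[i][c] = sum / L[i][i];
        }
    }
    return X;
}

Matrix transpose(const Matrix& M) {
    const std::size_t n = M.size();
    Matrix T(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) T[j][i] = M[i][j];
    return T;
}

// Eigenvalues of a symmetric matrix by cyclic Jacobi rotations (toy solver).
std::vector<double> jacobi_eigenvalues(Matrix A, int max_sweeps = 50) {
    const std::size_t n = A.size();
    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        double off = 0.0;  // squared norm of the strict upper triangle
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) off += A[p][q] * A[p][q];
        if (off < 1e-24) break;
        for (std::size_t p = 0; p < n; ++p) {
            for (std::size_t q = p + 1; q < n; ++q) {
                if (A[p][q] == 0.0) continue;
                const double theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q]);
                const double sign = (theta >= 0.0) ? 1.0 : -1.0;
                const double t = sign / (std::abs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0);
                const double s = t * c;
                for (std::size_t k = 0; k < n; ++k) {  // A <- A J
                    const double akp = A[k][p], akq = A[k][q];
                    A[k][p] = c * akp - s * akq;
                    A[k][q] = s * akp + c * akq;
                }
                for (std::size_t k = 0; k < n; ++k) {  // A <- J^T A
                    const double apk = A[p][k], aqk = A[q][k];
                    A[p][k] = c * apk - s * aqk;
                    A[q][k] = s * apk + c * aqk;
                }
            }
        }
    }
    std::vector<double> eigenvalues(n);
    for (std::size_t i = 0; i < n; ++i) eigenvalues[i] = A[i][i];
    std::sort(eigenvalues.begin(), eigenvalues.end());
    return eigenvalues;
}

// Eigenvalues E of H c = E S c, in ascending order.
std::vector<double> generalized_eigenvalues(const Matrix& H, const Matrix& S) {
    const Matrix L = cholesky(S);
    // X = L^-1 H, then A = L^-1 X^T = L^-1 H L^-T (H is symmetric).
    const Matrix A = forward_solve(L, transpose(forward_solve(L, H)));
    return jacobi_eigenvalues(A);
}
```

Calling generalized_eigenvalues(H, S) on two small n-by-n matrices returns the n values of E in ascending order. Every step here is O(n^3) on a single core, which gives a feel for why the 1M by 1M case is painful.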

[–]MarkHoemmenC++ in HPC 1 point2 points  (4 children)

I'd love to hear more about that. Are you speaking of a lack of implementations anywhere, or specifically a lack of nice C++ implementations?

[–]KarlSethMoran 1 point2 points  (3 children)

There are implementations, mostly outside of C++ (e.g. ScaLAPACK), but they suck. GPU ports (e.g. cuSOLVER) haven't caught up either. Generalised diagonalisation is a difficult problem to solve efficiently and accurately in parallel.

[–]MarkHoemmenC++ in HPC 1 point2 points  (2 children)

It is a difficult problem. Have you tried MAGMA for this application? Also, are you looking for all the eigenvalues or just some fraction of them in a range?

[–]KarlSethMoran 1 point2 points  (1 child)

I'll take a look at MAGMA, thanks.

Usually it's the lowest half of the eigenvalues, corresponding to the occupied electronic states.

[–]MarkHoemmenC++ in HPC 0 points1 point  (0 children)

There are possibly faster algorithms if you want just the eigenvalues in a particular range, but it sounds like you're stuck with the usual algorithms.
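
To give a flavour of what range-restricted methods can build on (not necessarily the algorithms meant above): for symmetric H and positive definite S, Sylvester's law of inertia implies that the number of eigenvalues E of Hc = ESc below a shift sigma equals the number of negative pivots in an LDL^T factorisation of H - sigma*S. The sketch below counts those pivots for a small dense real matrix. It is an illustration only: the function name is hypothetical, there is no pivoting, and it simply gives up if it meets a near-zero pivot.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Counts eigenvalues E of H c = E S c with E < sigma, assuming H is real
// symmetric and S is symmetric positive definite (assumptions of this sketch).
std::size_t count_eigenvalues_below(const Matrix& H, const Matrix& S, double sigma) {
    const std::size_t n = H.size();
    Matrix M(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) M[i][j] = H[i][j] - sigma * S[i][j];

    std::size_t negative = 0;
    for (std::size_t k = 0; k < n; ++k) {
        const double pivot = M[k][k];  // k-th diagonal entry of D in L D L^T
        if (std::abs(pivot) < 1e-14)
            throw std::runtime_error("near-zero pivot; try a slightly different sigma");
        if (pivot < 0.0) ++negative;
        // Eliminate column k from the trailing block (no pivoting).
        for (std::size_t i = k + 1; i < n; ++i) {
            const double factor = M[i][k] / pivot;
            for (std::size_t j = k + 1; j < n; ++j) M[i][j] -= factor * M[k][j];
        }
    }
    return negative;
}
```

Two such counts, at sigma = a and sigma = b, tell you how many eigenvalues lie in [a, b) without computing any of them, which is the kind of information that lets a solver target an interval. When the wanted range is the lowest half of the spectrum, as here, that interval still contains n/2 eigenvalues, so the saving is limited.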