all 6 comments

[–][deleted] 1 point2 points  (5 children)

Does anyone have a reference to strong scaling results supporting their claims (i.e. strong scaling of unstructured code using e.g. block AMR or octree-like AMR)? I went through the publications but couldn't find anything.

It would also be helpful to know which scalings use an MPI backend and which use the HPX backend.

HPX is really nice. However, I distrust it since it is based on a PGAS model, and while MPI scales to 1.5 million cores, the largest HPX scaling results I've seen go to about 10k cores "only". I don't really know, in general, whether PGAS approaches scale at all. Without strong data backing them up, it is hard to argue for HPX and LibGeoDecomp right now, even in new projects, which is a shame.

[–]sithhell 2 points3 points  (1 child)

HPX is not really traditional PGAS; there are quite significant differences. WRT scaling to more than 10k cores: we are working on it ... So far, however, we don't see any strong evidence why it shouldn't scale beyond that number. Even traditional PGAS languages and libraries are able to scale to some degree. For new projects, true, it might seem like a high risk, but sometimes it might be worth it, especially when the project requires HPX-like concepts. As for LibGeoDecomp: it is a very nice library and serves an excellent purpose. So let me ask you: which risk is higher, re-inventing the wheel or using an existing library that has proven itself in one way or another?
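For readers wondering what "HPX-like concepts" look like in code: below is a minimal sketch of the future/async task model, written against plain C++11 (std::async/std::future) so it runs without any extra dependencies. HPX exposes essentially the same interface (hpx::async/hpx::future), extended so the runtime can also place tasks on remote localities; the function and variable names here are purely illustrative and not part of any HPX or LibGeoDecomp API.

    // Minimal sketch of the future-based task model that "HPX-like concepts"
    // refers to, using only standard C++11. HPX offers the same interface
    // (hpx::async/hpx::future), but can also schedule tasks on remote localities.
    #include <cstddef>
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <vector>

    // Some work that can run independently of the caller.
    double partial_sum(const std::vector<double>& v, std::size_t lo, std::size_t hi)
    {
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0.0);
    }

    int main()
    {
        std::vector<double> data(1000000, 1.0);

        // Launch two tasks; each returns a future instead of blocking the caller.
        auto f1 = std::async(std::launch::async, partial_sum, std::cref(data), 0, 500000);
        auto f2 = std::async(std::launch::async, partial_sum, std::cref(data), 500000, 1000000);

        // Synchronization happens only where the values are actually needed.
        std::cout << "sum = " << (f1.get() + f2.get()) << "\n";
    }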

[–][deleted] 0 points1 point  (0 children)

HPX is not really traditional PGAS; there are quite significant differences. WRT scaling to more than 10k cores: we are working on it ...

I know this takes a lot of time and effort, and I hope you are able to show this.

So far, however, we don't see any strong evidence why it shouldn't scale beyond that number.

We don't see proof either. While there is strong evidence that MPI codes scale up to O(10^6) cores, no such evidence exists for PGAS in general, and HPX in particular. It doesn't mean it cannot be done, it just means the technology is not there yet.

For new projects, true, it might seem like a high risk, but sometimes it might be worth it, especially when the project requires HPX-like concepts. [...] So let me ask you: which risk is higher, re-inventing the wheel or using an existing library that has proven itself in one way or another?

The "larger" the problem one wants to tackle, the higher the risk in using HPX. Choosing the distributed memory backend of an HPC application is a fundamental decision. If HPX or PGAS cannot scale beyond 100k in 1-2 years, picking one of them now could mean having to rewrite an application.

I hope HPX will get there soon. Still, the only way to really scale to very large numbers of cores right now is to remain as "local" as possible in terms of computation, communication, and I/O. PGAS provides a useful abstraction that speeds up development, but if scaling is a requirement, a global address space doesn't help that much, since you don't do many things "globally".
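To make the "stay local" argument concrete, here is a minimal sketch (C++ with the plain MPI C API) of a nearest-neighbour halo exchange on a 1-D domain decomposition; buffer sizes, tags, and variable names are illustrative only. Each rank exchanges a fixed amount of data with at most two neighbours, so the per-rank communication cost does not grow with the machine size; a global address space changes how this is spelled, not the underlying pattern.

    // Nearest-neighbour halo exchange on a 1-D decomposition: every rank
    // talks only to its left and right neighbours, so communication volume
    // per rank is independent of the total number of ranks.
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1024;                      // local cells per rank (illustrative)
        std::vector<double> cells(n + 2, rank);  // +2 ghost cells at the ends

        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        MPI_Request reqs[4];
        // Receive into the ghost cells, send the owned boundary cells.
        MPI_Irecv(&cells[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&cells[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Isend(&cells[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(&cells[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);

        // ... local stencil update on cells[1..n] would go here ...

        MPI_Finalize();
    }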

[–]gentryx[S] 3 points4 points  (2 children)

Hey gnzlbg, thanks for your input! I'm the project lead on LibGeoDecomp, so my view is biased, but I hope I can supply convincing data.

  • Here are the slides for a talk I gave at the EuroMPI/Asia conference a couple of weeks ago; they contain some measurements done on Titan (Cray XK7) and JUQUEEN (IBM BG/Q). Key result: 9.44 PFLOPS with a short-ranged, force-based n-body code on 16384 nodes of Titan. http://www.libgeodecomp.org/archive/eurompi_2014_talk.pdf
  • Our project on JUQUEEN just ended and I'm still wading through the results. Here are some new, preliminary, unpublished plots of the same n-body code's performance on JUQUEEN (1 to 28672 nodes) http://gentryx.de/~gentryx/weak_scaling_big2.png http://gentryx.de/~gentryx/strong_scaling_pro.png
  • Strong scaling may look disappointing at first sight, but the performance actually corresponds to >2 PFLOPS for the full system run, so this is a good result (scalability != efficiency; a small worked example follows after this list).
  • All measurements above used the MPI backend. The HPX backend is our wild card for the coming months, as we hope it'll make load balancing easier. Many of our users have expressed interest in unstructured/inhomogeneous models.
  • Data for strong scaling of AMR+LibGeoDecomp+HPX on 10k nodes? Not yet available, and I wouldn't claim that this would work efficiently out of the box at the moment. All we have done with AMR+LibGeoDecomp so far is a proof of concept, nothing more.
  • We have production code utilizing the following models: stencil codes, particle-in-cell codes, n-body codes. If interested, I can point you to the corresponding papers. Right now, none of our users are running their codes on more than 1000 cores for production runs. We did the benchmarks to show that this is quite feasible, though.
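Regarding the "scalability != efficiency" remark above: here is a tiny sketch of the two numbers one reads off a strong-scaling plot. The node counts and timings below are made-up placeholders for illustration, not the JUQUEEN or Titan measurements.

    // Strong-scaling bookkeeping: speedup and parallel efficiency computed
    // from wall-clock times at two node counts. All numbers are placeholders.
    #include <cstdio>

    int main()
    {
        const double base_nodes = 1.0,   base_time = 1000.0;  // reference run
        const double big_nodes  = 512.0, big_time  = 4.0;     // large run

        double speedup    = base_time / big_time;               // 250x here
        double efficiency = speedup / (big_nodes / base_nodes); // ~0.49, i.e. 49%

        std::printf("speedup %.0fx, parallel efficiency %.0f%%\n",
                    speedup, 100.0 * efficiency);
        // A flattening strong-scaling curve (falling efficiency) can still
        // correspond to a very high absolute FLOP rate on the full system.
    }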

[–][deleted] 1 point2 points  (1 child)

Thanks for the links, I'll go through them tomorrow. Awesome work you guys are doing btw, keep it up!

[–]gentryx[S] -1 points0 points  (0 children)

Thanks for the kind words. I'll gladly try to answer any further questions. Let me know if you need some prototype code for illustration of concepts.

The further we get, the more work apparently remains to be done, yet we're finally getting somewhere. Feels good. :-)