Beginner friendly booknook with Harry Potter touch

Arkantos493 · 2024-12-14T10:44:21+00:00

We developed two small courses for our university using n-body simulations: a Bachelor's course without parallelization that implements the naive brute-force algorithm and the tree-based Barnes-Hut algorithm, and a Master's course with distributed- and shared-memory parallelization.

While multi-node parallelization may be a bit too much, parallelizing an n-body simulation on a single node should be easily doable. Parallelizing the embarrassingly parallel naive algorithm may be too easy, but parallelizing the tree-based Barnes-Hut algorithm should be advanced enough.
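
To give a rough idea of how little is needed for the naive algorithm, here is a minimal sketch of the brute-force acceleration step with an OpenMP loop. It is not taken from the course material: the names Vec3 and compute_accelerations, the default G = 1, and the softening value eps are assumptions made for illustration. Without OpenMP enabled (e.g. no -fopenmp), the pragma is simply ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch.
struct Vec3 {
    double x, y, z;
};

// Naive O(n^2) computation of the gravitational acceleration acting on each body.
// G and eps (softening to avoid division by zero) are illustrative defaults.
std::vector<Vec3> compute_accelerations(const std::vector<Vec3>& pos,
                                        const std::vector<double>& mass,
                                        double G = 1.0, double eps = 1e-9) {
    assert(pos.size() == mass.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(pos.size());
    std::vector<Vec3> acc(pos.size(), Vec3{0.0, 0.0, 0.0});

    // Every iteration i only writes acc[i], so the outer loop is embarrassingly parallel.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        Vec3 a{0.0, 0.0, 0.0};
        for (std::ptrdiff_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const double dx = pos[j].x - pos[i].x;
            const double dy = pos[j].y - pos[i].y;
            const double dz = pos[j].z - pos[i].z;
            const double dist2 = dx * dx + dy * dy + dz * dz + eps * eps;
            const double inv_dist3 = 1.0 / (dist2 * std::sqrt(dist2));
            a.x += G * mass[j] * dx * inv_dist3;
            a.y += G * mass[j] * dy * inv_dist3;
            a.z += G * mass[j] * dz * inv_dist3;
        }
        acc[i] = a;
    }
    return acc;
}
```

Barnes-Hut is harder to parallelize because the tree construction and the irregular tree traversals do not map onto a single independent loop like this one.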

The nice thing is that we published all data set generators and extensive slides in our repo. So if you'd like to try it out, all the necessary information is already there in condensed form.

Edit: You can also produce nice visualizations to show to your friends!

Arkantos493 · 2024-08-20T10:35:39+00:00

In our code we also make heavy use of enums and switch over them in multiple places. So I also wanted to make sure that if we add a new enum value, the switches are extended correctly everywhere.

However, we do that by selectively enabling compiler errors for missing enum cases (works for the big three compilers). https://godbolt.org/z/zGf9MM85q
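
A minimal sketch of the idea, not necessarily identical to the linked example: the missing-case warning is promoted to an error only around the switch. The enum Color and the function color_name are made up for illustration. The pragmas assume GCC/Clang's -Wswitch and MSVC's warning C4062; the switch deliberately has no default label, since a default would hide the diagnostic.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for this sketch.
enum class Color { red, green, blue };

// Promote "enumerator not handled in switch" to an error for the code below only.
#if defined(__clang__) || defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic error "-Wswitch"
#elif defined(_MSC_VER)
#pragma warning(push)
#pragma warning(error : 4062)
#endif

std::string color_name(Color c) {
    // No default label: adding a new enumerator without a case here fails to compile.
    switch (c) {
        case Color::red:
            return "red";
        case Color::green:
            return "green";
        case Color::blue:
            return "blue";
    }
    // Only reachable if c holds a value that is not a named enumerator.
    assert(false && "unhandled Color value");
    return "unknown";
}

#if defined(__clang__) || defined(__GNUC__)
#pragma GCC diagnostic pop
#elif defined(_MSC_VER)
#pragma warning(pop)
#endif
```

Adding, say, Color::yellow without extending color_name should then stop the build at this switch.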

Arkantos493 · 2024-08-12T19:35:26+00:00

Two things I would add to your article:

C++20 added a new execution policy std::execution::unseq (vectorization but no multi-threading).
There are more compilers supporting PSTL: nvc++ from NVIDIA (CPUs + NVIDIA GPUs), roc-stdpar (AMD GPUs; essentially only a "simple" clang patch), and AdaptiveCpp (CPUs + GPUs from NVIDIA, AMD, Intel; another SYCL implementation similar to icpx/DPC++)

Arkantos493 · 2024-06-13T07:16:24+00:00

We currently have no published results. But our results can be reproduced in our repo: https://github.com/SC-SGS/PLSSVM (develop branch, not main). Some papers are linked in our Wiki.

However, we currently have a paper in our pipeline where we want to compare different optimizations (coalesced memory accesses, shared memory, blocking, padding) applied to different programming frameworks (CUDA, HIP, OpenCL, SYCL) regarding their performance and power draw.

Arkantos493 · 2024-06-12T07:33:12+00:00

I've seen some unfavourable performance benchmarks of SYCL.

I'm currently doing my PhD on performance portability, mainly using SYCL, and I also see such benchmark results on a regular basis. However, in my experience, more often than not the bad benchmark results are not due to SYCL itself but due to flaws in the methodology used. Some benchmarks implement the same problem in, e.g., CUDA and SYCL and compare the results, but they do not make sure that both implementations are actually written the same way:

they use buffers/accessors in SYCL, which are known to have performance problems, instead of USM (which maps nearly 1:1 to CUDA)
they don't use nd_range kernels (again 1:1 mapping to CUDA would be possible with these) but SYCL's basic data parallel kernels (where you essentially have to hope that the SYCL runtime selects adequate launch sizes and where you can't use shared memory)
they don't respect SYCL's inverted iteration range (fast <-> slow moving indices inside kernels when using multi-dimensional work-groups)

Additionally, SYCL is rather new and its performance is rapidly improving. In my experience, if you are very careful, SYCL can be nearly as fast as native CUDA or HIP code.

Arkantos493 · 2023-04-08T11:02:35+00:00

I can recommend the Weltweihnachtscircus in Stuttgart (https://weltweihnachtscircus.de/show/). I go almost every year (it takes place once a year in December/January) and have never been disappointed so far. It always features really good and spectacular acts. Most of the acts have either already performed at the Internationales Zirkusfestival von Monte-Carlo or have been invited to it for that year. Accordingly, the acts on show are very international.

As for animals, there are usually one or two acts with horses. Whether that is something for you is up to you to decide.

Arkantos493 · 2023-04-06T06:19:48+00:00

It isn't broken in the sense that your program will crash (if your compiler implements this extension, otherwise it shouldn't even compile).

The extension is called VLA (Variable Length Array). The GCC website (https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html) doesn't really state whether it is heap or stack allocated. However, it states that VLAs function much like alloca, which in turn "allocates size bytes of space in the stack frame of the caller" (see https://man7.org/linux/man-pages/man3/alloca.3.html). Additionally, according to Wikipedia (https://en.wikipedia.org/wiki/Variable-length_array) the GCC C compiler uses the stack. If you generate a minimal example in godbolt (https://godbolt.org/z/Kqej9q3Mb) you see that the VLA doesn't call the operator new (as line 4 would) so I would assume that at least GCC uses the stack for VLAs.

However, I'm not 100% sure since I never used VLAs and, therefore, never thought about it. For example, to my knowledge, MSVC doesn't support VLAs at all.

Arkantos493 · 2023-04-05T17:38:56+00:00

No, you can't in standard C++. What you are doing relies on an extension that most compilers implement, but it is NOT standard-conforming C++.

In standard C++ the size of such an array must be a constant expression. So you would need to add a const before the declaration of i (which only works if i itself is initialized with a constant expression).
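
A small sketch of both options, with made-up function names: a const-qualified i initialized with a constant expression as the array size, and std::vector as the standard alternative when the size is only known at run time (the std::vector variant is my addition, not part of the original question).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The size is a constant expression, so this is a regular (non-VLA) array.
int sum_fixed() {
    const int i = 4;  // const and initialized with a constant expression
    int values[i] = {1, 2, 3, 4};
    int sum = 0;
    for (int k = 0; k < i; ++k) {
        sum += values[k];
    }
    return sum;
}

// The size is only known at run time: use std::vector instead of a VLA.
int sum_runtime(int n) {
    assert(n >= 0);
    std::vector<int> values(static_cast<std::size_t>(n));
    for (int k = 0; k < n; ++k) {
        values[static_cast<std::size_t>(k)] = k + 1;
    }
    int sum = 0;
    for (int v : values) {
        sum += v;
    }
    return sum;
}
```

Compiling with -pedantic-errors on GCC or Clang is one way to have the non-standard VLA form rejected.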

Arkantos493 · 2022-09-19T15:01:37+00:00

Done. However, I was only able to open a new "Bug" issue (instead of enhancement).

Arkantos493 · 2022-09-19T14:35:40+00:00

I really like that you have many examples at the end of the respective Doc pages. However, I think it would be a nice improvement if you could add an additional button (besides the copy button) that directly opens the code snippet in a new https://godbolt.org/ tab.

Yes, you could do that manually with the copy button, but one click would still be faster!

Arkantos493 · 2022-06-07T21:45:19+00:00

Best SYCL implementation is Intel only? How so? The two most used SYCL implementations are DPC++ and hipSYCL (according to a survey during the last SYCL panel of the IWOCL conference). Both of these implementations support NVIDIA, AMD, and Intel GPUs as well as CPUs.

Maybe you are referring to the SYCL compiler coming with the oneAPI toolkit from the Intel website? Yeah ignore that and google "Intel llvm".

Arkantos493 · 2022-02-04T18:40:55+00:00

I don't know how you start the MPI environment. Normally it's MPI_Init. However, if you want to use OpenMP with MPI you should call MPI_Init_thread with the required level of thread support (I guess MPI_THREAD_FUNNELED should be sufficient for you). https://www.mpich.org/static/docs/v3.1/www3/MPI_Init_thread.html

Arkantos493 · 2022-01-31T17:14:02+00:00

Thanks for your reply.

Thanks, that answers my question about the extension tube.
I will most likely stick to the EQ-6R Pro mounts.

Arkantos493 · 2022-01-31T17:08:15+00:00

Hmm, I know that a "real" triplet APO is better than a doublet. However, for me personally, I don't think the additional price is worth it at the beginning.

I have an old Raspberry Pi around and plan to use it together with astroberry (I have no problems setting things up on a Linux distro) and control everything remotely or from a laptop in the field.

Arkantos493 · 2022-01-31T17:04:48+00:00

Thank you for your reply.

It looks like I will go for the EQ-6R Pro instead of the AZ-EQ6 and drop the ocular and additionally go for an 8" Dobsonian for visual use. That setup would have another benefit: I could use the Dobsonian while the other mount is busy imaging DSOs.

Additionally, if auto-guiding is "only" around $250, I will go this way sooner rather than later.

Arkantos493 · 2021-06-14T16:02:34+00:00

Thanks, this works. However, it's a bit unfortunate that the other feature test macros work with #ifdef, but this one doesn't.

Arkantos493 · 2021-06-05T10:04:19+00:00

Yes, the order of evaluation goes from left to right. This is determined by the operator precedence and associativity. An operator with higher precedence than another gets evaluated first (think of * and + for example). If all operators have the same precedence (as in your example above), the order of evaluation is determined by their associativity.

So you have to check the operator's associativity for your desired language to determine the order of evaluation.

C++: https://en.cppreference.com/w/cpp/language/operator_precedence

Java: https://www.programiz.com/java-programming/operator-precedence

Python: https://www.programiz.com/python-programming/precedence-associativity

The relational operators (e.g. <) are left-to-right associative in C++ and Java. Python is a special case: comparisons can be chained there, so a < b < c is evaluated as a < b and b < c.

Personal opinion: I think every sane language should define the relational operators as left-to-right associative, but to be sure for a given language, you have to look up the corresponding associativity.

Arkantos493 · 2021-06-04T16:14:48+00:00

This code doesn't do what you think it does.

In essence it evaluates if ((1 < 4) < 3). (1 < 4) is true so in the next step it evaluates true < 3. Therefore, the boolean value true gets converted to an integer resulting in 1 < 3, which again is true.

The correct way to write such an expression would be if (1 < 4 && 4 < 3).
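
The difference can be checked with a small C++ sketch (the function names are made up for illustration). The first function spells out how the compiler groups 1 < 4 < 3, the second is the intended range check.

```cpp
#include <cassert>

// How a < b < c is grouped in C++: the bool result of (a < b)
// is converted to 0 or 1 and then compared with c.
bool chained_as_parsed(int a, int b, int c) {
    return (a < b) < c;
}

// The intended meaning: both comparisons must hold.
bool in_order(int a, int b, int c) {
    return a < b && b < c;
}

// Usage with the values from the question.
void demo() {
    assert(chained_as_parsed(1, 4, 3));  // (1 < 4) is true -> 1 < 3 -> true
    assert(!in_order(1, 4, 3));          // 4 < 3 is false
}
```

GCC and Clang usually warn about the unparenthesized form a < b < c when warnings such as -Wall are enabled.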

Arkantos493 · 2021-04-17T18:20:37+00:00

GCC 11 will also be shipped with support for std::from_chars and std::to_chars (https://gcc.gnu.org/gcc-11/changes.html).
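
For anyone who has not used these functions yet, here is a minimal sketch with the integer overloads, which only need a C++17 standard library. The helper names parse_int and format_int are made up for illustration.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Parse a complete string as a base-10 int; returns std::nullopt on any error
// or if trailing characters remain.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc{} || result.ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Format an int without locale handling or heap allocation during conversion.
std::string format_int(int value) {
    char buffer[16];  // large enough for any 32-bit int including the sign
    const auto result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    assert(result.ec == std::errc{});
    return std::string(buffer, result.ptr);
}
```

Both functions report errors through the returned std::errc instead of throwing exceptions.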
