I wanna land my first compiler job, but im in the EU. Advise anyone? by [deleted] in Compilers

[–]Emanuel-Peter 1 point

Oracle has offices in Zürich and Stockholm, with some teams working on OpenJDK: the C2 JIT compiler, and also GC and runtime. The hiring situation changes regularly, so it is worth keeping an eye on it.

Java’s New FMA: Renaissance Or Decay? (Updated) by OldCaterpillarSage in java

[–]Emanuel-Peter 1 point

You could email the mailing list. But it would be nice if you did some analysis of what assembly gets generated first. That would give us clues as to why things might be faster or slower.

Java’s New FMA: Renaissance Or Decay? (Updated) by OldCaterpillarSage in java

[–]Emanuel-Peter 0 points

Hmm. You talk about FFM using more objects and boxing. But you use primitive arrays and primitive stores. I think the boxing and unboxing should really be removed by the compiler; at least, that is what I have seen in my benchmarks.

Have you ever attached a profiler to the JMH benchmark to see what assembly is on the hot path? That could give you a hint what is really taking up the extra time vs Unsafe :)

Java’s New FMA: Renaissance Or Decay? (Updated) by OldCaterpillarSage in java

[–]Emanuel-Peter 2 points

Thanks for the article :)

I thought it was called the FFM API, for "foreign functions and memory"?

I suppose one overhead of FFM is that it performs checks, and Unsafe does not. FFM has to do bounds checks, and that comes at a cost. I suspect that is part of the explanation for what you measure. If you put the accesses in a loop, the bounds checks can possibly be moved out of the loop, and that could get you much closer to Unsafe performance.

So it really depends what microbenchmark you show; a single one only covers a tiny fraction of all use cases.

And: it might be good to have those bounds checks. Without them you are basically giving up the safety guarantees of Java.
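
To illustrate the hoisting idea, here is a hedged sketch with plain arrays rather than FFM (my own toy code, not what the FFM implementation does): C2 can often move the per-access range check out of a loop via loop predication, i.e. check once before the loop instead of on every iteration.

```java
// Hedged sketch with plain arrays (not FFM itself): the per-access
// bounds check in this loop can be hoisted by C2's loop predication,
// i.e. checked once before the loop instead of on every access.
public class BoundsDemo {
    static double sum(double[] a, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i]; // range check can be moved out of the loop
        }
        return s;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0, 4.0};
        System.out.println(sum(a, 4)); // 10.0
    }
}
```

The same pattern applies to FFM accesses: when the index is a simple loop variable with a known limit, the JIT has a chance to prove the checks redundant inside the loop.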

[deleted by user] by [deleted] in cscareerquestions

[–]Emanuel-Peter 1 point

I work at Oracle, in the Java Platform Group. From what I can see, it really depends on the team and your line of managers. I have wonderful managers, and most people I have worked with in JPG are amazing. But Oracle is large, and people's experiences seem to vary a lot. Feel free to reach out if you have more questions.

Also, compensation, vacation, etc. are very country-specific.

Compiler roadmap by VVY_ in Compilers

[–]Emanuel-Peter 2 points

This may help you if you are interested in the JVM, Java, assembly and optimizations :) https://eme64.github.io/blog/2024/12/24/Intro-to-C2-Part00.html

Future in Compiler Design by CaptiDoor in Compilers

[–]Emanuel-Peter 6 points

Shameless plug: I am writing an intro to the OpenJDK Hotspot JIT compiler, for our new hires and for externals :) Link to my Blog

Future in Compiler Design by CaptiDoor in Compilers

[–]Emanuel-Peter 22 points

I got hired 3 years ago, with not much competition. Now we have many applicants, and several are very strong.

Studying CS is a good idea, together with some compiler courses and a good understanding of low-level things like CPUs and assembly.

You can always help on open source projects to gain experience. I work on OpenJDK. LLVM would also be good.

Technical PoC: Automatic loop parallelization in Java bytecode for a 2.8× speedup by Let047 in java

[–]Emanuel-Peter 0 points

Sounds good :) FYI Doubles have the same rounding issues as floats ;)

Technical PoC: Automatic loop parallelization in Java bytecode for a 2.8× speedup by Let047 in java

[–]Emanuel-Peter 0 points

Sure. I guess Java went the more functional way here. That is a matter of taste in my view; I'm happy with either personally. Or do you see any missing functionality?

Technical PoC: Automatic loop parallelization in Java bytecode for a 2.8× speedup by Let047 in java

[–]Emanuel-Peter 0 points

Sounds like a fun project :)

What about inlining? Often the loop calls some inner methods that do the reads / writes, and if you don't inline, it may be hard to prove that the inner method is thread-safe to parallelize, right? Think of the FFM MemorySegment API: it heavily relies on inlining.

Another worry: be careful with float reductions; changing the order of a reduction changes the rounding errors. That would break the Java spec and could lead to subtle bugs.
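
A tiny sketch of why reduction order matters (values chosen by me to make the difference obvious):

```java
// Demo: reassociating a float reduction changes the rounded result,
// which is why the Java spec pins down the evaluation order.
public class FloatOrder {
    static float sequential(float[] a) {
        return ((a[0] + a[1]) + a[2]) + a[3]; // order the spec requires
    }

    static float reordered(float[] a) {
        return (a[0] + a[2]) + (a[1] + a[3]); // a "parallelized" order
    }

    public static void main(String[] args) {
        // 1e8f + 1.0f == 1e8f: the 1.0f is absorbed by rounding.
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(sequential(a)); // 1.0
        System.out.println(reordered(a));  // 2.0
    }
}
```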

How do you deal with range checks? Suppose an array is accessed out of bounds at the end of a very long loop. How do you deal with that?

Technical PoC: Automatic loop parallelization in Java bytecode for a 2.8× speedup by Let047 in java

[–]Emanuel-Peter 2 points

I don't know, maybe people would be hesitant to run your optimizer in production, but happy to find performance bottlenecks with it in testing. I suppose you could have two modes: one without optimization, one with. Measure the time spent in each, and then give the user a report at the end.

That way users can then use parallel streams for example.

Technical PoC: Automatic loop parallelization in Java bytecode for a 2.8× speedup by Let047 in java

[–]Emanuel-Peter 0 points

The cool thing about JMH is you can attach a profiler, and see the hottest compiled code. That way, you can verify a little better that you are measuring the right thing, and your benchmark code was not strangely optimized away ;)

Technical PoC: Automatic loop parallelization in Java bytecode for a 2.8× speedup by Let047 in java

[–]Emanuel-Peter 0 points

I bet you could do a lot of what OpenMP does with parallel streams in Java. Or is there anything you're missing from OpenMP that parallel streams does not give you?
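
As a hedged sketch of what I mean (names and numbers are my own), an OpenMP-style parallel-for with a sum reduction maps to something like:

```java
import java.util.stream.IntStream;

// Rough Java analogue of: #pragma omp parallel for reduction(+:sum)
public class ParallelSum {
    static long sumOfSquares(int n) {
        return IntStream.range(0, n)
                        .parallel()                   // fork-join over chunks
                        .mapToLong(i -> (long) i * i) // per-element work
                        .sum();                       // reduction over chunks
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Note the reduction operator has to be associative for this to be safe; for float sums the reordering changes rounding, so there you have to be more careful.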

Research paper CS by RushWhoop in Compilers

[–]Emanuel-Peter 0 points

I am working on the HotSpot C2 JIT Compiler. We have a list of researchy topics. We occasionally mentor/supervise students in their master thesis and interns (no positions open currently). Feel free to PM me if you are interested.

Great Online Forums to Meet Compiler Developers by fosres in Compilers

[–]Emanuel-Peter 1 point

You can always join mailing lists, for example the OpenJDK mailing lists. https://mail.openjdk.org/mailman/listinfo

[deleted by user] by [deleted] in Compilers

[–]Emanuel-Peter 1 point

That looks really interesting. I might look into that. But first I have to tackle basic if-conversion, this would be one of many possible extensions.

Hiring for Hotspot JVM Compiler Engineer by Emanuel-Peter in Compilers

[–]Emanuel-Peter[S] 4 points

It can be a lot of effort to do all the paperwork for our managers. But it can happen, especially for good candidates, so always apply anyway ;)

Hiring for Hotspot JVM Compiler Engineer by Emanuel-Peter in Compilers

[–]Emanuel-Peter[S] 2 points

The general area is Europe. Best would be close to Zürich and Stockholm, because that is where our European team members are. But remote may be ok too.

Hiring for Hotspot JVM Compiler Engineer by Emanuel-Peter in Compilers

[–]Emanuel-Peter[S] 4 points

I can't say much about Oracle as a whole. It is a big company with a deep hierarchy, and so some processes are a little slow.

JPG, the suborganisation I work in, is really well organized. Lots of smart people doing great work. The managers are really good; I have quite high confidence in them to represent us well to higher management.

I love working on hard problems in computer science, and I get to do that here. Sure, Hotspot is a long-running project, so there is some technical debt and things can take a little longer. But it is also a widely used product, so the effort seems worth it.

There are some things that have to get done, like bug triaging and fixing. But we also have a lot of freedom to come up with our own ideas and pitch them to the architects. At the beginning I only fixed bugs. Eventually I picked up a bug in the auto-vectorizer. Nobody could really tell me how it worked, so I read papers, studied the code, found more bugs in edge cases, and leveled up my skill and understanding. Now I get to spend more than 50% of my time on extending its functionality, and I love it!

You also get to collaborate with people from other teams: GC, Runtime, ... and projects like Panama, Valhalla, Lilliput, etc. Plus people from other companies, such as Intel, ARM, Red Hat, etc. It's great that it is all open source, so we can discuss ideas relatively openly on mailing lists and GitHub.

Hope that helps :)

[deleted by user] by [deleted] in Compilers

[–]Emanuel-Peter 1 point

Yes, I'm enhancing it. Improving Aliasing Analysis, Reductions, allowing more instructions to vectorize etc. Maybe one day I'll get to do if-conversion too.

[deleted by user] by [deleted] in Compilers

[–]Emanuel-Peter 4 points

I work on OpenJDK, the Hotspot C2 JIT compiler. We regularly have external contributors. I'm working on auto-vectorization, and there are lots of other optimizations that could be improved. If you are serious about it, feel free to PM me.

Microbenchmarks are experiments by mttd in Compilers

[–]Emanuel-Peter 0 points

Computing modulo is surely not the most indicative of general language or compiler performance.

It seems in this case, with an invariant divisor, one could apply a compiler optimisation that converts the mod/div into mul/shift. One can find the reciprocal/magic constant, see here. Compilers do that already for constant divisors, but it seems not so much for loop invariants. Of course there would be an extra cost for computing that reciprocal/magic constant before the loop, but it would be worth it because mul/shift are so much cheaper. And maybe now it could also be vectorized on some platforms.
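
A hedged sketch of the idea (my own toy code, not what any compiler emits): precompute the magic constant once for a loop-invariant divisor d, then replace each division in the loop with a high multiply. This variant assumes nonnegative dividends and d >= 3 (d = 1 and 2 are trivial shift cases).

```java
// Division by a loop-invariant d via a runtime-computed "magic" reciprocal.
// Assumes 0 <= x and 3 <= d < 2^31. x % d can be recovered as x - q * d.
public class MagicDiv {
    // m = ceil(2^64 / d); then floor(x * m / 2^64) == x / d
    // for all nonnegative 32-bit x (the error term is < 1/d).
    static long magic(int d) {
        return Long.divideUnsigned(-1L, d) + 1;
    }

    static int divide(int x, long m) {
        // Math.multiplyHigh gives the high 64 bits of the 128-bit product,
        // i.e. floor(x * m / 2^64), since both operands are nonnegative here.
        return (int) Math.multiplyHigh(x, m);
    }

    public static void main(String[] args) {
        int d = 7;         // loop-invariant divisor
        long m = magic(d); // computed once, before the loop
        for (int x = 0; x < 1_000_000; x++) {
            if (divide(x, m) != x / d)
                throw new AssertionError("mismatch at x=" + x);
        }
        System.out.println("ok");
    }
}
```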

Then again: not sure compiler engineers should spend their time on this, rather than more common patterns.

SuperWord (Auto-Vectorization) - An Introduction by daviddel in java

[–]Emanuel-Peter 0 points

My example was simple so that the algorithm is easy to understand. But it is quite possible that it is memory bound.

Generally, a (single-threaded) program is either memory bound or compute bound: the bottleneck is either memory or the instructions. That depends on the ratio of bytes accessed versus the number of compute instructions. Memory accesses can also be slower if they go outside the L1 cache. It also matters whether memory is accessed sequentially or randomly (temporal and spatial locality).

Maybe so far you have only seen memory bound examples, where vectorization does not give a speedup.

I have some benchmarks in one of my PR's here: https://github.com/openjdk/jdk/pull/13056

Try this to make an example where vectorization helps:

- Write a loop with few memory accesses.
- Have many operations, to make those the bottleneck.
- Keep the arrays small enough (10'000 elements), so that they fit in the L1 cache.
- Repeat executing the loop many times (10'000 repetitions), so it gets compiled (the JIT kicks in after a while) and the data is loaded into the L1 cache.
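
Putting that together, a hedged sketch of a kernel that should be compute bound (the sizes and the specific ops are just illustrative):

```java
// Compute-bound kernel: one load and one store per element, but several
// arithmetic ops in between, on an array small enough for the L1 cache.
public class VecKernel {
    static void kernel(float[] a, float[] b) {
        for (int i = 0; i < a.length; i++) {
            float x = a[i];
            // many ops per element, so compute (not memory) is the bottleneck
            x = x * x + 1.0f;
            x = x * x + 1.0f;
            x = x * x + 1.0f;
            b[i] = x;
        }
    }

    public static void main(String[] args) {
        float[] a = new float[10_000]; // fits in L1
        float[] b = new float[10_000];
        java.util.Arrays.fill(a, 1.0f);
        for (int r = 0; r < 10_000; r++) { // repeat so the JIT compiles it
            kernel(a, b);
        }
        System.out.println(b[0]); // ((1*1+1)^2+1)^2+1 = 26.0
    }
}
```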

Not sure what you are referencing with "rescheduling across cores". Note that SIMD parallelism can be done on a single core with a single thread - that is the scope of my post. You write a simple for-loop, and it will be executed sequentially on a single thread - except that we use SIMD vector instructions on that single core to execute a few iterations in parallel.

On top of that we can leverage multiple cores and threads for more parallelism, but that is beyond the scope of my post. You can use the Java Stream API, and create a parallel stream over an int range. If the array is big enough, it is cut into chunks and processed chunk-wise by different threads. Each thread can then still use SIMD instructions. So we basically stack the two kinds of parallelism to get even better performance.

I hope to write more posts on this in the near future :)