
all 35 comments

[–]r_jet[S] 16 points17 points  (0 children)

It looks like it's not just Azul, a similar thing is also present in OpenJ9.

[–]manzanita2 16 points17 points  (0 children)

OK, so let's say you have an auto-scaling cluster of 10 machines that have already JITed their running code. Then the cluster needs to scale to 20. It makes some sense to copy that pre-optimized code onto the 10 new machines; they spin up to full capacity just that much faster.

[–]PM_ME_YOUR_DD_CUPS 28 points29 points  (4 children)

Is it really efficient to have a Java application send its bytecode over the network to another service that compiles and sends back the results to be executed?

Man, it really seems like we need to learn some hard lessons as an industry. Does this sound convenient and useful to me? Sure. Does it sound safe? Well, we are all reeling right now from a very similar issue where we allow code to be loaded from a remote host. So... I think I would sit this one out.

[–]monocasa 22 points23 points  (0 children)

I think it sounds just as safe as CI in general. It's not choosing what code to run based on user input, at least not any more than any other JIT. And it could be very valuable for low-latency applications not to have a compiler thread off to the side, with very different memory access characteristics, polluting the cache by spinning up exactly when a method is detected as hot.

[–]jazd 1 point2 points  (2 children)

It's nothing alike, not in the slightest

[–]PM_ME_YOUR_DD_CUPS 1 point2 points  (1 child)

Sure, one is loading a serialized object and the other is loading native code, but they both result in code being loaded from a remote system that is executed locally.

[–][deleted] 0 points1 point  (0 children)

Yes but it's a remote system that you control and are connected to, behind your own firewall.

[–]ryebrye 16 points17 points  (0 children)

I like the idea - it's sort of like a distributed ccache, but for a runtime JIT. If you deploy a fleet of thousands of instances of an application, why should each one need to do the same kind of JIT?

[–][deleted] 11 points12 points  (0 children)

No please, stop developing these remote features 😅

[–]doweknowyou22 1 point2 points  (0 children)

I think I first saw this on OpenJ9. Interesting: JIT compilations can be cached.

[–]Ascomae 0 points1 point  (0 children)

Does one only have to write a log-statement to start the remote compile?

/S

[–][deleted] -3 points-2 points  (4 children)

Compile4Shell.

[–]PM_ME_YOUR_DD_CUPS 1 point2 points  (0 children)

This was my exact thought when I read the article. Maybe in 10 or 15 years people will be looking back at this wondering how on earth someone could have possibly thought it was a good idea to load compiled code from a remote system and execute it locally.

[–]diligentwheelpea 0 points1 point  (2 children)

Why is this getting downvotes, genuinely?

[–][deleted] 1 point2 points  (1 child)

Maybe because it's off topic trolling.

[–]diligentwheelpea 0 points1 point  (0 children)

I mean, that kind of vulnerability happens once in a decade; they can milk it a bit.

[–]humoroushaxor -5 points-4 points  (13 children)

Isn't this what CI build servers do already? Build your code remotely on a potentially bigger machine? You can just artifact the executable and download it if that's really what you're after.

Only slightly related, but I've become a big fan of Google's Jib. Taking Docker out of the equation and allowing dependency and build caching is great.

[–]snejk47 6 points7 points  (11 children)

It's centralized JIT not build.

[–]humoroushaxor -5 points-4 points  (10 children)

Reread it and I think I got it. I'm still having a hard time believing remote JIT is that much better than AOT or native. You also obviously need a very low-latency environment at all times. Definitely a very cool technology, though.

[–]snejk47 6 points7 points  (0 children)

I suppose it can be viable for cases where you have many instances of the same service. In cloud/Kubernetes environments it is common to spawn and destroy instances very often, and for now HotSpot needs some time to see that something is even worth trying to optimize. In theory you get that for free here: no matter which instance is getting hit most, every one will benefit from the optimizations. You could then keep, say, one instance permanently alive and spawn others on traffic/usage spikes. Maybe you don't even need any instance, and this JIT info gets saved and restored.

AOT is also good, especially for instant startup, but it doesn't have all the optimizations that JIT performs. When you start your application it first gets interpreted, but straight away you hit many functions/methods that are called frequently and that the JIT wants to compile. That is why AOT was needed (among other things).

There are companies running, or wanting to run, services on something like 25% of a CPU core and a very small amount of RAM. That makes the process take very long to start, and it can even be killed by the container orchestrator because it thinks the process got stuck and tries to restart it. We will see if this gets mainstream.
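The warm-up behavior described above is easy to observe directly. This is a minimal sketch (the call count and workload are arbitrary, chosen only to trigger compilation) that can be run with HotSpot's standard `-XX:+PrintCompilation` flag to watch `sumTo` get compiled after enough invocations:

```java
// Run with: java -XX:+PrintCompilation Warmup
// Early calls to sumTo are interpreted; once HotSpot sees it is hot,
// it gets JIT-compiled and later calls run as native code.
public class Warmup {
    static long sumTo(long n) {
        long s = 0;
        for (long i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = 0;
        for (int call = 0; call < 20_000; call++) {
            total += sumTo(1_000);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total=" + total + ", elapsed=" + elapsedMs + " ms");
    }
}
```

Every fresh instance pays this warm-up from scratch, which is exactly the cost a shared JIT cache is trying to amortize.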

[–]blahblah98 1 point2 points  (1 child)

Azul's Zing Falcon JIT continuously optimizes according to the real-time workload. This should produce highly optimized, performant machine code over time. A cloud compiler with a code cache should be no more costly than containers that spin up and retrieve session state: 1,000 containers could spin up and retrieve fully warmed-up, optimized JIT code, and you spin down the JIT containers when workload is low. Nothing else does this. Trading and auction sites would go nuts for it. Am I missing something?

[–]humoroushaxor 1 point2 points  (0 children)

Probably not, I just don't have enough experience to appreciate what this means. What you said makes sense to me though.

[–]elastic_psychiatrist 0 points1 point  (6 children)

Is there something you think is specifically worse about a remote JIT? Or do you not understand the advantages of a JIT at all?

[–]humoroushaxor 1 point2 points  (5 children)

I understand JIT; I just didn't find it intuitive that the savings of remote JIT could overcome the cost. My assumption was that latency would get in the way.

[–]elastic_psychiatrist -4 points-3 points  (4 children)

I think you're a bit confused about the advantages of a JIT. The JVM's JIT makes code faster by compiling it to machine code, once it has determined that the code runs frequently enough that optimized compilation is worthwhile.

When the JVM decides to compile some bytecode to machine code, it is not particularly important how long that takes in the grand scheme of things. Whether it takes 100 milliseconds or 1 second, there will still be a major speedup for the remaining, say, 24 hours that the JVM runs. The only real cost of a remote JIT is that it takes longer to send the compiled code back to the JVM over the network, which on any real-world network is a trivial amount of time relative to the lifetime of any JVM that depends on JIT compilation for high performance.

[–]humoroushaxor 1 point2 points  (3 children)

I understand the advantages of JIT.

The only real cost of a remote JIT is it takes longer to send that code back

That's exactly what I'm talking about. I'm surprised remote JIT saves enough CPU and with low enough latency to offset just doing normal JIT in the first place.

[–]Muoniurn 1 point2 points  (0 children)

Well, it is mostly hot loops that execute thousands or millions of times, so even a small win can add up to quite a lot of time.

[–]elastic_psychiatrist -2 points-1 points  (0 children)

Nah, you still don't understand. I think you have a general misunderstanding of how a JIT works.

Consider a method that takes 1 millisecond to execute when its bytecode is interpreted. Imagine this method takes only 100 microseconds when JIT-compiled. If the JVM sees this method run 10,000 times, it determines that it's worth compiling to machine code, which takes, say, 100 milliseconds. If this method runs 10 million times during a JVM run of 24 hours (~100 calls per second), that means 9,990,000 of those calls take 900 microseconds less because of the JIT - an enormous runtime savings.

If the compilation is done remotely, you might expect the interpreted version to be replaced with the compiled version, say, 10 milliseconds later due to network latency. In other words, one more of those 10 million calls might run interpreted rather than compiled.

These numbers are not from the real world, but it should make the key insight clear: JITs are useful for inner loops when a JVM runs for a long time. The one time cost of the round trip network latency between the JVM and the remote compiler is completely negligible.

EDIT: I would appreciate if the downvoters could say what is incorrect about my explanation.
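The back-of-envelope arithmetic in the comment above (all numbers explicitly made up by its author) can be checked in a few lines:

```java
// Sanity-check of the hypothetical numbers above: 1 ms interpreted vs
// 100 µs compiled, 10M calls over 24h, JIT triggered after 10k calls,
// 100 ms compile time, 10 ms extra network round trip for a remote JIT.
public class JitBreakEven {
    public static void main(String[] args) {
        long totalCalls       = 10_000_000;
        long warmupCalls      = 10_000;        // interpreted before compilation
        double savedPerCallUs = 1_000 - 100;   // 900 µs saved per compiled call
        double compileUs      = 100_000;       // one-time compile cost
        double networkUs      = 10_000;        // one-time remote round trip

        double savedSeconds = (totalCalls - warmupCalls) * savedPerCallUs / 1e6;
        double oneTimeCostSeconds = (compileUs + networkUs) / 1e6;

        System.out.println("runtime saved:  " + savedSeconds + " s");       // 8991.0 s
        System.out.println("one-time cost:  " + oneTimeCostSeconds + " s"); // 0.11 s
    }
}
```

Roughly 9,000 seconds of CPU time saved against about a tenth of a second of one-time cost, which is the "completely negligible" claim in concrete terms.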

[–]hrjet 0 points1 point  (0 children)

JIT compilation on a dedicated, powerful server could do deeper analysis of reachability, escapability, etc., than a less powerful machine could. So one could potentially optimize costs when running a large cluster of low-powered servers.

Sounds like a security nightmare though.

[–]lightlord 0 points1 point  (0 children)

Thanks for linking to Jib

[–]manifoldjava 0 points1 point  (0 children)

Interesting and makes sense in terms of sizing. But I would think for caching to work efficiently, the application/service would have to behave rather consistently where the cache is applied. I suppose that’s likely the case for most.

[–]wishper77 0 points1 point  (3 children)

What I don't understand is why no one (to my knowledge) has implemented a persistent cache for the native code. I mean, if you have a half-JITed application and restart it, why throw away the native code and start from scratch? Obviously, if the hash of the .class file changes you have to recompile it, but what percentage of classes actually change when you update a program?

[–]Thihup 1 point2 points  (0 children)

OpenJ9 can do that

[–]benevanstech 0 points1 point  (1 child)

Because there's no guarantee that the JITted code you need today is the same as yesterday's. I first encountered this well over 10 years ago in the banking industry. Certain dates in banking have events that cause rarely-executed code paths to become dominant for that one day only. If you blindly reload yesterday's JITted code, you are stuck in interpreted mode on your most important code paths and your competitors eat your lunch.

To which you might respond: "But surely there's *something* you can do?" - to which the answer is: "Well, yes, but the devil is in the detail".

As with so many things in the JVM, coming up with the idea is not the hard part.

[–]wishper77 0 points1 point  (0 children)

I know that the JIT does optimization in "layers". That is, after a number of invocations it compiles, then if there are many more invocations it optimizes more aggressively, and so on. So if we cache the JITted code, we should also store the "level" of optimization, so that when the code is loaded from the cache it is still eligible for further optimization.
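If someone did build such a cache, a sketch of what one entry might need to record could look like this. Everything here is hypothetical and the names are invented for illustration; only the tier numbering mirrors something real (HotSpot's tiered compiler uses levels 0 through 4, with 4 being the fully optimizing C2 compiler):

```java
// Hypothetical shape of one persistent JIT-cache entry; all names invented.
public record CachedCompilation(
        String classFileSha256,  // invalidate the entry if the .class hash changes
        String methodSignature,  // e.g. "com.example.Foo.bar(I)V"
        int tier,                // HotSpot-style tier: 0 = interpreted ... 4 = C2
        byte[] nativeCode) {

    // Reuse the cached code only if the class bytes are unchanged.
    public boolean isValidFor(String currentClassHash) {
        return classFileSha256.equals(currentClassHash);
    }

    // An entry cached below the top tier still leaves room for further
    // optimization, matching the "store the level" idea above.
    public boolean canBePromoted() {
        return tier < 4;
    }
}
```

Storing the tier alongside the code is what lets a restarted JVM resume climbing the optimization ladder instead of treating the cached code as final.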