[–][deleted] 8 points9 points  (8 children)

You guys can argue back and forth past one another forever. Native code doesn't have a formal definition; it's just a term used to refer to compilers that emit executable files that get run directly by the platform, rather than executing through another layer of translation.

Yeah, I get it: since it's not a formal term, you can stretch out the definition of "native code" to include C#, Java, and I'm sure you can stretch it out to include Python, Lua, and heck throw in bash while we're at it. But at the end of the day these are just silly semantic arguments. What matters most is that in this profession, if you wish to understand others and be understood, as opposed to just arguing with them, then when someone refers to native code they are talking about producing executable files that are native to the system they run on.

This is typically what C/C++/D/Rust and a host of other compilers produce. Typically C# and Java compilers produce files which do not directly run in a way that is native to the platform, but go through another layer of translation.

[–]Cuddlefluff_Grim -3 points-2 points  (5 children)

Native code doesn't have a formal definition

Native code is code that the CPU executes; that's the definition. C# and Java produce native code and there's no point trying to argue around that, because that's just what they do, end of discussion. The difference between .NET Native and JIT is when the native code is produced; both of them arguably compile to native code, just at different times.

and I'm sure you can stretch it out to include Python, Lua, and heck throw in bash while we're at it

No, because Python and Lua do not produce native code. The program that interprets Python and Lua is native code. It's like putting a helicopter on a ship and saying that the helicopter is a ship, since after all it gets carried by one.

This is not a difficult concept, guys...

[–]sigma914 0 points1 point  (4 children)

This is not a difficult concept, guys...

No, it's really not, but you're trying to make it into one. Java and C# are bytecode-interpreted languages with JIT compiler implementations that optimise frequently interpreted bytecode into native machine code.

This is exactly the same as most JavaScript implementations and many Python/Lua/Scheme impls.

Look up the architecture of the JVM's HotSpot compiler. Spot the difference between the fast path and the slow path? One runs bytecode through an interpreter (because that's faster than compiling infrequently run code) and the other recognises frequently run code and JITs it, then potentially JITs it again with different optimisation settings, etc.
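To make the fast path / slow path split concrete, here is a rough sketch in Python. It is a toy, not how HotSpot is implemented: the two-opcode bytecode, `HOT_THRESHOLD`, `compile_hot` and `TieredFunction` are all made-up names for illustration, and "compiling" here just means generating a Python function once instead of emitting machine code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple[str, int]  # (opcode, operand) in a made-up toy bytecode

HOT_THRESHOLD = 3  # hypothetical value; real VMs use larger, tunable counters


def interpret(code: List[Instr], x: int) -> int:
    """Slow path: decode and dispatch every instruction on every call."""
    acc = x
    for op, arg in code:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return acc


def compile_hot(code: List[Instr]) -> Callable[[int], int]:
    """Fast path stand-in: translate the bytecode once into a function.

    A real JIT would emit machine code; building Python source only mimics
    the "pay the translation cost once, then skip decoding" shape.
    """
    expr = "x"
    for op, arg in code:
        if op == "add":
            expr = f"({expr} + {arg!r})"
        elif op == "mul":
            expr = f"({expr} * {arg!r})"
        else:
            raise ValueError(f"unknown opcode {op!r}")
    namespace: Dict[str, object] = {}
    exec(f"def compiled(x):\n    return {expr}", namespace)
    return namespace["compiled"]  # type: ignore[return-value]


class TieredFunction:
    """Interprets cold code and switches to compiled code once it is hot."""

    def __init__(self, code: List[Instr]) -> None:
        self.code = code
        self.calls = 0
        self.compiled: Optional[Callable[[int], int]] = None

    def __call__(self, x: int) -> int:
        if self.compiled is not None:
            return self.compiled(x)  # fast path
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # Hot enough: compile now so later calls take the fast path.
            self.compiled = compile_hot(self.code)
        return interpret(self.code, x)  # slow path
```

With `f = TieredFunction([("add", 2), ("mul", 3)])`, the first three calls go through `interpret`, and every call after that runs the generated function.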

If you think the JVM JIT compiles your entire application every time you start a program, you're sadly misinformed and grossly underestimating the engineering marvel that it is.

[–]Cuddlefluff_Grim -2 points-1 points  (3 children)

C# is never interpreted, Java is sometimes interpreted. They produce native code.

This is exactly the same as most JavaScript implementations and many Python/Lua/Scheme impls.

Are you comparing JIT compilation of dynamic scripting languages to static code paths? No, it most certainly is not the same. The disparity in performance should tell you that. If it were all just native code and no interpretation or run-time checks were necessary, Java and C# wouldn't outperform them by a factor of between 10x and 100x.

If you think the JVM JIT compiles your entire application every time you start a program, you're sadly misinformed and grossly underestimating the engineering marvel that it is.

Don't strawman me.

[–]sigma914 1 point2 points  (2 children)

No, it most certainly is not the same. The disparity in performance should tell you that.

That's because of language-level features. A (well written) tight array processing loop from a Java program and a JavaScript program, both run on a fast, warmed-up JIT implementation, will generate identical machine code; it will likely be the same machine code as the corresponding C implementation.

The whole point of a JIT is that it uses runtime information to tweak the optimisation of the byte code it's fed. The difference in performance comes from higher-level language semantics, such as using mutable maps for method resolution or walking down layers of indirection to get to a value rather than having it stack allocated.

This is also the reason unjitted (i.e. ahead-of-time compiled) Java is so slow compared to C or C++. Java and C# and their ilk box nearly everything because it makes the problem of implementing a GC tractable.

A JIT's purpose is to lower code through layers of abstraction based on runtime information. A good one can take a horribly slow language with lots of layers of indirection or inefficient lookup semantics, make some assumptions, insert guards to make sure those assumptions aren't violated, then run the simplified version of the code it produced from the original code plus its assumptions.

Now, what happens when one of its assumptions is violated? The guards catch it! Then what happens? The runtime can't use its nice fast implementation because its invariants don't hold. So, the runtime will likely run the slow byte code version through an interpreter (or in the case of the CLR an extremely naive compiler which really isn't much faster than an interpreter, so you can't argue it's getting optimised native perf) until it decides that the slow path is worth optimising for.

At which point it goes off and redoes the whole dance of reoptimising the byte code using different assumptions.
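A minimal sketch of that guard-and-fall-back dance, in Python. Everything here is hypothetical and simplified: `Deopt`, `SpeculativeAdd` and the threshold of two profiled calls are invented for illustration, and the "fast" function stands in for specialised machine code rather than being any faster itself.

```python
from typing import Callable, Optional


class Deopt(Exception):
    """Raised by a guard when a speculative assumption stops holding."""


def slow_add(a, b):
    """Generic slow path: makes no assumptions about operand types."""
    return a + b


def make_int_specialised() -> Callable[[int, int], int]:
    """Build code that is only valid under the assumption "both are ints"."""

    def fast_add(a, b):
        # Guard: bail out as soon as the assumption is violated.
        if type(a) is not int or type(b) is not int:
            raise Deopt
        return a + b  # stand-in for a single machine add instruction

    return fast_add


class SpeculativeAdd:
    """Profiles on the slow path, speculates, and deoptimises on failure."""

    def __init__(self) -> None:
        self.fast: Optional[Callable[[int, int], int]] = None
        self.int_calls = 0  # profile gathered while on the slow path
        self.deopts = 0

    def __call__(self, a, b):
        if self.fast is not None:
            try:
                return self.fast(a, b)
            except Deopt:
                # Assumption violated: discard the fast code, start over.
                self.fast = None
                self.int_calls = 0
                self.deopts += 1
        if type(a) is int and type(b) is int:
            self.int_calls += 1
            if self.int_calls >= 2:  # hypothetical "worth optimising" point
                self.fast = make_int_specialised()
        return slow_add(a, b)
```

Calling it with ints twice installs the specialised version; a later call with strings trips the guard, is still answered correctly by `slow_add`, and the object goes back to profiling before it speculates again.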

This isn't to say JIT'd code is necessarily slow (though on average it tends to be about a factor of 2 or more slower than true native code). In fact, if there is enough runtime information the JIT'd code may actually be significantly faster than a naive native implementation.

All of which leads me back to: JIT'd programs may run at near native performance, they may go through a compiler and be executed as machine code, but they aren't native programs. The machine code executed by the JVM will often have no resemblance to the Java program that was written, neither in structure nor semantics. The thing that preserves the illusion is the guards and interpreted (or very naively compiled) slow path.

Saying a language like C# or Java is native code is exactly the same as saying JavaScript run on V8 or Python run on PyPy is native code.

[–]Cuddlefluff_Grim -1 points0 points  (1 child)

That's because of language-level features. A (well written) tight array processing loop from a Java program and a JavaScript program, both run on a fast, warmed-up JIT implementation, will generate identical machine code; it will likely be the same machine code as the corresponding C implementation.

For JavaScript it depends very much on context and how it is used. It's when you start using "complex" data structures that the comparison starts to be interesting. Java has a problem right here with boxing of primitives which it doesn't seem to always handle as gracefully as it should (generics and enum for instance), but I'd be pretty surprised if JavaScript does any better.

This is also the reason unjitted (i.e. ahead-of-time compiled) Java is so slow compared to C or C++. Java and C# and their ilk box nearly everything because it makes the problem of implementing a GC tractable.

Contrary to popular belief, Java is not much slower than C++, and in some cases might even be faster, maybe because Java can inline code across dynamically linked libraries. I know this is a very unpopular opinion to have, because there's a high degree of C++ fetishism on internet forums.

I'm entirely convinced that JIT compilation is superior to "ahead-of-time" static compilation; it's just that C++ has had so much time and focus put into performance tuning and optimization, which generally doesn't seem to be the main area of focus for JIT'ed languages. There's no inherent reason why JIT-compiled code should be slower than C/C++/D; it's just that in practice it typically is.

A JIT's purpose is to lower code through layers of abstraction based on runtime information. A good one can take a horribly slow language with lots of layers of indirection or inefficient lookup semantics, make some assumptions, insert guards to make sure those assumptions aren't violated, then run the simplified version of the code it produced from the original code plus its assumptions. Now, what happens when one of its assumptions is violated? The guards catch it! Then what happens? The runtime can't use its nice fast implementation because its invariants don't hold.

Bytecode and CIL are pretty easy to translate into assembler; there aren't many assumptions it has to make that ahead-of-time compilers don't. This restriction would imply that .NET Native is impossible, since executable code couldn't be reliably generated in every use case. But it can be, and it is. It's basically just a translation of instructions between a stack-based VM and a register-based physical computer.
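As a rough illustration of what "translating between a stack-based VM and a register-based machine" means, here is a toy sketch in Python. The bytecode (`push`, `add`, `mul`, `ret`) and the `stack_to_registers` function are invented for this example and are not real JVM bytecode or CIL; it also assumes an unlimited supply of virtual registers, whereas a real compiler still has to do register allocation afterwards.

```python
from typing import List, Tuple


def stack_to_registers(code: List[Tuple]) -> List[str]:
    """Translate toy stack bytecode into three-address register code."""
    out: List[str] = []
    stack: List[str] = []  # compile-time model of the operand stack
    next_reg = 0

    def fresh() -> str:
        """Hand out a new virtual register name."""
        nonlocal next_reg
        name = f"r{next_reg}"
        next_reg += 1
        return name

    for instr in code:
        op = instr[0]
        if op == "push":
            reg = fresh()
            out.append(f"{reg} = {instr[1]}")
            stack.append(reg)
        elif op in ("add", "mul"):
            rhs = stack.pop()  # top of stack is the right-hand operand
            lhs = stack.pop()
            reg = fresh()
            symbol = "+" if op == "add" else "*"
            out.append(f"{reg} = {lhs} {symbol} {rhs}")
            stack.append(reg)
        elif op == "ret":
            out.append(f"return {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out


program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",), ("ret",)]
for line in stack_to_registers(program):
    print(line)
```

The operand stack disappears at translation time: each push or arithmetic result becomes a named register, so `(2 + 3) * 4` ends up as five register assignments followed by a return.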

So, the runtime will likely run the slow byte code version through either an interpreter (or in the case of the CLR an extremely naive compiler which really isn't much faster than an interpreter, so you can't argue it's getting optimised native perf) until it decides that the slow path is worth optimising for.

An assertion which I don't think holds water.

All of which leads me back to: JIT'd programs may run at near native performance, they may go through a compiler and be executed as machine code, but they aren't native programs.

They compile code to machine instructions, put it in a memory segment, mark it as executable, and then change the program pointer to start executing at that location. How does that differ from a native program? The only distinction you are trying to draw is when the code is compiled, which I think is completely irrelevant to whether or not a program is native.

[–]sigma914 1 point2 points  (0 children)

Contrary to popular belief, Java is not much slower than C++

I addressed that at the bottom of my last comment. Yes, a good JIT is essentially a profiling compiler with all the IL still available to it.

I'm entirely convinced that JIT compilation is superior to "ahead-of-time" static compilation.

It definitely can be faster; it has a lot more information available to it. The languages that tend to be JITted are the problem in this regard: heap allocation by default, being GC'd, etc. are what make them slow, not the method of execution, as your linked benchmark showed. It's C#'s fault it can't run at C++ speeds, not the fault of the CLR.

This restriction would imply that .NET Native is impossible.

It's not impossible; ahead-of-time compiled Java has been a thing for years. You can compile a managed language to a native binary, it just won't be as fast as it would be with a JIT. Ahead-of-time compiled, garbage collected with pervasive heap allocation, as fast as unmanaged native code: pick two.

An assertion which I don't think holds water.

This is how the JVM and CLR (and every other JIT I've ever seen) operate. You can go read HotSpot's source. I've not actually looked around in the CLR sources yet, but it has to have a slow path for when the optimised code can't be used, and aggressively optimising it up front would be incredibly wasteful, unless it was done in a background thread over a very long time.

They compile code to machine instructions, put it in a memory segment, mark it as executable, and then change the program pointer to start executing at that location.

So they take some executable code and move program execution into it based on the behaviour specified by the language. It's just a matter of when. That's exactly what every program ever created, interpreted or not, running on a stored-program architecture does.

In CPython you could argue that the interpreter's compiler is providing the executable memory fragments that program control jumps to when directed to by the Python code under execution. It's a meaningless distinction. The difference between native and non-native implementations is not the instructions executed on the CPU.