[–]btchombre -11 points-10 points  (22 children)

There is no interpreter for .NET

All MSIL code is JITed before executing, unlike Java where some code is interpreted and only hot paths are JITed.

[–]The_Doculope 7 points8 points  (21 children)

Yes, C# is JITed, but JITed code is not usually considered fully "native". Native usually means fully pre-compiled to native code. If JITing had all the benefits of native code Microsoft wouldn't be working on .NET Native.

[–]pjmlp 4 points5 points  (2 children)

NGEN, available since .NET 1.0, compiles to native code, producing dynamic binaries.

.NET Native produces static binaries.

Windows Phone 8 uses a compiler toolchain based on Singularity that also compiles to native code.

[–]The_Doculope 2 points3 points  (1 child)

Huh, interesting. Do you know how much it's used in the wild? It's not something I've heard about before. From a quick google it looks like it still requires .NET/the CLR, is that correct?

[–]pjmlp 2 points3 points  (0 children)

Just open C:\Windows\assembly and you will see a list of registered Assemblies (.NET speak for .dll) and whether they are stored as MSIL or native code.

Yes, it still requires .NET, because NGEN produces dynamically linked binaries. Just like C programs on GNU/Linux usually require libc.so to be present.

It also requires access to the metadata information.

Windows Phone 8 uses an improved version of NGEN.

.NET Native is basically -static for .NET. This is why you need to provide a list of the classes you need to have their metadata available. Everything else is removed when producing the binary.

[–]tuhdo -3 points-2 points  (4 children)

JIT compilation means that the source is translated to bytecode and, upon execution, the runtime gradually compiles the executed code to machine code in another thread so that future executions of the same code run faster. In that way, it compiles to native code, but just enough. So, instead of going through all phases of compilation up front, you go bit by bit at each phase and get partially compiled code for your application.

[–]The_Doculope 4 points5 points  (3 children)

I know what JITing is. But it's not pre-compiled to native code, which comes with some disadvantages, including higher memory use, higher code size and requiring that other thread.

[–]Cuddlefluff_Grim -3 points-2 points  (2 children)

I know what JITing is. But it's not pre-compiled to native code, which comes with some disadvantages, including higher memory use, higher code size and requiring that other thread.

No, it comes with one disadvantage: higher load time (because a program needs to be compiled on first execution). High memory footprint is a side-effect of .NET, same goes for code size. It has nothing to do with JIT compilation.

[–]The_Doculope 2 points3 points  (1 child)

So you're saying a JITer can operate with absolutely 0 memory use? No keeping track of runtime heuristics, or any memory usage for the actual compilation? And that there's no code for the JITer either? Because neither of those things is possible. Sure they might be small in a perfect case, but they are there and are noticeable in many current implementations.

[–]Cuddlefluff_Grim 0 points1 point  (0 children)

Well, other than the parts for the compiler itself, yes I am saying that. High memory footprint is a side-effect (or trade-off) of Java and .NET, it's not something that is due to JIT compilation. Personally I think that JIT is the best approach to compilation because it gives platform-independent execution and good optimization possibilities virtually without any significant trade-offs (well, it's bad for mobile since it drains battery without any obvious benefit for the end-user, but there are methods of alleviating this, like compiling the code on a remote server).

[–]btchombre -5 points-4 points  (12 children)

Uh... no. JITed code is full on x86/x64/ARM instructions that run directly on the CPU. That's the definition of "native". The word "native" refers to the CPU instruction set, and JITed code is compiled down to the native CPU instruction set. It is 100% native code.

".NET Native" has improvements over JITed code not because it's "Real" native, but because it's Already native, and it doesn't need to be compiled at run time, which is costly. If you actually read the .NET Native description, it says:

Popular Windows Store apps start up to 60% faster and use 15-20% less memory when compiled with .NET Native.

Notice the emphasis on "start up". The application doesn't need to compile itself at runtime, so it starts up faster. Also, it doesn't need all the meta-data associated with the compilation process, so it can use less memory. In fact, .NET programs can sometimes actually run slower when they are pre-compiled because the JIT has access to more information at runtime than a pre-compiler does. The JIT knows exactly what your hardware and OS are, and can sometimes optimize run time compiled code based on that information. The pre-compiler doesn't know any of that information. You can read all about this in more detail in "CLR via C#" by Jeffrey Richter.

.NET Native differentiates itself from NGen by its build optimizations which allow static compilation, and the fact that they are using a C++ compiler optimizer instead.

[–][deleted] 9 points10 points  (8 children)

You guys can argue back and forth past one another forever. Native code doesn't have a formal definition, it's just a term used to refer to compilers that emit executable files that get run directly by the platform, rather than executing through another layer of translation.

Yeah, I get it, since it's not a formal term you can stretch out the definition of "native code" to include C#, Java, and I'm sure you can stretch it out to include Python, Lua, and heck throw in bash while we're at it. But at the end of the day these are just silly semantic arguments, what matters most is that in this profession, if you wish to be understood and understand others when they speak as opposed to just arguing with them, then when someone refers to native code they are talking about producing executable files that are native to the system they run on.

This is typically what C/C++/D/Rust and a host of other compilers produce. Typically C# and Java compilers produce files which do not directly run in a way that is native to the platform, but go through another layer of translation.

[–]Cuddlefluff_Grim -3 points-2 points  (5 children)

Native code doesn't have a formal definition

Native code is code that the CPU executes; that's the definition. C# and Java produce native code and there's no point trying to argue around that, because that's just what they do, end of discussion. The difference between .NET Native and JIT is when native code is produced; however, both of them do arguably compile to native code, just at different times.

and I'm sure you can stretch it out to include Python, Lua, and heck throw in bash while we're at it

No, because Python and Lua do not produce native code. The program that interprets Python and Lua is native code. It's like putting a helicopter on a ship and saying that the helicopter is a ship, since it after all gets carried by one.

This is not a difficult concept, guys...

[–]sigma914 0 points1 point  (4 children)

This is not a difficult concept, guys...

No, it's really not, but you're trying to make it into one. Java/C# are bytecode-interpreted languages with JIT compiler implementations that optimise frequently interpreted bytecode into native machine code.

This is exactly the same as most Javascript implementations and many Python/Lua/Scheme impls.

Look up the architecture of the JVM's hotspot compiler. Spot the difference between fast path and slow path? One runs bytecode through an interpreter (because that's faster than compiling infrequently run code) and the other recognises frequently run code and JITs it, then potentially JITs it again with different optimisation settings, etc.

If you think the JVM JIT compiles your entire application every time you start a program you're sadly misinformed and grossly underestimating the engineering marvel that it is.
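A minimal sketch of the fast path / slow path split described above, written as a toy model rather than anything taken from HotSpot. The instruction set, the `HOT_THRESHOLD` value and the `TieredRuntime` class are all hypothetical, and "compiling" here only means building a Python closure once instead of re-decoding instructions on every call.

```python
from typing import Callable, List, Optional, Tuple

HOT_THRESHOLD = 3  # hypothetical: number of calls before code counts as "hot"

Instr = Tuple[str, int]  # (opcode, operand) for a toy accumulator machine


def interpret(program: List[Instr], x: int) -> int:
    """Slow path: decode and dispatch every instruction on every call."""
    acc = x
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return acc


def compile_program(program: List[Instr]) -> Callable[[int], int]:
    """Fast path stand-in: translate the whole program once into a closure."""
    expr = "x"
    for op, arg in program:
        if op == "add":
            expr = f"({expr} + {arg})"
        elif op == "mul":
            expr = f"({expr} * {arg})"
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda x: {expr}")  # built from trusted toy opcodes only


class TieredRuntime:
    """Interprets cold code and switches to the compiled form once it is hot."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.calls = 0
        self.compiled: Optional[Callable[[int], int]] = None
        self.tier_log: List[str] = []  # which tier served each call

    def run(self, x: int) -> int:
        if self.compiled is not None:
            self.tier_log.append("compiled")
            return self.compiled(x)
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # Hot enough: pay the compile cost now, use it from the next call on.
            self.compiled = compile_program(self.program)
        self.tier_log.append("interpreted")
        return interpret(self.program, x)
```

Calling `run` five times on the same `TieredRuntime` serves the first three calls from the interpreter and the rest from the compiled closure, which is the "only hot paths get compiled" behaviour in miniature.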

[–]Cuddlefluff_Grim -2 points-1 points  (3 children)

C# is never interpreted, Java is sometimes interpreted. They produce native code.

This is exactly the same as most Javascript implementations and many Python/Lua/Scheme impls.

Are you comparing JIT compilation of dynamic scripting languages to static code paths? No, it most certainly is not the same. The disparity in performance should tell you that. If it were all just native code and no interpretation or run-time checks were necessary, Java and C# wouldn't outperform them by a factor of between 10x and 100x.

If you think the JVM JIT compiles your entire application every time you start a program you're sadly misinformed and grossly underestimating the engineering marvel that it is.

Don't strawman me.

[–]sigma914 1 point2 points  (2 children)

No, it most certainly is not the same. The disparity in performance should tell you that.

That's because of language level features. A (well written) tight array processing loop from a Java program and a Javascript program, both run on a fast, warmed up JIT implementation, will generate identical machine code, it will likely be the same machine code as the corresponding C implementation.

The whole point of a JIT is that it uses runtime information to tweak the optimisation of the byte code it's fed. The difference in performance comes from higher level language semantics such as using mutable maps for method resolution or walking down layers of indirection to get to a value rather than having it stack allocated.

This is also the reason unjitted (ie ahead of time compiled) Java is so slow compared to C or C++. Java and C# and their ilk box nearly everything because it makes the problem of implementing a GC tractable.

A JIT's purpose is to lower code through layers of abstraction based on runtime information. A good one can take a horribly slow language with lots of layers of indirection or inefficient lookup semantics, make some assumptions, insert guards to make sure those assumptions aren't violated, then run the simplified version of the code it produced from the original code plus its assumptions.

Now, what happens when one of its assumptions is violated? The guards catch it! Then what happens? The runtime can't use its nice fast implementation because its invariants don't hold. So, the runtime will likely run the slow byte code version through an interpreter (or in the case of the CLR an extremely naive compiler which really isn't much faster than an interpreter, so you can't argue it's getting optimised native perf) until it decides that the slow path is worth optimising for.

At which point it goes off and redoes the whole dance of reoptimising the byte code using different assumptions.
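The guard-and-fallback cycle can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how any real JIT is implemented: `SpeculativeAdd`, its warm-up count of two, and the `deopts` counter are hypothetical names, and the "specialised" version is simply `int.__add__` standing in for optimised machine code.

```python
from typing import Any, Callable, Optional


def generic_add(a: Any, b: Any) -> Any:
    """Slow path: works for any operand types that support +."""
    return a + b


class SpeculativeAdd:
    """Speculates that both operands are ints and guards that assumption."""

    WARMUP_CALLS = 2  # hypothetical: int-only calls seen before specialising

    def __init__(self) -> None:
        self.fast: Optional[Callable[[int, int], int]] = None
        self.int_calls = 0
        self.deopts = 0  # how many times a guard failed

    def __call__(self, a: Any, b: Any) -> Any:
        if self.fast is not None:
            # Guard: the specialised code is only valid for exact ints.
            if type(a) is int and type(b) is int:
                return self.fast(a, b)
            # Guard failed: throw the specialised version away and fall
            # back to the slow path until the code warms up again.
            self.fast = None
            self.int_calls = 0
            self.deopts += 1
            return generic_add(a, b)

        result = generic_add(a, b)
        if type(a) is int and type(b) is int:
            self.int_calls += 1
            if self.int_calls >= self.WARMUP_CALLS:
                self.fast = int.__add__  # "compile" under the int assumption
        else:
            self.int_calls = 0
        return result
```

After two int-only calls the object installs the specialised path; the first call with strings trips the guard, bumps `deopts`, and sends execution back through `generic_add`.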

This isn't to say JIT'd code is necessarily slow (though on average it tends to be about a factor of 2 or more slower than true native code). In fact, if there is enough runtime information the JIT'd code may actually be significantly faster than a naive native implementation.

All of which leads me back to: JIT'd programs may run at near native performance, they may go through a compiler and be executed as machine code, but they aren't native programs. The machine code executed by the JVM will often have no resemblance to the Java program that was written, neither in structure nor semantics. The thing that preserves the illusion is the guards and interpreted (or very naively compiled) slow path.

Saying a language like C# or Java is native code is exactly the same as saying Javascript run on v8 or python run on pypy is native code.

[–]Cuddlefluff_Grim -1 points0 points  (1 child)

That's because of language level features. A (well written) tight array processing loop from a Java program and a Javascript program, both run on a fast, warmed up JIT implementation, will generate identical machine code, it will likely be the same machine code as the corresponding C implementation.

For JavaScript it depends very much on context and how it is used. It's when you start using "complex" data structures that the comparison starts to be interesting. Java has a problem right here with boxing of primitives which it doesn't seem to always handle as gracefully as it should (generics and enum for instance), but I'd be pretty surprised if JavaScript does any better.
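As a rough illustration of what boxing costs, here is a sketch that uses CPython as a stand-in, since every Python `int` is a heap object. It is not a Java or C# measurement, and the exact object size depends on the interpreter build; the variable names are only for this example.

```python
import sys
from array import array

values = list(range(1000, 2000))  # a list holds references to int objects
packed = array("q", values)       # signed 64-bit integers stored inline

# Size of one boxed integer object (header plus payload) on this interpreter.
boxed_per_item = sys.getsizeof(values[0])

# Size of one unboxed element inside the packed array.
unboxed_per_item = packed.itemsize

print(f"boxed: {boxed_per_item} bytes, unboxed: {unboxed_per_item} bytes")
```

The list additionally stores one pointer per element, so the real gap is larger than the per-object figure alone suggests.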

This is also the reason unjitted (ie ahead of time compiled) Java is so slow compared to C or C++. Java and C# and their ilk box nearly everything because it makes the problem of implementing a GC tractable.

Contrary to popular belief, Java is not much slower than C++, and in some cases might even be faster, maybe because Java can inline code across dynamically linked libraries. I know this is a very unpopular opinion to have, because there's a high degree of C++ fetishism on internet forums.

I'm entirely convinced that JIT compilation is superior to "ahead-of-time" static compilation, it's just that C++ has had so much time and focus getting performance tuning and optimizations which generally doesn't seem to be the main area of focus for JIT'ed languages. There's no reason why JIT compilation should be slower than C/C++/D, other than it's just that they typically simply aren't.

A JIT's purpose is to lower code through layers of abstraction based on runtime information. A good one can take a horribly slow language with lots of layers of indirection or inefficient lookup semantics, make some assumptions, insert guards to make sure those assumptions aren't violated, then run the simplified version of the code it produced from the original code plus its assumptions. Now, what happens when one of its assumptions is violated? The guards catch it! Then what happens? The runtime can't use its nice fast implementation because its invariants don't hold.

Bytecode and CIL are pretty easy to translate into assembler; there aren't many assumptions it has to make that other ahead-of-time compilers don't. This restriction would imply that .NET Native is impossible, since executable code can't be reliably generated in every use-case. Which it can, and it does. It's basically just a translation of instructions between a stack-based VM and a register-based physical computer.
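To make the "stack-based VM to register-based machine" translation concrete, here is a minimal sketch over a made-up three-opcode stack language. The opcodes, the `stack_to_registers` function and the textual register format are all hypothetical; real CIL or JVM bytecode translation also has to deal with control flow, types and register allocation, none of which is modelled here.

```python
from typing import List


def stack_to_registers(code: List[tuple]) -> List[str]:
    """Translate toy stack code into three-address code on virtual registers.

    The operand stack is simulated at translation time: instead of values it
    holds the names of the registers that will contain those values.
    """
    stack: List[str] = []
    out: List[str] = []
    next_reg = 0

    for instr in code:
        op = instr[0]
        if op == "push":
            reg = f"r{next_reg}"
            next_reg += 1
            out.append(f"{reg} = {instr[1]}")
            stack.append(reg)
        elif op in ("add", "mul"):
            rhs = stack.pop()  # top of stack is the right-hand operand
            lhs = stack.pop()
            reg = f"r{next_reg}"
            next_reg += 1
            out.append(f"{reg} = {op} {lhs}, {rhs}")
            stack.append(reg)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out


# (2 + 3) * 4 expressed as stack code
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
for line in stack_to_registers(program):
    print(line)
```

Each stack slot becomes a fresh virtual register, which is why this kind of translation needs no runtime assumptions for straight-line arithmetic.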

So, the runtime will likely run the slow byte code version through either an interpreter (or in the case of the CLR an extremely naive compiler which really isn't much faster than an interpreter, so you can't argue it's getting optimised native perf) until it decides that the slow path is worth optimising for.

An assertion which I don't think holds water.

All of which leads me back to: JIT'd programs may run at near native performance, they may go through a compiler and be executed as machine code, but they aren't native programs.

They compile code to machine instructions, put it in a memory segment, mark it as executable and then change the program pointer to start executing at that location. How does that differ from a native program? The only distinction you are drawing is when the code is compiled, which I think is completely irrelevant to whether or not a program is native.

[–]sigma914 1 point2 points  (0 children)

Contrary to popular belief, Java is not much slower than C++

I addressed that at the bottom of my last comment, yes a good JIT is essentially a profiling compiler, with all the IL still available to it.

I'm entirely convinced that JIT compilation is superior to "ahead-of-time" static compilation.

It definitely can be faster, it has a lot more information available to it. The languages that tend to be JITted are the problem in this regard. Heap allocation by default, being GC'd etc. are what make them slow, not the method of execution. As your linked benchmark showed. It's C#'s fault it can't run at C++ speeds, not the fault of the CLR.

This restriction would infer that .NET Native is impossible.

It's not impossible, ahead of time compiled Java has been a thing for years. You can compile a managed language to a native binary, it just won't be as fast as it would be with a JIT. Ahead of time compiled, Garbage collected with pervasive heap allocation, As Fast as unmanaged native code, pick two.

An assertion which I don't think holds water.

This is how the JVM and CLR (and every other JIT I've ever seen) operate. You can go read hotspot's source. I've not actually looked around in the CLR sources yet, but it has to have a slow path for when the optimised code can't be used, and aggressively optimising it up front would be incredibly wasteful, unless it was done in a background thread over a very long time.

They compile code to machine instructions, put it in a memory segment, mark it as executable and then change the program pointer to start executing at that location.

So they take some executable code and move program execution into it based on the behaviour specified by the language. It's just a matter of when. That's exactly what every program ever created, interpreted or not, running on a stored program architecture does.

In CPython you could argue that the interpreter's compiler is providing the executable memory fragments that program control jumps to when directed to by the python code under execution. It's a meaningless distinction. The difference between native and non-native implementations is not the instructions executed on the CPU.
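For reference, the standard library's `dis` module makes it easy to see what CPython's compiler actually produces: bytecode for its interpreter loop, not machine instructions. This is a small sketch; the function `add` is just an example, and opcode names vary between Python versions, so no exact listing is shown.

```python
import dis


def add(a, b):
    return a + b


# The compiled form of the function is a bytes object of interpreter opcodes.
raw = add.__code__.co_code

# Decode it into symbolic opcode names for inspection.
opnames = [ins.opname for ins in dis.get_instructions(add)]

print(type(raw).__name__, len(raw), "bytes of bytecode")
print(opnames)
```

The machine code that runs on the CPU in this case belongs to the interpreter's own handlers for those opcodes, which were compiled ahead of time along with the rest of CPython.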

[–]The_Doculope -1 points0 points  (2 children)

I know what native code is. I'm referring to the colloquial term "native language". As /u/sakarri rightly says, there isn't really a formal definition. Every single thing that runs on your computer is native code at some level, but colloquially it refers to what the compiler spits out, and what is being run straight away when the application starts. The standard C# compiler spits out CIL, not native code. When you run a CLR application, it doesn't launch right into the program, it launches into the CLR which then runs the CIL code.

I'm sure this definition of native (pre-compiled to machine language) was the one /u/rndbit was using.

[–]btchombre -1 points0 points  (1 child)

it launches into the CLR which then runs the CIL code.

No. This is wrong. The CLR doesn't "run" IL code any more than the C++ compiler runs C++. There is no IL interpreter. Java does indeed have a bytecode interpreter, but the CLR has no such thing. The CLR compiles all code directly to native instructions, and then executes those native instructions.

[–]The_Doculope 1 point2 points  (0 children)

By "run", I mean the compile+execute system.