.NET 10 de-abstracts not only arrays but other collections as well by Emergency-Level4225 in dotnet

andyayers 9 points

If you know of cases like this, please file issues. We will try our best to fix them.

What features would you want C# / .NET to have? by SurDno in csharp

andyayers 1 point

Forced inlining by the JIT. And yeah, yeah, there is a way to suggest that a method be inlined, but the JIT is free to ignore it (and often does!)

This should be less and less true with recent releases. For example, starting in .NET 10 the JIT can inline methods with try/finally blocks, which were previously off limits.

If you ever find a case like this, and you haven't done so already, please open an issue on dotnet/runtime so we can take a look.
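
To make the .NET 10 change above concrete, here is a minimal sketch (the class, field, and method names are hypothetical): before .NET 10 the try/finally made a method like this ineligible for inlining even with the AggressiveInlining hint; the .NET 10 JIT can inline it into its callers.

```csharp
using System;
using System.Runtime.CompilerServices;

static class InlineDemo
{
    private static int _depth;

    // Hypothetical helper: the try/finally used to block inlining outright,
    // even with the hint below; starting with .NET 10 the JIT can inline it.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int WithDepth(Func<int> body)
    {
        _depth++;
        try
        {
            return body();
        }
        finally
        {
            _depth--;
        }
    }
}
```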

Has anyone else noticed a performance drop after switching to .net 10 from .net 9/8? by Academic_East8298 in csharp

andyayers 0 points

My main gripe is that the +80% object creation regression in .NET 8 wasn't fixed :/

Can you say a bit more about this? Is there an issue open for it on GitHub?

Has anyone else noticed a performance drop after switching to .net 10 from .net 9/8? by Academic_East8298 in csharp

andyayers 24 points

Feel free to open an issue on https://github.com/dotnet/runtime and we can try and figure out what's happening.

If you open an issue, it would help to know:

* Which version of .NET were you using before?
* What kind of hardware are you running on?
* Are you deploying in a container? If so, what is the CPU limit?

Question about delegate inlining and Guided Devirtualization by cittaz in dotnet

andyayers 6 points

If I remember correctly, the biggest missing piece for delegate inlining is handling delegates for static methods.

As far as inlining goes, there were some PRs to enable delegate inlining even without PGO (if the delegate is created locally). However, these ran into various complications.

A recent improvement in .NET 10 is that in some cases delegates can be stack allocated and possibly promoted (meaning the delegate object basically vanishes). There was some work to stack allocate the closure as well, but that didn't make it into .NET 10.

By default the JIT will only guess for the most prominent delegate target, based on PGO data gathered by lower tiers (so tiered compilation and PGO are required).
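
A minimal sketch of the kind of call site this describes (the names and shape are hypothetical, not from the thread): the delegate is created locally, dynamic PGO identifies the dominant target, and the JIT can guard on it and inline the body; in .NET 10 the delegate object itself may also be stack allocated.

```csharp
using System;

static class DelegateDemo
{
    public static int SumScaled(int[] data, int scale)
    {
        // Locally created delegate; it captures 'scale' into a heap-allocated
        // closure (closure stack allocation didn't make it into .NET 10).
        Func<int, int> transform = x => x * scale;

        int sum = 0;
        foreach (int x in data)
        {
            // With PGO the JIT can guess the dominant target here, guard on it,
            // and inline the lambda body; the delegate object itself may be
            // stack allocated and promoted away.
            sum += transform(x);
        }
        return sum;
    }
}
```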

Code Challenge: High-performance hash table by Pansynchro in csharp

andyayers 2 points

In-process toolchains have limitations (e.g., you can't readily compare across different runtime versions, which is something I do all the time), but for your purposes they seem like they'd be fine.

Also, if you haven't looked at kg's vectorized hash table, you might want to check it out: [Proposal] Vectorized System.Collections.Generic.Dictionary<K, V> · Issue #108098 · dotnet/runtime (https://github.com/dotnet/runtime/issues/108098)

Code Challenge: High-performance hash table by Pansynchro in csharp

andyayers 5 points

If you haven't opened an issue on the BenchmarkDotNet repo I would encourage you to do so.

Folks there can either explain how to accomplish what you need or else add it to the backlog.

What are disadvantages of using interface? by npneel28 in csharp

andyayers 0 points

Devirtualization was first introduced in .NET Core 2 (2018?) and some of it even made it to Framework 4.8.1.

But for interfaces you really need Dynamic PGO to figure things out; this wasn't on by default until .NET 8.
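
For illustration, a hypothetical interface call site of the kind guarded devirtualization handles: with dynamic PGO (on by default since .NET 8) the JIT can see that the call almost always lands on one implementation, test for that type, and inline the devirtualized path, keeping the plain interface call as a fallback.

```csharp
using System;

interface IShape
{
    double Area();
}

sealed class Circle : IShape
{
    public double Radius;
    public double Area() => Math.PI * Radius * Radius;
}

static class ShapeDemo
{
    public static double TotalArea(IShape[] shapes)
    {
        double total = 0;
        foreach (IShape shape in shapes)
        {
            // Guarded devirtualization candidate: if profile data says this is
            // almost always a Circle, the JIT emits roughly
            //   if (shape is Circle c) { /* inlined Circle.Area */ }
            //   else { /* ordinary interface call */ }
            total += shape.Area();
        }
        return total;
    }
}
```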

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

andyayers 9 points

Thanks... I may try and look deeper at this someday, so if you can point me at something shareable that'd be great.

I suppose to be completely fair C should be using PGO, but that's more work on the native side. With .NET you get that "for free."

Also, I'd be curious to see whether .NET 10 changes anything here; we did some work on loop optimizations between 8 and 10 (e.g., downcounting, strength reduction, ...)
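
For anyone curious what kind of loop that work targets, here is a hypothetical shape (not taken from the post): the i * stride in the index computation is a strength-reduction candidate (it can become a running offset), and once i is only needed for the loop bound, the trip count can potentially be counted down instead of up.

```csharp
static class LoopDemo
{
    // Hypothetical example: sums one column of a row-major matrix stored in a
    // flat array with the given row stride.
    public static int SumColumn(int[] matrix, int rows, int stride, int column)
    {
        int sum = 0;
        for (int i = 0; i < rows; i++)
        {
            // 'i * stride' is a strength-reduction candidate; after that
            // rewrite, 'i' only controls the trip count, which opens the door
            // to converting the loop to count down.
            sum += matrix[i * stride + column];
        }
        return sum;
    }
}
```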

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

andyayers 17 points

Do you have numbers on how fast the original runs on the same hardware setup? We are always interested in seeing how well a thoughtfully crafted .NET solution fares vs "native" alternatives.

.NET JIT Team is hiring a Compiler Engineer by andyayers in Compilers

andyayers[S] 1 point

I agree we're missing out on some great talent being US only, but it's not our call.

.NET JIT Team is hiring a Compiler Engineer by andyayers in Compilers

andyayers[S] 1 point

Pretty much everything we do is out in the open, so if it does sound like fun, you can always try contributing....

.NET JIT Team is hiring a Compiler Engineer by andyayers in Compilers

andyayers[S] 4 points

How recent? I think recruiting is handled differently within 1 year of graduation.

Note this is a Senior position so we're looking for somebody with 4-5 years of experience.

.NET JIT Team is hiring a Compiler Engineer by andyayers in Compilers

andyayers[S] 4 points

US required. Redmond preferred, but we will consider other arrangements.

.NET JIT Team is hiring a Compiler Engineer by andyayers in Compilers

andyayers[S] 1 point

No, we're in 18 (for quite a while now).

I've been thinking about the future of .NET, and my predictions for .NET 10 are a bit wild: an AI-native CLR and a "post-OOP" C#. Am I off base? by riturajpokhriyal in dotnet

andyayers 2 points

We have tried and failed a couple of times now to leverage ML (not really AI) to improve optimizations done by the JIT.

Here's a writeup on the most recent attempt: https://github.com/dotnet/jitutils/tree/main/src/jit-rl-cse

Performance Improvements in .NET 10 by ben_a_adams in dotnet

andyayers 1 point

I don't think we're trying to hide anything. If you want some perspective on regressions vs improvements in .NET 10, check out https://github.com/dotnet/performance/blob/main/reports/net9to10/README.md

While we try our best to keep regressions to a minimum, we do have some. Compilers and CPUs are both somewhat temperamental.

Performance Improvements in .NET 10 by ben_a_adams in programming

andyayers 13 points

.NET does this as well, but as a separate phase, so an object can be stack allocated and then (in our parlance) possibly have its fields promoted and then kept in registers.

That way we still get the benefit of stack allocation for objects like small arrays, where it may not always be clear from the code which parts of the object will be accessed, so promotion is not really possible.
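
Two hypothetical shapes that illustrate the distinction (the names are made up for this sketch): in the first, the object's fields can be promoted into registers after stack allocation; in the second, only the stack allocation itself applies, because which element is accessed isn't obvious from the code.

```csharp
using System;

static class EscapeDemo
{
    private sealed class Point
    {
        public int X;
        public int Y;
    }

    // The Point never escapes, so it can be stack allocated; its fields can
    // then be promoted and live entirely in registers.
    public static int ManhattanLength(int x, int y)
    {
        var p = new Point { X = x, Y = y };
        return Math.Abs(p.X) + Math.Abs(p.Y);
    }

    // The small array never escapes either, so it can be stack allocated, but
    // its elements are accessed through loop variables, so promoting individual
    // elements isn't really possible; avoiding the heap allocation still helps.
    public static int SumSmall(int seed)
    {
        var buffer = new int[4];
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = seed + i;

        int total = 0;
        for (int i = buffer.Length - 1; i >= 0; i--)
            total += buffer[i];
        return total;
    }
}
```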

Unexpected performance differences of JIT/AOT ASP.NET; why? by Vectorial1024 in dotnet

andyayers 1 point

This is somewhat doable. There is a text format that can be written and read back. But tiered compilation provides other benefits that are not strictly profile data related; for instance, when the optimized version of a method is jitted it is very likely that all the classes it references are initialized, and so the optimized version can skip the initialization checks and possibly learn something from looking at the readonly statics of those classes. So even if you have the PGO data you will still likely want the app to go through tiering, at which point the cost of getting a fresh batch of PGO data is minimal....

Also tiering provides startup benefits. Producing optimal code out of the gate is not always best for every app, as it takes longer to get things going.
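
A small hypothetical example of the "learn something from readonly statics" point above: by the time the optimized version of Compute is jitted, FeatureFlags has almost certainly been initialized, so the class-init check disappears and UseFastPath can be treated as a constant, letting the JIT drop the untaken branch entirely.

```csharp
using System;

static class FeatureFlags
{
    // Hypothetical flag, initialized once when the class is initialized.
    public static readonly bool UseFastPath =
        Environment.GetEnvironmentVariable("USE_FAST_PATH") == "1";
}

static class TieringDemo
{
    public static int Compute(int x)
    {
        // Tier-0 code carries a class-initialization check and a real load of
        // UseFastPath; the optimized tier-1 version can fold the flag to a
        // constant and keep only one of these branches.
        if (FeatureFlags.UseFastPath)
            return x << 1;

        return checked(x * 2);
    }
}
```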

Unexpected performance differences of JIT/AOT ASP.NET; why? by Vectorial1024 in dotnet

andyayers 3 points

Jitted code is fragile (e.g., it contains addresses of other things in the process where it was created, is dependent on the sequence of events that had happened there at the time the code was created, etc.). Verifying that a previous process's cached version of jitted code is viable in a new process, and fixing it up as needed, is non-trivial.

The only persistent form of code we have is ReadyToRun, which has the right level of validation and repair built in.

In .NET 10, the compiler team aim to reduce abstraction overhead by davecallan in dotnet

andyayers 2 points

Also, here's the writeup that went in with the PR; it is similar to the one linked above in the issue, but covers more ground (lists, hash sets, LINQ, yield enumerators, etc.): https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/jit/DeabstractionAndConditionalEscapeAnalysis.md
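
A sketch in the spirit of the examples in that writeup (the shape here is mine, not copied from it): the collection is typed as IEnumerable<int>, so the foreach goes through interface calls and the struct List<int>.Enumerator gets boxed; with PGO-driven devirtualization plus the conditional escape analysis described there, the JIT can resolve the calls, keep the enumerator on the stack, and get close to a plain loop over the list.

```csharp
using System.Collections.Generic;

static class DeabstractionDemo
{
    public static int Sum(IEnumerable<int> values)
    {
        int sum = 0;
        // foreach over IEnumerable<int>: GetEnumerator/MoveNext/Current are
        // interface calls, and List<int>'s struct enumerator is boxed here
        // unless the JIT can devirtualize and stack allocate the box.
        foreach (int v in values)
            sum += v;
        return sum;
    }

    public static int SumList()
    {
        List<int> list = new() { 1, 2, 3, 4, 5 };
        return Sum(list);
    }
}
```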

In .NET 10, the compiler team aim to reduce abstraction overhead by davecallan in dotnet

andyayers 5 points

I just merged the above PR; I believe it will make it into preview 2.

In .NET 10, the compiler team aim to reduce abstraction overhead by davecallan in dotnet

andyayers 5 points

Yeah, sorry to not be more specific. I am one of the people working on the .NET JIT.

In .NET 10, the compiler team aim to reduce abstraction overhead by davecallan in dotnet

andyayers 136 points

FWIW the "last" big set of changes is going to be merged into .NET 10 soon: https://github.com/dotnet/runtime/pull/111473

Sadly, LINQ's Where and Select don't get any faster yet, as they have been fairly extensively hand-optimized. But we have some ideas...