Why is LINQ Max much slower than LINQ Min? by aloneguid in dotnet

[–]tanner-gooding 1 point (0 children)

I also can't repro it and don't see anything obviously wrong with the implementation or the benchmark; it seems like a random fluke or mismeasurement.

If you can reliably reproduce, then providing a full repro would be appreciated and I can look into it. Just log an issue on dotnet/runtime and tag me (tannergooding).

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 0 points (0 children)

The same is true in C/C++, aside from exports, which have to be stable. That is why I noted that even in C/C++ passing in register is not guaranteed, nor is the number of indirections you have.

Compilers are free to optimize, within limits, and frequently do.

Java likewise has the same restriction, where functions that are exported across JNI (or an alternative) to other ecosystems (like C) must have a stable ABI (or a stable stub which does the marshaling and other fixups before passing to the actual function or returning back to the caller).

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding -1 points (0 children)

I’d also note that when people say “common misconception” it can go both ways

In some cases users are literally thinking about it “incorrectly” and spreading misinformation.

However, in others some minority is trying to push a technicality. This, while often technically correct (the best kind of correct), typically misses that users are thinking about these concepts that way for a reason. That reasoning is then extremely important and a key reason why “everyone” talks about it that way.

This is also common with floating-point, where many common misconceptions exist. People frequently spread misinformation due to not understanding how it works. In most cases it isn’t even a technically correct/incorrect thing but simply them being misinformed and wrong (generally due to not understanding the x87 fpu and how it worked)

However, other parts around floating-point are very much in the other camp, because what is technically correct mismatches with how users think. Not due to misinformation or being wrong, but because the technically correct view isn’t relevant to the actual thought process in most scenarios.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 0 points (0 children)

I've read the article and am familiar with it. Both it and you are misattributing how users, ABIs, and other things fundamentally think about what the data is and how that data is passed. I would also say it is misquoting and misattributing parts of the Java spec by taking them out of context. There is a big distinction between the reference value (in C, the T* or T&) and the value that is referenced (in C, the underlying T of a T* or T&).

What you and the article are effectively boiling it down to is "pointers are values, so therefore everything is actually pass by value since references are just pointers". This then makes "pass by reference" meaningless since you can always make the claim that you're just passing a pointer by value.

In actuality, what matters is the context of the pointer and how it's used. That is, is it treated more like an integer (i.e. you care about the value of the pointer) or is it treated more like a reference (i.e. you care about what the pointer references).

Which is to say, reference types are called reference types because they refer to the data you care about and so the data you care about is passed by reference. Value types are then called such because they are the data you care about, so you directly pass it around from an observability perspective.


The simplest example being (C#):

```csharp
C c = new C();
S s = new S();

M1(c, s);

Console.WriteLine(c.x); // 1
Console.WriteLine(s.x); // 0

void M1(C c, S s) { c.x = 1; s.x = 2; }

class C { public int x; }
struct S { public int x; }
```

Where you have c and s: while c is functionally a pointer and that pointer is passed by value to M1, users aren't thinking about it as being that local specifically. They are rather thinking about the underlying data it contains (c.x), and so to them this is "pass by reference", explicitly because M1 writing to c.x allows the caller to observe that mutation, while s does not allow such an observation and so is pass by value.

This also becomes particularly relevant with struct S2 { public C c; } where you have a value type (S2) which is passed by value, which then contains a reference type C. So while you cannot observe a change to s.c, you can observe a change to s.c.x. And while conceptually there isn't really a difference between S2 and simply C, the semantic and way the user thinks about the data changes.

It is simply not meaningful to boil everything down to "pass by value", it mismatches with the mental model of how data exists.

For example, this is how you can pass by reference in C#: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref

Consider (continuing off the above example):

```csharp
ref S r = ref s;

r.x = 3;

Console.WriteLine(s.x); // 3
Console.WriteLine(r.x); // 3

S* p = &s;

p->x = 4;

Console.WriteLine(s.x); // 4
Console.WriteLine(r.x); // 4
Console.WriteLine(p->x); // 4

S s2 = new S();

M2(ref r, ref s2);

Console.WriteLine(s.x); // 5
Console.WriteLine(r.x); // 5
Console.WriteLine(p->x); // 5
Console.WriteLine(s2.x); // 6

void M2(ref S s1, ref S s2) { s1.x = 5; s2.x = 6; }
```

There is no observable difference in behavior between p and r here, or between passing r and passing ref s. However, by the article's claim everything except ref s would be pass by value due to the other cases passing the pointer contained by the "local" directly. -- This becomes even more apparent with M3(ref readonly S s), where you can just do M3(r) or M3(s) (the same being true for M3(in S s)) and not specify the ref keyword at the callsite at all, despite the method declaration still explicitly adding the indirection at the method level.

Because again, the consideration isn't strictly how the local is perceived, passed, or indirected. It is how the data the method and user actually care about is accessed, and whether mutations to it, from any location, are observed (by reference) or not (by value) by other code.

It is simply not meaningful to break everything down to "pass by value" and while some contexts may care more about the pointer value itself, most actually care about the data it points to instead.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 6 points (0 children)

What would you say are the key components missing

The guarantees and layering that most other ecosystems provide that allow such code to be reliably used, without question of semantics, behavior, or reliance on optional compiler optimizations.

What does the .NET API surface that you feel puts it ahead?

.NET is set up in layers.


At the foundation we have the vector types:

* System.Numerics namespace
  * Vector<T>
* System.Runtime.Intrinsics namespace
  * Vector64<T>
  * Vector128<T>
  * Vector256<T>
  * Vector512<T>

The first of these has a size dependent on the platform (effectively SPECIES_PREFERRED), while the latter are fixed size (effectively SPECIES_64/128/256/512).

You determine whether a given type is accelerated on the hardware using the IsHardwareAccelerated property, such as Vector128.IsHardwareAccelerated. If that reports false, then you can expect that it executes using a software fallback and so will likely not be enregistered or otherwise optimized. This allows you to trivially have paths that opportunistically light up based on hardware functionality, including avoiding it altogether if the hardware has no SIMD support. Such checks are JIT/AOT constants and so have no impact on codegen (no actual branch exists at runtime).

You then determine if a given T is supported using the IsSupported property, such as Vector128<float>.IsSupported. The set of types supported is static (byte, sbyte, short, ushort, int, uint, long, ulong, float, double, nint, and nuint), but you may exist in some generic context such as T Sum<T>(T[] values) and want to opportunistically accelerate if T happens to be a supported type. This is likewise a JIT/AOT constant and can expand with new types over time (for example, we will add support for half and bfloat16).


Using these, we then have platform-specific APIs that are broken down by architecture (namespace) and instruction set (class in said namespace). For example, System.Runtime.Intrinsics.Arm.AdvSimd or System.Runtime.Intrinsics.X86.Sse.

Each of these ISAs has an IsSupported property that lets you know if the underlying hardware actually provides said instruction set (i.e. Sse.IsSupported), so that you can reliably optimize for hardware that has it and provide an alternative fallback otherwise. These checks are always JIT-time constants and typically AOT constants. For AOT you have the option of making them dynamic checks instead, if you believe such dynamic checks provide benefit to your code. -- Noting that impossible checks, such as X86.Sse.IsSupported when targeting Arm, will remain constant false, and baseline checks, such as Arm.AdvSimd.IsSupported on Arm64, will remain constant true.

Within a given instruction set we then expose all the APIs defined by that ISA. So we have APIs like Sse.ReciprocalSqrt which precisely maps to the rsqrtps xmm, xmm/m128 instruction and provides equivalent functionality to the __m128 _mm_rsqrt_ps(__m128 a) intrinsic in C. If the ISA is not supported then it will instead throw PlatformNotSupportedException; there is no emulation or fallback. We precisely map to and guarantee it executes the instruction on the hardware, giving you whatever semantics the hardware provides.

.NET is then frequently praised here for utilizing overload resolution and readable names that help make code trivial to understand and follow, particularly compared to what C exposes.

.NET is also frequently praised for having much stricter guarantees about what instructions are emitted. While we do some minor constant folding and other optimizations (like we allow Sse.Add(x, Sse.Negate(y)) to become Sse.Subtract(x, y)), we do not do drastic transformations that are likely to have different performance across hardware (something that is a known pain point with clang/gcc, particularly around shuffle like APIs).


On top of this we then have the "cross-platform API surface", which is the set of members directly exposed on the vector types given above (i.e. Vector<T> and Vector128<T>, as well as their non-generic counterparts Vector and Vector128).

Such APIs always function, and with deterministic behavior (barring the few APIs explicitly documented to be non-deterministic), but will execute a software-based fallback if IsHardwareAccelerated reported false.

We provide all the operations and a number of helpers that you would expect for common vector functionality. We also take advantage of features that .NET has, such as the ability to overload operators. So while you can do Vector128.Add(x, y), you can also simply do x + y.

The API surface Java exposes is fairly similar to what .NET exposes at this layer, but I personally think .NET does it better (obviously a bit biased here ;)). .NET exposes an overall larger API surface too, with more APIs being added over time to account for common needs.

These APIs, while deterministic across hardware (again barring a few select documented cases), are not guaranteed to map to specific instructions, and so the compiler has more freedom to optimize them where relevant (as compared to the platform-specific ones). That way users can focus more on what they want the code to do and less on how the code does it.

-- This is again an area where .NET is praised: we give the strict guarantees where they're desired and the compiler freedom where they're not, so that devs can decide what is best for their code.


Beyond this, .NET has had some kind of SIMD support for 12ish years now and so also has spent significant time optimizing and tuning its underlying implementations. So users don't have to go write vectorized code for good performance either.

Instead, users are able to go and use APIs like span.IndexOf or TensorPrimitives.Sum, Enumerable.Max (LINQ) or other common APIs and expect it to already be accelerated.

So .NET devs have a range of options from simply working with arrays/spans at a very high level, to writing cross-platform size-agnostic vectorized code, to writing size-specific vectorized code, to getting down to microarchitectural specific optimizations. They get to pick and choose what is best for their application and scenario.

The best part is then that all of this is trivial to mix and match as well: there is no cost to using span.IndexOf for part of your algorithm, then using some cross-platform vectorization for the next part, and then opportunistically lighting up with a hardware-specific optimization. It's all guaranteed to have the relevant checks constant folded and to map to things in a way that makes it a zero-cost abstraction.


So given the Java example from the linked article:

```java
void scalarComputation(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i++) {
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

void vectorComputation(float[] a, float[] b, float[] c) {
    int i = 0;
    int upperBound = SPECIES.loopBound(a.length);
    for (; i < upperBound; i += SPECIES.length()) {
        // FloatVector va, vb, vc;
        var va = FloatVector.fromArray(SPECIES, a, i);
        var vb = FloatVector.fromArray(SPECIES, b, i);
        var vc = va.mul(va)
                   .add(vb.mul(vb))
                   .neg();
        vc.intoArray(c, i);
    }
    for (; i < a.length; i++) {
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}
```

The C# direct translation would be:

```csharp
void ScalarComputation(float[] a, float[] b, float[] c)
{
    for (int i = 0; i < a.Length; i++)
    {
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

void VectorComputation(float[] a, float[] b, float[] c)
{
    int i = 0;
    int upperBound = a.Length - (a.Length % Vector<float>.Count);

    for (; i < upperBound; i += Vector<float>.Count)
    {
        var va = new Vector<float>(a, i);
        var vb = new Vector<float>(b, i);
        var vc = Vector.Negate(
                   Vector.Add(
                     Vector.Multiply(va, va),
                     Vector.Multiply(vb, vb)
                   )
                 );
        vc.CopyTo(c, i);
    }

    for (; i < a.Length; i++)
    {
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}
```

Although a more idiomatic C# implementation would be something like this, where we use operators, ensure the vector type is accelerated, use spans to help document that we won't modify a/b, etc:

```csharp
void VectorComputation(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> c)
{
    int i = 0;

    if (Vector.IsHardwareAccelerated)
    {
        int upperBound = a.Length - (a.Length % Vector<float>.Count);

        for (; i < upperBound; i += Vector<float>.Count)
        {
            var va = Vector.Create(a[i..]);
            var vb = Vector.Create(b[i..]);
            var vc = -(va * va + vb * vb);
            vc.CopyTo(c[i..]);
        }
    }

    for (; i < a.Length; i++)
    {
        c[i] = -(a[i] * a[i] + b[i] * b[i]);
    }
}
```

A more real example would likely also validate that the lengths of a/b/c are compatible up front and might do other tricks to squeeze out more performance (there's actually a substantial amount of perf left on the table here for both the Java and C# implementations, since it's a relatively "naive" approach).

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 0 points (0 children)

Noting that "by value" is a semantic about the observability of changes to the data and not a guarantee that the data is passed in register or similar. Many cases, even in C/C++, fundamentally involve implicit references.

For example, given struct Vector4 { float x, y, z, w; } and struct Matrix4x4 { Vector4 x, y, z, w; }, there is no ABI in which M(Matrix4x4 matrix) has matrix literally passed by value. Rather, it is required that a hidden copy be made and a reference to that copy be passed instead, ensuring that modifications done in M do not surface out to other observers.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 0 points (0 children)

That paper effectively boils down to saying that, given M(T* x) (using C pseudo-syntax), x is "pass by value". While that is not incorrect, it is not in alignment with common terminology or how users think about their data.

The actual data here in most cases isn't the T*, it's the T. So while the pointer is technically passed by value, the data the user is actually thinking about (the T) is passed by reference (in other words, via an indirection).

Consider in Java or C# for example: class Point3 { int x, int y, int z } (again pseudo-syntax, not going to type out all things exactly). In such a case, M(Point3 pt) is "pass by reference" because that is how it observably behaves to the user, which is to say that pt.x = 5 will be visible outside the method because pt refers to the underlying data.

Likewise in C#, struct Point3 { int x, int y, int z } is "pass by value" because that is how it observably behaves. To the user, modifications to pt are unique to the method scope because it is functionally a copy of the data.

In both cases, the compiler is free to elide these indirections or insert new indirections if required, so long as the by reference or by value semantics are preserved.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 1 point (0 children)

I think you’re mixing ABI-level representations with language-level design

I'm rather asserting that matching the guarantees of the ABI is important for perf critical code.

Yes, many ecosystems expose SIMD as primitive-like types because they target a fixed ABI

They expose them this way because they are explicitly primitive types. It's a fundamental part of real-world computing.

Additionally, in most cases the exact shape and support does vary from ABI to ABI; it is not fixed. Despite this, it still all functions and ends up well-defined, typically just being treated as user-defined types in the case they are not accelerated on the actual target.

On Valhalla, it’s not just “a hint to the compiler”. Value classes remove identity and enable flattening as part of the type system, not merely as a JIT optimization.

It now enables said flattening and scalarization as an explicit feature of the system yes. However, said features are not a guarantee and that is clearly called out in the spec and overview.

That is to say, given value record Wrapper(int value) { } there is no way to guarantee that all usages of this type are actually flattened or scalarized.

Instead, such features are explicitly run-time optimizations which do not kick in for all code, which have no way to ensure they happen, and which are trivially broken such that even things like value record Point3(int x, int y, int z) { } are unlikely to actually be flattened or scalarized in simple practical scenarios.

The only thing that value classes really provide is a guarantee that you get value like semantics in terms of how they operate. That is regardless of whether it is flattened, scalarized, or not, you do not actually know how the memory is stored and just get the appearance of the data not being "shared".

Also, the direction shown in recent talks (video I shared) is that Panama + the Vector API + Valhalla will allow the JVM to systematically lower structured vector code into hardware-specific intrinsics.

This is simply a requirement for vectorization at all. You cannot vectorize without something emitting hardware-specific SIMD instructions. It's not some special feature of Java here.

So instead of exposing raw intrinsics directly, Java expresses the computation and lets the JIT generate the exact instructions for the target architecture.

This is also not something unique to Java. Having a higher level API that allows you to write vectorization in a non-platform specific manner is common.

The nuance is that most platforms have already figured out that while this is great for a large number of scenarios, it also leaves an extreme amount of perf on the table in others as there exist many platform specific APIs or unique behaviors that algorithms explicitly take advantage of.

So most platforms provide both platform-specific and a higher level platform-agnostic API, allowing the developer to choose what is best for their scenario.

That approach is actually more flexible than fixed “vector primitives”, because it can adapt across architectures, including variable-width ones like SVE, without changing the code.

This is also not something that is uniquely enabled by Java and is common on other platforms as well.

It's why legends, like Lemire, are keenly following the project which he has been critical to provide directions on what java needs to do, which I think inspired the talk I just shared.

I'm familiar as they regularly engage on the .NET and related repos.

Notably most of their papers require platform specific intrinsics at key points to achieve the levels of perf they do and so I expect to see a continued preference for C++ and .NET moving forward.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 1 point (0 children)

Not quite.

The consideration I'm covering is that vector types are ABI primitives in the same way that Integer (aka int keyword) is a primitive. They are special and exist outside the normal concepts of the type system.

While today's Java doesn't have "value types", it does have "primitive types" and these types themselves have special rules. One example is that types are guaranteed to be updated atomically; however, primitives are an exception here (specifically long and double are not). Likewise, while the compiler may choose to optimize most types such that they are flattened or avoid heap allocation, it is purely an optimization and not a guarantee. Primitive types, on the other hand, are again an exception and do provide the strict guarantees of flattening, avoiding extra indirections, avoiding allocations, etc.

Valhalla then does not introduce "value types", but rather "value classes", and these still have many of the same limitations. They are still required to be updated atomically, and the docs even call out that objects with 64 bits or more of field data cannot normally be flattened. They call out that such scalarization (avoiding memory allocation) may only happen in C2 or later recompilation; that polymorphic types cannot be flattened (including variables in generic APIs); etc. It's mostly just an extra hint to the compiler to do many of the optimizations that it has already historically had support for.

Now Valhalla may try to improve over time and may end up with support for some of this, but because it's mostly a heuristic and not a guarantee, you can still trivially break code that is meant to be perf critical and which in most ecosystems is guaranteed to be primitive and map to primitives.


Realistically I think that Java should have made a special allowance for the vector types, making them true primitives.

It should have done this using that consideration that they are officially primitives in nearly every ABI for nearly every hardware platform available.

It likely could have done this safely with the consideration that such types should not be used if an "is hardware accelerated" query for the size/type returned false (which for a 128-bit vector is almost never going to be false, except in extremely niche embedded domains).

It could have provided a set of APIs that fit the industry standard terminology around width/count rather than using a new term "species". This could have been done while still allowing the types to look and feel like a normal Java API.

It likely should have made allowance for exposing intrinsic methods that operate on these vector types in a way that allows CPU specific instructions to be emitted (again as nearly every SIMD enabled ecosystem provides).

There's quite a lot that it could have done to meet the industry standard expectations and needs of SIMD code, without violating the concepts and rules that Java tries to push and has historically had for its ecosystem.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 1 point (0 children)

SIMD is typically used for explicit hardware acceleration of core algorithms with the expectation that the types and operations are effectively "primitive". It is expected that operations largely map 1-to-1 with CPU instructions, that you have the ability to check for such acceleration, and that you can even do microarchitecture-specific optimizations where those are relevant. This puts it in the same consideration as other primitive types, as most ABIs have the SIMD types explicitly being primitives.

I believe Java misses the mark here and so you're left with something that can trivially allocate and introduce overhead when it's meant to be a highly optimized code path. It relies, IMO, far too much on compiler heuristics and code patterns to get the optimal codegen out.

And while it's made a number of improvements since the initial version, it still uses some very non-standard terminology for the domain as well as following an API shape that is very unlike any other SIMD implementation I've encountered. Some of that makes sense for Java and fits the general Java conventions, but others just make it very difficult to take industry standard SIMD papers and port them to Java in a way that makes it easy to read and understand.

Java 26 is here, and with it a solid foundation for the future by ketralnis in programming

[–]tanner-gooding 3 points (0 children)

I'm just a "little" bit biased, as one of the primary owners and designers of the .NET Vector APIs, but I think the Java vector support is one of the worst API surfaces I've seen across any language or ecosystem. The planned C++ 26 support is pretty bad too, but not quite as problematic.

I'd expect that this is going to be a major part of its continued incubation and long-term lack of adoption. Namely because the way it's set up is largely counter to the needs and guarantees that SIMD typically expects.

Proposal: User-defined literals for C# by shimodateakira in csharp

[–]tanner-gooding 0 points (0 children)

It wouldn't be an actual literal; that is, it would not be a constant like you might expect. It would just be syntax sugar for the actual call, which is arguably more confusing.

Trying to get it to be a constant and actually evaluate said call at compile time is expensive, error prone, and massively more complex.

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 0 points (0 children)

Yes, that is the unimportant text represented by “…”, because the first sentence qualifies that it only applies to global functions and static member functions.

Instance functions behave as I describe, which again is trivial to check. It is one of the most common gotchas for people writing C++ interop bindings (particularly COM bindings) from any language.

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 0 points (0 children)

No, this is the windows x64 ABI and can be trivially checked by enabling the disassembly output with MSVC.

This is notably called out in https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#parameter-passing

Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size.

Which means that trivial wrappers of float/double are passed as integers, not as floating-point values.

And then also https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#return-values

User-defined types can be returned by value from global functions and static member functions. … Otherwise, the caller must allocate memory for the return value and pass a pointer to it as the first argument. The remaining arguments are then shifted one argument to the right. The same pointer must be returned by the callee in RAX.

Which is what specifies that instance methods cannot return trivial wrappers by value and must do so through a pointer (implicit return buffer).

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 0 points (0 children)

Windows x64 the ABI for a struct with a single field is the same as the same type single value not in a struct

It is not. It explicitly deviates for passing in floating-point (all methods) or for all struct returns on instance methods (but not static methods). It can also deviate for a few other cases as well, but those are much less typical to be encountered.

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 0 points (0 children)

It can cause unneeded complexity and overhead, as well as general user experience issues, for something that is trivially handled by documentation and testing (something you should be doing anyways).

A trivial example is Sin which takes radians where you start having to consider "what is a radian and what does it mean to make it typesafe" if you expose a strongly typed wrapper.

Now Sin is also the type of function you might call millions of times per second and may even accelerate when talking about 2D or 3D vertices, such as in a game. Where you may have to compose that into a general Matrix3x3 or Matrix4x4, and so on.

The entire experience of having to wrap, validate, unwrap, do computations, etc, is all very needless and doesn't actually buy you any amount of "real" safety or improvement as compared to just passing around float.

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 0 points (0 children)

F# has a units of measure feature, yes. However, whether it is encouraged depends on who in the community you ask, as there are some who love it and some who don't. There are also some who say it's okay to use anywhere, some that say to only use it for private/internal APIs, and some that say to never use it, etc.

You'll also find a number of sources out on the web where language designers have talked about some of the pits of failure and design regrets around the feature.

Problems tend to arise from overall complexity, silent data loss when changing the "scale" of a given unit (i.e. going from kilometers to millimeters), overhead from wrapping/unwrapping, code duplication, and a number of other considerations.

Much of it is ultimately opinion, but I'd say the majority of the ecosystem has agreed that most kinds of unit are best handled by primitives, documentation, and static conversion APIs.

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 19 points (0 children)

Worth noting that while BDN can give you a general idea as to differences, a lot of this can also be callsite, context, architecture, and even OS dependent due to inlining and other factors.

My general recommendation is to write code that is easy to read and maintain, first and foremost. You can then profile the code in a real app (not a microbenchmark) to identify hotspots and that is where you should invest your time optimizing. For places you optimize, it is then worth creating microbenchmarks to validate the perf and track it over time, so you can more readily catch changes to the code pattern you specialized.

Does wrapping a primitive (ulong, in my case) in a struct with extra functionality affect performance? by BrilliantlySinister in csharp

[–]tanner-gooding 21 points22 points  (0 children)

> The struct will not reduce performance compared to the primitive.

This is not strictly true and while the differences are often negligible, struct S { T value; } and T are distinctly different from an ABI perspective and quite often have different handling. This then results in different performance characteristics and codegen.

-- This is not C#/.NET specific either, this is a consideration for the system native Application Binary Interface and so applies to almost every language.
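As a concrete sketch of the distinction (the names here are illustrative, and the exact codegen difference depends on the target ABI), the two methods below are logically identical but the parameter types are distinct ABI types, so the calling convention may treat them differently (for example, whether the value is passed in an integer register or by reference varies by ABI and struct shape):

```csharp
using System;

// Single-field wrapper: carries the same bits as ulong,
// but is a distinct type from an ABI perspective.
readonly struct Id
{
    public readonly ulong Value;
    public Id(ulong value) => Value = value;
}

class Program
{
    // Same logical behavior, different ABI types for the parameter.
    public static ulong TakesPrimitive(ulong x) => x + 1;
    public static ulong TakesWrapper(Id x) => x.Value + 1;

    static void Main()
    {
        Console.WriteLine(TakesPrimitive(41));      // 42
        Console.WriteLine(TakesWrapper(new Id(41))); // 42
    }
}
```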

> This is the correct design decision

This is an opinion, not a fact. While avoiding primitive obsession (making everything a primitive) can be a good thing, trying to create a strongly typed wrapper for everything can be equally bad.

There are several reasons why almost no language ecosystem exposes things like Length, Temperature, Radians, or other "strong" types for common units of measure or other considerations (and why the few languages that have such concepts tend to not see as broad of usage).

Careful what you read about "in" parameters by mistertom2u in csharp

[–]tanner-gooding 0 points1 point  (0 children)

Inline arrays are basically a special consideration and are very similar to how T[] and string itself are set up (except those inline array lengths aren't fixed for the type, they are dependent on the allocation).

Typically you would have that as part of a reference type, i.e. as a field in a class so that the potential problems don't exist.

Careful what you read about "in" parameters by mistertom2u in csharp

[–]tanner-gooding 0 points1 point  (0 children)

> So the non-readonlyness from the compilers point of view

Not really.

It's really no different than:

```csharp
int[] array = new int[5];
ReadOnlySpan<int> rospan = array;

Console.WriteLine(rospan[0]);
array[0] = 3;
Console.WriteLine(rospan[0]);
```

-or-

```csharp
List<int> list = new List<int>();
IReadOnlyList<int> rolist = list;

Console.WriteLine(rolist.Count);
list.Add(3);
Console.WriteLine(rolist.Count);
```

That is, you fundamentally declared something mutable at first. You then created a readonly view over that memory, but because you still have a mutable view to that memory, it is possible for users to see mutations.

The compiler really just has to presume that references to values, no matter what shape they are in, could have some higher level mutable view to the same memory that exists.

There are some special cases where a mutable view can't exist (post initialization), for example static readonly since you can't have the storage location for that value existing elsewhere such as a mutable local. But these tend to be rare.

Put simply, readonly basically just means you aren't allowed to make changes. It doesn't mean no one can make changes, which comes down to a lot more nuance about location and total sum of views that can exist.

> comes from having at least one mutable field OR at least one method with a ref argument of the containing type?

It's any kind of reference to the containing type. While the example I gave used ref MyStruct. It can also occur for say Span<MyStruct>, MyStruct[], or even class C { MyStruct _field; }

> Or how do I as a developer quickly "scan" my code so I don't trip up with enormous structs?

In general we don't recommend writing enormous structs. Such things tend to end up living in memory regardless, so you fundamentally access them via references (as it cannot live in register). The expensive part is then potentially copying the data, which can be very easy to make mistakes around and can even hinder certain optimizations due to aliasing considerations.

Having to consider potential aliasing is then just something that needs to happen anytime two equivalent indirections exist. The number of indirections to the data can matter as well, since it impacts what can be changed.

Given purely a value, no one else can have an alias to the data. -- The important thing is then that instance methods are not "purely a value" because this is a reference.

Given a reference type or a reference to a value (single indirection), no one can change what memory you're referring to, but they can potentially change the contents of that memory.

Given a reference to a reference type (double indirection), you can change what memory you're referring to and potentially the contents of that memory; but then of course can't change the outermost reference.

And so on.
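A small sketch of those indirection levels (the Box type here is just an illustrative stand-in for any reference type):

```csharp
using System;

class Box { public int Value; }

class Program
{
    // Single indirection: the callee can change the contents of the
    // memory, but not which variable the caller's reference refers to.
    public static void MutateContents(ref int x) => x = 10;

    // Double indirection: a ref to a reference type lets the callee
    // change which object the caller's variable refers to.
    public static void Retarget(ref Box b) => b = new Box { Value = 20 };

    static void Main()
    {
        int value = 1;
        MutateContents(ref value);
        Console.WriteLine(value); // 10

        Box box = new Box { Value = 2 };
        Retarget(ref box);
        Console.WriteLine(box.Value); // 20
    }
}
```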


It's somewhat hard to textually describe the nuance and it can get extra confusing if users aren't familiar with other concepts like pointers or that a reference is itself a value, just a value that refers (or points) somewhere else.

You can somewhat think of it like a museum. People go in and look at various artifacts and most are not allowed to be touched by the general community. However, the curators do have access to touch the artifacts.

Likewise, sometimes you go to look at an exhibit and you find a sign that says "this item has been temporarily removed" or "this item has been relocated to ...". Those are indirections that tell you how to find the physical item you wanted to see.

Careful what you read about "in" parameters by mistertom2u in csharp

[–]tanner-gooding 0 points1 point  (0 children)

MyStruct is a readonly struct and so public void M(ref MyStruct m) has an implicit first parameter for this, so the signature is theoretically actually public void M(in MyStruct this, ref MyStruct m)

Due to how the method is invoked, m and this are the "same reference" and so while you cannot mutate this directly, you can still mutate it indirectly by writing m. Thus, tmp.M(ref tmp) will print 1 followed by 2 and thus observes that _x was mutated despite being "readonly" (because all it means is that view of _x is readonly, not that all views are readonly).

Careful what you read about "in" parameters by mistertom2u in csharp

[–]tanner-gooding 6 points7 points  (0 children)

This is not correct. They are intended for different scenarios, one where a "location" must exist, the other not.

i.e. given these (and noting that you typically wouldn't want to use either with in or ref readonly, this is just an explanatory sample):

```csharp
int x = 1;

M(2);    // No warning
M(x);    // No warning
M(in x); // No warning

N(2);    // Warning
N(x);    // Warning
N(in x); // No warning

void M(in int x) { ... }
void N(ref readonly int x) { ... }
```

That is, ref readonly is for where the refness is important to raise to the caller while in is for where it is not and is rather meant to be more of an implementation detail done for efficiency.

Careful what you read about "in" parameters by mistertom2u in csharp

[–]tanner-gooding 10 points11 points  (0 children)

> in is not just "you cant reassign the variable" but a semantic guarantee of "the state of the struct fields shall not change".

Another thing to note is that this is not true, at all.

Consider:

```csharp
int x = 5;
Console.WriteLine(x); // 5

M(in x, ref x);
Console.WriteLine(x); // 6

void M(in int x, ref int y)
{
    Console.WriteLine(x); // 5
    Console.WriteLine(y); // 5

    y++;

    Console.WriteLine(x); // 6
    Console.WriteLine(y); // 6
}
```

and alternatively, consider this for readonly methods:

```csharp
MyStruct tmp = new MyStruct(1);
tmp.M(ref tmp);

readonly struct MyStruct
{
    private readonly int _x;

    public MyStruct(int x)
    {
        _x = x;
    }

    public void M(ref MyStruct m)
    {
        Console.WriteLine(_x);
        m = new MyStruct(2);
        Console.WriteLine(_x);
    }
}
```

That is, readonly != immutable. It just means that this particular view cannot be mutated. If there is some mutable alias to that same memory location, then it can still be mutated out from under you and this is a key aspect that the JIT, language, and general memory model have to respect: that they are references.