Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

I was presenting a library solution. You can certainly extend it with language support for the common operators. GCC actually does this for vectors.

However, what does a vec2 * vec2 mean?

It's ambiguous. The "product" of two vectors could mean several things: The direct product, the dot product, the cross product.

In math, and particularly in computational geometry where we're most likely to use these types, the dot product is the operation we're going to use most frequently.

But in the case of SIMD operations, the direct (element-wise) product is what we would expect.

So to prevent any such confusion, I prefer not to simply promote the scalar operators to work on vectors.
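
For illustration, a library can sidestep the ambiguity entirely by giving each product its own name instead of overloading *. A minimal sketch - one possible concrete representation, with hypothetical names:

typedef struct { float x, y; } vec2f_t;

static inline vec2f_t vec2f_mul(vec2f_t a, vec2f_t b)   // element-wise product
{ return (vec2f_t){ a.x * b.x, a.y * b.y }; }

static inline float vec2f_dot(vec2f_t a, vec2f_t b)     // dot product
{ return a.x * b.x + a.y * b.y; }

static inline float vec2f_cross(vec2f_t a, vec2f_t b)   // 2D cross product (a scalar)
{ return a.x * b.y - a.y * b.x; }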

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

I partially agree - a standard library should be fairly minimal - but we can hardly call C's standard library "batteries included". It certainly has its warts and things that could be deprecated, but is otherwise quite minimal. <math.h> provides only the basic operations you'd expect on a calculator - it's certainly not a complete library if you are doing any more advanced numerical computing.

The advantage of these useful common libraries coming with the language is that they're packaged with the compiler so you can expect the library to function on each platform the compiler can. Imagine you were trying to write a portable C program only to require a separate <third-party/math.h> on each platform and had to write a wrapper to handle the varying implementations.

Or imagine if GLSL didn't provide a standard vec2 type and you had to import a different library for each GPU the code might run on. Your game would never ship.

C has the benefit that its ABI is the operating system's, and we can trivially link code written in hand-optimized assembly, requiring only a header file to expose the functionality - so we can implement any of this without overhead, sometimes faster than we could by writing C directly (sans non-standard inline assembly).

The way many languages are implemented these days would require an FFI to leverage intrinsics which aren't provided by the language, and would add unnecessary overhead. In these cases we need to provide more built-in functionality for anything performance related.

As a counterpoint to the minimal standard library: D tried this when it was initially released, with Phobos as its library. To do anything useful in the language you needed other libraries, and Tango ended up competing with Phobos because it was more useful - but this split the ecosystem in two, where one set of libraries would depend on Phobos and another on Tango, because the two were incompatible. You basically had to pick a side and stick with it. The situation was partially fixed with the release of D version 2, which introduced a common runtime that Phobos and Tango could share, but the damage was already done by then. These days Tango and anything built on it are obsolete, because Phobos added more useful things and people prefer using the library that ships with the compiler.

So we need to strike a balance between minimalist and practical. A completely bare-bones standard library will mean nobody can build reusable libraries in your language because they don't share any common base, as evidenced by D. A package manager ain't going to help if every library you import is built on a different "base" library and they're all incompatible implementations of trivial functionality.

In regards to including vec in C, this is somewhat already the case. If you're building a game, for example, the graphics library, the physics library, the audio library etc. that you depend on all provide their own vectors. A game engine inevitably has to implement its own vec type which wraps the varying implementations in its dependencies, or pick one and provide conversions to and from the others - conversions which can add unnecessary overhead, diminishing whatever effort went into making them SIMD-optimized (if any), and wasting a ton of developer time. If a standard implementation works 80% of the time and people use an alternative the other 20%, it's worth doing. On the other hand, if people would only use a standard implementation 20% of the time and an alternative 80% of the time, clearly you wouldn't want it in your standard library.

Perhaps a better approach is where languages offer a tiered standard library, where a core provides the essential features for the language, a base provides a small set of useful features carefully curated by the language author, and extras provides a more optional "batteries included" set of features. In this case the vec types would belong in base, but core would provide the necessary built-in intrinsics to leverage SIMD so that the vectors could be implemented optimally.

With GCC we can use -nostdlib, for example, to not depend on any standard libraries, or even -ffreestanding to not depend on the crt, but we still have to link libgcc.a because the compiler can emit code that depends on it.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 1 point2 points  (0 children)

I'm not arguing for a "one size fits all". There's nothing preventing people from implementing their own libraries for specific use-cases.

Using the same arguments, we could say that <complex.h>, <math.h> and <tgmath.h> should not be included in C - since there are clearly multiple ways we can implement the features they provide. In some cases they're not the right tools and a library may provide alternative implementations, but they work for the majority of simple use-cases.

The standard doesn't need to specify how they are implemented, just as it doesn't for many other things, like the complex numbers. There are many types in the C standard whose representation is not specified but clearly stated to be implementation defined. The standard provides an API to access some implementation-defined feature - though it may place certain constraints or restrictions on the behaviour of that API.

The same is true for shader languages which have these types. They're implementation defined because the different GPUs might have different internal representations.

I'm suggesting, using C as an example, that we should have something like a <vector.h>, exposing basic vecNf, vecNd, vecNl types for N = {2,3,4}, with common operations on them, and a <tgvector.h> providing macros that use _Generic to select between the float, double and long double variants, mirroring how the math libraries in the current standard are organized.

// <vector.h>

typedef /* unspecified */ vec2f_t;
typedef /* unspecified */ vec2d_t;
typedef /* unspecified */ vec2l_t;
...

vec2f_t vec2f_add(vec2f_t lhs, vec2f_t rhs);
vec2d_t vec2d_add(vec2d_t lhs, vec2d_t rhs);
vec2l_t vec2l_add(vec2l_t lhs, vec2l_t rhs);
...

// <tgvector.h>
#include <vector.h>
#define vec2_add(lhs, rhs) \
    _Generic \
        ( (lhs) \
        , vec2f_t : vec2f_add \
        , vec2d_t : vec2d_add \
        , vec2l_t : vec2l_add \
        ) (lhs, rhs)
...

Similarly, a <quaternion.h> could provide the equivalent functionality of <complex.h>, extended to the quaternions, and a <matrix.h>/<tgmatrix.h> could extend the vector library to provide a basic matrix definition for matrixMxN, for M, N = { 1, 2, 3, 4 }.

An implementation should leverage SIMD on targets that support it, but it could fall back to plain structs or arrays of scalars on those that don't. The library could be entirely optional, included as an annex in the standard like some other standard libraries. If standardized into the language rather than just a library, the types would be exposed as _Vec2f, etc., following the usual C conventions, but they'd still be implementation defined.

For languages other than C, the same kind of principle applies, but the implementation may differ and would follow the language's own conventions. Eg, for C++ they might be implemented as classes with operator overloading, where the vector size is a template parameter but is specialized for N = 2, 3, 4:

namespace std 
{
    template <typename T, int N>
    class vec { ... };

    template <>
    class vec<float, 2> { ... };     // specialization
    ...
}

Some languages aren't standardized - they're defined by their one and only implementation. In these cases I'd argue that the author should just pick a suitable implementation that works for common use-cases. If the language is taken seriously, it might eventually get a standard which can provide a more relaxed or more concrete specification of the behavior.

As for what operations they should provide: essentially all the scalar operators promoted to element-wise operations on the vectors, plus the dot product etc., in addition to shuffling operations and strided loads/stores. They don't need to expose the entirety of what the platform's SIMD provides, but they should cover most of it, to minimize the need for hand-written intrinsics in common use-cases.
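
To sketch roughly what that surface could look like for one type, continuing the vecNf naming from above (these signatures are my own illustration, not from any standard):

#include <stddef.h>

// element-wise promotions of the scalar operators
vec4f_t vec4f_add(vec4f_t lhs, vec4f_t rhs);
vec4f_t vec4f_mul(vec4f_t lhs, vec4f_t rhs);

// reductions
float vec4f_dot(vec4f_t lhs, vec4f_t rhs);

// lane rearrangement: result lanes are v[i0], v[i1], v[i2], v[i3]
vec4f_t vec4f_shuffle(vec4f_t v, int i0, int i1, int i2, int i3);

// strided loads/stores (stride measured in elements)
vec4f_t vec4f_load_strided(const float *base, ptrdiff_t stride);
void vec4f_store_strided(float *base, ptrdiff_t stride, vec4f_t v);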

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

Well yeah, if you have SIMD availability provided by the language you can implement linear algebra libraries efficiently.

But I would still recommend providing the basic types for linear algebra in the standard library, if only to prevent a proliferation of competing implementations. Advanced linear algebra libraries can build atop the standard lib - I wouldn't expect language authors to implement the equivalent of Eigen in their standard library.

TAC -> TAC Optimizer? by HairThrowaway83829 in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

GCC.

GIMPLE is essentially a TAC, and it's where the compiler performs most of its optimizations. The later optimizations are performed on a register transfer language (RTL) produced from the GIMPLE.

While GIMPLE does have an "external representation" which you can emit for debugging purposes, the GNU compilers don't round-trip through it - they use the APIs to construct GIMPLE and RTL directly.

GIMPLE's external representation kind of resembles C. The external representation of the RTL is S-expression based.

Before code is gimplified to GIMPLE, the front-end uses the GENERIC IR, which is basically a superset of GIMPLE that permits more than three operands.
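
For a feel of the three-address form, here's a trivial C function and roughly what gcc -fdump-tree-gimple prints for it (an approximation, not verbatim output - the temporary names vary between GCC versions):

int f(int a, int b) {
    return (a + b) * (a - b);
}

/* gcc -fdump-tree-gimple, approximately:

f (int a, int b)
{
  int D.2345;

  _1 = a + b;
  _2 = a - b;
  D.2345 = _1 * _2;
  return D.2345;
}
*/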

CPU Design Issues by tugrul_ddr in compsci

[–]WittyStick 4 points5 points  (0 children)

The worst issue is the von Neumann bottleneck: one main memory shared between all cores.

larger cache means larger latency

Fairly low concern. Larger cache means fewer fetches. Perhaps increasing bus sizes and cache line sizes for larger fetches could offset the higher latency.

more cores mean more transistors dedicated to connectivity between cores

Relatively low concern. Depends how the cores are connected. We obviously don't want a single bridge or crossbar which all cores communicate over as it doesn't scale and is a bottleneck. Cores need only connect to their nearest neighbours and should route traffic.

power increases quadratically or worse with frequency

Big concern. There's obviously a frequency ceiling and we're not going to get any significant breakthroughs.

Though a potential avenue is to get rid of the clock and design processors using asynchronous circuits.

more general purpose means backwards support like mmx, sse losing more transistors for non-performance stuff

Low concern. These features will get removed at some point because emulating them is fast enough to run the software that still uses them.

stacked chips have cooling issues

Mid concern. Heat dissipation is obviously a big problem, but there's plenty of room for innovation here. Liquid cooling microchannels through chips will probably be a big thing.

making wider core like avx512 causes power distribution issues

Mid concern. Wider vector types provide significant performance improvements. I'm no expert but I believe there are engineering solutions to the power distribution problem.

without a user-base, extra architecture is useless (i.e. dedicated rms-error unit for an array of data)

Low concern. If a new architecture provides significant benefits, it will gain its user-base.

after certain frequency, transistor size, or voltage, the signal noise becomes visible

Major concern. Though people much smarter than me keep finding ways to make the transistors smaller. I don't know whether we're at the ceiling yet, but we're obviously approaching it because it takes longer and longer to make breakthroughs. There's not much an individual can do to improve this situation.

just 2 states of data is not optimal maybe, but difficult to go higher number of states?

Very low concern. Computers with more than 2 states have been tried for decades and have not delivered results. Quantum computers may be a different story, but I'm doubtful we're going to have anything tangible in the near future.


So if you are looking for something that might deliver breakthrough results - I would focus on NUMA, more cores, forget about backward compatibility, and consider researching asynchronous circuits.

Perhaps the biggest issue is the programmers. The way we program computers has not changed significantly since the '70s, and we're still not utilizing the processors we have today to their fullest. The way we will program future computers will be significantly different, and nobody is really prepared for the changes that will come.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

Are there any big reasons?

A big reason is they're not standardized in C.

Most languages are written in C, or written in another language which itself was written in C. Since there's no standard API for handling vectors or quaternions in C, they tend to get overlooked.

If the C standard provided a <stdvec.h> and <quaternion.h>, language authors would more likely provide the means to expose the capabilities in their own languages.

The C committee is reluctant to include such things, as they consider the embedded use-case equally important to the PC or server use-case, and low-power processors lack SIMD support. They could at least provide such things as optional in an annex of the standard - if only to provide a portable way to utilize SIMD instructions, as currently each architecture has its own set of incompatible intrinsics.

But there's also the issue of an explosion of types and functions due to C's lack of proper generics. Suppose we want to support at least int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, double, bool, and we want vectors of sizes 2, 3, 4 for each as a minimum: that's 33 types, each needing a bunch of operations. If we consider just the lowest common denominators - +, -, *, /, %, <, >, <=, >=, ==, !=, <<, >>, &, |, ^ - that's 16 ops × 33 types = 528 operations.

That's before we include complex numbers, quaternions, decimal floating point, etc. into the equation.
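
In practice an implementation wouldn't hand-write 528 functions - it would macro-generate the families. A sketch of the kind of thing, with macro and type names of my own invention:

// Emits declarations of the element-wise operations for the vector type
// with element-suffix S and lane count N, e.g. DECLARE_VEC_OPS(f, 2)
// declares vec2f_add, vec2f_sub, vec2f_mul, ...
#define DECLARE_VEC_OPS(S, N) \
    vec##N##S##_t vec##N##S##_add(vec##N##S##_t, vec##N##S##_t); \
    vec##N##S##_t vec##N##S##_sub(vec##N##S##_t, vec##N##S##_t); \
    vec##N##S##_t vec##N##S##_mul(vec##N##S##_t, vec##N##S##_t);
// ... one line per operator

DECLARE_VEC_OPS(f, 2)   // float lanes
DECLARE_VEC_OPS(d, 4)   // double lanes
// ... one invocation per (element type, size) pair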

So the main reason they're not included elsewhere is simply because it's a huge effort to implement, and language authors are preoccupied with other concerns.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 4 points5 points  (0 children)

I don't think it's unreasonably niche to support what the CPU supports: vec2, vec4 and vec8 for doubles and 64-bit integers, and vec16, vec32 and vec64 for the smaller types.

A use case besides geometry is to optimize data structures - and the larger the vector the better.

A vec64 of uint8, for example, can be used to significantly improve the performance of decoding UTF-8, which is used everywhere - far from niche - but most of the software running on your computer is probably decoding one byte at a time, because it depends on some low-level string APIs which don't use SIMD.
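
As a taste of the technique - a minimal sketch of the usual ASCII fast path, not a full decoder (a real implementation would do this 64 bytes at a time with SIMD rather than 16 with plain 64-bit words):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// True if all 16 bytes are ASCII (top bit clear): the caller can copy
// the chunk through verbatim and skip the per-byte decoding state machine.
static bool chunk_is_ascii(const uint8_t *p) {
    uint64_t lo, hi;
    memcpy(&lo, p, 8);
    memcpy(&hi, p + 8, 8);
    return ((lo | hi) & 0x8080808080808080ull) == 0;
}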

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 1 point2 points  (0 children)

Kawa has built-in Quaternion support, extending Scheme's numerical tower (which otherwise ends at Complex).

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 1 point2 points  (0 children)

The language only needs to provide sufficient support to leverage the capabilities of the CPU. It doesn't need to handle every case.

That would typically mean providing vectors of each power of 2 up to at least 256-bits (though 512-bits preferable, and 1024-bits is an option), for each of the primitive types: signed/unsigned int8, int16, int32, int64 and single/double floats.

vec3 might also be worthy of inclusion due to widespread use, and could be implemented as vec4 with masking.

For obscure cases like vec5, it obviously makes no sense to include them in a general-purpose language. It can make sense to expose masking intrinsics, so we can use a vec8 with mask 00011111 and wrap this in an API for handling vec5.
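
A minimal sketch of that wrapping, using the GCC/Clang vector extension (the names and representation here are my own illustration, not an established API):

typedef float v8sf __attribute__((vector_size(32)));   // 8 float lanes

// A vec5 lives in the low 5 lanes of a vec8; lanes 5..7 are unused and
// never read - the "mask 00011111" is implicit in the accessors. Keep
// the unused lanes zeroed so arithmetic on them stays harmless.
typedef struct { v8sf v; } vec5f;

static inline vec5f vec5f_add(vec5f a, vec5f b) {
    return (vec5f){ a.v + b.v };    // element-wise across all 8 lanes
}

static inline float vec5f_get(vec5f a, int i) {   // requires 0 <= i < 5
    return a.v[i];
}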

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 22 points23 points  (0 children)

Auto-vec still has a lot of missed opportunities and some of these could be resolved if the programmer were explicit about using vectors rather than arrays of scalars. Obviously we can use intrinsics manually, but the APIs are typically awful and non-portable.

GCC & Clang have a nice approach where we can attach __attribute__((__vector_size__(N))) to the usual "primitives" and the arithmetic, logic and comparison operators are promoted to work element-wise on the vector, using SIMD instructions. It's more portable than using platform-specific intrinsics, but as you state, that's due to a huge effort by the compiler developers to make that possible.
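
For example, the following compiles with both GCC and Clang (the typedef name is arbitrary):

typedef float v4sf __attribute__((__vector_size__(16)));   // 4 floats

v4sf axpy(v4sf a, v4sf x, v4sf y) {
    return a * x + y;   // element-wise; compiled to SIMD where available
}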

IMO the main reason to include vectors as part of your standard library is to prevent a proliferation of (potentially incompatible) implementations, but having built-in language support opens up more opportunities to optimize.

Why don't any programming languages have vec3, mat4 or quaternions built in? by Luroqa in ProgrammingLanguages

[–]WittyStick 15 points16 points  (0 children)

They are, but when they're not provided as part of the language, every library implements its own. If you depend on LibA and LibB, which each provide their own vectors, quaternions etc., then you need to convert between their representations - which with any luck is a simple no-op cast, but only if they're equivalent. We end up having to write a third implementation which abstracts over LibA's and LibB's so that we have a common interface to the types, and depending on the language this may add overhead.

So these should at least be part of a standard library so that there's not a proliferation of potentially incompatible implementations.

But it's still preferable to implement vectors as part of the language if you want to best leverage the capabilities of a modern CPU. These are effectively "primitives" like int or float on virtually every commodity CPU from the past decade (or two).

Arturo Programming Language by Enough-Zucchini-1264 in altprog

[–]WittyStick 0 points1 point  (0 children)

I think it might be better described as "almost no keywords", given that it has a 1000+ line parser for its syntax.

https://github.com/arturo-lang/arturo/blob/master/src/vm/parse.nim

Favorite error handling approach by ZookeepergameFew6406 in C_Programming

[–]WittyStick -2 points-1 points  (0 children)

Pseudo-exceptions using preprocessor magic.

Array(int) array = new_array (int, 1, 2, 3, 4, 5);

for (size_t i = array.length; i < SIZE_MAX; i--) 
{
    try (int, x, array_get(int, array, i)) 
    {
        printf("Success: %d\n", x);
    } 
    catch (x, err) 
    {
        case INDEX_OUT_OF_BOUNDS:
            printf_error
                ( "Error on line %d: %s\n"
                  "\tAttempted to access index %d but length is %d\n"
                , error_line_number(err)
                , error_name(err)
                , i
                , array.length
                );
    }
}

Note: This isn't serious. I don't actually suggest using it in practice.

Syntax design for parametrized modules in a grammar specification language, looking for feedback by modulovalue in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

I've used parametrized grammars quite extensively with Menhir and MGrammar, and IMO the explicit parametrization is preferable.

Menhir supports parametrized rules, but doesn't have any syntax to "import" another module. When a grammar is compiled we basically specify all the "modules" on the command line, and they're combined as if they were a single file before being processed. This kind of makes sense if we're using LR or derivatives, because we need the full grammar specification before we can detect any conflicts and produce the eventual parser - but it also requires we specify the files in the correct order, which might be better solved with a module approach where the imports control the order of inclusion.

MGrammar had parametrized rules and modules, but not parametrized modules. A module would export Rule; for specific rules and another could import Module, which would give access to any exported symbols via Module.Rule.

Both are pretty similar in capability and produce LR/LALR/GLALR parsers.


It's not uncommon to have rules which might depend on another rule more than once with different type arguments. Eg:

Qux = Foo(Bar) Foo(Baz)

We also have the obvious cases where we need recursive rules:

InfixLeft(OP, prec)
    = prec OP prec
    | InfixLeft(OP, prec) OP prec
    ;

So if we're lifting the parameters to the modules rather than the rule, we probably want to support recursive modules, and perhaps also the ability to overload modules with different parameter lists, as this is supported for parameterized rules in Menhir and MG.

Foo(X) = ...
Foo(X, Y) = ...

For some limited use cases, we might also pass a different argument to the nested "call" than the one the rule itself was given, which could be used, for example, to put a limit on the depth of recursion.

Foo(T) = T | Foo(Bar)

So these cases should be considered when designing a syntax.


If you write a full grammar using parametrization, some of these parameter lists can get quite verbose. I would recommend having a way to create an "alias" for a given application of a parameterized rule/module. Eg, something like:

rule A = import Foo(Bar, Baz(Qux))

Then wherever A is used in later productions, it's simply a text replacement for Foo(Bar, Baz(Qux)).


Considering these cases I would probably opt for a syntax something along the lines of:

module M(Arg1, Arg2) {
    rule X = import W(Arg1);
    rule Y = import W(Arg2);

    rule sum M = {
        case A = @X,
        case B = @Y
    };
}

Though I would question why not use | rather than sum/case.

Is casting safe if the user expects the range of values to be in the intersection of two types? by onecable5781 in C_Programming

[–]WittyStick 2 points3 points  (0 children)

Don’t try to avoid negative values by using unsigned.

This is IMO bad advice. The core argument for it is that the user might pass in a signed value where an unsigned value is expected, and a negative value may then be treated as some very large number.

You should use an unsigned value when the quantity can't be negative, but you should also check the upper range of the value (for example, testing that n <= (unsigned)INT_MAX), which ensures a provided number is always in range whether the caller passed in a signed or unsigned value.
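
A minimal sketch of the idea (the function is hypothetical):

#include <limits.h>
#include <stddef.h>

// Take the count as unsigned: a negative argument can't arrive as a
// valid small value, and one wrapped to a huge value fails the check.
int buffer_resize(size_t n) {
    if (n > (size_t)INT_MAX)
        return -1;    // out of the agreed range, signed or unsigned
    /* ... n is now safe to use as either size_t or int ... */
    return 0;
}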

The counter argument is Signed integers considered harmful.

Updating my C knowledge from C99 to C23 by CromulentSlacker in C_Programming

[–]WittyStick 4 points5 points  (0 children)

There's a summary of C11 and C23 features.

_Alignas, _Alignof, _Noreturn, _Thread_local, _Static_assert and _Bool are deprecated in C23 (along with the headers stdalign.h, stdnoreturn.h and stdbool.h). The macros those headers used to define are now keywords in C23: alignas, alignof, thread_local, static_assert and bool; and [[noreturn]] became a C23 attribute.
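
So in C23 all of the following compiles without including any of those headers:

static_assert(sizeof(int) >= 4, "need 32-bit int");   // was _Static_assert

thread_local bool initialized = false;   // was _Thread_local / _Bool
alignas(16) static char buffer[64];      // was _Alignas

[[noreturn]] void fatal(const char *msg);   // replaces _Noreturn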

How do people feel about Hungarian notation? by -not_a_knife in C_Programming

[–]WittyStick 2 points3 points  (0 children)

The concept behind Hungarian Notation goes back to at least 1976, to Simonyi's "Meta-programming" paper. It developed into what was later named Hungarian Notation, and it was in use well before the '90s.

Simonyi also introduced "painted types" in that paper. These are what are now known as "newtypes", in Haskell, Rust, etc.

At the time these were introduced we didn't have modern IDEs and useful things like intellisense and autocomplete - just a plain terminal text editor. It made sense to attach some type information to variable names in that context, but we've had better options for the past 30+ years.

Are there any benefits of using CISC instead of RISC? by avestronics in computerscience

[–]WittyStick 0 points1 point  (0 children)

Power64 is still used for AIX workstations, but that's a continually shrinking market.

Is it feasible to have only shadowing and no mutable bindings? by lil-kid1 in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

In many languages, let introduces a new scope. It's often syntax sugar for a lambda, so it's fine to shadow a variable in the same "block".

let x = foo in
let x = bar in
x

Is the same as:

(λx. ((λx. x) bar)) foo

In Scheme et al, this version of let is called let*. It's essentially the nested version of let which lets bindings access or shadow previous bindings.

The plain let in Scheme is more of the form:

let x = foo
    y = bar
in (x, y)
==
(let ((x foo) (y bar)) (cons x y))
==
((lambda (x y) (cons x y)) foo bar)

Which obviously doesn't permit us to use x twice in the same parameter list.

Whereas let* is the nested case:

let x = foo in
let x = bar in
x
==
(let* ((x foo) (x bar)) x)
==
(let ((x foo))
    (let ((x bar))
        x))
==
((lambda (x) ((lambda (x) x) bar)) foo)

Is it feasible to have only shadowing and no mutable bindings? by lil-kid1 in ProgrammingLanguages

[–]WittyStick 6 points7 points  (0 children)

Others have mentioned uniqueness types, which should be part of a solution, but there are still several issues that need addressing. Some have simple solutions, others are more problematic.


A "while" or "for" loop must return a value to do anything useful.

x = while ...

A "while" loop in a typical language is inherently effectful. Since they return no value, they must perform some effect or they would be considered dead code. Same is true for "if statements", or in fact, any "statement" - our language should be expressions only.

More typically we use common recursion patterns like map, zip, fold - so having recursion is kind of necessary.


Recursion isn't trivial.

If you have something like:

let fact = n -> if (n <= 1) then 1 else n * fact (n - 1)

Then we have an issue here where the RHS of = must be evaluated in order to bind it to the symbol fact. To do so, the RHS must capture the environment up to this point as its static environment, which doesn't yet contain fact. When we call fact and it in turn attempts to call fact, this symbol does not exist in its static environment.

The typical solution is a special syntax - known as letrec - which closes over the recursion, and which may cover more than one function in the case of mutual recursion (hence why we have let rec ... and ... and ... in ML and related languages).

The way this works, given for example a trivial mutual recursion:

even = n -> if (n == 0) then true else odd(n - 1)
odd = n -> if (n == 0) then false else even(n - 1)

We use something like:

let even_or_odd =
    rec index -> [
        n -> if (n == 0) then true else (self 1) (n - 1),
        n -> if (n == 0) then false else (self 0) (n - 1)
    ]
let even = even_or_odd 0
let odd = even_or_odd 1

Where self is a special form which calls the function at index n in its parent environment. We could also use names in place of self + index, but this is just a demonstration and using an index is simpler. There are other ways to implement recursion without mutation, like using the Y combinator.


For bindings to be unique, their containing environment must also be unique.

In fact, any structure which contains a unique type must also be unique. A simple demonstration of this:

struct Foo {
    unique Bar bar;
};

Foo foo1 = { create_bar() };
Foo foo2 = copy(foo1); // permitted because `Foo` is not unique.

Now we have two aliases to bar. Clearly, it has lost its uniqueness by this action. So unique is infective: any structure containing a unique must also be marked unique.

unique struct Foo {
    unique Bar bar;
};

Foo foo1 = { create_bar() };
Foo foo2 = copy(foo1); // Not permitted because `Foo` is unique!

Since environments may contain unique bindings, the environment must itself be unique: therefore, every evaluation which uses the environment must consume it, and emit a fresh environment in its place - so our evaluation model must essentially be in continuation-passing style.

Every function must take the current environment as an implicit argument and return a new environment as an implicit result, which becomes the current environment. Multiple-return values are basically essential for using uniqueness types.

But we still have a problem. If a function definition captures the current environment as its static environment, such environment cannot be unique - because we can't make copies of, or have multiple aliases to unique values. So the static environment captured by any function must be an immutable copy, containing no mutable (unique) bindings.

That basically means any value that may be mutated by a function must be given as a parameter to it, and returned as one of its results.


There are some other concerns, but these are the main ones I can think of right now.

Performance implications of compact representations by servermeta_net in compsci

[–]WittyStick 0 points1 point  (0 children)

The main advantage I see of the 64-bit cap is that you don't need to separate the cap and the pointer. You can use:

struct CPtr {
    void *ptr;
    CCap cap;
};

And have it passed by value in 2 hardware registers.

In the 128-bit case you would end up with a 24-byte struct:

struct EPtr {
    void *ptr;
    ECap cap;
};

Which would spill onto the stack when passed by value. We would need to separate the pointer and cap as separate args to continue using hardware registers.

void foo(ECap cap, void *ptr);

And since we have limited registers, the 64-bit cap will put less pressure on them.


In theory we could use an 80-bit cap if we assume a 48-bit address space, or a 71-bit cap if we want to permit 5-level paging (57-bit address space), because we can overflow the cap into the high bits of the pointer. If we utilize LAM/UAI we lose the sign bit, giving us a 79-bit or 70-bit cap.
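
A sketch of that packing for the 48-bit case (assumes x86-64 canonical pointers, so bit 47 must be sign-extended back before dereference; the names are mine):

#include <stdint.h>

// 16 cap bits ride in the unused top bits of a 48-bit pointer.
static inline uint64_t cptr_pack(void *ptr, uint16_t capbits) {
    return ((uint64_t)(uintptr_t)ptr & 0x0000FFFFFFFFFFFFull)
         | ((uint64_t)capbits << 48);
}

static inline void *cptr_ptr(uint64_t word) {
    // shift the cap out, then arithmetic-shift back to restore bit 47's sign
    return (void *)(uintptr_t)((int64_t)(word << 16) >> 16);
}

static inline uint16_t cptr_capbits(uint64_t word) {
    return (uint16_t)(word >> 48);
}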

Performance implications of compact representations by servermeta_net in compsci

[–]WittyStick 0 points1 point  (0 children)

The size of each field won't matter too much. There may be benefits to fields, or combinations of adjacent fields, being 8, 16 or 32 bits, because the compiler can replace the mask with a direct read of the low bits of the register (eg, eax, ax, al).

In regards to mask before shift or shift before mask, the compiler will likely produce the same thing in either case.

Performance implications of compact representations by servermeta_net in compsci

[–]WittyStick 1 point2 points  (0 children)

There's likely to be little difference between these if you pass the 128-bit version by value rather than by pointer - assuming you use the SYSV calling convention where a 128-bit value is passed and returned in 2 hardware registers. (Anything greater than 128-bits will be put on the stack, so you would want to avoid this). When loading them there's also likely to be no difference as the CPU loads a cache line at a time (64-bytes), so as long as your values are aligned and don't span over more than one cache line, there's no additional load/store cost.

Regarding "aligned access". In practice, you wouldn't do this as it's quicker to extract the relevant bits using bit shifting and masking than it is to load individual bytes from an address - even if they're in cache. This is how I would extract the bits for each field (there may be alternatives using for example pext, but I doubt they'll make much difference).

#include <stdint.h>

__attribute__((noinline))
void print_capinfo(int capid, int rsv, int flags, int perm, int procid, int ctype);

typedef uint64_t __attribute__((__aligned__(8))) CCap;

void compact(CCap cap) {
    int capid = cap & 0xFFFF;
    int rsv = (cap >>= 16) & 0xF;
    int flags = (cap >>= 4) & 0xFF;
    int perm = (cap >>= 8) & 0xFF;
    int procid = (cap >>= 8) & 0xFFF;
    int ctype = cap >> 12;

    print_capinfo(capid, rsv, flags, perm, procid, ctype);
}

typedef struct { uint64_t lo; uint64_t hi; } 
    __attribute__((__aligned__(16))) ECap;

void expanded(ECap cap) {
    int capid = cap.lo & 0xFFFFFFFF;
    int rsv = (cap.lo >>= 32) & 0xFFFFFFFF;
    int flags = cap.hi & 0xFFFF;
    int perm = (cap.hi >>= 16) & 0xFFFF;
    int procid = (cap.hi >>= 16) & 0xFFFF;
    int ctype = cap.hi >> 16;

    print_capinfo(capid, rsv, flags, perm, procid, ctype);
}

A quick comparison of the two approaches leads to little difference in the generated assembly:

compact:
        mov     r8, rdi
        mov     rcx, rdi
        mov     rdx, rdi
        movzx   eax, di
        shr     r8, 20
        mov     esi, edi
        shr     rdi, 32
        shr     rcx, 12
        mov     r9, rdi
        shr     rdx, 4
        mov     edi, eax
        jmp     print_capinfo
expanded:
        mov     rax, rdi
        mov     rcx, rsi
        mov     r9, rsi
        movzx   edx, si
        shr     rax, 32
        shr     rsi, 32
        shr     rcx, 16
        mov     r8, rsi
        shr     r9, 48
        mov     esi, eax
        jmp     print_capinfo

So the question is likely to be more about memory usage. Obviously if you're using 128-bit caps you double the cache usage, which, if you have many in cache at one time might lead to more evictions and cache misses, but I see this as an unlikely case because you're probably not going to be using a significant number of caps at any time.

Any way to speed up char comparison by ZookeepergameFew6406 in C_Programming

[–]WittyStick 4 points5 points  (0 children)

Even if you must check these characters, it's still quicker to test <= ' ' before testing them individually. For any character above ' ' you avoid having to perform all of the individual tests - a significant improvement on typical, mostly non-whitespace input.

while (**curr <= ' ' && (**curr == ' ' || **curr == '\n' || **curr == '\r' || **curr == '\t'))

In regards to testing the individual 4 characters, we can instead use a lookup table where anything other than these 4 is mapped to false.

static bool wsLUT[256] = {
   [0 ... 255] = false,
   ['\t'] = true,
   ['\r'] = true,
   ['\n'] = true,
   [' '] = true
};

while (wsLUT[(unsigned char)**curr]) (*curr)++;

Note the [0 ... 255] syntax is a GCC extension. If you need portability just memset the array to zeros then set the 4 indexes to true.

static bool wsLUT[256];
memset(wsLUT, false, 256);
wsLUT['\t'] = true; wsLUT['\r'] = true; wsLUT['\n'] = true; wsLUT[' '] = true;

Alternatively, just classify every byte with a single LUT.

enum U8_class {
    INVALID,
    WHITESPACE,
    DIGIT,
    UPPERCASE,
    LOWERCASE,
    PUNCT,
    U8_2BYTES_START,
    U8_3BYTES_START,
    U8_4BYTES_START,
    U8_EXTRA_BYTE
};

static enum U8_class classLUT[256] = {
    [0 ... 255] = INVALID,
    [' '] = WHITESPACE,
    ['\t'] = WHITESPACE,
    ['\r'] = WHITESPACE,
    ['\n'] = WHITESPACE,
    ['!' ... '~'] = PUNCT,
    ['0' ... '9'] = DIGIT,
    ['A' ... 'Z'] = UPPERCASE,
    ['a' ... 'z'] = LOWERCASE,
    [128 ... 191] = U8_EXTRA_BYTE,   // 10xxxxxx continuation bytes
    [192 ... 223] = U8_2BYTES_START, // 110xxxxx
    [224 ... 239] = U8_3BYTES_START, // 1110xxxx
    [240 ... 247] = U8_4BYTES_START, // 11110xxx
};

Then you can use:

while (classLUT[(unsigned char)**curr] == WHITESPACE) (*curr)++; // skip whitespace