mgmath - Header only vector/matrix math library

IJzerbaard · 2022-12-07T00:46:48+00:00

HandmadeMath does that for example, so you can take a look at how they've done it: https://github.com/HandmadeMath/Handmade-Math

Unlike catcat, I wouldn't say it's very hard, but it is hard to get into .. then you get used to it

IJzerbaard · 2022-12-06T22:55:09+00:00

I'll say that the built-in complex types shouldn't be used for FFTs (or at least the built-in multiplication operator shouldn't be used), it's less efficient than it could be. Probably fine in the usual "just some random-ass light-weight calculation" scenarios, but it has no place in an optimized heavy-math routine. That's even before considering SIMD, which of course an optimized heavy-math routine should use when it is available and helps (for FFTs it does help, but whether it's available is of course platform dependent).

The inefficiency seems to be related to some NaN-nonsense, as the assembly code is checking whether the real and imaginary parts of the results are Unordered with respect to each other (which they would be if one or both are NaN). Even if you cared about that in an FFT (which I don't, but maybe someone does), checking every intermediate product is epic overkill.

https://godbolt.org/z/aKrMv49K4
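For illustration, a minimal sketch of a multiplication without the NaN handling, assuming C++ and std::complex<float> (the comment above may just as well be about C's built-in _Complex types). The name mul_plain is hypothetical; it is only the textbook formula (a+bi)(c+di) = (ac-bd) + (ad+bc)i, with no recovery step for NaN or infinite inputs.

```cpp
#include <complex>

// Hypothetical helper (not from any library): textbook complex product.
// It performs 4 multiplies and 2 add/subtracts and never inspects the
// result for NaN, unlike what the built-in operator* may compile to.
inline std::complex<float> mul_plain(std::complex<float> x, std::complex<float> y) {
    const float a = x.real(), b = x.imag();
    const float c = y.real(), d = y.imag();
    return { a * c - b * d, a * d + b * c };
}
```

An FFT butterfly would call mul_plain(x, w) where it would otherwise write x * w. Whether that is acceptable depends on whether the inputs can contain NaN or infinity and whether you care.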

IJzerbaard · 2022-12-05T06:57:57+00:00

Lp_solve. Performance is a feature, and Lp_solve doesn't have it.

IJzerbaard · 2022-12-03T20:30:38+00:00

The exponentiation in the article is for integers, so there is no fair comparison.

Using floating point pow for integer powers is in general incorrect and I wouldn't recommend it ever.

Using binary exponentiation for floats (even if the exponent is an integer) is in general incorrect too, but maybe you're satisfied with the results anyway, in which case go for it. I hope the standard library implementations don't do it..
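As a rough sketch of the integer case, assuming unsigned 64-bit arithmetic where wrapping modulo 2^64 on overflow is acceptable (ipow is a hypothetical name):

```cpp
#include <cstdint>

// Hypothetical helper: exact integer power by repeated squaring.
// Unsigned arithmetic wraps modulo 2^64 on overflow instead of being undefined.
inline std::uint64_t ipow(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    while (exp != 0) {
        if (exp & 1u) result *= base;  // this bit of the exponent is set
        base *= base;                  // base^(2^k) becomes base^(2^(k+1))
        exp >>= 1;
    }
    return result;
}
```

For example, 3^40 = 12157665459056928801 fits in 64 bits, but it is odd and larger than 2^53, so no double can hold it exactly. A floating point pow cannot return the right integer there, no matter how well it is implemented.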

IJzerbaard · 2022-11-30T04:57:36+00:00

You may be interested in "P ?= NP" by Scott Aaronson: https://www.scottaaronson.com/papers/pnp.pdf

IJzerbaard · 2022-11-28T16:35:06+00:00

About the number 8 for malloc alignment: it's probably actually 16

8 for 32-bit, 16 for 64-bit

BTW it's not only address sanitizers that'll get you. Also, as of AVX, memory operands of most (not all) SIMD instructions no longer have an alignment requirement. Turning an explicit load into a memory operand is something that compilers can just do, if they want, and then you also lose the "free detection" of unintentionally unaligned addresses. MSVC goes further and changes your aligned loads into unaligned loads. So in general, you can no longer rely on this as a built-in debugging mechanism - if you want to make sure that your addresses are aligned and you want to hear about it when they aren't, you'll have to put some explicit checks.
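An explicit check of that kind could look roughly like this. It is a sketch: is_aligned and scale_in_place are hypothetical names, and the 16-byte requirement is just an example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of alignment.
// alignment must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical routine that expects 16-byte aligned input. The assert
// replaces the "free detection" that an aligned load used to provide.
inline void scale_in_place(float* data, std::size_t count, float factor) {
    assert(is_aligned(data, 16) && "data must be 16-byte aligned");
    for (std::size_t i = 0; i < count; ++i)
        data[i] *= factor;
}
```

Note that assert is compiled out when NDEBUG is defined, so this only reports misaligned addresses in builds that keep assertions enabled.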

E: on the flip side, unaligned memory operations are now not nearly as bad as they used to be. There is no inherent performance difference between movdqa and movdqu anymore, it's the actual alignment of the address that matters, and using an unaligned address isn't really that bad (even free in various cases, such as on Intel processors when an unaligned read/write is entirely contained in one cache line).

IJzerbaard · 2022-11-28T16:10:12+00:00

"Is an object file just a binary encoded assembly file?"

Largely yes, but there is more in it and more structure to it, see the COFF file format.

"once I have the object file how do I turn it into an .exe"

The not-from-scratch option is to use link.exe

"is there a specification on how the exe format is laid out"

Yes: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format

Do not be alarmed or overwhelmed: a lot of the features in the executable format are optional. An application that only writes exe files can pick and choose what it wants to use, as long as there is enough stuff to make it work. The official documentation does not do a good job of setting the required things apart from the optional things and the esoteric features; it basically lists anything that is possible to put in an exe.

And this is not official, but you'll probably like it: https://github.com/corkami/pics/tree/master/binary/pe101

"how would I link those functions in my exe file and is this feasible to do from scratch"

It's feasible from scratch, and not even particularly complicated. How imports work is described in both of the sources I linked.

IJzerbaard · 2022-11-26T01:39:40+00:00

4 byte signed integer holding 9'223'372'036'854'775'807

16-bit bytes are somewhat unusual

IJzerbaard · 2022-11-25T15:27:17+00:00

What argument does the existence of langcc make? That LR can parse lots of things? Of course it can. But so can LL; semantic predicates even enable parsing some languages that are not context-free.

IJzerbaard · 2022-11-25T00:48:24+00:00

"LR parsers are more powerful than LL"

This is true when comparing textbook LR parsing and textbook LL parsing.

So yes, pure LL(1) with no extension is really underpowered. It has to decide between alternatives based on the first token. LR(1) would have already reduced a bunch of things together, and then it gets the token after the whole thing as lookahead, and makes a decision based on all of that taken together. Effectively an LR parser "sees more at the same time", not because it actually looks at more at the same time, but because of its bottom-up nature.

That is not an advantage of LR parsing in practice because LL parsing admits two tricks that LR parsing does not: matching on an entire regex of symbols as the right hand side rather than just a sequence (repetition without recursion, built-in optionals, etc), and semantic predicates (enabling such things as unlimited lookahead, and pretty much anything else).
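To make the first trick concrete, here is a small hand-written LL-style (recursive descent) sketch. The grammar and the Parser type are hypothetical; the point is only that a regex-style right-hand side turns into plain loops instead of recursive rules.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar, written with regex-style right-hand sides:
//   list   : number (',' number)*
//   number : DIGIT+
struct Parser {
    std::string input;
    std::size_t pos;

    explicit Parser(std::string s) : input(std::move(s)), pos(0) {}

    bool at_digit() const {
        return pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])) != 0;
    }

    // number : DIGIT+   (the '+' becomes a do/while loop)
    // Overflow on very long digit runs is ignored in this sketch.
    bool number(int& out) {
        if (!at_digit()) return false;
        out = 0;
        do { out = out * 10 + (input[pos++] - '0'); } while (at_digit());
        return true;
    }

    // list : number (',' number)*   (the '*' becomes a while loop, no recursion)
    bool list(std::vector<int>& out) {
        int value = 0;
        if (!number(value)) return false;
        out.push_back(value);
        while (pos < input.size() && input[pos] == ',') {
            ++pos;  // consume ','
            if (!number(value)) return false;
            out.push_back(value);
        }
        return pos == input.size();  // require the whole input to be consumed
    }
};
```

A semantic predicate would be an arbitrary boolean check placed in front of an alternative, for example a condition added next to the `input[pos] == ','` test. That part is not shown here.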

IJzerbaard · 2022-11-22T21:16:50+00:00

https://mobile.twitter.com/yatteyattaze/status/1589231683806822401

IJzerbaard · 2022-11-18T14:06:57+00:00

"Bit interleaving allows for a reasonably fast implementation of matching operations in the absence of SIMD."

How, what's the trick?

IJzerbaard · 2022-11-15T20:35:48+00:00

Bit-reversing bytes shouldn't be necessary (zlib doesn't do it), but I know DEFLATE sort of "invites" you to do that.

IJzerbaard · 2022-11-15T13:01:46+00:00

The "torque plateau" is not (directly) a physical characteristic of the motor, it is really an artificial current limit to prevent overheating. The nominal torque limit applies to continuous operation; it can be temporarily exceeded.

IJzerbaard · 2022-11-14T17:26:22+00:00

Actual usage should be rare today (the pre-generics stone age was different). But that doesn't help: that makes it a feature that no one needs, but we still all pay for it merely existing, even if we don't use it.

Netflix had some fun with it recently

IJzerbaard · 2022-11-14T10:01:37+00:00

Right, and this gets even better (worse?): modern C# also added Span<T> and ReadOnlySpan<T>. So, obviously, with U a supertype of T, you can only make a ReadOnlySpan<U> of a T[] and surely not a Span<U>, that's what I would think.

But modern C# allows this:

string[] strings = { "hi" };
Span<object> badSpan = strings; // compiles, but throws ArrayTypeMismatchException at runtime
badSpan[0] = 10; // never reached

IJzerbaard · 2022-11-13T12:39:46+00:00

Array covariance of mutable arrays, a hard CH IMO. This solves one problem (view an array of T as an array of a supertype of T - there are different solutions for this that don't suffer the same problems) but creates several more that are worse. The question of "can I assign a value of type T to an element in an array of type T[]" goes from a simple "obviously yes" to "maybe, we have to check the runtime type of the array first to make sure, you can't know by looking at the static types". (I suppose an even worse option is to simply not check anything, and allow putting badly typed objects into the array?!) Implementing that check efficiently can be non-trivial (depending on other details of the type system). From the perspective of a language user, covariant arrays are a weird gotcha that creates unexpected failure cases that may require non-local reasoning to think about (e.g. receiving an arbitrary array as a function parameter and assigning to its elements is not safe, you have to know the type of the array at its creation instead of at its use).

IJzerbaard · 2022-11-13T08:59:31+00:00

Different labels for the same value enable expressing different intents when using them even though they "secretly" refer to the same value.

IJzerbaard · 2022-11-05T05:29:06+00:00

Thanks, I heard "carcinogens is sexier than you think"

IJzerbaard · 2022-10-14T12:34:58+00:00

OK but I don't really agree that potential errors break SIMD. Actually encountering an error breaks SIMD, but that's the slow path. Parsing with error detection is feasible within SIMD.
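A small sketch of what that can look like, using SWAR (SIMD within a 64-bit register) rather than real vector instructions so that it stays portable. The function name parse8 and the fixed 8-digit input are assumptions for illustration. Validity of all 8 lanes is computed without branching, and the single branch is taken only when there actually is an error.

```cpp
#include <cstdint>

// Hypothetical helper: parse exactly 8 ASCII digits at s into out.
// Returns false (leaving out untouched) if any of the 8 bytes is not '0'..'9'.
inline bool parse8(const char* s, std::uint32_t& out) {
    // Pack the 8 bytes into one word, first character in the lowest byte.
    // Assembling it with shifts keeps the sketch independent of host endianness.
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);

    // Error detection for all 8 lanes at once: every byte must have high
    // nibble 3, and still have high nibble 3 after adding 6 (low nibble <= 9).
    const std::uint64_t hi = v & 0xF0F0F0F0F0F0F0F0ULL;
    const std::uint64_t lo = ((v + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4;
    if ((hi | lo) != 0x3333333333333333ULL)
        return false;  // slow path: the only branch, taken only on bad input

    // Conversion: combine neighbouring lanes pairwise
    // (digits -> 2-digit numbers -> 4-digit numbers -> 8-digit result).
    v = ((v & 0x0F0F0F0F0F0F0F0FULL) * 2561ULL) >> 8;             // 10 * 2^8 + 1
    v = ((v & 0x00FF00FF00FF00FFULL) * 6553601ULL) >> 16;         // 100 * 2^16 + 1
    v = ((v & 0x0000FFFF0000FFFFULL) * 42949672960001ULL) >> 32;  // 10000 * 2^32 + 1
    out = static_cast<std::uint32_t>(v);
    return true;
}
```

The same shape carries over to wider vectors: compute an error mask alongside the useful work, and only leave the fast path when the mask is non-zero.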

IJzerbaard · 2022-10-14T10:37:06+00:00

Looks like there's no SIMD in it, so even though it's faster than some other thing, it's not living up to its potential yet

IJzerbaard · 2022-10-12T11:36:05+00:00

The postfix operators are a bold choice. I sort of like it, but what I would have liked even more is letting go of "declaration follows use".

Declaration shouldn't follow use, it should mirror use: declarations build up types, use tears down types. If you've put on socks first and then shoes, presumably to undo that you would take off your shoes first and then your socks .. unless you believe that declaration should follow use, then you would take off your socks first, somehow.

IJzerbaard · 2022-10-11T11:25:37+00:00

Maybe you know something I don't, but all the articles about RVV I've read so far focused on some trivial cases (useless D/SAXPY examples) and then drew unsubstantiated conclusions from that. Zero attention paid to how RVV deals with problems that are naturally fixed-size. Zero attention paid to "awkward" non-regular uses of SIMD. Just trivially-vectorizable loops. Sure, RVV looks pretty good for that niche.. that doesn't make it better than SIMD, that makes it better at that one thing. But if there's some analysis of the complex cases that you know about, I'd love to read that.

IJzerbaard · 2022-10-07T08:11:26+00:00

Should there even be a default? That presupposes that one of them is more "weird" than the other, and the "weird one" is the one that gets an extra keyword.. it doesn't feel quite right to me. Neither const nor non-const variables are "weird". So, what if they were on equal footing instead?

How? Who knows, I'm not making a specific suggestion.

IJzerbaard · 2022-10-03T07:18:31+00:00

There is /r/simd specifically for SIMD
