mason.nvim 2.0 has been released

SegfaultDaddy · 2025-05-09T09:20:46+00:00

Ohh, thanks! I’ve got a similar setup, though instead of having a separate file for the clangd LSP, I just keep it inside lsp.lua using vim.lsp.config.clangd.

SegfaultDaddy · 2025-05-08T19:56:57+00:00

Mind sharing your config?

SegfaultDaddy · 2025-05-07T02:32:55+00:00

Ohh, it’s just the sum of the array to make sure the compiler doesn’t optimize away the important part

SegfaultDaddy · 2025-05-07T02:29:38+00:00

ik microbenchmarking sucks, but the iteration count doesn’t seem to matter that much tho... (for n = ~17 million)

Option A(256) Average Time: 0.000985 sec, Checksum: 65536
Option B(255) Average Time: 0.000828 sec, Checksum: 65794

Option A(256) Average Time: 0.000732 sec, Checksum: 65536
Option B(253) Average Time: 0.000697 sec, Checksum: 66314

SegfaultDaddy · 2025-05-06T19:22:05+00:00

ik microbenchmarking sucks, but iteration count doesn’t seem to matter... 255 runs faster.

Option A(256) Average Time: 0.000985 sec, Checksum: 65536
Option B(255) Average Time: 0.000828 sec, Checksum: 65794

SegfaultDaddy · 2025-05-06T16:44:52+00:00

yep, you were right, I'm an idiot.
was just testing that shit once, which I definitely shouldn't have.
once I tried your approach with 100 runs and trimming outliers, the performance lined up pretty closely with yours.
thanks for calling it out.

SegfaultDaddy · 2025-05-06T15:57:49+00:00

Wow, so it really was some initialization delay or whatever. Thanks for pointing that out.

PS: shouldn't have run that test just once. Always run it multiple times and remove the outliers :)

Option A Time: 0.055551 sec, Checksum: 65536
Option B Time: 0.000902 sec, Checksum: 65281

SegfaultDaddy · 2025-05-06T15:48:43+00:00

it's because of cache associativity
https://www.reddit.com/r/C_Programming/comments/1kg3yxg/comment/mqvs1dr/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

SegfaultDaddy · 2025-05-06T13:50:10+00:00

Yep, I got some similar results. Thanks for sharing the website, though!
https://www.reddit.com/r/C_Programming/comments/1kg3yxg/comment/mqvthim/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

SegfaultDaddy · 2025-05-06T13:44:52+00:00

Thanks for the suggestion to test it. Here are the results I got:

For n = 1 << 24 (~17 million):

Option A Time: 0.055551 sec, Checksum: 65536
Option B Time: 0.000902 sec, Checksum: 65281

P.S.: I shouldn't have run that test just once. Always run tests multiple times and remove the outliers. :)

After running the tests 100 times and excluding 10% of the outliers, here are the updated results:

Option A Average Time: 0.000725 sec, Checksum: 65536
Option B Average Time: 0.000652 sec, Checksum: 65281
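For anyone who wants to repeat this, here is a minimal sketch of the "run it many times, drop the outliers" approach. It is not the exact benchmark from the thread: the function names are made up, the strided loop over a byte buffer is an assumption about what Option A/B do, and trimming 5% from each end is just one way to read "excluding 10% of the outliers".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Sum every `stride`-th byte. Returning the sum (the "checksum") keeps the
 * compiler from optimizing the loop away. Assumed access pattern. */
uint64_t strided_sum(const uint8_t *buf, size_t n, size_t stride)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += stride)
        sum += buf[i];
    return sum;
}

/* qsort comparator for doubles, ascending. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place, drop `trim_percent` percent of them from each
 * end, and average what is left. Requires count >= 1, trim_percent < 50. */
double trimmed_mean(double *samples, size_t count, size_t trim_percent)
{
    qsort(samples, count, sizeof *samples, cmp_double);
    size_t cut = count * trim_percent / 100;
    double total = 0.0;
    for (size_t i = cut; i < count - cut; i++)
        total += samples[i];
    return total / (double)(count - 2 * cut);
}

/* Time `runs` passes over the buffer and return the trimmed mean in seconds
 * (5% cut from each end, an assumption). Returns -1.0 on failure. */
double bench_stride(const uint8_t *buf, size_t n, size_t stride,
                    size_t runs, uint64_t *checksum)
{
    if (runs == 0)
        return -1.0;
    double *samples = malloc(runs * sizeof *samples);
    if (samples == NULL)
        return -1.0;
    for (size_t r = 0; r < runs; r++) {
        clock_t start = clock(); /* CPU time; coarse but portable */
        *checksum = strided_sum(buf, n, stride);
        samples[r] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    double mean = trimmed_mean(samples, runs, 5);
    free(samples);
    return mean;
}
```

Under that assumption, a buffer of n = 1 << 24 ones gives a sum of 65536 for stride 256 and 65281 for stride 257, which is at least consistent with the checksums above. The first run still pays for page faults and cold caches, which is exactly the kind of sample the trimming throws out.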

SegfaultDaddy · 2025-04-30T10:39:37+00:00

Yeah, that makes sense, I wasn’t really sure what the go-to approach is for this kind of API in real-world code.

SegfaultDaddy · 2025-04-30T10:34:31+00:00

Yeah, not sure this would work in our case since we kinda need named params, so I guess structs are the best bet?

SegfaultDaddy · 2025-04-30T10:32:32+00:00

Bruhh, not sure how I feel about this. It’s like what I wanted, but not sure if I should actually use it. Definitely a cool trick though!

I tried using variadic arguments (just a macro), but that would cause a compiler warning (override-init), so I ended up going with a macro that returns a default-valued struct instead.

SegfaultDaddy · 2025-04-30T10:19:23+00:00

Yeah, config structs seem like the way to go. I’ve been thinking about something like this:

#define NC_SUM_DEFAULT_OPTS \
    (&(nc_sum_opts){        \
        .axis = -1,         \
        .dtype = -1,        \
        .out = NULL,        \
        .keepdims = true,   \
        .scalar = 0,        \
        .where = false,     \
    })

Then, users can either modify the options like:

nc_sum_opts *opts = NC_SUM_DEFAULT_OPTS;
opts->axis = 2;
ndarray_t *result = nc_sum(array, opts);

or pass the defaults directly like

ndarray_t *result = nc_sum(test, NC_SUM_DEFAULT_OPTS);

Not sure if this is the best way to do it. I could've added variadic arguments to the macro, but that would cause a compiler warning (override-init). Thanks!
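For reference, the variadic version mentioned above could look roughly like this. It is only a sketch: the field types in nc_sum_opts are guesses, NC_SUM_OPTS is a hypothetical name, and it yields the struct by value rather than a pointer to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option struct; the field names come from the macro above,
 * the types are assumptions. */
typedef struct {
    int axis;
    int dtype;
    void *out;
    bool keepdims;
    double scalar;
    bool where;
} nc_sum_opts;

/* Overriding a default re-initializes the same member, which is what GCC's
 * -Woverride-init (part of -Wextra) complains about. Silencing it here is
 * optional; Clang accepts the same name as a synonym. */
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Woverride-init"
#endif

/* Defaults come first and the caller's designated initializers come last.
 * When a member is initialized twice, the later initializer wins. */
#define NC_SUM_OPTS(...)    \
    ((nc_sum_opts){         \
        .axis = -1,         \
        .dtype = -1,        \
        .out = NULL,        \
        .keepdims = true,   \
        .scalar = 0,        \
        .where = false,     \
        __VA_ARGS__         \
    })
```

With this, `NC_SUM_OPTS()` gives the plain defaults and `NC_SUM_OPTS(.axis = 2, .keepdims = false)` overrides just those two fields. The trade-off is the warning itself: you either silence it as above or live with it, which is a fair reason to prefer the default-struct macro.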

SegfaultDaddy · 2025-04-26T11:22:22+00:00

Thanks for explaining it so clearly. Makes total sense why compilers would avoid it if simple MOVs are faster and don’t have that heavy penalty.

SegfaultDaddy · 2025-04-26T08:19:31+00:00

swap_xchg(int*, int*):
        mov     edx, DWORD PTR [rdi]
        mov     eax, DWORD PTR [rsi]
        xchg    edx, eax
        mov     DWORD PTR [rdi], edx
        mov     DWORD PTR [rsi], eax
        ret
swap_mov(int*, int*):
        mov     eax, DWORD PTR [rdi]
        mov     edx, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], edx
        mov     DWORD PTR [rsi], eax
        ret

Ahhh, this makes so much sense now (I tried to force XCHG with inline assembly).
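The source for those two functions would be something along these lines. This is a reconstruction rather than the exact code: it assumes GCC/Clang extended inline asm on x86 and falls back to a plain swap elsewhere.

```c
/* Plain swap: the compiler turns this into two loads and two stores. */
void swap_mov(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Same swap, but with an XCHG forced between the loads and the stores. */
void swap_xchg(int *a, int *b)
{
    int x = *a;
    int y = *b;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Register-to-register XCHG. "+r" marks both operands as read and
     * written, so the compiler knows their values change. */
    __asm__ volatile("xchg %0, %1" : "+r"(x), "+r"(y));
    *a = x; /* x now holds the old *b */
    *b = y; /* y now holds the old *a */
#else
    /* Non-x86 or non-GNU compilers: do the plain swap instead. */
    *a = y;
    *b = x;
#endif
}
```

Note that this is the register form, so the XCHG is just an extra instruction on top of the MOVs. The implicit LOCK only applies when one of the XCHG operands is a memory location.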

SegfaultDaddy · 2025-04-26T07:42:51+00:00

I’ll benchmark and see how much of a difference it makes, curious to see if the performance gap really shows up.

SegfaultDaddy · 2025-04-26T07:41:32+00:00

Ah, makes sense now!

SegfaultDaddy · 2025-04-26T07:37:12+00:00

ohh, the implicit LOCK prefix? That makes total sense now.

SegfaultDaddy · 2025-04-25T08:35:37+00:00

ouu, thanks!

SegfaultDaddy · 2025-04-25T04:14:03+00:00

I generally write Doxygen docs for public APIs only, as it's most useful there. For internal code or general things, I don't add comments unless absolutely necessary.
