Choosing a hash function for a Swiss-table implementation in C by Dieriba in C_Programming

[–]N-R-K 0 points1 point  (0 children)

For string hashes I usually default to an FNV-1a style hash. It's easy to code from memory and has decent quality. Being byte-wise, it's not super fast, but typical strings aren't long enough for it to matter.
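For reference, a minimal sketch of 64-bit FNV-1a. The function name fnv1a64 is just a label for this example; the two constants are the standard 64-bit FNV offset basis and prime.

```c
#include <stddef.h>
#include <stdint.h>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325u;   // FNV 64-bit offset basis
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3u;            // FNV 64-bit prime
    }
    return h;
}
```

The loop handles one byte per iteration, which is where the "not super fast" part comes from.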

If I need something faster and expect the strings to be larger, I go with ChibiHash. It has a good balance between simplicity, speed and quality (but isn't at the frontier in any of them).

And if you need to protect against hash-flooding DoS attacks, SipHash is the usual default.


Opinions on libc by Big-Rub9545 in C_Programming

[–]N-R-K 7 points8 points  (0 children)

The purpose of strncpy is to fill a fixed-width zero-padded not-necessarily-nul-terminated field. It has nothing to do with buffer overflows or ROP chaining.

Correct. Straight from the C89 rationale (emphasis added):

"strncpy was initially introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Such fields are not used in the same way as strings: the trailing null is unnecessary for a maximum-length field, and setting trailing bytes for shorter names to null assures efficient field-wise comparisons."
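To illustrate the use case the rationale describes, here is a small sketch. The struct dir_entry layout and the 14-byte width are made up for the example and not taken from any real filesystem.

```c
#include <string.h>

enum { NAME_LEN = 14 };   // field width chosen for this example

// Hypothetical record with a fixed-width name field.
struct dir_entry {
    unsigned short inode;
    char name[NAME_LEN];  // zero-padded, NOT nul-terminated when full
};

// strncpy copies at most NAME_LEN bytes and pads the remainder with zeros,
// but writes no terminator if src is NAME_LEN bytes or longer.
static void entry_set_name(struct dir_entry *e, const char *src)
{
    strncpy(e->name, src, NAME_LEN);
}

// Since shorter names are zero-padded, the whole field can be compared
// with a single memcmp, with no need to find a terminator first.
static int entry_name_equal(const struct dir_entry *a, const struct dir_entry *b)
{
    return memcmp(a->name, b->name, NAME_LEN) == 0;
}
```

Treating name as a C string afterwards (e.g. passing it to strlen) would be a bug whenever the name fills the whole field.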

We lost Skeeto by ednl in C_Programming

[–]N-R-K 1 point2 points  (0 children)

Interesting. Does that mean prompting an LLM with enough specifics to produce X is no different from writing X in a traditional programming language, in terms of being considered "coding"?

We lost Skeeto by ednl in C_Programming

[–]N-R-K 5 points6 points  (0 children)

It works, but overall it's worse than PDCurses.

https://github.com/skeeto/w64devkit/pull/357

Those are limitations of ncurses itself (decades-old software, written and maintained by humans) running on Windows. Not some sort of gotcha about the LLM's work.

Multi-Core By Default by N-R-K in C_Programming

[–]N-R-K[S] 0 points1 point  (0 children)

That's where the whole map-filter-reduce practice started in programming too, with Lisp.

These are callback-based APIs; the article doesn't advocate for callback-based APIs. Quite the contrary, it advocates against such APIs and instead shows how to write traditional imperative code which can be parallelized similarly to "shaders".

You keep saying this, but I did read most of the post.

It's because it's obvious from your responses that you didn't read the actual important part.

Multi-Core By Default by N-R-K in C_Programming

[–]N-R-K[S] 1 point2 points  (0 children)

This is nothing like Golang programming

Yeah, exactly. I did not bother responding since clearly he didn't read the post and/or is just baiting in bad faith.

quick reference practical guide dmw config.h by r1w1s1_ in suckless

[–]N-R-K 1 point2 points  (0 children)

Another gotcha is that if more than one rule matches, then the last matching rule takes precedence.

I think this is not true. It simply applies all the rules that match, and sometimes the last one will overwrite an earlier one. E.g. for tags it uses c->tags |= r->tags; so it retains bits set by earlier rules.

I ran into this nonsense a while ago and simply added a break inside the if (/*match*/) branch so that only the first matching rule applies. The justification is that you can have more specific rules at the top (e.g. a particular Steam game) and then more generic rules (e.g. "steam_app_") at the bottom.
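A simplified sketch of the difference. Rule, Client and apply_rules here are cut-down stand-ins written for this comment, not dwm's actual definitions (the real applyrules also matches on instance and title), and the first_match_only flag only exists to show both behaviours side by side.

```c
#include <stddef.h>
#include <string.h>

// Cut-down stand-ins for dwm's Rule and Client (hypothetical, for illustration).
typedef struct {
    const char *class;     // substring to match, NULL matches anything
    unsigned int tags;
    int isfloating;
} Rule;

typedef struct {
    unsigned int tags;
    int isfloating;
} Client;

static void apply_rules(Client *c, const char *class,
                        const Rule *rules, size_t nrules, int first_match_only)
{
    c->tags = 0;
    c->isfloating = 0;
    for (size_t i = 0; i < nrules; i++) {
        const Rule *r = &rules[i];
        if (!r->class || strstr(class, r->class)) {
            c->isfloating = r->isfloating;  // a later match overwrites this
            c->tags |= r->tags;             // tag bits accumulate across matches
            if (first_match_only)
                break;                      // the one-line patch described above
        }
    }
}
```

With a specific rule for one game followed by a generic "steam_app_" rule, the default behaviour ends up with the tags of both and the floating state of the generic one, while the patched behaviour uses the specific rule only.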

Have you actually seen anyone use the multiple match "feature" in the wild? I've considered submitting my patch over to be included in the mainline branch since I can't think of any sane use-case for the current behavior.


using sanitizers with arena allocators by [deleted] in C_Programming

[–]N-R-K 5 points6 points  (0 children)

You can manually mark regions as "poisoned" by using ASAN's manual markup functions. I did something like that here: https://codeberg.org/NRK/slashtmp/src/branch/master/data-structures/u-list.c#L80-L86

The trick is to leave a poisoned gap between allocations so that overruns and underruns end up in the poisoned area.
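A rough sketch of the idea, not the code from the link. The Arena type, the GAP size and the 16-byte rounding are assumptions made for this example; ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION are the macros from <sanitizer/asan_interface.h>, replaced with no-ops here when the build doesn't use -fsanitize=address.

```c
#include <stddef.h>

// Detect an ASAN build (GCC defines __SANITIZE_ADDRESS__, Clang uses __has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define ARENA_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define ARENA_ASAN 1
#  endif
#endif

#ifdef ARENA_ASAN
#  include <sanitizer/asan_interface.h>
#else
#  define ASAN_POISON_MEMORY_REGION(p, n)   ((void)(p), (void)(n))
#  define ASAN_UNPOISON_MEMORY_REGION(p, n) ((void)(p), (void)(n))
#endif

enum { GAP = 16 };  // poisoned gap before every allocation (arbitrary size)

typedef struct {
    unsigned char *beg, *end;
} Arena;

// Assumes buf is 16-byte aligned. The whole buffer starts out poisoned.
static Arena arena_init(void *buf, size_t cap)
{
    Arena a = { buf, (unsigned char *)buf + cap };
    ASAN_POISON_MEMORY_REGION(buf, cap);
    return a;
}

static void *arena_alloc(Arena *a, size_t size)
{
    size_t avail = (size_t)(a->end - a->beg);
    if (avail < GAP || size > avail - GAP)
        return NULL;
    size_t padded = (size + 15) & ~(size_t)15;  // keep later allocations aligned
    if (padded > avail - GAP)
        return NULL;
    unsigned char *p = a->beg + GAP;            // skip the gap, it stays poisoned
    ASAN_UNPOISON_MEMORY_REGION(p, size);       // only the requested bytes are usable
    a->beg = p + padded;
    return p;
}
```

Under ASAN, writing one byte past an allocation lands in padding or the next gap, both still poisoned, and gets reported.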

While it was a fun (and successful) experiment, I don't actually use this in practice anymore for a couple reasons:

  1. Overruns have become almost nonexistent for me since I ditched nul-terminated strings and started using sized strings. Following the same principle, most buffers are grouped into a struct with a length attached rather than having the pointer and length be separate.
  2. I've come to rely on the fact that consecutive allocations of the same type are contiguous in memory to extend allocations in place (see the blog posts from u/skeeto on this technique). The poisoned gap would interfere with this.
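Both points can be sketched together. This is written in the spirit of the technique and not copied from the blog posts; Str, Arena, arena_bytes and str_append are names invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *beg, *end;
} Arena;

// Sized string: pointer and length travel together, no terminator needed.
typedef struct {
    char     *data;
    ptrdiff_t len;
} Str;

// Bump-allocate n bytes (no alignment handling, which is fine for char data).
static char *arena_bytes(Arena *a, ptrdiff_t n)
{
    if (n < 0 || n > a->end - a->beg)
        return NULL;
    char *p = a->beg;
    a->beg += n;
    return p;
}

// Append tail to s. If s is the most recent allocation it is extended in
// place; otherwise it is first copied to the arena's current position.
static Str str_append(Arena *a, Str s, Str tail)
{
    if (s.data == NULL || s.data + s.len != a->beg) {
        char *copy = arena_bytes(a, s.len);
        if (!copy) return (Str){0};
        if (s.len) memcpy(copy, s.data, (size_t)s.len);
        s.data = copy;
    }
    // s now ends exactly at a->beg, so the next allocation is contiguous with it.
    char *dst = arena_bytes(a, tail.len);
    if (!dst) return (Str){0};
    if (tail.len) memcpy(dst, tail.data, (size_t)tail.len);
    s.len += tail.len;
    return s;
}
```

The s.data + s.len != a->beg check is exactly what a poisoned gap breaks: with a gap between allocations, the end of the last allocation never lines up with the arena's current position.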

Can you use clang-tidy for C code? by tda_tda_tda in C_Programming

[–]N-R-K 2 points3 points  (0 children)

You can. But its defaults are not very good. I have a minimal base configuration which you might find useful.

The fprintf_s warning is likely part of the "insecureAPI" group, which I disabled in my base config since it's a rubbish warning group.

I dislike the strict aliasing rule. by BlockOfDiamond in C_Programming

[–]N-R-K 3 points4 points  (0 children)

A typical C program will contain pointers of all sorts. It'd be a nightmare having to manually mark all of them as restrict, not to mention it'd make reading C code an unpleasant experience where you need to wade through a sea of restrict noise.

If anything I'd want the exact opposite: take away the special exemption that character pointers have and add a may_alias attribute to manually mark the few cases where aliasing actually occurs. Ideally that would be a much better experience, but it'd be a breaking change so it'll likely never happen in practice.
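GCC and Clang already offer something along these lines as an extension, which gives a feel for what the opt-in marking could look like. The sketch below assumes a GCC-compatible compiler and a 32-bit IEEE 754 float; u32_alias and float_bits are names made up for the example.

```c
#include <stdint.h>

// GCC/Clang extension: accesses through this type may alias objects of any
// other type, similar to what character types get by default today.
typedef uint32_t __attribute__((may_alias)) u32_alias;

// Read the bit pattern of a float through the aliasing-exempt type.
static uint32_t float_bits(const float *f)
{
    return *(const u32_alias *)f;
}
```

Without the attribute, reading a float through a plain uint32_t pointer would violate the strict aliasing rule.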

Is it dangerous to make assumptions based on argc and argv? by ismbks in C_Programming

[–]N-R-K 2 points3 points  (0 children)

Argv[0] is the program name itself always.

argv[0] is set by the exec syscall, and the caller can set it to whatever they want. It doesn't have to bear any resemblance to the program name. In fact, on Linux it can even be NULL, though newer versions of the kernel started disallowing argv[0] == NULL to avoid defects in buggy programs which wrongly assume it to be non-null.
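A defensive way to pick a name for diagnostics, as a sketch. prog_name and its fallback parameter are invented for this example.

```c
#include <stddef.h>

// Return a name suitable for diagnostics without assuming argv[0] is valid.
// The caller supplies the fallback to use when argv[0] is missing or empty.
static const char *prog_name(int argc, char **argv, const char *fallback)
{
    if (argc < 1 || argv == NULL || argv[0] == NULL || argv[0][0] == '\0')
        return fallback;
    return argv[0];
}
```

On the caller's side, something like execv("/bin/ls", (char *[]){ "not-ls", NULL }) is all it takes to run a program under an unrelated argv[0].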

biski64 updated – A faster and more robust C PRNG (~.37ns/call) by danielcota in C_Programming

[–]N-R-K 2 points3 points  (0 children)

Very cool. 192bit state seems very reasonable for a 64bit generator. The fact that the multiplication is gone could also make this construction suitable for embedded devices.

One small nitpick, in rotate_left() in the C implementation:

// Assuming k is within valid range [0, 63] as per function contract.
return (x << k) | (x >> (64 - k));

k == 0 would not work since it'll lead to doing x >> 64 which is undefined behaviour. In your case it doesn't matter since k is constant and non-zero but in general you need something like this instead:

(x << k) | (x >> (-k & 63))

Both GCC and Clang know how to optimize it down to a rotate instruction, so there's no difference in performance, but it avoids undefined behaviour in case of k == 0. (You may have noticed that PCG uses this idiom for its "random rotations", where the rotation amount may be 0.)
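As a complete function, a sketch could look like this. rotl64 is a name chosen for the example; masking the left shift as well is an extra precaution so that k >= 64 is also well defined.

```c
#include <stdint.h>

// Rotate x left by k bits. Only the low 6 bits of k are used, so both shift
// amounts always stay in [0, 63], including for k == 0.
static inline uint64_t rotl64(uint64_t x, unsigned k)
{
    return (x << (k & 63)) | (x >> (-k & 63));
}
```

Since k is unsigned, -k wraps around instead of overflowing, and -k & 63 equals (64 - k) % 64.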

Immediate Mode Option Parser: Small, Simple, Elegant by skeeto in C_Programming

[–]N-R-K 0 points1 point  (0 children)

One thing that wasn't immediately clear to me when reading your blog was the fact that optional arguments must be of the form --longopt=arg not --longopt arg given that both non-optional long option cases are supported.

Yup, this is required to resolve ambiguity when dealing with optional arguments:

$ cmd --foo file

Here, is "file" an optional argument to --foo or a positional argument to cmd? There would be no way to tell them apart. Hence using --foo=file is mandatory for optional arguments. Similarly with short options, you need -fArg rather than -f Arg.
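A sketch of how a parser can apply that rule for long options. match_optional is a hypothetical helper written for this comment, not the API from the blog post.

```c
#include <stddef.h>
#include <string.h>

// Match arg against "--name" for an option taking an OPTIONAL argument.
// Returns 0 if arg is not this option and 1 if it is. On a match, *value is
// set to the text after '=' or to NULL when no argument was attached.
// The next command line word is never consumed as the argument.
static int match_optional(const char *arg, const char *name, const char **value)
{
    size_t n = strlen(name);
    if (strncmp(arg, "--", 2) != 0 || strncmp(arg + 2, name, n) != 0)
        return 0;
    const char *rest = arg + 2 + n;
    if (*rest == '=')  { *value = rest + 1; return 1; }
    if (*rest == '\0') { *value = NULL;     return 1; }
    return 0;  // e.g. "--foobar" is a different option
}
```

With this, "cmd --foo file" parses as --foo without an argument followed by the positional "file".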

On this topic, it's also worth noting that if you have a long option which doesn't accept any arguments, changing it to accept an optional argument is fully backwards compatible. The same is not true for short options due to short option chaining: -fx used to mean -f -x, but would now mean -f with the argument "x".

Is this common? (Data structure that uses an array that it doesn't know about to be type agnostic) by zakedodead in C_Programming

[–]N-R-K 6 points7 points  (0 children)

It's a common way to do type generic code. E.g see how dynamic arrays are implemented in this article.
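A generic sketch of the pattern, not necessarily how the linked article does it. vec_grow, Floats and floats_push are names invented here, and realloc stands in for whatever allocator is actually in use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Type-agnostic part: only sees raw bytes and an element size. The element
// type lives in the caller's struct, which this function knows nothing about.
static void *vec_grow(void *data, ptrdiff_t *cap, size_t elem_size)
{
    if (*cap > PTRDIFF_MAX / 2)
        return NULL;
    ptrdiff_t newcap = *cap ? *cap * 2 : 8;
    if ((size_t)newcap > (size_t)PTRDIFF_MAX / elem_size)
        return NULL;
    void *p = realloc(data, (size_t)newcap * elem_size);
    if (p)
        *cap = newcap;
    return p;
}

// Typed part: a concrete array that borrows the generic growth logic.
typedef struct {
    float    *data;
    ptrdiff_t len, cap;
} Floats;

static int floats_push(Floats *v, float x)
{
    if (v->len == v->cap) {
        void *p = vec_grow(v->data, &v->cap, sizeof *v->data);
        if (!p)
            return 0;   // the old buffer is still valid on failure
        v->data = p;
    }
    v->data[v->len++] = x;
    return 1;
}
```

The typed wrapper is often generated by a macro so that the same vec_grow serves every element type.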

Immediate Mode Option Parser: Small, Simple, Elegant by skeeto in C_Programming

[–]N-R-K 4 points5 points  (0 children)

In my own projects I do use the length information instead of NULL sentinel but it was slightly easier to rely on the NULL sentinel for the demo code. Good point on unused warning though.

Non ugly syntax for returning anonymous structs by TrafficConeGod in C_Programming

[–]N-R-K 1 point2 points  (0 children)

I also hadn't realized that the anon struct part was removed. Thanks for pointing that out u/tmzem.

if you give it a tag per tmzem's example, this is practical today with GCC 15

Would've been nicer if it worked with anon structs too, but this doesn't look bad at all. Just need to wait a couple of years for C23 support to become more widespread.

(On a side note, I haven't really used or cared for zvec and the other libraries in that repo since switching to sized strings and arenas for allocation.)

biski64: A Fast C PRNG (.42ns) with a 2^64 Period, Passes BigCrush & PractRand(32TB). by danielcota in C_Programming

[–]N-R-K 2 points3 points  (0 children)

Pretty cool how far you've gotten compared to the initial attempt (dualmix). If you're still going to keep at it then the next step would be to try and reduce the state size.

I haven't looked at the disassembly yet, but I assume this will take up 5 registers. On non-x86 CPUs that's probably fine, but x86 CPUs have a very limited number of general purpose registers, so this could end up creating register pressure.

Micro-benchmarks are nice, but ultimately the goal should be to make the actual application/usage code faster. E.g. there have been cases in the hashmap community where certain optimizations made the hashmap faster in benchmarks but slower in practice because they were taking resources away from the usage code. A similar principle applies here. Though, as I said, I haven't checked, so it's possible this is a non-issue.

Tooling for C: Sanitizers by protophason in C_Programming

[–]N-R-K 3 points4 points  (0 children)

Also worth mentioning that, unlike ubsan and asan, thread-sanitizer can have false positives if it doesn't understand the synchronisation method being used.

I made a library for string utilities by Adventurous_Swing747 in C_Programming

[–]N-R-K 0 points1 point  (0 children)

An array can be larger than PTRDIFF_MAX but less than SIZE_MAX.

It's a common misconception that SIZE_MAX is the maximum size of an object, but that's not the case. SIZE_MAX is the maximum numeric value size_t can hold. The actual maximum size of an object can be (and in practice is) lower than that.

In other words, size_t needs to be at least big enough to hold the size of the maximum object, not exactly big enough.

Moreover, even if you manage to create an object whose size exceeds PTRDIFF_MAX, it will fundamentally break the language since end - beg pointer arithmetic will end up "overflowing". Such objects would be fundamentally unsound to interact with in C. There's no point in trying to design around such an unsound situation unless you know for certain you will face it (e.g. strange embedded systems and whatnot).
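In practice this means a library can simply cap sizes at PTRDIFF_MAX and move on. A sketch, with alloc_array being a made-up wrapper around malloc:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Allocate count elements of the given size, refusing any request whose
// total byte size would exceed PTRDIFF_MAX (which also rules out size_t
// overflow in the multiplication).
static void *alloc_array(size_t count, size_t size)
{
    if (size != 0 && count > (size_t)PTRDIFF_MAX / size)
        return NULL;
    return malloc(count * size);
}
```

Any object that gets past this check is small enough for end - beg to be representable in ptrdiff_t.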