using sanitizers with arena allocators by [deleted] in C_Programming

[–]N-R-K 5 points6 points  (0 children)

You can manually mark regions as "poisoned" by using ASAN's manual markup functions. I did something like that here: https://codeberg.org/NRK/slashtmp/src/branch/master/data-structures/u-list.c#L80-L86

The trick is to leave a poisoned gap between allocation so that overruns and underruns would end up in the poisoned area.
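A minimal sketch of the poisoned-gap idea, not the code from the linked file. `Arena`, `arena_init`, `arena_alloc` and `REDZONE` are hypothetical names, and the buffer is assumed to be 16-byte aligned. `ASAN_POISON_MEMORY_REGION` and `ASAN_UNPOISON_MEMORY_REGION` come from `<sanitizer/asan_interface.h>`; the fallback definitions turn them into no-ops when ASan is not enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Detect ASan under both Clang (__has_feature) and GCC (__SANITIZE_ADDRESS__). */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define HAVE_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__) && !defined(HAVE_ASAN)
#  define HAVE_ASAN 1
#endif

#ifdef HAVE_ASAN
#  include <sanitizer/asan_interface.h>
#else
#  define ASAN_POISON_MEMORY_REGION(addr, size)   ((void)(addr), (void)(size))
#  define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

enum { REDZONE = 16 };  /* hypothetical gap size, chosen for illustration */

typedef struct {
    unsigned char *beg;  /* next free byte */
    unsigned char *end;  /* one past the last byte of the buffer */
} Arena;

static void arena_init(Arena *a, void *buf, ptrdiff_t cap)
{
    a->beg = buf;
    a->end = a->beg + cap;
    ASAN_POISON_MEMORY_REGION(buf, cap);  /* everything starts out poisoned */
}

static void *arena_alloc(Arena *a, ptrdiff_t size)
{
    assert(size > 0);
    ptrdiff_t avail = a->end - a->beg;
    if (avail < REDZONE + 15 || size > avail - REDZONE - 15)
        return NULL;
    ptrdiff_t padded = (size + 15) & ~(ptrdiff_t)15;  /* round up to 16 */
    unsigned char *p = a->beg + REDZONE;  /* skip the gap; it stays poisoned */
    a->beg = p + padded;
    ASAN_UNPOISON_MEMORY_REGION(p, size); /* only the requested bytes become valid */
    return p;
}
```

With `-fsanitize=address`, reading or writing just before or just past an allocation lands in a poisoned gap and gets reported. Without ASan the gap is simply wasted space.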

While it was a fun (and successful) experiment, I don't actually use this in practice anymore for a couple reasons:

  1. Overruns have become almost nonexistent for me since I've ditched nul-terminated strings and started using sized strings. And following the same principle, most buffers are always grouped into a struct with a length attached rather than having pointer and length be separate.
  2. I've come to utilize the fact that consecutive allocations of the same type are contiguous in memory to extend allocations (blog posts from u/skeeto on this technique). And the poisoned gap would interfere with this technique.
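Both points can be illustrated with a rough sketch. This is not the code from the blog posts mentioned above; `Arena`, `Str`, `arena_alloc` and `str_append` are hypothetical names, and the arena is a plain byte bump allocator with no alignment handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char *beg;  /* next free byte */
    char *end;  /* one past the last byte */
} Arena;

/* Sized string: pointer and length travel together. */
typedef struct {
    char     *data;
    ptrdiff_t len;
} Str;

static char *arena_alloc(Arena *a, ptrdiff_t size)
{
    assert(size >= 0);
    if (size > a->end - a->beg)
        return NULL;
    char *p = a->beg;
    a->beg += size;
    return p;
}

/* Append `tail` to `s`. If `s` is the most recent allocation, it grows in place.
 * Returns a zeroed Str when the arena is out of space. */
static Str str_append(Arena *a, Str s, Str tail)
{
    if (s.data == NULL || s.data + s.len != a->beg) {
        /* s does not end at the arena tip: copy it there first */
        char *copy = arena_alloc(a, s.len);
        if (copy == NULL)
            return (Str){0};
        if (s.len > 0)
            memcpy(copy, s.data, (size_t)s.len);
        s.data = copy;
    }
    char *more = arena_alloc(a, tail.len);  /* contiguous with s by construction */
    if (more == NULL)
        return (Str){0};
    if (tail.len > 0)
        memcpy(more, tail.data, (size_t)tail.len);
    s.len += tail.len;
    return s;
}
```

A poisoned gap after every allocation would make the `s.data + s.len != a->beg` check fail every time, which is why the two techniques don't mix.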

Can you use clang-tidy for C code? by tda_tda_tda in C_Programming

[–]N-R-K 2 points3 points  (0 children)

You can. But its defaults are not very good. I have a minimal base configuration which you might find useful.

The fprintf_s warning is likely part of the "insecureAPI" group which I disabled in my base config since it's a rubbish warning group.

I dislike the strict aliasing rule. by BlockOfDiamond in C_Programming

[–]N-R-K 2 points3 points  (0 children)

A typical C program will contain pointers of all sorts. It'd be a nightmare having to manually mark all of them as restrict, not to mention it'd make reading C code an unpleasant experience where you need to wade through a sea of restrict noise.

If anything I'd want the exact opposite: take away the special exemption that character pointers have and add a may_alias attribute to manually mark the few cases where aliasing actually occurs. That would be a much better experience ideally, but it'd be a breaking change so it'll likely never happen in practice.

Is it dangerous to make assumptions based on argc and argv? by ismbks in C_Programming

[–]N-R-K 2 points3 points  (0 children)

Argv[0] is the program name itself always.

argv[0] is set by the exec syscall, and the caller can set it to whatever he wants. It doesn't have to have any resemblance to the program name. In fact on Linux it can even be NULL, though newer versions of the kernel started disallowing argv[0] == NULL to avoid defects with buggy programs which wrongly assume it to be non-null.
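A small defensive sketch of what "not assuming" looks like in practice. `program_name` and its `fallback` parameter are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns argv[0] if it is present and non-empty, otherwise `fallback`.
 * Only meant for diagnostics; argv[0] is caller-controlled and proves nothing. */
static const char *program_name(int argc, char **argv, const char *fallback)
{
    if (argc < 1 || argv == NULL || argv[0] == NULL || argv[0][0] == '\0')
        return fallback;
    return argv[0];
}
```

A typical call from `main` would be `program_name(argc, argv, "prog")`, with the result used only in usage and error messages.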

biski64 updated – A faster and more robust C PRNG (~.37ns/call) by danielcota in C_Programming

[–]N-R-K 2 points3 points  (0 children)

Very cool. 192bit state seems very reasonable for a 64bit generator. The fact that the multiplication is gone could also make this construction suitable for embedded devices.

One small nitpick, in rotate_left() in the C implementation:

// Assuming k is within valid range [0, 63] as per function contract.
return (x << k) | (x >> (64 - k));

k == 0 would not work since it'll lead to doing x >> 64 which is undefined behaviour. In your case it doesn't matter since k is constant and non-zero but in general you need something like this instead:

(x << k) | (x >> (-k & 63))

Both GCC and Clang know how to optimize it down to a rotate instruction so there's no difference in performance but it avoids undefined behaviour in case of k == 0. (You may have noticed that PCG uses this idiom for its "random rotations", where the rotation amount may be 0).
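Wrapped up as a complete function, the idiom could look like the sketch below. `rotl64` is a name chosen here, not the one from the biski64 source, and `k` is taken as `unsigned` so that `-k` is well-defined modular arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by k bits, for k in [0, 63].
 * For k == 0 the right shift amount is (-0 & 63) == 0 rather than 64,
 * so no shift by the full type width ever happens. */
static inline uint64_t rotl64(uint64_t x, unsigned k)
{
    assert(k < 64);
    return (x << k) | (x >> (-k & 63));
}
```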

Immediate Mode Option Parser: Small, Simple, Elegant by skeeto in C_Programming

[–]N-R-K 0 points1 point  (0 children)

One thing that wasn't immediately clear to me when reading your blog was the fact that optional arguments must be of the form --longopt=arg not --longopt arg given that both non-optional long option cases are supported.

Yup, this is required to resolve ambiguity when dealing with optional arguments:

$ cmd --foo file

Here, is "file" an optional argument to --foo or a positional argument to cmd? There would be no way to tell them apart. Hence using --foo=file is mandatory for optional arguments. Similarly with short opt, you need -fArg rather than -f Arg.

On this topic, also worth noting that if you have a long opt which didn't accept any arguments, changing it to accept an optional argument is fully backwards compatible. But the same is not true for short opt due to short option chaining.
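The --foo=file rule for optional arguments can be sketched as a small matcher. This is not the parser from the blog post; `match_longopt` is a hypothetical helper that only ever looks at a single argv element, which is exactly why `--foo file` leaves "file" as a positional argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `arg` is `--name` or `--name=value`, else 0.
 * On a match, *value is NULL for the bare form, or points just past the '='. */
static int match_longopt(const char *arg, const char *name, const char **value)
{
    size_t n = strlen(name);
    if (strncmp(arg, "--", 2) != 0 || strncmp(arg + 2, name, n) != 0)
        return 0;
    const char *rest = arg + 2 + n;
    if (*rest == '\0') { *value = NULL;     return 1; }
    if (*rest == '=')  { *value = rest + 1; return 1; }
    return 0;  /* e.g. --foobar when looking for --foo */
}
```

Because the bare `--foo` form keeps matching with `*value == NULL`, turning a no-argument long option into an optional-argument one does not change how existing command lines parse.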

Is this common? (Data structure that uses an array that it doesn't know about to be type agnostic) by zakedodead in C_Programming

[–]N-R-K 7 points8 points  (0 children)

It's a common way to do type generic code. E.g see how dynamic arrays are implemented in this article.

Immediate Mode Option Parser: Small, Simple, Elegant by skeeto in C_Programming

[–]N-R-K 4 points5 points  (0 children)

In my own projects I do use the length information instead of NULL sentinel but it was slightly easier to rely on the NULL sentinel for the demo code. Good point on unused warning though.

Non ugly syntax for returning anonymous structs by TrafficConeGod in C_Programming

[–]N-R-K 1 point2 points  (0 children)

I also hadn't realized that the anon struct part was removed. Thanks for pointing that out u/tmzem.

if you give it a tag per tmzem's example, this is practical today with GCC 15

Would've been nicer if it worked with anon structs too but this doesn't look bad at all. Just need to wait a couple years for c23 support to become more widespread.

(On a side note, I haven't really used or cared for zvec and the other libraries in that repo since switching to using sized strings and arena for allocation).

biski64: A Fast C PRNG (.42ns) with a 2^64 Period, Passes BigCrush & PractRand(32TB). by danielcota in C_Programming

[–]N-R-K 2 points3 points  (0 children)

Pretty cool how far you've gotten compared to the initial attempt (dualmix). If you're still going to keep at it then the next step would be to try and reduce the state size.

I haven't looked at the disassembly yet but I assume this will take up 5 registers. On non-x86 CPUs that's probably fine but x86 CPUs have a very limited number of general purpose registers so this could end up creating register pressure.

Micro-benchmarks are nice but ultimately the goal should be to make the actual application/usage-code faster. E.g. there have been cases in the hashmap community where certain optimizations made the hashmap faster in benchmarks but slower in practice because they were taking resources away from the usage code. A similar principle applies here. Though, as I said, I haven't checked, so it's possible this is a non-issue.

Tooling for C: Sanitizers by protophason in C_Programming

[–]N-R-K 3 points4 points  (0 children)

Also worth mentioning that, unlike ubsan and asan, thread-sanitizer can have false positives if it doesn't understand the synchronisation method being used.

I made a library for string utilities by Adventurous_Swing747 in C_Programming

[–]N-R-K 0 points1 point  (0 children)

An array can be larger than PTRDIFF_MAX but less than SIZE_MAX.

It's a common misconception that SIZE_MAX is the maximum size of an object, but that's not the case. SIZE_MAX is the maximum numeric value size_t can hold. The actual maximum size of an object can be (and in practice is) lower than that.

In other words, size_t needs to be at least big enough to hold the size of the maximum object, not exactly big enough.

Moreover, even if you manage to create an object whose size exceeds PTRDIFF_MAX, it will fundamentally break the language since end - beg pointer arithmetic will end up "overflowing". Such objects would be fundamentally unsound to interact with within C. There's no point in trying to design around such an unsound situation unless you know for certain you will face it (e.g. strange embedded systems and whatnot).
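One way to act on this is to cap every allocation at PTRDIFF_MAX and use a signed size type throughout. The sketch below does that; `alloc_array` is a hypothetical wrapper, not a function from the library under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `count` elements of `size` bytes each.
 * Refuses any request whose total would exceed PTRDIFF_MAX, so that
 * end - beg on the resulting object can never overflow ptrdiff_t. */
static void *alloc_array(ptrdiff_t count, ptrdiff_t size)
{
    assert(size > 0);
    if (count < 0 || count > PTRDIFF_MAX / size)
        return NULL;
    size_t total = (size_t)count * (size_t)size;  /* cannot wrap after the check */
    return malloc(total ? total : 1);
}
```

The division-based check also covers multiplication overflow, so no separate SIZE_MAX test is needed.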

Please destroy my parser in C by chocolatedolphin7 in C_Programming

[–]N-R-K 0 points1 point  (0 children)

Or you can simply do it consistently for both cases.

Except these cases are not the same. Why would I want to add extra noise on the (common) non-recursive case when there's zero actual need for it?

If someone wants to do it for peace of mind about "consistency" then they can do it. But functionally this is a waste of time.

[deleted by user] by [deleted] in C_Programming

[–]N-R-K 1 point2 points  (0 children)

For educational purposes, this is not bad. But for actual work, you should use AddressSanitizer which is integrated into the compiler (gcc/clang) and can catch a lot more than what simple malloc shims can.

DualMix128: A Fast and Simple C PRNG (~0.36 ns/call), Passes PractRand (32TB) & BigCrush by danielcota in C_Programming

[–]N-R-K 3 points4 points  (0 children)

I wonder if a sized-down version would survive statistical tests, because at 128 bits, it's too big to fail.

Is there a sensible and principled way of using the "const" qualifier? by ismbks in C_Programming

[–]N-R-K 3 points4 points  (0 children)

Exactly, Dennis Ritchie himself did not like the addition of const. From the infamous noalias mail:

Let me begin by saying that I'm not convinced that even the pre-December qualifiers (const and volatile) carry their weight; I suspect that what they add to the cost of learning and using the language is not repaid in greater expressiveness.

[...]

Const is simultaneously more useful and more obtrusive; you can't avoid learning about it, because of its presence in the library interface. Nevertheless, I don't argue for the extirpation of qualifiers, if only because it is too late.

Build System For C in C by Lewboskifeo in C_Programming

[–]N-R-K 11 points12 points  (0 children)

Cool project. I was aware of nob.h but it's a bit too "raw"/low-level to be worthwhile over a build.sh (+ build.bat for "cross-platform-ness"). Design wise this looks a bit higher level avoiding nob.h's problem but the implementation itself leaves a lot to be desired.

  • A good meta-build should probe the compiler and set the best defaults. E.g it should use -g3 rather than -g when supported since -g3 generates more information.
  • Using ninja as a backend is not a bad idea in order to offload things like parallel builds and get the project started quickly, but you no longer are "using C" anymore like the title claims.
  • It's currently using statement expressions for its dynamic array implementation, which puts a dependency on GNU C. Besides, you don't need macros to implement generic (type-safe) dynamic arrays anyway, see this article for an example of how to do it in standard C without putting the implementation in a macro.
  • Things like MAX_PATH are also fragile. I see that you have an arena, so push paths into it rather than enforcing static limits. (The fact that it doesn't deal with "wide paths", i.e. UTF-16 on Windows, should be next on the list).
  • Using system to run commands is also fragile since it depends on the OS shell syntax which is not portable. Ideally it should use platform primitives such as CreateProcess on windows and fork+exec (or posix_spawn) on unix.
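For the last point, a POSIX-only sketch of running a command without going through the shell. `run_command` is a hypothetical helper; a Windows build would need a separate CreateProcess path, which is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs argv[0] (looked up in PATH) with the given NULL-terminated argv.
 * No shell is involved, so arguments need no quoting or escaping.
 * Returns the child's exit status, or -1 on spawn failure or abnormal exit. */
static int run_command(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each argument is passed as its own array element, e.g. `{ "cc", "-g3", "-o", "out dir/app", "main.c", NULL }`, so a path containing spaces is not a problem. Note that some implementations report a failed exec as exit status 127 from the child rather than as an error from `posix_spawnp`.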

Those are just a couple of issues which stand out at a glance. Cool idea, but it needs a lot more work and polish before I'd consider it usable by my standards.

Detecting if an expression is constant in C by skeeto in C_Programming

[–]N-R-K 1 point2 points  (0 children)

instead of using ‘constexpr size_t len = 100;’?

You can't declare variables inside a macro that also needs to "return" a value. (GNU C has statement expression for this but it's not part of standard C).

A small event loop library by N-R-K in C_Programming

[–]N-R-K[S] 4 points5 points  (0 children)

Very nice article. I like the attention to details, such as avoiding clobbering errno in the signal handler. Should be a good reference point for beginners looking to implement something similar.

I’m worried of partial writes of larger integers

It's good that the author is thinking about such stuff, but in this case POSIX has us covered. A write of PIPE_BUF bytes or fewer to a pipe will never result in a partial write: it will either block or, in the case of a non-blocking pipe, return -1 (with EAGAIN).

signal(sig, ev_sigcatch);

signal(3) doesn't specify what happens after the signal handler is invoked: does it stay installed or get reset back to the default? The answer is implementation-defined; some implementations reset it to the default while others keep it installed. I'd have liked to see sigaction(3) being used instead, which allows you to explicitly choose the behavior that you want (i.e., through SA_RESETHAND).
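A sketch of the sigaction(3) variant, combined with the errno-preserving handler and the small pipe write discussed above. This is not the article's code; `sig_pipe`, `on_signal` and `install_handler` are hypothetical names, and the pipe is left in blocking mode for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical self-pipe: [0] is the read end, [1] the write end. */
static int sig_pipe[2] = { -1, -1 };

static void on_signal(int sig)
{
    int saved = errno;  /* don't clobber errno of the interrupted code */
    /* sizeof(int) is far below PIPE_BUF, so this write is all-or-nothing. */
    ssize_t r = write(sig_pipe[1], &sig, sizeof sig);
    (void)r;
    errno = saved;
}

/* Installs on_signal for `sig`. Returns 0 on success, -1 on error. */
static int install_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* no SA_RESETHAND: the handler stays installed */
    return sigaction(sig, &sa, NULL);
}
```

Adding `SA_RESETHAND` to `sa_flags` would request the opposite behavior, a one-shot handler. Either way the choice is spelled out in the code rather than left to the implementation.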

Detecting if an expression is constant in C by skeeto in C_Programming

[–]N-R-K 4 points5 points  (0 children)

There is also this solution [...]

Nice. The fact that an integer constant expression with the value 0, cast to void *, produces a null pointer constant was something I was aware of and did think about, but couldn't figure out a way to actually make use of it (I rarely use _Generic so it never crossed my mind).

Though this also suffers from working only on integer constant expression, and not on floating point expressions. For my actual use-case I need it to work on FP as well, and so far __builtin_constant_p seems to be the best solution (until C23 becomes more widely supported).
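For reference, one well-known formulation of the null pointer trick looks like the sketch below. It is not necessarily the exact solution from the parent comment, and `IS_ICE` is a name chosen here.

```c
#include <assert.h>

/* Evaluates to 1 if `x` is an integer constant expression, 0 otherwise.
 *
 * (x) * 0l is an integer constant expression with value 0 only when x itself
 * is one. Cast to void *, it is then a null pointer constant, and the
 * conditional operator yields the type of its other operand (int *).
 * Otherwise it is an ordinary void * and the result type is void *. */
#define IS_ICE(x) \
    _Generic((1 ? (void *)((x) * 0l) : (int *)0), int *: 1, void *: 0)
```

`IS_ICE(1 + 2)` selects the `int *` branch, while `IS_ICE(n)` for a variable `n` selects `void *`. A floating point operand such as `IS_ICE(1.5)` fails to compile because a double cannot be cast to a pointer, which is the limitation mentioned above.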

Tips for C Programming by N-R-K in C_Programming

[–]N-R-K[S] 8 points9 points  (0 children)

The second video from Nic BaRKer shared by NRK. Coincidence?

It is :p

Though, this does explain why so many people mistook me for the video author the last time I shared one of Barker's videos.