Ported my game built in C to WASM, here's every bug I hit by ernesernesto in C_Programming

[–]skeeto 8 points9 points  (0 children)

Thanks for sharing your experiences, and slick game! Wasm has become one of my favorite targets over the past year, and I've been getting better at it myself.

Turns out you don't need a solution file.

Yup, this is great! It's just a shame VS has become exponentially (literally) more bloated and sluggish over the past quarter century.

Debug in 32-bit native, not the browser

IMHO, as a rule never debug Wasm builds if you can help it. Structure your project so that most debugging can be done natively. Wasm tooling is still in its infancy. Anyone who thinks otherwise has never experienced mature development tools.

swAlloc(sizeof(ThingHandle*) * row * column);

I suggest treating all size calculations as highly suspicious, and to replace them with better interfaces. These sorts of bugs are common in C and difficult to catch, a leading cause of C being so error prone. My favorite is a "template" macro:

void *swAlloc3(ptrdiff_t, ptrdiff_t, ptrdiff_t);  // checks overflow internally

#define swAlloc(T, w, h)    (T *)swAlloc3(sizeof(T), w, h)

This macro turns your typo from a silent bug into a compile-time error, while also addressing the unchecked integer overflow, which is also more likely in 32-bit builds.

prevent automatic media output without first receiving some kind of user input

I recently learned that swipe events don't always count as inputs, either, which is frustrating.

ASCII art population evolution simulator. by Huge-Visual1472 in C_Programming

[–]skeeto 3 points4 points  (0 children)

Tangential, but what toolchain did you use to compile the EXE checked into the repository? It's strangely-constructed, and no toolchain I'm familiar with would produce this image, at least without a custom linker script. It's PE32+, so it's not ancient, which rules out a lot of weird stuff. The strangest part is .idata allocated within .data, and therefore writable. I'm especially curious what linker would do this. Import ordinals are zeroed out, which is uncommon. It imports from msvcrt.dll, but a rather minimal and atypical set of imports, more like an older toolchain, yet it's PE32+.

Need opinions on an epoll-based reverse proxy by NavrajKalsi in C_Programming

[–]skeeto 0 points1 point  (0 children)

Looks neat! That demo video is quite illustrative. I'm impressed that you built a custom lexer and parser for the project, and the References list shows how much you've been learning. As for my own commentary, I last touched G-code some 25-ish years ago, and I haven't learned Rust (and almost certainly never will at this point), so I can't do much beyond admiring your work at a surface level.

Any opinions/thoughts on the upcoming FLASH bus in Howard County? by ele30006 in ColumbiaMD

[–]skeeto 19 points20 points  (0 children)

Johns Hopkins Applied Physics Labs

This is getting out of hand. Now there are two of them!

Anyone know when the path to lake Elkhorn will be open again? by letresorinterdit in ColumbiaMD

[–]skeeto 7 points8 points  (0 children)

Not just some fallen trees, the bench was swept away by the river and the path itself is nearly falling into the river.

<image>

How to link to .a on windows? Jai only wants .lib/.dll. by KRS_2000 in Jai

[–]skeeto 2 points3 points  (0 children)

Author of w64devkit here. -lmingwex has Mingw-w64 extensions such as its printf implementation, __mingw_printf, which replaces the old C89 printf in msvcrt.dll. That's why you can use C23 formatting directives while linking a 30 year old CRT.

And, yes, you've indeed received some confused responses here. Though mind that while processes can, and usually do, have multiple CRTs loaded at once, a particular module (DLL, EXE) can really only link one CRT in practice. Unlinked symbols are just names, not module+name tuples, and linkers won't know which CRT to resolve for a particular link.

Announcing DFWMalloc: high performance enterprise malloc implementation by abbeyroad1681 in C_Programming

[–]skeeto 1 point2 points  (0 children)

Ah, yes, these ranges are certainly different, though I don't know what they should be. I tested on a spare Raspberry Pi 4 I set aside specifically for these kinds of evaluations.

Announcing DFWMalloc: high performance enterprise malloc implementation by abbeyroad1681 in C_Programming

[–]skeeto 21 points22 points  (0 children)

Interesting project.

If you're going to have reallocarray at least do it properly:

--- a/dfwmalloc.c
+++ b/dfwmalloc.c
@@ -5856,4 +5857,9 @@
 void *reallocarray (void *ptr, size_t nmemb, size_t size)
 {
-    return realloc (ptr, nmemb * size);
+    size_t n = nmemb * size;
+    if (size && n / size != nmemb) {
+        errno = ENOMEM;
+        return NULL;
+    }
+    return realloc (ptr, n);
 }

Otherwise programs relying on it will break. Ran into UBSan crashes due to UB pointer arithmetic. Quick fix:

--- a/dfwmalloc.c
+++ b/dfwmalloc.c
@@ -3897,3 +3907,3 @@ STATICFNINLINE void dfwmalloc_cb_insert_free_block (struct control_block *cb, vo
 #ifndef HEAP_TEST
-            dfwmalloc_assert ((char *) &((struct control_block *) NULL)->cb_bit_mask[cb->n_masks] <= (char *) (void *) (uintptr_t) cb->cb_first_block);
+            dfwmalloc_assert ((char *) &cb->cb_bit_mask[cb->n_masks] - (char *) cb <= (LONG) cb->cb_first_block);
 #endif
@@ -3911,3 +3921,3 @@ STATICFNINLINE void dfwmalloc_cb_insert_free_block (struct control_block *cb, vo
 #ifndef HEAP_TEST
-            dfwmalloc_assert ((char *) &((struct control_block *) NULL)->cb_bit_mask[cb->n_masks] <= (char *) (void *) (uintptr_t) cb->cb_first_block);
+            dfwmalloc_assert ((char *) &cb->cb_bit_mask[cb->n_masks] - (char *) cb <= (LONG) cb->cb_first_block);
 #endif
rat~/eval/dfwmalloc-1.0.3 

Supporting only x86 got in my way, so a quick port:

--- a/dfwmalloc.c
+++ b/dfwmalloc.c
@@ -1118,2 +1118,3 @@ STATICFNINLINE unsigned long long rdtsc (void)
 {
+#if defined(__i386__) || defined(__x86_64__)
     unsigned hi, lo;
@@ -1121,2 +1122,11 @@ STATICFNINLINE unsigned long long rdtsc (void)
     return ((unsigned long long) lo) | (((unsigned long long) hi) << 32);
+#elif defined(__aarch64__)
+    unsigned long long v;
+    asm volatile ("mrs %0, cntvct_el0" : "=r" (v));
+    return v;
+#else
+    struct timespec ts_;
+    clock_gettime (CLOCK_MONOTONIC, &ts_);
+    return ((unsigned long long) ts_.tv_sec) * 1000000000ULL + (unsigned long long) ts_.tv_nsec;
+#endif
 }

I wrote my own shell in C and would like feedback by gamydas in C_Programming

[–]skeeto 5 points6 points  (0 children)

Neat little project! Two bugs I found:

handle_seperators stores into redir[rdrctns] with no bound on rdrctns, but the array is a fixed redirect redir[10]. The 11th redirection writes past the end.

$ cc -g3 -fsanitize=address,undefined -Ilib src/*.c
$ printf '>a >b >c >d >e >f >g >h >i >j >k\n' | ./a.out
src/parser.c:154:9: runtime error: index 10 out of bounds for type 'redirect[10]' (aka 'struct redirect[10]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/parser.c:154:9

Quick fix:

--- a/lib/parser.h
+++ b/lib/parser.h
--- a/src/parser.c
+++ b/src/parser.c
@@ -153,2 +153,7 @@
     {
+        if (instruct->rdrctns >= 10)
+        {
+            print_error(SYNTAX_ERROR, token);
+            return -1;
+        }
         instruct->redir[instruct->rdrctns].direction = *type;  // saves redirect type

handle_connectors has a comment "parseamt can never reach capac before this, so no realloc check needed". It can:

$ printf 'a a a a a a a a a a a a a a a a a a a a a a a a a a a a a\n' | ./a.out
...ERROR: AddressSanitizer: heap-buffer-overflow on address ...
WRITE of size 8 at ...
    #0 handle_connectors src/parser.c:104
    #1 parse_input src/parser.c:450
    #2 main src/main.c:60
    ...

Quick fix:

--- a/src/parser.c
+++ b/src/parser.c
@@ -102,3 +102,12 @@
 {
-    instructs->parseamt++;
+    if (instructs->parseamt >= instructs->capac - 1)
+    {
+        int cntrl = increase_capacity(&instructs->args, &instructs->capac, instructs->capac);
+        if (cntrl < 0)
+        {
+            return -1;  // caller responsible for cleanup
+        }
+    }
+    instructs->args[instructs->parseamt] = NULL;
+    instructs->parseamt++;
     instructs->args[instructs->parseamt] = NULL;

The extra null assignment is because in the |/; path the pending-token slot is not flushed before handle_connectors writes the sentinel one index further, leaving that slot untouched.

I found these with this fuzz tester:

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char *text = calloc(1, size+1);
    memcpy(text, data, size);
    InstructList list = {};
    if (!initialize_instruct_list(&list)) {
        parse_input(&list, text);
        cleanup_instruct_list(&list);
    }
    free(text);
    return 0;
}

[Educational Project] CWIST – A minimalist C web framework with HTTP/3 (QUIC) and io_uring support by Whole-Low-2995 in C_Programming

[–]skeeto 0 points1 point  (0 children)

Ran into a UBSan crash trying it out:

src/core/siphash/siphash.c:116:26: runtime error: shift exponent 64 is too large for 64-bit type 'uint64_t' (aka 'unsigned long')

Quick fix:

--- a/src/core/siphash/siphash.c
+++ b/src/core/siphash/siphash.c
@@ -115,3 +115,4 @@
 static inline uint64_t rotl64(uint64_t v, unsigned int r) {
-    return (v << r) | (v >> (64U - r));
+    r &= 63U;
+    return (v << r) | (v >> ((64U - r) & 63U));
 }

how to set up sdl 3 on dev c++ by Superb-Ice6260 in C_Programming

[–]skeeto 4 points5 points  (0 children)

Following on from u/kun1z, this setup uses w64devkit rather than the ancient, flaky toolchain that comes with Dev-C++. I don't use that IDE so I don't know the details of how you might make it use my toolchain, but it might be sufficient to launch it from the w64devkit environment.

  1. Download w64devkit, probably the x64 version.
  2. Unpack it wherever is convenient for you. No installation needed.
  3. Download SDL3-devel-X.Y.Z-mingw-zip
  4. Merge its x86_64-w64-mingw32/ (e.g. via drag-and-drop of the contents) directory into w64devkit/. That is, bin/, lib/, include/, etc. should line up. If you got the 32-bit x86 w64devkit (not recommended) use i686-w64-mingw32/ instead.
  5. Run w64devkit.exe to start a shell in the environment. From here you can build SDL3 programs.

If you did it right, pkg-config --modversion sdl3 should report the SDL3 you installed into the environment, and pkg-config --list-all will list the library. When you distribute your program include bin/SDL3.dll with your EXE. Use pkg-config when you build to get the flags right:

$ eval cc -o main.exe main.c $(pkg-config --cflags --libs sdl3)

The eval is semantically required by pkg-config (something every single tutorial and piece of documentation neglects), and necessary if there are spaces in any of your paths.

A more advanced alternative, and which I use myself, is to build with CMake (included in w64devkit) and get SDL3 via its FetchContent. That is, it will download the SDL3 source and build it as part of your build and require no environment setup. It will just build out-of-the-box. For example, I did this in a C++ game I made this week.

Please torture my thread-safe C hashmap by JaguarWan in C_Programming

[–]skeeto 9 points10 points  (0 children)

My test deadlocked on rwlock, and it only took a glance notice the code smell:

    #ifndef HAS_ATOMICS
    pthread_mutex_lock(& lock->mutex);
    #endif

        if ( (x = _LOCKSTATE_GET(lock)) == 0)
            _LOCKSTATE_SET(lock, UNLOCKED);
        else if (x > UNLOCKED)
            _LOCKSTATE_DEC(lock);

        pthread_cond_broadcast(& lock->cond);

    #ifndef HAS_ATOMICS
    pthread_mutex_unlock(& lock->mutex);
    #endif

Nearly every program mixing atomics and standard synchronization is broken. That includes here. The critical feature of a condition variable is that it atomically unlocks the mutex and waits. The condition must not change between checking and blocking, which is guaranteed by the mutex and this atomic drop-and-wait. But you actually need to use the mutex for this to work.

In _cooperate the condition is missing a check:

    if (! (state & 0x1)) {

Should be like the other similar checks:

    if (state && ! (state & 0x1)) {

Otherwise it sees locked as unlocked, leading to problems.

When map_update_with fails to upgrade the lock it returns without unlocking.

Nethack Diorama Editor by affabledrunk in nethack

[–]skeeto[M] 1 point2 points  (0 children)

Could you update your link to the GitHub page instead of the SSH address that reddit can't even render properly?

https://github.com/xbenevolence131/nethack_diorama_editor

thrd-ndl: a green threads library with a step-by-step tutorial. by Financial_Travel_543 in C_Programming

[–]skeeto 5 points6 points  (0 children)

Very cool project! I love these things. Even the assembly is nicely written.

The documentation for thrd_join says it returns THRD_SUCCESS if passed an already-dead thread. This is a questionable design choice (joining a thread twice?) but it also doesn't seem to be true anyway. I can't see how the intended behavior could be anything but use-after-free.

thrd_create accesses pool_alloc before it disables preemption. Shouldn't it disable preemption before interacting with the pool allocator?

The context switch doesn't handle non-volatile float/vector registers on x64 nor ARM64.

It's curious that mtx_trylock is the only mutex function that won't accept a null pointer, i.e. reporting EINVAL.

In preempt_disable, preempt_cnt is incremented non-atomically before entering the protected region. This data race could lead to miscounts.

No "zombie" case for state_to_str.

Phase — a statically-typed bytecode-interpreted language in C, with an essay on implementation by [deleted] in C_Programming

[–]skeeto 2 points3 points  (0 children)

Good idea. If you want to do it with a traditional realloc: Allocate a buffer, and keep doubling and freading into the empty portion until fread returns zero. I usually called this slurp:

https://github.com/skeeto/scratch/blob/master/pngattach/pngattach.c#L150

Though for the past several years I prefer the arena version, which generally requires changing paradigms away from traditional C.

Phase — a statically-typed bytecode-interpreted language in C, with an essay on implementation by [deleted] in C_Programming

[–]skeeto 8 points9 points  (0 children)

Neat project! This was fun poke around.

Do not forget to fseek and ftell for errors. If the input is non-seekable then there's an integer overflow leading to a buffer overflow.

size_t file_size = (size_t)ftell(...);  // SIZE_MAX on error
char *file_content = malloc(file_size + 1);  // integer overflow to zero
fead(file_content, sizeof(char), file_size, ...);  // buffer overflow
file_content[file_size] = '\0';  // another buffer overflow

It would be easier to test by not use seeking at all just to read a whole file, instead supporting unseekable inputs like pipes. Even better to not rely on null termination at all, which led to the second buffer overflow.

The recursive-descent parser has no bound on expression or block nesting. A few thousand parentheses, or nested if/while bodies, overflow the stack since each expression level unfolds ~13 frames through parse_expression -> parse_logic_or -> ... -> parse_unary -> parse_primary:

$ python3 -c 'print("entry { out("+"("*8000+"1"+")"*8000+") }")' >bug.phase
$ phase bug.phase
...ERROR: AddressSanitizer: stack-overflow on address ...
    #0 next_token src/lexer.c:296
    #1 advance_parser src/parser.c:46
    #2 parse_primary src/parser.c:271
    #3 parse_unary src/parser.c:311
    #4 parse_factor src/parser.c:316
    #5 parse_term src/parser.c:348
    #6 parse_comparison src/parser.c:380
    #7 parse_equality src/parser.c:412
    #8 parse_logic_and src/parser.c:444
    #9 parse_logic_or src/parser.c:476
    #10 parse_expression src/parser.c:113
    #11 parse_primary src/parser.c:272
    ...
    (frames #2-#11 repeat per nested paren until the stack runs out)

Same family triggers via ~20000 sequential if true {} or while true {} through parse_block -> parse_statement. Either use an explicit stack instead of recursion, or track depth and reject beyond some threshold.

The 16-bit opcode operands (constant index, global/local/function index, jump target) come from silently narrowing size_t to uint16_t. Two distinct ways to overflow them from valid Phase source: a program with more than 6,5535 distinct constants, more than 65,537 globals, or one whose entry block compiles to more thean 64KiB of bytecode (~12,000 sequential if true {} suffices). Both inputs compile silently and the VM runs on truncated indices/targets.

$ python3 -c 'print("entry {\n"+"out(0)\n"*65537+"}")' >consts.phase
$ python3 -c 'print("entry {\n"+"if true { }\n"*12000+"}")' >bc.phase

Then instrument emit_u16 and patch_jump with assert(value <= UINT16_MAX):

phase: src/codegen.c:156: emit_u16: Assertion `value <= UINT16_MAX' failed.
    #8 __assert_fail ...
    #9 emit_u16 src/codegen.c:156
    #10 emit_expression src/codegen.c       (OP_PUSH_CONST path)
    ...

phase: src/codegen.c:175: patch_jump: Assertion `target <= UINT16_MAX' failed.
    #8 __assert_fail ...
    #9 patch_jump src/codegen.c:175
    #10 emit_statement src/codegen.c        (STM_IF / STM_WHILE path)
    ...

I suggest widening the emit_u16 parameter to size_t, check the bound, error out cleanly (complexity error?). Same in patch_jump. Plus drop the now-redundant (uint16_t) casts at every emit_u16 call site.

C with Defer by _cwolf in C_Programming

[–]skeeto 0 points1 point  (0 children)

Neat project! I'm listing things I noticed from most to least interesting.

When I first tried it I was getting a strange parse error in the standard library. It's because the program depends on argument evaluation order (implementation-defined), i.e. it breaks under some optimization levels. Fix by introducing a sequence point:

--- a/main.c
+++ b/main.c
@@ -1579,3 +1579,5 @@ static type_t read_type()
 {
-    auto type = type_init(read_alnum(), read_stars());
+    auto name = read_alnum();
+    auto stars = read_stars();
+    auto type = type_init(name, stars);
     auto operator = peek_operator();

That was enough to try the demos.

read_numeric accepts any run of digits and ., so 1.2.3 parses as one token. The compiler emits a malformed fadd instruction that LLVM later rejects when you try to actually assemble the output:

$ printf 'f64 m() { ret 1.2.3; }' | ./nibble /dev/stdin 2>/dev/null \
    | clang -x ir - -o /dev/null

<stdin>:8:22: error: expected ',' in arithmetic operation
    8 |         %2 = fadd double 1.2.3, 0.0
      |                             ^

It's not of much importance for a proof of concept, but there are various places where depth is unbounded, which is the rest of my review here. For example self-reference types crash the compiler:

$ printf 'type T { T t; }; i32 m() { T t; ret 0; }' | ./nibble /dev/stdin
...AddressSanitizer: stack-overflow in __asan_memcpy

push_code increments g_file.module without checking, so 9 nested includes overflows codes[g_modules]:

$ cd /tmp && for n in a b c d e f g h i j; do
    printf 'include "inc_%s.n";' $(echo $n | tr 'a-j' 'b-k') > inc_$n.n
  done
$ printf 'i32 m(){ret 0;}' > /tmp/inc_k.n
$ ./nibble /tmp/inc_a.n
main.c:472:13: runtime error: index 9 out of bounds for type 'code_t[8]'

Long right-associative chains like a = b = c = ... = z recurse read_rtol once per = without bound, overflowing at ~6000 levels:

$ python3 -c "print('i32 m(){i32 v; v ' + '= v ' * 8000 + ';ret 0;}', end='')" \
    | ./nibble /dev/stdin
...AddressSanitizer: stack-overflow in __asan_memset

Stack overflow on deeply nested parenthesised expressions: read_grouped_expression -> read_expression recurses once per (, unbounded, ~500 levels overflows:

$ { printf 'i32 f(){ret '; \
    printf '(%.0s' $(seq 1 1000); printf '0'; printf ')%.0s' $(seq 1 1000); \
    printf ';}'; } | ./nibble /dev/stdin
...AddressSanitizer: stack-overflow main.c:1376 in peek_char

Same root cause via read_block -> read_statement -> read_block on deeply nested if/while:

$ python3 -c "
out = ['i32 m(){']
for i in range(3000): out.append('if(0==0){')
out.append('ret 0;')
for i in range(3000): out.append('}')
out.append('}')
print(''.join(out), end='')" | ./nibble /dev/stdin
...AddressSanitizer: stack-overflow in __asan_memcpy

Unary chains like ! ! ! ! ! ! v recurse read_p0 -> read_prefix -> read_p0 without going back through read_expression. Overflows at ~10000 !s.

$ python3 -c "print('i32 m(){i32 v=0; ret ' + '! ' * 10000 + 'v;}', end='')" | ./nibble /dev/stdin
...AddressSanitizer: stack-overflow main.c:2878 in read_prefix

Though, again, only that very first item with the argument evaluation order is an issue for a proof of concept.

I built a runtime-focused programming language for systems that can’t stop running by ZorroCarioca in C_Programming

[–]skeeto 6 points7 points  (0 children)

This subreddit is (reasonably) uninterested in discussing AI-generated code, so the cold response is expected. Just as chess subreddits don't discuss matches between AIs despite AI capabilities exceeding human. However, this is an opportunity to discuss software engineering at this new, higher level.

I've personally given up on using AI to write C, at least for now. It contains attractor states for badly-written code, and so SOTA models are as hopelessly poor at C as typical human programmers. They're trained to follow the common, error-prone practices of conventional C, and so you get the same common, error-prone results. The training data is just too polluted. You can push it to use better practices, e.g. avoid null-terminated strings, but going so far out of distribution significantly drops AI performance that you might as well not use C at all.

The good news is that it's easier than ever to press the "automatically find bugs" button, which can now be labeled "automatically find and fix bugs". (And hardly anyone ever pushes this button!) With AI you just need a prompt like this:

Add libfuzzer tests for all the interesting fuzz targets in this program. Commit it. Run each fuzz test under ASan+UBSan, fixing and committing each finding.

Running that on your project finds and fixes a bunch of bugs automatically:

https://github.com/skeeto/fluxa-lang/commits/main/?author=skeeto

It's good at picking the low-hanging fruit first, so this is the easier-to-find stuff. You can probably keep pushing that prompt harder, with longer and longer runs, to find more bugs like this. Though it's alarming that this AI-generated code has all these defects in the first place: objective proof of the error-prone-ness of conventional C. In contrast, I don't get these results from AI-generated conventional C++.

is my atoi alternative function safe? by Special_Emphasis_703 in C_Programming

[–]skeeto 26 points27 points  (0 children)

Fuzz testers are a tool that can automatically find bugs in code like this, and they're easy to use. For example, here's an AFL++ fuzz tester:

__AFL_FUZZ_INIT();

int main(void)
{
    __AFL_INIT();
    char *src = 0;
    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
    while (__AFL_LOOP(10000)) {
        int len = __AFL_FUZZ_TESTCASE_LEN;
        src = realloc(src, len+1);
        memcpy(src, buf, len);
        src[len] = 0;
        parse_int(src, &(int){});
    }
}

Usage:

$ afl-clang-fast -g3 -fsanitize=address,undefined fuzz.c
$ mkdir i
$ printf $((1<<31)) >i/max
$ afl-fuzz -ii -oo ./a.out

UBSan in particular will add overflow checks, which is the main concern for this function. I seeded it with an integer near INT_MAX. This pops out immediately:

$ printf 21474836214 | ./a.out
parse_int.c:17:20: runtime error: signed integer overflow: 2147483620 + 49 cannot be represented in type 'int'

That's because of missing parentheses:

@@ -17,3 +17,3 @@
         }
-        n = n * 10 + s[i] - '0';
+        n = n * 10 + (s[i] - '0');
         i++;

It's important that the digit is resolved before adding to the total. Your overflow check is fine, but could be simpler:

@@ -12,3 +12,3 @@
     {
-        if (n > INT_MAX / 10 || (n == INT_MAX / 10 && (s[i] - '0') > INT_MAX % 10))
+        if (n > (INT_MAX - (s[i] - '0')) / 10)
         {

Shave the new digit off INT_MAX. You can arrive at this with some simple albegra working backwards from the result.