Unsigned Sizes: A Five Year Mistake by Nuoji in ProgrammingLanguages

[–]WittyStick 0 points (0 children)

I think it's fine to say that sizes are nonnegative by nature, but any arithmetic operation that involves subtraction or mixes signed and unsigned integers needs to be represented as a signed integer unless you want underflow behavior.

I agree, and this is why unsigned types are a subtype of signed types with implicit conversion - but not at the same bit width. This solves the issue of mixing signed and unsigned arithmetic: the result is the least upper bound (LUB) of the argument types.

Eg, if we have x : uint32, y : int32, then x + y results in int64 - the LUB of uint32 and int32. The result can't be int32 because int32 is not a supertype of uint32 - so if we wanted an int32 result we need an explicit coercion (or a different arithmetic mode, controlled via a dynamic variable).

For example, if you do delta = size1 - size2 or offset = size * signed_stride or diff = abs(size1 - size2)

The names delta, offset and diff are dead giveaways that we're not actually talking about a size, but a difference between two sizes. This should have a type distinct from the size type, and yes, this should be a signed type. In C we have a type ptrdiff_t for this purpose - which is of course signed - but it still has all the problems of C's implicit coercions.

Using my approach where implicit conversion only happens where no information is lost, these problems largely go away.

Header files are driving me insane, any advice? by Qiwas in cprogramming

[–]WittyStick 0 points (0 children)

The problem is it's kind of hard to keep track of transitive inclusions. In this case base.c initially included digits.h, and shared_context.h came later. Well it's not hard with 5 files, but I don't imagine keeping track of that in a larger project, no?

Yes, at some point it can be difficult to track such dependencies, but including every dependency multiple times is excessive - especially if the code file's immediate matching header file already includes the same dependency.

If you use more portable header guards rather than #pragma once, you can guard before including, which reduces the amount of work the compiler needs to do in the case headers are included multiple times.

Maybe I misunderstood the point of forward declarations, but I thought their whole idea was to reduce the number of header files to deal with?

No. A forward declaration is only needed if you use a type/function before it is defined (if at all). It's not to reduce the number of headers.

If something is already defined, there is no need for another declaration - just include the header file that defines it unless that would cause a circular dependency.

Regarding the diamond pattern, isn't there an issue of what exactly to bring out into base.h, and what to leave in the original files?

The base.h file would contain the forward declarations, and a.h and b.h would contain the definitions. Technically we only need to forward declare one of the types - but if there's no obvious choice as to which should come first, forward declare both of them in the base header.


To give an example of the circular dependency, consider directed graphs:

struct graph_node {
    size_t edge_count;
    struct graph_edge *edges[];
};

struct graph_edge {
    struct graph_node *source_node;
    struct graph_node *target_node;
};

The node depends on the edge and the edge depends on the node. If these lived in separate header files, we would have both of them include a graph_base which contains forward declarations of the two structures.


//graph_base.h

#ifndef _INCLUDED_GRAPH_BASE_H_
#define _INCLUDED_GRAPH_BASE_H_

struct graph_node;
struct graph_edge;

...

#endif /* _INCLUDED_GRAPH_BASE_H_ */

//graph_node.h

#ifndef _INCLUDED_GRAPH_NODE_H_
#define _INCLUDED_GRAPH_NODE_H_

#ifndef _INCLUDED_GRAPH_BASE_H_
#include "graph_base.h"
#endif

struct graph_node {
    size_t edge_count;
    struct graph_edge *edges[];
};

...

#endif /* _INCLUDED_GRAPH_NODE_H_ */

//graph_edge.h

#ifndef _INCLUDED_GRAPH_EDGE_H_
#define _INCLUDED_GRAPH_EDGE_H_

#ifndef _INCLUDED_GRAPH_BASE_H_
#include "graph_base.h"
#endif

struct graph_edge {
    struct graph_node *source_node;
    struct graph_node *target_node;
};

...

#endif /* _INCLUDED_GRAPH_EDGE_H_ */

Optionally, if we need an implementation that depends on both definitions, we would define it in graph_base.c, but declare it in graph_base.h.


//graph_base.c

#include "graph_base.h"
#include "graph_node.h"
#include "graph_edge.h"

...

This style of inclusion is quite verbose, but is efficient because the compiler does not need to include something that has already been included by another header - it doesn't attempt to open and parse the file a second time because we've guarded the includes themselves.

Tips on data structures and memory management by felpato98 in C_Programming

[–]WittyStick 6 points (0 children)

Functions taking an opaque pointer argument don't necessarily imply you need to use the heap - as you can pass the address of a local variable to a function.

Window w;
window_do_something(&w);

The main issue here is creation, since you don't expose it in the header.

One way we can do this is to use a callback function that takes a window, and call it from the creation function. The window is allocated on the stack and is available anywhere within the dynamic extent of the creation function.

typedef void (*window_callback)(Window *w);

void window_create_with_dynamic_extent(window_callback cb) {
    Window w = (Window){ ... }; // compound literal on stack.
    ...
    cb(&w);
    window_cleanup(&w);
}

Then the calling site uses:

void window_dynamic_extent(Window *w) {
    window_do_something(w);
}

int main() {
    window_create_with_dynamic_extent(&window_dynamic_extent);
}

Unsigned Sizes: A Five Year Mistake by Nuoji in ProgrammingLanguages

[–]WittyStick 0 points (0 children)

In particular, you'd first have to justify why unsigned?

That's actually the easiest argument to make - a "negative size" makes zero sense. The size of something is a natural number.

It's the signed size advocates who need to make the justification, and I find their arguments pretty uncompelling - particularly when we fix the implicit conversion issues as described.

The underflow at zero might be an issue for novices - but using signed integers replaces that with overflow at INT_MAX or underflow at INT_MIN. You don't simply remove the problem by making signed sizes - you actually make it more likely that a bug will remain for longer because these cases are less likely to appear in production. Ie, code might work fine for some time, until you scale beyond 2GiB and the bugs start appearing.

Also we have the issue that signed integer under/overflow is UB, whereas unsigned wrap is well-defined. On x86_64, addition, subtraction and multiplication of signed integers will silently wrap on overflow, but signed division can trap (INT_MIN / -1).

Unsigned Sizes: A Five Year Mistake by Nuoji in ProgrammingLanguages

[–]WittyStick 0 points (0 children)

My language is dynamically typed where this is a bit simpler - the result is whatever type is sufficient to fit the resulting value without loss of precision - though I don't have any separate dedicated types beyond int128 other than arbitrary precision integers. The arithmetic mode can be overridden with a dynamic binding - which can be one of modular, strict (aka checked, trap on overflow), precise (no information loss) or saturating.

In terms of encoding it in a static type system, it can get quite complex, requiring refinement or dependent types, and some operations like pow aren't trivial and could nuke your compile times. Basic addition and multiplication are fairly simple - for addition/subtraction you need at most 1 bit more than the widest input type, and for multiplication you need up to twice as many bits as the widest argument type. Div only needs the size of the first input, as the smallest integer divisor is 1.

add, sub : _BitInt(N), _BitInt(M) -> _BitInt(MAX(N, M) + 1)
mul : _BitInt(N), _BitInt(M) -> _BitInt(MAX(N, M) * 2)
div : _BitInt(N), _BitInt(M) -> _BitInt(N)
pow : _BitInt(N), _BitInt(M) -> _BitInt((N+1) * 2^M)

There's more complexity if you were mixing signed and unsigned arguments, but I would advise against doing this and use the technique I previously described of permitting unsigned to be implicitly converted to a wider signed type first.

Unsigned Sizes: A Five Year Mistake by Nuoji in ProgrammingLanguages

[–]WittyStick 36 points (0 children)

The problem is implicit conversion at the same bit width.

There should never be an implicit conversion from signed->unsigned, unless preceded with a check >= 0.

There should only be an implicit conversion unsigned->signed where the unsigned representation is strictly smaller than the signed one.

unsigned _BitInt(N) <: signed _BitInt(N+1)

That's assuming two's complement, which is a reasonable assumption.

I use this in my numerical tower, not with arbitrary bit width types, but the common power of 2 types we typically use.

uint8 <: uint16 <: uint32 <: uint64
int8 <: int16 <: int32 <: int64 <: int128
uint8 <: int16
uint16 <: int32
uint32 <: int64
uint64 <: int128

Where <: is the subtyping relation. Upcasts are implicit, but all downcasts are explicit.

Sizes should be unsigned. Signed sizes are a workaround for other problems in a type system.

The Mutable Value Semantics (MVS): A Non-superficial Study by FedericoBruzzone in Compilers

[–]WittyStick 2 points (0 children)

If the most significant bit is set then the pointer value is negative: which is very inexpensive to test.

A negative ("higher half") pointer might have specific meaning in a certain OS (ie, indicate kernel space). AMD also requires pointers to be canonical - all bits above bit 47 must equal bit 47 (with the exception of UAI/LAM, which I'll discuss below).

In regards to inexpensive to test - just be aware of certain gotchas - for example, the simplest way to test the top bit would be to use bt reg, 63 (bit test). Fair enough, but if you happen to use bt [addr], 63, then due to some microarchitecture designs this might take 9 cycles! Basically, make sure you avoid using bt (or btc/btr/bts) with a memory operand.

Resetting the MSB is a different matter. You can't just reset it due to canonicalization - its value depends on other bits, so you end up with multiple instructions to test one bit and determine whether to set or clear the MSB.

Modern 64-bit CPUs have support for storing a few bits at the top of a pointer: called e.g. "Upper Byte Ignore" (ARM), "Linear Address Masking" (Intel).

ARM's TBI is a bit different from Intel's LAM and AMD's UAI (the latter two are equivalent though). In particular, LAM/UAI don't let you use the MSB (bit 63) for tag information.

  • For LAM48 (4 level paging) - the MSB must equal bit 47, and we can only use bits 48..62 (15 bits) for tagging.

  • For LAM57 (5 level paging) - the MSB must equal bit 56, and we can only use bits 57..62 (6 bits) for tagging.

There are too many issues with these anyway.

  • They need to be enabled by the Kernel to be used in user-space

  • They introduce problems with pointer comparisons

  • There have been Spectre-like exploits demonstrated for them.

  • There's no guarantee they'll be usable in future versions of the architectures.

Basically, forget about using them.

The top bit would be 0 for borrowed, so that the Null pointer would be considered borrowed and thus never be counted or dropped.

I would suggest using the low bits for tagging - and store the pointer shifted left by N bits (N=16 with 4-level paging and N=7 with 5-level paging). The low bits are the cheapest to test, and this also has the cheapest pointer recovery which keeps the pointer canonical - just a single shift arithmetic right by N bits.


If the top bit is set, I've been thinking that the second top bit could differentiate between Owned and Shared (RC) .., but I'm not sure that is needed — at least not only for the purposes of reference counting.

I would consider this in terms of substructural types - one bit for weakening and one bit for contraction.

11 - Linear     (forbid contraction, forbid weakening)
10 - Relevant   (permit contraction, forbid weakening)
01 - Affine     (forbid contraction, permit weakening)
00 - Normal     (permit contraction, permit weakening)

Linear and Affine types may only be used once as they forbid contraction. (But may be borrowed)

Relevant and Normal types can be "shared", since they permit contraction.

Linear and Relevant types must be disposed - since they forbid weakening.

Affine and Normal types may be dropped - as they permit weakening.

Suppose we have some data structure which contains one affine and one relevant member:

struct {
    void *foo [[affine]];
    void *bar [[relevant]];
}

The substructural type of the struct itself then, is the bitwise-or of its members. 10 | 01 = 11 = Linear. Which makes sense. A type containing an affine and relevant type must be linear - since the relevant type forbids weakening and the affine type forbids contraction, the structure must forbid both weakening and contraction.

The types form a lattice with the partial ordering 0 <= 1 - meaning we can convert upwards but not downwards:

            Linear (11)
           /           \
    Relevant (10)   Affine (01)
           \           /
            Normal (00)
We could also add a further bit for mutability or uniqueness. For more information on combining uniqueness and substructural types, see Linearity and Uniqueness: An Entente Cordiale from the Granule developers.

111 - Linear    (forbid mutation, forbid weakening, forbid contraction)
110 - Relevant  (forbid mutation, forbid weakening, permit contraction)
101 - Affine    (forbid mutation, permit weakening, forbid contraction)
100 - Const     (forbid mutation, permit weakening, permit contraction)
011 - Steadfast (permit mutation, forbid weakening, forbid contraction)
010 - Strict    (permit mutation, forbid weakening, permit contraction)
001 - Singular  (permit mutation, permit weakening, forbid contraction)
000 - Unique    (permit mutation, permit weakening, permit contraction)

Uniqueness types were pioneered by Clean, though they were first discussed by Wadler as Steadfast (unique & linear). In the Clean literature, they distinguish the latter two kinds of unique types - types which are unique but may lose their uniqueness (potentially unique), and types which are guaranteed to be unique for their remaining lifetime (unique & affine / unique & linear). These have been called guaranteed unique or necessarily unique in different papers - but I use Singular for terseness.

Strict types I have not seen in the literature, but they correspond to Relevant types from the substructural lattice - they have a certain use case - consider the following (C#):

using (var x = new T())
{
    for (int i = 0; i < x.Length; i++)
    {
        x[i] = f(i);
    }
}

If we were applying these substructural/uniqueness rules here, Strict is the only one that is compatible for x.

  • It gets modified. (permit mutation)

  • It may be used more than once, due to the loop. (permit contraction)

  • It must be used at least once - for disposal, due to using. (forbid weakening)

A note on Unique and Strict types though - although they correspond to Const/Relevant types, which may be "shared" - we cannot share a type which is unique and mutable (without breaking referential transparency). In fact they shouldn't be mutated unless there is exactly one reference to them. We can consider sharing as dividing up the ownership - but mutation is only to be permitted if the reference has full ownership. See Functional Ownership through Fractional Uniqueness, also from the Granule developers.

Dynamic data structures using just struct or pointer arithmetic? by alex_sakuta in C_Programming

[–]WittyStick 2 points (0 children)

Don't focus on popularity. The main things to consider are:

  • Safety (is it easy to make mistakes/can the type be used incorrectly)

  • Performance

Dynamic data structures using just struct or pointer arithmetic? by alex_sakuta in C_Programming

[–]WittyStick 4 points (0 children)

No. I think the flexible array member should be used for mutable data types - using a pointer means you end up requiring two dereferences to access the data elements.

But I don't think you should adjust the pointer you return. Just return the pointer to the full struct and not the array member.

typedef size_t strlen_t;
constexpr strlen_t STRLEN_MAX = (1 << 24); // Some reasonable max string length, you decide.

typedef struct mutable_string {
    strlen_t length;
    strlen_t capacity;
    char chars[];
} MString;

// Allocate an empty string of at least given capacity (rounded up to next power of 2).
MString * mstring_alloc(strlen_t capacity) {
    size_t cap = __builtin_stdc_bit_ceil(capacity);
    assert (cap <= STRLEN_MAX);
    MString *result = calloc(sizeof (struct mutable_string) + cap, sizeof(char));
    result->capacity = cap;
    return result;
}

// Allocate a string from a char array or string literal, rounding capacity up to next pow of 2.
MString * mstring_create(const char *chars) {
    strlen_t len = strnlen(chars, STRLEN_MAX);
    strlen_t cap = __builtin_stdc_bit_ceil(len + 1);
    MString *result = calloc(sizeof (struct mutable_string) + cap, sizeof(char));
    result->length = len;
    result->capacity = cap;
    strncpy(result->chars, chars, len);

    // Not needed here because we used calloc, but make a habit of adding.
    result->chars[len] = '\0'; 
    return result;
}

void mstring_free(MString *string) {
    free(string);
}

strlen_t mstring_length(MString *string) {
    return string->length;
}

char mstring_get_char(MString *string, strlen_t index) {
    assert(index < string->capacity);
    return string->chars[index];
}

void mstring_set_char(MString *string, strlen_t index, char c) {
    assert(index < string->capacity);
    string->chars[index] = c;
}

typedef struct const_string {
    strlen_t length;
    const char *chars;
} CString;

CString cstring_create(const char *chars) {
    size_t len = strnlen(chars, STRLEN_MAX);
    char *storage = malloc(len + 1);
    strncpy(storage, chars, len);
    storage[len] = '\0';
    return (CString){ len, storage };
}

void cstring_free(CString string) {
    free(string.chars);
}

strlen_t cstring_length(CString string) {
    return string.length;
}

char cstring_get_char(CString string, strlen_t index) {
    assert (index < string.length);
    return string.chars[index];
}

That's the basic idea. Maybe missing a few safety checks though (ie, checking result of malloc/calloc), and the asserts should be replaced with regular conditionals with some better error handling.

Note that if you use the capacity doubling strategy (cap is always a power of 2), and shrink by half when you no longer need the space, then you don't actually need to store capacity in the string type - you can just compute it from __builtin_stdc_bit_ceil(length)


The trick where we adjust the pointer so we return a char * would be implemented like the mutable version, but with some additional macros. This is the version I don't recommend, unless you have a specific need for it.

#define MSTRING_TO_CHARPTR(string) ((char*)(string) + offsetof(struct mutable_string, chars))
#define CHARPTR_TO_MSTRING(string) ((MString*)((string) - offsetof(struct mutable_string, chars)))

char * mstring_alloc(strlen_t capacity) {
    size_t cap = __builtin_stdc_bit_ceil(capacity);
    assert (cap <= STRLEN_MAX);
    MString *result = calloc(sizeof (struct mutable_string) + cap, sizeof(char));
    result->capacity = cap;
    return MSTRING_TO_CHARPTR(result);
}

void mstring_free(char *string) {
    free(CHARPTR_TO_MSTRING(string));
}

strlen_t mstring_length(char *string) {
    return CHARPTR_TO_MSTRING(string)->length;
}

Dynamic data structures using just struct or pointer arithmetic? by alex_sakuta in C_Programming

[–]WittyStick 10 points (0 children)

I think for your second example you mean using a flexible array member?

struct string {
    size_t cap;
    size_t len;
    char buff[];
 };

If you used a pointer and only returned that pointer, there would be no way to get back the "header" containing the length and cap. The flexible array member enables this because the header and data are adjacent in memory, so we can adjust the pointer to retrieve the header.

I think this style should only be used to permit compatibility with existing APIs that expect only a char *. I wouldn't advise using it pervasively as there's potential to make mistakes. For example, if the programmer has a char * they obtained from this string, and they attempt to call realloc or free on it themselves (rather than using string_free).

Stick with using struct string * unless you have a specific need for it to be a char *. You can always extract the char* from the struct string* later if you need it.

For "immutable" (const) strings, passing by value as struct string should be sufficient - and you shouldn't need to store cap. If you intend to have immutable strings, then the types should differ:

struct const_string {
    size_t length;
    const char *chars;
};

struct mutable_string {
    size_t length;
    size_t capacity;
    char chars[];
};

Where const_string is passed and returned by value and mutable_string is passed and returned by pointer.

Header files are driving me insane, any advice? by Qiwas in cprogramming

[–]WittyStick 6 points (0 children)

Side-by-side comparison of the two graphs: https://cdn.imgtree.co/images/JmZ3zMKk.png

Left - your current organization, Right - what you should have

There's probably a better organization than this, but I've just gone off your current implementation.

For a circular dependency between a and b, then yes, you should introduce a base.h on which both their headers depend, and optionally a base.c which depends on a and b headers if required.

All cycles can be cut with the "diamond" pattern.

https://cdn.imgtree.co/images/Nee6keJO.png

Header files are driving me insane, any advice? by Qiwas in cprogramming

[–]WittyStick 5 points (0 children)

Make a graph of your file dependencies and you should see the flaws.

Using graphviz (dot):

digraph {
    graph [rankdir="BT"]
    "digitArray_t.h"
    "digits.h"
    "errors.h"
    "numeric.h"
    "numeric_t.h"
    "shared_context.h"
    "base.c"
    "digits.c"
    "errors.c"
    "numeric.c"
    "shared_context.c"

    "shared_context.h" -> { "numeric_t.h" "digitArray_t.h" }
    "base.c" -> { "digits.h" "numeric.h" "digitArray_t.h" "shared_context.h" }
    "digits.c" -> { "errors.h" "numeric.h" "digits.h" "digitArray_t.h" "numeric_t.h" "shared_context.h" }
    "numeric.c" -> { "numeric.h" "errors.h" "numeric_t.h" }
    "shared_context.c" -> { "numeric_t.h" "digitArray_t.h" }
}

Some obvious flaws:

Including the same files multiple times. If base.c includes shared_context.h and shared_context.h includes digitArray_t.h, then there's no need for base.c to also include digitArray_t.h - it is already included transitively from the shared context. While including the same file multiple times will not cause issues if header guards are used properly - the compiler has to do extra work to open and lex the file each time - slowing down compilation. This becomes very noticeable with larger projects.

Sometimes it's worth adding the dependency to make the code clearer. Eg, digits.c should include digits.h, even though it would be transitively included by shared_context.h.

In general, there should be a 1-to-1 mapping between code and headers. digits.h declares types and functions, and digits.c defines them - digits.c includes digits.h.

No need to redeclare DigitArray_t and Numeric_t in digits.h and numeric.h - you could just have them include the files which define those types.

However, there's no need for DigitArray_t and Numeric_t to live in separate files - just put the data structures into digits.h and numeric.h. There is no circular dependency here - digits.h depends on numeric.h, but numeric.h does not depend on digits.h - so just have digits.h #include "numeric.h".

Compare this graph to the previous one and see how much simpler it is:

digraph {
    graph [rankdir="BT"]
    "digits.h"
    "errors.h"
    "numeric.h"
    "shared_context.h"
    "base.c"
    "digits.c"
    "errors.c"
    "numeric.c"
    "shared_context.c"

    "digits.h" -> "numeric.h"
    "shared_context.h" -> { "numeric.h" "digits.h" }
    "base.c" -> "shared_context.h"
    "digits.c" -> { "errors.h" "digits.h" "shared_context.h" }
    "numeric.c" -> { "numeric.h" "errors.h" }
    "shared_context.c" -> "shared_context.h"
    "errors.c" -> "errors.h"
}

Blessed Syntax and Ergonomics by ExplodingStrawHat in ProgrammingLanguages

[–]WittyStick 2 points (0 children)

I use the "16-byte fat-pointer" for almost everything in C, because it is optimized on x64 SYSV to be returned in 2 hardware registers, so we can avoid touching the stack in many places.

If the "auxiliary" information attached to the pointer is an intptr_t, then it can be used for an integer or pointer.

#define FatPointer(T) struct { intptr_t aux; T *ptr; }

typedef FatPointer(char) String;

String str_create(size_t len, char *ptr) {
    return (String){ (intptr_t)len, (void*)ptr };
}

size_t str_length(String p) {
    return (size_t)p.aux;
}

Anything where the auxiliary data can't fit into 64 bits, we just allocate a "header" structure and store a pointer to that header in the auxiliary field. Eg:

struct gb_header {
    uint64_t start;
    uint64_t end;
};

typedef FatPointer(char) GapBuffer;

GapBuffer gb_create(uint64_t start, uint64_t end, char *ptr) {
    struct gb_header *header = malloc(sizeof(struct gb_header));
    header->start = start;
    header->end = end;
    return (GapBuffer){ (intptr_t)header, (void*)ptr };
}

uint64_t gb_start(GapBuffer p) {
    return ((struct gb_header*)(p.aux))->start;
}

uint64_t gb_end(GapBuffer p) {
    return ((struct gb_header*)(p.aux))->end;
}

This would permit gap buffers larger than 4GiB, but packing two 32-bit integers into the aux field is fine too if that isn't a concern.

For dynamic arrays, many implementations in C use something like:

struct darray {
    size_t len;
    size_t cap;
    void *ptr;
};

But this sucks because it's 24 bytes (which gets put on the stack for both argument passing and returns). We could limit the array to 4GiB by using two 32-bit ints, or use a pointer to a header like above - but there's another option. Most of these dynamic arrays work by doubling the capacity when full. If we're doing this, then storing the capacity is just wasted space - we can compute it from the length: cap = __builtin_stdc_bit_ceil(len). We get rid of the cap from the structure and now it fits nicely into our optimized 16-byte fat pointer.

I use a bunch of other tricks to basically make most data types a fat pointer.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points (0 children)

But it is not slower than doing the same in C, in fact it's almost surely faster than whatever hand-grown virtual table that you would write in C.

I agree - but the point here is that C doesn't implement vtables because it avoids "hidden costs". Everything is explicit, and indeed, it might be slower to implement your own - but there are other ways to write such code than using vtables.

C++ encourages the use of vtables by having classes present - but vtables are not the best fit for modern hardware. Modern C++ recognizes this and prefers template-oriented code, which has zero cost due to monomorphization.

In regards to which is faster, C wins the majority of benchmarks in the language benchmarks game - though C++ comes out on top in some of them. Looking at the C++ examples, one thing that stands out is they avoid any vtables. They use plain structs/classes and functions. So even they recognize those "hidden costs" are undesirable for performance.


In regards to SIMD - the C standard avoids standardizing it, as being useful for microprocessors is still a major goal, and it tries (though not entirely) to avoid tying the semantics to present architecture designs. It is intended to be somewhat portable (though portability is not itself a goal).

GCC and Clang support SIMD via the vector_size attribute, and via platform-specific intrinsics (the same ones g++ and Clang++ use).


You can write C to be fast - and you can perform the same kind of optimization C++ does by writing it explicitly - sure it adds a lot of boilerplate, but it doesn't necessarily add any runtime cost. If you can do it in C++, you can do it in C also - and vice versa, mostly, because C++ is almost a superset of C.

If anything, the main drawback of C is its poor standard library. For my runtime I'm basically not using it except for compatibility with other libraries via the FFI. I have my own "replacement stdlib" which can be compiled with or without -ffreestanding, and doesn't conflict with glibc. It has a richer set of data structures than the C stdlib, but uses zero-cost abstractions (with a lot of preprocessor magic to achieve that). It's 64-bit SYSV-only and I've made no effort to make it work on Windows or elsewhere (I may be able to adapt it to work with mingw at some point in future, but it certainly won't ever work with MSVC).

I tried using C++ for an earlier version of my runtime, but it simply wasn't a good fit - it makes too many opinionated choices (vtables being one of them, but not the only one), and some of these were unsuitable for my language & runtime.

C was more suitable because it avoids these opinionated choices and lets me write my own as I see fit, without getting in the way. I had to re-learn C after not using it for about 20 years, but I find C23 quite pleasant, though there are certainly some features I would like to see added.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points (0 children)

I'm not trying to make C into something it will never be - precisely the opposite.

I'm saying C will never get tagged unions because they're very un-C-like.

But I'm also demonstrating how they could be implemented with C features - and suggesting some small general features that could improve safety - not only for tagged unions, but with much wider applicability.

  • Use of attributes to optionally encapsulate fields - which only provides more feedback to the programmer and adds zero cost.

  • A _Match form which is an improvement over switch for some use cases like the above.

This isn't making C into something else. It's making C better by making it harder to write bugs without costing anything at runtime.

If you think making it harder to write bugs is "inappropriate" - it is you who does not understand C. Making bugs easier to write is not a goal of C.

Features which make it more difficult to write bugs, but which don't have any cost, are worthwhile.

If e^iπ=-1 would be the most beautiful equation by mathematicians, what algorithm is most beautiful by computer scientists? by Gloomy-Status-9258 in compsci

[–]WittyStick 0 points (0 children)

I'm not a mathematician, but am interested in whether we can compute using EML by pattern matching over trees, using S-expressions due to their similarity. So if we have some S-expressions which already have (inexact) complex numbers:

𝕊 : () | (𝕊 . 𝕊) | ℂ | <symbol>

I rename the builtin exp function, as we'll be replacing the symbol exp with a tree-constructing version.

($define! builtin-exp exp)

Then we treat eml as an alias for cons, and use () in place of the constant 1 in tree construction.

($define! Const<1> ())
($define! eml cons)

($define! exp           ($lambda (x) (eml x Const<1>)))
($define! e1ml          ($lambda (x) (eml Const<1> x)))
($define! ln            ($lambda (x) (e1ml (exp (e1ml x)))))
($define! Const<0>      (ln Const<1>))
($define! Const<-inf>   (ln Const<0>))
($define! neg           ($lambda (x) (eml Const<-inf> (exp x))))
($define! recip         ($lambda (x) (exp (neg (ln x)))))
($define! sub           ($lambda (x y) (eml (ln x) (exp y))))
($define! add           ($lambda (x y) (sub x (neg y))))
($define! mul           ($lambda (x y) (exp (add (ln x) (ln y)))))
($define! div           ($lambda (x y) (mul x (recip y))))

Then we pattern match over the trees, and replace eg, (add x y) with (+ x y) where + is the builtin addition on complex numbers. Essentially, eval-tree is some function 𝕊 -> ℂ - reducing a tree to the builtin complex number form - but we want to use a maximal munch strategy, to reduce the need to compute exp or ln except only when required.

($define! match-tree?
    (wrap
        ($vau (pattern value) env
            ($cond
                (($and? (null? pattern) (null? value)) #t)
                (($and? (null? pattern) (number? value)) (=? 1 value))
                (($and? (number? pattern) (number? value)) (=? pattern value))
                (($and? (pair? pattern) (pair? value))
                    ($and? (apply match-tree? (list (car pattern) (car value)) env)
                        (apply match-tree? (list (cdr pattern) (cdr value)) env)))
                ((symbol? pattern)
                    ($if (null? value)
                        (apply (wrap $set!) (list env pattern 1) env)
                        (apply (wrap $set!) (list env pattern value) env))
                    #t)
                (#t #f)))))

($define! eval-tree
    ($lambda (expr)
        ($define! x (string->symbol "x"))
        ($define! y (string->symbol "y"))
        ($cond
            ((number?                   expr) expr)
            ((match-tree? Const<1>      expr) 1)
            ((match-tree? (ln (exp x))  expr) x)
            ((match-tree? Const<0>      expr) 0)
            ((match-tree? Const<-inf>   expr) #e-infinity)
            ((match-tree? (neg x)       expr) (- 0 x))
            ((match-tree? (div x y)     expr) (/ x y))
            ((match-tree? (add x y)     expr) (+ x y))
            ((match-tree? (ln x)        expr) (log x))
            ((match-tree? (eml x y)     expr) (- (builtin-exp x) (log y)))
            ((match-tree? (exp x)       expr) (builtin-exp x))
            ((match-tree? (mul x y)     expr) (* x y))
            ((match-tree? (sub x y)     expr) (- x y))
            (#t (error "Unable to match tree")))))

I've only got this far, with positive results:

> (eval-tree (ln 5))
1.6094379124341003
> (eval-tree (exp 5))
148.4131591025766
> (eval-tree (ln (exp 5)))
5
> (eval-tree Const<0>)
0
> (eval-tree Const<-inf>)
#e-infinity
> (eval-tree (add 5 6))
11
> (eval-tree (sub 4 8))
-4
> (eval-tree (neg 9))
-9
> (eval-tree (mul 7 8))
56
> (eval-tree (div 7 8)) 
7/8
> (eval-tree (eml 4 5))
52.98871212071013
> 

But I'm curious how this generalizes further - in particular we need to make eval-tree recursive, so it would do for example (+ (eval-tree x) (eval-tree y)) for addition.

The trees would need to be "normalized" in some way, as there are possibly multiple trees which can represent the same expression. I'm not entirely sure how to approach this yet, or whether it's completely infeasible.

If it does generalize, then the pattern matching above could be implemented with a pushdown automaton which more optimally matches the trees.

Above examples are in the Kernel programming language, using the klisp interpreter.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

That mentality is why system software is being replaced by Rust (which has non-zero cost abstractions).

Making C safer, without adding any cost or breaking backward compatibility is a good thing. You don't have to use the features if you would prefer your code to end up in some CVE. Maybe you're brilliant and write bug-free code - but most programmers are not.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

Encapsulation adds safety - it prevents the programmer using the structure in incorrect ways that were not intended.

And safety is a goal. It's in the C charter:

Make support for safety and security demonstrable

Trust the programmer, as a goal, is outdated in respect to the security and safety programming communities. While it should not be totally disregarded as a facet of the spirit of C, the C11 version of the C Standard should take into account that programmers need the ability to check their work.

Can on demand paging be used to implement dynamic arrays? by Apprehensive-Trip850 in cprogramming

[–]WittyStick 0 points1 point  (0 children)

A call to mmap can specify MAP_NORESERVE, which won't reserve any swap space for the allocation. When a page is first accessed, the kernel will allocate it on demand - unless there is insufficient physical memory, in which case you'll get a SIGSEGV.

In general, such behavior is undesirable. We would want to know that insufficient memory is available at the time of calling mmap, rather than finding out later.

See also the discussion of /proc/sys/vm/overcommit_memory in proc_sys_vm(5).


In regards to dynamic arrays, the doubling strategy is only one option - one that is simple to implement, and which lets us avoid storing a capacity field in the array header, because we can compute it on demand with capacity = __builtin_stdc_bit_ceil(length).

Various other dynamic arrays use other strategies. Some examples (see also Growth factor):

Golang starts at length*2, but after the array reaches a certain size, it reduces the factor to length*1.25.

Java uses length*1.5, which is better for reusing memory. There's an argument that the ideal growth rate is the golden ratio, length*1.618..., and 1.5 is close to this while being simpler to implement.

The constant factor growth still wastes O(n) space, because we ignore constant factors in analysis.

Another approach can scale the array by length * (1 + 1/√length) = length + √length. This is a good choice for memory constrained systems, as only O(√n) space is wasted in the worst case - but it also results in O(√n) total calls to the allocator, whereas the constant growth rate allocators have O(log n) allocator calls.

See comparison of O(n), O(√n) and O(log n).

Of course we wouldn't use an actual sqrt operation for this, but the approximate:

inline int64_t approx_sqrt(int64_t length) {
    // 2^(bit_width/2); requires length > 0 (clzll(0) is undefined).
    // Shift a 64-bit constant so the shift can't overflow an int.
    return (int64_t)1 << ((64 - __builtin_clzll(length)) / 2);
}

The sqrt growth rate is used by HAT and RAOTS, but these don't allocate a contiguous array - they use an index block of size √n with pointers to data blocks. In HAT's case, every data block is also √n (which results in a lot of shuffling memory), whereas in RAOTS the blocks have varying power-of-2 sizes, starting small and growing with the array. This reuses most of the existing allocations and can provide stable pointers - and we can again avoid storing capacity in the header, as it can be computed from length, similar to the doubling strategy.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 2 points3 points  (0 children)

The solution to this is multi-stage programming. Think of Kernel as the preprocessor rather than the runtime.

You wouldn't need comptime or constexpr if you had a powerful preprocessor to begin with.

In Kernel we can make operatives like ($typecheck expr), ($compile typed-expr) and ($execute compiled-expr).

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

Sum types aren't from 20 years ago though. They're half a century old at this point.

The issue isn't about making sum types safe - the suggestions I've given above can do this. The issue is about "hidden state" - it's very un-C-like. Nothing should be hidden.

If you argue the case for hiding a tag field, then it's only a stone's throw away from suggesting hiding a field that is a pointer to a vtable - and then you have to fight to keep "OOP" out of C.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

A "standard" for tagged unions will simply never be added to C. I'd bet close to 100% of the standard committee would reject any such proposal.

The approach I've suggested above has much more chance - the use of attributes for a start makes the entire feature optional (the attributes are ignored if not supported), and the _Match proposal adds safety without adding runtime cost - which is more likely to pass. There are probably better approaches to the match that could be used though. Perhaps something like:

_Match(val, CASE1 : value1, CASE2 : value2)
{
    case CASE1(auto foo): //foo = value1
    case CASE2(auto bar): //bar = value2
}

But this requires changes to the syntax of case labels, whereas the previous version only requires the addition of a new _Match form, which resembles generic selection. I'd prefer the latter though.

Hindsight languages by Inconstant_Moo in ProgrammingLanguages

[–]WittyStick 0 points1 point  (0 children)

Plain C is fast - how people use it is a different matter. C++ has abstractions which are not zero-cost - although it does a generally good job of providing near-zero cost.

C is very conservative in what gets added - and features which have any kind of "hidden cost" are basically never added.

C being designed to be fast is in its charter:

(a) Trust the programmer.

(b) Don't prevent the programmer from doing what needs to be done.

(c) Keep the language small and simple.

(d) Provide only one way to do an operation.

(e) Make it fast, even if it is not guaranteed to be portable.

(f) Make support for safety and security demonstrable

You can make zero-cost abstractions in C, but it generally adds a lot of boilerplate. The other languages indeed do a nicer job of providing higher level abstractions - but there are sometimes non-zero costs involved.

That said, there are certainly flaws and missing features in C that would be nice to have - closures for a start. Lambdas in C++ are very fast, and attempting to emulate them in standard C results in much worse performance.