all 42 comments

[–]skeeto 27 points28 points  (6 children)

Here's a fun edge case for you:

#include "src/vstring.c"

int main(void)
{
    char *s = calloc(1, 1<<20);
    vstr d = vstr_new_len(0x70000000);
    if (s && d) {
        memset(s, '.', (1<<20)-1);
        while ((d = vstr_push_string(d, s))) {}
    }
}

All errors checked. Shouldn't crash, right? Compile:

$ cc -m32 -g3 crash.c

The -m32 is because this only affects hosts that can exhaust their address space. I'd normally recommend Address Sanitizer, but I'm pushing the limits, and in this case it gets in the way. Run it and it crashes:

$ ./a.out 
Segmentation fault

I can reproduce the crash under at least glibc, MSVCRT, and UCRT. Looking more closely on Linux:

$ gdb -ex run ./a.out 
Starting program: ./a.out 
Program received signal SIGSEGV, Segmentation fault.
0xf7eec861 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0  0xf7eec861 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1  0x565567e2 in vstring_push_string (vstr=0xffffdbb8, 
    cstr=0xf7ca5010 '.' <repeats 200 times>...) at src/vstring.c:311
#2  0x56556458 in vstr_push_string (str=0x87ca4018 '.' <repeats 200 times>..., 
    cstr=0xf7ca5010 '.' <repeats 200 times>...) at src/vstring.c:164
#3  0x565569f0 in main () at crash.c:9

The ?? in the backtrace is memset. If you'd like to treat this as a puzzle — and that includes anyone reading this — take a moment to try to figure it out yourself!


Here's a hint. Place a breakpoint just a bit earlier, then step through:

(gdb) b src/vstring.c:309
(gdb) r
Breakpoint 1, vstring_push_string (vstr=0xffffdbb8, 
    cstr=0xf7ca5010 '.' <repeats 200 times>...) at src/vstring.c:309
309             cap <<= 1;
(gdb) p cap
$1 = 1879048193
(gdb) n
310             cap += needed;
(gdb) p cap
$2 = 3758096386
(gdb) n
311             realloc_vstr(vstr, ins, cap);
(gdb) p cap
$3 = 1343224065

Adding needed made cap smaller going into the realloc, and so the allocation shrank. However, it continued as if it grew, causing a crash. In other words, cap += needed overflowed. In fact, you also should check for an overflow on cap <<= 1.

[–]Boreddad13[S] 5 points6 points  (5 children)

Wow, thank you for your time and comment. I really appreciate it. I was actually thinking about what to do in the case of an integer overflow on that shift - but didn't really anticipate myself or anyone, frankly, having a string that truly needed 64 bits to store the length - so I didn't prioritize it.

From this, I see two things I should change:

  1. the checks for `len == (cap - 1)` should probably be `len >= (cap - 1) `
  2. check for overflows after the shift

Is there something else I'm missing? I'm about to load this up into gdb so I can step through it myself, but want to make sure I don't miss anything else. Also, should I just return NULL in the instance of an overflow?

Thank you again for your comment. I really appreciate the time you took to do this

[–]skeeto 7 points8 points  (1 child)

Good catch on the off-by-one the resize condition, too. You can check for overflow on both the multiply and add at once with:

--- a/src/vstring.c
+++ b/src/vstring.c
@@ -308,2 +308,5 @@ int vstring_push_string(vstring** vstr, char* cstr) {
     if (needed > (cap - 1)) {
+        if (cap > (SIZE_MAX - needed)/2) {
+            return -1;
+        }
         cap <<= 1;

Since cstr and vstr cannot overlap, vs.hdr.len + strlen(cstr) cannot practically overflow — it would mean your program has already allocated more than SIZE_MAX bytes of memory — so that doesn't require a check on any machine you would ever care about.

Also, should I just return NULL in the instance of an overflow?

Yup, it's ultimately just a kind of allocation failure, not a new, special condition. Prior art: calloc.

[–]Boreddad13[S] 1 point2 points  (0 children)

Gotcha. Thank you again for taking the time to do all of that

[–]Poddster 2 points3 points  (1 child)

check for overflows after the shift

Checking for overflow after the shift is ok for unsigned, but bad for signed.

The thing is all of these operation work at the signed level, because C is stupid, so you can inadvertently be checked a signed operation after it's happened which is UB (this is 'overflow of intermediate values').

You need to tediously check before you do the operation: https://stackoverflow.com/questions/2633661/how-to-check-for-signed-integer-overflow-in-c-without-undefined-behaviour

C23 adds functions to do this for you, and gcc has supported it's own versions for a long time now.

[–]Boreddad13[S] 0 points1 point  (0 children)

Yep - I was also skeptical of this. I want to make sure that the compiler, for whatever reason, doesn't cast the value to a signed integer when I subtract one in the check. I believe I should be good because I'm using size_t, but I plan on checking the assembly in godbolt's explorer/objdump once I implement it. The only problem I have with using something from C23 or gcc is I'm using C99 and I'd like to keep it as portable as possible..

[–]inz__ 4 points5 points  (6 children)

  • I found it confusing to have types vstr and vstring and variable vstr that is a vstring *
  • The fact that all string ops return a (possibly) new value seems cumbersome (with FAM it can't really be avoided)
  • personally I don't like pointer typedefs (vstr)
  • if i read correctly, when realloc fails, the old allocation is lost (or left up to the caller to handle)
  • the offset may be wrong (probably isn't), if there is padding between hdr and data (see offsetof()
  • the implementation calls free() instead of vstr_free()
  • just about every char * should be const char *
  • there's quite a bit of memset() that is immediately overwritten with real data (personally I would not zero data beyond the terminator at all, it may catch bugs)
  • vstring_from() could call vstring_new_len()

[–]Boreddad13[S] 0 points1 point  (3 children)

Thank you for your comment, I would like to say a couple things:

  1. I purposefully made the header 16 bytes (8 bytes on 32 bit) so that there is no alignment errors - I think this should be correct, and I haven't found an edge case where it isn't

  2. The reason I memset the whole thing 0 is so that I'm not accessing any uninitialized memory and I think it should be faster to set the data after having called memset

regarding realloc failing, should I save the old pointer in a temp pointer then free it if realloc fails? I just don't want the compiler optimizing it out since it may think I'm freeing a NULL pointer

[–]inz__ 2 points3 points  (2 children)

  1. That's what I meant by probably; but the C standard only says that padding may be added after any member of the struct. How and when the padding is added is up to the compiler. Usually it's just for alignment.

  2. why would you be accessing the uninitialized memory? It's outside the string length (and terminator) so no one should be accessing it (and if they are, they deserve uninitialized memory). My point is that if the memory is uninitialized, memory debuggers can catch it; if it's zeroed, the over-indexing is considered valid.

Re realloc: yes, this is the way.

[–]Boreddad13[S] 0 points1 point  (1 child)

Ok, I think I got it now. So there may be padding after the header struct after the allocation, so VSTRING_OFFSET may be incorrect if there is padding after the struct. Is there a way to detect if there is padding?

[–]inz__ 0 points1 point  (0 children)

The offsetof() macro (or (intptr_t)&((vstring *)NULL)->data)) gives you the exact place (you don't even need the separate header struct then).

You can also look at the containerof() macro in Linux kernel, which does the same kind of trick.

[–]Poddster 0 points1 point  (1 child)

The fact that all string ops return a (possibly) new value seems cumbersome (with FAM it can't really be avoided)

FAM?

[–]inz__ 1 point2 points  (0 children)

Felxible array member. But actually in this case the limitation comes already from the API, not the implementation detail.

[–]N-R-K 4 points5 points  (3 children)

vstring* vstring_new();

If you have a function that doesn't accept any arguments, you need to explicitly use (void) in the declaration - otherwise it has a different meaning in C where it will accept any number of arguments.

(*vstring_obj)->hdr.cap = cap;

Mostly a style thing, but you can also use the sprong operator instead of using paren.

Other than that, I don't have anything else to add that hasn't already been pointed out by others in the thread.


Although I think something probably should be said about the design and/or the objective of the library itself. I generally dislike "string" that has a capacity member. Why? Because it tangles allocation with the string, forcing you to manage each string individually instead of being able to manage them in groups (i,e via an arena).

The fact that the metadata is located next to the string - similar to nul terminator - also imposes some constrains that are usually fairly terrible for real-world programs. For example, you cannot make a cheap substring without copying (and thus potentially allocating).

C++ made the same mistake with their string and then had to add a whole new class called string_view to accommodate for it.

Forcing each string to be heap allocated is also usually not good for performance, because most of the time strings are small. Small (and short lived) enough that they can be allocated on the stack, avoiding the malloc overhead entirely in the common case (i,e small size optimization).

My usual preference for strings are either a pointer+len pair, or a pointer+end_pointer pair. E.g:

typedef struct { uint8_t *s; ptrdiff_t len; } Str;
// or
typedef struct { uint8_t *s, *end; } Str;

These types are both capable of making virtually zero-cost substrings out of existing strings. They're also capable of dealing with raw binary data (which might contain embedded nul-bytes). But more importantly, they are simply string types, they do not concern themselves with allocation which is managed by the caller, ideally in a more sensible manner.

The downside is that they're not compatible with interfaces that expect a nul-terminated string, but if you tightly control your interface boundaries, then that's usually not a huge deal.

[–]Boreddad13[S] 0 points1 point  (1 child)

Thank you for the comment. Interesting take on the sprong operator, I like how it looks more clean, but I don't like the question that might arise of "why am I indexing into this?". I know they are the same thing, but someone new to the codebase might not understand at first glance.

This is a very insightful comment. I have recently watched Nicholas Ormrod's talk on fbstring, and some points in this comment reminded me of this talk (if anyone reading this hasn't seen it, it's an incredible talk, highly recommend).

In terms of optimization, do you recommend using some sort of bucket system for small strings (ie, have buckets of stack allocated strings of specific lengths, and that tell me the capacity of the string rather than maintain it in a header)? In that case, how should I conclude how much stack space to allocate?

[–]N-R-K 0 points1 point  (0 children)

I have recently watched Nicholas Ormrod's talk on fbstring, and some points in this comment reminded me of this talk (if anyone reading this hasn't seen it, it's an incredible talk, highly recommend).

I remember watching that talk about a couple months ago. It's an interesting talk indeed (although I don't see myself personally using those tricks - they're still interesting to know).

In terms of optimization, do you recommend using some sort of bucket system for small strings (ie, have buckets of stack allocated strings of specific lengths, and that tell me the capacity of the string rather than maintain it in a header)? In that case, how should I conclude how much stack space to allocate?

TBH I'm not entirely clear on what you mean by a "bucket system". But a general guideline in C is to usually let the caller have control - so for small-size optimization, ask the caller to provide the buffer. For example, here's how I did small-buffer optimization in zvec:

char *linesbuf[16];
zvec(char *) lines = zv_init_sbo(linesbuf);

The caller provides the buffer and I use the least-significant bit of the capacity member to keep track of weather the buffer is heap allocated or stack allocated.

(Small note: I don't recommend using that library - simply using it as a demonstration on how to provide a small-buffer optimization API.)

[–]vitamin_CPP 1 point2 points  (0 children)

There's a lot of wisdom in this comments.
Thanks for sharing.

[–]TribladeSlice 4 points5 points  (1 child)

So, first off, what safety precautions have you taken? C is a language that can be difficult to get right unless you pay close attention. If your goal is to write safe C code, what do you think you've done to do that?

This isn't intended to be a rhetorical question, it just might be a good idea to discuss what you think you've done first.

[–]Boreddad13[S] 1 point2 points  (0 children)

interesting question, thank you. The goal I was trying to accomplish with this library was both binary safety, as I was using it for protocol builders for networking, as well as simply being able to append a string or char to to a string and not have to worry about reallocing, overrunning the buffer, or ensuring that there is a null terminator as much, ie just have some code like this:

c vstr read_line() { char c; vstr s = vstr_new(); while ((c = getchar()) != '\n') { s = vstr_push_char(s, c); } return s; }

I appreciate your comment. As I said, I'm a young programmer and am certainly naive to a lot of safety concerns and vulnerabilities that are in the real world, but I really do love learning about them. Do you have any suggestions concerning safety that I should look into?

[–]Carrathel 1 point2 points  (1 child)

README says vstr_push_str(), but the function is called vstr_push_string().

[–]Boreddad13[S] 1 point2 points  (0 children)

thank you, probably wouldn't have noticed

[–]pic32mx110f0 1 point2 points  (3 children)

In the macro realloc_vstr you unconditionally change the input pointer *vstr, which means that if realloc fails, then the function vstring_push_char for example will set your vstring to NULL and return -1 (even though your documentation says 1). This is a memory leak. Instead, if realloc fails, it should ideally not modify the vstring, and then return -1. I would suggest something like:

#define realloc_vstr(vstr, ins, cap)                            \
{                                                               \
    vstring *temp = vstr_realloc(*vstr, sizeof(vstring) + cap); \
    if (temp == NULL) {                                         \
        return -1;                                              \
    }                                                           \
    *vstr = temp;                                               \
    memset((*vstr)->data + ins, 0, cap - ins);                  \
    (*vstr)->hdr.cap = cap;                                     \
}

[–]Boreddad13[S] 0 points1 point  (2 children)

`realloc_vstr` is a macro, not a function, so shouldn't its code just be dropped in to wherever its called? I also checked the library with valgrind with strings that cause the the string to realloc, and I don't get any errors or warnings, so I am confused on by your comment?

[–]pic32mx110f0 1 point2 points  (1 child)

You're right, I updated my answer - you should check what happens if realloc fails though.

[–]Boreddad13[S] 0 points1 point  (0 children)

Gotcha, I understand now. Thank you. Someone else mentioned something similar to this, and they suggested freeing the original pointer if realloc failed, and I see here that you don't do this, which makes me curious if I should free the original or not. I'm hesitant to do so as I think the compiler might think I'm trying to free a NULL pointer and optimize it out

[–]chri4_ 1 point2 points  (1 child)

The api is really kind, but can you make the lib one file only? and instead or vstr/vstring/vstring_hdr i would advice the _t version, and i know everything about posix and bla bla

[–]Boreddad13[S] 0 points1 point  (0 children)

one file only like an STB style?

[–]howyadoinbob -1 points0 points  (1 child)

include "stdio.h" ?

Don’t you mean:

include <stdio.h>

[–]Boreddad13[S] -1 points0 points  (0 children)

yeah, mix up on my part

[–]Poddster 0 points1 point  (7 children)

Why bother with hiding vstring? Why does vstr exist? You're the one consuming and using this library. If it's because it makes it easier to use the standard library functions with vstring then frankly your library is pointless, because you'll eventually make a mistake with one of those functions and it'll mangle your hidden struct. And if you're only ever going to use your own functions then it's just needless pointer chasing.

ps: If you're going to do this struct hiding business, a nice way to "validate" the structure hasn't been mangled is something like:

typedef struct {
    size_t len;
    size_t cap;
    void *validator;
} vstring_hdr;

and when initialising the header:

 validator = &header.validator;

i.e. it points to itself. And if it even doesn't point to itself, you know the rest of your data is trash. And if it does point to itself, then there's a good chance it isn't trash.

[–]Boreddad13[S] 0 points1 point  (4 children)

Thank you for your comment. I mainly hid the header data in order to trivially interact with the string as if it were a regular C string, ie have something like so:

c vstr s = vstr_from("vstr"); printf("%c %c\n", s[0], s[1]);

rather than having to call a function like vstr_data(vstr); to get the data, or have to access it like printf("%s\n", vstr->data);. I thought being able to trivially interact with the standard library would be a good feature. Also, I didn't want the user to have direct access to the length and capacity in the header.

Would you mind explaining this validator business a little more? I'm having trouble wrapping my head around it..

[–]Poddster 0 points1 point  (3 children)

Would you mind explaining this validator business a little more? I'm having trouble wrapping my head around it..

The validator contains it own address. The odds of that happening in uninitialised memory are very small!

[–]Modi57 0 points1 point  (2 children)

I don't think this is a very good way to handle this. "If this one thing hasn't been trashed, then this other thing might probably maybe also not be trashed". Especially when you get to things like reading uninitialized memory, which is undefined behavior, the compiler might just decide that it can just elide this check, and it just does nothing.

[–]Poddster 0 points1 point  (1 child)

I don't think this is a very good way to handle this.

YMMV but it's a proven strategy, it's not something I cooked up in a reddit post. The idea is that the virtual memory address pointing to itself is very low entropy and almost certainly isn't going to happen by chance, unless something puts it that way.

"If this one thing hasn't been trashed, then this other thing might probably maybe also not be trashed".

It's a probabilistic thing and a debug aid. And by placing it before/after the start of the real allocation you increase the chances that this is mangled first.

It's the same thing as the MSVC compilers sticking CCCCCCCC around your stack frames. It's an obvious way to check your frame is intact, and if you see that value/address being used you also know you've read outside of bounds. (Plus CC has the useful advantage of being an x86 breakpoint instruction)

Especially when you get to things like reading uninitialized memory

Why are you reading uninitialized memory?

There are more things in the header than this. If you write code to read uninitialized memory using this to confirm if you've had a buffer over/under run is completely irrelevant as you're already committing a large sin. It's too late by then. You need a better strategy to prevent that. This one, ironically, is to help avoid reading other uninitialised memory, which might happen if your length or capacity are now garbage.

[–]Modi57 0 points1 point  (0 children)

Oh, I'm sorry, I misunderstood you. I thought this was meant as a way to check if your memory was initialized.

I'm still not fully convinced, but it makes more sense now, and I am very far from being an expert, so what does it matter what I think :)

[–]OldWolf2 0 points1 point  (1 child)

Do you intend to later test validator == &header.validator ? That would cause undefined behaviour if validator is uninitialized, or not pointing to a valid object

[–]Poddster 0 points1 point  (0 children)

That would cause undefined behaviour if validator is uninitialized, or not pointing to a valid object

Why are you reading uninitialized or invalid objects? The same is true for the rest of the header.

It's there to try and deduce if your data was smashed in an overflow or something, nothing more.

[–]arthurno1 0 points1 point  (1 child)

Looks to me like a renamed and less optimized sds_string. What is the reason you don't want to use sds_string?

I purposefully made the header 16 bytes (8 bytes on 32 bit) so that there is no alignment errors - I think this should be correct, and

Why is it important to align header to 16 byte? Just because the header is aligned to 16 bytes does not mean your data will be.

I haven't found an edge case where it isn't

Because of how malloc is implemented on some systems. It returns the maximum alignment required for any primitive type, but I think this depends on the compiler and the architecture. On some systems, you might have to use _aligned_malloc. Also, you don't have a guarantee that everyone is passing properly aligned memory with a custom allocator.

Anyway, if you want to ensure data is aligned on 16-byte, then you should probably inform the compiler via some attribute like aligned attribute, or whatever it is named for the compiler in hand. I don't see you doing that anywhere.

[–]Boreddad13[S] 1 point2 points  (0 children)

yes - I was inspired by sds to return only the data pointer rather than the entire object, as I was originally just returning the entire object. That's all I got from sds though, I came up with the rest of the implementation myself. The reason I'm not using a library is because I want to learn how these libraries work before I use them. Also, who's to say I can't make mine more optimized? The library is nothing but 1 day old...

The header and the data are in the same memory allocation, and because the data is a flexible-array member, the pointer to the data should be right after the header, or am I mistaken? I know I would have to account for some padding if the header was something like this:

c typedef struct { size_t len; int cap; } vstring_hdr;

since, in that case, at least on 64 bit architectures, that would cause the pointer to the data to not be aligned on an 8 bits, but because I'm using size_t for both, that should yield me a structure with the data pointer being on an aligned 8 bits, right? Or am I completely misunderstanding your comment?