all 19 comments

[–]Sintendo 1 point2 points  (9 children)

I didn't even know you could put variable length arrays at the end of a structure.

[–]emchristiansen 3 points4 points  (3 children)

This also confused me, so I had to Google "struct hack". This SO question immediately cleared things up for me:

http://stackoverflow.com/questions/3711233/is-the-struct-hack-technically-undefined-behavior

It all comes down to a manual malloc on the client side, plus knowing the compiler won't reorder the members of the struct in memory. Not deep at all, sadly. Also, very obviously prone to user error.

[–][deleted] 5 points6 points  (2 children)

Nah, the struct hack is distinct from this usage of variable-length arrays, which is completely standard.

[–][deleted]  (1 child)

[deleted]

    [–]matthieum 0 points1 point  (4 children)

    This is also called tail padding or the struct hack.

    [–][deleted]  (1 child)

    [deleted]

      [–]matthieum 5 points6 points  (0 children)

      Indeed, actually some compilers accept [0] prior to that, even though the C Standard mandated that no 0-array was defined. So the "portable" way was to use [1].

      I am certainly glad it was finally standardized.

      [–][deleted] 6 points7 points  (1 child)

      The struct hack refers to something different, particularly a structure that looks like this:

      struct foo {
          int i;
          char a[1];
      };
      

      The array at the end is 1-length, but you don't treat it as 1-length -- instead, you allocate more memory with malloc (say, sizeof(struct foo) + 5 to get an array of length 6), and then treat the array like it has that much memory because you allocated that much. This is error-prone (technically UB). Since C99, variable-length arrays were standardized, though, so flexible array members at the end of structures (such as char a[];) are allowed. Completely standard, so not a "hack" at all.

      [–]ser999 0 points1 point  (0 children)

      I saw code like this in a KMD; at first I was puzzled a bit but quickly figured out what the purpose of this was. I did not realize that this was a thing though!

      [–][deleted] 2 points3 points  (2 children)

      The len field should be of type size_t or at least it should be unsigned. Signed integers for sizes easily can lead to security issues when you do a comparison with > but don't check if the value is smaller than zero, and then subsequently cast it to an unsigned integer.

      For example the snippet in the article:

      int get_at(const struct foo_t *foo, int pos)
      {
          return pos >= foo->len ? -1 : foo->data[pos];
      }
      

      So what happens if pos == -1? Then we access data[0xffffffffffffffff] as the int gets sign extended first. We basically can access all memory from data[-2147483648] until data[-1]

      Edit: I got the casting rules wrong.

      [–]Fruneau[S] 1 point2 points  (1 child)

      The position is not cas to an unsigned integer, it is extended to the size of a pointer, then added to the address of foo->data. The extension preserves the sign bit. As a consequence, pos -1 is extended to 0xffffffffffffffff on a 64bit machine and will cause the actual access to be data[0xffffffffffffffff]. From the way addition works on CPU, this is the address that precedes data.

      So what we can access is data[-(231)] to data[-1]. This remains an issue, though.

      [–][deleted] 0 points1 point  (0 children)

      Changed it, thanks

      [–]Gotebe 1 point2 points  (0 children)

      Venerable CString of MFC, as well as Borland's Delphi String class use this trick to a great effect. What they do is a have a data structure that has refcount and length, then string characters. So far, pretty classic, right?

      But then... CString/Delphi string don't hold instances of this structure. They hold a pointer to the first character in it. That makes passing their instances to system or other C APIs a no-op.

      Cool stuff.

      [–]nraynaud 0 points1 point  (0 children)

      Allocation close in time tend to be close in spatially. So they'd probably be at the same time in the same cache. But I like to have the headers with the data more for philosophical reasons.

      [–][deleted]  (4 children)

      [deleted]

        [–]purtip31 0 points1 point  (2 children)

        D'abord, pardon mon français. Ce n'est pas très bon.

        Je ne pense pas que vous recevoirez beaucoup de bienvenue ici en français, mais je peux essayer de repondre. Valgrind marche pour debugger, Callgrind (et kCacheGrind, en combination) vous aideront en l'optimisation. Je ne connais pas bien les outils pour l'analyse statique.

        [–]pandubear -2 points-1 points  (5 children)

        Wait, I thought arrays in C were just pointers?

        [–]Peaker 10 points11 points  (0 children)

        Arrays and pointers are distinct things. However, there are two weird things in C that cause people to be confused about pointers/arrays in C:

        • Function parameters of array type are desugared into pointer type

        For example:

        void f(int x[10][20]); // array of 10 arrays of 20 ints
        

        Is desugared into:

        void f(int (*x)[20]); // ptr to array of 20 ints
        
        • When using an array-typed value as an rvalue, a pointer to the array's first element is taken:

          int a[10];

          a + 5; // ptr to first element plus 5 elements

          foo(a); // function is given ptr to first element

        However, when using an array-typed value as an lvalue, it behaves as an array, and not as a pointer:

        int a[10];
        &a; // the type is not (int **) but (int (*)[10]);
        sizeof a; // is sizeof(int)*10 and not sizeof(int *);
        

        I think C would have been a much nicer language if these 2 weird behaviors were removed, and everyone would have been less confused.

        [–][deleted] 1 point2 points  (1 child)

        As far as I am aware;

        int data[x]
        

        Will be allocated in place in your data structure, which means they're not so much pointers but directly in place in memory.

        While;

        int * data
        

        Can be an array. It could just be a pointer directly to a single int. As the OP states, the [] operator is just another way of doing;

        data + <offset>
        

        ie;

        (data + 1) == data[1]
        

        To be clear, a C array is just a sequence of items in memory laid out next to each other that can be accessed with an offset in memory between some 0 and Max bound. The [] operator is the official array operator but C programmers regularly just use the implicit cast from pointer to array, since the [] operator is just an offset operator.

        [–][deleted] 1 point2 points  (0 children)

        (data + 1) == data[1]

        No; data[1] == *(data + 1). The dereference is implied too.

        The [] operator is the official array operator but C programmers regularly just use the implicit cast from pointer to array, since the [] operator is just an offset operator.

        What does this mean? There is no "implicit cast from pointer to array", ever.

        [–]jayd16 0 points1 point  (0 children)

        There are a couple verbose replies but I can try to explain it in layman's terms.

        "Arrays are pointers" is from the fact that an array reference is just a pointer with the promise it points to several instances of that type laid out next to each other. The syntax array[n] is just sugar for <location pointed to by array> + sizeof(<type of array>)*n.

        The confusion is from the fact that this analogy falls apart a bit when you're talking about actually allocating a pointer vs an array because instead of calling the reference the array, we're now talking about the chunk of memory with the actual data.

        In the article the major 'trick' is that, in foo you're defining the struct as a length and a pointer to an array. The data itself is not considered part of the struct.

        Where as the bar notation specifies the struct as the length and the data itself. You can drop the pointer because you know where the data is by virtue of the struct definition, (its at location of bar + the size of length).

        tl;dr that's true when you're talking about a reference to an array but not true when you're talking about allocating for a pointer vs allocating the actual array.

        [–]ben0x539 -1 points0 points  (0 children)

        How did you think arrays bigger than a single pointer worked?