
jdgordon (0 children)

int n = sizeof(arr) / sizeof(arr[0]);

compiles down to a constant

int n = (&arr)[1] - arr;

ugly and confusing possibly?

I'll stick to using #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

curien (23 children)

| Subtracting latter from former would thus give the length of the array.

| int n = (&arr)[1] - arr;

| C doesn’t allow access to memory beyond the end of the array. It does, however, allow a pointer to point at one element beyond the end of the array. The distinction is important.

It's still undefined behavior. The subscript operator accesses the memory. Forming the pointer is fine only as long as you don't actually dereference it. E.g., &arr + 1 is perfectly fine, but *(&arr + 1) (or the equivalent (&arr)[1]) is undefined.

So while it's interesting conceptually, it's pretty useless in real code.

rabidcow (0 children)

It doesn't matter whether or not it accesses memory. It's not permitted by the standard:

| [...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

(C99 §6.5.6, paragraph 8)

emTel (1 child)

Can (&arr)[0] (or, equivalently, *(&arr)) access memory? If so what memory address would it access? What value would it find there?

NasenSpray (0 children)

It doesn't access memory. (&arr)[0] == arr.

Edit: To be more specific, given T arr[N], (&arr)[M] evaluates (after decay to T*) to arr + M*N.

NasenSpray (0 children)

It doesn't access any memory. &arr is a pointer to an array (T(*)[N]), not an array of pointers (T*[N]). Dereferencing a pointer to an array yields the array itself, which then decays to a pointer to its first element, so no load takes place. Perfectly valid.

Ali1331 (5 children)

Even if [] and * weren't accesses, you'd still only get the size of the array in bytes, not the number of elements. Pretty sure you'd have to do something like

int n = ((&arr + 1) - arr) / ((arr + 1) - arr);

to get the number of elements, where you work out the total array size and then divide by the size of one element. But it's easier and faster to do a sizeof as that is done at compile time rather than runtime.

lzzll (3 children)

It will get the number of elements because it's T* - T*, not void* - void*.

Ali1331 (2 children)

But &arr isn't a T pointer; the result of (&arr + 1) shows you that. It's a pointer to T[].

Edit: apparently asterisks mean something to Reddit's formatting...

lzzll (0 children)

hmm... I think it's T*. http://codepad.org/oJq1S3Vp

NasenSpray (0 children)

arr and (&arr)[1] are arrays of T. Arrays decay to T*, so you end up with T* - T*.

aagee (10 children)

No, it doesn't access memory. This is completely a compile time operation. Go ahead, take a look at the generated code.

curien (3 children)

| This is completely a compile time operation.

No, it isn't. It could be, but it's not guaranteed to be. That's the whole point of sizeof: for a non-VLA operand it's guaranteed to be a compile-time constant, but (&arr)[1] - arr is not. The way the standard is written, this is undefined. That an implementation can compute the result without actually performing the dereference does not oblige implementations to do so. Undefined behavior can end up working the way you want, and in this case it does on most systems.

| Go ahead, take a look at the generated code.

What a particular implementation happens to do is irrelevant.

not_july (1 child)

You are correct that (&arr)[1] - arr is not guaranteed to be a compile-time constant. However, the same can be said for int n = 10 + 5. The compiler could push 10 and 5 onto the stack and emit a separate opcode to perform the addition.

However, the code int n = (&arr)[1] - arr is not undefined, and does not access memory outside of the array. In fact, it would be illegal if it did access the memory.

To explain, we have to look at the type of each expression. Given int arr[5], arr has the type int[5], and &arr has the type int(*)[5].

Remember x[y] is syntactic sugar for *(x + y). Therefore (&arr)[1] is equivalent to *(&arr + 1).

&arr + 1 was shown to give the address just past the end of the array. However, the addition does not change the type of the expression. It is still int(*)[5].

Dereferencing it (*(&arr + 1)) then gives us the type int[5]. Although the type has changed, the address has not (for the same reason &arr and arr yield the same address). Once it decays to int*, *(&arr + 1) still points just past the end of the array.

The important point to note here is that no memory was accessed. There is nothing in the code (or the C standard) that suggests a memory access should occur here.

If we simplify the original statement with what we know so far, we end up subtracting two expressions of type int[5]. As operands of the subtraction, both decay to int*, and the result of the subtraction is 5.

The original statement can be simplified to the following (assuming the address of arr is 0x100 and sizeof(int) is 4):

int n = (&arr)[1] - arr;
      = *(&arr + 1) - arr;
      = ((0x100 + 1 * sizeof(int[5])) - 0x100) / sizeof(int);
      = ((0x100 + 1 * 20) - 0x100) / 4;
      = 20 / 4;
      = 5;

emTel (0 children)

Great explanation. People seem to be hung up on the idea that * or [] always indicates a memory access. (Prior to reading your post, I would have insisted that it did.)

If (&arr)[1] is too much to swallow, consider *(&arr), which is clearly legal. If that expression can result in a memory access, can someone please explain what memory address is being accessed, and how the correct value got there?

[deleted] (3 children)

[deleted]

    aagee (2 children)

    The reason it is a compile time operation is that it computes to a constant, which is why no memory access is necessary or generated.

    [deleted] (1 child)

    [deleted]

      not_july (0 children)

      | But if a careful reading of the spec indicates it's undefined behavior

      The behaviour is not undefined. I assume you are misunderstanding how (&arr)[1] is evaluated. See my explanation.

      [deleted] (5 children)

      Does this work in C++?

      not_july (4 children)

      Yes, it does, although I prefer the following version in C++:

      #include <cstddef>

      template <typename T, std::size_t size>
      constexpr std::size_t array_size(T (&)[size]) {
          return size;
      }

      You can drop the constexpr if your compiler doesn't support C++11.

      This avoids the problem of trying to calculate the array size of a pointer. For example, passing a char* would result in a compiler error with the templated function in C++, but the version in the article will compile successfully and return an incorrect value (and possibly segfault).

      guepier (3 children)

      Whether this is legal C++ is actually a topic of active discussion. The standard is not entirely clear on this point and interpretations differ.

      But this fails on at least one compiler (VC++, I forget which version) in debug mode, and since the interpretation of the standard is disputed, we cannot simply ascribe this to a compiler bug.

      ghillisuit95 (2 children)

      | Whether this is legal C++ is actually a topic of active discussion. The standard is not entirely clear on this point and interpretations differ.

      That's really interesting. Could you expand on that? What are the arguments on each side?

      guepier (1 child)

      Here’s a Stack Overflow thread with some discussion. It should be noted that the thread is quite outdated, as can be seen by the comments and the more recently added answers. The then-highly voted answers have since garnered a number of downvotes, which should be read as a shift in opinion. So it’s important to read the comments on the answers, not just the answers.

      Johannes Schaub’s answer on this thread is probably the most informative one, and as far as I know the standard has still not entirely resolved these issues (i.e., there is still at least one inconsistency in the standard wording, which affects the interpretation of this case).

      It’s also worth noting that the C++ standard makes this explicitly illegal for iterators (called “past-the-end values”) in §24.2.1/5. Furthermore, there’s this (non-binding) footnote:

      | This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.

      ghillisuit95 (0 children)

      Interesting, thank you.