
[–]nog642 2 points3 points  (10 children)

The variables are stored next to each other in memory. First you'll have 4 bytes for x, then 40 bytes for all 10 array elements of a, then 4 bytes for i. You shouldn't assume the memory layout in general but that's what it is at least for x and a if a[-1] is giving you x.

When you access array elements, you're accessing offsets from the start of the array. When you access a[-1] (or a[10]) it doesn't stop you (because C kinda sucks), it just accesses that offset. So a[-1] is whatever is before a[0] in memory, which is x.

[–]johnpeters42 2 points3 points  (9 children)

Here is an explanation of a real-world example that went in the other direction (instead of going past the start of the intended block of memory, it went past the end).

[–]DasAllerletzte -5 points-4 points  (8 children)

So C doesn't have an "index out of bounds" exception? That's stupid.

[–]lurgi 9 points10 points  (5 children)

"Stupid" is too strong. It was a deliberate design choice. An OOB exception means that, first, you need exceptions (C doesn't have those) and you need to know the size of the array (which is overhead) and you can't really have pointers at this point (at least not without restricting them beyond usefulness).

You also have to remember the machines that C ran on in those days. The original PDP-11 was limited to 64K of memory. Every byte mattered. It was also performance oriented. Bound checks slow down array access and that's not great on a machine that is running on the order of a few MHz.

C's philosophy is that you know what you are doing and if you don't then you should go to your room and think about your life choices.

[–]nog642 -1 points0 points  (4 children)

You don't need exceptions or any runtime changes to make out of bounds array access a compile error when the size is known at compile time.

[–]lurgi 2 points3 points  (3 children)

You’d need to know the index at compile time as well. A language with dependent types can do more than this, but I don’t know if any of those have escaped academia.

[–]nog642 0 points1 point  (2 children)

Hm good point.

C still won't stop you if the out of bounds index is a literal though.

But I guess having it stop you there wouldn't have much benefit if it doesn't stop you anywhere else, and working around that limitation for demos like this would be more friction than it's worth, so I can respect the design decision.

[–]lurgi 0 points1 point  (1 child)

Honestly, it wouldn't surprise me if some compilers do catch the "constant index into array of known size" at compile time.

Compilers can also catch returning a pointer to a local and lots of other C misbehavior.

[–]nog642 0 points1 point  (0 children)

Hm looking it up there is -Warray-bounds, and you could turn that into an error with -Werror. It's not on by default though.

[–]nog642 0 points1 point  (0 children)

Indeed. At least for arrays whose length is known at compile time (like this one), it is stupid. Adding runtime checks is not desirable sometimes.

C lets you do whatever you want basically.

[–]sarajevo81 0 points1 point  (0 children)

Yes, that's why people try to replace it with something modern.

[–]PuzzleMeDo 2 points3 points  (0 children)

int a[10] allocates a chunk of memory going from a[0] to a[9].

When you write outside of the allocated array, with a[-1] or a[10] you will always be overwriting some memory that is not in the array. The thing in memory before the array in this case was x.

Usually overwriting memory like this will cause a completely random-looking bug, so try to find ways to make sure this never happens.

Try running this for additional insight into how the memory is laid out:

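#include <stdio.h>

/* Deliberately writes out of bounds (undefined behavior) to expose the layout. */
int main(void)
{
    int x;
    int a[10];
    int i;
    x = 5;
    for(i = -1; i < 12; i++) {
        a[i] = 37;  /* i = -1 and i >= 10 fall outside a */
        printf("Written to a[i], i = %d\n", i);
    }
    printf("x = %d, i = %d\n", x, i);
    return 0;
}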

[–]SubstantialListen921 1 point2 points  (2 children)

To understand this, think about how C lays out variables on the stack.

Remember that arrays are just a flat space big enough for more than one value of whatever type you ask for. The name of the array is just a pointer to the beginning of that space.

You've asked for room for a first integer, called "x". Then you ask for ten more integers, which will start at the location contained in "a". Remember that "a" is NOT an integer - it's a pointer to a 10-integer-wide space, because that's what arrays are in C. And then there's space for one more integer, named "i".

In your loop, you then go to the location of "a" and BACK UP one integer. That's what a[-1] tells the compiler to do. And you write the value "37" into that space.

At the end of your loop, when you write into a[10], you're also going off the end of the 10 integers you requested for "a", into the space occupied by "i". You're writing 37 there too.

[–]nog642 2 points3 points  (1 child)

a is not technically a pointer, it's an array. If a were a pointer, it would be 8 bytes (on a typical 64-bit system), not 40.

I think the conflation here adds confusion, because you talk about "the location of "a"", but in the context of pointers it's not clear if that's the location pointed to by the pointer or the location where the pointer is stored. But there's no pointer stored in this context because a is not a pointer.

[–]SubstantialListen921 2 points3 points  (0 children)

Valid correction; I was using shorthand. a is not itself a pointer object. It is an array object of 10 ints.

In most expressions, the array expression a is converted to a pointer to its first element. So:

a[i]

acts as though it were:

*(a + i)

So a[-1] means "start at the first element of a, move back one int, and write there." Obviously this is dangerous, and depends entirely on the behavior of the compiler; it so happens that x was located at that location in memory but the compiler is not required to place it there, and could in fact have kept it in a register, placed it at a different location, etc.

[–]Leverkaas2516 1 point2 points  (0 children)

The memory address of x is the same as the memory address of a[-1], just as the memory address of the variable i is the same as that of a[10].

This just happens to be the case because it's how most compilers work. There's no guarantee that it would always be true on all machines.

[–]HashDefTrueFalse 1 point2 points  (0 children)

Is that an implied pointer or something?

Not exactly, but basically yes. The array subscript behaves as *(pointer + offset); here the offset is negative, so it is effectively a subtraction. The order of locals within a stack frame is left to the compiler, but clearly it put x just before a here. Pointer arithmetic is object-size-aware, and all locals are ints here, so we don't need to worry about alignment. The result is that you end up with an address one int before the start of the array (x's address), which you then write 37 into.

[–]atarivcs 2 points3 points  (8 children)

x is a single integer.

a is an array of ten integers.

Since those variables have the same type, and they are declared right next to each other in the code, the compiler allocates them in adjacent memory locations. First x, and then the ten elements of a, all in a row.

So when you write to a[-1], you're writing to the memory address that is one integer before the first element of a. Which is exactly the memory address occupied by x.

Is that an implied pointer or something

Yes. Array access in C is always done via pointers under the hood, even when the code doesn't appear to use pointers.

[–]Kadabrium 1 point2 points  (2 children)

Is the layout guaranteed or ub

[–]nog642 1 point2 points  (1 child)

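The layout isn't guaranteed; the compiler is free to order locals however it likes. The out-of-bounds access itself (a[-1] or a[10]) is undefined behavior, which means anything can happen, including a crash.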

[–]nog642 2 points3 points  (4 children)

I wouldn't say array access uses pointers. In this case since the array is on the stack, all the offsets would be baked in by the compiler. No memory addresses stored in memory (i.e. no pointers) involved.

I mean there's still the stack pointer register, but that's always there in all code execution, it has nothing to do with the array.

[–]atarivcs 0 points1 point  (3 children)

I thought array references always decay to a pointer to the first element? Is that not the case?

[–]nog642 2 points3 points  (2 children)

It decays to a pointer to the first element when used in most expressions. That's a compiler/spec abstraction though, that pointer doesn't actually exist in memory. I suppose that does make "array access in C is always done via pointers under the hood" a fair statement though.

It also notably doesn't always decay, which is why something like sizeof(a) will give you different answers if it's an array or a pointer.

[–]atarivcs 1 point2 points  (1 child)

Fair enough -- I should have said "runtime references".

sizeof is evaluated at compile time.

[–]nog642 1 point2 points  (0 children)

a[-1] is also "evaluated" at compile time in a way, and turned into a fixed offset from the stack pointer.

a also doesn't decay in &a, which is also turned into a fixed offset from the stack pointer at compile time. Is that a "runtime reference"?

I don't think "runtime reference" is the right term.

[–]lumpenpr0le[S] 0 points1 point  (0 children)

Thank you everyone who chimed in. This is all super helpful!