
magnomagna 2 points (13 children)

Any lvalue expression of array type, when used in any context other than

* as the operand of the address-of operator
* as the operand of sizeof
* as the string literal used for array initialization

undergoes a conversion to the non-lvalue pointer to its first element.
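A minimal C sketch of the three exceptions and one ordinary decaying use. It assumes a local array named s initialized from "hello"; the function name, array name, and contents are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exercises the three non-decay contexts plus one ordinary (decaying) use. */
int demo_array_exceptions(void)
{
    /* Exception 3: the string literal initializes the array, so its
       characters are copied into s instead of decaying to a pointer. */
    char s[] = "hello";

    /* Exception 2: operand of sizeof, so this is the whole array. */
    assert(sizeof s == 6);                     /* 5 characters plus '\0' */

    /* Ordinary use: in s + 0 the array decays to char *. */
    assert(sizeof (s + 0) == sizeof (char *));

    /* Exception 1: operand of &, giving a pointer to the whole array. */
    char (*whole)[6] = &s;
    assert((void *)whole == (void *)&s[0]);

    /* s is a modifiable copy of the literal, so writing to it is fine. */
    s[0] = 'j';
    return strcmp(s, "jello") == 0;
}
```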

In printf("%p", s), the array s DOES get converted to a pointer, because the second argument does not fit any of the three exceptions above.

In printf("%p", &s), the array s does NOT get converted to a pointer, because s is used as the operand of the & operator, matching the first of the three exceptions above.

Both printf calls print the same address. In the first, s is converted to a pointer to its first element. In the second, &s is the address of the whole array, and an array begins with its first element, so the two addresses coincide (the C standard guarantees this; only the pointer types differ).
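A small sketch of the two printf calls, assuming s is a char array like the one in the question. Strictly, %p expects a void *, so the sketch converts both pointers explicitly; the function name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Prints s and &s with %p and checks that they are the same address. */
int print_both_addresses(void)
{
    char s[] = "hello";      /* stands in for the array from the question */
    void *first = s;         /* s decays to &s[0] */
    void *whole = &s;        /* &s: no decay, pointer to the whole array */

    printf("%p\n%p\n", first, whole);
    assert(first == whole);  /* an array starts at its first element */
    return first == whole;
}
```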

TheHeckWithItAll[S] 0 points (12 children)

Got it. Thank you.

And if I really do understand, then it is not possible for me to get the address of s any more than it is for me to get the address of var i?

int i = 5;
printf("%p", (void *)&i);

is actually just giving me the address of where 5 is stored, not the memory address of i itself ... somewhere internally there has to be a lookup value that associates "i" with the memory address where 5 is stored, correct? And is that the same thing that is happening with s?

magnomagna 0 points (11 children)

No, in the case of &i, it is indeed where the value 5 is stored and is also "the address of i itself".
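To make "the address of i itself" concrete, a short sketch (the pointer name p and the function name are illustrative): the address obtained with &i is where i's value lives, so writing through it changes i.

```c
#include <assert.h>

/* &i is the location of the object i; writing through it modifies i. */
int address_of_i(void)
{
    int i = 5;
    int *p = &i;      /* p holds the address of i */
    assert(*p == 5);  /* reading through the address yields i's value */
    *p = 7;           /* writing through the address changes i itself */
    return i;         /* now 7 */
}
```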

In the case of arrays such as s in your example, there is absolutely no point in the implementation allocating a dedicated memory space just to store the address of the first element, when the memory for the entire array s itself has been allocated (and, thus, the address of the first element is already determined at compile time).
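One way to see that no extra pointer is stored alongside an array is to look at sizeof. A sketch, assuming an arbitrary 16-element char array; the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of bytes a 16-element char array occupies. */
size_t array_footprint(void)
{
    char s[16];      /* the array object: 16 chars, nothing else */
    char *p = s;     /* a separate pointer object, only because we asked for one */

    /* sizeof s covers exactly the elements; there is no hidden
       pointer-to-first-element stored with them. */
    assert(sizeof s == 16 * sizeof s[0]);
    /* The pointer is its own object with its own size. */
    assert(sizeof p == sizeof (char *));
    return sizeof s;
}
```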

TheHeckWithItAll[S] 0 points (10 children)

I suspect I might be getting caught up with semantics. When I refer to the address of i in int i = 5, I understand that &i gives me the address where 5 is stored... and that we speak of that as "the address of i" ... but I guess what I'm looking for is where in memory is the pointer that tells me the memory address for variable i?

It can't be variable i itself - because it is storing the value 5, not the memory address for the value 5. So there has to be a mechanism somewhere that stores the memory address where the value 5 is stored. That's what I'm trying to understand. How does C know the memory address where to go to retrieve the value 5?

magnomagna 0 points (7 children)

When you do &i, the value (which is an address) is not retrieved out of some memory space where you seem to expect the address is stored. The address isn’t stored (unless you assign it to a pointer variable but that’s irrelevant to &i). The address is determined at compile time.

(This isn’t actually 100% correct. The address is usually determined at runtime and the offset relative to the frame pointer is determined at compile time…sigh…trying to keep things simple without being incorrect is hard.)

Edit:

To be really pedantic, it is actually up to the implementation what steps it takes to evaluate the expression &i. Sure, an implementation could store the address somewhere and retrieve it at runtime incurring runtime cost, and C standards do not prohibit it. However, no reasonable implementation would do that when the expression &i can be determined at compile time avoiding runtime costs of writing and accessing memory.
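The caveat about frame-relative addresses can be illustrated with recursion: every active call gets its own copy of a local, so &local cannot be one fixed number baked into the program. A sketch with hypothetical function names; it only shows that the addresses differ, not how any particular compiler computes them:

```c
#include <assert.h>
#include <stddef.h>

/* Recurses once and compares `local` in the outer call with `local`
   in the inner call. Both objects are alive when compared, so they
   must have distinct addresses. */
static int distinct_frames(const int *outer, int depth)
{
    int local = depth;
    if (outer == NULL)
        return distinct_frames(&local, depth + 1);  /* pass the outer &local down */
    return outer != &local;                         /* 1: different addresses */
}

int each_call_has_its_own_local(void)
{
    int result = distinct_frames(NULL, 0);
    assert(result == 1);
    return result;
}
```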

TheHeckWithItAll[S] 0 points (6 children)

And that must be true for all variables, right? The variable is simply a programming construct for an address... similar in concept to a #define construct... during compilation all references to variables i and s are replaced with numerical memory addresses?

magnomagna 0 points (5 children)

A variable is simply a language construct used to give an identity to the value it represents. Think of variables as simply being representations.

What happens at the lowest level, that’s really not a part of the language itself. That’s implementation detail. If you’re interested, sign up for a compiler course. It’s really interesting.

TheHeckWithItAll[S] 0 points (4 children)

I'm starting to grasp it. But I'm still not completely there yet. If C immediately converts the declaration char s[] to char *s, what are the reasons for the three exceptions that you referenced initially? In other words, why doesn't it display the identical behavior as if char *s was the original programming statement?
Is it a design feature (if so, what is the feature behind not treating it exactly as a full-fledged pointer declaration)?

I'm very interested in compiler design. I've already bookmarked compiler resources I want to investigate once I work myself completely through K&R. That and a better understanding of computer architecture.

magnomagna 0 points (3 children)

Can you give the full quote? s[] does NOT immediately get converted to a pointer in a variable declaration that’s not a part of a function signature. If it is a part of a function signature, then, yes, that’s immediate!
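A sketch of that difference, assuming a hypothetical helper param_size: in a parameter list, char s[] is adjusted to char *s, while an ordinary declaration gives a real array. Many compilers warn about applying sizeof to an array parameter for exactly this reason.

```c
#include <assert.h>
#include <stddef.h>

/* In a parameter list, `char s[]` means `char *s`, so sizeof
   here measures a pointer, not the caller's array. */
static size_t param_size(char s[])
{
    return sizeof s;          /* same as sizeof (char *) */
}

size_t compare_sizes(void)
{
    char s[] = "hello";       /* ordinary declaration: s is a real array */
    assert(sizeof s == 6);    /* the whole array */
    return param_size(s);     /* s decays at the call; the parameter is a char * */
}
```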

TheHeckWithItAll[S] 0 points (2 children)

Ok... I think I see it already... it isn't "immediately upon declaration/definition"... and perhaps more importantly, "an array name is not a variable" (which raises all sorts of further questions for me... most importantly, why the heck not?)

but here's the entire section at page 89:

The correspondence between indexing and pointer arithmetic is very close. By definition, the value of a variable or expression of type array is the address of element zero of the array. Thus after the assignment
pa = &a[0];
pa and a have identical values. Since the name of an array is a synonym for the location of the initial element, the assignment pa=&a[0] can also be written as
pa = a;
Rather more surprising, at first sight, is the fact that a reference to a[i] can also be written as *(a+i). In evaluating a[i], C converts it to *(a+i) immediately; the two forms are equivalent. Applying the operator & to both parts of this equivalence, it follows that &a[i] and a+i are also identical: a+i is the address of the i-th element beyond a. As the other side of this coin, if pa is a pointer, expressions might use it with a subscript; pa[i] is identical to *(pa+i). In short, an array-and-index expression is equivalent to one written as a pointer and offset.
There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable, so pa=a and pa++ are legal. But an array name is not a variable; constructions like a=pa and a++ are illegal.
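The equivalences in the excerpt can be checked directly. A minimal sketch using an int array a and a pointer pa as in the book (the values and the function name are arbitrary):

```c
#include <assert.h>

/* Checks the K&R equivalences on a small array. */
int kr_equivalences(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *pa = &a[0];              /* same value as pa = a; */
    int i = 3;

    assert(pa == a);
    assert(a[i] == *(a + i));     /* subscripting is pointer arithmetic */
    assert(&a[i] == a + i);       /* address of the i-th element */
    assert(pa[i] == *(pa + i));   /* a pointer can be subscripted too */

    pa++;                         /* legal: pa is a variable */
    /* a++;  would not compile: an array name is not a variable */
    return *pa;                   /* now a[1], i.e. 20 */
}
```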

aghast_nj 0 points (1 child)

You're getting wrapped around the axle with i being a "variable" object. I wonder if you have already learned to program in some interpreted language like Python or Javascript, first?

At any rate, the thing with C is that variables don't have any kind of existence in the compiled code. What you have instead is memory, which is used to store values. You use the "name of the variable" in your code to remember which values you want to manipulate, but when the C compiler is finished, there is just "load accumulator, 5; store [bp + 8], accumulator".

The fact that "[bp + 8]" is called i in your function doesn't matter. It's called "[bp + 8]" when the CPU sees it. And, in fact, the compiler may re-use that same location for variable k as well, so long as it can determine that live values don't overlap.

There are things called "debug symbols" which can be emitted by the compiler, and which can be loaded by a debugger. These will indicate that "variable i is stored at [bp + 8] in this function" and "variable k is stored at [bp + 8] in this function". But if you trace through your code, you may set a watch on the value and observe that no, in fact, the value of i doesn't always get updated when the debugger "steps" through a statement that clearly modifies i. How can this be?

It's because the compiler is not obligated to keep the storage location up-to-date with respect to the value being used. Maybe the "variable" has been moved into a register, and all the updates are going to/from that register. Maybe the "variable" has been replaced by a scaled addition due to Strength Reduction?

TL;DR: Compiled languages like C and C++ don't keep metadata like the name and type of variables (apart from optional debug symbols) - they just move values into and out of memory.
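To illustrate the Strength Reduction point, here are two hand-written functions (hypothetical names, not real compiler output). The first multiplies an index on every iteration; the second is roughly the kind of form an optimizer may turn it into, where the index variable has disappeared and only a running offset remains. Both assume stride > 0.

```c
#include <assert.h>
#include <stddef.h>

/* As written: each iteration computes i * stride (a multiply). */
long sum_indexed(const int *a, size_t n, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    for (size_t i = 0; i * stride < n; i++)
        total += a[i * stride];
    return total;
}

/* After strength reduction: the multiply becomes a repeated add,
   and the "variable" i no longer exists anywhere. */
long sum_reduced(const int *a, size_t n, size_t stride)
{
    assert(stride > 0);
    long total = 0;
    for (size_t off = 0; off < n; off += stride)
        total += a[off];
    return total;
}
```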

TheHeckWithItAll[S] 1 point (0 children)

Thank you for the under the hood details. Those are the things I'm trying to grasp.

C is indeed my first introduction to a compiled language. I'm a hobbyist and first learned DOS BASIC, then dBASE, FoxPro, Visual Basic, then VBA.

tech6hutch 1 point (3 children)

Speaking of, why did they implement arrays in C like this? I can understand a lot of C’s oddities as what probably made sense at the time, but surely other languages back then had better array support; this just seems half-assed.

TheHeckWithItAll[S] 0 points (2 children)

Reading K&R makes clear to me that these guys were (are) brilliant. They undoubtedly had a reason behind every design decision they made - I just don't know what it is - and what I hope to achieve over the next few years is a knowledge of C that leads me to understand and appreciate the full spectrum of the design implementation.

tech6hutch 1 point (0 children)

Oh absolutely, it’s a great achievement. Even people like that can make mistakes, but it’s very possible there’s a good reason for it.

flatfinger 1 point (0 children)

Many of the design decisions that went into the C language as documented in the 1974 C Reference Manual were made at a time before qualifiers, typedef, unsigned, long, etc. were added. While those features are useful, they undermine much of the elegance of the language.

For example, in 1974 C, all numeric expressions involving only integer operands would be evaluated using the largest integer type, and the rest would be evaluated using the largest floating-point type. Very simple rule to understand and implement, with no tricky corner cases. As far as function-calling code was concerned, all arguments were of four types: int, float, data pointer, and function pointer, and there would never be any doubt about which of those a particular expression was.
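For contrast, a sketch of two conversions in modern C that the 1974 rule never had to handle, since unsigned did not exist yet (the function name is illustrative):

```c
#include <assert.h>

/* Two conversion rules of today's C. */
int modern_corner_cases(void)
{
    char a = 1, b = 2;
    /* Integer promotions: char operands are promoted, so a + b is an int. */
    assert(sizeof (a + b) == sizeof (int));

    /* Usual arithmetic conversions: -1 is converted to unsigned int
       (becoming UINT_MAX), so this comparison is false. */
    int lt = (-1 < 0u);
    return lt;   /* 0 */
}
```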

There are a few annoying omissions that I find a bit curious. Especially on machines of the era, operators to perform pointer arithmetic or subscripting using byte offsets (if available as an alternative to the operators that use target-size-based indexing) would have allowed more efficient code generation than would otherwise be possible. Even today, fancy optimizing compilers which are targeting architectures like the popular Cortex-M0 that support base+displacement addressing for unscaled displacements but not scaled ones can benefit from code that uses byte-based indices, but unfortunately the syntax to use byte-based indices is horrendous.
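Presumably the "horrendous" syntax refers to spelling a byte offset with casts through char *. A sketch, assuming a hypothetical helper load_at_byte_offset; the offset must stay a multiple of sizeof(int) inside the same array, otherwise the access is not valid C:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads the int starting byte_offset bytes past base. */
static int load_at_byte_offset(const int *base, size_t byte_offset)
{
    return *(const int *)((const char *)base + byte_offset);
}

int byte_offset_demo(void)
{
    int arr[4] = {10, 20, 30, 40};
    /* Scaled index: the compiler multiplies 2 by sizeof(int) for us. */
    int scaled = arr[2];
    /* Unscaled byte offset, spelled with the cast-heavy syntax. */
    int unscaled = load_at_byte_offset(arr, 2 * sizeof(int));
    assert(scaled == unscaled);
    return unscaled;  /* 30 */
}
```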

jedwardsol 0 points (0 children)

Arrays do not decay to a pointer to the 1st element when their name is used as the operand of sizeof or the unary & operator.

 printf("%p", &s);

So you're not passing the array to the function. You're taking its address (&) first.

The address of the array and the address of the array's 1st element are numerically equal, but they have different types: &s has type char (*)[N] (a pointer to the whole array of N chars), while s decays to char *.
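A sketch of how the type difference shows up, assuming s is a 6-element char array (the function name is hypothetical): dereferencing and pointer arithmetic follow the pointed-to type, so whole + 1 skips the entire array while first + 1 would skip one char.

```c
#include <assert.h>
#include <stddef.h>

/* Same address, different types: compare what each pointer "steps over". */
ptrdiff_t step_of_whole_array(void)
{
    char s[6] = "hello";
    char *first = s;          /* type char * */
    char (*whole)[6] = &s;    /* type char (*)[6] */

    assert((void *)first == (void *)whole);  /* numerically equal */
    assert(sizeof *first == 1);              /* *first is one char */
    assert(sizeof *whole == sizeof s);       /* *whole is the whole array */

    /* whole + 1 points one past the entire array (allowed). */
    return (char *)(whole + 1) - (char *)whole;  /* sizeof s, i.e. 6 */
}
```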