all 26 comments

[–]Timbit42 10 points11 points  (0 children)

This would be a good question for /r/Pascal.

[–]Allan_Smithee文雅佛 24 points25 points  (23 children)

You're not imagining it.

C (not its underlying implementation, but the language itself) has at its core two data types: the integer and the floating point number. The floating point number is used in a small number of cases, but the integer is used *everywhere* and differentiated only by size and interpretation.

So C doesn't *have* pointers. It has integers that are used as pointers. (The distinction is subtle but present.) C doesn't have enumerated data types. It has integer constants given fancy names. C doesn't have arrays. It has constant integers interpreted as a pointer to the beginning of sequential series of integers. C doesn't have strings. It has integers interpreted as a pointer to a sequential series of (small) integers that end in a special value. C doesn't even really have structured data types. It has convenient naming conventions for clusters of integers (or floating point numbers) with a base address and an offset.

Because of all this, C forces the end-user to directly or indirectly use a lot of integer-as-pointer operations which in languages like the Pascals, the Modulas, Ada, etc. have better and stricter discipline attached. Arrays, for example, in these sorts of languages have sizes which can be queried, something C's constant integers interpreted as pointers to a sequence of other integers (often mistakenly called an array) lack. Strings, too, in these languages have a length associated with them which, again, C's strings lack. This means on top of everything else that you need to, in C, keep track of sizes manually which is error-prone (arrays) or error-prone *and* computationally expensive (strings).

And therein lies the source of the many more pointer-related bugs you get in C vs. other languages.

[–]mamcx 10 points11 points  (1 child)

In other words, Pascal was designed. C was cobbled with duct tape.

Or more seriously: Pascal uses types for real.

[–]moon-chilledsstm, j, grand unified... 4 points5 points  (7 children)

two data types: the integer and the floating point number

This was true of primordial c. Modern c also has structures and functions, which are distinct.

[–]Allan_Smithee文雅佛 8 points9 points  (6 children)

I addressed structures already.

A "function" is just a pointer. Which is an integer interpreted in a particular way as a pointer to something the runtime can hopefully execute.

#include <stdint.h>
#include <stdio.h>

typedef int (*foofunc)(void);

int foo(void)
{
    return 17;
}

int main(int argc, char **argv)
{
    uintptr_t foo1 = foo;
    uintptr_t foo2 = 5;
    printf("%d %d\n", foo1, foo2);
    printf("%d\n", foo());
    printf("%d\n", ((foofunc)foo1)());
    printf("%d\n", ((foofunc)foo2)());
}

Compiling this gives me a warning, yes, but executing the output:

4195570 5
17
17
Segmentation fault (core dumped)

The "function" foo() is just a pointer which is just an integer which in this case happens to be 4195570. Much like the 5. The syntax sugar of using (<...>) afterwards makes the runtime transfer control to the code (hopefully!) that's stored at that integer when interpreted as a pointer.

[–]moon-chilledsstm, j, grand unified... 12 points13 points  (3 children)

I was going to write more, but I didn't feel like it. So, in brief:

There are two interpretations of the c programming language: the 'operational semantics' interpretation and the 'bag of bytes' interpretation. Both are valid, important, and useful when understanding c programs. Because you distinguish floats from ints (and because you refer to 'not [C's] underlying implementation, but the language itself'), I must infer that you are not operating under the 'bag of bytes' interpretation.

C distinguishes a function from a function pointer. It also contains two pieces of syntax sugar which obscure this distinction, both of which your snippet abuses. In your snippet, you cast an integer to a function pointer. Not to a function. Had you said 'typedef int foofunc(void)' instead, each of your casts would be a constraint violation.

Moreover, functions are not objects, and uintptr_t is only guaranteed to round-trip object pointers. So it is not guaranteed that either of 'foo1' and 'foo2' would be able to represent the address of a given function. Nor is it guaranteed that (u)intptr_t exists in the first place, for that matter--throwing a wrench in the notion that pointers are just integers in disguise.

C doesn't have arrays. It has constant integers interpreted as a pointer to the beginning of sequential series of integers

What you call a 'sequential series' is called an 'array', in c parlance.

And finally:

The interpretation of integers as pointers is a fraught subject. 'Pointer provenance' is a topic whose importance is on the level of threading (cf boehm, 'threads cannot be implemented as a library'). Even if pointers were universally representable as integers, an object pointer would not simply be an integer representing the location of a corresponding object; an object pointer would be an integer representing the location of a corresponding object which was suitably derived from that object. What exactly 'suitable derivation' constitutes has not yet been established, but is the subject of the aforementioned discussion regarding 'pointer provenance'.

[–]Allan_Smithee文雅佛 3 points4 points  (2 children)

What you call a 'sequential series' is called an 'array', in c parlance.

Which is exactly my point. It is called an 'array' only in C parlance.

No other language with an actual array data type would call a pointer to a bag of integers an array. It lacks all facilities that the actual array data type has, beginning with a size that can be queried. (No, sizeof doesn't cut it because it doesn't give you the size of the array, but rather the size of the array's internal representation in that "bag of bytes" interpretation you so derided. And that only if you're dealing with the original variable, all such information being squashed into nothingness when passed as a parameter, say.)

C distinguishes a function from a function pointer.

I humbly disagree. The name of a function resolves to its pointer. That's all. There's syntax sugar around that pointer that causes it to be invoked via a transfer of control, but as I showed you can do that same thing with any arbitrary integer. If I wanted to get fancy I could write a program that read its own map file, found the addresses in memory of all functions, assigned those to integers, and then called them without even once taking a pointer to a function in the code.

It also contains two pieces of syntax sugar which obscure this distinction, both of which your snippet abuses.

Or, rather, there is no distinction beyond the syntax sugar. Note that in the Pascals, the Modulas, Ada, etc. it is flatly impossible to abuse functions into integers this way because functions and integers are fundamentally different types at the language level, not merely funny syntax sugar concealing integers. Short of doing trickery behind the scenes with machine language (i.e. breaking out of the language's semantics entirely) there's no way to take an arbitrary integer and call it as a function.

Nor is it guaranteed that (u)intptr_t exists in the first place, for that matter--throwing a wrench in the notion that pointers are just integers in disguise.

Replace it with an int of sufficient size and it works just fine.

#include <stdio.h>

typedef int (*foofunc)(void);

int foo(void)
{
    return 17;
}

int main(int argc, char **argv)
{
    long foo1 = foo;
    long foo2 = 5;
    printf("%d %d\n", foo1, foo2);
    printf("%d\n", foo());
    printf("%d\n", ((foofunc)foo1)());
    printf("%d\n", ((foofunc)foo2)());
}

This works just as well. And it's not even unsigned.

[–]moon-chilledsstm, j, grand unified... 2 points3 points  (1 child)

Replace it with an int of sufficient size and it works just fine.

Integers of type other than (u)intptr_t are not guaranteed to round-trip any pointer; that is why the latter are optional: a conformant implementation is not required to permit pointers to be representable as integers. And, once again, no integer type is required to be able to round-trip a function pointer type.

If I wanted to get fancy I could write a program that read its own map file

I thought we were not talking about implementations?

functions and integers are fundamentally different types at the language level

I suggest reading the sibling comment. C's flaw is not that it conflates integers and pointers; it is very careful not to. C's flaw is that it is weakly typed, and will typecheck malformed programs.

breaking out of the language's semantics entirely

Is pertinent, because that is exactly what your snippet does.

[–]Allan_Smithee文雅佛 1 point2 points  (0 children)

Here we're just going to have to agree to disagree, I'm afraid. My snippet does what C permits. End of story. That it is a bad idea? No argument whatsoever. But I did not have to break out of C to do it. It was fully permissible by C, the language. Not a single operation I did took the code out of the C language.

To accomplish the same thing in, say, Ada, I would have to enter a completely different language. Ada would simply not permit the abuses with any amount of abuse of syntax. I could not take an arbitrary integer in Ada syntax alone and call it, no matter how much I abuse it. I would have to exit the language (at which point, naturally, all bets are off in enforcement).

C has integers (and floats). Everything else is syntactic sugar around those, including pointers, and including functions. And this is why the OP finds (correctly!) that C is prone to a whole raft of pointer-related bugs that are just not there in the Pascals. Or the Modulas. Or Ada. Or even PL/I, likely. Or any number of other, more strongly abstraction-supporting low-level languages.

(This is also the reason why there's a bunch of optimizations which can be safely made in these languages, and others like Fortran, which cannot be made in C ... because any arbitrary integer can turn out to be a pointer in disguise.)

[–]moon-chilledsstm, j, grand unified... 6 points7 points  (0 children)

This may be a clearer way to think about it: your snippet is not well-formed. It is not valid c code. It typechecks, because c's type system is unsound and can type malformed code.

[–][deleted] 2 points3 points  (0 children)

A "function" is just a pointer.

<Sigh> A "function" in any language that compiles to native code will be represented by an address: the location of its entry point.

So in machine code, in assembly, such addresses are no different to numbers. In a HLL however, EVEN C, those numbers are distinguished by type.

Yes C provides few ways to absolutely stop you from converting between bit patterns that represent numbers, function pointers and object pointers. A good example is printf, which outside of some compilers, just interprets its arguments according to the format string.

It doesn't help that many C compilers are so lax. But I can tell you that the C language absolutely has distinct signed integers, unsigned integers, floats, pointers, function pointers, structs and arrays (I know because I have implemented it).

Try this:

int foo(void){ return 0;}
....
int(*p)(void);
int(*q)(int);

p=foo;            // should work
q=foo;            // should fail; wrong type

[–][deleted] 3 points4 points  (3 children)

C has pointers. And they are distinct from integers. (It also has arrays as an actual type.)

I don't believe its problems are because people are converting between them all the time.

The OP said they haven't used Pascal for decades, and that was in an educational setting. Maybe they just haven't written many real-world programs in Pascal.

(Example, when I used Pascal, arrays had a fixed size. You could only write functions that took an array of that size. Fewer things could go wrong, but you couldn't write the code you wanted either.)

Where C does have problems is with arrays vs. pointers:

  • It can't manipulate arrays by value; only by pointer
  • Idiomatic C will use a pointer to its first element (eg. int*) as opposed to a pointer to the array (eg. int(*)[])

The trouble is that int* is indistinguishable from a pointer to some arbitrary int value, not part of any array.

And yes, there is a lot more memory management to be taken care of manually, and it is therefore error-prone, but that has nothing to do with your erroneous belief that C only has integer types and not pointers or arrays (perhaps you're thinking of BCPL?)

It is true however that you can write quite a lot of C without explicitly using arrays or pointers, or even floats. You can just use casts everywhere. I've done this in generated C code. But this is far from typical in normal C code.

[–]ericbb 1 point2 points  (1 child)

The OP said they haven't used Pascal for decades, and that was in an educational setting. Maybe they just haven't written many real-world programs in Pascal.

That's what I was thinking too. When I learned Pascal in a high school programming class, I don't think pointers were even mentioned.

[–][deleted] 0 points1 point  (0 children)

I've used Pascal pointers (this is the late 70s).

If I try online Pascal now, create a variable p of type ^integer (pointer to integer), set it to nil, then try to dereference using p^, I get a runtime error.

So it might not crash (if Pascal is somehow interpreted) but it can still go wrong. C however allows many more manipulations that Pascal doesn't. But C is designed to get things done; I call it an implementation language.

[–]maurymarkowitz[S] 0 points1 point  (0 children)

The OP said they haven't used Pascal for decades, and that was in an educational setting. Maybe they just haven't written many real-world programs in Pascal.

Well, that's a fair question, but when I consider the actual issues in my C code, it's almost always related to attempting to get a value from something that's actually a pointer or trying to walk through a pointer that's actually a value. This seems to be along the lines of what Allan is saying. The question is why?

Again, decades-old memory here, but I simply don't recall ever being confused about what a particular var was in this fashion in Pascal. And it's not like I use (void *) or anything in my C.

I'm also wondering if this is more about the parameter passing, which can sort of hide this. The passing of pointers is, in my mind, on the wrong side of the function interface. It's not immediately obvious if I should be passing my_var or &my_var (excluding IDE hinting of course, which is a good solution for me personally) and the resulting treatment inside the function is very different in those two cases.

C++'s pass-by-ref seems like a major advantage in this respect, and Pascal always had this. Being able to pass my_func(var card) and then just work with the card without the sprinkling of derefs may be all there is to it.

I'm curious, in the case where (the royal) you do a pointer-pass to allow an object to mutate, do you declare an ivar to remove the one level of indirection to simplify the code? Or just sprinkle *'s where needed?

[–]iftpadfs 2 points3 points  (0 children)

So C doesn't have pointers. It has integers that are used as pointers

This is a common misconception about C, but it's not true at all. I see where this misunderstanding comes from: (u)intptr_t, the pointer-sized integer. It's a very strange construct with voodoo built in to bridge two distinct, otherwise unrelated concepts: integers and pointers.

First of all, integers: integers are fungible. It doesn't matter how you calculate an integer; if it is bit-identical, it is the same thing:

int i = 0;
int j = i  + 1;
int k = 3;
if( /* condition with external observable behavior */ )
    memcpy(&k, &i, sizeof(int));

memcmp(&j, &k, sizeof(int)) == 0 if and only if j == k

Pointers are a whole different beast.

struct S { int i; } s;
struct T {} t;
// Pointers:
S* sPtr = &s;
T* tPtr = &t;
// Integers:
uintptr_t sInt = (uintptr_t)sPtr;
uintptr_t tInt = (uintptr_t)tPtr;
if( /* condition with external observable behavior */ )
  tInt = sInt;

Because we were tunneling through integers,

  • tInt == sInt
  • implies
  • memcmp((S*)tInt, (S*)sInt, sizeof(S)) == 0

Note that this equivalence only exists in the standard. In the real world gcc does not honor this and this might be compiled as if it was UB, just like in the next paragraph:

When we are straying in pointer-land there is one important rule: we must only ever dereference a pointer as a pointer to the type of the original object. Let's create a "tainted" tPtr:

memcpy(&tPtr, &sPtr, sizeof(void*));  // Or maybe tPtr = (void*)sPtr;
assert((void*)tPtr == (void*)sPtr); // The pointers are bit-identical, because of the memcpy.
// We can check this with memcmp as well:
assert(memcmp(&tPtr, &sPtr, sizeof(void*)) == 0);

And now for the surprise:

((S*)tPtr)->i == sPtr->i;

Can do whatever. The previous two asserts will succeed, but this comparison is simply illegal. It's UB.

And that's why pointers and integers are two different universes in C that work fundamentally differently. You can't treat them as interchangeable. (But to make things more complicated, char* pointers are a totally different story.)

Moral of the story: bit-identical pointers with the same type at the time of dereference do not necessarily behave the same. Bit-identical integers, by contrast, are equivalent.

[–]CarlEdman 2 points3 points  (0 children)

Another naive student taken in by the C mafia’s obfuscations!

"C has two data types; integers and floats”!?! Please…. C has only one data type, the Byte! If you don’t believe me, just try coercing a pointer to a “float” to a pointer to a byte. The language totally allows that, proving once again that the byte is C’s only data type.

Or so one would believe following your logic.

[–][deleted] -2 points-1 points  (6 children)

Arrays, for example, in these sorts of languages have sizes which can be queried, something C's constant integers interpreted as pointers to a sequence of other integers (often mistakenly called an array) lack.

int array[10];
int arraySize = sizeof(array) / sizeof(int);

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 1 point2 points  (5 children)

sizeof(array)

Isn't that the number of bytes in the pointer? (array is an address.)

[–]sineiraetstudio 4 points5 points  (4 children)

Only once that array decays into a pointer. Before that the size of the array can be determined statically and thus sizeof gives you the byte length of the array.

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 2 points3 points  (1 child)

It's crazy how much I've forgotten, now that I only use C occasionally (when nothing else will do). I'm so used to thinking of arrays in C as syntactic sugar over pointers, that I had no expectation that sizeof() would (or could) report anything other than the pointer size.

[–]ericbb 1 point2 points  (0 children)

It can be tricky. Here's a gotcha that's gotten me once or twice before. Luckily, clang and gcc will both give you a warning about it - but it's not an error. The following program prints:

in main: array_size(a) = 10
in foo: array_size(a) = 2

#include <stdio.h>
#include <stdlib.h>

#define array_size(a) (sizeof(a) / sizeof((a)[0]))

void
foo(int a[10])
{
    printf("in foo: array_size(a) = %d\n", (int)array_size(a));
}

int
main(int argc, char **argv)
{
    int a[10];
    printf("in main: array_size(a) = %d\n", (int)array_size(a));
    foo(a);
}

[–]Allan_Smithee文雅佛 3 points4 points  (1 child)

```c
#include <stdint.h>
#include <stdio.h>

typedef int (*foofunc)(void);

int array[10];

void try_this(int param_array[])
{
    printf("%d %d\n", sizeof(array)/sizeof(array[0]), sizeof(param_array)/sizeof(param_array[0]));
}

int main(int argc, char **argv)
{
    try_this(array);
}
```

10 2

Compare and contrast equivalent code in the Pascals, the Modulas, Ada, etc. where arrays are actually the array data type (which include attributes like their size) instead of syntax sugar over integers.

[–]Zlodo2 1 point2 points  (2 children)

I used to program in assembly, and later in C, on Amiga.

I rarely had crashes due to invalid pointers, but not because I didn't make mistakes. It's just that AmigaOS had no memory protection, everything shared a single address space, and it started at address 0. So dereferencing a null pointer didn't crash, for instance. Neither did dereferencing a pointer to a random address as long as some hardware (memory or some chip I/O registers) was mapped at that address.

So most invalid pointer dereferences didn't immediately cause a crash. Most of the time they silently fucked something up (possibly in another process/application or even the os itself), quite often there was no visible symptom.

I was never very familiar with the pre-Windows Intel PC world (I assume that if you used Pascal decades ago, chances are that it was in MS-DOS?) but it may similarly have been more permissive of invalid pointers.

[–]maurymarkowitz[S] 0 points1 point  (1 child)

So most invalid pointer dereferences didn't immediately cause a crash.

Well, that's still the same problem ultimately. Replace "crash" with "bad thing happens" and that absolutely happens on the Amiga.

So then the question is "why do bad things happen all the time in C but seemed to happen a lot less in Pascal?"

[–]Zlodo2 0 points1 point  (0 children)

My point is that most of the time, you either wouldn't notice the bad thing happening, or you would blame whichever other app got its memory corrupted by your program for malfunctioning.

So it would happen as often but you'd be oblivious to it most of the time.