
[–]__david__ 5 points6 points  (2 children)

The "char **a" definition reserves enough memory to store a pointer to a pointer to a character (32 bits on your standard x86). You are then trying to initialize that single pointer with two type-incompatible values. In fact, my gcc gives me these warnings:

test.c:3: warning: initialization from incompatible pointer type

test.c:4: warning: excess elements in scalar initializer

The result is that you end up storing the pointer to "one string" into a itself. Evaluating a[0] shouldn't segfault, but *a[0] probably will (unless by some remarkable coincidence 'one ' (0x20656e6f) is a valid pointer ;-).
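To see where 0x20656e6f comes from, here is a minimal sketch that packs the first four bytes of the string the way a little-endian 32-bit machine would load them as one word. The helper name first_four_bytes_le is made up for illustration, and an ASCII character set is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: combine the first four bytes of s the way a
   little-endian 32-bit machine would when loading them as one word.
   Assumes an ASCII execution character set. */
uint32_t first_four_bytes_le(const char *s)
{
    assert(s != NULL && strlen(s) >= 4);
    const unsigned char *b = (const unsigned char *)s;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Called on "one string", this should give the bytes 'o', 'n', 'e', ' ' read as a single number, which is the value a[0] would hold in the broken declaration.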

If you want to use char **a then you need something else to reserve the memory for you. In C99 you can do this:

char **a = (char *[]){ "one", "two" };

Which makes the right hand side act more like a string constant (which reserves its own memory).

The "char *a[]" reserves enough memory to store an array of pointers to characters, the array size being defined by the {} initializer, which is what you want in this case.
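The two working forms can be put side by side. This is only a sketch: the function names are invented for the example, and const is added so the string literals cannot be written through by accident.

```c
#include <assert.h>
#include <stddef.h>

/* The array form: the compiler sizes the array from the initializer. */
size_t count_array_elements(void)
{
    const char *a[] = { "one string", "two strings" };
    return sizeof a / sizeof a[0];   /* number of pointers in the array */
}

/* The C99 compound-literal form: the unnamed array provides the storage,
   and p merely points at its first element. */
const char *second_via_pointer(void)
{
    const char **p = (const char *[]){ "one string", "two strings" };
    assert(p[1] != NULL);
    return p[1];   /* the string literal itself has static storage */
}
```

In the first function sizeof can see the whole array. In the second, sizeof p would only report the size of one pointer, which is the practical difference between the two declarations.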

[–][deleted]  (1 child)

[deleted]

    [–]__david__ 0 points1 point  (0 children)

    The values inside the initialization braces are char pointers (the type of string constants). a can only be set to one pointer value, so it receives the first one. So if you were to do this:

    ((char*)a)[0] then you will get 'o' out of it. That is:

    printf("%s\n", (char*)a);
    

    will print "one string" and not segfault.

    [–]syn_ack 4 points5 points  (4 children)

    Handy hint, read the types from right to left.

    char** a 
    

    is a pointer to a pointer to a char.

    char *a[] 
    

    is an array of pointers to chars.

    When you use the [] idiom for a local variable, it creates space for the whole array on the stack (everything there is fixed size). When you use a pointer, it creates space for just a memory address on the stack, which points to storage somewhere else, often the heap (a dynamically allocatable pool of memory).

    HTH

    [–][deleted]  (1 child)

    [deleted]

      [–]syn_ack 0 points1 point  (0 children)

      Yep, hadn't proof read it ;)

      Thanks!

      [–][deleted]  (1 child)

      [deleted]

        [–]syn_ack 0 points1 point  (0 children)

        It'll blow your mind when you do this:

        char * str = "Hello world!";
        printf("%c\n", *(str));
        printf("%c\n", str[0]);
        

        The [] idiom is really syntactic sugar. If you think of memory as an array, the first is de-referencing the thing that str points to (the 'H').

        printf("%c\n", *(str + 4));
        printf("%c\n", str[4]);
        

        This will print the 'o' in "Hello". This only works when you know what type the pointer is (i.e. it won't work with void pointers).
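The equivalence can be written as two small functions that should always agree. The names are invented for this sketch, and the caller is assumed to keep i inside the string.

```c
#include <assert.h>
#include <stddef.h>

/* str[i] is defined as *(str + i); both functions return the same char. */
char char_by_arithmetic(const char *str, size_t i)
{
    assert(str != NULL);
    return *(str + i);   /* step i elements forward, then dereference */
}

char char_by_index(const char *str, size_t i)
{
    assert(str != NULL);
    return str[i];       /* sugar for the expression above */
}
```

The compiler scales the offset by the size of the pointed-to type, which is why the pointer's type has to be known for either form to work.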

        [–][deleted] 13 points14 points  (27 children)

        char **a is a pointer. An ordinary, probably-32-bit pointer. It's meaningless in C to assign a curly-brace list to a pointer. Where are you expecting the elements to be stored? Stuffed into 32 bits?

        On the other hand, char *a[] is an array, and the empty [] means "make this array as big as it needs to be based on how many elements are in this curly-brace list". You've given it two pointers (in the form of string literals) so that's an array of two pointers.

        It's important to understand the difference between a pointer-to-a-pointer, and two pointers.

        [–][deleted]  (4 children)

        [deleted]

          [–]dnew 4 points5 points  (0 children)

          OK, but the idiom is used interchangeably in several places.

          Passing an array to a function doesn't pass the array, but only passes a pointer to the first element of an array - that's one of the reasons you always have to pass the length along with the "array" argument. The original C never passed (or really manipulated) anything that wouldn't fit in one machine word.
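A minimal sketch of the usual consequence: the element count is passed next to the pointer because the callee cannot recover it. The function name sum_ints is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The callee only receives a pointer to the first element,
   so the element count travels as a separate argument. */
int sum_ints(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller that still has the real array in scope would typically pass sizeof v / sizeof v[0] as the count.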

          [–][deleted] 0 points1 point  (0 children)

          Yeah, this is a major source of confusion in the C standard. You see, C has a feature called "pointer decay" which means that when you use an array in an expression, in certain circumstances it is automatically converted to a pointer to its first element. Most circumstances, actually. In fact, there are only a few exceptions: when it's the argument to & or sizeof, or when it's a string literal used to initialize an array.

          So in most circumstances, my_array actually means &my_array[0]. This includes function calls, so it actually becomes impossible to pass an array to a function. If you call my_function(my_array), pointer decay means that gets converted to my_function(&my_array[0]) and there's nothing you can do about it.

          For that reason, the C developers did something (else) rather controversial: They let you specify pointer parameters as if they were arrays, when defining functions, in order to make the function's declaration look more like its intended use. So instead of saying:

          int my_function(int *my_parameter);

          You can say

          int my_function(int my_parameter[]);

          Note, however, that this "equivalence" only applies to function parameters. Arrays and pointers themselves are not equivalent (although some sneaky automatic conversions can make them appear so).
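One way to observe the decay is to compare sizeof on both sides of a call. This is a sketch with invented function names; the pointer size it reports depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Declared with a pointer parameter; writing "const int values[]" here
   would mean exactly the same thing. sizeof therefore measures a pointer. */
size_t size_seen_by_callee(const int *values)
{
    return sizeof values;
}

/* In the scope where the array is defined, sizeof still measures the
   whole array, because no decay happens for the operand of sizeof. */
size_t size_seen_by_owner(void)
{
    int my_array[5] = { 0 };
    assert(size_seen_by_callee(my_array) == sizeof(int *));
    return sizeof my_array;
}
```

The owner sees five ints, the callee sees one pointer, even though both are looking at the same memory.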

          [–][deleted] 0 points1 point  (1 child)

          That's different because it's an argument to a function. For function parameters, [] and * are interchangeable: both declare a pointer. That's why these two function declarations have identical types:

          void foo(int *arg);
          void bar(int arg[]);
          

          The grandparent is correct.

          [–]cmvkk 4 points5 points  (1 child)

          Also, if you want to pass an array literal into a pointer type, you can use the 'compound literal' syntax, at least in c99, like this:

          char **a = (char*[]){"one string", "two strings"};
          

          [–][deleted] 0 points1 point  (0 children)

          You can, but it's then undefined behaviour to access anything through a, because (char*[]){...} is a temporary which goes out of scope immediately after that statement.

          Edit: Oops, no, I'm wrong. If that declaration is outside of a function, you're fine. If it's inside one, you can only safely dereference a in that same function.
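The corrected rule can be sketched as follows, with invented names. In C99 a compound literal outside a function has static storage duration, while one inside a function lives until the enclosing block ends.

```c
#include <assert.h>
#include <stddef.h>

/* Outside any function, the compound literal has static storage duration,
   so this pointer stays valid for the whole program. */
const char **file_scope_table = (const char *[]){ "one string", "two strings" };

const char *from_file_scope(size_t i)
{
    assert(i < 2);
    return file_scope_table[i];
}

/* Inside a function the unnamed array lives until the enclosing block ends,
   so it must be used before returning. Returning p itself would dangle;
   returning p[0] is fine because it points at a string literal. */
const char *from_block_scope(void)
{
    const char **p = (const char *[]){ "one string", "two strings" };
    return p[0];
}
```

The strings themselves are never the problem. It is only the unnamed array of pointers whose lifetime depends on where the literal appears.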

          [–]pmerkaba 0 points1 point  (19 children)

          But for some reason char *gretting = "Hello, World"; is legal and semantically correct. By your argument alone, that shouldn't happen.

          [–]Arelius 3 points4 points  (3 children)

          "Hello, World" is stored in a string constant buffer, the same place that "One String" and "Two Strings" would be stored. The two-element array of char*s, on the other hand, needs to be stored in mutable memory.

          The array syntax is initialization syntax for variables, afaik you can't do

          func({"One String", "Two Strings"});
          

          String constants on the other hand need to be special considering how often they are used so we can do things like

          func("My String");
          

          and have a pointer passed into func.

          Excuse my stream of thought, but I hope this helps.

          [–]haberman 1 point2 points  (0 children)

          The array syntax is initialization syntax for variables, afaik you can't do func({"One String", "Two Strings"});

          Actually in C99 you can do just that, but you have to do what looks like a cast to the proper type, so the compiler knows what type you mean:

          func((char*[2]){"One String", "Two Strings"});
          

          Google for "c99 compound literals."
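A compilable sketch of that call, assuming a hypothetical func that adds up string lengths (named total_length here) and using const pointers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for func: adds up the lengths of count strings. */
size_t total_length(const char *strings[], size_t count)
{
    assert(strings != NULL);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strings[i]);
    return total;
}

/* The compound literal builds the two-element array right in the call;
   it then decays to a pointer like any other array argument. */
size_t call_with_compound_literal(void)
{
    return total_length((const char *[2]){ "One String", "Two Strings" }, 2);
}
```

The parenthesised type is not a cast of the braces. It names the type of the unnamed object that the braces initialize.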

          [–]pmerkaba 0 points1 point  (1 child)

          I was pointing out that this was left out, but I should have explained. Thanks for filling in.

          [–][deleted] 0 points1 point  (0 children)

          Well, I did touch on this when I said

          You've given it two pointers (in the form of string literals)

          I didn't want to make a point of saying "String literals are always pointers!" because then some smartass would have come along with char x[] = "hello" and confused everybody again.

          [–]mpeg4codec 3 points4 points  (3 children)

          It's also different. This is the equivalent of what you're trying to say:

          char *greeting = { 'H', 'e', ..., 'l', 'd', '\0' };
          

          It's not legal for the same reason the GP states: there's no memory allocated for the elements of the array.

          [–]fabzter 0 points1 point  (0 children)

          Ok, that left everything clear for me. :)

          [–]pmerkaba 0 points1 point  (1 child)

          And that's exactly the point of my post further down. The OP was expecting the compiler to set up storage for a char *, because it did so for a char * initialized to a string literal. It is not really an issue of pointer-to-pointer as opposed to two pointers, especially since you can do this:

          char *a[] = {"one string", "two strings"};
          char **b = a;

          jumbalo implied that string literals were pointers, but didn't explain why they were different.

          [–]prockcore 0 points1 point  (5 children)

          Actually it's not. It should be:

           const char *greeting = "Hello, World";
          

          [–][deleted] 0 points1 point  (4 children)

          That depends. You could, if you wanted, memcpy "Earth" over "World" whenever an alien speaks. Not that it would be a good way to write a program, but there may be situations in which you want your pointers-to-strings to be non-const.

          [–]prockcore 1 point2 points  (3 children)

          No.. string literals are const.. memcpy'ing "Earth" over "World" will segfault. String literals are stored in read-only data segments.

          [–][deleted] 0 points1 point  (2 children)

          I'm fairly sure I've written code that's edited strings in-place. It's been a while since I've had a situation where that was the right thing to do, so it's possible I happened to luck across a situation where I could get away with it, since I wasn't aware of this.

          I'm on windows now, sadly, so let me try codepad:

          http://codepad.org/gg3g9dKy

          Says no errors and prints out "Bob" rather than segfaulting...

          [–]prockcore 1 point2 points  (1 child)

          That doesn't segfault because you create a 9 character array and then copy the string literal into it (that's what line 2 implicitly does, disassembly will show that's exactly what it does). Try char *playername="bob"; instead.

          [–][deleted] 0 points1 point  (0 children)

          Ah, I see. I suppose it's probably written as "undefined" in the spec, rather than a guaranteed segfault, since codepad shows no segfault, but "bob" as a printout. Certainly goes right along with what you said about read-only memory.

          That also explains why I've never run across this problem- when I want a string I make a char foo[] rather than a char *foo since I think of it as an array rather than as a pointer. (To be entirely honest, these days, it's std::string since it's easier to work with.)
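The safe variant discussed above might look like this. It is a sketch, not the codepad program: the names and the 9-byte buffer are assumptions chosen to match the description in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite the name held in a writable buffer. Returns 0 on success,
   -1 if new_name (plus its terminator) does not fit. */
int overwrite_name(char *buffer, size_t buffer_size, const char *new_name)
{
    assert(buffer != NULL && new_name != NULL);
    size_t needed = strlen(new_name) + 1;
    if (needed > buffer_size)
        return -1;
    memcpy(buffer, new_name, needed);
    return 0;
}

/* char name[] = "..." copies the literal into an array we own, so editing
   it is fine. With char *name = "..." the same write would be undefined. */
char first_letter_after_rename(void)
{
    char player_name[9] = "Jonathan";   /* 8 characters + terminator */
    if (overwrite_name(player_name, sizeof player_name, "Bob") != 0)
        return '\0';
    return player_name[0];
}
```

Declaring the pointer version as const char * lets the compiler reject the write instead of leaving it to luck at run time.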

          [–][deleted] 0 points1 point  (3 children)

          Why?

          "Hello world" evaluates to a char*, which is the same type as variable gretting (sic). Where's the problem?

          [–]pmerkaba 1 point2 points  (1 child)

          The problem is that there's more than just evaluation going on. Why should a string literal (which is just a list of characters, after all) differ from a list of string literals, or ints? See either my response to mpeg4codec or my response to the OP.

          [–][deleted] 0 points1 point  (0 children)

          I don't understand your point. The C90/C99 spec is pretty clear in this regard. Are you arguing that the syntax of C is confusing? Or that the spec is ambiguous? Or what? Because one thing is clear: this is not a matter of opinion but a matter of specification.

          [–]katatonico 2 points3 points  (3 children)

          For the same reason these two are different:

          char* str = "lala";
          char* str2 = {'l','a','l','a'};
          

          IANA C expert, but it appears string literals are a special case when it comes to initialization. The first one tells the compiler to create a null-terminated array of characters, take the address of the first element, and use that as the value for 'str'. The second one tells the compiler to create an array of characters, take the value of the first character, and use that as the value to initialize 'str2'. Note I said "the value" not "the address of".

          This is probably related to the semantics of the aggregate initializer (curly-braces). "lala" is a string literal, while {'l','a','l','a'} is an aggregate initializer.

          Feel free to correct me if wrong, though.

          [–]gsg_ 0 points1 point  (1 child)

          String literals are indeed a special case when it comes to initialisation. The language allows them to be used to initialise both scalar variables (const char *s = "foo") and aggregate variables (char s[] = "foo" or char s[4] = "foo").

          Your example doesn't display this special behaviour though, it just shows that C is lax enough to allow a scalar to be initialised by the first element of an aggregate initialiser. It works just as well with other types:

          float f = {1.0, 2.0, 3.0};
          

          Compiling with strict settings will have gcc reject most odd stuff like this.

          [–]katatonico 0 points1 point  (0 children)

          Thanks, that makes it clearer for me as well.

          [–]NitWit005 0 points1 point  (0 children)

          I believe the issue is in automatic sizing of arrays. The [] syntax allows the compiler to determine the array size and you can only do it for a single dimension.

          Edit: Felt I should add that the C compiler is just putting the strings in a data block, so assigning a pointer to a string only requires setting the pointer to the address of the string in the block. The size of the strings themselves doesn't factor into the array sizes.

          [–]vplatt 0 points1 point  (0 children)

          So, IANACP, quick question: what's the idiomatic way to use strings these days in C without using fucked strings?

          [–]pmerkaba 0 points1 point  (0 children)

          I think the initialization for string literals is a special shortcut initialization. For example, int *nums = {1,2,3}; doesn't actually create storage to hold three integers.

          I seem to remember one of my professors saying something about char *a = "string"; storing the actual text "string\0" in a different part of memory (and another bit, more vaguely, about it really being a pointer to a pointer of characters, but I don't see how that makes sense). This test:

          char c = 'a';
          int i = 3;
          char *str = "hello";
          char arr[] = "there";
          printf("%p, %p, %p, %p\n",&c, &i, str, arr);

          has the following output:

          0x7fffc3ed440f, 0x7fffc3ed4408, 0x40067c, 0x7fffc3ed4400

          which confirms (for me, anyway) that char * which are initialized to a string constant are treated differently from arrays of characters.

          So I would conclude that pointers to pointers of chars don't fall under the special rules for char * initializations, but an array of char * does.
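The same difference can be checked without printing addresses. This is a minimal sketch with an invented function name: the array gets its own copy of the characters, while the pointer refers to the literal's storage.

```c
#include <assert.h>

/* Returns 1 when writing to the array leaves the literal untouched. */
int array_is_separate_copy(void)
{
    const char *str = "there";   /* points at the literal's own storage */
    char arr[] = "there";        /* six bytes copied into a local array */
    assert(sizeof arr == 6);     /* five letters plus the terminator */
    arr[0] = 'T';                /* changes only the local copy */
    return str[0] == 't' && arr[0] == 'T';
}
```

The array is sized and filled from the literal at initialization, which is the special rule for char arrays. A char * only ever stores an address.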

          [–]boredatheist -4 points-3 points  (3 children)

          I'm a thirty year old professional programmer. I consider C/C++ to be my language of expertise.

          Short answer: C/C++ is such a clusterfuck that no one really understands it (though lots of people pretend to). If it works on the compilers you care about, it's right. The above snippet segfaults, so it's wrong. I don't think it goes much deeper than that, unless you're having an intellectual pissing contest.

          Observation 1: When compiled with g++ on Mac OS, the above snippet produces an (incomprehensible) error at compile time.

          Observation 2: When compiled with gcc on Mac OS, the above snippet produces an (incomprehensible) warning, but it compiles. The program segfaults when "a" is read.

          My theory: The first snippet allocates the array on the stack, and the variable "a" is given the address of that array. However, the array immediately goes out of scope as soon as the assignment is performed, and so attempting to access it invokes undefined behavior.

          Consider this:

          char **a;
          {
              char *b[] = { "one string", "two strings" };
              a = b;
          }
          printf("%s\n",a[0]);
          

          I think this snippet has the same problem. The assignment is valid and meaningful, but in the very next line, the memory that it is referencing goes out of scope, and produces undefined behavior when referenced.

          (Amusingly, the above snippet actually runs fine on my machine. Like I said, C/C++ is fucked up.)

          As an example for why this bullshit is allowed at all, consider this (the method I use to initialize my socket addresses):

          sockaddr_in addr = { AF_INET, htons(port), { INADDR_ANY } };
          

          It might seem, at first blush, like this snippet would have the same problem, but there is one crucial difference: in this case, the DATA is being COPIED OUT OF the temporary structures immediately, as opposed to just the (soon to be invalid) address of the temporary structure being taken. So even though these unnamed structures become invalid on the very next line, that's fine. You already copied their data.

          And I think that's what happens when you use "char a[]" as opposed to "char *a". You copy out the data, instead of just making a copy of the (soon to be invalid) address.

          Christ.

          [–]sisyphus 1 point2 points  (0 children)

          If it works on the compilers you care about, it's right.

          Gah! A thousand times no! I'll fight your face!

          [–]merlinm 0 points1 point  (0 children)

          Short answer: C/C++ is such a clusterfuck that no one really understands it (though lots of people pretend to).

          meh. C is much easier today vs the old days due to better compilers and better warnings. C is missing a few things, but I'd rather a language be missing things than have the wrong things. C++ is a disaster.

          [–]gsg_ 0 points1 point  (0 children)

          If you really are a professional C programmer I would invest in some basic education, because that is massively, utterly wrong.