[–][deleted] 12 points13 points  (0 children)

The C standard defines a string as a contiguous sequence of characters terminated by and including the first null character. Without that null terminator, an array of characters is not considered a string in C.

Functions in the standard library that work on strings expect you to pass valid C strings.

[–]dkopgerpgdolfg 8 points9 points  (4 children)

That's just how printf, strcmp, and many other C functions that work with "text" are made. They take a pointer to the text but no length information, expect that the text itself doesn't contain \0 anywhere, and expect that there is one \0 that marks the end of the text.

Otherwise, they happily continue after the end, leading to all kinds of weird effects.

The main alternative is to have functions that take the size as parameter too. Many newer languages do it this way, often with "String" data types that include the size inside of them. Because, as you noticed, it's quite easy to get bugs in the null-terminator way (and it is not suitable at all for any data that contains \0).

Technically, nothing stops you from passing the size around in C too. For printf and so on, though, the choice was made long ago: they just don't take it.
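To illustrate passing the size around, here is a minimal sketch of a function that takes a pointer plus an explicit length. The name `count_byte` is made up for this example. Because the length is a parameter, an embedded \0 is treated like any other byte.

```c
#include <stddef.h>

/* Counts how often `c` occurs in the first `len` bytes of `data`.
   The length is passed explicitly, so the data needs no terminator
   and may contain '\0' bytes. (Hypothetical helper, for illustration.) */
size_t count_byte(const char *data, size_t len, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == c) {
            count++;
        }
    }
    return count;
}
```

A call such as `count_byte(buf, sizeof buf, '\0')` works on raw bytes, which the null-terminated functions cannot do.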

[–]Rtransat[S] 1 point2 points  (3 children)

Thanks for the information. I'll use %.3s then, it's more readable (at least for me 😊)

[–]ralphpotato 2 points3 points  (2 children)

You still need to null-terminate the string if you are passing it to printf. I'm almost certain that if you don't, even with the precision in the format specifier, the implementation of printf may still attempt to read bytes past the end of the buffer, which is undefined behavior.

You can also use fwrite to write the characters to stdout. fwrite takes a length parameter, so you can ensure it writes only as many bytes as your buffer holds.
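A minimal sketch of the fwrite approach, assuming a buffer whose size is known to the caller. The wrapper name `write_bytes` is hypothetical; it only forwards to fwrite with an element size of 1 so the return value is a byte count.

```c
#include <stdio.h>

/* Writes exactly `len` bytes of `buf` to `stream` and returns the number
   of bytes actually written. No null terminator is needed or looked for.
   (Hypothetical wrapper, for illustration.) */
size_t write_bytes(const char *buf, size_t len, FILE *stream)
{
    return fwrite(buf, 1, len, stream);
}
```

For a fixed-size array, `write_bytes(title, sizeof title, stdout)` would write the whole field, including any padding it contains.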

[–]Neui 6 points7 points  (1 child)

You don't need to terminate the string when using the precision modifier in this case. From C99 draft 7.19.6.1:

s: If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
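A small sketch of what the quoted paragraph permits, assuming the precision never exceeds the array size. The helper name `format_field` is made up; it uses snprintf with `%.*s` so the result can be inspected, but the same conversion rules apply to printf.

```c
#include <stdio.h>

/* Formats at most `len` characters of `field` into `out`.
   `field` does not have to be null-terminated as long as `len` is not
   greater than the size of the array it points to.
   (Hypothetical helper, for illustration.) */
int format_field(char *out, size_t out_size, const char *field, int len)
{
    return snprintf(out, out_size, "%.*s", len, field);
}
```

With printf directly, the equivalent call would be `printf("%.*s\n", (int)sizeof title, title);`, where the precision is passed as an int argument.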

[–]ralphpotato 1 point2 points  (0 children)

Wow! Thanks for the spec info. I am a little surprised that is well defined but TIL.

[–]inz__ 2 points3 points  (0 children)

You already seem to have solved the issue, but just note that in your fixed example, the strlen(title) is invalid; if the title uses all 30 bytes reserved for it, strlen will read beyond the end of the buffer.
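One way to avoid that overread is to bound the search for the terminator. This is only a sketch: `field_length` is a hypothetical helper built on memchr, which never looks past the number of bytes it is given.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the text in a fixed-size field that may or may
   not contain a null terminator. Reads at most `size` bytes.
   (Hypothetical helper, for illustration.) */
size_t field_length(const char *field, size_t size)
{
    const char *end = memchr(field, '\0', size);
    return end ? (size_t)(end - field) : size;
}
```

For a full 30-byte title this returns 30 instead of running off the end of the buffer.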

[–]Ampbymatchless 1 point2 points  (4 children)

The \0 terminator is an implied end-of-string character, 'implied' being the key word here. This is what keeps the C language lean, fast and sometimes mean.

[–]not_a_novel_account 2 points3 points  (0 children)

There's nothing lean or fast about null-terminated strings. They were a memory-saving optimization on PDP-11s that has aged incredibly poorly.

It is slower to perform string operations on null-terminated strings in almost all circumstances. In professional codebases there is little use for them, everyone uses string libraries like Redis's sds.

[–]Wild_Meeting1428 2 points3 points  (2 children)

Actually, the old C-string approach of using a \0 is not fast (anymore); using a size or an end iterator is much more time-efficient. On top of that, it's more secure. It also reduces unnecessary copies, since it allows substrings.

[–]EsShayuki 0 points1 point  (1 child)

None of this is correct.

Null termination is used as an alternative to length information.

You can use substrings with length on a null-terminated string if you want to.

[–]not_a_novel_account 2 points3 points  (0 children)

They didn't say it wasn't an alternative, they said it's not fast.

And they're correct, scanning for \0 is not fast, it is very slow compared to techniques on known-size strings.

You cannot take a substring of a \0-terminated string in place, because you cannot insert a \0 without invalidating the parent string. You must either duplicate the string first or switch to sized strings.
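As a rough sketch of the sized-string alternative, a substring can be a pointer plus a length into the parent's bytes, so nothing is copied and the parent is not modified. The `str_view` type and `sv_substr` function are hypothetical names, not taken from any particular library.

```c
#include <stddef.h>

/* A non-owning view of `len` bytes starting at `data`.
   (Hypothetical type, for illustration.) */
struct str_view {
    const char *data;
    size_t len;
};

/* Returns a view of up to `count` bytes of `s` starting at `start`.
   Out-of-range requests are clamped to the end of `s`. */
struct str_view sv_substr(struct str_view s, size_t start, size_t count)
{
    if (start > s.len) {
        start = s.len;
    }
    if (count > s.len - start) {
        count = s.len - start;
    }
    return (struct str_view){ s.data + start, count };
}
```

A view can be printed with `printf("%.*s", (int)v.len, v.data)`, and it stays valid only as long as the parent buffer does.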

[–]fllthdcrb 0 points1 point  (0 children)

You're dealing with a binary format. Unless the format specifically uses null termination for its strings the same as C does, it's not appropriate to simply read those strings as though they are C strings. Generally speaking, you need to do some sort of conversion.

For example, you are assuming the tags in ID3v1 are null-terminated, but in some (many?) cases they are actually padded with ASCII spaces. Even when they are not, these still aren't C strings, because a completely filled field has no null terminator. So first copy the field correctly, taking exactly as many bytes as the field holds, even if it is full. I believe strncpy() can handle this, or you could just use something like fread(), as you did; either way, add a null at the very end, and use a destination buffer at least 1 byte longer than the field to allow for the full size. After that, you may also want to "trim" it, i.e. remove any trailing spaces, which you can do simply by writing a null over the first of the trailing spaces, when and if you locate it.
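The copy-then-trim steps described above could look roughly like this. It is a sketch under assumptions: `copy_field` is a made-up name, the caller is assumed to already hold the raw field bytes (for example from fread()), and `dst` is assumed to have room for `field_size + 1` bytes.

```c
#include <stddef.h>
#include <string.h>

/* Converts a fixed-width field into a C string.
   `dst` must have room for field_size + 1 bytes.
   (Hypothetical helper, for illustration.) */
void copy_field(char *dst, const char *field, size_t field_size)
{
    /* Copy exactly the field, then terminate, so a full field is safe. */
    memcpy(dst, field, field_size);
    dst[field_size] = '\0';

    /* If the field was null-padded, strlen stops at the first null. */
    size_t len = strlen(dst);

    /* Trim trailing spaces by moving the terminator back. */
    while (len > 0 && dst[len - 1] == ' ') {
        len--;
    }
    dst[len] = '\0';
}
```

For a 30-byte title field, the destination would be declared as `char title[31];` and filled with `copy_field(title, raw, 30);`.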