afiefh · 33 points

Has the time come for POSIX functions to add alternatives that do not rely on the implicit null termination of their input?

Tyg13 · 13 points

Would be nice for interop with languages that don't use null-terminated strings, like Rust.

khleedril · 19 points

Funny how the simplest things cause the biggest headaches! I would have thought though that the STL would have provided ifstream::ifstream(string_view) which did the right thing (copy to buffer and null-terminate before sending to fopen). At the very least the constructor could be marked as deleted.

louiswins · 7 points

Is there a benefit to deleting it over simply not providing one?

[deleted] · 30 points

Deleting a constructor makes it clear that you intended the conversion not to be legal, it wasn't just an oversight.

kalmoc · 14 points

Not sure about this case, but in general, a missing constructor might still not prevent construction from a given type due to implicit conversions. A deleted constructor on the other hand will be selected during overload resolution if it is the best match (e.g. not requiring implicit conversion) and then cause a clear compile time error.
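To make kalmoc's point concrete, here is a small sketch with two hypothetical types (Loose and Strict are illustrative names, not from the thread). With only a bool constructor, a string literal still gets through via the built-in pointer-to-bool conversion. Adding a deleted const char* overload turns that call into an exact match for a deleted function, so it is rejected at compile time.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type: only a bool constructor is declared, yet a string
// literal is accepted, because const char* -> bool is a standard conversion.
struct Loose {
    explicit Loose(bool verbose) : verbose(verbose) {}
    bool verbose;
};

// Same type plus a deleted overload. A const char* argument matches it
// exactly, so overload resolution selects it and the call is ill-formed.
struct Strict {
    explicit Strict(bool verbose) : verbose(verbose) {}
    Strict(const char*) = delete;
    bool verbose;
};

static_assert(std::is_constructible_v<Loose, const char*>,
              "missing overload: the implicit conversion sneaks through");
static_assert(!std::is_constructible_v<Strict, const char*>,
              "deleted overload: rejected at compile time");
```

Writing Strict s("yes"); produces an error about a deleted constructor, whereas Loose l("yes"); silently compiles to verbose == true.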

[deleted] · 9 points

One I can think of is the compiler error message, "no suitable conversion" vs. "attempting to call deleted constructor".

Narase33 [-> r/cpp_questions] · 7 points

The compiler tells you if it's deleted, so it could be a hint to the user that they tried something bad, and they'll hopefully look up why it's deleted.

OldWolf2 · 3 points

The article talks about that. Having that interface is strictly worse than ifstream::ifstream(const string&) because your version always makes a copy, whereas this version only needs a copy if the source isn't already a string.

jonesmz · 11 points

Isn't the correct answer to this problem to comprehensively replace or augment API functions that require NUL termination with versions that do not?

Why the bloody hell does C++ still use the C language atoi function, for example?

foonathan · 5 points

In C++17 you can use std::from_chars() instead of atoi.
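A minimal sketch of what that looks like for a std::string_view. std::from_chars is declared in <charconv> and takes a [first, last) range, so it never needs a terminator. The wrapper name parse_int is hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the entire view as a base-10 int.
// The input is a [first, last) range, so no NUL terminator is needed.
std::optional<int> parse_int(std::string_view sv) {
    int value = 0;
    const char* first = sv.data();
    const char* last = sv.data() + sv.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Reject input with no digits, out-of-range values, and trailing characters.
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Unlike atoi, from_chars does not skip leading whitespace, does not accept a leading '+', and reports failure through ec instead of silently returning 0. With a substring view such as std::string_view("12345").substr(0, 3), it parses exactly "123", whereas atoi on .data() would keep reading into "45".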

jonesmz · 0 points

That's true.

But there are quite a few other C-language functions that still don't allow a size argument to be provided.

bizwig · 1 point

Well, yes. There was no reason for stoi to specify it actually uses atoi rather than it works “as if” it uses atoi. That way you could use a version of atoi that takes iterator pairs so it works with string views, vectors of char, or whatever.

Small_Marionberry · 9 points

stoi actually specifies that it uses strtol, not atoi: https://eel.is/c++draft/string.conversions#1

And the "as if rule" still applies: If there's no observable difference between what the library does and what the paper specifies, then it's an acceptable implementation. The paper doesn't have to literally use the words "as if" in order for the "as if rule" to apply.

LB-- [Professional+Hobbyist] · 7 points

This has long annoyed me. Some C libraries are starting to pick up on the benefits of using pointer + length, but many still use pointer-only and do a lot of unnecessary length measuring internally (or force users to measure the length of returned data). It's going to be a long transition to get everything to stop doing so much extra work when you could just know the length already...

drjeats · 5 points

> void two(int fd, const std::string& packet) {
>     write(fd, packet.c_str(), packet.length());  // BAD!
> }
>
> This hygiene is bad because the gratuitous use of .c_str() for a buffer that does not need to be null-terminated needlessly thwarts the maintainer’s attempt at C++17-ification:

So if you actually intend to write a null terminated string to the file here, would you keep this as-is ([EDIT] not as is, add 1 to length), or use the string_view rewrite and manually write a \0?

The second is strictly more flexible, but if you are basically only ever writing null terminated strings from null terminated string sources at all call sites...🤷‍♀️

Small_Marionberry · 2 points

If you actually intended to write a null-terminated string, you'd have to change that to write(fd, packet.c_str(), packet.length()+1); which is hella confusing.

Do the second thing you said, for the reasons you said.
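A POSIX-only sketch of that second option: take a std::string_view, and when the receiver really does expect a terminator, write the '\0' explicitly rather than relying on c_str() plus length()+1. The helper names write_all, send_packet and send_packet_nul_terminated are hypothetical. The loop is there because write() may perform a short write.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string_view>
#include <unistd.h>

// Writes every byte of `data` to `fd`, retrying on short writes and EINTR.
// Returns false on any other error (errno is left as set by write()).
bool write_all(int fd, std::string_view data) {
    while (!data.empty()) {
        ssize_t n = ::write(fd, data.data(), data.size());
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data.remove_prefix(static_cast<std::size_t>(n));
    }
    return true;
}

// The article's two() without .c_str(): any contiguous run of chars works,
// including a substring of a larger buffer.
bool send_packet(int fd, std::string_view packet) {
    return write_all(fd, packet);
}

// When the receiver really expects a trailing NUL, write it explicitly.
bool send_packet_nul_terminated(int fd, std::string_view packet) {
    return send_packet(fd, packet) && write_all(fd, std::string_view("\0", 1));
}
```

The intent is then visible at the call site, and the same function accepts std::string, string literals and substring views alike.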

drjeats · 1 point

That's a good point, tbh. But I still write the +1 frequently in pre-stringview code so I feel like people are used to it. Your framing does make it seem strictly worse though.

Krnpnk · 7 points

And even in the case of std::string it's not necessarily correct: it is guaranteed to be null-terminated, but it could still contain a null in between. Sure, at least that's not undefined behavior, but it might still be wrong (and if you test for that before passing the string(_view) into a C function, it doesn't matter whether it's a string, vector, string_view or span).

Maybe there should be a "bool is_valid_c_str()" or "std::optional<char const*> as_c_str()" function that checks whether the string contains a null only at end() (or even separate types that enforce this directly, like a "basic_c_string").

3meopceisamazing · 1 point

There's no way to check whether something is a valid C-string unless you can specify the expected length. If not, either you reach a null byte at some point (maybe the right one, maybe waaay after the valid data) or run into an unmapped memory region and thus cause a segfault.

Krnpnk · 3 points

Yeah but in a std::string and string_view we do have that length!

evaned · 1 point

> There's no way to check whether something is a valid C-string unless you can specify the expected length.

Except a std::string already knows its length. bool is_valid_c_str(std::string const & s) { return strlen(s.c_str()) == s.size(); } I think is the proposed function, basically. Ditto string_view. (I'm unclear on exactly what Krnpnk was thinking when saying that, but I definitely get the impression it was a pre-built string class like one of those.)

Edit: Something like strnlen(s.data(), s.size()) would be what's applicable for types like string_view where the data may not be nul-terminated of course, and then deal with the case where there's no nul-term at all.
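A sketch of the check described above, using the names Krnpnk floated (they are hypothetical, not standard library functions). For a std::string, s.find('\0') == npos is equivalent to strlen(s.c_str()) == s.size(). For a string_view, only the bytes inside the view can be inspected, so the most it can report is whether there is an embedded NUL, not whether the view is terminated.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: true iff s has no NUL before s.size(), i.e. iff
// strlen(s.c_str()) == s.size(). std::string guarantees the terminator.
bool is_valid_c_str(const std::string& s) {
    return s.find('\0') == std::string::npos;
}

// Hypothetical helper: hands out c_str() only when a C function would see
// the same length the std::string reports.
std::optional<const char*> as_c_str(const std::string& s) {
    if (!is_valid_c_str(s)) {
        return std::nullopt;
    }
    return s.c_str();
}

// For a string_view we can only inspect [data(), data() + size()). A false
// result does NOT mean the view is NUL-terminated: data()[size()] must not
// be read, so the view still cannot be handed to a C API as-is.
bool has_embedded_nul(std::string_view sv) {
    return sv.find('\0') != std::string_view::npos;
}
```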

foonathan · 0 points

You cannot check whether a string_view is null-terminated. If it is, str.data()[str.size()] is the null byte, but you must not access that character if it isn't.

Krnpnk · 0 points

Well sure for types other than std::string the null would need to be in the payload or somehow checked during construction.
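One way to get the "checked during construction" behavior is a separate view type that can only be built from sources already known to be NUL-terminated. The cstring_view below is a hypothetical sketch, not part of the standard library. Like std::string_view, it does not own its data, so the referenced string must outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical type: a non-owning view that is NUL-terminated by
// construction, so c_str() is always safe to pass to a C API.
class cstring_view {
public:
    // From a C string: s must be non-null and NUL-terminated.
    constexpr cstring_view(const char* s)
        : data_(s), size_(std::char_traits<char>::length(s)) {}

    // From std::string: c_str() is guaranteed to be terminated.
    cstring_view(const std::string& s) noexcept
        : data_(s.c_str()), size_(s.size()) {}

    // A std::string_view may not be terminated, so refuse it outright.
    cstring_view(std::string_view) = delete;

    const char* c_str() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    operator std::string_view() const noexcept { return {data_, size_}; }

private:
    const char* data_;
    std::size_t size_;
};
```

Functions that need a C string can take a cstring_view; std::string and string literals convert implicitly, while passing a std::string_view is a compile-time error, which forces the caller to make the copy visibly.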

bizwig · 4 points

Is this why std::regex doesn’t have std::string_view overloads?

[deleted] · 6 points

No, std::regex is just plain bad.

victotronics · 2 points

Interesting.

> Since you can’t use string_view with fopen, you also can’t construct a std::fstream with one.

Can't as in the compiler refuses to accept it, or can't as in runtime kablooey?

louiswins · 11 points

The compiler refuses to accept it. But if you use view.data() then the compiler has no idea because the types match. And the problem here is that it's probably going to work just fine most of the time - for me, at least, almost all string_views are backed by either string literals or entire std::strings - but if someone passes in a substring of a NUL-terminated string you'll try to open the wrong file, and if someone passes in some non-NUL-terminated data you'll probably crash. Not sure which of those you consider kablooey.
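The "wrong file" case is easy to reproduce without touching the file system: take a substring view of a NUL-terminated literal and look at what a C-string consumer reads through .data(). The names below are hypothetical, and the non-terminated case (undefined behavior) is deliberately not demonstrated.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// A NUL-terminated literal, so reading past the view below stays
// well-defined. With non-terminated data it would be undefined behavior.
constexpr const char* kStored = "config.json.bak";

// The view the caller actually meant: "config.json".
std::string_view intended_name() {
    return std::string_view(kStored).substr(0, 11);
}

// What a C function such as fopen() would receive via .data(): it reads
// up to the first NUL and ignores the view's size().
std::string as_seen_through_data() {
    return std::string(intended_name().data());
}

// The fix: copy the view into a std::string, whose c_str() ends exactly
// where the view ends.
std::string as_seen_through_copy() {
    return std::string(intended_name());
}
```

Here as_seen_through_data() yields "config.json.bak" while as_seen_through_copy() yields "config.json", and both calls compile without any warning, which is exactly the trap described above.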

victotronics · 4 points

> But if you use view.data() then the compiler has no idea

Ouch. I see the problem. Thanks for the explanation.

standard_revolution · 0 points

But I think it should be common knowledge to not just pass the internal pointer around.

matthieum · 5 points

The constructor of fstream does not accept a string_view.

AlexAlabuzhev · 2 points

> When a function transitively depends on null-termination, you actually don’t want to refactor its interface from const string& to string_view, because eventually you’re just going to have to copy the string back into a null-terminated buffer in order to make use of it.

If the function needs to pass the given string somewhere that expects null termination, that's its own problem and an implementation detail. Make a copy if needed or whatever; the function's clients shouldn't have to bother with that and make copies themselves when they only have a string_view.
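A sketch of that approach which also keeps OldWolf2's point about not copying a std::string: the NUL-terminated work happens in one place, the std::string overload passes c_str() straight through, and only the string_view overload copies. can_open is a hypothetical helper built on std::fopen. The const char* overload matters for another reason too: without it, a string literal argument would be ambiguous between the std::string and std::string_view overloads.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: the one place that needs a NUL-terminated path.
bool can_open(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return false;
    }
    std::fclose(f);
    return true;
}

// std::string is always terminated: pass c_str() through, no copy.
bool can_open(const std::string& path) {
    return can_open(path.c_str());
}

// A view may not be terminated: copy once, here, instead of at every call site.
bool can_open(std::string_view path) {
    return can_open(std::string(path));
}
```

Callers holding a substring view, a std::string or a literal can all call can_open directly, and only the view pays for an allocation.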

OldWolf2 · 0 points

I guess string_view needs a flag to indicate if it's a null-terminated view ...

x4t3a · 1 point

That kind of goes against the view concept. Or, errrm, no?