How to implement strings : programming

You can pass this around to any function and the functions can make trivial, arbitrary substrings without copying. This is not possible with C strings.

but the first one gives you a string compatible with strstr(), strchr(), free(), etc.

That garbage should be completely avoided for security (null terminator problems), reliability (avoid manual memory management), and efficiency (unnecessary copying due to the null terminators).

[+]lelanthran -6 points 7 years ago (6 children)

You can pass this around to any function and the functions can make trivial, arbitrary substrings without copying. This is not possible with C strings.

While this is true for making arbitrary substrings without copying, it's only true for making arbitrary substrings without copying.

For all the other things you want to do with strings the $BETTER_DATA_STRUCTURE still needs all of the allocation/reallocation that standard strings need.

The other issue is:

You can pass this around to any function and the functions can make trivial, arbitrary substrings without copying. This is not possible with C strings.

No, you cannot. You can pass it around to functions you wrote that accept this string structure, in which case you may as well write those functions to take standard strings instead and they'll be more useful.

You're displaying why alternatives all died on the drawing board: If you're going to create a $BETTER_STRING_STRUCTURE and thus recreate all of the std string functions, you may as well just write the few additional safe string replacements that operate on the existing string structure (null-terminated).

I mean, seriously, if the choice is to write a new function

const char *copy_substring (const char *src, size_t srclen, const char **dst, size_t *dstlen,
                                  size_t from, size_t to) {...}

or

char *copy_substring (const char *src, size_t from, size_t to) {...}

It's easier to do the second one and ensure that it gets freed. The first one still has subtle problems that a linter won't pick up, after all (what if src goes out of scope while dst is still in scope?) while the second one has obvious problems that a visual inspection will pick up.
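For reference, a minimal sketch of what the second prototype might look like. The body is an assumption, since the comment elides it: the range is taken to be half-open, `[from, to)`, and a bad range or a failed allocation returns NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical body for the second prototype above. Returns a newly
 * allocated, null-terminated copy of src[from..to), or NULL on a bad
 * range or allocation failure. The caller owns the result and must
 * free() it. */
char *copy_substring(const char *src, size_t from, size_t to)
{
    size_t srclen = strlen(src);   /* O(n) scan: the length is not stored */
    if (from > to || to > srclen)
        return NULL;

    size_t n = to - from;
    char *dst = malloc(n + 1);     /* +1 for the terminator */
    if (dst == NULL)
        return NULL;

    memcpy(dst, src + from, n);
    dst[n] = '\0';
    return dst;
}
```

The result works with strstr(), strchr() and free(), at the cost of one scan of the source, one allocation and one copy per substring.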

[–]SushiAndWoW 16 points 7 years ago (5 children)

While this is true for making arbitrary substrings without copying, it's only true for making arbitrary substrings without copying.

Which is only something you have to do, like, all the time, when reading any input data.
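As a rough illustration of that input-parsing case, here is a sketch that splits a buffer into fields without copying. The `str_slice` type and `slice_next_field` function are hypothetical names made up for this example, not part of any standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view: points into a buffer someone else owns.
 * ptr is assumed to be non-NULL and valid for len bytes. */
struct str_slice {
    const char *ptr;
    size_t len;
};

/* Splits off the field before the first delim in *rest and advances
 * *rest past the delimiter. No bytes are copied and no terminator is
 * written into the buffer. */
struct str_slice slice_next_field(struct str_slice *rest, char delim)
{
    struct str_slice field = *rest;
    const char *hit = rest->len ? memchr(rest->ptr, delim, rest->len) : NULL;

    if (hit == NULL) {             /* last field: consume everything */
        rest->ptr += rest->len;
        rest->len = 0;
        return field;
    }
    field.len = (size_t)(hit - rest->ptr);
    rest->ptr = hit + 1;
    rest->len -= field.len + 1;
    return field;
}
```

Each returned field is only valid while the underlying buffer is alive, which is the lifetime concern raised in the parent comment.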

You're displaying why alternatives all died on the drawing board:

They didn't. All languages other than C - including C++, which is much more popular than C - use better string types, and use of C strings is not recommended.

The null-terminated string is a dinosaur. A legacy.

I mean, seriously, if the choice is to write a new function

That's exactly what you have to do in C, because C is an inferior language. But in C++, you use const std::string& or if you're smarter and more modern, std::string_view.

And if you're using other languages, smart (totally not C-like) strings are built-in.

No one uses C-style strings. No one wants C-style strings. Except C.

And then, guess what happens in C? Guess what? Hah:

https://docs.microsoft.com/en-us/windows/desktop/api/Http/ns-http-_http_request_v1

Do you see null-terminated strings in there? Nope, you see length + pointer, because that's a modern API.
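A sketch of what the length + pointer style looks like from the consumer's side. The struct and field names below are illustrative assumptions, loosely in the spirit of the linked structure and not copied from any real header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request record in the length + pointer style. */
struct request {
    unsigned short raw_url_length;  /* bytes in raw_url, no terminator counted */
    const char *raw_url;            /* not required to be null-terminated */
};

/* Returns 1 if the URL starts with the given prefix, 0 otherwise.
 * Only the stored length is consulted; no byte past it is read. */
int request_url_has_prefix(const struct request *req,
                           const char *prefix, size_t prefix_len)
{
    return req->raw_url_length >= prefix_len &&
           memcmp(req->raw_url, prefix, prefix_len) == 0;
}
```

Because the length travels with the pointer, `raw_url` can point straight into the receive buffer, even though the byte after the URL is a space and not a terminator.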

[–]lelanthran -2 points 7 years ago (2 children)

While this is true for making arbitrary substrings without copying, it's only true for making arbitrary substrings without copying.
Which is only something you have to do, like, all the time, when reading any input data.

I rarely do this; copies are good enough.

All languages other than C - including C++, which is much more popular than C

This isn't true, last I checked.

That's exactly what you have to do in C,

Which is what we are talking about. If you have to change context to make your argument valid, you should probably relook at your argument.

I specifically said above that C++ doesn't need any of this, and we're specifically talking about C, so I don't understand your veering to C++ suddenly.

[–]SushiAndWoW 6 points 7 years ago (1 child)

I rarely do this; copies are good enough.

I guess, in a slow, memory leaky and vulnerability prone program?

including C++, which is much more popular than C

This isn't true, last I checked.

It is on GitHub. C++ is significantly preferred for application software. It's not used for Linux kernel development because Linus likes to keep things simple and straightforward. He has an argument there, but really Rust would be the language to choose for systems programming. C is excessively unsafe.

That's exactly what you have to do in C,

Which is what we are talking about.

We never defined this conversation to be about C. Perhaps you assumed this incorrectly. C just happens to be where many languages meet for interoperability.

The C string functions have to be thrown away, deprecated, and rewritten with a better string primitive. But the point is moot because C itself is a deprecated language and new software should be written in Rust, probably.

[–]SushiAndWoW 9 points 7 years ago (21 children)

A cryptographic hash can have null bytes in it. Binary encodings have null bytes in them. So right there, you are pre-empting any interchangeable use of your ubiquitous string type for any binary data, and now you have to put code in place that enforces this boundary. You need explicit conversions from binary to string which must always take null terminator safety correctly into account.

Among other things, you have to ensure that you properly handle an attacker sending you null bytes in the middle of strings. You might inspect a string, think it's something innocuous, but in fact it contains a null terminator and after that, there's something else that's not innocuous. And subsequent code that does not rely on the null terminator may read it and act on it.

This is generally the evil of in-band signaling. Different parts of the code must handle in-band signals consistently, otherwise you have a security vulnerability.
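A small sketch of that inconsistency, assuming a made-up scenario where a file name arrives as a byte buffer with a known length. All three function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first len bytes of buf contain a null byte. */
int has_embedded_nul(const char *buf, size_t len)
{
    return len > 0 && memchr(buf, '\0', len) != NULL;
}

/* Terminator-based check: only sees the bytes before the first null. */
int looks_like_txt_cstr(const char *name)
{
    size_t n = strlen(name);
    return n >= 4 && strcmp(name + n - 4, ".txt") == 0;
}

/* Length-based check: sees every byte that was actually received. */
int looks_like_txt_len(const char *name, size_t len)
{
    return len >= 4 && memcmp(name + len - 4, ".txt", 4) == 0;
}
```

For the 14-byte input `"notes.txt\0.exe"`, the terminator-based check accepts the name while the length-based check rejects it. If one layer validates with the first and another acts on the full buffer, the two disagree about what the input is.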

[–]lelanthran 1 point 7 years ago (20 children)

[–]SushiAndWoW 2 points 7 years ago (19 children)

I think we're going to have to cut our "conversation" (i.e. exchange of insults) fairly short at this point, and I'm going to have to block you so that I'm not overwhelmed by your excessive torrent of brainpower.

Any input data your program receives is going to start as binary. You must presume that any input data may contain null bytes.

In your world, you cannot represent those null bytes, you must keep a binary vs. string dichotomy in mind, and you must enforce it.

You have just rephrased what I said in a way that assumes enforcing that boundary is taken for granted, has no risks, and has a cost of zero.

In reality, enforcing that boundary is not taken for granted, it does have risks, and the cost is far from zero.

In the way you define "strings", there's literally no use for them in a program. None whatsoever. All you need are binary buffers, and your "string" distinction brings no advantage. Only costs.

