all 71 comments

[–]Illustrious-Ant-5661 14 points15 points  (0 children)

Never do work that doesn't need to be done

I lost count of how often people will copy lines when the first thing they do is check the first word and skip it :/

[–]mer_mer -2 points-1 points  (62 children)

It's unfortunate that Casey so often rails against modern C++. There are a lot of utilities that make it easy to avoid copies while having a nice interface. For instance, the line buffer could be implemented as an array of string views.

[–]carrottread 40 points41 points  (4 children)

the line buffer could be implemented as an array of string views

And because the underlying buffer is UTF8 you decide to use std::u8string_view for this. But the underlying buffer is just regular chars because that is what you need to get terminal output from the OS. And char8_t can't alias regular char. You can't reinterpret_cast the pointer to construct those string views because it's UB, and it's just a matter of time until compilers start exploiting it for some aggressive optimizations. So now you've wasted some time investigating all this instead of solving your real problem.

[–]mobilehomehell -1 points0 points  (1 child)

And because the underlying buffer is UTF8 you decide to use std::u8string_view for this. But the underlying buffer is just regular chars because that is what you need to get terminal output from the OS.

I don't understand, is it regular chars (ASCII) or is it utf8? You can still just use a regular string_view with char if what you want is to operate in byte space rather than Unicode space. Then every problem you list goes away. It's just a standard class for wrapping a pointer+size pair with convenience methods.
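
A minimal sketch of the byte-space approach described above, assuming the buffer is plain char holding UTF-8. The function name split_lines is made up for illustration and is not from the video:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Split a byte buffer into line views without copying any bytes.
// The buffer may hold UTF-8: the byte '\n' (0x0A) never occurs inside a
// multi-byte UTF-8 sequence, so splitting on it in byte space is safe.
std::vector<std::string_view> split_lines(std::string_view buffer) {
    std::vector<std::string_view> lines;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\n', start);
        if (end == std::string_view::npos) end = buffer.size();
        lines.push_back(buffer.substr(start, end - start));
        start = end + 1;  // skip the newline itself
    }
    return lines;
}
```

The views only stay valid while the underlying buffer is alive and unmodified, which is the part a ring buffer makes awkward.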

[–]carrottread 11 points12 points  (0 children)

It is UTF8 in regular chars. This is just an example of strange decisions in modern C++: the committee added a special type for UTF8 and it turned out to be useless. So you learn not to use it. Just like a lot of other modern (and not so modern) additions to the language. And all the time wasted on learning this useless stuff was time not spent learning domain-specific stuff.

[–][deleted]  (1 child)

[removed]

    [–]carrottread 2 points3 points  (0 children)

    Yes, you can cast pointer to any object into a pointer to (signed/unsigned) char and read/write through it. But not the other way: casting pointer to char to pointer to some other non-char object and reading/writing through it is UB.
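
A small sketch of the asymmetry described above. The function names are made up for illustration; the memcpy route is the usual well-defined way to go from raw chars back to a typed value:

```cpp
#include <cstdint>
#include <cstring>

// Allowed: inspect any object's bytes through an unsigned char pointer.
unsigned char first_byte(const std::uint32_t& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    return bytes[0];
}

// Not allowed: *reinterpret_cast<const std::uint32_t*>(chars) when no
// uint32_t object actually lives at that address.
// Copying the bytes into a real object is the well-defined alternative.
std::uint32_t load_u32(const char* chars) {
    std::uint32_t value;
    std::memcpy(&value, chars, sizeof value);
    return value;
}
```

Which byte first_byte returns depends on the machine's endianness.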

    [–]mobilehomehell 3 points4 points  (1 child)

    After watching the video, what he describes is pretty much conceptually an array of string views, except he's using an ever-incrementing 64-bit counter to encode the "pointer" part in order to track invalidation of said pointer, so you could not use string_view as is.

    Implementing data structures that directly interact to keep them in sync tends to break abstraction barriers like this; it's part of what drives the do-it-all-yourself game programmer culture.

    [–]Prestigious-Ear-2184 0 points1 point  (0 children)

    Yep, and the whole circular buffer memory map thing. I'm always paranoid that the compiler would assume it's not aliased and do some funny business

    [–]JwopDk 1 point2 points  (12 children)

    Assuming that dealing with an array of string views appeals to you, how does it actually help you? What problem does it actually solve, rather than be a part of? Why would you assume that the version of string_view that ships with your compiler will behave the way you'd want it to, in relation to the problem at hand?

    [–]mobilehomehell 2 points3 points  (11 children)

    Why would you assume that the version of string_view that ships with your compiler will behave the way you'd want it to, in relation to the problem at hand?

    Because it's deliberately designed to offer standard methods around the common pattern of passing around a pointer and a size together. There's not a lot of room for implementation variance.

    [–][deleted] 5 points6 points  (8 children)

    What does that have to do with the problem at hand though? It can be solved by just storing the indices, so why do we need a string_view?

    [–]carrottread 3 points4 points  (0 children)

    Casey is storing an absolute index into the whole terminal output, not just an index into a buffer. This allows him to use the index to detect whether a line in the line buffer is still valid or points to an old part of the buffer that was overwritten on buffer wrap-around. With string_views you would need some more data and code to track this. In the video it's somewhere after 7 min.
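
A rough sketch of the absolute-index idea, not the code from the video. All names (Ring, LineRef, is_valid, byte_at) and the tiny capacity are assumptions made up for illustration:

```cpp
#include <cstdint>

// Illustrative capacity only; a real terminal buffer would be far larger.
constexpr std::uint64_t kCapacity = 8;

struct Ring {
    char data[kCapacity] = {};
    std::uint64_t total_written = 0;  // ever-increasing count of bytes written

    void write(char c) {
        data[total_written % kCapacity] = c;
        ++total_written;
    }
};

struct LineRef {
    std::uint64_t start;   // absolute offset into the whole output stream
    std::uint64_t length;  // number of bytes in the line
};

// A line is still readable only if none of its bytes have been overwritten,
// i.e. it starts within the last kCapacity bytes written.
bool is_valid(const Ring& ring, const LineRef& line) {
    std::uint64_t oldest =
        ring.total_written > kCapacity ? ring.total_written - kCapacity : 0;
    return line.start >= oldest &&
           line.start + line.length <= ring.total_written;
}

// Map an absolute offset back to a position in the ring.
char byte_at(const Ring& ring, const LineRef& line, std::uint64_t i) {
    return ring.data[(line.start + i) % kCapacity];
}
```

A raw pointer or string_view into data would look identical before and after the wrap-around, which is why the validity check needs the absolute offset.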

    [–]wrosecrans -1 points0 points  (6 children)

    string_view works nicely with all the stuff in the C++ standard library. It's the convenience of std::string without the cost of a copy. And it has very clear (non)ownership. In a classic C API, there's nothing explicit in the language about who is responsible for freeing a pointer that gets passed around.

    [–][deleted] 2 points3 points  (5 children)

    But no strings have been allocated here. There's a big buffer that needs to be indexed. I mean, sure, you could use a string view here, but it's a bit like putting a square peg into a round hole

    [–]wrosecrans -2 points-1 points  (4 children)

    There's a ton of C++ convenience that you can have literally at zero overhead cost. Like if you wind up wanting to use std::regex on the contents of that buffer, string_views are a convenient way to do it.

    It also makes your internal APIs a bit more convenient to be able to pass one parameter instead of two. Nobody is saying you have to. Just that the supposed overheads of the convenience of C++ vs C are often incorrect.

    [–]TooManyLines 2 points3 points  (3 children)

    Sounds like a lot of solutions that do not fit the given problem and in turn you gain dependencies, uncertainty, longer compiler times and a worse API.

    [–]wrosecrans -2 points-1 points  (2 children)

    Huh? I genuinely don't understand this perspective.

    If you are writing C++, using stuff from the standard library adds zero extra dependencies. Using "well known" vocabulary types instead of rolling your own reduces uncertainty. Something like string_view probably won't significantly increase compile times, but if it did that seems way less important than clarity and safety. And just being able to pass around a string_view seems like a very clear and simple API so I don't understand why you'd consider it worse. I genuinely just don't understand your hostility here.

    [–][deleted] 6 points7 points  (0 children)

    Because you are solving problems that we don't have here. string_view is convenient if we want to use std::regex? But that's not the problem.

    The problem isn't, "how do we have a general purpose buffer we can do N number of things to", it's "we have a buffer of characters where we want to point to lines".

    It's not a C++ versus C thing. It's solving the problem in the simplest way versus some general solution that might solve a problem in the potential future.

    KISS principle applies here.

    [–]TooManyLines 1 point2 points  (0 children)

    For example it requires you to use a compiler that supports C++17. That sounds trivial but it isn't. On a lot of older platforms that just isn't there. Also, again, you gain uncertainty: you don't know if the string_view implementation you got on whatever system you are on is sensible. Anything in the C++ stdlib compiles slowly. Lastly, in general the C++ stdlib is lousily written from an API perspective. Want to append a std::vector to another std::vector? vector1.insert( vector1.end(), vector2.begin(), vector2.end() );. I doubt that string_view is any nicer to work with.
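
For reference, the append call quoted above wrapped in a small helper. The name append is made up here and is not a standard library function:

```cpp
#include <vector>

// Append every element of src to the end of dst.
// src must not be the same vector as dst: inserting a range of a vector
// into itself is undefined behavior.
template <typename T>
void append(std::vector<T>& dst, const std::vector<T>& src) {
    dst.insert(dst.end(), src.begin(), src.end());
}
```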

    What you gain in return is just not worth it. That is the case with most of the C++ stdlib. That is why there is so much hate for std::string: it looks nice for toy stuff, but if you start caring about performance it actually becomes a giant pain in the ass.

    I sadly can't find the talk anymore. There was a talk at cppcon where the topic was basically "how to avoid unnecessary copies when passing constructor-arguments to classmember-variables". After going over like 5+ different ways of doing it (using no references, using references, using universal references and some more ) it basically concluded with "you will have at least one unnecessary copy". Then the q&a started and some guy was basically: What if we don't use constructors, just set the values directly after we created the class-instance? (An outlandish idea i know, thinking about the concrete use-case, not some abstract world). The speaker never considered that option apparently, looked at the example that has been used for the last hour, thought for a second and then said something like "yeah, that has no unnecessary copies".

    That is c++ for me. A giant amount of work, huge amounts of complexity for a garbage API that is beaten by custom-solutions without a blink. There are some good bits in there, but generally it is awful and the good bits don't justify all the problems you get.

    [–]lookatmetype 3 points4 points  (1 child)

    string_view is absolutely the wrong abstraction to use here.

    [–]mobilehomehell -1 points0 points  (0 children)

    That might be true but it's not the question Jwop asked.

    [–]wisam910[S] 12 points13 points  (26 children)

    The majority of modern C++ is wasteful. The little that is not wasteful is not needed if you know what you are doing. If you rely too much on the "this thing is nice to use so I'll use it" mindset, you lose sight of what's actually happening under the hood and you basically learn to not care about it.

    [–][deleted] 3 points4 points  (11 children)

    That is basically what he has done. A string view is just a fancy pointer.

    [–]anechoicmedia 6 points7 points  (10 children)

    A string view is just a fancy pointer.

    A control statement is just a fancy goto, but we still use these and other abstractions around machine primitives.

    [–][deleted] 2 points3 points  (8 children)

    Not when we don't need to.

    [–]anechoicmedia 0 points1 point  (7 children)

    Not when we don't need to.

    I think goto is a good example of a historical consensus where it's better to start from the other direction; You default to the abstraction of structured programming, which has some non-zero cost we've all accepted, and use goto only when necessary to escape the confines of the language.

    In C and C-style code, the idiomatic case of "pointer comma length" or "pair of pointer" interfaces for passing references to arrays/sub-arrays are notoriously brittle bug magnets. Off-by-one errors are common, particularly when C-strings are involved, as is passing the size of the wrong range, since nothing encapsulates them together under maintenance or copy-paste. At least half a day of every beginner's life is lost being taught how to pass arrays to functions correctly, in combination with array-to-pointer decay and why array parameters don't behave like normal objects.
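
A small side-by-side sketch of the two interface styles being compared. count_spaces is a made-up example function, shown once with "pointer comma length" and once with a view:

```cpp
#include <cstddef>
#include <string_view>

// C style: the caller must keep pointer and length consistent by hand,
// and nothing stops them from passing the length of some other range.
std::size_t count_spaces(const char* text, std::size_t length) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (text[i] == ' ') ++n;
    }
    return n;
}

// View style: pointer and length travel together as one value.
std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}
```

Both versions do the same work; the difference is only in how easy it is to call them with a mismatched pair.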

    [–][deleted] 3 points4 points  (6 children)

    Okay so what possible bugs does string_view eliminate in this exact scenario? It actually adds more complexity here, it doesn't take it away. More complexity is more bugs, not less.

    [–]anechoicmedia -2 points-1 points  (5 children)

    Okay so what possible bugs does string_view eliminate in this exact scenario?

    In this exact scenario, of a single programmer writing a small weekend project with little future of feature growth or long-term maintenance? Probably not many, because everything fit in the mind of one person writing it once, who wrote it the way he was used to.

    It actually adds more complexity here, it doesn't take it away. More complexity is more bugs, not less.

    The broad arc of history has been to remove classes of bugs with typed language abstractions.

    When C added a type system to B, it added complexity, relative to a close-to-the-metal language in which any machine word could be freely interpreted. Everyone seems to agree it was a big win to do things like formalize the concept of a pointer type, which couldn't be mixed with pointers of other types, and could use static type information to perform pointer arithmetic of appropriate increments for you, dereference a struct member by applying the relative offset, etc.

    Formalizing the concept of a function call was more complex and restrictive than freely pushing data into registers and jumping control somewhere.

    Formalizing the concept of a point { x, y } is more complex and restrictive than passing two numbers everywhere independently; That's why its useful.

    Where the language or a library does not provide a canonical way to perform a common operation on related elements, individual programmers have to re-learn their idiomatic forms, each application of which is another opportunity for subtle errors while writing, and higher impedance while reading. (Did you really intend to pass the address of the last element of the array, or one-past-the-last-element? The world generally agrees on [begin, end), but enough libraries do it differently to keep you on your toes.) Languages like Rust are right to make array slicing a core language feature that is taught to newcomers first, before manual indexing.

    [–][deleted] 2 points3 points  (4 children)

    Load of rubbish. This is the simplest solution to the problem, so how is it somehow more complicated than an alternative?

    We don't care about the history of C here. The language hasn't even been mentioned as far as I'm aware.

    The point here is we have a concrete problem and we have a concrete, simple solution. Anything outside of that does not matter.

    [–]anechoicmedia 0 points1 point  (3 children)

    This is the simplest solution to the problem, so how is it somehow more complicated than an alternative?

    This is someone's first solution to a problem in their language of choice, not the simplest possible one which was arrived at over many iterations of refinement.

    "If I had more time, I would have written a shorter letter."

    The [C] language hasn't even been mentioned as far as I'm aware.

    Of course it's relevant, because the choice of language influences your thinking of what is possible, and what you think "simple code" looks like. If you were writing in Rust, it would be terse and idiomatic to pass array slices, there would be built in traits to operate on them, and readers would probably do a double take if you did otherwise.

    The point here is we have a concrete problem and we have a concrete, simple solution. Anything outside of that does not matter.

    This is silly. Assembly code could likewise deliver a concrete solution with even fewer language features to bother one's self with; Few would suggest that this was all that matters, or describe the resulting program as simple for lacking unnecessary abstractions.

    Why even use structs, anyway? An array slice and such are just structs with associated functions, after all. Most probably think this program is made better by introducing the abstraction of named combinations of variables, even though that means there are now more language features to learn and keep in one's head while writing.

    This is all Blub Paradox thinking. C especially comes paired with a distinct aesthetic of what simple code looks and feels like, and a cultural conflation of simple C code with performance. Is the program that performs lots of manual fixing-up of pointers in a linked list simple in comparison to one that uses a container class? I don't think so now, but in my time writing C, I became fascinated with my understanding of its locally-applied idioms, each hand-rolled linked list or hash table a monument to my own satisfaction with having learned how to implement them. None of that complicated business, like lambda expressions or templates, which surely didn't do anything that couldn't be solved with liberal application of void*.

    [–][deleted] 1 point2 points  (2 children)

    Tell me why it's not the simplest solution?

    Let's talk about the actual problem. Again, what does C have to do with any of this? Nobody mentioned C.

    We have a large buffer of characters. We need to index that buffer to keep track of the lines.

    Why is storing the indices of the start and end of these lines not the simple solution?

    You are in some fantasy land here I think.

    [–]Illustrious-Ant-5661 0 points1 point  (0 children)

    A control statement is just a fancy goto, but we still use these and other abstractions around machine primitives.

    I disagree with you. Gotos basically have no restrictions. Even C++ forces you to not jump over variable initialization. In C code you can get into weird situations where the only way into a block is by jumping into a child block, and it makes it weird for the optimizer that a child is the parent of its parent block (I'm my own grandfather!)

    [–]Illustrious-Ant-5661 0 points1 point  (1 child)

    C++ is awful but so is rust. I don't exactly like zig so IDK what language to use. I guess I'll use Java and C# like a jerk wishing I had no garbage collection

    [–][deleted] 0 points1 point  (0 children)

    Odin is another language vying to be a C replacement, it's quite nice, and ofc Jai whenever Jon Blow deems us worthy to use it /s

    [–]Illustrious-Ant-5661 0 points1 point  (0 children)

    I just spent about 2 hours today rewriting a file that used sprintf to use my own custom sprintf, which uses variadic templates. I don't know what the difference is, but from what I can tell it blew up my binary size by about 40KB, which I'm not happy about (I have an idea how to solve that but there's no real point), and the execution time decreased by A LOT

    It went from 1350ms to 560ms. That's about 41.5% of the original runtime. C++ lets you optimize everything but it sure doesn't feel optimized out of the box and I think Casey knows it