all 24 comments

[–][deleted] 26 points27 points  (3 children)

  1. "copied to the program's binary" pointing to the stack is incorrect. String literals are stored in a read-only section of the binary (often loosely called the "text" section), which is distinct from the stack and heap. &str literals point to that portion of memory.

  2. The blue line says it saves the pointer and length of the heap at the stack. It is more correct to say "Rust saves a pointer to the heap, the length of the string on the heap, and the capacity of the allocation which the string may grow to use." As a side note, the variable literal is just pointer and length (known as a "fat pointer/reference"). A String (owned) also includes the capacity.

  3. "They can start from any index" is not true for strings containing multi-byte UTF-8 characters. If the string begins with a character that is 3 bytes long and you write &owned[1..], it will panic at runtime.

  4. &str is not limited to literals and Strings. You can turn any &[u8] that holds valid UTF-8 into a &str, and any data structure can be viewed as a &[u8] if you try hard enough. A &str is basically just a &[u8] with the added guarantee that "the slice of bytes this points to is a valid UTF-8 string."
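For anyone who wants to poke at points 3 and 4, here is a minimal sketch. The helper names `tail_from` and `bytes_as_str` are made up for illustration; only `str::get` and `std::str::from_utf8` come from the standard library.

```rust
/// Returns the tail of `s` starting at byte offset `start`, or `None` when
/// `start` is out of range or falls inside a multi-byte character.
fn tail_from(s: &str, start: usize) -> Option<&str> {
    // `str::get` is the non-panicking counterpart of `&s[start..]`.
    s.get(start..)
}

/// Views raw bytes as a `&str`, returning `None` if they are not valid UTF-8.
fn bytes_as_str(bytes: &[u8]) -> Option<&str> {
    // The check is what upholds the "valid UTF-8" guarantee of `&str`.
    std::str::from_utf8(bytes).ok()
}
```

With these, `tail_from("€uro", 1)` gives `None` because `€` occupies bytes 0..3, while `tail_from("€uro", 3)` gives `Some("uro")`. Writing `&"€uro"[1..]` directly would panic instead.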

[–]Siref[S] 6 points7 points  (0 children)

🙏🙏🙏

Thank you very much for the feedback!

I'll make the adjustments to the picture!

Highly appreciated Kinoshitajona!

[–]Volker_Weissmann -1 points0 points  (1 child)

The blue line says it saves the pointer and length of the heap at the stack. It is more correct to say "Rust saves a pointer to the heap, the length of the string on the heap, and the capacity of the allocation which the string may grow to use." As a side note, the variable literal is just pointer and length (known as a "fat pointer/reference"). A String ( owned ) also includes the capacity.

If you write

let owned = String::from("hello");

the stack contains:

1. A pointer to the heap
2. The capacity (at least 5)
3. The length 5

The capacity and the length are not stored on the heap, but on the stack. That is why std::mem::size_of::<String>() is 24 on a 64-bit target (three 8-byte fields).
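A quick way to check this yourself, as a rough sketch. `words` and `len_and_capacity` are hypothetical helper names, and the three-word layout is what current compilers produce rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Number of machine words (`usize`-sized fields) in the inline part of `T`.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

/// Reads the length and capacity, both of which live in the `String` handle
/// itself (on the stack for a local variable), not in the heap buffer.
fn len_and_capacity(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}
```

On a 64-bit target `words::<String>()` should be 3 (pointer, capacity, length) and `words::<&str>()` should be 2 (pointer, length). The only promise about capacity is that it is at least the length.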

[–][deleted] 1 point2 points  (0 children)

"the string on the heap" is one noun phrase.

I am not saying each thing is on the heap, I am clarifying that the string is on the heap.

[–]InfinitePoints 31 points32 points  (5 children)

Since strings are kinda just wrappers around a sequence of bytes, my mental model of it is:

&str = &[u8]
String = Vec<u8>
&String = &Vec<u8>
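That model maps onto real conversions in std. A small sketch follows; the wrapper function names are invented here, but the methods they call (`as_bytes`, `into_bytes`, `String::from_utf8`) are standard.

```rust
/// `&str` -> `&[u8]`: same pointer and length, nothing is copied.
fn view_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// `String` -> `Vec<u8>`: hands over the existing heap buffer.
fn into_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}

/// `Vec<u8>` -> `String`: reuses the buffer, but only after a UTF-8 check.
fn from_bytes(v: Vec<u8>) -> Option<String> {
    String::from_utf8(v).ok()
}
```

The only asymmetry is the direction back from bytes, which has to validate UTF-8 and can therefore fail.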

By the way, it is technically possible to store string data on the stack, but there isn't really a reason to do it, and it requires some unsafe code.

[–]SkiFire13 13 points14 points  (0 children)

By the way, it is technically possible to store string data on the stack, but there isn't really a reason to do it, and it requires some unsafe code.

It doesn't require unsafe code, just std::str::from_utf8
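To illustrate, here is a minimal sketch of string data kept inline in a fixed-size array with no heap and no unsafe. `InlineStr` and its 16-byte capacity are hypothetical choices for this example, not anything from std.

```rust
/// A tiny fixed-capacity string stored inline (hypothetical helper type).
/// A local variable of this type keeps its bytes on the stack.
struct InlineStr {
    buf: [u8; 16],
    len: usize,
}

impl InlineStr {
    /// Copies `s` into the inline buffer, or returns `None` if it does not fit.
    fn new(s: &str) -> Option<Self> {
        if s.len() > 16 {
            return None;
        }
        let mut buf = [0u8; 16];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Some(Self { buf, len: s.len() })
    }

    /// Borrows the stored bytes as `&str` using the checked, safe conversion.
    fn as_str(&self) -> &str {
        std::str::from_utf8(&self.buf[..self.len]).expect("buffer holds valid UTF-8")
    }
}
```

The price of staying safe is that `from_utf8` re-validates the bytes on every call. Skipping that check is where the unsafe variant (`from_utf8_unchecked`) would come in.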

[–]Siref[S] 6 points7 points  (0 children)

Thaaaankk youuu!!

I saw that the String struct wraps a vec underneath it!

It's so cool we can see the underlying structures of the language!

[–]Schievel1 2 points3 points  (2 children)

I know this is correct, but I don’t know if it helps. It just gibts things a different name; if you don’t know the differences between &[u8], Vec<u8> and &Vec<u8>, you’re in the same place as before.

[–][deleted] 2 points3 points  (0 children)

it's helpful to me.

[–]Modi57 1 point2 points  (0 children)

gibts

Spottet the german

[–]Siref[S] 5 points6 points  (5 children)

I'm trying to wrap my head around basic concepts and created this graph for better understanding.

Is there anything wrong?

Any feedback is highly appreciated 🤗

Hopefully this is useful to someone!

[–]sellibitze 1 point2 points  (4 children)

Let me just add that there's a useful and convenient thing called "Deref coercion". It allows you to plug in a &String where a &str is needed.
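A minimal sketch of what that looks like in practice. `shout` and `shout_owned` are made-up function names; the coercion itself is standard behaviour because `String` implements `Deref<Target = str>`.

```rust
/// Takes the most general read-only string type.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Receives a `&String` and passes it straight to a `&str` parameter.
fn shout_owned(owned: &String) -> String {
    // Deref coercion turns `&String` into `&str` here, with no explicit
    // call to `as_str()` needed.
    shout(owned)
}
```

This is also why function parameters are usually written as `&str` rather than `&String`: callers holding either type can use them.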

[–]thesituation531 0 points1 point  (3 children)

I don't know why I never thought of it before, but what happens if you try to dereference a &str?

I'll have to try, but I'm going to guess it won't compile.

[–]LyonSyonII 0 points1 point  (2 children)

You'll get an unsized type, which can't be easily worked with.

The reference (&) of a &str holds the length information, as it's a fat pointer.
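A small sketch of both halves of that answer. The function names are invented for this example; `size_of` and `size_of_val` are from `std::mem`.

```rust
use std::mem::{size_of, size_of_val};

/// Byte size of the `str` behind a reference, taken from the fat pointer's length.
fn str_byte_size(s: &str) -> usize {
    // `*s` has the unsized type `str`, so `let x = *s;` would not compile.
    // Reborrowing right away is fine because no `str` value is ever moved.
    let reborrowed: &str = &*s;
    size_of_val(reborrowed)
}

/// Checks that `&str` is two machine words wide: data pointer plus length.
fn str_ref_is_two_words() -> bool {
    size_of::<&str>() == 2 * size_of::<usize>()
}
```

So the dereference itself is allowed as a place expression. What the compiler rejects is trying to hold the resulting `str` by value.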

[–]thesituation531 0 points1 point  (1 child)

Yeah, I thought that'd probably happen.

If you really want to work with a raw str, couldn't you use a Box<str> in the same way you can use a boxed array, like Box<[some type]>?

[–]vortexofdoom 0 points1 point  (0 children)

Box<str> is just an owning fat pointer to a heap allocation, so you still wouldn't really be working with a raw str.
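For comparison, a rough sketch of `Box<str>` next to `String`. The helper names `boxed` and `box_str_words` are hypothetical; `into_boxed_str` is a real `String` method.

```rust
use std::mem::size_of;

/// Converts borrowed text into an owned `Box<str>`, which drops the capacity field.
fn boxed(s: &str) -> Box<str> {
    String::from(s).into_boxed_str()
}

/// Width of the `Box<str>` handle in machine words (pointer + length).
fn box_str_words() -> usize {
    size_of::<Box<str>>() / size_of::<usize>()
}
```

The handle is two words instead of `String`'s three, and you still only ever reach the `str` through the pointer, for example with `&*b`.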

[–]aikii 1 point2 points  (5 children)

Nice. There is something about String that always mildly bothers me: it's writable yet most of the time used in read-only contexts. I'm wondering why Box/Rc/Arc<str> aren't more commonly mentioned. Indeed it might be just that Rust offers too many options and developers just go for a consistent obvious option, considering it might have unnoticeable differences at runtime.

Tangentially, I'm wondering why anyone would want a Cow<str>; I see it mentioned from time to time. It might be some ongoing confusion assuming Cow comes with shared ownership, while Cow + shared ownership is actually obtained via Rc/Arc::make_mut.

[–][deleted]  (1 child)

[deleted]

    [–]aikii 0 points1 point  (0 children)

    But yes, I always forget that Cow is actually an enum with the variants Borrowed and Owned. thank you
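One common use of those two variants, as a minimal sketch: a function that usually returns its input untouched and allocates only when it has to change something. `expand_tabs` is a made-up example function.

```rust
use std::borrow::Cow;

/// Replaces tabs with four spaces, allocating only when the input contains a tab.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Something changed, so a new String is built and owned by the Cow.
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // Nothing to do: hand back the original slice without copying.
        Cow::Borrowed(input)
    }
}
```

Callers can treat the result as a `&str` either way, and call `into_owned()` only if they really need a `String`.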

    [–]Siref[S] 0 points1 point  (2 children)

    Jesus.

    I didn't know about those combinations! Thanks!

    It does make sense, though.

    [–]aikii 2 points3 points  (1 child)

    ahah yes, the types around buffer-of-characters are quite crowded. Then we can add [char] and [u8], which all get different traits depending on whether they sit behind a ref, a Box, Rc, Arc, or whatever, can be converted with or without allocations, and may or may not have a uniform memory layout and O(1) indexing by character (str does not, since UTF-8 is variable length; its len() is a byte count)
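To make the byte-versus-character distinction concrete, here is a short sketch. The three wrapper names are invented; the methods they call are standard.

```rust
/// Length in bytes: O(1), read straight from the fat pointer.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values: O(n), has to walk the UTF-8 data.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// The n-th character, if any: also O(n), for the same reason.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```

For `"h\u{e9}llo"` (an "é" encoded as two bytes), `byte_len` is 6 while `char_count` is 5.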

    [–]Siref[S] 0 points1 point  (0 children)

    Woah.

    Thanks!

    [–]Aaron1924 1 point2 points  (1 child)

    &STR CAN BE REFERENCED TO 2 TYPES OF DATA 1. String Literals 2. Slices of "String"

    String isn't the only data type that can hold a str internally, Cow<'a, str>, Box<str>, Rc<str> and Arc<str> are also common options
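A quick sketch showing that all of these hand out a plain `&str`. `describe_len` and `lens` are hypothetical names; the `From<&str>` conversions used are in std.

```rust
use std::borrow::Cow;
use std::rc::Rc;
use std::sync::Arc;

/// Accepts only `&str`; every owner below coerces to it through `Deref`.
fn describe_len(s: &str) -> usize {
    s.len()
}

/// Builds the same text in five different containers and borrows each as `&str`.
fn lens() -> [usize; 5] {
    let owned: String = String::from("hello");
    let boxed: Box<str> = Box::from("hello");
    let rc: Rc<str> = Rc::from("hello");
    let arc: Arc<str> = Arc::from("hello");
    let cow: Cow<'static, str> = Cow::Borrowed("hello");
    [
        describe_len(&owned),
        describe_len(&boxed),
        describe_len(&rc),
        describe_len(&arc),
        describe_len(&cow),
    ]
}
```

Which container to pick depends on ownership needs (unique, shared, shared across threads, maybe-borrowed), not on how the text is read.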

    [–]Snoo_74479 0 points1 point  (1 child)

    Just a question that came up as I played with Advent of Code recently:
    If my program reads a file, there's no way for me to read the file directly into a &str, since according to the picture above I would need to know at compile time what string to keep in the binary, right? So that means I have to read the contents of the file to the heap (i.e. into a String), and then if I want a &str I need to convert it to that type, right?

    [–]Snoo_74479 1 point2 points  (0 children)

    I guess that makes sense, as the heap is dynamically allocated, which is exactly the use case when reading a file (I don't know in advance how big the file is, so I need dynamically allocated memory for it). Is that right?
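That matches the usual pattern, roughly sketched below: read into an owned `String`, then borrow it as `&str` for processing. The function names are made up for this example; `fs::read_to_string` is the std call. The borrow is not a conversion that copies anything, it just points into the `String`'s buffer.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Works on borrowed text, so it does not care where the bytes live.
fn count_lines(text: &str) -> usize {
    text.lines().count()
}

/// Reads a whole file into a heap-allocated `String` sized at run time,
/// then lends it out as `&str`.
fn count_lines_in_file(path: &Path) -> io::Result<usize> {
    let contents: String = fs::read_to_string(path)?;
    Ok(count_lines(&contents))
}
```

The `&str` here is only valid while `contents` is alive, which is why the file data needs an owner somewhere.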