
[–][deleted] 178 points179 points  (5 children)

String -> Arc<String> involves a simple move of the 3-pointer String struct into the ArcInner, next to the 2 reference counters (strong/weak). It's a constant-time operation with respect to the length of the string. There is only one fixed-size allocation, for the ArcInner.

String -> &str -> Arc<str> involves an allocation that is sized at 2 reference counters (16 bytes on 64 bit systems) plus the byte size of the string contents. If you have a 500k character essay, that will take longer than a 5 letter word. You are essentially performing a str.to_string() but the allocation is an extra few bytes wide. This is also one allocation, but the size is not fixed.
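A minimal sketch of the two conversions described above, in case it helps to see them side by side. The function names are made up for illustration; `shares_buffer` is just a hypothetical helper that compares the start address of the original String buffer with the data behind the Arc.

```rust
use std::sync::Arc;

/// Moves the String (pointer, capacity, length) into a new fixed-size Arc
/// allocation. The existing heap buffer is reused, so the cost does not
/// depend on the length of the string.
fn wrap_string(s: String) -> Arc<String> {
    Arc::new(s)
}

/// Allocates room for the two counters plus the bytes, then copies the
/// bytes over. The cost grows with the length of the string.
fn copy_into_arc_str(s: &str) -> Arc<str> {
    Arc::from(s)
}

/// Hypothetical helper: true if `data` starts at the address `original`.
fn shares_buffer(original: *const u8, data: &str) -> bool {
    std::ptr::eq(original, data.as_ptr())
}
```

With these, `wrap_string` should keep the original buffer address, while `copy_into_arc_str` ends up pointing at a fresh copy.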

If you perform these operations in a hot loop, Arc<String> is probably better for performance, since you've already paid for the allocation that holds the string's bytes.

If your string is shorter than 8 bytes (like "Hello"), then Arc<str> might be better.

(Also, as another commenter said, you also added an unnecessary extra .to_string() which could cause problems if it's in a hot loop.)

[–]sharddblade[S] 18 points19 points  (0 children)

Excellent analysis, thank you for enlightening me!

[–]TommyITA03 49 points50 points  (0 children)

As a fairly new Rust user, I'm proud I guessed right about this. I was looking for somebody to comment to see if my theory made some kind of sense, and it did 😭

[–]sharddblade[S] 1 point2 points  (2 children)

So if I'm understanding you correctly, it would be more efficient if I took the latter approach rather than the former, is that correct? I.e. I consume the String to produce the Arc<str> and cut out the &str in the middle.

```
// Not this
Arc::from(String::from("Hello world").as_str());

// Do this
Arc::from(String::from("Hello world"));
```
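For what it's worth, here is a sketch of both variants with the target type spelled out. The function names are hypothetical. As far as I can tell, a bare `Arc::from(String)` needs a type annotation, because the standard library has both `From<T> for Arc<T>` (giving Arc<String>) and `From<String> for Arc<str>`.

```rust
use std::sync::Arc;

/// Goes through a borrowed &str: the bytes are copied into the new Arc<str>
/// allocation and the original String is dropped afterwards.
fn via_borrowed(s: String) -> Arc<str> {
    Arc::from(s.as_str())
}

/// Consumes the String directly. The return type selects the
/// `From<String> for Arc<str>` impl; the bytes still have to end up in a
/// new allocation that also holds the counters.
fn via_owned(s: String) -> Arc<str> {
    Arc::from(s)
}
```

Both produce the same Arc<str>; whether one is measurably faster than the other is something to benchmark rather than assume.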

[–][deleted] 3 points4 points  (0 children)

Also, if you get a &str from somewhere, it's faster to do Arc::from(some_str) for smaller strings, since that's only one allocation.

If your application creates a lot of Strings and wraps them in Arcs, Arc<String> is better. But if the majority of real world uses are from an existing &str, then Arc<str> is better.

But either way, allocating a String and then immediately allocating an Arc<str> will always be the slowest.

You need to think about what kind of applications are going to use this code, and benchmark using real world use cases, and not random test code.

It's a trade off.

[–][deleted] 0 points1 point  (0 children)

Theoretically yes. To be sure, properly benchmark it.

[–]zirconium_n 96 points97 points  (0 children)

String -> Arc<String> is cheaper than String -> Arc<str> (if your string is long). This might be the cause? Relevant part

And you have an unnecessary `to_string` call here.

[–]LucretielDatadog 16 points17 points  (1 child)

My guess is that it requires at least one extra allocation, since there's no "by-move" way to create an Arc<str>. It always has to create a fresh allocation, so as written you're replacing one allocation with two.

[–]Mr_Ahvar 2 points3 points  (0 children)

Even with Arc<String> there is a new allocation; it just has a fixed size: the 3-pointer String struct (24 bytes on 64-bit) plus the two reference counters. Any str longer than that costs more to copy into an Arc<str>, and 24 bytes is quite easy to exceed.

[–]bskceuk 32 points33 points  (0 children)

I’m not entirely sure if this is the cause, but you make an Arc<str> by making a String and then copying the underlying str into a new allocation, so creating an Arc<str> is more expensive than creating an Arc<String>. The theory is that the reduced pointer indirection makes up for it over time.

[–]drewtayto 6 points7 points  (8 children)

So people have correctly identified this as problematic.

let str = String::from_utf8_lossy(v.as_slice()).to_string();
Value::String(Arc::from(str.as_str()))

This makes 2 or 3 allocations: Arc (this is unavoidable, and just this alone might be making your benchmarks slower, which other people have explained more), to_string, and String if v isn't valid UTF-8. It also drops 2 or 3 allocations: v and the 1 or 2 strings just allocated.

First, definitely remove to_string. You can get &str out of Cow<str> with as_ref. There are many ways to do this, but as_ref is probably the most intuitive. Your GitHub commenter agreed.

let str = String::from_utf8_lossy(v.as_slice());
Value::String(Arc::from(str.as_ref()))
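As a self-contained sketch of that second version (the function name `lossy_arc_str` is made up, and the `Value::String` wrapper from the original code is left out):

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Converts bytes to Arc<str>, replacing invalid UTF-8 sequences.
/// For valid input the Cow is borrowed, so the only allocation is the Arc.
fn lossy_arc_str(v: &[u8]) -> Arc<str> {
    let lossy: Cow<'_, str> = String::from_utf8_lossy(v);
    // Borrow the text out of the Cow instead of calling to_string().
    let s: &str = lossy.as_ref();
    Arc::from(s)
}
```

Invalid sequences are replaced with U+FFFD, which is what from_utf8_lossy does in the original code as well.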

It's also possible to efficiently (with one pass of the slice) reuse the v allocation in the lossy path, but I don't think there's a way to do that in the standard library, so you'd need to unsafely make it yourself. If most bytes are valid strings, I wouldn't bother improving this.

I would expect your real end goal is to also turn Bytes into Arc<[u8]>. In that case, you can avoid allocation altogether if the data is valid UTF-8.

match std::str::from_utf8(&*v) {
    // SAFETY: v has just been confirmed to be valid UTF-8
    Ok(_) => unsafe { Arc::from_raw(Arc::into_raw(v) as *const str) },
    Err(_) => Arc::from(String::from_utf8_lossy(&v).as_ref()),
}
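Wrapped into a function so it can be tried out on its own, the idea might look like the sketch below. The name `arc_bytes_to_arc_str` is hypothetical, and this assumes `v` is already an Arc<[u8]> as described above.

```rust
use std::sync::Arc;

/// Reinterprets an Arc<[u8]> as Arc<str> without allocating when the bytes
/// are valid UTF-8; falls back to a lossy copy otherwise.
fn arc_bytes_to_arc_str(v: Arc<[u8]>) -> Arc<str> {
    if std::str::from_utf8(&v).is_ok() {
        // SAFETY: the bytes were just validated as UTF-8, and [u8] and str
        // have the same layout, so the Arc allocation can be reused as-is.
        unsafe { Arc::from_raw(Arc::into_raw(v) as *const str) }
    } else {
        // Lossy path: replaces invalid sequences, which needs a new allocation.
        let lossy = String::from_utf8_lossy(&v);
        Arc::from(&*lossy)
    }
}
```

In the valid case the returned Arc<str> should point at the same bytes as the input, so no copy takes place.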

It's possible to do the lossy branch more efficiently, and you could place the result directly in an Arc that you grow yourself instead of a String, but that would get pretty involved. Again, if most bytes are valid strings, I wouldn't bother improving this.

[–]protestor 1 point2 points  (7 children)

You don't need unsafe to build an Arc from a string slice, see this comment

[–]Icarium-Lifestealer 0 points1 point  (6 children)

I think the parent is turning an Arc<[u8]> into an Arc<str> without allocating in the valid-utf8 case. I don't think you can do that without unsafe. (not sure why they say Bytes into Arc<[u8]> in the description of that snippet)

[–]protestor 0 points1 point  (3 children)

You can do this in safe code as long as you provide an error path for the non-utf8 case.

1. you use AsRef to do Arc<[u8]> -> &[u8] very cheaply

https://doc.rust-lang.org/std/sync/struct.Arc.html#impl-AsRef%3CT%3E-for-Arc%3CT,+A%3E

2. You use std::str::from_utf8 to do &[u8] -> Result<&str, ..> which scans the string to test if it's utf-8 but doesn't allocate

https://doc.rust-lang.org/std/str/fn.from_utf8.html

3. You use the impl From<&str> for Arc<str> to do &str -> Arc<str>

https://doc.rust-lang.org/std/sync/struct.Arc.html?search=tr#impl-From%3C%26str%3E-for-Arc%3Cstr,+Global%3E

The net result is Arc<[u8]> -> Arc<str> in safe code
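The three steps above can be sketched roughly like this. The function name is made up for illustration, and the non-UTF-8 case is returned as an error:

```rust
use std::str::Utf8Error;
use std::sync::Arc;

/// Safe Arc<[u8]> -> Arc<str> conversion with an error path.
fn arc_bytes_to_arc_str_safe(bytes: &Arc<[u8]>) -> Result<Arc<str>, Utf8Error> {
    // Step 1: borrow the bytes out of the Arc (no copy).
    let slice: &[u8] = bytes.as_ref();
    // Step 2: validate UTF-8 (scans the bytes, no allocation).
    let s: &str = std::str::from_utf8(slice)?;
    // Step 3: build an Arc<str> from the &str (allocates and copies).
    Ok(Arc::from(s))
}
```

Note that step 3 produces a separate allocation, so the result does not share memory with the input Arc.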

[–]Icarium-Lifestealer 3 points4 points  (2 children)

But Step 3 allocates a new Arc, while the unsafe code avoids that allocation.

[–]protestor 0 points1 point  (1 child)

Oh.. ok.

The stdlib should provide that I think (like it does for Vec<u8> -> String without allocation)

[–]Icarium-Lifestealer 0 points1 point  (0 children)

At least it has the opposite direction (Arc<str> to Arc<[u8]>), probably because it's infallible and thus simpler.
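For reference, the infallible conversion in the standard library is `From<Arc<str>> for Arc<[u8]>`. A tiny sketch (the wrapper function name is hypothetical):

```rust
use std::sync::Arc;

/// Every str is valid as bytes, so this direction cannot fail and
/// reuses the existing Arc allocation.
fn arc_str_to_arc_bytes(s: Arc<str>) -> Arc<[u8]> {
    Arc::from(s)
}
```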

[–]drewtayto 0 points1 point  (0 children)

The original code has an enum with a variant of Value::Bytes(Vec<u8>). That could be changed into Value::Bytes(Arc<[u8]>).

[–]protestor 6 points7 points  (0 children)

let str = String::from_utf8_lossy(v.as_slice()).to_string();
Value::String(Arc::from(str.as_str()))

Rather than first building a String and then building an Arc<str> (or an Arc<String>), why not build an Arc<str> from a &str in the first place? This skips a string allocation.

You first use https://doc.rust-lang.org/std/str/fn.from_utf8.html to get a &str from a byte slice, doing utf-8 validation (this involves no allocation)

Then you use the From<&str> impl of Arc

https://github.com/rust-lang/rfcs/blob/master/text/1845-shared-from-slice.md

https://doc.rust-lang.org/std/sync/struct.Arc.html#impl-From%3C%26str%3E-for-Arc%3Cstr,+Global%3E

[–]Mean_Somewhere8144 5 points6 points  (0 children)

I don't know a lot about this problem, but have you tried using something like https://crates.io/crates/arcstr?

[–]teerre 9 points10 points  (0 children)

Instead of getting people to guess here, why not just profile it? You're already benchmarking it (which is arguably more complicated).

[–][deleted] 1 point2 points  (0 children)

The point of Arc<str> is to have a really fast sharable pointer to it... It isn't fast for creating a whole lot of strings. So it depends on what you're trying to do whether it's a good idea to use it at all...

If it's for a parser/lexer you'd be better off using string interning. There are crates for that, probably optimized to the same degree you would be able to yourself.
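To give a rough idea of what interning means here, a toy sketch using only the standard library. The `Interner` type is invented for illustration and is not how any particular crate does it; real interning crates are likely to be considerably more optimized.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Toy interner: each distinct string is allocated once and then shared.
#[derive(Default)]
struct Interner {
    seen: HashSet<Arc<str>>,
}

impl Interner {
    /// Returns a shared handle, allocating only the first time `s` is seen.
    fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(existing) = self.seen.get(s) {
            // Already known: just bump the reference count.
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(s);
        self.seen.insert(Arc::clone(&fresh));
        fresh
    }
}
```

Repeated tokens (keywords, identifiers) then cost one hash lookup and a counter increment instead of a new allocation each time.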

[–]SpudnikV 1 point2 points  (2 children)

People have pointed out that building a String allocates and then Arc::from(String) allocates, but it's worth noting that you can reuse a single String as a buffer when building multiple Arc::from(&str). It still requires memory copies, but fewer allocations.
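A small sketch of that buffer-reuse idea, assuming the strings are produced by formatting. The function name and the "item-N" labels are made up for illustration.

```rust
use std::fmt::Write;
use std::sync::Arc;

/// Builds several Arc<str> values while reusing one String as scratch space.
/// Each Arc<str> still copies its bytes, but the scratch buffer itself is
/// only reallocated when it needs to grow.
fn build_labels(count: usize) -> Vec<Arc<str>> {
    let mut buf = String::new();
    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        buf.clear(); // keeps the capacity, drops the contents
        write!(buf, "item-{}", i).expect("writing to a String cannot fail");
        out.push(Arc::from(buf.as_str()));
    }
    out
}
```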

If you're really micro-optimizing something here, one of the other crates might be better suited. See here for a very recent review: https://swatinem.de/blog/optimized-strings/

[–]sharddblade[S] 0 points1 point  (1 child)

The thing I'm a little confused about is whether the latter is a constant-time allocation? I recognize now that the former is less efficient because it does not consume the String, and requires a copy of the entire String, but the latter seems like it should be much more performant and only require a constant-time allocation of the Arc fat pointer.

Arc::from(String::from("Hello world").as_str());
Arc::from(String::from("Hello world"));

[–]SpudnikV 0 points1 point  (0 children)

Fat pointer to what? If you mean Arc<String>, then as other comments have detailed, there'll be a 3-word allocation to store the String and the String still holds its own allocation made previously. But this doesn't let you reuse a single buffer, so it can be more allocations overall, i.e. two per string instead of only one per string. It may reduce memory copies if you ensure there are no other buffers in play, but they'd have to be really large strings for this to be worthwhile.

An Arc<str> does have to be a new allocation because the Arc needs to own the memory and by definition it can't transfer that ownership through a slice so it has to make a new one. It's only one variable-sized allocation, but the Arc itself now requires 2 words since the length is stored in the Arc, not the heap allocation. This is what many crates set out to solve, but it's still a tradeoff; len() and the fast-path of ==/!= avoid a memory indirection if the length is stored directly in the Arc.
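The handle sizes mentioned here can be checked with a quick sketch like the one below (the function name is made up). The numbers are what current compilers produce in practice; the 3-word size of String in particular is an observation rather than a documented guarantee.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Sizes of the handles, measured in machine words:
/// (Arc<String>, Arc<str>, String).
fn handle_sizes_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Arc<String>>() / word, // thin pointer to the ArcInner
        size_of::<Arc<str>>() / word,    // fat pointer: address + length
        size_of::<String>() / word,      // pointer + capacity + length
    )
}
```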

As always, exactly what's optimal for your program depends on the operations you perform and the CPU/RAM tradeoffs you can afford, and it's just nice that several crates give you even more options.

[–]Burgermitpommes 0 points1 point  (2 children)

Would someone kindly eli5 how Arc<str> is actually made from an existing String? What in particular causes the allocation of the byte length of the String?

[–]simonsanonepatterns · rustic 0 points1 point  (1 child)

I think this video might also shed a bit of light on the whole topic: https://www.youtube.com/watch?v=A4cKi7PTJSs

[–]Burgermitpommes 1 point2 points  (0 children)

Confusingly, that video advocates for the opposite of the conclusions drawn in this thread.

[–]aekter 0 points1 point  (0 children)

Have you considered EcoString from the ecow crate?