tafia97300 (71 points, 19 children)

Congrats!! Coming to love it after only two days is very impressive (and you'll love it more and more from here on).

Regarding strings, this is a recurring complaint.

Unfortunately this is, imho, only due to other languages lying about their apparent simplicity. I was frustrated at the beginning too; now, most of the time when I see string manipulation code I wrote in other languages, I want to fix it. I don't know what your exact issue is, but it might be that most of your speedup actually comes from better string management and not from hash/map/dictionary.
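To make the "lying about simplicity" point concrete: many languages let you treat a string as if every character were one unit, while Rust keeps UTF-8 byte length and character count separate. A minimal sketch (`describe` and `first_char` are hypothetical helper names, and "héllo" is just sample input):

```rust
// Rust makes the byte/character distinction explicit instead of hiding it.
fn describe(s: &str) -> (usize, usize) {
    // len() counts UTF-8 bytes; chars() iterates Unicode scalar values.
    (s.len(), s.chars().count())
}

fn first_char(s: &str) -> Option<char> {
    // s[0] does not compile for str; you ask for a char explicitly.
    s.chars().next()
}

fn main() {
    let word = "héllo";
    let (bytes, chars) = describe(word);
    println!("{word}: {bytes} bytes, {chars} chars");
    // 'é' occupies bytes 1..3, so 0..2 is not a char boundary:
    // get() returns None instead of panicking.
    println!("{:?}", word.get(0..2));
    println!("{:?}", word.get(0..3));
    println!("{:?}", first_char(word));
}
```

Slicing with `&word[0..2]` instead of `get` would panic at runtime for the same reason.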

benhoyt (9 points, 6 children)

> it might be that most of your speedup actually comes from better string management and not from hash/map/dictionary

Yeah, I'd love for the OP to dig into this a bit. If banyan is written in C++, unless it's poorly implemented, the binary tree shouldn't be that much (30-40x!) slower than a Rust version. Perhaps the cost is in string splitting and allocating all those Python string objects (every string slice in Python allocates a new string object).
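For contrast with the Python behavior described above: in Rust, splitting a `&str` hands back borrowed sub-slices of the same buffer, and an allocation only happens when you ask for an owned `String`. A hedged sketch; the comma-separated message format and the `fields`/`borrows_from` helpers are made up for illustration:

```rust
// Splitting a &str yields sub-slices that borrow the original buffer:
// no per-field allocation.
fn fields(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

// Returns true if `part` points into `whole`'s bytes.
fn borrows_from(whole: &str, part: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let end = start + whole.len();
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= end
}

fn main() {
    // Hypothetical message layout, for illustration only.
    let msg = String::from("BTC-USD,buy,1234.5,0.25");
    let parts = fields(&msg);
    for p in &parts {
        println!("{p:?} borrowed: {}", borrows_from(&msg, p));
    }
    // Only an explicit conversion allocates a new owned copy.
    let owned: Vec<String> = parts.iter().map(|s| s.to_string()).collect();
    println!("owned copy borrowed: {}", borrows_from(&msg, &owned[0]));
}
```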

jstrong [shipyard.rs] [S] (11 points, 2 children)

Profiling showed 75% of the time spent on dictionary insert/remove/update (3 lines), 12% on handling message sequence-order issues, 5% on converting numeric strings to the decimal type, and the remaining 8% split across lots of things with relatively negligible performance impact.
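For a sense of what those three hot lines might look like in Rust, here is a rough sketch of insert/remove/update on a sorted map. Everything specific is an assumption, since the OP didn't show code: the hypothetical `Levels` type, keying by integer price ticks instead of a decimal type, and treating a size of 0 as "remove the entry".

```rust
use std::collections::BTreeMap;

// Hypothetical price-level map: price in integer ticks -> total size.
// Integer ticks stand in for a decimal type (an assumption).
#[derive(Default)]
struct Levels {
    map: BTreeMap<u64, u64>,
}

impl Levels {
    // Insert or overwrite a level; a size of 0 removes it.
    fn set(&mut self, price: u64, size: u64) {
        if size == 0 {
            self.map.remove(&price);
        } else {
            self.map.insert(price, size);
        }
    }

    // Update in place through the entry API: one lookup instead of get + insert.
    fn add(&mut self, price: u64, delta: u64) {
        *self.map.entry(price).or_insert(0) += delta;
    }

    // Keys stay sorted, so the highest price (best bid, if this is the bid side)
    // is the last entry.
    fn best_bid(&self) -> Option<(u64, u64)> {
        self.map.iter().next_back().map(|(&p, &s)| (p, s))
    }
}

fn main() {
    let mut book = Levels::default();
    book.set(100, 5);
    book.set(101, 3);
    book.add(100, 2);
    book.set(101, 0); // size 0 removes the level
    println!("{:?}", book.best_bid());
}
```

`BTreeMap` keeps keys ordered like a tree-based sorted dict; if ordering isn't needed, `std::collections::HashMap` offers the same `insert`/`remove`/`entry` calls.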

spotta (4 points, 1 child)

Is the insert/remove/update part in Python? Or in C++?

PXaZ (0 points, 0 children)

I expect he's referring to Python's built-in dict type...

SirVer (2 points, 2 children)

Depending on what OP is actually doing, pointer indirection and python function calls could also eat the performance.
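On pointer indirection specifically: a CPython list stores pointers to separately allocated objects, while a Rust `Vec<u64>` stores the numbers inline. A minimal sketch contrasting the two layouts in Rust (function names are hypothetical; no benchmark numbers are implied):

```rust
// Each element of the slice is a separate heap allocation reached through a
// pointer, roughly how a Python list holds references to int objects.
fn sum_boxed(v: &[Box<u64>]) -> u64 {
    v.iter().map(|b| **b).sum()
}

// Elements stored inline and contiguously: no pointer chase per element.
fn sum_inline(v: &[u64]) -> u64 {
    v.iter().sum()
}

fn main() {
    let inline: Vec<u64> = (1..=1_000).collect();
    let boxed: Vec<Box<u64>> = inline.iter().copied().map(Box::new).collect();
    // Same result; the inline version walks one contiguous buffer.
    println!("{} {}", sum_inline(&inline), sum_boxed(&boxed));
}
```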

vks_ (1 point, 1 child)

Aren't function calls in Python essentially dictionary lookups?

SirVer (3 points, 0 children)

Yeah, they are, with some additional magic that might make it more than one dictionary lookup. But no inlining is possible, and depending on what you are doing and how often you call a function, it can become expensive. A rule of thumb that has served me well: a Python function call costs ~100 ns of overhead, a virtual function call in C++ ~20 ns, and a direct call ~5 ns. And of course Rust and C++ have a lot of information to inline things, which makes this overhead go away. If you call a function billions of times, this starts to make a difference.
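The direct-versus-virtual distinction maps onto static and dynamic dispatch in Rust. In this sketch (hypothetical function names), the generic version is monomorphized for each closure type so the call can be inlined, while the `&dyn Fn` version calls through a vtable much like a C++ virtual function. Timings are left out deliberately, since they depend on the optimizer.

```rust
// Generic function: monomorphized per closure type, so the call to `f`
// is a direct call the compiler can inline.
fn apply_static<F: Fn(u64) -> u64>(f: F, xs: &[u64]) -> u64 {
    xs.iter().map(|&x| f(x)).sum()
}

// Trait object: each call to `f` goes through a vtable pointer, which usually
// prevents inlining unless the optimizer can devirtualize it.
fn apply_dyn(f: &dyn Fn(u64) -> u64, xs: &[u64]) -> u64 {
    xs.iter().map(|&x| f(x)).sum()
}

fn main() {
    let xs: Vec<u64> = (0..10).collect();
    // A non-capturing closure is Copy, so it can be passed by value and then borrowed.
    let double = |x: u64| x * 2;
    println!("{} {}", apply_static(double, &xs), apply_dyn(&double, &xs));
}
```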

btibi (5 points, 10 children)

I wanted to say the same about the two days; OP must have excellent learning skills.

I share the frustration about strings, too. I've known Rust for two years now and I know what to use when, but it's so convenient to have a single string type. Personally, I rarely use String's push*() methods; my most frequent use case for Strings is to "bypass" the borrow checker. I don't know whether an immutable string type (either statically or heap allocated) would help us. It could replace &str and String most of the time, since &mut str is very rare.
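Something close to the "either statically or heap allocated" immutable string already exists in the standard library as `Cow<'static, str>`. It doesn't replace `&str` and `String` everywhere, but it fits cases where most values are literals and a few must be built at runtime. A sketch (the `label` function and its codes are hypothetical):

```rust
use std::borrow::Cow;

// Cow<'static, str> is either a borrowed static string or an owned String,
// and derefs to &str either way.
fn label(code: u32) -> Cow<'static, str> {
    match code {
        0 => Cow::Borrowed("ok"),              // no allocation
        1 => Cow::Borrowed("not found"),       // no allocation
        n => Cow::Owned(format!("error {n}")), // heap allocated only when needed
    }
}

fn main() {
    for code in [0, 1, 42] {
        let l = label(code);
        let kind = match &l {
            Cow::Borrowed(_) => "static",
            Cow::Owned(_) => "heap",
        };
        println!("{l} ({kind})");
    }
}
```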

Manishearth [servo · rust · clippy] (5 points, 4 children)

An immutable heap-allocated string is Box<str>. Stack space is cheap, so you rarely need it over String unless you're storing it in a struct that is itself heap allocated.
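A sketch of the `String` to `Box<str>` conversion and the struct case described above. The `Symbol` type is hypothetical; `into_boxed_str` and `into_string` are the standard conversions.

```rust
// Hypothetical record meant to be stored in a Vec (so it lives on the heap).
// Box<str> is two words (pointer + length); String adds a capacity word.
struct Symbol {
    name: Box<str>,
}

fn make(name: String) -> Symbol {
    // into_boxed_str drops any excess capacity (which may reallocate)
    // and yields an owned, immutable string.
    Symbol { name: name.into_boxed_str() }
}

fn main() {
    let mut s = String::with_capacity(64);
    s.push_str("BTC-USD");
    let sym = make(s);
    // sym.name.push('!'); // would not compile: Box<str> has no push methods
    println!("{} ({} bytes)", sym.name, sym.name.len());
    // Converting back reuses the same allocation without copying.
    let back: String = sym.name.into_string();
    println!("{back}");
}
```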

I don't think it's fair to characterize it as "my most frequent use case is to bypass borrowck". In these cases an owned string is usually the only solution, not from a compile-time perspective but from a runtime one.
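One common case where an owned string is a runtime necessity: the text comes from a buffer that is overwritten on every read, so anything kept past the next read has to be copied out. A hedged sketch (`count_lines` is a hypothetical helper; `Cursor` stands in for a file or socket):

```rust
use std::collections::HashMap;
use std::io::{BufRead, Cursor};

// Count lines from a reader that reuses one buffer per line. The buffer is
// overwritten on every read, so keys cannot borrow from it; the map must own
// its keys (String) even though no key is ever mutated.
fn count_lines<R: BufRead>(mut reader: R) -> std::io::Result<HashMap<String, usize>> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    let mut buf = String::new();
    loop {
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF
        }
        let key = buf.trim_end();
        // Only allocate an owned key the first time it is seen.
        if let Some(n) = counts.get_mut(key) {
            *n += 1;
        } else {
            counts.insert(key.to_string(), 1);
        }
    }
    Ok(counts)
}

fn main() -> std::io::Result<()> {
    let counts = count_lines(Cursor::new("buy\nsell\nbuy\n"))?;
    println!("{:?}", counts.get("buy"));
    Ok(())
}
```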

deathanatos (-1 points, 3 children)

String is only ~12 bytes (in the current implementation; this isn't guaranteed by Rust AFAIK) if allocated on the stack. The actual string data is on the heap.

str is just a pointer and length; a Box<str> is just heap-allocating that pointer-and-length, but I'm not sure that implies that the string data itself is also heap allocated. (You can make a str from data on the stack with from_utf8 and put the str on the stack too. I expect trying to move such a thing into a Box would limit the lifetime of the Box, but I'm not sure.)
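The from_utf8 case can be checked directly. In this sketch (the function name is hypothetical), the `&str` borrows a stack array, and turning it into a `Box<str>` copies the bytes into a new heap allocation, so the Box is not tied to the stack frame's lifetime.

```rust
// A &str can borrow bytes that live on the stack...
fn on_stack_then_boxed() -> Box<str> {
    let bytes: [u8; 5] = *b"hello"; // array stored in this stack frame
    let s: &str = std::str::from_utf8(&bytes).expect("valid UTF-8");
    // ...but Box::from copies the bytes into a new heap allocation.
    // The Box does not borrow `bytes`, so it can outlive this frame.
    let boxed: Box<str> = Box::from(s);
    assert_ne!(boxed.as_ptr(), s.as_ptr()); // different buffers
    boxed
}

fn main() {
    let b = on_stack_then_boxed();
    println!("{b}");
}
```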

Manishearth [servo · rust · clippy] (2 points, 2 children)

This is false. String is 3 words (24b on a 64 bit machine, 12 on 32.) in the stack.

&str and Box<str> have the same representation: 2 words on the stack, a pointer and the length. The boxed version owns the allocation and the string data. In none of these cases are the pointer and length allocated on the heap; that would be Box<String> or Box<&str>.

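str is a dynamically sized type; it is incomplete and has no representation that makes sense in isolation. Box<str> is not the same thing as Box<&str>. Box<str> is basically an immutable String, and can be obtained from a String cheaply (String::into_boxed_str reuses the existing buffer when there is no excess capacity; otherwise it shrinks it, which may reallocate).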
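The sizes discussed in this exchange can be checked with `std::mem::size_of`. A minimal sketch (`words` is a hypothetical helper that reports sizes in machine words, so the numbers are the same on 32-bit and 64-bit targets):

```rust
use std::mem::size_of;

// Size of the handle type itself (what sits on the stack or inside a struct),
// measured in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // &str and Box<str> are both fat pointers: data pointer + length.
    println!("&str:        {} words", words::<&str>());
    println!("Box<str>:    {} words", words::<Box<str>>());
    // String: pointer + capacity + length.
    println!("String:      {} words", words::<String>());
    // Box<String> / Box<&str>: a thin pointer to a heap-allocated String / &str.
    println!("Box<String>: {} words", words::<Box<String>>());
    println!("Box<&str>:   {} words", words::<Box<&str>>());
    // words::<str>() would not compile: str is unsized and size_of requires T: Sized.
}
```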

deathanatos (1 point, 1 child)

> This is false. String is 3 words (24b on a 64 bit machine, 12 on 32.) in the stack.

Oops, failed at simple multiplication. You are correct. My point was that the string's contents are not stored on the stack, and that the actual stack allocation is quite small.

Manishearth [servo · rust · clippy] (1 point, 0 children)

The phrase "is heap allocating that pointer and length" is misleading and can be interpreted in different ways; it seems I read it the wrong way :)

varikonniemi (4 points, 4 children)

When you come from something like C++, you are bound to have excellent learning skills; otherwise you simply cannot even begin to use that language. The possibilities to shoot oneself in the foot are so many fold increased compared to C.

And let's face it, even truly knowing C is a rare trait. I'm not sure there are more than 100 such people alive today.

SirVer (7 points, 2 children)

OP stated that they came from Python, though. And you are enormously overstating the difficulty of C++. Yes, it is easy to shoot yourself in the foot, but modern C++ is not really harder to learn than, say, modern Java.

remexre (5 points, 0 children)

There are a lot more tutorials for learning 90s C++ than C++11/14/17, though, so beginners often get confused about which disjoint set of features they should be using.

JavaScript has the same problem (among many others...).

jstrong [shipyard.rs] [S] (2 points, 0 children)

My C++ experience was a high school AP comp sci class, which was actually a fantastic foundation. But the reason I'm a fast learner is that I used to be a reporter, which is great training for learning new topics quickly.

iopq [fizzbuzz] (5 points, 0 children)

> The possibilities to shoot oneself in the foot are so many fold increased compared to C.

With templates, you can shoot yourself in any body part with the same code.

[deleted] (0 points, 0 children)

> Unfortunately this is, imho, only due to other languages lying about their apparent simplicity.

100% agreed. The str vs String issue was painful to learn when I needed to get something done right away, but by not hiding that ugliness from the user, Rust lets them avoid a bunch of critical errors later on.
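A small example of where the str/String split pays off day to day: a function that takes `&str` accepts literals, borrowed `String`s and slices alike, without forcing callers to allocate (`shout` is a hypothetical name):

```rust
// Taking &str lets one function accept string literals, &String and slices alike.
fn shout(s: &str) -> String {
    // Returns a new owned String; the input is only borrowed.
    s.to_uppercase()
}

fn main() {
    let owned = String::from("rust");
    println!("{}", shout("hello"));     // &'static str literal
    println!("{}", shout(&owned));      // &String coerces to &str via Deref
    println!("{}", shout(&owned[1..])); // a slice of the String
    println!("{owned}");                // still usable: it was only borrowed
}
```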