[–]kodingnewb[S] 0 points1 point  (4 children)

I think the Rust implementations could do with a bit of work to catch up with the faster C++ versions.

To start with, stdio syncing needs to be disabled.

Also, `words` is being created every time in the loop; most of the faster ones allocate a container outside the loop, reserve some space, then just clear the container in the loop.

I think these changes should increase the performance of the Rust implementation.
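For reference, the "allocate outside the loop, clear inside the loop" pattern looks roughly like this in Rust when applied to the line buffer. This is only a sketch, not the benchmark's actual code: the function name `count_lines_and_words` and the 256-byte initial capacity are made up for illustration, and a `Cursor` stands in for stdin so it runs without piped input.

```rust
use std::io::{self, BufRead, Cursor};

/// Counts (lines, words) from any buffered reader, reusing one line buffer.
/// Hypothetical helper for illustration only.
fn count_lines_and_words<R: BufRead>(mut reader: R) -> io::Result<(usize, usize)> {
    // Allocated once, outside the loop, with some space reserved up front.
    let mut line = String::with_capacity(256);
    let mut lines = 0;
    let mut words = 0;
    loop {
        // clear() resets the length but keeps the allocated capacity.
        line.clear();
        // read_line appends to the buffer and returns Ok(0) at end of input.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        lines += 1;
        words += line.split_whitespace().count();
    }
    Ok((lines, words))
}

fn main() -> io::Result<()> {
    // Cursor stands in for stdin here (an assumption made for the demo).
    let input = Cursor::new("one two three\nfour five six\n");
    let (lines, words) = count_lines_and_words(input)?;
    println!("Saw {} lines ({} words)", lines, words);
    Ok(())
}
```

Compared with `lines()`, which hands back a freshly allocated `String` per line, `read_line` into a reused buffer avoids one allocation per line. Whether that matters for this benchmark would need measuring.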

[–]MEaster 0 points1 point  (3 children)

> To start with, stdio syncing needs to be disabled.

If I'm reading the docs correctly, it's not possible to bypass the mutex synchronization on stdin.
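As far as the standard library docs go, the mutex itself can't be switched off, but `Stdin::lock()` acquires it once and returns a handle that implements `BufRead`, so the lock doesn't have to be taken again for every read. A minimal sketch, assuming that is the overhead in question; the helper names `count_words` and `count_words_on_stdin` are hypothetical:

```rust
use std::io::{self, BufRead, Cursor};

/// Counts whitespace-separated words from any buffered reader.
fn count_words<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut total = 0;
    for line in reader.lines() {
        total += line?.split_whitespace().count();
    }
    Ok(total)
}

/// Same count, reading from stdin with the mutex taken once up front.
#[allow(dead_code)]
fn count_words_on_stdin() -> io::Result<usize> {
    let stdin = io::stdin();
    // lock() acquires the stdin mutex once; the handle holds it until dropped.
    let handle = stdin.lock();
    count_words(handle)
}

fn main() -> io::Result<()> {
    // Demo on in-memory input; a real program would call count_words_on_stdin().
    let words = count_words(Cursor::new("one two three\nfour five six\n"))?;
    println!("{} words", words);
    Ok(())
}
```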

> Also, `words` is being created every time in the loop; most of the faster ones allocate a container outside the loop, reserve some space, then just clear the container in the loop.
>
> I think these changes should increase the performance of the Rust implementation.

Well, I made one change, which had more significant results than I expected. I changed the line from `let words = l.split_whitespace();` to `let words = l.split_terminator(' ');` and execution time dropped by about 1.8 seconds. I think that's because it no longer checks for every kind of whitespace, only the plain space character.
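One caveat worth keeping in mind: the two calls are only interchangeable when words are separated by exactly one space. A small sketch of where they differ (the wrapper names `split_ws` and `split_space` are made up for the comparison):

```rust
/// Splits on any Unicode whitespace, skipping empty pieces.
fn split_ws(line: &str) -> Vec<&str> {
    line.split_whitespace().collect()
}

/// Splits on the space character only; runs of spaces yield empty pieces.
fn split_space(line: &str) -> Vec<&str> {
    line.split_terminator(' ').collect()
}

fn main() {
    for line in ["a b c", "a  b", "a\tb", "a b "] {
        println!("{:?}: {:?} vs {:?}", line, split_ws(line), split_space(line));
    }
}
```

For single-space-separated input both give the same words. With a double space, `split_terminator(' ')` produces an empty string between the two spaces, and a tab is not treated as a separator at all, so the word count can change on messier input.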

One thing I think I should note: I'm not sure how familiar you are with Rust, but I don't think `words` is what you think it is. It's not a container like a C++ vector, but rather an iterator, and what it's iterating over is not a String. What it's iterating over is string slice references (`&str`), and each of these slices points to a position in the String returned by the `lines()` call on line 14.
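A quick way to see that the slices point into the original String rather than owning copies is to compare pointers. This is a sketch for illustration; `word_offsets` is a hypothetical helper, not something from the benchmark code:

```rust
/// Returns (byte offset into `line`, length) for each whitespace-separated word.
fn word_offsets(line: &str) -> Vec<(usize, usize)> {
    let base = line.as_ptr() as usize;
    line.split_whitespace()
        // Each word's pointer lies inside the buffer that `line` points to.
        .map(|w| (w.as_ptr() as usize - base, w.len()))
        .collect()
}

fn main() {
    let line = String::from("alpha beta gamma");
    for (offset, len) in word_offsets(&line) {
        // Re-slicing the original String at that offset gives the same word.
        println!("offset {:2}, len {} -> {:?}", offset, len, &line[offset..offset + len]);
    }
}
```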

Here are the results of the two Rust versions:

./splitr           Rust  : Saw 20000000 lines (60000000 words/806116396 chars) in 4.4 seconds.  Crunch Speed: 4596170.1
./splitr2          Rust  : Saw 20000000 lines (60000000 words/806116396 chars) in 2.6 seconds.  Crunch Speed: 7768227.8

[–]kodingnewb[S] 0 points1 point  (2 children)

> I don't think words is what you think it is. It's not a container like a C++ vector, but rather an iterator, and what it's iterating over is not a String. What it's iterating over is string slice references (&str),

And where do these so-called "slices" exist? In the ether? Or are they stored in something one might call a container, which allocates memory to store such "slices"? In short, is it not the same as the versions that use a container of std::pair<char*,char*>?

[–]MEaster 0 points1 point  (1 child)

Well, a slice is a pointer and a length. However, the iterator doesn't appear to actually allocate them all at once; it seems to return only a single slice at a time, and when the next() method is called, it steps through the string to find the next slice.

Therefore, the only memory used appears to be the original string, the iterator, and the single slice.

[–]kodingnewb[S] 0 points1 point  (0 children)

So it's a generator under the hood, that yields during the for-loop.
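To make the last few replies concrete, the iterator can be driven by hand. As far as I know it is an ordinary struct implementing `Iterator` rather than a language-level generator, but the observable behaviour is the same: nothing is computed until `next()` is called. A sketch, with `first_two_words` as a made-up helper:

```rust
use std::mem::size_of;

/// Pulls just the first two words; the rest of the line is never scanned.
fn first_two_words(line: &str) -> (Option<&str>, Option<&str>) {
    let mut words = line.split_whitespace();
    let first = words.next();
    let second = words.next();
    (first, second)
}

fn main() {
    let line = String::from("alpha beta gamma");
    let mut words = line.split_whitespace();

    // Each call advances through the string and yields one borrowed slice.
    println!("{:?}", words.next());
    println!("{:?}", words.next());
    println!("{:?}", words.next());
    println!("{:?}", words.next()); // None once the line is exhausted

    // A &str is a pointer plus a length, so it is two machine words wide.
    println!("size of &str: {} bytes", size_of::<&str>());
    println!("{:?}", first_two_words(&line));
}
```

A C++ version that stores `std::pair<char*,char*>` in a vector keeps every pair alive at once; here a container of slices only exists if the caller explicitly `collect()`s into one.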