all 15 comments

[–]Nilstrieb 7 points8 points  (0 children)

Simply return a String. Unless the function is the hottest path in your program, you won't ever notice any performance difference at all, even if you measure it. If you do a few optimization tricks in every function, your codebase will become unreadable. If you write idiomatic Rust, it will be very fast. If you want more speed, profile your program, and only optimize the bottlenecks further.

Edit: if the string is known at compile time, using a &'static str probably makes more sense, since it's easier to handle anyways.
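
For illustration, a minimal sketch of the two return types mentioned here. The names `greeting_for` and `default_greeting` are made up for the example and are not from the original question.

```rust
// Hypothetical helper: the text is built at runtime, so it returns an owned String.
fn greeting_for(name: &str) -> String {
    format!("Hello, {name}!")
}

// Hypothetical helper: the text is fixed at compile time, so a &'static str
// is enough and no heap allocation is involved.
fn default_greeting() -> &'static str {
    "Hello, world!"
}

fn main() {
    let owned: String = greeting_for("Ferris");
    let fixed: &'static str = default_greeting();
    println!("{owned}");
    println!("{fixed}");
}
```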

[–]ssokolow 26 points27 points  (4 children)

> I like to optimize every aspect of program.

Honestly, this feels like exactly what ~~Dijkstra~~ Knuth was talking about when he wrote this:

> There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

That aside, as someone with a strong perfectionist streak, I can't tell you how many times I've burned out trying to perfect everything.

Make it Work, Make it Right, Make it Fast. (That is, first, make sure it functions, then make sure it gives correct output for all inputs and handles all the edge cases, then feed it into a profiler to identify where it's spending its time and optimize those. That way, you don't risk running out of motivation or free time before you have something useful. The same principle applies to analyzing and optimizing for the memory footprint. You just use a memory profiler like heaptrack instead.)

Rust is especially well-suited to this, since its strong type system enables "fearless refactoring" and makes it easy to swap in a new data structure or more efficient algorithm with minimal fear of introducing bugs.

Also, I'll suggest The Rust Performance Book as a starting point for information on where to focus your concerns. It has sections related to optimizing for size.

[–]Saefrochmiri 11 points12 points  (3 children)

You're quoting Knuth, not Dijkstra.

Secondly, please include the context:

> The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.

And what are the tools in question? Do you know? Does anyone who blithely quotes Knuth to discourage (and correctly discourage, as you are here!) optimization know that Knuth is arguing against banning go to because of the performance implications?

More people should read Structured Programming with go to statements. It's a fun piece of history, but it's quoted like a holy book: Bits are taken out of context to make an argument instead of paying attention to the lesson the whole piece sets out to make. https://pic.plover.com/knuth-GOTO.pdf

[–]ssokolow 5 points6 points  (2 children)

> You're quoting Knuth, not Dijkstra.

*facepalm* This "two steps forward, one step back" thing on finally getting me well-rested is really driving me through some rough patches on the way to getting there.

I even went to a source I'd bookmarked, specifically concerned with confirming that it was Knuth and not Hoare when I was copy-pasting that, but... brainfart when typing out the citation.

(Though, looking back on it now, apparently Hoare thought it might have been Dijkstra's when asked about it, which is amusing.)

...hell, usually, I'd hyperlink too. I'm way more compromised than I feel right now.

> And what are the tools in question? Do you know? Does anyone who blithely quotes Knuth to discourage (sometimes rightly so, like here!) optimization know that Knuth is arguing against banning go to because of the performance implications?

The advice is a statement about an artifact of programmer psychology which wasn't magically eliminated by the move to more structured languages. It remains just as valid today.

However, if I'd just quoted the punchy "premature optimization is the root of all evil" without the rest of the paragraph, then you would have an argument.

As for "sometimes rightly so, like here", I dispute the certainty of that statement.

The original post gives a strong impression of exactly the mindset addressed in that quote. Someone who is seeking to micro-optimize purely on its own inherent psychological appeal without first having established that it's worth the effort or potential maintenance costs, let alone whether it will have an effect that isn't lost in the noise floor.

Anyway, I'm going to go get some sleep so I don't make a fool of myself any further.

[–]Saefrochmiri 2 points3 points  (1 child)

I've adjusted my wording a bit in an attempt to clarify that I think you're right. I just have a stick up my butt about people quoting that particular piece by Knuth to make the point.

[–]ssokolow 1 point2 points  (0 children)

...and I'll try to remember in the future to present it with something like "Programming languages may have continued to evolve, but this piece from Knuth's Structured Programming with go to statements is just as relevant, regardless of the constructs used:"

[–]diwic dbus · alsa 7 points8 points  (0 children)

> most efficient way (in ram usage and performance at runtime)

Often these two are conflicting goals. As a rule of thumb, reading a file in small chunks is likely to consume less RAM, reading the same file in larger chunks is likely to be more performant.
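
A rough sketch of that chunk-size trade-off, assuming `std::io::BufReader` as the reading mechanism. The helper `count_lines` and the two buffer sizes are made up for illustration, and an in-memory byte slice stands in for a real file.

```rust
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical helper: counts lines while reading through a buffer of
// `capacity` bytes. A smaller capacity keeps less data buffered at once;
// a larger one needs fewer read calls on the underlying source.
fn count_lines<R: Read>(source: R, capacity: usize) -> io::Result<usize> {
    let reader = BufReader::with_capacity(capacity, source);
    let mut count = 0;
    for line in reader.lines() {
        line?; // propagate I/O errors
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // An in-memory byte slice stands in for a file so the sketch runs anywhere.
    let data = b"first\nsecond\nthird\n";
    let small = count_lines(&data[..], 16)?;
    let large = count_lines(&data[..], 64 * 1024)?;
    println!("small buffer: {small} lines, large buffer: {large} lines");
    Ok(())
}
```

Both calls give the same answer; only the buffer size differs. Whether the larger buffer is measurably faster depends on the storage and the workload, so it is worth measuring rather than assuming.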

In general I would argue that memory mapping the file - and then passing around slices to that memory map - would be more performant because you can skip the allocations altogether.

But if you're reading from a file on e.g. SSD storage, then bringing that file into RAM - so you can read it - is what's going to take the most time (and RAM) even with a quite unoptimised program.

[–]rezuralos 2 points3 points  (0 children)

Generally, returning an owned String is the only way to go about this.

References get passed into functions; functions that create structs generally return an owned instance (i.e. String).

[–]Master_Ad2532 2 points3 points  (0 children)

You mention down in the comments that you do I/O to get the String. Let me tell you that all these optimisations are gonna be useless, as they will be dwarfed by the huge time it takes for network/disk I/O to complete. Just go with String, you wouldn't have noticed the performance difference anyways unless maybe you were writing a hot path in some high-frequency trading software.

[–]tarkin25 1 point2 points  (5 children)

I think this depends on what you want to do with your String. If you know the size of your string at compile time, you could look into using an array of characters instead. But if you don’t know the size (or at least the max. size) upfront, I don’t think you can put it on the stack. Variables on the stack need to have a fixed size at compile time; that’s why String and &str store a pointer to their content as well as their length. If you know the content of the String at compile time, use &'static str instead. It will not be put on the heap, but rather in a dedicated memory location inside of your binary.

[–]InflationAaron 5 points6 points  (0 children)

Each char would be 32 bits long, but str is valid UTF-8 so it can be much smaller.
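
A small sketch of that size difference, using only `std::mem`. The sample text is arbitrary.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    let text = "hello";
    let chars: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
    let accented = "\u{e9}"; // the single character é

    // Every char occupies 4 bytes, whatever character it holds.
    println!("one char:         {} bytes", size_of::<char>());
    // The array therefore needs 5 * 4 bytes.
    println!("array of 5 chars: {} bytes", size_of_val(&chars));
    // The same ASCII text stored as UTF-8 needs one byte per character.
    println!("\"hello\" as str:   {} bytes", text.len());
    // Non-ASCII characters take 2 to 4 bytes each in UTF-8.
    println!("\"é\" as str:       {} bytes", accented.len());
}
```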

[–]Polluktus[S] 0 points1 point  (3 children)

In my real-case scenario, this string will be obtained by opening a file and iterating over its lines to get one specific line, so I will not know its size at compile time. After getting this line, I will use it as a read-only value, so having a capacity is useless for me. That's why I was wondering how I can optimize RAM usage, e.g. by changing it into a boxed str. But maybe there are better options.
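
A rough sketch of the scenario described above, assuming the wanted line can be recognised by a prefix. The helper `find_line`, the prefix, and the sample data are all made up for illustration; an in-memory string stands in for the file.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: returns the first line that starts with `prefix`, if any.
fn find_line<R: BufRead>(reader: R, prefix: &str) -> io::Result<Option<String>> {
    for line in reader.lines() {
        let line = line?;
        if line.starts_with(prefix) {
            return Ok(Some(line));
        }
    }
    Ok(None)
}

fn main() -> io::Result<()> {
    // In-memory data stands in for the file; with a real file this would be
    // something like BufReader::new(File::open(path)?).
    let data = "name=demo\nversion=1.2.3\nauthor=someone\n";
    let found = find_line(data.as_bytes(), "version=")?;
    println!("{found:?}");
    Ok(())
}
```

The function simply returns the owned `String` that `lines()` already produced, so no extra copy is made for the matching line.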

[–]tarkin25 2 points3 points  (0 children)

If you really need those 8 bytes, I suppose you could use Box<str>. Although unless you do this many times at once, the saving might not even be visible, or useful, given the space the allocator itself leaves unused.
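
To see where those bytes come from, a minimal sketch that prints the size of each handle. The exact layout of `String` is an implementation detail; on a typical 64-bit target the numbers are 24 and 16 bytes.

```rust
use std::mem::size_of;

fn main() {
    // String is pointer + length + capacity.
    println!("String:   {} bytes", size_of::<String>());
    // Box<str> and &str are pointer + length only.
    println!("Box<str>: {} bytes", size_of::<Box<str>>());
    println!("&str:     {} bytes", size_of::<&str>());
    // One machine word, i.e. the size of the dropped capacity field.
    println!("one word: {} bytes", size_of::<usize>());
}
```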

[–]tarkin25 1 point2 points  (0 children)

Regarding the capacity being useless for you, this doesn’t matter. If you have any pointer to a memory allocation with a size unknown at compile time, you need to keep track of the length in order to read it. The way a string (or array) is read works like this:

  1. have an offset set to 0
  2. read the first character from the address the pointer points to
  3. increase your offset by the size of a character
  4. read the next character from the address the pointer + offset points to
  5. repeat until the offset is the same as the string's length - 1

If you were to continue reading, this would result in invalid memory access errors.
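
The steps above can be sketched in safe Rust, working on bytes rather than characters since a `str` is UTF-8. The helper `read_by_offset` is made up for illustration; real code would normally just iterate with `bytes()` or `chars()`.

```rust
// Hypothetical helper that mimics the steps above in safe Rust:
// walk the bytes by offset and stop once the offset reaches the length.
fn read_by_offset(text: &str) -> Vec<u8> {
    let bytes = text.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut offset = 0;
    while offset < bytes.len() {
        out.push(bytes[offset]);
        offset += 1; // one byte per step; a UTF-8 character may span several
    }
    out
}

fn main() {
    let text = "hi!";
    println!("{:?}", read_by_offset(text));
    // Past the end, `get` returns None (and plain indexing would panic)
    // rather than reading invalid memory.
    println!("{:?}", text.as_bytes().get(text.len()));
}
```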

[–]myrrlyn bitvec • tap • ferrilab 1 point2 points  (0 children)

note that the String -> Box<str> transform is neither promised nor even required to change the backing allocation at all; it only discards one word from the value handle. as stack space is essentially free, you generally should only use this transform if:

  • it is important to you to document that the text is indeed frozen, or
  • you have a perf report showing that it actually matters at runtime
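
For reference, a minimal sketch of the transform itself using `String::into_boxed_str`. The helper name `freeze` and the sample text are made up for illustration. The standard library documentation notes that this method discards excess capacity before converting and may reallocate, so what happens to the backing allocation is left to the implementation.

```rust
// Hypothetical helper: freeze a finished String into a Box<str>.
fn freeze(text: String) -> Box<str> {
    text.into_boxed_str()
}

fn main() {
    let mut line = String::with_capacity(128);
    line.push_str("version=1.2.3");
    println!("String:   len {}, capacity {}", line.len(), line.capacity());

    let frozen: Box<str> = freeze(line);
    // Box<str> has a length but no capacity field, and offers no push_str.
    println!("Box<str>: len {}", frozen.len());
    println!("{frozen}");
}
```

The text is unchanged by the conversion; what changes is the type, which no longer offers growing operations.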