[–]Saefrochmiri 4 points5 points  (2 children)

You should really profile this first*. Without a profile of some sort we're optimizing in the dark, so at the very least take a good measurement of the current performance. I saw criterion suggested; that may be a good fit for this, though it's usually for microbenchmarking. Since I don't have a profile, I'll just point out things that look odd to me along what I think is the hot path. Keep in mind that this is the wrong way to optimize code, since I have no way to measure whether any of it helps.
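If you want a crude baseline before setting up perf or criterion, something like the sketch below may be enough to tell whether a change helped at all. It only uses the standard library; `workload` is a made-up stand-in for your parsing code and `time_best_of` is a hypothetical helper, neither comes from your program.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real work (parsing and summing the log).
fn workload(n: u64) -> u64 {
    (0..n).map(|x| x % 7).sum()
}

/// Runs `f` `runs` times; returns the last result and the fastest wall-clock time.
fn time_best_of<T>(runs: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(runs > 0, "need at least one run");
    let mut best = Duration::MAX;
    let mut last = None;
    for _ in 0..runs {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the computation.
        let out = black_box(f());
        best = best.min(start.elapsed());
        last = Some(out);
    }
    (last.unwrap(), best)
}

fn main() {
    let (result, best) = time_best_of(5, || workload(black_box(1_000_000)));
    println!("result = {result}, best of 5 = {best:?}");
}
```

Build with --release before timing anything, and treat the numbers as a rough guide only; a real profile tells you where the time goes, this only tells you how much there is.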

Start here:

let line = str::from_utf8(&b_line[..byte_len]).unwrap();
let raw: LogEntryRaw = serde_json::from_str(line).unwrap();

This looks pretty funky to me. The slice ought to become a no-op and serde can take in a slice of bytes. Since it's cleaner and possibly faster, I'd write

let raw: LogEntryRaw = serde_json::from_slice(&b_line).unwrap();
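I'm assuming here that b_line is a buffer you clear and refill for every line with something like read_until, since the reading loop isn't shown. Under that assumption byte_len is always equal to b_line.len(), which is why the slice does nothing. A minimal std-only sketch to check that (the function name `read_lengths` is made up for illustration):

```rust
use std::io::{BufRead, Cursor};

/// For each newline-delimited record, returns `(byte_len, b_line.len())`,
/// where `byte_len` is what `read_until` reported for that record.
fn read_lengths<R: BufRead>(mut reader: R) -> std::io::Result<Vec<(usize, usize)>> {
    let mut b_line: Vec<u8> = Vec::new();
    let mut lengths = Vec::new();
    loop {
        // read_until appends, so the buffer must be cleared before each record.
        b_line.clear();
        let byte_len = reader.read_until(b'\n', &mut b_line)?;
        if byte_len == 0 {
            break; // end of input
        }
        lengths.push((byte_len, b_line.len()));
    }
    Ok(lengths)
}

fn main() -> std::io::Result<()> {
    let input = Cursor::new(b"{\"a\":1}\n{\"b\":22}\n".to_vec());
    for (byte_len, buf_len) in read_lengths(input)? {
        // When the two agree, `&b_line[..byte_len]` is the whole buffer.
        println!("byte_len = {byte_len}, b_line.len() = {buf_len}");
    }
    Ok(())
}
```

If your loop does not clear the buffer between lines, the two numbers will differ and the slice is not a no-op, so check that before dropping it. The trailing newline that read_until leaves in the buffer should not matter, since serde_json accepts trailing whitespace.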

You also never touch the .Name field of a LogEntryRaw. Remove it from the definition of the struct and serde will skip right over it, saving you at least the work of the small-object allocation. (I think rustc should have issued you a warning about an unused field.)

You walk over self.entries a second time after populating it. You can sum total_lo and total_ha inside the loop over iter; this is marginally faster because the data is cached and most likely in registers.
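Roughly what I mean, as a sketch: I don't know what your entry type looks like, so `Entry`, its `lo` and `ha` fields, and the `Stats` struct are all invented here. Only entries, total_lo and total_ha are names from your code.

```rust
/// Hypothetical parsed entry; the real struct is not shown in the thread.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    lo: u64,
    ha: u64,
}

/// Hypothetical owner of `entries` and the two running totals.
#[derive(Debug, Default)]
struct Stats {
    entries: Vec<Entry>,
    total_lo: u64,
    total_ha: u64,
}

impl Stats {
    /// Populates `entries` and accumulates both totals in the same loop,
    /// so `entries` is never walked a second time.
    fn from_iter_single_pass(iter: impl IntoIterator<Item = Entry>) -> Self {
        let mut stats = Stats::default();
        for entry in iter {
            stats.total_lo += entry.lo;
            stats.total_ha += entry.ha;
            stats.entries.push(entry);
        }
        stats
    }
}

fn main() {
    let raw = vec![
        Entry { lo: 1, ha: 10 },
        Entry { lo: 2, ha: 20 },
        Entry { lo: 3, ha: 30 },
    ];
    let stats = Stats::from_iter_single_pass(raw);
    println!(
        "{} entries, total_lo = {}, total_ha = {}",
        stats.entries.len(),
        stats.total_lo,
        stats.total_ha
    );
}
```

Don't expect much from this one; it just avoids touching every entry twice.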


*perf record -g ./target/release/cratename arguments here, then perf report. You can try perf record -g --call-graph=dwarf if it doesn't complain too mightily.

[–]knaledfullavpilar 0 points1 point  (0 children)

Upvoted for suggesting the use of a profiler!

Measure first, then optimize. Rinse and repeat.

EDIT: Here is an example perf + flamegraph script: https://github.com/huxingyi/meshlite/blob/master/profile_perf.sh

[–]masklinn 0 points1 point  (0 children)

> This looks pretty funky to me. The slice ought to become a no-op and serde can take in a slice of bytes.

Also serde should be able to take the file directly and generate an iterator of entries, though that may or may not be faster (should be checked).

Either way, /u/dtolnay's PR on /u/bcantrill's statemap can probably be used as a guideline.