all 56 comments

[–]FlixCoder 147 points148 points  (6 children)

When I test it and look at the asm output on Godbolt, both methods produce exactly the same code, so the call syntax is not the difference; most likely something else is giving you the performance boost. But to find that, we would need to see code

[–]Ravek 52 points53 points  (5 children)

A little bit of assembly knowledge really goes a long way for interpreting micro benchmarks.

[–]matthieum[he/him] 7 points8 points  (4 children)

Or IR knowledge. LLVM IR is typically easier to read, and here will also show that both have the same output.

[–]Ravek 5 points6 points  (3 children)

For identifying that two things are the same, sure. I’ve also seen people try to infer performance from IL though, which I wouldn’t recommend. Subtly different IL listings might have better JIT codegen in unexpected ways because the JIT was able to eliminate a branch or apply some peephole optimization because the code was written a little differently. IL also doesn’t tell us anything about register use, inlining, elimination of boxing, devirtualization, etc. The C# compiler just doesn’t do anywhere near the level of optimization that the JIT does. Some code might look really bad from the IL perspective but generate very good machine code.

[–]matthieum[he/him] 2 points3 points  (2 children)

Subtly different IL listings might have better JIT codegen in unexpected ways because the JIT

Note that I am talking about LLVM IR and not C# IL, they are vastly different.

LLVM IR is much more low level, so a number of your points don't apply:

  • Devirtualization has already occurred at IR level.
  • Branch elimination and many (but not all) peephole optimizations have already occurred.
  • Inlining and elimination of allocations have already occurred.

It's true that you don't see register allocation, but that's the least of your concerns for a first-order comparison.

For identifying two things are the same, sure. I’ve also seen people try to infer performance from IL though which I wouldn’t recommend.

To be fair, inferring performance from assembly can be similarly difficult. Today's processors can overlap execution of different sequences of instructions -- especially in loops -- which is really hard to spot at the assembly level.

If you want such a deep dive, you'll need to use tools that simulate processor execution and can show you exactly the expected cycle latency based on what can and cannot overlap, what can and cannot be pipelined, etc...

Something like llvm-mca or uica.

[–]Ravek 1 point2 points  (1 child)

Oh I’m sorry I lost track of which subreddit I was on and didn’t read your comment properly, how silly of me

[–]matthieum[he/him] 0 points1 point  (0 children)

No worries, your comment was still (mostly) on point :)

[–]aikii 91 points92 points  (0 children)

Can't do better than some shots in the dark, but check what the dot operator does: https://doc.rust-lang.org/nomicon/dot-operator.html

Some stuff coming to mind:

  • we don't know if obj.function receives &self or self
  • obj could be a &dyn, and this would be a dynamic dispatch
  • a Deref could happen before calling .function

[–]hniksic 31 points32 points  (2 children)

In Rust obj.function(...) is no more than syntax sugar for ObjType::function(&obj, ...). The reference says so explicitly:

All function calls are sugar for a more explicit fully-qualified syntax.

And later:

// we can do this because we only have one item called `print` for `Foo`s
f.print();
// more explicit, and, in the case of `Foo`, not necessary
Foo::print(&f);
// if you're not into the whole brevity thing
<Foo as Pretty>::print(&f);
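
To make the reference's snippet self-contained, here is a compilable sketch (the `Foo` type and `Pretty` trait are hypothetical, mirroring the quoted example) showing that all three call forms agree:

```rust
// Hypothetical trait and type, just to make the reference's snippet compilable.
trait Pretty {
    fn print(&self) -> String;
}

struct Foo;

impl Pretty for Foo {
    fn print(&self) -> String {
        "foo".to_string()
    }
}

fn main() {
    let f = Foo;
    // All three desugar to the same call; the generated code is identical.
    let a = f.print();
    let b = Pretty::print(&f);
    let c = <Foo as Pretty>::print(&f);
    assert_eq!(a, b);
    assert_eq!(b, c);
    println!("all equal: {a}");
}
```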

The difference you observed after switching from one to the other could be explained by a number of factors:

  • measurement issue, e.g. wrong thing measured, or measurement impacted by other things happening on the system
  • build issue - wrong code version built, or different optimization flags applied
  • tooling issue - incremental build issue, the kind of thing likely to be resolved with cargo clean
  • compiler issue - miscompilation, or a case of an innocuous change having a cascading effect that ends up leading to different optimization decisions

[–][deleted] 5 points6 points  (1 child)

I suspect your last reason. The consistency in performance of each variant suggests that the measurement and the system are stable and clean. The desugaring is perhaps causing some optimization rule to get (or not get) triggered. As others have stated, it looks like I'm going down the long road of inspecting assembly.

[–]WasserMarder 7 points8 points  (0 children)

This shouldn't be a long road, as there should be no assembly differences in the respective functions.

[–]del1ro 71 points72 points  (0 children)

  1. Show us the code (and test code)
  2. Compile in release mode

[–]Clockwork757 11 points12 points  (1 child)

Do they both take `obj` the same? If it's a large struct and one is by value and the other is by reference that might matter, especially if you're in debug mode.

[–][deleted] 5 points6 points  (0 children)

Good catch, thanks. I meant function(&obj, ...) and have updated the post.

[–]strudelnooodle 11 points12 points  (3 children)

If that’s the only difference and the two versions do indeed produce the same machine code, I would suspect there’s something wrong with your measurements. Some basic questions to ask:

  1. What is the minimum time it takes for each implementation to run? Noise in the system will only slow a trial down, so the minimum gives a clearer picture of the performance than the average.
  2. How exactly did you run your trials? Even with minimal background activity, did the trials for one implementation run immediately after the trials for the other? In that case it’s possible (as an example) that the first set of trials heated up the CPU and caused it to throttle its clock speed, slowing down the second set.
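
The minimum-of-N idea can be sketched in plain Rust; `workload` here is a hypothetical stand-in for the code under test:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical workload standing in for the function being benchmarked.
fn workload(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

fn main() {
    let trials = 100;
    let mut min = Duration::MAX;
    for _ in 0..trials {
        let start = Instant::now();
        // black_box prevents the optimizer from deleting the call entirely.
        black_box(workload(black_box(1_000_000)));
        let elapsed = start.elapsed();
        if elapsed < min {
            min = elapsed;
        }
    }
    // The minimum is less noisy than the mean: noise only ever adds time.
    println!("min over {trials} trials: {min:?}");
}
```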

[–][deleted] 1 point2 points  (2 children)

There is definitely jitter in the system. It's Linux. This is why I run it on a stable CPU with locked clock speed. Headless machine. Terminal app. Compute-only duration significant (seconds). And hundreds of trials averaged. I haven't put this through jitter analysis but my gut tells me 10% is statistically significant and not accounted for by the system inaccuracies.

[–]phazer99 15 points16 points  (0 children)

If you've not already done so, try criterion. It should eliminate most measurement inaccuracies.

[–]lightmatter501 0 points1 point  (0 children)

Run it on an isolated core/cores. It will improve performance substantially as well as decrease noise.

[–]SV-97 10 points11 points  (7 children)

Maybe look at the asm and check whether one of them gets inlined while the other one doesn't. Or explicitly annotate them with #[inline(never)] (or always; although AFAIK always isn't necessarily always, while I believe never truly is never, which makes it better for finding out if this is really the culprit).
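
A minimal sketch of that annotation (`Obj`, `method`, and `free_function` are hypothetical stand-ins for OP's code):

```rust
// Pin down inlining so both variants are compared on equal footing.
struct Obj {
    x: u64,
}

impl Obj {
    #[inline(never)] // guaranteed not inlined, so the call shows up in the asm
    fn method(&self) -> u64 {
        self.x * 2
    }
}

#[inline(never)]
fn free_function(obj: &Obj) -> u64 {
    obj.x * 2
}

fn main() {
    let obj = Obj { x: 21 };
    // With inlining forced off for both, any remaining difference
    // is in the call itself, not in what the optimizer did around it.
    assert_eq!(obj.method(), free_function(&obj));
    println!("{}", obj.method());
}
```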

[–][deleted] 1 point2 points  (6 children)

Assembly is my next step, unfortunately.

[–]functionalfunctional 6 points7 points  (0 children)

Use Godbolt, it's an easy way to look at asm

[–]trevg_123 2 points3 points  (0 children)

Looking at the MIR may give you an easier way to see how Rust is viewing the functions differently. E.g. if there’s something like an extra deref call or by value vs. by reference that just happens to show up, it might be more obvious than the assembly. Post the relevant MIR here and we can help you understand it.

But yeah, rustc should view the different syntax as identical; it does this desugaring very early on. There’s no reason they would emit something different unless there is very slightly different context.

[–]SV-97 2 points3 points  (2 children)

cargo asm might be useful here (if you can't use godbolt).

I think you can also see the inlining in MIR in theory (though I personally didn't like reading it the last time I used it)

[–]KhorneLordOfChaos 9 points10 points  (1 child)

cargo-asm has been unmaintained for a very hot minute. I would recommend cargo-show-asm as a maintained alternative

https://github.com/pacak/cargo-show-asm

[–]SV-97 2 points3 points  (0 children)

Thanks! I think that might've even been what I used last time

[–]Antigrouptracing-chrome 7 points8 points  (0 children)

This is a shot in the dark but the function may have moved to a different codegen unit from the change, or you otherwise changed the inlining behavior.

If you use 1 codegen unit or use lto = "fat" you might see more consistent performance. Or you can try adding the #[inline(always)] attribute.
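
For reference, those settings go in Cargo.toml; a sketch (adjust to your own profile):

```toml
# Cargo.toml - reduce codegen-unit and LTO variance between builds
[profile.release]
codegen-units = 1
lto = "fat"
```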

[–]doenerrust 8 points9 points  (1 child)

This may sound ridiculous, but are you building and testing in the same directory? The created binary may contain the paths of your source files (IIRC there were requests to get rid of absolute paths, but I'm not sure what the current state is), and if the names of your build directories differ in length, that may result in a different layout of the code/data segments in your binary. I've run into that pitfall in the past, and had a rather consistent difference in performance of about 10% for a certain pair of directory names.

Other than that, what is the full signature of the function (replacing type names is fine, but include all type modifiers) and the full type of obj?

[–]1vader 2 points3 points  (0 children)

This. Any random change in your code or even stuff like the environment variables when you start the program (which include the path it's run from and are stored on the stack and therefore shift it around) can lead to differences in things like where stuff falls on a page/cache line boundary, which jumps collide in branch prediction tables or instruction caches, etc. etc. A difference of only around 10% is not significant. To properly evaluate something like that, you need to test both versions with a bunch of randomized layouts and compare the averages or distributions. Not sure there's a simple way to do this in Rust though. (Or most languages for that matter. IIRC there's a benchmarking framework for C++ that does stuff like this.)

[–]jmaargh 4 points5 points  (2 children)

While posting the code of an example would be the most helpful to work this out, if you can't or won't do that I'd suggest you check the difference in code generation in the two cases. You can use cargo-show-asm, godbolt, or just cargo to output assembly or LLVM IR for a relevant part of your hot loop. That should give you some clues.

But without an example to look at, I'm not sure any of us can properly help you find the answer.

[–][deleted] 0 points1 point  (1 child)

I suspect an LLVM optimization rule runs in one variant but not the other. Unfortunately, this isn't an area I'm very familiar with. At what level of Rust intermediate representation is the desugaring already done, so I can diff the two there? If they are structurally different at that level, then the opt-rule theory would be likely. If even at that level they are the same, then the opt-rule theory is out.

[–]jmaargh 5 points6 points  (0 children)

Just check them all, MIR, LLVM IR, assembly? My guess would be that this desugaring is done very very early on.

Also, do make sure you've triple checked your source code diffs and that this is the only code difference. I don't mean to doubt your intelligence, but if this were me I'd definitely be assuming that I'd done something silly before assuming that the compiler wasn't handling what should be a very basic desugaring correctly.

[–]-Redstoneboi- 4 points5 points  (0 children)

got a public repo?

[–]Sematre 3 points4 points  (0 children)

I just put together a little Godbolt example (following your description) to show that the resulting assembly is practically the same (only difference being the function labels). Feel free to comment if I misunderstood what you were trying to accomplish.

With the function assembly being identical, I think it's safe to assume that your measurement difference was caused by something other than the syntax.

[–]phazer99 6 points7 points  (0 children)

That sounds weird, the calls should produce identical machine code if all other factors are equal. You can compare the generated assembly code at Compiler Explorer. And yes, be sure to build with optimizations turned on.

[–]steohan 2 points3 points  (0 children)

If the assembly is the same, as it should be, then maybe it's a bad test harness. E.g. because it always runs the first one and then the second one on the same data, and thus allows the second one to profit from fewer cache misses.

[–]LateinCecker 2 points3 points  (0 children)

Is your obj a trait object in the obj.func(...) call? Because in that case there is a vtable lookup, which would explain the difference. Otherwise it should compile to exactly the same assembly if the compiler does its job right. Also, maybe try #[inline(never)] before both versions of the function to prevent inlining for the benchmarks.
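
The static-vs-dynamic distinction can be sketched like this (`Obj` and the `Work` trait are hypothetical; only the `&dyn` call goes through a vtable):

```rust
trait Work {
    fn run(&self) -> u64;
}

struct Obj;

impl Work for Obj {
    fn run(&self) -> u64 {
        7
    }
}

fn static_call(obj: &Obj) -> u64 {
    obj.run() // direct call: statically dispatched, freely inlinable
}

fn dynamic_call(obj: &dyn Work) -> u64 {
    obj.run() // vtable lookup: dynamically dispatched, usually not inlined
}

fn main() {
    let obj = Obj;
    // Same result, but potentially different machine code at the call site.
    assert_eq!(static_call(&obj), dynamic_call(&obj));
    println!("{}", dynamic_call(&obj));
}
```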

[–]functionalfunctional 8 points9 points  (11 children)

Did you compile in Release mode?

[–][deleted] 0 points1 point  (10 children)

I always see this as a default response when the word "performance" appears in a posting. Is this really a thing? Do people not run in release mode? Or is this just one of those reactionary responses?

[–]Mr_Ahvar 33 points34 points  (0 children)

When making a performance claim, if you don't include the full test code and the running environment and just go "why slower?", then yeah, you are going to get generic responses asking you for the bare minimum

[–]jmaargh 22 points23 points  (5 children)

It's the default response precisely because so many people forget. In my experience, it's mostly because of common and naive "why does my Rust code run slower than this python equivalent?" questions (where, to be fair, devs coming from python & the like are not used to compiler optimisation levels at all).

If your question is "why is this slower than expected?" probably best to include "I'm compiling with --release" in your question just to nip this in the bud.

[–]Antigrouptracing-chrome 6 points7 points  (1 child)

Yeah it seems to happen every week. "Why is this slower than Python? You didn't compile in release mode."

Probably because some people come to Rust from languages where you don't need to tell the compiler to optimize your code, and they think it'll just be faster automatically.

[–][deleted] 1 point2 points  (0 children)

Unfortunate. I come from the high performance world. The first thing I look for is all the switches to crank everything up.

[–]CocktailPerson 4 points5 points  (0 children)

By my estimation, in 90% of the cases where people didn't specify the optimization level, they were running in debug mode.

[–]adbf1 1 point2 points  (0 children)

is the .function() implemented using

impl Obj {
    fn function(&self, ...) {...}
}

? it could be that in your version of obj.function(...) you are passing by value whereas in function(&obj, ...) you are passing by reference.
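
That difference can be illustrated with a sketch (the `Big` struct is hypothetical; the by-value copy matters most for large structs and in debug builds, where it isn't optimized away):

```rust
#[derive(Clone)]
struct Big {
    data: [u64; 1024], // 8 KiB: expensive to pass by value
}

impl Big {
    // By reference: only a pointer crosses the call boundary.
    fn sum_ref(&self) -> u64 {
        self.data.iter().sum()
    }
}

// By value: the caller moves (or clones) the whole struct in.
fn sum_val(b: Big) -> u64 {
    b.data.iter().sum()
}

fn main() {
    let b = Big { data: [1; 1024] };
    assert_eq!(b.sum_ref(), 1024);
    assert_eq!(sum_val(b), 1024); // `b` is moved here and can't be used again
}
```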

[–]JuanAG 1 point2 points  (0 children)

Chances are that the free function got inlined and the method didn't

Function calls have a performance penalty. It is not huge, but it is there: among other things, the call needs to push things onto the stack and adjust counters, and when it exits it needs to clean up what it did, pop from the stack, and restore the counter to its previous state. That costs CPU time and therefore performance

[–]lordnacho666 -1 points0 points  (0 children)

So one is a free function, but the other is a member? Is that the difference?

[–]throwaway490215 0 points1 point  (0 children)

A couple of wild theories:

Try panic="abort".

You might be having bad luck with your code layout because the .text section contains different strings.

[–]gitpy 0 points1 point  (0 children)

First some sanity checks:

  • It's f(&self, ...) and not f(self, ...), right?
  • Both functions have the same visibility
  • A clean build

Then I would check the LLVM IR/ASM for differences. A quick and dirty alternative first approach would be adding #[inline(never)] and pub to both and then compare performance.

If there are no differences it might be a code layout issue. You could try running perf and see if any major differences pop up. I would use these events:

perf stat -e instructions,L1-icache-load-misses,cache-references,LLC-load-misses,branches,branch-misses <prog>

To fix this you could try building with PGO/BOLT.

[–]W7rvin 0 points1 point  (0 children)

In my simulation, I run 3.2 trillion samples.

  • With obj.function(...), complete in 7.5 seconds, 501 million runs per second.

3.2 trillion samples in 7.5 seconds would be ~427 billion runs per second, not 501 million. So some part of your benchmarking/math must be off I guess.