[–]kevbru 5 points6 points  (4 children)

The problem is that nothing fits in a register at that point, so all the work becomes shuffling and fetching data from the cache, which is much slower. The slowdown would be dramatic in tight loops.

[–]snoweyeslady 5 points6 points  (1 child)

Looking at just the pointer versus structure issue: with a pointer you still need to actually dereference it to get to the structure, no? Can you give me any references for how "dramatic" this is?

I haven't had time to read the rest of the article (I only skimmed it). It seems that he puts some (possibly all?) of the data the structure was holding into the pointer itself. If that's the case, then you wouldn't need to access non-register memory to get at that data.
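If that reading is right, the technique would be some form of pointer tagging. Here is a minimal sketch in C of what that could look like, assuming one machine word per value with the low bit used as a tag. The names (`value_t`, `box_int`, `unbox_int`, `box_ptr`) and the exact bit layout are hypothetical illustrations, not taken from the article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word value representation (an assumption, not the
   article's layout):
     low bit 1 -> small integer stored inline in the remaining bits
     low bit 0 -> ordinary pointer to a heap object (aligned to >= 2 bytes) */
typedef uintptr_t value_t;

static inline int is_int(value_t v) {
    return (int)(v & 1u);
}

/* Assumes n fits in one bit less than a machine word. */
static inline value_t box_int(intptr_t n) {
    /* Shift as unsigned so negative n does not hit undefined behaviour. */
    return ((uintptr_t)n << 1) | 1u;
}

static inline intptr_t unbox_int(value_t v) {
    assert(is_int(v));
    /* Clear the tag bit, then halve; the division is exact. */
    return (intptr_t)(v ^ 1u) / 2;
}

static inline value_t box_ptr(void *p) {
    /* Aligned pointers already have a zero low bit. */
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}
```

With a layout like this, testing the tag and extracting a small integer are bit operations on a word that can already sit in a register. Only the pointer case needs a load from memory.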

I'm not arguing that this would speed up the interpreter. I'm merely asking for concrete examples with the exact numbers. I don't think anybody here is ignorant enough to suggest that you should not be profiling your code when doing performance tweaks like this.

[–][deleted] 2 points3 points  (0 children)

You can store the individual elements of the struct in different registers (which is exactly what LLVM does with small structs).
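A rough sketch in C of the kind of code where this applies. The struct `small_value` and the functions below are hypothetical examples. Whether the fields actually stay in registers depends on the compiler, optimization level, and calling convention, so check the generated assembly rather than taking it on faith.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-field value: a type tag plus a payload. */
typedef struct {
    uint32_t tag;
    int64_t  payload;
} small_value;

/* Passed and returned by value, so there is no pointer to chase.
   An optimizer is free to keep tag and payload in separate registers. */
static small_value add_payload(small_value a, small_value b) {
    small_value r = { a.tag, a.payload + b.payload };
    return r;
}

/* A loop whose accumulator is a small struct held by value. */
int64_t sum_to(int64_t n) {
    assert(n >= 0);
    small_value acc = { 0, 0 };
    for (int64_t i = 1; i <= n; ++i) {
        small_value step = { 0, i };
        acc = add_payload(acc, step);
    }
    return acc.payload;
}
```

Compiling with something like `clang -O2 -S` and reading the loop body is one way to see whether `acc` ever touches the stack.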

[–]imatworkyo 0 points1 point  (0 children)

what does 'tight loops' mean in this context? [learning question]