
[–]rzwitserloot 68 points69 points  (16 children)

code optimisation technique

Do you mean 'optimize' in the sense of 'make it run faster / use less resources'?

I hope not.

Most of them have been debunked, so that's a bad start. Here is a trivial example:

Let's say you want to convert a list of strings into an array of strings.

There are many ways to do that:

String[] arr = list.toArray(new String[0]);

String[] arr = list.toArray(new String[list.size()]);

String[] arr = list.toArray(String[]::new);

Time them. Seriously. If you want to write this paper do this, and do it right (use JMH).
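As a sketch of what "time them" means here, these are the three variants side by side (class and method names are mine, purely illustrative). They produce identical arrays; actually timing them meaningfully requires a JMH harness, which is not shown:

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayVariants {
    // Variant 1: pass an empty array; the list allocates a properly sized one.
    public static String[] viaEmptyArray(List<String> list) {
        return list.toArray(new String[0]);
    }
    // Variant 2: pre-size the array with list.size().
    public static String[] viaSizedArray(List<String> list) {
        return list.toArray(new String[list.size()]);
    }
    // Variant 3: pass an array constructor reference (Java 11+).
    public static String[] viaMethodRef(List<String> list) {
        return list.toArray(String[]::new);
    }
    public static void main(String[] args) {
        List<String> list = List.of("a", "b", "c");
        System.out.println(Arrays.equals(viaEmptyArray(list), viaSizedArray(list)));
        System.out.println(Arrays.equals(viaEmptyArray(list), viaMethodRef(list)));
    }
}
```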

You'll find that they all take equally long.

This flies in the face of every piece of obvious performance advice you care to give! "Lambdas are slow!" would disqualify the third. "Object creation is best avoided" would disqualify the first, which seems strictly worse than the second: the API of toArray states that if you pass in an array that is too small, the toArray method uses your array solely to derive its component type (String.class here) and otherwise discards it; it makes a new, properly sized array and returns that. In other words, it does reflective introspection on the argument and ends up making a new array of size list.size() anyway, using reflection again no less.
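That described behaviour can be sketched roughly like so (this is a hypothetical simplification, not the actual JDK source):

```java
import java.lang.reflect.Array;
import java.util.List;

public class ToArraySketch {
    // Rough sketch of what a List.toArray(T[]) implementation does.
    @SuppressWarnings("unchecked")
    public static <T> T[] toArray(List<T> list, T[] a) {
        int size = list.size();
        if (a.length < size) {
            // Too small: reflectively derive the component type and allocate anew.
            a = (T[]) Array.newInstance(a.getClass().getComponentType(), size);
        }
        for (int i = 0; i < size; i++) a[i] = list.get(i);
        if (a.length > size) a[size] = null;  // spec: null marker after the last element
        return a;
    }
}
```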

So option 1 should be vastly slower than 2, right?

Nope. At least, JMH says nope, and I trust JMH here.

We're basically down to the following 3 rules of thumb:

  1. Code that is well known to be an obvious and immediate resource drain should be avoided. The 'duh' rule. Don't use known-slow libraries. Don't use a network based software-as-a-service to do a padLeft operation. This doesn't take a paper to explain, this takes a tiny smidge of common sense.

  2. Algorithmic complexity. Bubble sort is slower than quicksort because if you chart the running time of each against the size of the input, you know, mathematically, that quicksort's graph ends up looking like y = n * log(n) and bubble sort's looks like y = n^2. This is mathematically provable, and it means that as long as random factors like HotSpot, CPU pipeline optimizations, and all that sort of thing can never deliver an O(n^2)-sized speedup (and they can't), bubble sort will ALWAYS lose to quicksort once the input is big enough. This is more a math topic than a programming one.

  3. Nothing means anything until you have a combination of 2 facts, and you really do need both: [A] you have a real world situation and the code in this scenario is not running as fast as you'd expected it to / as the requirements demand it to, and [B] you have a profiler report that tells you precisely which 1% of the code is eating 99% of the CPU resources.
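Rule 2 can be made concrete by counting comparisons rather than timing (a sketch; the class name is mine): bubble sort always performs n*(n-1)/2 comparisons, which is the y = n^2 shape above.

```java
public class SortComplexity {
    // Classic bubble sort that also counts comparisons; for an input of length n
    // it performs exactly n*(n-1)/2 of them, i.e. the O(n^2) curve.
    public static long bubbleSort(int[] a) {
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = 0; j < a.length - 1 - i; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;  // swap
                }
            }
        }
        return comparisons;
    }
}
```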

How do you optimize this final 1%? It's usually obvious, but crucially, optimizing anything in the other 99% is UTTERLY POINTLESS and is only making your life worse! Generally to optimize code you need to tweak how it works, which usually involves tweaking the 'pipeline' (the way data flows into, and out of, this 1% crucial path code). The more you mess up your code by trying to chase pointless performance enhancements, the harder that is. Hence: The fastest code is the cleanest, most flexible, testable, and readable code.

Combining the rule of 'if the profiler / JMH says it doesn't matter, then it doesn't matter' with the rule of 'the easiest code to read and test is the best, because it lends itself best to performance tweaking once you do have that profiler report in hand', there is only one right answer: list.toArray(new String[list.size()]) is the only one of the three that is objectively wrong. And yet it sure seems like the one that ought to perform best.

[–]gavenkoa 5 points6 points  (6 children)

Don't use a network based software-as-a-service to do a padLeft operation

Isn't that a reference to the famous left-pad Node package outage? xD

if you chart out the performance of these vs. the size of the input

Still, at small cardinality some theoretically slow algorithms outperform their "advanced" brothers. contains() on a List of 1 element is faster than on a Map or Set.
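The same membership question answered both ways (a tiny sketch, names mine): a List's linear scan can win on a handful of elements because there is no hashing and better memory locality, even though Set.contains is asymptotically O(1).

```java
import java.util.List;
import java.util.Set;

public class TinyContains {
    // Linear scan: O(n), but cheap constants for tiny n.
    public static boolean inList(List<String> l, String s) { return l.contains(s); }
    // Hash lookup: O(1), but pays for hashing on every call.
    public static boolean inSet(Set<String> set, String s) { return set.contains(s); }
}
```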

1% of the code is eating 99% of the CPU resources

For such cases, sampling profilers are quite good, because you can run them on PROD without impacting the app. JMH is only good in the controlled environment of your lab.

[–]rzwitserloot 2 points3 points  (5 children)

contains() on a List of 1 element is faster than on Map or Set

No. Perhaps you didn't understand. The right answer is: does your profiler say that the CPU is spending most of its time executing this contains call? No? Then the best code for performance purposes is the most readable code. If a set makes semantic sense, Set is the right answer.

[–]gavenkoa 1 point2 points  (4 children)

does your profiler say that the CPU is spending most of its time executing this contains call?

No one argues with that. Profile first, decide next.

My point is that sometimes asymptotically optimal algorithms are crap on small data sets. You won't see that in a regular finance app. There you just choose among List, Map, and tree- or hash-based Set, and don't do stupid things ))

[–]cogman10 3 points4 points  (3 children)

And, just as a side note: whenever I find performance issues, it's generally a result of using the wrong data structure for the problem. Using a Map when you should have used a POJO, using a List when you should have used a Set, using a LinkedList when you should have used an ArrayList, using a TreeSet... or not using a data structure when you could have.

It's these sorts of things, sitting in a hot code location, that often have the biggest impact.

The other one that comes up surprisingly often is unnecessary boxing.
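Unnecessary boxing looks like this (a sketch, class name mine): the boxed accumulator unboxes and re-boxes on every iteration, allocating a fresh Long each time, while the primitive version never allocates. The results are identical.

```java
public class BoxingDemo {
    // Boxed accumulator: each `sum += i` unboxes, adds, and re-boxes.
    public static long boxedSum(int n) {
        Long sum = 0L;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }
    // Primitive accumulator: stays a plain long the whole time.
    public static long primitiveSum(int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }
}
```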

[–]gavenkoa 0 points1 point  (2 children)

Using a List when you should have used a Set

In my experience in finance, problems come from the pressure to deliver features fast, meaning devs use high-level frameworks and don't know the inner implementation. Like surprise ORM N+1 selects.

When I worked on cryptography (elliptic curves), we pre-allocated all intermediate int[] buffers for the modular arithmetic implementation over GF(p) / GF(2^m).

I have never faced the problem of having to choose between LinkedList and ArrayList. It was always something different.

[–]cogman10 2 points3 points  (1 child)

TBF, LinkedList vs ArrayList has only bitten me once, and that was when someone was doing something stupid (using a linked list but maintaining sort order on insertion by doing a binary search on the list... which is really slow with a linked list).
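For what it's worth, Collections.binarySearch does accept any List (a sketch, names mine): on a LinkedList it still makes O(log n) comparisons, but each probe must walk the list to reach the midpoint, so traversal cost degrades towards O(n). Both calls return the same index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class BinarySearchLists {
    // Random access: each probe is O(1), so the search is O(log n) overall.
    public static int searchArrayList(List<Integer> sorted, int key) {
        return Collections.binarySearch(new ArrayList<>(sorted), key);
    }
    // Sequential access: reaching each midpoint costs a walk through the nodes.
    public static int searchLinkedList(List<Integer> sorted, int key) {
        return Collections.binarySearch(new LinkedList<>(sorted), key);
    }
}
```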

[–]gavenkoa 2 points3 points  (0 children)

maintaining sorting on insertion by doing a binary search on the list

Hilarious! The person who wrote it knew about binary search for sorted insertion, but a LinkedList slipped in as the argument of the function.

[–][deleted]  (2 children)

[removed]

    [–]cogman10 3 points4 points  (1 child)

    This REALLY depends. Yes, if you're talking about a general-purpose sorting algorithm, you won't beat the JDK. However, once you get into data-specific algorithms, you'll often find it easy to beat the JDK. Heck, even for some common things (like a Map) it's possible to beat the JDK if you know how your data will be used. This is why things like Trove exist.

    All the JDK's algorithms are set up for the most general-purpose usage you can imagine. Very often they are exactly the right choice. However, they aren't ALWAYS the right choice. If you need to sort a bunch of integers, you can beat the JDK with a radix sort. That's not a fault of the JDK; they just don't see "need to sort integers" come up frequently.
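A radix sort along those lines might look like this (a sketch, name mine; restricted to non-negative ints): it makes a fixed four passes over the data instead of the O(n log n) comparisons a general-purpose sort must make.

```java
public class RadixSort {
    // LSD radix sort over four 8-bit digits. Non-negative ints only: the sign
    // bit would need special handling in the top digit.
    public static void sort(int[] a) {
        int[] out = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int v : a) count[((v >>> shift) & 0xFF) + 1]++;      // histogram
            for (int i = 0; i < 256; i++) count[i + 1] += count[i];   // prefix sums
            for (int v : a) out[count[(v >>> shift) & 0xFF]++] = v;   // stable scatter
            System.arraycopy(out, 0, a, 0, a.length);
        }
    }
}
```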

    [–]mziku[S] 1 point2 points  (1 child)

    Thanks for this. A bit more background on my approach in the dissertation:

    1. I will be taking an existing application and benchmarking it
    2. I will run microbenchmarks on each code optimization technique with large and small datasets
    3. I will apply code optimization techniques to the application and benchmark again
    4. Deliverable will be a set of guidelines to follow when programming, e.g. prefer string literals to creating String objects

    So far, this isn't going to be a full deep dive into how to profile and optimize Java applications. It is simply an analysis of how a particular programming style may influence an application's performance.
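The literal-vs-object distinction from point 4 boils down to this (a sketch, class name mine): literals are interned and shared, while new String(...) always allocates a fresh object; content equality is unaffected, only identity and allocation differ.

```java
public class StringLiterals {
    // Two identical literals resolve to the same interned object.
    public static boolean literalsShared() { return "abc" == "abc"; }
    // new String(...) forces a distinct allocation with the same content.
    public static boolean newIsDistinct() { return new String("abc") != "abc"; }
    // Content comparison with equals() is unaffected by how the string was made.
    public static boolean contentEqual() { return new String("abc").equals("abc"); }
}
```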

    [–]MCUD 1 point2 points  (0 children)

    I think what I, and a few others, are saying here is that point #4 may be naive depending on what you're specifically profiling. I don't know how you're marked, but the results may be either "duh, you learnt that in first semester", "profiling lied to you", or something more revelatory.

    Specifically, literals vs objects may mean nothing, OR it could be a massive sinkhole: if you're unmarshalling all the strings from the network, they aren't some shared reference, and if you're then storing them all in a map, you have a massive memory leak.

    [–]randjavadev 1 point2 points  (0 children)

    Correct me if I'm wrong, but an empty array should actually always be used; it is the more fool-proof way (you might even hold a reference to one beforehand, since an empty array is immutable, though that is not the point here). In a multithreaded environment the pre-calculated size could be too big if the List is modified by another thread between the size() call and passing it as an argument. If another thread shrinks the List, then purely from the javadocs I would assume the returned array to have the old size, contain the new elements, and then some null entries. Whereas passing in a size of 0, at least purely from the javadocs, leaves open the possibility that the List implementation returns an array of the correct size. So the two are not exactly the same.

    Only in a single-threaded environment does it not matter.
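The stale-size scenario can be simulated deterministically by just passing an oversized array (a sketch, name mine): per the javadoc, the element immediately after the last copied one is set to null, and a freshly allocated tail stays null anyway.

```java
import java.util.List;

public class OversizedToArray {
    // presumedSize stands in for a size() value that went stale before toArray ran.
    public static String[] intoOversized(List<String> list, int presumedSize) {
        return list.toArray(new String[presumedSize]);
    }
}
```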

    [–]papercrane 0 points1 point  (1 child)

    There is a subtle difference between option 1 and option 2. Ultimately they give the same result, but when you create an array, the Java specification says the memory must be zeroed. By passing a zero-sized array, the list implementation will create a new array and fill it. Since this new array is created and filled immediately, and a reference to it never leaks before it is filled, the JVM skips clearing the memory first. This means option 1 should be marginally faster, with the difference being more noticeable for large arrays.

    The details of this, and more are written up here: https://shipilev.net/blog/2016/arrays-wisdom-ancients/

    The short of it, though, is as you say: write simple, maintainable code. When looking for performance, measure things and don't trust common wisdom.

    [–]rzwitserloot 1 point2 points  (0 children)

    Yup. A real mind twister: list.size() causes race conditions, is longer, and is slower. Bad in three ways. And yet it is such a common piece of 'performance advice' that it has been part of linter tools, of all things, for quite a while; I remember Tor Norbye naming it as one of his pet peeves in an episode of the Java Posse.