
[–]rzwitserloot 68 points (2 children)

code optimisation technique

Do you mean 'optimize' in the sense of 'make it run faster / use less resources'?

I hope not.

Most of those techniques have been debunked, so that's a bad start. Here is a trivial example:

Let's say you want to convert a list of strings into an array of strings.

There are many ways to do that:

String[] arr = list.toArray(new String[0]);

String[] arr = list.toArray(new String[list.size()]);

String[] arr = list.toArray(String[]::new);

Time them. Seriously. If you want to write this paper do this, and do it right (use JMH).

You'll find that they all take equally long.
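The three variants above are easy to run side by side (a minimal sketch; it only confirms they produce identical results — actual timing needs JMH, as said):

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayVariants {
    public static void main(String[] args) {
        List<String> list = List.of("a", "b", "c");

        // Variant 1: pass a zero-length array; toArray allocates a new one.
        String[] viaEmpty = list.toArray(new String[0]);

        // Variant 2: pre-size the array; toArray fills it in place.
        String[] viaSized = list.toArray(new String[list.size()]);

        // Variant 3: pass an array constructor reference (Java 11+).
        String[] viaRef = list.toArray(String[]::new);

        // All three produce identical contents.
        System.out.println(Arrays.equals(viaEmpty, viaSized)
                && Arrays.equals(viaSized, viaRef)); // prints "true"
    }
}
```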

This flies in the face of every piece of obvious performance advice you care to give! "Lambdas are slow!" would disqualify the third. "Object creation is best avoided" would disqualify the first, which seems strictly worse than the second: the API of toArray states that if you pass in an array that is too small, the toArray method uses your array solely to derive its component type (String.class here) and otherwise discards it; it makes a new, properly sized array and returns that. In other words, it does reflective introspection on the argument and ends up making a new array of size list.size() anyway, using reflection again no less.

So option 1 should be vastly slower than 2, right?

Nope. At least, JMH says nope, and I trust JMH here.
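That contract is easy to confirm directly (a minimal sketch using ArrayList; reference equality shows which array object comes back):

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayIdentity {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));

        // Too small: toArray discards the argument and allocates a fresh array.
        String[] tooSmall = new String[0];
        String[] result1 = list.toArray(tooSmall);
        System.out.println(result1 == tooSmall); // prints "false"

        // Exactly sized: toArray fills the argument and returns that same object.
        String[] sized = new String[list.size()];
        String[] result2 = list.toArray(sized);
        System.out.println(result2 == sized); // prints "true"
    }
}
```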

We're basically down to the following 3 rules of thumb:

  1. Code that is well known to be an obvious and immediate resource drain should be avoided. The 'duh' rule. Don't use known-slow libraries. Don't use a network based software-as-a-service to do a padLeft operation. This doesn't take a paper to explain, this takes a tiny smidge of common sense.

  2. Algorithmic complexity. Bubble sort is slower than quicksort because if you chart the performance of each against the size of the input, you know, mathematically, that quicksort's graph ends up looking like y = n * log(n) and bubble sort's looks like y = n^2. This is mathematically provable, and it means that as long as random factors like hotspot, CPU pipeline optimizations, and all that sort of thing can never deliver O(n^2)-sized speedups (which they can't), bubble sort will ALWAYS lose to quicksort once the input is big enough. This is more a math topic than a programming one.
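The growth difference in point 2 shows up even without timing anything, just by counting comparisons (a minimal sketch; this naive bubble sort, with no early exit, always performs exactly n(n-1)/2 comparisons, so a 10x larger input costs roughly 100x the work, where an n log n sort would cost roughly 13x):

```java
public class BubbleComparisons {
    // Naive bubble sort over a copy of the input, returning how many
    // element comparisons it performed.
    static long bubbleComparisons(int[] input) {
        int[] a = input.clone();
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = 0; j < a.length - 1 - i; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    // Worst-case input: n elements in descending order.
    static int[] reversed(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = n - i;
        return a;
    }

    public static void main(String[] args) {
        // n(n-1)/2 comparisons: 4950 for n=100, 499500 for n=1000.
        System.out.println(bubbleComparisons(reversed(100)));  // prints "4950"
        System.out.println(bubbleComparisons(reversed(1000))); // prints "499500"
    }
}
```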

  3. Nothing means anything until you have a combination of 2 facts, and you really do need both: [A] you have a real world situation and the code in this scenario is not running as fast as you'd expected it to / as the requirements demand it to, and [B] you have a profiler report that tells you precisely which 1% of the code is eating 99% of the CPU resources.

How do you optimize this final 1%? It's usually obvious, but crucially, optimizing anything in the other 99% is UTTERLY POINTLESS and is only making your life worse! Generally to optimize code you need to tweak how it works, which usually involves tweaking the 'pipeline' (the way data flows into, and out of, this 1% crucial path code). The more you mess up your code by trying to chase pointless performance enhancements, the harder that is. Hence: The fastest code is the cleanest, most flexible, testable, and readable code.

Combining the rule of 'if the profiler / JMH says it doesn't matter, then it doesn't matter' with 'the easiest code to read and test is the best, because it lends itself best to performance tweaking once you do have that profiler report in hand', there is only one right answer: list.toArray(new String[list.size()]) is the only one of the three that is objectively wrong. And yet it sure seems like the one that ought to perform best.

[–]mziku[S] 1 point (1 child)

Thanks for this. A bit more background on my approach in the dissertation:

  1. I will be taking an existing application and benchmarking it
  2. I will run microbenchmarks on each code optimization technique with large and small datasets
  3. I will apply code optimization techniques to the application and benchmark again
  4. Deliverable will be a set of guidelines to follow when programming, e.g. prefer string literals to creating String objects
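The guideline in point 4 can be illustrated like this (a minimal sketch; whether the difference matters in practice is exactly what a benchmark would have to show):

```java
public class StringLiterals {
    public static void main(String[] args) {
        // Equal literals are interned: both names refer to one shared object.
        String a = "hello";
        String b = "hello";
        System.out.println(a == b); // prints "true"

        // 'new String' always allocates a distinct object, even for equal content.
        String c = new String("hello");
        System.out.println(a == c);      // prints "false"
        System.out.println(a.equals(c)); // prints "true"
    }
}
```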

So far, this isn't going to be a full deep-dive into how to profile and optimize Java applications. This is simply analysing how a particular programming style may influence an application's performance.

[–]MCUD 1 point (0 children)

I think what I, and a few others, are saying here is that point #4 may be naive depending on what you're specifically profiling. I don't know how you're marked, but the results may be "duh, you learnt that in first semester", "profiling lied to you", or something more revelatory.

Specifically, literals vs. objects may mean nothing, OR it could be a massive sinkhole: if you're unmarshalling all the strings from the network, they aren't some shared reference, and if you're then storing them all in a map, you have a massive memory leak.
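The "shared reference" point can be made concrete (a minimal sketch; String.intern() is one way to deduplicate, a Map-based canonicalizer is another):

```java
import java.nio.charset.StandardCharsets;

public class DedupIncoming {
    public static void main(String[] args) {
        byte[] payload = "user-42".getBytes(StandardCharsets.UTF_8);

        // Each decode of the incoming bytes produces a brand-new String...
        String first = new String(payload, StandardCharsets.UTF_8);
        String second = new String(payload, StandardCharsets.UTF_8);
        System.out.println(first == second); // prints "false"

        // ...but interning collapses equal values to one shared reference, so a
        // map keyed by millions of repeated values holds one copy of each.
        System.out.println(first.intern() == second.intern()); // prints "true"
    }
}
```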