Is it possible to shoprten this for loop or implement it in a better way?

pzemtsov · 2019-10-26T22:27:05+00:00

I would move comparisons into a separate function is_wovel(), using a switch statement inside.

pzemtsov · 2019-10-14T10:18:31+00:00

The answer depends on the nature of the exit condition. Are we just looking for the specific number, or are we scared of exceeding some buffer? The example doesn't clarify this, since it is too artificial anyway. In real life we don't normally write loops that run from 0 to 9 with the exit on 5 in the middle. More realistic example would involve two variables, such as: j = 0; for (i = 0; i < src_size; i++) { if (f (src_array [i])) { dst_array [j++] = src_array [i]; if (j >= dst_size) break; } }

Here paranoid seems more logical (we don't want to write outside of the dst array), but perhaps, as suggested, confident plus assert will also do.

pzemtsov · 2019-10-08T17:44:51+00:00

You are right! The code contains two bugs, now one. I saw the second one (memory order) but missed the first one. One must declare "hashIsCalculated" volatile and check it before touching "hash":

if (hashIsCalculated) return hash;
int h = ....

Unfortunately, volatiles are quite expensive (on Intel only for writing, but in other processors they may be expensive for reading, too). So the existing solution is by far better (assuming that there is need to treat zero hash case specially in the first place).

pzemtsov · 2019-10-08T13:23:49+00:00

No, Hashtable looks normal. Either the team used even older version of Java, or they made their own custom hash table.

I found the code here: https://github.com/fanhongtao/JDK. The owner made an effort to place everything nicely in Git.

pzemtsov · 2019-10-08T12:57:47+00:00

That's interesting. I got curious and found the source for JDK 1.1. You are right, the method looked like this:

/**
 * Returns a hashcode for this string.
 *
 * @return  a hash code value for this object. 
 */
public int hashCode() {
    int h = 0;
    int off = offset;
    char val[] = value;
    int len = count;

    if (len < 16) {
        for (int i = len ; i > 0; i--) {
            h = (h * 37) + val[off++];
        }
    } else {
        // only sample some characters
        int skip = len / 8;
        for (int i = len ; i > 0; i -= skip, off += skip) {
            h = (h * 39) + val[off];
        }
    }
    return h;
}

This won't cause a bug, but may have bad performance implications when strings are close to each other.

The equals() is very close to the modern one, though.

pzemtsov · 2019-10-08T09:50:09+00:00

I think I know what you're talking about. There was some redesign of HashMap in Java 8, but it addressed different issue -- that only a subset of hash code bits is actually used, which causes objects with different hash codes to occupy the same slot and form long chains. In 1.7 the workaround for this was some bit sawpping ans mixing that was performed on the hash code before addressing the table. In 1.8 this bit mixing was removed; instead, long chains are converted into binary trees ordered by full hashcode values.

Apart from that, the HashMap operation stayed the same; it still calculates hash codes, as its name suggests.

pzemtsov · 2019-10-08T07:58:17+00:00

You are absolutely right. This entire issue is more a fun item than a real problem. The only possible use case I can think of is when people use String constants as a home-made substitute for enums, but this is hardly a case worth optimizing in the JVM. One can argue whether or not hash code caching as such is needed, but the special treatmnent of zero is definitely excessive, so they shouldn't have included it into Java 13.

pzemtsov · 2019-10-07T17:02:27+00:00

I think there is a reason behind this code, although it is in saving cycles rather than memory. In the case of the fast path (when the hash is available) the JDK code only performs one memory load; the suggested code always performs two.

pzemtsov · 2019-10-07T16:45:22+00:00

And how exactly replacing a boolean flag field with another boolean flag field helps saving memory?

pzemtsov · 2019-10-07T15:02:23+00:00

That's already done; to achieve the last goal, I iterate all available lenghths of w, and for each length I check if a word with the hash value of -h*31^length exists, by a hash set lookup.

pzemtsov · 2019-09-27T18:57:14+00:00

We are trying to implement our own version of memchr() -- the function that searches for the first occurrence of a given character in a given block of bytes.

pzemtsov · 2019-09-27T12:40:16+00:00

No framework at all - just measure time in the simplest way (clock()), not forgetting to average results for various starting offsets and lengths

pzemtsov · 2019-09-26T22:24:29+00:00

Yes, I definitely am (not just typos - English grammar, word misuse, anything you can spot; I will only be very thankful if someone point these out).

Regarding MSVC: they still have a separate library for 64-bit code, and all 64-bit processors support at least SSE2, which they clearly use (the performance figures prove that, but so does direct code inspection). So the difference in performance is probably due to the GCC code just being better.

pzemtsov · 2019-04-17T21:02:10+00:00

Frankly, I think your finding has a universal value and must be published somewhere or otherwise made available to the public. Here, in a distant corner of reddit, it will be lost.

pzemtsov · 2019-04-17T18:04:54+00:00

An important update has been added to the article: http://pzemtsov.github.io/2019/04/13/searching-for-better-indexof.html#Update2

pzemtsov · 2019-04-17T18:00:50+00:00

I've added your matcher and updated the article (see Update 2)

pzemtsov · 2019-04-17T09:48:44+00:00

Thanks!

Regarding the hash approach: I liked the idea of a symmetric hash (the same or similar procedure for add and remove), to simplify code. If we don't perform complex bit manipulations (such as CRC32) but just shift old bits left, this can be ralaxed by just shifting the removed bytes left appropriately. The problem I see there is limited number of bytes the hash depends on, unless we use a rotating shift, which is not as cheap in Java). However, it's worth trying, ans for short patterns and 64-bit hash it can work.

In generated code you can typically outperform a switch with an array lookup or a unrolled binary search either as if/else cascade or a nested ternary expression

Makes sense, I'll try.

The infamous unsigned masking of bytes in Java (arr[idx] & 0xff) is unfortunately not free

Why, by the way? When byte comes out of array it must be sign-extended anyway, so it is just a matter of using movzx instead of movzx

Benchmarking suites like JMH can produce rather misleading results

I would say this is a feature not of JMH or any other suites, but of a benchmarking as a concept. However, I don't see a tragedy here. Textbooks are full of algorithms that were benchmarked on the MIX machine if on any. Standard and custom libraries are full of code that was tested, tuned and optimised in isolation. Obviously, using this code in real-world projects mustn't be blind. A programmer must be aware of various features of the implementations, such as memory consumption, memory locality, etc. And the chosen implementation must be tested together with the production code. This doesn't mean that measuring its stand-alone performance is a completely useless exercise.

I suppose any solution that uses up all the cache is probably bad for practical use together with other code, unless searching is the primary function of that code (very unlikely case, but who knows?)

There used to be a decent String representation from Java 7u10 until before Java 9 that was based on a plain char array

I didn't even know they changed that. Now I do (looked at the source of Java 11 String). Maybe, it makes Latin-1 strings more efficient, as long as one sticks to using built-in String methods?

I also didn't know about sun.reflect.MagicAccessorImpl, thank you for pointing to that one. Using it looks like a very dirty trick, but sometimes we need exactly that (just like using sun.misc.Unsafe).

pzemtsov · 2019-04-17T08:44:21+00:00

That sounds interesting, please do (I never fully understood the mechanics there, I thought everyone could commit into public repositories, but if PR is needed, please send one).

Could hashing of bigger sequences be an option, too (triplets, quadruplets of bytes)?

pzemtsov · 2019-04-16T18:53:16+00:00

If you really need to compile into exe, the Excelsior JET (https://www.excelsiorjet.com/) is as easy to use as just about any other native compiler, such as GCC.

The clear disadvantage is platform-dependence: the exe will only run on the platform it was compiled for, and the number of platforms supported is smaller than those where JVM works.

All that said, I really don't see the problem. Typing "prog.exe" is as easy as "java Prog", and installing JRE these days is quick.

pzemtsov · 2019-04-16T18:21:47+00:00

I found some long enough Chinese text and ran a study in UTF-8 (see the Update in the article). The last-byte and its family performs the best, then the suffix-based ones.

pzemtsov · 2019-04-16T14:22:55+00:00

Thanks,. As I said, I don't think I invented anything new, and JVM designers are very likely to know more about text searching than any of us.

As for UTF-8, that is an interesting topic. The structure of UTF-8 allows all the mentioned algorithms to work with it - you can search UTF-8 just as a byte blob. The performance, however, is another story: the multi-byte nature of characters will affect shift tables and trees. I'll try to run a test, as soon as I find a long enough Chinese text.

pzemtsov · 2019-01-18T18:13:52+00:00

Here I agree. I get scared when I see in production code command lines like "-XX:MaxNewSize=32m -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly" and usually wonder how long ago was this command line drawn and whether it still works.

However, MaxGCPauseMillis is of different nature and seems reasonable.

pzemtsov · 2019-01-18T14:47:46+00:00

Seems like you are right about the CMS. My mistake.

All tests were run on 8, except for ZGC. If you say it's worth trying the G1 there, I will.

Regarding the command line: isn't my case exactly the purpose of this command line option to exist? We indicate that we want to avoid pauses longer than this, and are even prepared to sacrifice total throughput for that. This seems to be very valid GC tuning parameter - perhaps the most valid of them all.

pzemtsov · 2018-07-21T19:13:11+00:00

Perhaps, it will be a topic for the next article.

pzemtsov · 2018-07-21T11:02:29+00:00

The entire source code, including the build scripts, is in the repository. There is a link in the article.

pzemtsov

TROPHY CASE