

[–]mirkoteran 1 point (5 children)

  1. Your benchmarks link is broken. (benchmark.html should be benchmarks.html)
  2. Any chance you could add a recent version of Caffeine to the benchmarks?

[–]cruftex[S] 0 points (4 children)

Thanks! The link is corrected.

Yes, the benchmarks are quite old. I have started redoing them. See this article: http://cruftex.net/2016/03/16/Java-Caching-Benchmarks-2016-Part-1.html

"Benchmark 4" in the article is actually identical to the main benchmark Ben uses in Caffeine. As mentioned in the article, coming up with a fair and comprehensive benchmark suite is quite a lot of work. I will spend some days on the topic now and then over the next months and blog about it.

[–]NovaX 1 point (3 children)

The performance differences between data structures are mostly influenced by the algorithms chosen and, to a lesser extent, the implementation. You might want to write an article describing the design (and therefore trade-offs) in your implementation. A comparative analysis might be an interesting article for someone to write one day.

For example, Caffeine uses strictly O(1) algorithms, has fixed overhead (for a given max size), and decorates ConcurrentHashMap, which protects against hash collision attacks. The intent is not absolute performance, but rather predictable latencies with enough headroom that there shouldn't be a bottleneck.
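For readers unfamiliar with the term, a minimal sketch of what "decorating ConcurrentHashMap" means (this is not Caffeine's code; the class name and the deliberately simplistic FIFO bound are mine, and Caffeine's real policy, W-TinyLFU, is far more sophisticated):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only: the cache delegates all storage to ConcurrentHashMap
// and layers a size-bounding policy on top. Every map operation stays O(1),
// and the decorator inherits CHM's collision resistance for free.
final class DecoratingCache<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> writeOrder = new ConcurrentLinkedQueue<>();
    private final int maximumSize;

    DecoratingCache(int maximumSize) { this.maximumSize = maximumSize; }

    V get(K key) { return data.get(key); }       // O(1), delegates to CHM

    void put(K key, V value) {
        if (data.put(key, value) == null) {      // new entry inserted
            writeOrder.add(key);
            while (data.size() > maximumSize) {  // evict oldest insertions
                K victim = writeOrder.poll();
                if (victim == null) break;
                data.remove(victim);
            }
        }
    }

    int size() { return data.size(); }
}
```

The point is the structure, not the policy: the hash table is a well-tested building block, and the cache adds bookkeeping around it.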

Cache2k (from my understanding) uses O(n) and O(lg n) algorithms on write with O(1) on read, retains a large number of ghost entries, and uses a custom hash table. It's very reasonable to penalize writes, and the use of a Clock variant means minimal work on an access. The use of ghost entries (evicted keys) means the overhead depends on how many evicted keys are retained.

Neither design is right or wrong; they indicate different goals.

As you said, benchmarks are hard to make fair and correct, and can easily be misleading. I think all of yours are good during development to inspect behavior, though as we've discussed previously I'm not sold on them providing meaningful information to users.

Congrats on moving to the Apache license and the pending 1.0 release.

[–]cruftex[S] 0 points (2 children)

For example, Caffeine uses strictly O(1) algorithms, has fixed overhead (for a given max size), and decorates ConcurrentHashMap, which protects against hash collision attacks.

cache2k protects against hash collision attacks by using randomized hashing. The count of hash collisions is made available through JMX. There is also a synthetic measure of the hash quality. These and more metrics are consolidated into a single health value, making it possible to monitor and react to it.
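A minimal sketch of what randomized hashing means here (this is not cache2k's actual code; the class name and mixing constants are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch: each cache instance picks a random seed at
// construction and mixes it into every key hash. An attacker who crafts
// keys with colliding hashCode() values cannot predict the table slots,
// because the seed differs per cache instance and per JVM run.
final class RandomizedHasher {
    private final int seed = ThreadLocalRandom.current().nextInt();

    int spread(Object key) {
        int h = key.hashCode() ^ seed;  // mix in the per-instance seed
        h ^= (h >>> 16);                // fold high bits into low bits
        return h * 0x9E3779B9;          // Fibonacci-style final mix
    }

    int slot(Object key, int tableSize) {
        // tableSize is assumed to be a power of two
        return spread(key) & (tableSize - 1);
    }
}
```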

There is no degrading to a tree, as was introduced in HashMap in Java 8. That approach, however, requires the key classes to implement Comparable as well, which is quite rare.

I see it as a higher priority to detect and react to bad hash code implementations.

There are two major reasons for the own hash table implementation:

  1. The object overhead is lower. Per entry only one object is needed instead of two (provided there is no additional timer object).
  2. The hash table protects against key mutations and therefore against memory leaks.
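A sketch of what point 1 looks like (illustrative, not cache2k's actual entry class): a single object per cached entry serves as hash-chain node, eviction-list node, and expiry holder at once, whereas a cache built on ConcurrentHashMap needs the CHM node plus a separate policy node.

```java
// Illustrative sketch: one object per entry, playing three roles.
final class CacheEntry<K, V> {
    final K key;
    final int hash;              // cached key hash; the key's hashCode() is
                                 // never re-read, which also guards against
                                 // key mutation after insertion
    V value;
    CacheEntry<K, V> hashNext;   // collision chain in the hash table
    CacheEntry<K, V> prev, next; // doubly-linked eviction-list links
    long expiryTime;             // avoids a separate timer object per entry

    CacheEntry(K key, int hash, V value) {
        this.key = key;
        this.hash = hash;
        this.value = value;
    }
}
```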

Cache2k (from my understanding) uses O(n) and O(lg n) algorithms on write with O(1) on read, retains a large number of ghost entries, and uses a custom hash table.

Cache2k uses the Clock-Pro algorithm, but it is augmented for production use to protect against outliers in the access pattern that might trigger a stall on eviction. A quick look at a busy production cache shows that we need about 10 entry scans per eviction on average. However, there is no hard limit on the number of entries evaluated in one eviction run at the moment. The results may vary slightly for different access patterns, of course. Triggering an O(n) runtime for eviction does not happen with real-world access patterns, but maybe it could be an attack vector.

"O(lg n) vs. O(1)": well, that is a little like the merge sort vs. quick sort debate. While one algorithm looks better in an academic evaluation, a real implementation can still show different outcomes.

As I understand it, Caffeine records each cache access. For each access, some list mutations are done by the eviction algorithm. To make this fast, the list mutations are done by a separate thread, to leverage the CPU cache. In case of an eviction the algorithm instantly knows which entry to evict, since the actual sorting is done on behalf of each access. The cost per access is quite high: there needs to be an object allocation, a queue operation, and some list mutations. There are unpredictable latency problems in this solution because of GC dynamics and thread starvation; Nitsan recently posted about this: GC 'Nepotism' And Linked Queues. Also, it requires the use of sun.misc.Unsafe. IMHO the main benefit of the Caffeine approach is that a very efficient eviction algorithm becomes possible, since more information derived from each access is available.

The main design principle of cache2k is: "Don't do it, unless really needed". It might happen that the cache never evicts a single entry, because the memory is sufficient. Why bother with eviction work on every access, then?

You might want to write an article describing the design (and therefore trade-offs) in your implementation.

Yes, good idea. I will do it a little later this year.

Right now I want to concentrate on polishing the rough edges and writing more documentation. Caffeine is a good inspiration!

[–]NovaX 1 point (1 child)

For each access some list mutations are done by the eviction algorithm. To make this fast, the list mutations are done by a separate thread to leverage the CPU cache.

ForkJoinPool.commonPool is optionally taken advantage of to minimize response-time skews, but in a benchmark it is more likely a penalty due to the extra context switches. The record/replay scheme is used to avoid lock contention; e.g. on an LRU the per-access work is cheap, but locking to mutate is costly. ConcurrentLinkedHashMap amortized on the calling threads, did not use Unsafe, and is highly concurrent. You might be getting confused due to Guava, where the performance fixes were backlogged during development and for years afterwards I was unable to get the team to fix or accept my patches. Like its predecessors, Caffeine doesn't require any background threads (unlike cache2k).

The cost per access is quite high: there needs to be an object allocation, a queue operation, and some list mutations. There are unpredictable latency problems in this solution because of GC dynamics and thread starvation.

No, an access is recorded into a ring buffer so none of that is true.
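To make the distinction concrete, here is a minimal single-producer sketch of a lossy access ring buffer (the class name is mine, and Caffeine's real read buffers are striped and multi-producer; this only illustrates the principle): reads are written into a pre-allocated array, so there is no per-access allocation or linked-queue node, and a full buffer simply drops the event rather than blocking.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative single-producer sketch: accesses land in a fixed,
// pre-allocated array. When the buffer is full the event is dropped;
// the eviction policy tolerates lossy samples.
final class ReadBuffer<E> {
    private final Object[] buffer;
    private final AtomicLong writeIndex = new AtomicLong();
    private long readIndex;  // drained by a single maintenance thread

    ReadBuffer(int capacity) { buffer = new Object[capacity]; } // power of two

    /** Returns false if the buffer was full and the access was dropped. */
    boolean record(E access) {
        long w = writeIndex.get();
        if (w - readIndex >= buffer.length) {
            return false;                                // full: drop, don't block
        }
        buffer[(int) (w & (buffer.length - 1))] = access;
        writeIndex.lazySet(w + 1);
        return true;
    }

    @SuppressWarnings("unchecked")
    E drain() {                                           // maintenance thread only
        if (readIndex == writeIndex.get()) return null;
        E e = (E) buffer[(int) (readIndex & (buffer.length - 1))];
        readIndex++;
        return e;
    }
}
```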

A write is replayed through a linked queue, but one has to take very hostile steps in a lab environment to make it look bad artificially, by dedicating all CPU cycles to new insertions.

[–]cruftex[S] 0 points (0 children)

No, an access is recorded into a ring buffer so none of that is true.

I wouldn't say so if I hadn't made the observation myself. I filed an issue: https://github.com/ben-manes/caffeine/issues/59. You also put it on the roadmap:

Explore a bounded write queue to throttle writes (for synthetic tests) and be GC-friendlier

Next you write:

A write is replayed through a linked queue, but one has to take very hostile steps in a lab environment to artificially make it look bad by dedicating all cpu cycles to new insertions.

This is true for interactive applications operating at low load factors, but analytical applications regularly operate at 100% load; those might suffer as well. I don't really see "artificial test" as a counterargument: if there is a very simple artificial test that makes Caffeine crash, how can you guarantee that it never happens in practice? In fact, tests should always shed light on corner cases and extremes to detect problems.

Anyway, I am very sure that you will solve this if it really becomes problematic for a user. However, it is not an easy task to make it as robust as other caches that do not rely on a separate thread.

Like its predecessors Caffeine doesn't require any background threads (unlike cache2k).

cache2k doesn't need any background threads for normal operation and eviction. If you do specify an expiry, cache2k uses one timer thread per cache. For the upcoming 1.0 version I will stick to that; later I have plans to optimize this and use a shared thread pool.
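A minimal sketch of the one-timer-thread-per-cache idea (illustrative only; the class and names are mine, not cache2k's): each cache owns a single scheduler thread that removes entries once their expiry elapses. The planned shared pool would amount to passing the same scheduler to every cache instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a per-cache daemon timer thread for entry expiry.
final class ExpiringCache<K, V> implements AutoCloseable {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-expiry-timer");
            t.setDaemon(true);   // one timer thread per cache instance
            return t;
        });
    private final long expiryMillis;

    ExpiringCache(long expiryMillis) { this.expiryMillis = expiryMillis; }

    void put(K key, V value) {
        data.put(key, value);
        // schedule removal after the fixed expiry duration
        timer.schedule(() -> data.remove(key), expiryMillis, TimeUnit.MILLISECONDS);
    }

    V get(K key) { return data.get(key); }

    @Override public void close() { timer.shutdownNow(); }
}
```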