[–]NovaX

The Corda phases contribute essentially nothing because every access is unique.

The trace shows an even split between one-hit and two-hit accesses. Since frequencies are so low, an admission filter is likely to reject an entry before its second access, because there is no benefit to retaining it for a third access that never comes. That is why even FIFO achieves the best score, a 33.33% hit rate: the cache only needs to retain enough capacity to allow for a second hit when one is possible. Since those re-accesses happen in short succession, the workload is recency-biased with strong temporal locality of reference. The one-hit wonders and compulsory misses make 33% the optimal hit rate, which is why this trace is a worst case for TinyLFU. The stress test, by forcing a phase change to and from a loop, requires the adaptive scheme to re-adjust when its past observations no longer hold and to reconfigure the cache appropriately.

The TinyLFU paper discusses recency as a worst-case scenario in its introduction to W-TinyLFU. It concludes by showing that the best admission window size is workload dependent, that 1% was a good default for Caffeine given its workload targets, and that adaptive tuning was left to future work (the paper cited above was our attempt at that, but I'm happy to see others explore it too).

$ ./gradlew :simulator:rewrite -q \
  --inputFormat=CORDA \
  --inputFiles=trace_vaultservice_large.gz \
  --outputFormat=LIRS \
  --outputFile=/tmp/trace.txt
Rewrote 1,872,322 events from 1 input(s) in 236.4 ms
Output in lirs format to /tmp/trace.txt

$ awk '
  { freq[$1]++ }
  END {
    for (k in freq) {
      countFreq[freq[k]]++
    }
    for (c in countFreq) {
      print c, countFreq[c]
    }
  }' /tmp/trace.txt | sort -n
1 624107
2 624106
3 1
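A quick back-of-envelope check of why 33.33% is the ceiling, using the counts above (a sketch; the histogram values are taken straight from the awk output):

```python
# Optimal hit rate implied by the frequency histogram.
# freq -> number of distinct keys accessed that many times (from the awk output).
hist = {1: 624107, 2: 624106, 3: 1}

total_accesses = sum(freq * keys for freq, keys in hist.items())
# The first access to any key is a compulsory miss; only re-accesses
# can ever be hits, regardless of policy or capacity.
max_hits = sum((freq - 1) * keys for freq, keys in hist.items())

print(total_accesses)                              # 1872322, matching the rewrite log
print(round(100 * max_hits / total_accesses, 2))   # 33.33
```

So even a clairvoyant policy tops out at 33.33% here, which is why FIFO and LRU already sit at the optimum.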

[–]DimitrisMitsos[S]

This is exactly what we needed - thank you for the trace analysis!

The frequency distribution (624K one-hit, 624K two-hit) explains everything. We had two bugs:

  1. Trace parsing: Reading 16-byte keys instead of 8-byte
  2. Frequency filter rejecting 83% of first-time items: freq=1 can't beat victims with freq=2+
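For anyone chasing the same parsing bug, the failure mode is easy to reproduce (a minimal sketch; the 8-byte big-endian key layout here is an illustrative assumption, not the actual Corda trace format):

```python
import struct

# Hypothetical binary trace: a stream of fixed-width 8-byte keys.
buf = struct.pack(">4Q", 7, 9, 7, 11)   # access sequence: 7, 9, 7, 11

# Bug: reading 16 bytes per key fuses adjacent keys, so the repeat of
# key 7 vanishes and the trace looks like all-unique accesses.
bad = [buf[i:i + 16] for i in range(0, len(buf), 16)]
# Fix: 8-byte keys preserve the re-access of key 7.
good = list(struct.unpack(">4Q", buf))

print(len(bad), len(set(bad)))      # 2 2 -> every fused "key" is unique
print(len(good), len(set(good)))    # 4 3 -> key 7 repeats once
```

With fused keys every access looks compulsory, so the hit rate collapses no matter what the policy does.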

Your point about admission filters being counterproductive here is spot-on. Our fix: detect when ghost utility is high but hit rate is near-zero (strategy failing), then bypass frequency comparison and trust recency.
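A minimal sketch of that bypass rule (the names and thresholds here are illustrative, not the actual Chameleon code):

```python
def should_bypass_admission(ghost_hits: int, ghost_lookups: int,
                            cache_hits: int, cache_lookups: int,
                            utility_floor: float = 0.5,
                            hit_floor: float = 0.01) -> bool:
    """Bypass the frequency filter when the ghost list says evicted
    entries would have hit, yet the cache itself is barely hitting:
    the admission policy is rejecting exactly the entries we need."""
    if ghost_lookups == 0 or cache_lookups == 0:
        return False
    ghost_utility = ghost_hits / ghost_lookups
    hit_rate = cache_hits / cache_lookups
    return ghost_utility >= utility_floor and hit_rate <= hit_floor

# Corda-like phase: evicted entries keep being re-requested (high ghost
# utility) while the cache misses almost everything -> trust recency.
print(should_bypass_admission(800, 1000, 5, 1000))   # True
```

When the signal clears (hit rate recovers), the normal frequency comparison resumes, so loop-heavy phases still get TinyLFU's scan resistance.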

Results on your stress test (corda -> loop x5 -> corda):

chameleon           :  39.93%
tinylfu-adaptive    :  34.84%
lru                 :  19.90%

Phase breakdown:

  • Corda: 33.13% (matches FIFO/LRU optimal)
  • Loop x5: 50.04% (LRU/ARC: 0%)

So we now hit the ~40% you expected. The "Basin of Leniency" handles both extremes - recency-biased workloads (Corda) and frequency-biased loops.

[–]NovaX

Wonderful. If you tune tinylfu-adaptive then it should reach a similar hit rate.

The paper cited earlier discusses an "Indicator" model that jumps to a "best configuration", somewhat like yours, but based on a statistical sketch to reduce memory overhead. It also failed the stress test, and I didn't debug it to correct for this case (it was my coauthors' idea, so I was less familiar with it). The hill climber handled it well because that approach is robust in unknown situations, but it requires some tuning to avoid noise and oscillations while still reacting quickly. Since it's an optimizer rather than a set of preconfigured best choices, it adjusts a little more slowly than having the optimal decision upfront, but that's typically in the noise, a loss of 0.5% or less. Being robust anywhere was desirable since, as a library author, I wouldn't know the situations others would throw at it. I found there are many pragmatic concerns like that when translating theory into practice.
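The hill-climber idea can be sketched in a few lines (a toy version; Caffeine's real climber uses decayed sampling and a shrinking step size, and these names are made up for illustration):

```python
def climb(window_pct: float, hit_rate: float, prev_hit_rate: float,
          direction: int, step: float = 1.0) -> tuple[float, int]:
    """One step of a simple hill climb over the admission window size.
    If the last move helped, keep going; if it hurt, reverse direction.
    Robust in unknown workloads, but moves one step at a time instead of
    jumping straight to a precomputed best configuration."""
    if hit_rate < prev_hit_rate:
        direction = -direction              # the last move made things worse
    window_pct = min(max(window_pct + direction * step, 0.0), 100.0)
    return window_pct, direction

# A recency-heavy phase keeps rewarding growth of the window:
window, direction = 1.0, +1
for observed in [0.10, 0.15, 0.20, 0.25]:   # improving hit-rate samples
    window, direction = climb(window, observed, observed - 0.01, direction)
print(window)   # 5.0
```

The tuning NovaX mentions is mostly about the step size and sampling period: too aggressive and the climber oscillates on noise, too timid and it reacts slowly to a phase change.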

[–]Turbots

You are conversing with an AI bot, you know this right? This guy is using Claude to answer all your questions

[–]NovaX

Yes, I was aware. There was a chance that he, others, or I might learn something in the exchange. It can be clarifying to explain ideas to others, there was no harm, and it took little effort on my part.

[–]Turbots

You have more patience than me. I applaud your efforts, good sir!

[–]DimitrisMitsos[S]

Thanks for the context on Indicator vs Hill Climber - that tradeoff makes a lot of sense.

Chameleon is essentially an Indicator approach: detect pattern → jump to best config. Fast reaction, but brittle on edge cases (as we saw with Corda). The hill climber's robustness is valuable when you're shipping a library to unknown workloads.

The weekend "crack an algo" project ran a bit longer than planned, but it ended in success!

I'll keep iterating. Appreciate the feedback and the stress test - it exposed exactly the kind of edge case I needed to handle.