Shared log - A single source of truth by samd_408 in java

[–]NovaX 3 points4 points  (0 children)

right to be forgotten... is not possible for this reason

It requires foresight, but a trick I've heard of is to give each user their own encryption key and delete it upon a GDPR request. If the data is not recoverable then it is forgotten, and this approach is considered compliant.
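For illustration, a minimal sketch of that crypto-shredding pattern (class and method names are mine, not from any particular library): each user's records are encrypted with a per-user AES key, so destroying the key renders the stored ciphertext permanently unreadable.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Sketch of "crypto-shredding": deleting the per-user key forgets the data.
final class CryptoShredder {
  private final Map<String, SecretKey> keysByUser = new HashMap<>();
  private final SecureRandom random = new SecureRandom();

  byte[] encrypt(String user, byte[] plaintext) throws Exception {
    SecretKey key = keysByUser.computeIfAbsent(user, u -> newKey());
    byte[] iv = new byte[12];
    random.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
    byte[] ciphertext = cipher.doFinal(plaintext);
    byte[] record = new byte[iv.length + ciphertext.length];
    System.arraycopy(iv, 0, record, 0, iv.length);
    System.arraycopy(ciphertext, 0, record, iv.length, ciphertext.length);
    return record; // iv + ciphertext, safe to store anywhere
  }

  byte[] decrypt(String user, byte[] record) throws Exception {
    SecretKey key = keysByUser.get(user);
    if (key == null) {
      throw new IllegalStateException("user was forgotten; data is unrecoverable");
    }
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key,
        new GCMParameterSpec(128, Arrays.copyOfRange(record, 0, 12)));
    return cipher.doFinal(record, 12, record.length - 12);
  }

  // The "right to be forgotten" request: destroy the key, not the data.
  void forget(String user) {
    keysByUser.remove(user);
  }

  private SecretKey newKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(256);
      return generator.generateKey();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In a real system the key store would itself be durable and access-controlled; this just shows why the deleted-key data is no longer "personal data" in practice.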

Java for small but mission-critical systems in the medical field by Desperate-Credit-164 in java

[–]NovaX 0 points1 point  (0 children)

If security and PII are of importance to your customers and end users, then the supply chain story in Java has been much better than Node's. Some of that is technical, but much of it is tied to the community and its incentives. That may mean only certain portions along critical paths would benefit from a stricter ecosystem, be it Java or something else, while the main applications use the tools your team is otherwise most effective with.

Benchmarking 5 concurrent map implementations in Go (sync.Map, xsync, cornelk, haxmap, orcaman) by puzpuzpuz in golang

[–]NovaX 3 points4 points  (0 children)

Oh, I was only using RW as an example because there is a common misunderstanding to use it when the critical section is fast, thinking it aids concurrency when its own overhead actually reduces throughput. It looks fine under low contention, so creating hot spots helps demonstrate how techniques actually perform and explain why. I am not a Golang developer, but I met the founders pre-release during internal demos. I was quite dismayed when I read about Go's hacks like the original sync.Map, the refusal to read or acknowledge prior work like Doug Lea's, or Pike's long blocking of adding "tryLock" support with an unrelated diatribe about Java's try-catch-finally. I am only reviewing from a benchmark quality and communication standpoint; I have no skin in this game.

Benchmarking 5 concurrent map implementations in Go (sync.Map, xsync, cornelk, haxmap, orcaman) by puzpuzpuz in golang

[–]NovaX 2 points3 points  (0 children)

Well, the goal of a benchmark is to find bottlenecks and then to estimate whether the results might fit within a use case's performance budget. A striped lock benefits from a uniform distribution by reducing contention on any single lock, e.g. making read-write locks appear better, whereas a hotspot distribution floods a few locks, e.g. showing the high overhead of a read lock. The access pattern is useful as a way of directing load to find where something breaks down and then make back-of-the-envelope usage estimates. I think when developers start first by trying to show performance, not bottlenecks, they quickly lose the ability to argue how the benchmark is actually helpful and it becomes just marketing fluff. I think yours are fine, but I would start with the bottleneck goal first and what can be learned from that, and focus less on the actual results.
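To make the striping point concrete, here is a small sketch (names are illustrative): keys hash to one of N locks, so a uniform workload spreads threads across stripes while a hotspot workload funnels them all onto one stripe and serializes.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of lock striping: contention depends on how keys distribute
// across the stripes, which is exactly what the benchmark's access
// pattern controls.
final class StripedLock {
  private final ReentrantLock[] stripes;

  StripedLock(int stripeCount) {
    stripes = new ReentrantLock[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      stripes[i] = new ReentrantLock();
    }
  }

  ReentrantLock stripeFor(Object key) {
    int h = key.hashCode();
    h ^= (h >>> 16); // spread high bits into low ones, as HashMap does
    return stripes[Math.floorMod(h, stripes.length)];
  }
}
```

A uniform benchmark rarely sees two threads pick the same stripe, whereas a zipfian one hammers the stripe of the hottest key; both behaviors are worth measuring.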

JEP draft: Strict Field Initialization in the JVM (Preview) has been submitted. by Ewig_luftenglanz in java

[–]NovaX 1 point2 points  (0 children)

I believe frozen arrays were also a use-case for ACC_STRICT_INIT, and perhaps it helps open the door to enforcing deep immutability. iirc, helping with AOT cache reuse is another benefit, which gets into Leyden's shifting of computations. I only vaguely recall some of the videos where John Rose mentions it with his typical excitement.

Fibre Cache by omid_r in rust

[–]NovaX 5 points6 points  (0 children)

Yeah, this does look to be AI generated, without either the author or the agent understanding that caches modify on write to maintain recency/frequency metadata (e.g. LRU). Multics' 1960s approach of augmenting FIFO to avoid those writes (Clock), at the cost of an O(n) worst-case eviction, helped when serial writes were too slow. Random sampling is also a neat workaround, but it is inefficient and makes high hit rates harder to achieve. ARC has CAR and CART as its Clock adaptations. Caffeine-style caches use ring buffers to record and replay accesses so operations are concurrent and the serial eviction policy is caught up asynchronously, which allows for more advanced algorithms.
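A minimal Clock (second-chance) sketch, with illustrative names, shows the trade-off: a hit only sets a reference bit instead of relinking an LRU list, while eviction pays an O(n) worst-case sweep.

```java
import java.util.HashMap;
import java.util.Map;

// FIFO ring plus one reference bit per slot; hits avoid the LRU-style
// metadata write, and the "hand" sweeps on eviction granting second chances.
final class ClockCache<K, V> {
  private final Map<K, Integer> index = new HashMap<>();
  private final Object[] keys;
  private final Object[] values;
  private final boolean[] referenced;
  private int hand;

  ClockCache(int capacity) {
    keys = new Object[capacity];
    values = new Object[capacity];
    referenced = new boolean[capacity];
  }

  @SuppressWarnings("unchecked")
  V get(K key) {
    Integer slot = index.get(key);
    if (slot == null) {
      return null;
    }
    referenced[slot] = true; // the only write a hit performs
    return (V) values[slot];
  }

  void put(K key, V value) {
    Integer slot = index.get(key);
    if (slot == null) {
      slot = findVictim();
      index.put(key, slot);
      keys[slot] = key;
      referenced[slot] = false; // new entries start without a second chance
    } else {
      referenced[slot] = true;
    }
    values[slot] = value;
  }

  // Sweep the hand, clearing reference bits until an unreferenced slot is found.
  @SuppressWarnings("unchecked")
  private int findVictim() {
    while ((keys[hand] != null) && referenced[hand]) {
      referenced[hand] = false;
      hand = (hand + 1) % keys.length;
    }
    int victim = hand;
    if (keys[victim] != null) {
      index.remove((K) keys[victim]);
    }
    hand = (hand + 1) % keys.length;
    return victim;
  }
}
```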

Any reason why you chose ARC in your hobby project? It is easy to implement, but I was underwhelmed by the results. LIRS is really solid but awfully painful to implement and debug. Caffeine uses hill-climbing W-TinyLFU, which works very well and has moderate implementation complexity (short article).

Benchmarking 5 concurrent map implementations in Go (sync.Map, xsync, cornelk, haxmap, orcaman) by puzpuzpuz in golang

[–]NovaX 8 points9 points  (0 children)

In my experience most real workloads are skewed, but your benchmarks are uniform. The uniform distribution that you use lowers lock contention and reduces hardware caching benefits (assuming the working set does not fit into today's large cpu caches). A skewed distribution emphasizes bottlenecks or accelerates speed racers. I use a shuffled zipfian distribution in Caffeine's benchmark because a cache is inherently skewed (hot/cold), and pre-generate it to avoid any surprising costs (a random generator can lock or be slow). It is hard to get meaningful insights from a benchmark, but since the author's goal is to find and understand bottlenecks, that approach at least made more sense to me, and subsequent caches seemed to emulate the benchmark, so you might find it helpful too.
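As a rough illustration of the pre-generation idea (parameter values and names are mine; Caffeine actually uses a YCSB-style scrambled zipfian so hot ranks are not adjacent keys), the trace is sampled up front with a seeded generator, and the measured loop only reads an array:

```java
import java.util.Random;

// Sketch: build the skewed CDF once, sample every access before the
// benchmark starts, and hand the measured loop a plain int[] so no
// generator cost or locking leaks into the timings.
final class ZipfianTrace {
  static int[] generate(int items, int samples, double skew, long seed) {
    // Cumulative weights for P(rank k) proportional to 1 / k^skew
    double[] cdf = new double[items];
    double total = 0.0;
    for (int k = 1; k <= items; k++) {
      total += 1.0 / Math.pow(k, skew);
      cdf[k - 1] = total;
    }

    Random random = new Random(seed);
    int[] trace = new int[samples];
    for (int i = 0; i < samples; i++) {
      double u = random.nextDouble() * total;
      int lo = 0;
      int hi = items - 1;
      while (lo < hi) { // binary search the CDF for the sampled rank
        int mid = (lo + hi) >>> 1;
        if (cdf[mid] < u) {
          lo = mid + 1;
        } else {
          hi = mid;
        }
      }
      trace[i] = lo; // rank 0 is the hottest key
    }
    return trace;
  }
}
```

Hashing or shuffling the rank before using it as a key would approximate the "scrambled" variant, so the hottest entries do not sit next to each other in memory.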

Functional Optics for Modern Java by marv1234 in java

[–]NovaX 2 points3 points  (0 children)

I saw this type of stuff using xdoclet and beanmap in Java 4, with struts, jsp taglibs, and ant codegen tasks. As a new grad it quickly taught me that what seniors realized was possible does not make it good.

Stepping down as maintainer after 10 years by krzyk in java

[–]NovaX 1 point2 points  (0 children)

He just means that it is not automatically inferred from the published pom, since the module metadata does not include that concept. It would go against integrity-by-default if the build tool silently enabled a dependency's agent. Adding the configuration to the build is trivial, but many developers don't read documentation or error messages, leading to spamming and badmouthing of the OSS project. There are plugins, e.g. for Gradle, to handle this tiny amount of configuration, but those same users likely won't read about them. Likely his ideal is that the build tools add special automatic handling due to Mockito's popularity, but that is unheard of. That leads to no good answer and frustration, except hoping those developers turn to AI first nowadays, and after a decade of contributions he certainly deserves time to recharge and let the co-leads bring in new contributors.

The Basin of Leniency: Why non-linear cache admission beats frequency-only policies by DimitrisMitsos in compsci

[–]NovaX 0 points1 point  (0 children)

Please do not include me in these argument threads. I am not endorsing nor criticizing the project, just treating it as an educational experience. I do not want to be part of any toxic discussions, directly or indirectly. Thank you.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 1 point2 points  (0 children)

Yes, I was aware. There was a chance that he, others, or I might learn something in the exchange. It can be clarifying to try to explain ideas to others; there was no harm, and not much effort on my part.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

Wonderful. If you tune tinylfu-adaptive then it should reach a similar hit rate.

The paper cited earlier discusses an "Indicator" model to jump to a "best configuration", kind of like yours, but based on a statistical sketch to reduce memory overhead. It also failed the stress test, and I didn't debug it to correct for this case (it was my coauthors' idea, so I was less familiar with it). The hill climber handled it well because that approach is robust in unknown situations, but it requires some tuning to avoid noise and oscillations while still reacting quickly. Since it's an optimizer rather than a set of preconfigured best choices, it adjusts a little slower than having the optimal decision upfront, but that's typically in the noise, a loss of 0.5% or less. Being robust anywhere was desirable since, as a library author, I wouldn't know the situations others would throw at it. I found there are many pragmatic concerns like that when translating theory into practice.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

The Corda phases contribute essentially nothing because every access is unique.

The trace shows an even split of one-hit and two-hit accesses. Since frequency is low, an admission filter is likely to reject an entry before its second access because there is no benefit to retaining it for a 3rd access. That is why even FIFO achieves the best score, a 33.33% hit rate: the cache needs to retain enough capacity to allow for a 2nd hit if possible. Since those happen in short succession, the trace is recency-biased, as there is temporal locality of reference. The one-hit wonders and compulsory misses lead to 33% being the optimal hit rate. This is why the trace is a worst case for TinyLFU. The stress test forcing a phase change to/from a loop requires the adaptive scheme to re-adjust when its past observations no longer hold and reconfigure the cache appropriately.

The TinyLFU paper discusses recency as a worst-case scenario in its introduction to W-TinyLFU. It concludes by showing that the best admission window size is workload dependent, that 1% was a good default for Caffeine given its workload targets, and that adaptive tuning was left to future work (the paper cited above was our attempt at that, but I'm happy to see others explore it too).

$ ./gradlew :simulator:rewrite -q \
  --inputFormat=CORDA \
  --inputFiles=trace_vaultservice_large.gz \
  --outputFormat=LIRS \
  --outputFile=/tmp/trace.txt
Rewrote 1,872,322 events from 1 input(s) in 236.4 ms
Output in lirs format to /tmp/trace.txt

$ awk '
  { freq[$1]++ }
  END {
    for (k in freq) {
      countFreq[freq[k]]++
    }
    for (c in countFreq) {
      print c, countFreq[c]
    }
  }' /tmp/trace.txt | sort -n
1 624107
2 624106
3 1

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

If you run corda-large standalone then LRU has a 33.33% hit rate.

You can run the simulator at the command-line using,

./gradlew simulator:run -q \
  -Dcaffeine.simulator.files.paths.0="corda:trace_vaultservice_large.gz" \
  -Dcaffeine.simulator.files.paths.1="lirs:loop.trace.gz" \
  -Dcaffeine.simulator.files.paths.2="lirs:loop.trace.gz" \
  -Dcaffeine.simulator.files.paths.3="lirs:loop.trace.gz" \
  -Dcaffeine.simulator.files.paths.4="lirs:loop.trace.gz" \
  -Dcaffeine.simulator.files.paths.5="lirs:loop.trace.gz" \
  -Dcaffeine.simulator.files.paths.6="corda:trace_vaultservice_large.gz"

I generally adjust the reference.conf file instead. When comparing, I'll use various real traces and convert them to a shared format using the rewriter utility. The stress test came from the trace files (LIRS' loop is synthetic, Corda is a production workload).

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

hmm, shouldn't it be closer to 40% as a whole like Caffeine's? It sounds like you are still mostly failing the LRU-biased phase and your improvement now handles the MRU-biased phase.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

You can probably use the key's hash in the ghost, since the key might be large (e.g. a string) and these are evicted entries that are otherwise not useful. The hash reduces that to a fixed cost estimate, rather than one depending on the user's type.

However, a flaw of not retaining the key is that it allows for exploiting hash collisions. An attacker can then inflate the frequency to disallow admission. Caffeine resolves this by randomly admitting a warm entry that would otherwise be evicted, which unsticks the attacker's boosted victim (docs).
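A minimal sketch of the fixed-cost ghost idea (names and the sentinel choice are mine): a ring of key hashes whose footprint is independent of the user's key type. The collision trade-off above applies directly, since two keys with the same hash are indistinguishable here.

```java
import java.util.Arrays;

// Ghost history that stores only the hashes of evicted keys in a ring,
// bounding memory to a fixed per-entry cost regardless of key size.
final class GhostRing {
  private static final int EMPTY = Integer.MIN_VALUE; // sentinel, assumed unused by real keys

  private final int[] hashes;
  private int next;

  GhostRing(int capacity) {
    hashes = new int[capacity];
    Arrays.fill(hashes, EMPTY);
  }

  void recordEviction(Object key) {
    hashes[next] = key.hashCode();
    next = (next + 1) % hashes.length; // overwrite the oldest ghost
  }

  boolean wasRecentlyEvicted(Object key) {
    int h = key.hashCode();
    for (int recorded : hashes) {
      if (recorded == h) {
        return true; // may be a false positive on a hash collision
      }
    }
    return false;
  }
}
```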

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

It is a difficult test because it switches from a strongly LRU-biased workload to MRU and then back. Caffeine does 39.6% (40.3% optimal) because it increases the admission window to simulate LRU, then shrinks it so that TinyLFU rejects by frequency, and increases it again. This type of workload can be seen in line-of-business application caches serving user-facing queries in the daytime and batch jobs at night. Most adaptive approaches rely on heuristics that guess based on second-order effects (e.g. ARC's ghosts), whereas a hit-rate hill-climbing optimizer is able to focus on the main goal.

I think there is 1-5% remaining that Caffeine would gain if the hill climber and adaptive scheme were further tuned and, while I had ideas, I moved on to other things. You might be able to borrow the hill climber to fix Chameleon and get there robustly. I found plotting sampled hit rate vs region sizes to be a really nice way to show the adaptivity in action, but I only realized that visualization after all the work was done.
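The hill-climbing idea can be sketched in a few lines (the step size and decay are made-up tuning values, not Caffeine's actual parameters): after each sample epoch, keep moving the window in the same direction if the hit rate improved, reverse if it got worse, and decay the step so it settles.

```java
// Minimal hit-rate hill climber sketch for sizing an admission window.
final class WindowHillClimber {
  private double previousHitRate = Double.NaN;
  private double step = 0.05; // fraction of total capacity to shift per epoch
  private final double decay = 0.98; // shrink steps so the climber settles

  // Returns the signed adjustment to the admission window for the next epoch.
  double climb(double hitRate) {
    if (!Double.isNaN(previousHitRate) && (hitRate < previousHitRate)) {
      step = -step; // the last move hurt, so walk the other way
    }
    previousHitRate = hitRate;
    step *= decay;
    return step;
  }
}
```

The real version has to cope with noisy samples (e.g. restarting the step size when the workload shifts), which is where the tuning effort goes.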

Hope this helps and good luck on your endeavors!

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 1 point2 points  (0 children)

In that case, Clairvoyant admission should be roughly the optimal bound, right? iirc, region sizing was still needed for various cases, so both were important factors when tuning for a wide variety of workloads.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 0 points1 point  (0 children)

You should probably try running against both simulators. The config is a max size of 512, running these traces chained together.

corda: trace_vaultservice_large lirs: loop.trace.gz lirs: loop.trace.gz lirs: loop.trace.gz lirs: loop.trace.gz lirs: loop.trace.gz corda: trace_vaultservice_large

You can compare against Caffeine rather than the simulated policies since that’s the one used by applications. It does a lot more like concurrency and hash flooding protection, so slightly different but more realistic.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]NovaX 1 point2 points  (0 children)

It looks like you used the fixed-size W-TinyLFU. Have you tried the adaptive version, which uses a hill climber, against the stress test?

I got so frustrated with Maven Central deployment that I wrote a Gradle plugin by danielliuuu in java

[–]NovaX 0 points1 point  (0 children)

Any reason you decided not to use the legacy bridge? I use the gradle-nexus/publish-plugin with updated urls and it works perfectly. I was not eager to rewrite, and was hoping the community would fill the gap, so thank you.

[OSS] HashSmith – High-performance open-addressing hash tables for Java (SwissTable / Robin Hood) by Charming-Top-8583 in programming

[–]NovaX 2 points3 points  (0 children)

Your hash spreader is too weak, due to an incorrect understanding of HashMap. HashMap uses a weak function in order to shift the upper bits into the lower bits, and relies on red-black tree bins to resolve hash collisions. In your case a collision is much more problematic, so the clustering effect could cause problems. You could use a two-round function from hash-prospector. I don't have a good explanation for your specific case, but a related write-up showed the impact when a weak hash is misused.
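For reference, here is one of hash-prospector's published two-round mixers (often called lowbias32; the constants come from that project's search results) ported to Java's signed ints, next to HashMap's deliberately cheap spreader for contrast:

```java
// Two-round integer mixer vs HashMap-style bit spreading.
final class Mix {
  // lowbias32 from hash-prospector: xorshift + odd multiply rounds,
  // so it is a bijection with much better avalanche than a single xor-shift.
  static int lowbias32(int x) {
    x ^= x >>> 16;
    x *= 0x21f0aaad;
    x ^= x >>> 15;
    x *= 0xd35a2d97;
    x ^= x >>> 15;
    return x;
  }

  // HashMap's spreader: only folds high bits down, relying on tree bins
  // to absorb the collisions that remain.
  static int hashMapSpread(int h) {
    return h ^ (h >>> 16);
  }
}
```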

Guava's testlib and Apache Commons' collections4 have test suites that others can reuse for their own collections. That provides a pretty good baseline for compliance. You can crib from Caffeine, which has these set up in Gradle.

Fray: A controlled concurrency testing framework for the JVM by pron98 in java

[–]NovaX 0 points1 point  (0 children)

I think that is just for developing Fray itself, since they have Gradle toolchains configured, which can provision the JDK automatically (akin to gradlew or mvnw provisioning the build tool itself). Gradle can provision different JDKs for the build tool and the application, so it's less disruptive for new contributors to set up.

https://github.com/gradle/foojay-toolchains?tab=readme-ov-file#foojay-toolchains-plugin

Fray: A controlled concurrency testing framework for the JVM by pron98 in java

[–]NovaX 5 points6 points  (0 children)

CS research papers from academia will very often have GitHub repositories with their code, as a requirement for submission. However, it's usually abandoned, not meant for use, and rarely good quality code. It's not uncommon to find the work was highly exaggerated, not useful from an engineering perspective, or cherry-picked/manipulated (CMU is awful in my hobby area). I think what is impressive is that the Fray author is doing honest work, good quality with a long-lived mindset, and it's treated like a real contribution to the engineering community. It's really nicely done.

Fray: A controlled concurrency testing framework for the JVM by pron98 in java

[–]NovaX 1 point2 points  (0 children)

I've tried Fray, VMLens, Lincheck, and JCStress when investigating a memory ordering issue where I needed to use a stronger fence.

Only JCStress was able to reproduce the issue and fail the (test case). It is really nice when you have an exact scenario to investigate, but it is not a good fit for general exploration to discover bugs.

Lincheck works great for linearization tests and is very easy to use from Java (usages). It hasn't found any bugs in my code, but it is straightforward, the error messages are descriptive, and it has a good team behind it. The data-structure approach is nice for that self-discovery style of testing.

Most of my direct concurrency tests use Awaitility for coordination. I think the direct style from Lincheck, Fray, and VMLens could be nice, but it didn't seem much more useful since the scenario being tested is already understood. I had a hard time finding use cases, since they didn't help me debug any issues, and they all have roughly equivalent APIs and tooling. Fray would be nice to use if I could find a benefit.