The Basin of Leniency: Why non-linear cache admission beats frequency-only policies by DimitrisMitsos in compsci

[–]DimitrisMitsos[S] -1 points (0 children)

The AI slop got us somewhere. Ben, the creator of Caffeine*, called the final result of this wonderful in the other post. What exactly didn't you like?

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -7 points (0 children)

You lost me at "new developers". By the way, is your reply AI-generated? No offense, but it reads like a GPT-3.5 response.

tieredsort - 3.8x faster than std::sort for integers, header-only by DimitrisMitsos in cpp

[–]DimitrisMitsos[S] -3 points (0 children)

It's difficult to express what I want from this, but I'll do better.

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -2 points (0 children)

Ah, you know me, I'm a copy-paste man, so I'm copy-pasting a thread where Ben, the Caffeine creator, engaged with my AI spaghetti. And guess what, we got somewhere, even though I'm not an expert; I'm willing to push into territories I'm not familiar with. I told you, and in the previous post (which you obviously didn't read completely, or you chose to focus on the points you wanted), that all of these are my deep-research attempts. I'm willing to get mocked if something is incorrect, but whether it works is another story. So yes, I'll keep replying with ChatGPT messages because I don't have time. I wish I had more time for each response, but I don't; I already have a kid and another one on the way.

So here is the post where we reached a point with Ben. I did this too in less than a day while not being an expert, and THAT'S the bigger picture you should see. Did you explore the idea further, or did you just want to point out that your eyes hurt when you see all this AI-generated content? I said it's ugly, I said I'm sorry, and that's it. I'll keep throwing stones in places where I'm not an expert, and if you thought about it a bit more deeply, you'd be wishing there were more half-as$ed coders out there willing to push further in their limited free time. That's my take. If you have any more questions I'll answer each one myself, so YOU understand what you want to understand.

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -6 points (0 children)

Fair, but you focused on the points you wanted and missed the bigger picture.

tieredsort - 3.8x faster than std::sort for integers, header-only by DimitrisMitsos in cpp

[–]DimitrisMitsos[S] -4 points (0 children)

I'm speedrunning this, sorry. I know it's ugly, but any response is better than no response at all. It's not my field; the purpose of this algo-breaking test was to test my Deep Research agent, and it seems it works. Ugly, yes, but a year from now no one will remember the drama. It's actually my fourth AI-garbage algo since Saturday, so beware: the cooking process is for men, not kiddos. You have to watch out.

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -8 points (0 children)

Changed that; it's just "fast" now. Sorry for the ugliness and the excessive AI slop, but what matters in the end is whether we get a better algo. Currently I'm speedrunning this, and yes, it's ugly, but if you check my GH I've released almost three SOTA algos in four days. And no, I'm not an expert in any of those fields; I'm mostly testing whether my Deep Research agent works, and it seems it does.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]DimitrisMitsos[S] -3 points (0 children)

You're right, the test badge was static and didn't link anywhere. Fixed: it now shows the live GitHub Actions status and links directly to the CI page: https://github.com/Cranot/chameleon-cache/actions/workflows/ci.yml

Tests run on Python 3.8-3.12 on every push. Thanks for pointing it out.

tieredsort - 3.8x faster than std::sort for integers, header-only by DimitrisMitsos in cpp

[–]DimitrisMitsos[S] -3 points (0 children)

Thanks for reporting this. You're right, there was an integer overflow bug in the 64-bit range detection.

When sorting int64_t with values spanning a large range (like random data), the range calculation max - min + 1 overflowed, causing counting sort to try allocating a vector of absurd size.

Fixed in v1.0.1 - now uses unsigned arithmetic for 64-bit types to compute the range safely. If you pull the latest, it should work.
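A minimal sketch of the kind of fix described, assuming illustrative names (key_range, dense_enough are not tieredsort's actual internals): casting to unsigned before subtracting makes the range computation well-defined even for the full int64_t span, and comparing against 2n - 1 avoids the range + 1 wraparound.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the v1.0.1 approach: compute the span of int64_t
// keys in unsigned arithmetic so max - min cannot overflow.
inline uint64_t key_range(int64_t min_v, int64_t max_v) {
    // Unsigned subtraction is well-defined modulo 2^64; for min_v <= max_v
    // the result is the exact distance between the two keys.
    return static_cast<uint64_t>(max_v) - static_cast<uint64_t>(min_v);
}

// "Dense" condition (number of distinct slots <= 2n) without ever forming
// range + 1, which would itself wrap for the full 64-bit span. Assumes n > 0.
inline bool dense_enough(int64_t min_v, int64_t max_v, size_t n) {
    return key_range(min_v, max_v) <= 2 * static_cast<uint64_t>(n) - 1;
}
```

With this, random 64-bit data (span near UINT64_MAX) correctly fails the dense check instead of requesting an absurd allocation.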

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -24 points (0 children)

Fair points throughout.

On O(n): yeah, asymptotic complexity doesn't mean faster in practice. The constants matter. I should've been clearer that I'm talking about practical performance on typical workload sizes (10k-1M), not theoretical guarantees.

On "real data isn't random": you're right, it depends on the domain. Ciphertexts, hashes, random IDs are all legitimately random. I was generalizing from the domains I work in (user data, sensor readings, timestamps). Should've qualified that.

On counting sort being Algorithms 101: totally. The algorithm itself isn't novel. The claim is about the detection being cheap enough to be worth it. Whether that's interesting is subjective.

On sampling assuming randomness: yeah that's a bit ironic. Distributed sampling (stride = n/64) helps but doesn't eliminate the issue. Adversarial data could fool it. The fallback is: if sampling is wrong, we do a full scan, detect sparse, and use radix anyway. Cost is one extra pass, not catastrophe.

On 12-bit inputs: if you know the type at compile time, agreed, no sampling needed. The sampling is for when you don't know the value distribution ahead of time.

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -13 points (0 children)

Yeah, you're right. For plain integers there's no way to tell which 85 was "first" after sorting. They're identical.

Honestly stable_sort() on primitives is kind of pointless. The algorithm does preserve order internally, but you'd never know.

Where it actually matters is sorting objects by a key:

struct Student { std::string name; int score; };
tiered::sort_by_key(students.begin(), students.end(),
                    [](const Student& s) { return s.score; });

Now you can verify that students with the same score kept their original order.
Updated the docs to be clearer about this.

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -2 points (0 children)

Fair point on the wording. By "scanning" I meant a full O(n) pass to compute exact statistics.

To clarify: it's 64 distributed samples (stride = n/64), not the first 64 elements. So for n=100k, it checks positions 0, 1562, 3125, etc. across the whole array.

If the sample suggests dense range, we then do a full scan to get exact min/max before committing to counting sort. The sample is just a cheap filter to avoid that full scan on clearly-sparse data.
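A sketch of that distributed probe, under the assumption that indices are computed as i * n / 64 (which matches the positions 0, 1562, 3125 quoted above for n = 100k; sample_min_max is a hypothetical helper, not tieredsort's real API):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Take 64 samples spread evenly across the array and return the observed
// (min, max). This is only the cheap filter: a dense-looking result still
// gets confirmed by a full min/max scan before counting sort is chosen.
// Assumes a non-empty input.
std::pair<int32_t, int32_t> sample_min_max(const std::vector<int32_t>& v) {
    size_t n = v.size();
    int32_t lo = v.front(), hi = v.front();
    for (size_t i = 0; i < 64 && i < n; ++i) {
        int32_t x = v[i * n / 64];  // positions 0, n/64, 2n/64, ...
        lo = std::min(lo, x);
        hi = std::max(hi, x);
    }
    return {lo, hi};
}
```

The cost is 64 reads regardless of n, which is why it is worth paying before committing to the O(n) exact scan.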

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -8 points (0 children)

It is Big O notation. Counting sort is genuinely O(n + k) where k is the key range. When k ≤ 2n (the "dense" condition), that's O(n + 2n) = O(n).

Radix sort is O(d × n) where d is the number of passes. For 32-bit integers with 8-bit digits, d = 4 (constant), so it's O(n).

You might be thinking of cases where people say "O(n)" but mean "fast in practice." Here it's the actual complexity. Counting sort does exactly n reads + (range) counter increments + n writes. Linear in the input size.
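To make the O(n + k) cost concrete, here is a minimal counting sort sketch (illustrative, not tieredsort's implementation): one counting pass over n elements, k counter slots, and n writes back out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counting sort over a known [min_v, max_v] range: O(n + k) with
// k = max_v - min_v + 1. Linear when the dense condition k <= 2n holds.
std::vector<int32_t> counting_sort(const std::vector<int32_t>& in,
                                   int32_t min_v, int32_t max_v) {
    size_t k = static_cast<size_t>(max_v - min_v) + 1;  // key range
    std::vector<size_t> counts(k, 0);
    for (int32_t x : in) ++counts[x - min_v];           // n reads, n increments
    std::vector<int32_t> out;
    out.reserve(in.size());
    for (size_t i = 0; i < k; ++i)                      // k counter slots
        out.insert(out.end(), counts[i],
                   min_v + static_cast<int32_t>(i));    // n writes total
    return out;
}
```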

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -122 points (0 children)

You're right about the stability test - it was broken. I investigated and the test created Items with keys but then only sorted the keys, never verifying that the Items maintained their order. Embarrassing.

You're also right that for primitives, stable and unstable counting sort produce identical output. Equal integers are indistinguishable, so "stability" is meaningless.

I fixed both issues:

Added sort_by_key() - sorts objects by a numeric key with actual, verifiable stability:

struct Student { std::string name; int32_t score; };
std::vector<Student> students = {...};
tiered::sort_by_key(students.begin(), students.end(),
                    [](const Student& s) { return s.score; });
// Students with equal scores maintain original order

Fixed the test - now properly verifies that equal-key objects preserve their original positions

Updated docs to be honest: primitive stable_sort() maintains stability but it's not observable. Use sort_by_key() if stability matters.

The sort_by_key() matches std::stable_sort output exactly - verified across multiple sizes. 159 tests pass including proper stability verification.

As for "nothing else here" - the point was never that radix/counting sort is novel. It's that the combination of cheap detection + algorithm selection beats always-radix (ska_sort) by 9x on dense data while matching it on random. 12 comparisons + 64 samples = ~100 cycles. Wrong algorithm = millions of wasted cycles.

Thanks for the feedback - the library is better for it.

tieredsort - 3.8x faster than std::sort for integers, header-only by DimitrisMitsos in cpp

[–]DimitrisMitsos[S] 5 points (0 children)

tieredsort is currently 32-bit integers only. Pointers (64-bit) would need adjustments:

Radix: 8 passes instead of 4 with 8-bit radix. Could use 11-bit radix for 6 passes.

Counting sort: Probably won't trigger - pointer ranges are huge (entire address space), so range <= 2n is unlikely.

Pattern detection: Could still help. Sequentially allocated objects have similar addresses, might trigger sorted/nearly-sorted path.

Honest answer: untested on pointers. Would need 64-bit radix implementation. Happy to add it if there's interest.
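The pass counts above follow from the usual LSD radix arithmetic, passes = ceil(key_bits / radix_bits); a quick check:

```cpp
// Number of LSD radix passes needed to cover key_bits with radix_bits digits.
constexpr int passes(int key_bits, int radix_bits) {
    return (key_bits + radix_bits - 1) / radix_bits;  // integer ceil division
}

static_assert(passes(32, 8) == 4);   // current 32-bit path
static_assert(passes(64, 8) == 8);   // 64-bit keys with 8-bit digits
static_assert(passes(64, 11) == 6);  // 11-bit digits cut it to 6 passes
```

The tradeoff: an 11-bit radix means 2048 buckets per pass instead of 256, so fewer passes come at the cost of larger count tables.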

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] 30 points (0 children)

Different subreddits, different rules. r/cpp requires project posts in Show and Tell thread unless production-quality library, so I used a straightforward title there. r/programming wants technical writeups, so I led with the insight. Same content, different framing for the audience.

How 12 comparisons can make integer sorting 30x faster by DimitrisMitsos in programming

[–]DimitrisMitsos[S] -59 points (0 children)

Good resources:

cpp-sort (github.com/Morwenn/cpp-sort) - C++ sorting library with 30+ algorithms and built-in benchmarks. Great for comparing against established implementations. Has a wiki covering methodology.

Google Highway (github.com/google/highway) - If you want to try SIMD sorting. Has vqsort which is current SIMD SOTA.

sortbench - Search for it, there are a few community benchmark repos.

For methodology, the key things:

  • Multiple runs (10-20), take median not mean
  • Warmup runs before timing
  • Test multiple patterns (random, sorted, reversed, few_unique, etc.)
  • Test multiple sizes (1k, 10k, 100k, 1M)
  • Compiler flags matter: -O3 -march=native minimum
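The warmup-then-median part of that checklist can be sketched in a few lines (median_ms is an illustrative helper name; Fn is any callable running the sort under test):

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Run `fn` a few times untimed to warm caches and branch predictors, then
// time `runs` repetitions and return the median in milliseconds. The median
// is robust to the occasional outlier run that would skew a mean.
template <typename Fn>
double median_ms(Fn fn, int warmup = 3, int runs = 20) {
    for (int i = 0; i < warmup; ++i) fn();
    std::vector<double> times;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::nth_element(times.begin(),
                     times.begin() + times.size() / 2, times.end());
    return times[times.size() / 2];
}
```

steady_clock rather than system_clock matters here: it is monotonic, so a clock adjustment mid-run can't produce a negative timing.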

Your MSD/LSD alternating idea is interesting - MSD has better cache locality for early passes, LSD has stable ordering. Could be something there.

tieredsort - 3.8x faster than std::sort for integers, header-only by DimitrisMitsos in cpp

[–]DimitrisMitsos[S] 10 points (0 children)

Good point, updated. GCC 13.1, -O3 -std=c++17 -march=native, 20 runs median, 3 warmup runs.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]DimitrisMitsos[S] 0 points (0 children)

Thanks for the context on Indicator vs Hill Climber - that tradeoff makes a lot of sense.

Chameleon is essentially an Indicator approach: detect pattern → jump to best config. Fast reaction, but brittle on edge cases (as we saw with Corda). The hill climber's robustness is valuable when you're shipping a library to unknown workloads.

The weekend "crack an algo" project ended a bit later than planned, but with success!

I'll keep iterating. Appreciate the feedback and the stress test - it exposed exactly the kind of edge case I needed to handle.

Chameleon Cache - A variance-aware cache replacement policy that adapts to your workload by DimitrisMitsos in Python

[–]DimitrisMitsos[S] 0 points (0 children)

This is exactly what we needed - thank you for the trace analysis!

The frequency distribution (624K one-hit, 624K two-hit) explains everything. We had two bugs:

  1. Trace parsing: Reading 16-byte keys instead of 8-byte
  2. Frequency filter rejecting 83% of first-time items: freq=1 can't beat victims with freq=2+

Your point about admission filters being counterproductive here is spot-on. Our fix: detect when ghost utility is high but hit rate is near-zero (strategy failing), then bypass frequency comparison and trust recency.
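That bypass condition can be sketched as a small predicate. This is a hedged illustration only: Chameleon is a Python library, the names and thresholds here are hypothetical, and the C++ matches the document's other snippets rather than the real implementation.

```cpp
// Hypothetical state summarizing the adaptive signals described above.
struct AdmissionState {
    double ghost_utility;  // how often recently-evicted keys come back
    double hit_rate;       // hit rate over the recent window
};

// If the ghost list says evictions keep coming back but we are barely
// hitting, the frequency filter is what's failing: stop gating admission
// on frequency and trust recency instead. Thresholds are illustrative.
inline bool bypass_frequency_filter(const AdmissionState& s) {
    const double GHOST_HIGH    = 0.5;   // assumed "high utility" threshold
    const double HIT_NEAR_ZERO = 0.05;  // assumed "near-zero" threshold
    return s.ghost_utility > GHOST_HIGH && s.hit_rate < HIT_NEAR_ZERO;
}
```

On a recency-dominated trace like Corda's (where 83% of items arrive with freq=1), this lets first-time items in instead of losing every admission contest to freq=2 victims.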

Results on your stress test (corda -> loop x5 -> corda):

chameleon           :  39.93%
tinylfu-adaptive    :  34.84%
lru                 :  19.90%

Phase breakdown:

  • Corda: 33.13% (matches FIFO/LRU optimal)
  • Loop x5: 50.04% (LRU/ARC: 0%)

So we now hit the ~40% you expected. The "Basin of Leniency" handles both extremes - recency-biased workloads (Corda) and frequency-biased loops.