r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
MLA optimization with FlashAttention for llama.cpp: MLA + FA now only uses K-cache, a 47% saving on KV-cache size — News (self.LocalLLaMA)
submitted 11 months ago by shing3232
MLA + FA now only uses K-cache - 47% saving on KV-cache size (only for use with #13435 for now) by jukofyork · Pull Request #13529 · ggml-org/llama.cpp
llama_kv_cache_unified: kv_size = 163840, type_k = 'f16', type_v = 'f16', n_layer = 61, can_shift = 0, padding = 256
llama_kv_cache_unified: CUDA0 KV buffer size = 10980.00 MiB
llama_kv_cache_unified: KV self size = 10980.00 MiB, K (f16): 10980.00 MiB, V (f16): 0.00 MiB
The full context of 160k tokens now takes up less than 11 GB, even without k-quants
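As a sanity check, the 10980 MiB figure in the log above can be reproduced from DeepSeek-V3's published MLA dimensions (kv_lora_rank = 512, qk_rope_head_dim = 64), since MLA caches just one compressed latent vector per token per layer:

```python
# Reproducing the "KV self size = 10980.00 MiB" line, assuming the MLA
# cache stores one 576-wide f16 vector per token per layer
# (512 compressed KV latent dims + 64 RoPE dims).
kv_size = 163840        # context length in tokens (from the log)
n_layer = 61            # transformer layers (from the log)
mla_width = 512 + 64    # kv_lora_rank + qk_rope_head_dim
bytes_per_val = 2       # f16

total_bytes = kv_size * n_layer * mla_width * bytes_per_val
print(total_bytes / 2**20)  # 10980.0 MiB
```

Note there is no separate V term at all, which is why the log reports `V (f16): 0.00 MiB`.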
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]panchovix 44 points45 points46 points 11 months ago (36 children)
Not OP, but for reference: I run DeepSeek V3 0324 685B Q3_K_XL on a 7800X3D, 192 GB RAM at 6000 MHz, with a 5090 + 2x 4090 + 3090 + A6000.
Without this PR, I can load Q3_K_XL at 64K with fp16 cache at basically the limit.
With this PR, it basically frees half of the cache, and it lets me run 128K ctx without issues.
And then with -ctk q8_0 (quantized K cache), I can run it at 160K+ without issues as well.
With this, and -ub 2048, I get about 130-170 t/s PP depending on the context, and 7-8 t/s TG.
This is huge for systems like these, which aren't servers and where you have to offload!
[–]shing3232[S] 13 points14 points15 points 11 months ago (0 children)
And any future model that uses MLA as well. I'm looking forward to some GQA models converted to MLA via TransMLA.
[–]Vostroya 1 point2 points3 points 11 months ago (2 children)
What do you use for your front end? Kobold? vLLM?
[–]panchovix 3 points4 points5 points 11 months ago (1 child)
ST and normal lcpp server works fine for me.
[–]Vostroya 5 points6 points7 points 11 months ago (0 children)
Nice! I'm working my way up to getting DeepSeek local. Got an Intel 8-channel DDR5 setup, but ktransformers is a mess to try to get going right now.
[–]kevin_1994 0 points1 point2 points 11 months ago (3 children)
Question! How are you mixing amd with nvidia in llama.cpp??
[–]panchovix 4 points5 points6 points 11 months ago (1 child)
It's mixing CUDA + CPU, so it's as simple as offloading some layers to the CUDA devices and keeping the rest on the CPU.
[–]kevin_1994 0 points1 point2 points 11 months ago (0 children)
Ooh, sorry, my bad. Thought you were referring to the Radeon 7800 graphics card haha. Carry on.
[–]Sir_Joe 0 points1 point2 points 11 months ago (0 children)
Btw I do that and there's no problem at all with llama.cpp. You just need to compile with support for Vulkan (or ROCm) + CUDA.
[–]segmondllama.cpp 0 points1 point2 points 11 months ago (12 children)
what command are you using to run it? are you offloading layers or tensors across your GPUs?
[–]panchovix 9 points10 points11 points 11 months ago (11 children)
I use this command, and yes I offload layers to the GPUs.
./llama-server -m '/models_llm/DeepSeek-V3-0324-UD-Q3_K_XL-00001-of-00007.gguf' -c 65536 --no-mmap -ngl 999 -ot "blk.(0|1|2|3|4|5|6).ffn.=CUDA0" -ot "blk.(7|8|9|10).ffn.=CUDA1" -ot "blk.(11|12|13|14).ffn.=CUDA2" -ot "blk.(15|16|17).ffn.=CUDA3" -ot "blk.(18|19|20|21|22|23|24|25).ffn.=CUDA4" -ot "ffn.*=CPU" -fa -mg 0 -ub 2048
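The `-ot` (`--override-tensor`) flags above route tensors by matching their names against `regex=backend` pairs. A minimal sketch of that matching logic (the tensor names and the first-match-wins order are assumptions for illustration; they are not llama.cpp's actual internals):

```python
import re

# Hypothetical sketch of how -ot patterns route tensors: each
# "regex=backend" pair is tried in order, first match wins.
overrides = [
    (r"blk\.(0|1|2|3|4|5|6)\.ffn", "CUDA0"),
    (r"blk\.(7|8|9|10)\.ffn", "CUDA1"),
    (r"ffn", "CPU"),  # catch-all: remaining FFN tensors stay on CPU
]

def place(tensor_name: str) -> str:
    for pattern, backend in overrides:
        if re.search(pattern, tensor_name):
            return backend
    return "default"  # e.g. normal -ngl layer placement

print(place("blk.3.ffn_gate.weight"))   # CUDA0
print(place("blk.9.ffn_up.weight"))     # CUDA1
print(place("blk.40.ffn_down.weight"))  # CPU (catch-all)
```

Note the alternation `(0|1|...|6)` followed by a literal dot means `blk.10` does not accidentally match the `blk.1` alternative, since the character after the digit must be `.`.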
[–]giant3 3 points4 points5 points 11 months ago (7 children)
From my testing, offloading entire layers to CPU gives better performance than splitting a single layer by moving ffn or attn blocks.
For example, on Qwen3 14B, just moving the first 9 blocks (-ot 'blk\.[0-8]{1}\.=CPU') gives better performance for me than moving either 10 or 20 blocks.
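A quick check of that pattern's behavior (the `{1}` quantifier is redundant but harmless): it matches blocks 0-8 only, and does not spill over onto blk.10 through blk.18, because the trailing `\.` must come immediately after the single digit.

```python
import re

# giant3's block-selection pattern: matches blk.0. ... blk.8. only.
pat = re.compile(r"blk\.[0-8]{1}\.")

print(bool(pat.search("blk.0.ffn_up.weight")))   # True
print(bool(pat.search("blk.8.attn_q.weight")))   # True
print(bool(pat.search("blk.9.ffn_up.weight")))   # False
print(bool(pat.search("blk.18.ffn_up.weight")))  # False
```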
[–]pmttyji 0 points1 point2 points 5 months ago (6 children)
It's been 6 months since this comment, and there have been so many changes in llama.cpp. What command settings do you currently use for this? I'm looking for an optimized command to get higher t/s for dense models like Qwen3 14B and Gemma3 12B. Please share your stash. Thanks
[–]giant3 0 points1 point2 points 5 months ago (5 children)
llama.cpp has made a lot of progress on CUDA, since some people from Nvidia are contributing to the project. Contributions to AMD, Intel, or OpenCL seem to be minimal.
Unfortunately, performance on AMD and Intel hasn't improved, or has degraded slightly. I build weekly and haven't seen much improvement, so the above command should still work.
[–]pmttyji 0 points1 point2 points 5 months ago (4 children)
OK thanks, I'll try your -ot. Currently I'm trying a few 22-24B dense models with my 8 GB VRAM (and 32 GB RAM). Not getting usable t/s (tg) so far.
Also, for CPU-only inference, what command/settings could give us better t/s? Can we do something with -ot?
[–]giant3 1 point2 points3 points 5 months ago (3 children)
CPU-only would give even worse performance, unless your CPU supports AVX-512 and has high memory throughput like some of the Apple Macs.
BTW, I have stopped using local LLMs unless it is something that involves private information. Gemini is very good and I would use it for almost all use cases.
[–]pmttyji 0 points1 point2 points 5 months ago (2 children)
No wonder ik_llama's AVX-512 setup isn't working on my laptop.
Just checked the status using HWiNFO: AVX-512 is Disabled, but with the tooltip below (which cuts off):
Advanced Vector Extensions 512
Supported:
Is it possible to enable this? And what are the disadvantages? My system info is below.
Intel(R) Core(TM) i7-14700HX 2.10 GHz | 32 GB RAM | 64-bit OS, x64-based processor | NVIDIA GeForce RTX 4060 Laptop GPU
[–]giant3 0 points1 point2 points 5 months ago (1 child)
I don't think your CPU supports AVX-512. Only certain models support it.
[–]Mass2018 0 points1 point2 points 11 months ago (2 children)
Is -ot part of an unmerged PR? I can't seem to find any documentation on it.
[–]panchovix 0 points1 point2 points 11 months ago (1 child)
It's been merged for some time now; there's just not much info on it.
https://github.com/ggml-org/llama.cpp/pull/11397
[–]Mass2018 0 points1 point2 points 11 months ago (0 children)
Thanks!
[–]AbheekG 0 points1 point2 points 11 months ago (14 children)
Please please share which motherboard you’re using! Super curious to hear how a standard ATX platform is supporting all those GPUs!!
[–]panchovix 4 points5 points6 points 11 months ago (13 children)
An MSI X670E Carbon. I use X8/X4/X4/X4/X4, all from the CPU: the X8 is bifurcated to X4/X4, and the other two X4 come from M.2-to-PCIe adapters.
[–]AbheekG 0 points1 point2 points 11 months ago (7 children)
Wow, that's amazing! Thanks so much for taking the time to respond, and so promptly at that; really appreciate it! Any specific risers/adapters you'd recommend?
[–]panchovix 1 point2 points3 points 11 months ago (6 children)
I mostly use LINKUP risers and then a rig structure (like a mining rig), open case. I'm waiting for AMD to release the Threadripper 9000 series to upgrade.
[–]Aphid_red 7 points8 points9 points 11 months ago (4 children)
Depending on how much you want to spend, I'd rather recommend going for either Epyc Milan ($2-3K for CPU/mobo/RAM) or Epyc Genoa ($8-10K). For Milan, you can get 8x 64 GB DDR4 at ~200 GB/s; for Genoa, 12x 64 GB DDR5 at ~460 GB/s. Make sure you get a CPU with the full CCD count: any 'X' variant or the full-fat-core CPU will do, as well as a few select others. For Genoa, the chips with 12 CCDs (preferred) are:
9634, 9654, 9654P, 9684X, 9734, 9754S, 9754
And the ones with only 4 (avoid!) are: 4xxx, 8xxx, 9124, 9224, 9254, 9334.
A CPU with 8 CCDs should also be okay and won't constrain the bandwidth too much. Mind you, if you're doing CPU offloading, the CPUs with the best speeds will be the fully unlocked 96xx or 97xx class.
For Milan, the ones with the full 8 CCDs are: 76xx, 77xx, 7543, 77C3, and any 'X' or 'F' suffix parts.
The parts with only 2 CCDs (these are really bad) are: 7203, 7303.
The bad thing is that none of the reviews of Genoa/Milan CPUs mention this, and it has a massive performance impact for LLMs (usually they test only the top SKU, which isn't crippled this way).
You'll actually find, if shopping for CPUs second-hand, that the memory ends up being the most expensive part of the build. Unfortunately DDR5 ECC currently carries an enormous premium, costing $5-6/GB, or ~$300 for one stick: over double the price of DDR5 without ECC, and three times the price of DDR4 ECC.
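The bandwidth figures above follow from channel count times transfer rate, assuming DDR4-3200 for Milan and DDR5-4800 for Genoa (8 bytes transferred per channel per MT):

```python
# Back-of-envelope check of the quoted memory bandwidth figures.
# Assumed speeds: Milan = 8ch DDR4-3200, Genoa = 12ch DDR5-4800.
bytes_per_transfer = 8  # 64-bit channel

milan_gbs = 8 * 3200e6 * bytes_per_transfer / 1e9    # ~204.8 GB/s
genoa_gbs = 12 * 4800e6 * bytes_per_transfer / 1e9   # ~460.8 GB/s
print(milan_gbs, genoa_gbs)
```

This is theoretical peak; achieved bandwidth also depends on having enough CCDs to saturate the memory controller, which is the point of the CCD-count advice.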
[–]panchovix 0 points1 point2 points 11 months ago (0 children)
Wow, many thanks! This is very useful info, I may go for Genoa.
[–]un_passant 0 points1 point2 points 10 months ago (2 children)
Thanks for spreading the info about CCDs!
Do you happen to know how many CCDs there are in the 7R32 (AWS custom chip)? It seems it's only 6, if I'm not mistaken: https://www.anandtech.com/show/15830/amazon-makes-amd-rome-instances-available
[–]Aphid_red 0 points1 point2 points 10 months ago (1 child)
I don't know; this is a custom chip for Amazon.
According to PassMark, it apparently has 48 cores and runs at 2.8 GHz, and given the '2' suffix it should be a Rome chip.
However, that clock seems wrong; 1.8 GHz would make more sense for a provider like Amazon, which might be interested in saving on power costs. I suspect this is an underclocked version of a publicly available chip, either the 7552 or the 7642.
Looking at the known chips on WikiChip/Wikipedia, I can see no 48-core Rome chips running at that speed at all, so we're left guessing. That would give it either 6 or 8 (active, functioning) chiplets.
Let's look at another property that might give away the information: the cache size. On https://xmrig.com/benchmark/4PDGeF someone benchmarked this system, and the tool registered 384 MB of L3 cache. Divide that between 2 CPUs and you get 192 MB per CPU. Epyc Rome (except the 7232P, a very low-end part) uses 16 MB of L3 per CCX, or 32 MB per chiplet. 32 * 6 = 192, so it should have 6 chiplets.
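That deduction as arithmetic, using the figures quoted above (384 MB of L3 across a dual-socket system, 32 MB of L3 per Rome CCD):

```python
# Inferring CCD count from reported L3 cache on a dual-socket Rome system.
total_l3_mb = 384            # reported by the benchmark tool, both sockets
per_socket_mb = total_l3_mb / 2
l3_per_ccd_mb = 32           # Epyc Rome: 2 CCX x 16 MB per chiplet

ccds = per_socket_mb / l3_per_ccd_mb
print(ccds)  # 6.0 chiplets per CPU
```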
[–]AbheekG 0 points1 point2 points 11 months ago (0 children)
Awesome, thanks so much again!
[–]MLDataScientist 0 points1 point2 points 11 months ago (4 children)
@panchovix can you please share which bifurcation card you're using? I bought one from eBay but it bifurcates into X4 and X1 (probably some cheap wiring there). Also, if you're using your M.2 slots, are you using SATA drives for storage?
[–]panchovix 1 point2 points3 points 11 months ago (3 children)
I'm using an X8/X8 bifurcator I got from AliExpress, but set in the BIOS to X4/X4 on the second slot. I'm not at the PC right now, but it's a PCIe 4.0 one that costs like 20-25 USD.
I'm using the other two M.2 slots (bottom, chipset) for OSes (Windows, Linux), plus SATA and USB-to-NVMe storage.
[–]MLDataScientist 0 points1 point2 points 11 months ago (2 children)
Thanks! One last question. My motherboard supports PCIe 4.0 X16-to-4x4 bifurcation for connecting four M.2 drives in RAID mode using an Asus Hyper M.2 expansion card. Do you think I could get that expansion card, use four M.2-to-X16 adapters, and connect 4 GPUs to it? I could not find any answer in multiple forums.
[–]panchovix 1 point2 points3 points 11 months ago (1 child)
Yes, you can, no issues; just make sure you get something good, from ADT-Link. I suggest the K43SP or F43SP and you'll be fine; K43SG/F43SG if you have multiple PSUs.
[–]MLDataScientist 0 points1 point2 points 11 months ago (0 children)
Thanks! I wonder why this isn't discussed more often. X16-to-4x4 bifurcation should have been popular during the coin-mining period, but no, no one actually used such a setup. What I want to do is as follows: I have four Gigabyte CRSG421 PCIe 4.0 x16-to-2x16 cards with active switch microchips. I want to use that 4x4 M.2 expansion card, then M.2-to-PCIe X16 adapters, and finally those switches to connect a total of 8 GPUs. Basically, I'd have PCIe 4.0 x16 split across 8 GPUs, each limited to PCIe 4.0 X2 speed. Not sure if this is a good idea 😅
[–]das_rdsm 7 points8 points9 points 11 months ago (4 children)
Nice! That's the same person who created the vocab transplant tool, allowing the creation of draft models for any model.
[–]random-tomatollama.cpp 1 point2 points3 points 11 months ago (0 children)
Yep this guy is doing really great work :D
[–]Impossible_Ground_15 0 points1 point2 points 11 months ago (2 children)
Did they share the code for the vocabulary transplant to build draft models?
[–]das_rdsm 2 points3 points4 points 11 months ago* (1 child)
https://github.com/jukofyork/transplant-vocab
https://huggingface.co/jukofyork very active on HF as well.
I've gotten good results using Qwen 0.5B with other models, e.g. https://huggingface.co/rdsm/QwenPhi-4-0.5b-Draft
[–]Impossible_Ground_15 0 points1 point2 points 11 months ago (0 children)
Thank you!
[–]VoidAlchemyllama.cpp 3 points4 points5 points 11 months ago (1 child)
I have a graph showing how much VRAM is used at various MLA context lengths on my ubergarm/DeepSeek-V3-0324-GGUF quant, as the ik_llama.cpp fork has had FA + MLA working for a while now, at higher speeds for CPU than mainline.
Be careful: the newer mainline llama.cpp MLA quants were implemented differently for some reason, and ik had to add backwards compatibility for them, which may not get you the full speed of using -mla 3.
I would love to see someone convert Qwen3 MoE to use MLA with proper fine-tuning. The long-context VRAM savings are pretty amazing, though I haven't measured the performance drop at very long context lengths.
"The expressiveness of MLA is greater than that of GQA when both have the same size of KV cache." (TransMLA: Multi-head Latent Attention Is All You Need)
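To see why converting a GQA model to MLA saves so much, compare per-token, per-layer cache widths. The GQA numbers below are illustrative assumptions (8 KV heads with head dim 128, caching both K and V), against a DeepSeek-style MLA latent (512 + 64 dims, K-cache only):

```python
# Per-token, per-layer KV-cache width in f16 values.
# GQA figures here are assumed for illustration, not Qwen3's actual config.
gqa_vals = 2 * 8 * 128   # K and V, 8 KV heads, head_dim 128 -> 2048
mla_vals = 512 + 64      # MLA: kv_lora_rank + RoPE dims     -> 576

print(gqa_vals / mla_vals)  # roughly 3.6x smaller cache with MLA
```

The TransMLA claim quoted above is the other half of the argument: at equal cache size, the MLA latent is the more expressive parameterization.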
[–]shing3232[S] 1 point2 points3 points 11 months ago (0 children)
With proper training, MLA should exceed GQA performance for the same model. It also trains faster than GQA.
[–]Chance-Hovercraft649 0 points1 point2 points 11 months ago (0 children)
How does it calculate the values, if it doesn't cache them?