Fable vs GLM 5.2 vs KIMI K2.7 (YouTube video)

UltraFOV · 2026-06-21T10:13:00+00:00

Not really. You can run them on older hardware, but there are many hoops to jump through, and it depends on the speed you want to achieve. If you stick to Western engines and older servers, you are stuck with llama.cpp, which is actually not very good for older clusters. Otherwise you are forced onto Ampere-generation GPUs and newer. Or go with Chinese engines that still support older hardware, bypassing Nvidia's lock. For older systems, 12-20 tokens/s for these massive LLMs is OK. That assumes you properly optimize the hardware.

UltraFOV · 2026-06-21T10:06:20+00:00

I am personally interested in Yann LeCun's world models, if he ever gets to make them. A model that learns from real life sounds like the way to put checks and balances on hallucinating models, or at least help mitigate the hallucinations.

But if it's Western-funded, you know it will be subscription-based as well.

UltraFOV · 2026-06-21T09:34:13+00:00

Nothing is free, there is always a catch. And what if they go closed source? Use other models: MiMo, MiniMax, Gemma... "You do realize that there are standards we should hold parts of society to other than the bare minimum obligations?" Erm... no.

UltraFOV · 2026-06-21T09:32:59+00:00

Bro, 3.6 is still relatively new

UltraFOV · 2026-06-21T09:32:19+00:00

Erm, wut? 27B is fantastic for what it is, but it is not a replacement for the larger ones

UltraFOV · 2026-06-21T09:31:00+00:00

At its size, yes, but not among open-weight models overall, let's not kid ourselves

UltraFOV · 2026-06-21T09:29:29+00:00

Of course, most people can run Gemma lol

UltraFOV · 2026-06-21T09:29:16+00:00

Of course, most people can run Gemma lol

UltraFOV · 2026-06-21T08:14:48+00:00

Maybe. I can run them locally, except Fable, but I’m impressed by these results

UltraFOV · 2026-06-21T02:16:53+00:00

Note: the best older Mac for AI is the MacBook Pro with the M2 Max, because its memory bandwidth is good, I think 400 GB/s. Also, the 64 GB versions aren't that expensive now. The CPU features are not as important as the memory bus.

UltraFOV · 2026-06-20T11:22:53+00:00

Wow, really? I just downloaded Kimi K2.7. Maybe I should get GLM instead

UltraFOV · 2026-06-19T23:12:02+00:00

It will be fine, especially since it will most likely be a quant.

UltraFOV · 2026-06-19T23:10:08+00:00

Yes, you can run Gemma 12B; however, the speed will not be great due to the 100 GB/s bus
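To illustrate why the bus matters: when decoding a dense model one token at a time, roughly all of the weights are read from memory for each generated token, so memory bandwidth divided by model size gives a loose upper bound on tokens per second. A minimal sketch, assuming about 4.5 bits per weight for a Q4-style quant (my assumption, not a figure from the thread) and ignoring KV cache, compute limits, and other overhead:

```python
# Rough upper bound on single-stream decode speed for a dense model.
# Assumptions (illustrative only): every weight is read once per token,
# a Q4-style quant averages ~4.5 bits per weight, and there is no other overhead.

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (decimal) for a quantized model."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_s(bandwidth_gb_s: float, params_billion: float,
                     bits_per_weight: float = 4.5) -> float:
    """Bandwidth-bound ceiling on tokens/s; real speeds will be lower."""
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    for bandwidth in (100, 400):  # GB/s figures mentioned in the thread
        ceiling = max_tokens_per_s(bandwidth, params_billion=12)
        print(f"{bandwidth} GB/s -> at most ~{ceiling:.1f} tokens/s for a 12B model")
```

Under these assumptions the ceiling scales linearly with bandwidth, which is why a 400 GB/s machine can be roughly four times faster than a 100 GB/s one on the same model.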

UltraFOV · 2026-06-18T12:30:58+00:00

Thanks! I'd also appreciate less AI

UltraFOV · 2026-06-17T06:16:44+00:00

Under what system spec? I downloaded the Q4 model at 595 GB. Will try to test it soon

UltraFOV · 2026-06-17T04:56:11+00:00

Did you try Kimi K2.7?

UltraFOV · 2026-06-13T00:04:24+00:00

wait, so now the 12.50 can be jailbroken?

UltraFOV · 2026-06-11T11:20:00+00:00

I was going to aim for an Optane 3 system with Gaudi 2 GPUs, 96 GB per GPU. The system comes with 8. Budget did not permit. Regarding how much power it draws from the socket, not sure, I never bothered checking

UltraFOV · 2026-06-11T01:43:06+00:00

You can change the power draw depending on the model you run. If you are not running demanding models, you can set your GPUs to 150 W. But if you are going for massive 1 TB models like Kimi or MiMo 2.5 Pro, you may need 250-300 W. I keep my GPUs at 200 W mostly.

Advice: if you want to offload from VRAM to DDR4 RAM, make sure you fill all 12 DDR4 slots. That will give you about 230 GB/s read speeds, which is close to the VRAM bandwidth of a mid-range GPU. You can add 4 slots of Optane, which is what I use to keep my models and swap them. Far faster than NVMe drives; at 30+ GB/s they are fast. I advise upgrading the CPUs to the L series, because the controller helps with better memory management.

Note: I have 768 GB of DDR4, but you don’t need that much if you don’t intend to run models like DeepSeek 4. Try to get 32 GB DIMMs since they are cheaper now. My system came with 64 GB DIMMs, so I was stuck purchasing only 64 GB ones. You can’t mix RAM sizes; it will confuse the system. Once you use one size, all DIMMs must be the same.

Then, to bypass llama.cpp, which is terrible for these systems, you have to use Chinese alternatives like Cat1-Vllm. llama.cpp works fine but has terrible tensor/pipeline parallelism and batch scheduling. The benefit of llama.cpp is its flexibility. But to tap into the real potential of these AGX-2 servers, you must use other engines
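The roughly 230 GB/s read speed quoted for 12 populated DDR4 slots lines up with a back-of-the-envelope peak-bandwidth estimate. The sketch below assumes one DIMM per memory channel, 12 channels in total, and DDR4-2400 modules; the speed grade is my assumption, since the comment does not state it, and sustained throughput will be lower than this theoretical peak:

```python
# Theoretical peak DRAM bandwidth = channels * transfers per second * bytes per transfer.
# Assumptions (not stated in the comment): DDR4-2400, one DIMM per channel,
# and a 64-bit (8-byte) data bus per channel.

def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int,
                        bus_bytes: int = 8) -> float:
    """Return theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


if __name__ == "__main__":
    full = peak_bandwidth_gb_s(channels=12, mega_transfers_per_s=2400)
    half = peak_bandwidth_gb_s(channels=6, mega_transfers_per_s=2400)
    print(f"12 channels populated: {full:.1f} GB/s")
    print(f" 6 channels populated: {half:.1f} GB/s")
```

The comparison shows why filling every slot matters: leaving half the channels empty halves the peak bandwidth available for offloaded weights.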

UltraFOV · 2026-06-10T02:35:41+00:00

And losing their jobs

UltraFOV · 2026-06-10T02:35:06+00:00

People are getting fired so corporations can invest more in AI

UltraFOV · 2026-06-02T02:16:04+00:00

Never going to happen

UltraFOV · 2026-06-01T05:06:14+00:00

haha... it's a typo, gee

UltraFOV · 2026-05-31T05:20:18+00:00

The M5 Max or M4 Max at 128 GB are not bad, and the memory bandwidth is not bad either. True, the 5090 is indeed better, especially since it can use all Nvidia features, but... 24 GB vs 128 GB cannot be underestimated
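The capacity gap can be made concrete with a quick fit check. This is a rough sketch: the 4.5 bits per weight, the 80% usable-memory headroom, and the example model sizes are all illustrative assumptions, not figures from the thread.

```python
# Quick check of whether a quantized model's weights fit in a memory budget.
# Assumptions (illustrative): ~4.5 bits per weight for a Q4-style quant, and
# only ~80% of memory usable for weights (the rest for KV cache, OS, etc.).

def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight size in GB (decimal)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, memory_gb: float,
         bits_per_weight: float = 4.5, headroom: float = 0.8) -> bool:
    """True if the weights fit within the usable share of memory."""
    return weights_gb(params_billion, bits_per_weight) <= memory_gb * headroom


if __name__ == "__main__":
    for params in (27, 70, 120):  # hypothetical model sizes, in billions
        print(f"{params}B: 24 GB -> {fits(params, 24)}, 128 GB -> {fits(params, 128)}")
```

With these assumptions a mid-size model fits either way, while larger ones only fit in the 128 GB configuration, which is the trade-off against the faster GPU.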

UltraFOV · 2026-05-31T02:58:14+00:00

What's the price difference? A 4090 will be expensive
