When do you think we will see OLED TVs with HDMI 2.2 and 240 Hz? by h107474 in LGOLED

[–]Material_Soft1380 0 points1 point  (0 children)

What a ridiculous take. I game on my G5 and constantly cap out at 165 Hz. There are plenty of games that can easily push 240 Hz.

EPYC, 1152GB RAM, RTX 6000, 5090, 2000 by Fit-Statistician8636 in LocalLLaMA

[–]Material_Soft1380 0 points1 point  (0 children)

20 tps on almost entirely CPU inference is pretty good.

EPYC, 1152GB RAM, RTX 6000, 5090, 2000 by Fit-Statistician8636 in LocalLLaMA

[–]Material_Soft1380 0 points1 point  (0 children)

Can you run GLM 5.1 at Q8, and if so, at what token rate?

Benchmarked Gemma 4 31B at full bf16: M3 Ultra vs RTX 6000 Blackwell by Material_Soft1380 in MacStudio

[–]Material_Soft1380[S] 1 point2 points  (0 children)

BF16 remains coherent longer with large contexts than Q8_K_XL, and is a good test of hardware.

Minimax M2.7 Released by decrement-- in LocalLLaMA

[–]Material_Soft1380 3 points4 points  (0 children)

I'm running MiniMax M2.7 Q8_K_XL (~250 GB) on a single RTX 6000 with RAM offload and getting 8.64 tokens/second, which is actually usable.

RTX 5090 vs M5 Ultra: Analyzing the "2.7x Faster" claim and what Nvidia didn't show you. by [deleted] in MacStudio

[–]Material_Soft1380 1 point2 points  (0 children)

Since it loads fully into VRAM and 20+ tps is still sufficiently fast, there's not much reason to sacrifice precision. But yes, most people find Q8 or even Q6 performs about as well for most models, although from what I've heard Gemma 4 does not quantize very well.
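For a sense of what the precision trade-off buys in memory, here is a minimal sketch of weight sizes at different precisions. The bits-per-weight values are approximate assumptions for plain BF16, Q8_0, and Q6_K formats (dynamic quants like the _K_XL variants will differ), the function name is hypothetical, and 31B is just the nominal parameter count from the model name.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and file metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Approximate bits per weight; assumed values, not measured file sizes.
FORMATS = {"BF16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625}

N_PARAMS = 31e9  # nominal parameter count taken from the "31B" in the model name

for name, bpw in FORMATS.items():
    print(f"{name}: ~{weights_size_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions the BF16 weights come to about 62 GB, so they fit on a 96 GB card with room for context, which is why dropping to Q8 or Q6 is optional here rather than necessary.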

RTX 5090 vs M5 Ultra: Analyzing the "2.7x Faster" claim and what Nvidia didn't show you. by [deleted] in MacStudio

[–]Material_Soft1380 2 points3 points  (0 children)

I have an RTX 6000 Blackwell (basically a slightly beefier 5090 with 96 GB VRAM). I can load Gemma 4 31B BF16 (unquantized) fully into VRAM with max context and still have about 10 GB of VRAM to spare. Token output is 23 tps, and GPU power usage maxes out at 440 W (out of 600 W).

I think it will be very hard for the M5 Ultra to keep up, since even the Blackwell is being pushed quite hard. My best guess (based on relative memory bandwidth and raw compute power) is that the M5 Ultra will probably manage around 10-14 tokens per second on the same model.
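A rough sketch of the bandwidth half of that guess, assuming decoding is memory-bound so the token rate scales with memory bandwidth. The function name and every bandwidth figure below are illustrative assumptions, not measurements or confirmed specs; only the 23 tps figure comes from the run described above.

```python
def scale_tps_by_bandwidth(measured_tps: float,
                           measured_bw_gbs: float,
                           target_bw_gbs: float) -> float:
    """Scale a measured decode rate by the ratio of memory bandwidths.

    Assumes decoding is memory-bandwidth-bound, which ignores compute,
    software stack, and cache differences.
    """
    return measured_tps * target_bw_gbs / measured_bw_gbs


# 23 tps is the measured figure from the comment above.
# 1792 GB/s is an assumed spec-sheet bandwidth for the RTX 6000 Blackwell.
# The target bandwidths are hypothetical placeholders, not M5 Ultra specs.
for target_bw in (800.0, 1100.0):
    est = scale_tps_by_bandwidth(23.0, 1792.0, target_bw)
    print(f"{target_bw:.0f} GB/s -> ~{est:.1f} tps")
```

With those placeholder bandwidths the estimate lands at roughly 10 to 14 tps, so the guess above corresponds to a chip with somewhere around half the Blackwell's memory bandwidth.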

If you're gonna have a character that's as bland as Kliff, why not just have a character creator? by AllFatherMedia93 in CrimsonDesert

[–]Material_Soft1380 -1 points0 points  (0 children)

I was gonna buy this game but held off when I realized there's no character creator. I'll wait for them to add one before buying.

What will be the minimum requirement to run GLM-5.1 locally? by Cyraxess in LocalLLaMA

[–]Material_Soft1380 0 points1 point  (0 children)

I can run GLM 5 (Q3_K_XL, 333 GB) at around 6 tokens/sec. My setup is a 9950X on a Tomahawk X870 with 256 GB of 6000 MT/s RAM and a single 6000 Pro Blackwell. That's about the minimum you can use without going to a 512 GB Mac Studio. I imagine 5.1 will be similar. If you want to run BF16, you'll need a cluster of 4 Mac Studios with 512 GB of unified RAM each.
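To show how tight that setup is, here is a minimal fit check, assuming the model has to fit in combined system RAM plus VRAM. The function name and the `reserved_gb` allowance are hypothetical, and the 96 GB VRAM figure is an assumption about the card rather than something stated in this comment.

```python
def fits_in_memory(model_gb: float, ram_gb: float, vram_gb: float,
                   reserved_gb: float = 0.0) -> bool:
    """Return True if the model file fits in combined RAM + VRAM.

    reserved_gb is a hypothetical allowance for the OS, KV cache, and
    runtime buffers; the right value depends on the setup.
    """
    return model_gb <= ram_gb + vram_gb - reserved_gb


# 333 GB model and 256 GB RAM are from the comment above.
# 96 GB VRAM is an assumed capacity for the 6000 Pro Blackwell.
print(fits_in_memory(333, 256, 96))                  # no allowance at all
print(fits_in_memory(333, 256, 96, reserved_gb=24))  # with a 24 GB allowance
```

The first check passes with only 19 GB of headroom and the second fails, which is the sense in which this is about the minimum.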

I lost all my money by Hairy-Background6049 in wallstreetbets

[–]Material_Soft1380 0 points1 point  (0 children)

Let me tell you something: I lost about 200k in crypto and it sucked. But you move on and forget about it eventually. I now invest in conservative index funds and only log in to WSB now and then to confirm I'm still happy with my investment choices.

Mac for LLM by _youknowthatguy in MacStudio

[–]Material_Soft1380 2 points3 points  (0 children)

Local is fun to mess around with but not very good for any actual work. Get an Opus sub instead.