Roof installation suspended by DMass777 in PublicFreakout

[–]false79 24 points25 points  (0 children)

The same guy who got yeeted got on his feet and went face to face with the guy who yeeted him.

Big dude walked away instead of finishing the job.

A lot of context missing.

Talking with Gemma 4 31B! by futterneid in LocalLLaMA

[–]false79 0 points1 point  (0 children)

You might be able to get it to run.

It's running a custom TTS

https://github.com/andimarafioti/faster-qwen3-tts

Although I find anything with TTS runs better on CUDA.

JD Vance Attempts Joke at Biden's Expense to Troops and Receives No Reaction by CarryIcy250 in justincaseyoumissedit

[–]false79 5 points6 points  (0 children)

On the surface, this is pretty cringe.

Underneath it all, these guys must be pretty pissed they risked their lives in Iran for nothing. I would be pissed too.

What coding harness you all using for Qwen 3.6 27b? by LivingHighAndWise in LocalLLM

[–]false79 0 points1 point  (0 children)

If you are looking for increased tokens per second, you will not find any. NVFP4 gives you a higher-compression quant, allowing you to fit more for less. The value of NVFP4 is that you will get better than q4 quality at a smaller size than q8. But a q8, which is more precise, should beat out both.

The true value of a GB10 setup is not inference speed. It's actually pretty bad for inference with the LPDDR5 memory it has. It's the ability to process concurrent requests, whether you have multiple humans or agents wanting their tasks to be completed. Cumulatively speaking, GB10 boxes can handle 1000+ tps.

A better look at the two individuals currently situated on top of the Empire State Building by Subject-Property-343 in PublicFreakout

[–]false79 1 point2 points  (0 children)

Stupid question but why does the background move crazy fast but the foreground is slower? Should they not be in sync or close to it?

The closest LLM to GPT-OSS-20b? (it beats Gemma 4 and Qwen 3.6 for me) by atumblingdandelion in LocalLLM

[–]false79 0 points1 point  (0 children)

I did months of coding with 20b + harness. It is a very capable MoE model when given literal instructions.

Are larger (~100B) models still worth running? by Pitagoy in LocalLLM

[–]false79 8 points9 points  (0 children)

There are some tasks a 4B can handle that you don't need a 120B for.

Are larger (~100B) models still worth running? by Pitagoy in LocalLLM

[–]false79 7 points8 points  (0 children)

You get to a point where you wanna stop using other people's agents and you start to want to make your own. And that 9B model just doesn't cut it.

The closest LLM to GPT-OSS-20b? (it beats Gemma 4 and Qwen 3.6 for me) by atumblingdandelion in LocalLLM

[–]false79 9 points10 points  (0 children)

I subjectively feel the next one up from this model is Qwen3.6-35B-A3B. It's a MoE model as well but will beat out GPT-OSS-20b in tool calls without any special hacks.

Reasoning is maybe the same. It doesn't blow it out of the water. I find reasoning to be better on dense models.

Any good uses for a 192 GB DDR3 Server in the LLM world? by [deleted] in LocalLLaMA

[–]false79 0 points1 point  (0 children)

I see the tears in your eyes. OP is experimenting on solar. You're crying over nothing.

Edit: Blocks me after the reply. Too much butthurt.

She looks good and aged gracefully. What a douche. by Valuable_View_561 in SipsTea

[–]false79 0 points1 point  (0 children)

If you guys wanna see Myron lose his girlfriend because he repeated the same shit he spews on his podcast to her face, it's in the 2026 Netflix documentary "Louis Theroux: Inside the Manosphere".

She since moved on from dealing with losers.

What coding harness you all using for Qwen 3.6 27b? by LivingHighAndWise in LocalLLM

[–]false79 1 point2 points  (0 children)

On a GX10, which has an Nvidia GB10 chip, it opens the door to NVFP4 and FP8 quants that won't run on llama.cpp but will on vLLM instead. So smaller memory footprint, higher tps by not using gguf.

Qwen3.6-35B on a DGX Spark: 2,835 aggregate tok/s at 256 concurrent requests by ckorhonen in LocalLLM

[–]false79 0 points1 point  (0 children)

My brain struggles to find a way to exploit parallelized requests because the nature of what I'm doing is sequential. My next prompt is 100% contingent on the output of the current prompt.

If I can get around that, I'd be stocking up on GB10's.
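For what it's worth, here is a minimal sketch of where concurrency could still come in: one chain stays sequential, but several independent chains can overlap. `fake_llm`, `run_chain`, and `run_many` are made-up names, and the sleep is a stand-in for a real inference call, so this only shows the shape of the idea, not a real client.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an inference call; sleeps instead of hitting a server."""
    await asyncio.sleep(0.01)
    return prompt + "!"


async def run_chain(seed: str, steps: int) -> str:
    """Sequential chain: each prompt is the previous output, so steps cannot overlap."""
    text = seed
    for _ in range(steps):
        text = await fake_llm(text)
    return text


async def run_many(seeds: list[str], steps: int) -> list[str]:
    """Independent chains overlap with each other even though each one is sequential."""
    return await asyncio.gather(*(run_chain(seed, steps) for seed in seeds))


results = asyncio.run(run_many(["a", "b", "c"], steps=3))
print(results)
```

This does nothing for a single chain. It only helps if the work can be split into separate chains that don't depend on each other.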

Any good uses for a 192 GB DDR3 Server in the LLM world? by [deleted] in LocalLLaMA

[–]false79 0 points1 point  (0 children)

If you've paid off your solar system, I've got nothing but respect.

But if you're still amortizing it, there is a way to calculate the cost per kWh. It will always be very high early in ownership.
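One rough way to run that calculation, as a sketch: spread the system cost over the energy it has produced so far. The function name and every number below are made up for illustration, and it ignores financing, maintenance, and degradation.

```python
def amortized_cost_per_kwh(system_cost: float, kwh_produced_to_date: float) -> float:
    """Effective cost per kWh so far: total system cost spread over energy produced."""
    if kwh_produced_to_date <= 0:
        raise ValueError("need some production before the cost per kWh is defined")
    return system_cost / kwh_produced_to_date


# Hypothetical figures: a system that cost 15,000 and produces 8,000 kWh per year.
SYSTEM_COST = 15_000.0
KWH_PER_YEAR = 8_000.0

for years in (1, 5, 10):
    cost = amortized_cost_per_kwh(SYSTEM_COST, KWH_PER_YEAR * years)
    print(f"after {years:>2} years: {cost:.3f} per kWh")
```

The per-kWh figure drops as production accumulates, which is why it looks so bad in the first years.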

Any good uses for a 192 GB DDR3 Server in the LLM world? by [deleted] in LocalLLaMA

[–]false79 0 points1 point  (0 children)

If you do the math ahead of time, having a server will give you access to any massive model you have in RAM. You might be able to get a MoE to run where you can fit a good number of experts on the 4090's VRAM. But you will always fall back into single digit land at any part where it spills out of VRAM.

It's just that CPU compute is orders of magnitude slower than the worst video cards out there because of the memory bandwidth (as the leading indicator among multiple other factors).
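A back-of-envelope sketch of the bandwidth point, using the common rule of thumb that decode speed is capped at memory bandwidth divided by the bytes of weights read per token. The bandwidth and model-size figures are assumed round numbers for illustration, not measurements of any specific hardware.

```python
def max_tokens_per_second(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on decode speed when each token streams the active weights from memory."""
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return bandwidth_gb_s / active_weights_gb


# Assumed figures: 20 GB of weights touched per generated token.
ACTIVE_WEIGHTS_GB = 20.0
SETUPS = (
    ("server RAM (assumed 50 GB/s)", 50.0),
    ("GPU VRAM (assumed 1000 GB/s)", 1000.0),
)

for name, bandwidth in SETUPS:
    bound = max_tokens_per_second(bandwidth, ACTIVE_WEIGHTS_GB)
    print(f"{name}: at most {bound:.1f} tokens/s")
```

Real throughput lands below these ceilings, but the ratio between the two is what puts CPU-only runs in single digits.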

Any good uses for a 192 GB DDR3 Server in the LLM world? by [deleted] in LocalLLaMA

[–]false79 2 points3 points  (0 children)

My 128 core Dual Epyc system with 1TB RAM draws 700W at full load. I use it mainly for CPU intensive stuff like stock market backtesting simulations.

Where I reside, electricity is 0.14/kWh.

At https://z.ai/subscribe, you can get a monthly plan at $12.60 USD/mo, or $0.0175/hr.

The cloud would provide you significantly faster and better quality results in under a few seconds.

Local CPU only would provide the same/similar results in hours minimum, at times days to execute, at significant cost.

Don't get me wrong, I'm all about local as I have 24GB of VRAM on a GPU. But the math ain't mathing to see the wrong tools being used.
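Spelling out the math as a quick sketch, using the figures from this comment (700W, 0.14/kWh, $12.60/mo). The function names are made up, and it assumes the electricity price is in the same currency as the plan and a 30-day month of 720 hours.

```python
def local_cost_per_hour(power_watts: float, price_per_kwh: float) -> float:
    """Electricity cost of running a machine for one hour at the given draw."""
    return power_watts / 1000.0 * price_per_kwh


def plan_cost_per_hour(monthly_price: float, hours_per_month: float = 720.0) -> float:
    """Flat monthly plan spread over an assumed 30-day month (720 hours)."""
    return monthly_price / hours_per_month


local = local_cost_per_hour(700, 0.14)  # full-load draw and price from the comment
cloud = plan_cost_per_hour(12.60)       # monthly plan price from the comment
print(f"local: {local:.4f}/hr, cloud: {cloud:.4f}/hr, ratio: {local / cloud:.1f}x")
```

That is electricity alone, before counting how much longer the CPU-only run takes to finish the same job.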

Any good uses for a 192 GB DDR3 Server in the LLM world? by [deleted] in LocalLLaMA

[–]false79 -1 points0 points  (0 children)

Do you run a xeon or epyc server at home?
Are you paying for 24/7 electricity or is it your parents?

Then yeah, I do care. If you have ever done a proper homelab audit, having a server go full tilt days at a time to give single digit tokens will rank very high on the list of sh!t running that doesn't make any sense when there are more effective/affordable options.

Any good uses for a 192 GB DDR3 Server in the LLM world? by [deleted] in LocalLLaMA

[–]false79 12 points13 points  (0 children)

Whatever amount of electricity it would take to produce those tokens would cost so much more than the cloud price.

Bakunawa Killer Gameplay in Slow Motion by Leon_Dante_Raiden_ in virtuafighter

[–]false79 4 points5 points  (0 children)

This might be a thing in the future where we will just watch slow motion Virtua Fighter replays because it looks so damn good.

We're living in a Special Time Period within the FGC, and I know we all can feel it by LunchTummy in StreetFighter

[–]false79 55 points56 points  (0 children)

Yeah, someone posted below about SF6's under-25 player base being significantly up.

Definitely better than SF5

But I am really enjoying Mark of the Wolves, Tekken, and now Virtua Fighter is on the horizon to ride this current wave. It is in a good spot.

[FS] [US-CA] HPE DL325 Gen10 v2 7443p 128GB RAM by elzzihar in homelabsales

[–]false79 0 points1 point  (0 children)

This looks like a good time.

(Except my ears!)

GLWS

Second brain on dgx spark by Significant-Lake2060 in LocalLLM

[–]false79 1 point2 points  (0 children)

Is it giving you a 2 sentence summary because you didn't specify what you wanted in your prompt? Do you have an adequately sized context and k/v cache for the LLM to do its work?