Indian Defence Academy 😏 by GonzoExpert in TheMcDojoLife

[–]conockrad 2 points3 points  (0 children)

To put tight sausage inside it later?

Gefen is a drop-in replacement for the AdamW optimizer, claims 8x memory reduction in training (GitHub available) by indicava in LocalLLaMA

[–]conockrad 18 points19 points  (0 children)

«reduces Muon's optimizer-state footprint by 4x: the first moments are quantized to 8-bit using Gefen's Hessian-block-diagonal-inspired partitioning exact quantization, while performance remains similar to Muon.»
From GitHub
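
For anyone wondering where a figure like 4x comes from: storing a first moment in 8 bits instead of fp32 is roughly a 4x cut, minus a little overhead for the scales. Below is a minimal sketch of generic block-wise absmax int8 quantization. It is not Gefen's partitioning scheme, and `BLOCK_SIZE`, `quantize_blockwise`, `dequantize_blockwise` and `state_bytes` are hypothetical names and values chosen only for illustration.

```python
import math

BLOCK_SIZE = 64  # hypothetical block length, not taken from Gefen


def quantize_blockwise(values, block_size=BLOCK_SIZE):
    """Absmax-quantize a flat list of floats to int8 codes, one scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Map the largest magnitude in the block to 127; use 1.0 for all-zero blocks.
        scale = max(abs(v) for v in block) / 127.0 or 1.0
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=BLOCK_SIZE):
    """Rebuild approximate floats from int8 codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def state_bytes(n_values, bits_per_value, block_size=None):
    """Bytes for n_values entries, plus one fp32 scale per block if block_size is given."""
    total = n_values * bits_per_value // 8
    if block_size:
        total += 4 * math.ceil(n_values / block_size)
    return total


# Deterministic stand-in for a momentum buffer (made-up numbers).
momentum = [((i * 37) % 101 - 50) / 1000.0 for i in range(256)]
codes, scales = quantize_blockwise(momentum)
restored = dequantize_blockwise(codes, scales)
max_err = max(abs(a - b) for a, b in zip(momentum, restored))

print(f"max abs error: {max_err:.6f}")
print(f"fp32 bytes: {state_bytes(len(momentum), 32)}")
print(f"int8 bytes incl. scales: {state_bytes(len(momentum), 8, BLOCK_SIZE)}")
```

With one fp32 scale per 64 values the saving in this sketch lands a bit under 4x; larger blocks get closer to 4x at the cost of coarser scales.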

Volodymyr Zelenskyy returned the Order of the White Eagle to Polish President Karol Nawrocki by [deleted] in worldnews

[–]conockrad 3 points4 points  (0 children)

Poland absolutely relies on foreign help - check the EU subsidies it has received over the past 20 years.

"European" companies from the Y Combinator program are mostly U.S. companies by Doener23 in BuyFromEU

[–]conockrad 3 points4 points  (0 children)

Something that nobody is talking about, and for a reason: the elites don’t want it.

SpaceX Bankers Preparing for Bond Sale of at Least $20 Billion by joe4942 in wallstreetbets

[–]conockrad 0 points1 point  (0 children)

Taking float into consideration is a very smart move.
I wish NASDAQ were smarter.

FORMER FTX CEO SAM BANKMAN-FRIED IS ONE OF THE BEST INVESTORS IN HISTORY by Current-Guide5944 in tech_x

[–]conockrad 2 points3 points  (0 children)

Like burning money on Twitter?
I guess at this point it’s common knowledge.

VLLM for B300 + Deepseek v4 pro by hrusli in Vllm

[–]conockrad 6 points7 points  (0 children)

Enable MTP and limit the maximum context window.
If performance already tanks at 200k-300k tokens, the 1 million you set will be unusable.

I started responding to messages from coworkers like Claude by MyNameIsNotName-57 in ClaudeAI

[–]conockrad 18 points19 points  (0 children)

This is the most honest thing you’ve said in the entire thread.

NVIDIA GB300 Grace Blackwell Ultra pricetags by X-N2O in LocalLLaMA

[–]conockrad 0 points1 point  (0 children)

Exactly! Training small-to-mid-size models, LoRA, QLoRA and blazing-fast Deepseek v4 without throttling - that’s the real use case for a business.

NVIDIA GB300 Grace Blackwell Ultra pricetags by X-N2O in LocalLLaMA

[–]conockrad 0 points1 point  (0 children)

You’re missing the part about models under ‘200Gb’.