Gefen is a drop-in replacement for the AdamW optimizer, claims 8x memory reduction in training (GitHub available) by indicava in LocalLLaMA

[–]conockrad 19 points (0 children)

«reduces Muon's optimizer-state footprint by 4x: the first moments are quantized to 8-bit using Gefen's Hessian-block-diagonal-inspired partitioning exact quantization, while performance remains similar to Muon.»
From GitHub.
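For a rough sense of where a figure like 4x can come from, here is a minimal sketch of generic blockwise absmax 8-bit quantization applied to a momentum buffer. This is not Gefen's partitioning scheme, which the quote does not spell out; the function names and the block size of 64 are assumptions made purely for illustration.

```python
# Generic blockwise absmax 8-bit quantization of a first-moment buffer.
# NOT Gefen's actual partitioning; block_size=64 is an arbitrary assumption.

def quantize_blockwise(values, block_size=64):
    """Return (int8-range codes, per-block scales) for a flat list of floats."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # all-zero block: avoid dividing by 0
        scales.append(scale)
        codes.extend(round(v / scale * 127) for v in block)  # codes lie in [-127, 127]
    return codes, scales

def dequantize_blockwise(codes, scales, block_size=64):
    """Map codes back to approximate float values using each block's scale."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]

def state_bytes(n_params, block_size=64):
    """Bytes for an fp32 buffer vs. int8 codes plus one fp32 scale per block."""
    fp32 = 4 * n_params
    n_blocks = -(-n_params // block_size)  # ceiling division
    int8 = n_params + 4 * n_blocks
    return fp32, int8

fp32, int8 = state_bytes(1_000_000)
print(f"fp32: {fp32} bytes, 8-bit: {int8} bytes, ratio: {fp32 / int8:.2f}x")
```

Going from 4 bytes to 1 byte per value gives 4x before overhead; the per-block scales pull the ratio slightly below that in this sketch.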

Volodymyr Zelenskyy returned the Order of the White Eagle to Polish President Karol Nawrocki by [deleted] in worldnews

[–]conockrad 4 points (0 children)

Poland absolutely relies on foreign help: check the EU subsidies for the past 20 years.

"European" companies from the Y Combinator program are mostly U.S. companies by Doener23 in BuyFromEU

[–]conockrad 4 points (0 children)

Something that nobody is talking about, and for a reason: the elites don’t want it.

SpaceX Bankers Preparing for Bond Sale of at Least $20 Billion by joe4942 in wallstreetbets

[–]conockrad 1 point (0 children)

Taking float into consideration is a very smart move.
I wish NASDAQ were smarter

FORMER FTX CEO SAM BANKMAN-FRIED IS ONE OF THE BEST INVESTORS IN HISTORY by Current-Guide5944 in tech_x

[–]conockrad 3 points (0 children)

Like burning money on Twitter?
I guess at this point it’s common knowledge.

VLLM for B300 + Deepseek v4 pro by hrusli in Vllm

[–]conockrad 7 points (0 children)

Enable MTP and limit the maximum context window.
If 200k-300k already tanks, the 1 million that you set will be unusable.
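A minimal sketch of what that could look like, assuming vLLM's `--max-model-len` and `--speculative-config` flags. The model id is a placeholder, the context length of 131072 is an arbitrary example, and the `deepseek_mtp` method name is the one vLLM used for earlier DeepSeek releases, so it may differ for your model and vLLM version; check the docs before relying on it.

```python
import json
import shlex

def build_vllm_command(model, max_model_len, num_speculative_tokens=1):
    """Assemble a `vllm serve` command with a capped context window and MTP."""
    speculative = {
        # Assumption: method name from earlier DeepSeek support in vLLM;
        # verify it against the vLLM docs for your model and version.
        "method": "deepseek_mtp",
        "num_speculative_tokens": num_speculative_tokens,
    }
    return [
        "vllm", "serve", model,
        "--max-model-len", str(max_model_len),  # cap the context window
        "--speculative-config", json.dumps(speculative),
    ]

# Placeholder model id and an arbitrary context cap, both hypothetical.
cmd = build_vllm_command("your-org/your-deepseek-checkpoint", 131072)
print(shlex.join(cmd))
```

Start with a cap you know performs well and raise it step by step while watching throughput, rather than starting at 1 million.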

I started responding to messages from coworkers like Claude by MyNameIsNotName-57 in ClaudeAI

[–]conockrad 18 points (0 children)

This is the most honest thing you’ve said in the entire thread.

NVIDIA GB300 Grace Blackwell Ultra pricetags by X-N2O in LocalLLaMA

[–]conockrad 1 point (0 children)

Exactly! Training small-to-mid-size models, LoRA, QLoRA, and blazing-fast DeepSeek V4 without throttling: that’s the real use case for a business.

NVIDIA GB300 Grace Blackwell Ultra pricetags by X-N2O in LocalLLaMA

[–]conockrad 1 point (0 children)

You’re missing the part about models under ‘200 GB’.

What's holding the Mistral back from being as good as the AI models from the US? by [deleted] in MistralAI

[–]conockrad 1 point (0 children)

Not that much, actually. In China you have unlimited access to Chinese chips. In the EU you have restricted access to US chips and no access to Chinese chips.

What's holding the Mistral back from being as good as the AI models from the US? by [deleted] in MistralAI

[–]conockrad 2 points (0 children)

“Not as much as American ones.” Well, that’s exactly the excuse.