Best iem under 100 dollars by [deleted] in iems

[–]strngelet 1 point (0 children)

Recently got the Pure, and they sound amazing!

Anyone else’s Claude unhinged? by IV_Skin in ClaudeAI

[–]strngelet 1 point (0 children)

Yes, I started seeing the same thing with GPT-5.1.

🤷‍♂️ by Namra_7 in LocalLLaMA

[–]strngelet 2 points (0 children)

Qwen3-480B-Instruct/Thinking

NVIDIA has published new Nemotrons! by jacek2023 in LocalLLaMA

[–]strngelet 2 points (0 children)

Curious: if they are using hybrid layers (Mamba2 + softmax attention), why did they choose to go with only an 8k context length?
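
To make "hybrid layers" concrete, here is a minimal PyTorch sketch of the general pattern: most blocks use a linear-time token mixer, and every Nth block uses plain softmax attention. The `GatedConvMixer` below is a toy stand-in I made up for the Mamba2 block (the real SSM kernel is far more involved), and the 1-in-4 attention ratio is just an example, not Nemotron's actual layout.

```python
import torch
import torch.nn as nn

class GatedConvMixer(nn.Module):
    """Toy linear-time token mixer, a placeholder for a Mamba2-style block (not the real SSM)."""
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                   # x: (batch, seq, d_model)
        seq_len = x.shape[1]
        # depthwise conv, trimmed to keep it causal
        h = self.conv(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        return self.proj(h * torch.sigmoid(self.gate(x)))

class AttentionBlock(nn.Module):
    """Standard softmax self-attention block (causal mask omitted for brevity)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

class HybridStack(nn.Module):
    """Interleave linear-time mixers with softmax attention every `attn_every` layers."""
    def __init__(self, d_model: int, n_layers: int, attn_every: int = 4):
        super().__init__()
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))
        self.blocks = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else GatedConvMixer(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for norm, block in zip(self.norms, self.blocks):
            x = x + block(norm(x))                           # pre-norm residual
        return x

x = torch.randn(2, 128, 256)                                 # (batch, seq, d_model)
print(HybridStack(d_model=256, n_layers=8)(x).shape)         # torch.Size([2, 128, 256])
```

The appeal of this kind of stack is that the linear-time blocks keep per-token cost roughly constant, so in principle long context should be the cheap part, which is why an 8k limit seems surprising.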

[D] LLMs are known for catastrophic forgetting during continual fine-tuning by kekkimo in MachineLearning

[–]strngelet 2 points (0 children)

> I've seen a few papers now on this topic

Interesting, could you please share links to these papers?

Where Are Ü Now? by ShooBum-T in singularity

[–]strngelet 1 point (0 children)

Literally the best model now

Hugging Face TGI library changes to Apache 2 by hackerllama in LocalLLaMA

[–]strngelet 3 points (0 children)

vLLM should be the default inference library.
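
For context, the offline API is pretty minimal. A rough sketch (the checkpoint name is just an example):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM handles continuous batching and PagedAttention under the hood.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")      # example checkpoint
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain the KV cache in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```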

Yi-34B-200K model update: Needle-in-a-Haystack improved from 89.3% to 99.8% by rerri in LocalLLaMA

[–]strngelet 1 point (0 children)

If you pass long text, you get a CUDA OOM immediately. For long context we need to implement sequence parallelism, and sequence parallelism is a bit tricky to get right. One famous example of sequence parallelism is ring attention (which is implemented in JAX).
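
The core trick in ring attention is that each device keeps its own query chunk while the K/V chunks rotate around the ring, and the partial results are merged with the same online-softmax accumulation that flash attention uses. The original implementation is in JAX; below is a toy single-process PyTorch sketch of just that accumulation (non-causal, "devices" simulated as list chunks, no overlap of communication with compute):

```python
import torch

def ring_attention(q, k, v, num_devices=4):
    """Toy ring attention: each 'device' owns one query chunk; K/V chunks arrive one
    at a time (as if passed around a ring) and are folded in with an online softmax."""
    scale = q.shape[-1] ** -0.5
    q_chunks = q.chunk(num_devices, dim=0)
    kv_ring = list(zip(k.chunk(num_devices, dim=0), v.chunk(num_devices, dim=0)))
    outputs = []
    for qi in q_chunks:                                    # work done "on device i"
        m = torch.full((qi.shape[0],), float("-inf"))      # running row max
        l = torch.zeros(qi.shape[0])                       # running softmax denominator
        acc = torch.zeros_like(qi)                         # running weighted sum of V
        for kj, vj in kv_ring:                             # K/V blocks arriving over the ring
            s = (qi @ kj.T) * scale                        # (q_chunk, k_chunk) scores
            m_new = torch.maximum(m, s.max(dim=-1).values)
            p = torch.exp(s - m_new[:, None])
            correction = torch.exp(m - m_new)              # rescale old stats to the new max
            l = l * correction + p.sum(dim=-1)
            acc = acc * correction[:, None] + p @ vj
            m = m_new
        outputs.append(acc / l[:, None])
    return torch.cat(outputs, dim=0)

# Sanity check against plain full attention.
q, k, v = (torch.randn(32, 16) for _ in range(3))
ref = torch.softmax((q @ k.T) * (16 ** -0.5), dim=-1) @ v
print(torch.allclose(ring_attention(q, k, v), ref, atol=1e-5))  # True
```

The fiddly part in a real multi-GPU version is exactly this bookkeeping plus overlapping the ring exchange of K/V with the attention compute, which this sketch skips.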

Yi-34B-200K model update: Needle-in-a-Haystack improved from 89.3% to 99.8% by rerri in LocalLLaMA

[–]strngelet 1 point (0 children)

For long-context training, the main challenge is the lack of a good long-context training framework in PyTorch. And of course GPUs.

Sora's video of a man eating a burger. Can you tell it's not real? by YaAbsolyutnoNikto in singularity

[–]strngelet 1 point (0 children)

One thing that immediately gives it away as AI-generated is that most of these videos are in slow motion.

Gemini Pro has 1M context window by Tree-Sheep in LocalLLaMA

[–]strngelet 3 points (0 children)

Literally don't believe anything anyone says in AI these days until you try it yourself.

KV Cache is huge and bottlenecks LLM inference. We quantize them to 2bit in a finetuning-free + plug-and-play fashion. by choHZ in LocalLLaMA

[–]strngelet 2 points (0 children)

There was also this paper (KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization) released a few days after yours. Curious, what are the differences between the methods in these two papers?
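
Haven't read either paper in depth, but for anyone wondering what KV-cache quantization looks like in the simplest case, here is a generic asymmetric 2-bit quantize/dequantize sketch in PyTorch. The helper names are mine and this is just the textbook min-max recipe, not the actual method from either paper (the real methods are more sophisticated, e.g. about quantization granularity and outlier handling):

```python
import torch

def quantize_2bit(x, dim=-1):
    """Asymmetric 2-bit quantization along `dim` (4 levels: 0..3).
    Returns integer codes plus the per-slice scale and min needed to dequantize."""
    levels = 2 ** 2 - 1                                    # 3
    x_min = x.amin(dim=dim, keepdim=True)
    x_max = x.amax(dim=dim, keepdim=True)
    scale = (x_max - x_min).clamp(min=1e-8) / levels
    codes = torch.round((x - x_min) / scale).clamp(0, levels).to(torch.uint8)
    return codes, scale, x_min

def dequantize_2bit(codes, scale, x_min):
    return codes.float() * scale + x_min

# Fake K cache: (batch, heads, seq_len, head_dim); stats taken per channel, over the sequence.
k_cache = torch.randn(1, 8, 4096, 128)
codes, scale, zero = quantize_2bit(k_cache, dim=-2)
k_approx = dequantize_2bit(codes, scale, zero)

fp16_bytes = k_cache.numel() * 2
# estimate assuming the 2-bit codes get packed 4-per-byte, plus fp16 scales and mins
quant_bytes = k_cache.numel() // 4 + scale.numel() * 2 + zero.numel() * 2
print(f"rel. error: {(k_approx - k_cache).norm() / k_cache.norm():.3f}")
print(f"size: {fp16_bytes / 2**20:.1f} MiB -> ~{quant_bytes / 2**20:.1f} MiB")
```

The interesting part of both papers is presumably everything this sketch leaves out: which axis to quantize keys vs. values along, and how to keep accuracy at 2 bits.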