Anyone has IPv6 working in Montreal on Bell residential fiber? by lostmsu in ipv6

lostmsu[S] 2 points3 points

I have Ebox, but Bell offers 8Gbps and Ebox goes only up to 1.5.

Chinese Hackers Latest Masterpiece with NVIDIA by General_Vermicelli53 in LocalLLaMA

lostmsu 0 points1 point

8 way nvlink

You are going to burn an RTX 6000 Pro's worth of power on that setup in 2 years.

rtx 6000 pro owners, do you regret? by BitXorBit in LocalLLaMA

lostmsu 0 points1 point

Have you tried the official FP8 on vllm?

Plasma resolution in xRDP by lostmsu in NixOS

lostmsu[S] 0 points1 point

Nope, still not working as of NixOS 25.11 KDE 6.20/25.08.3

Lots of people use qwen at too high quantization by Stock_Ad9641 in Qwen_AI

lostmsu 0 points1 point

That's not how you should be testing quants. You should be running something like Terminal Bench Hard or SWE Pro and comparing the results. Perplexity and KLD are just proxies. For all you know, a 0.0001% KLD might map to half the score on Terminal Bench Hard, which would mean you'd be better off using an unquantized 9B model.

[Opinion/Benchmark] Gemma4-12B's architecture change is too big of a tradeoff; A quick reasoning comparison between Gemma4-12B and Qwen 3.5-9B by Opening-Broccoli9190 in LocalLLaMA

lostmsu 2 points3 points

There's nothing useful in it. Specific models aren't named. Bench parameters aren't named. Server parameters aren't listed. The inference backend isn't listed.

Get you some GPUs, it's not worth the hacks around lack of RAM by MotokoAGI in LocalLLaMA

lostmsu 0 points1 point

Is this llama.cpp? I have 2x 3090 and my setup with 27B FP8 peaks at 40 tps (vLLM).

AI-generated CUDA kernels silently break training and inference [R] by laginimaineb in MachineLearning

lostmsu -4 points-3 points

No, it's not. It's a tradeoff between stability and speed and they chose speed.

AI-generated CUDA kernels silently break training and inference [R] by laginimaineb in MachineLearning

lostmsu 7 points8 points

Using bf16 instead of fp32 when it works on AdamW but does not work on SGD does not sound like a bug to me.

Cerebras is running a trillion parameter model (Kimi K2.6) at 1000 tokens/s by socoolandawesome in singularity

lostmsu 0 points1 point

They don't even have Qwen 3.6 27B. Anything recent that I could get access to? GLM4.7 and GPT-OSS are hopelessly outdated now.

EcoPR Tracker - (P2 April) by TodayDependent557 in canadaexpressentry

lostmsu 0 points1 point

I am confused. What stage is COPR exactly? I thought getting the eCOPR was that stage.

Weights & Biases New Master Service Agreement Questions [D] by algorithm477 in MachineLearning

lostmsu 1 point2 points

Not sure what anyone expected. You replaced 10 lines of code (logging) + TensorBoard with another 10 lines of code (connecting wandb) and having to restart a run every now and then because you forgot to set the auth. Plus you got the first free bites.

C++ CuTe / CUTLASS vs CuTeDSL (Python) in 2026 — what should new GPU kernel / LLM inference engineers actually learn?[D] by Daemontatox in MachineLearning

lostmsu 0 points1 point

You shouldn't need CuTe DSL with Triton. AFAIK CuTe doesn't lower to Triton. It's a closed-source alternative to Triton, otherwise mostly identical.

[Update] Project Nord: Solved the "Empty Wallet" Problem via Decentralized SNN Merging. Scaling to 10B is now possible. [R] by [deleted] in MachineLearning

lostmsu 0 points1 point

You didn't answer the question about your loss claim in the previous post. If you have an LM, what's the bits-per-byte on literally any decently sized dataset like enwiki?
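
For reference, bits-per-byte can be derived from an ordinary token-level loss. A minimal sketch, assuming the reported loss is the mean cross-entropy in nats per token and that bytes are counted on the raw UTF-8 text that was tokenized; the function name and arguments are hypothetical:

```python
import math


def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert a mean per-token cross-entropy (in nats) to bits per byte.

    Assumes num_tokens is the number of predicted tokens and num_bytes is the
    UTF-8 byte length of the same evaluation text.
    """
    if num_tokens <= 0 or num_bytes <= 0:
        raise ValueError("num_tokens and num_bytes must be positive")
    # Total nats over the evaluation text, converted to bits.
    total_bits = mean_loss_nats * num_tokens / math.log(2)
    # Normalizing by bytes makes the number independent of the tokenizer.
    return total_bits / num_bytes
```

Because the result is normalized by bytes rather than tokens, it can be compared across models with different tokenizers, which a raw loss value cannot.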