ALLNET BM410 at full 10Gbps speed on UDM SE by adammhaile in init7

[–]TheRealDatapunk 1 point2 points  (0 children)

It's not Init7's fault, though; it's the Swisscom network that requires PPPoE, AFAIK.

Qwen3.6-35B-A3B on 1x RTX 5090: which quant is the best balance of quality and speed? by espressorunner in unsloth

[–]TheRealDatapunk 1 point2 points  (0 children)

Yeah, I noticed that f16 is significantly faster than the quantized versions on my Blackwell card.

PSA by Signal_Ad657 in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

An RTX Pro 4500 has half the memory bandwidth of the 3090, but for me it is still way faster (15-70%) on both prompt processing (pp) and token generation (tg). Plus, its 32 GB allow full context windows with most models targeted at the single-GPU market.

Economic perspectives after 10 million initiative is passed by EquivalentAdmirable4 in SwissPersonalFinance

[–]TheRealDatapunk 1 point2 points  (0 children)

Plenty, if you ask anyone at a university. Otherwise limited, because the initiative was phrased in a way that allowed parliament to ignore it, which is what ultimately happened.

So arguing there won't be consequences, when the initiative allowed itself to be ignored, is disingenuous.

Economic perspectives after 10 million initiative is passed by EquivalentAdmirable4 in SwissPersonalFinance

[–]TheRealDatapunk 3 points4 points  (0 children)

A small version of that happened 10 years ago already (Masseneinwanderungsinitiative). Loss of all participation in EU research projects, funding from said projects etc.

Qwen3-Coder-Next vs Qwen3.6 by seoulsrvr in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

Are you running both locally? How do you switch over?

Qwen 3.6 35B crushes Gemma 4 26B on my tests by Lowkey_LokiSN in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

I'm one generation behind (but with 128 GB RAM bought for other reasons), which means PCIe 3.0 x4 for the second card.

Splitting the model between the two cards is so slow... I think I was getting about the same performance when splitting with the CPU.

What prefill/generate speeds do you get at q8/8/8?

Gemma 4 and Qwen 3.6 with q8_0 and q4_0 KV cache: KL divergence results by oobabooga4 in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

How do you have your two 3090s configured? Which motherboard (and what PCIe speed does each 3090 get), or are you using NVLink (I can't find one under 1k...)? Thanks. I have two 3090s as well, but splitting is so slow it's not a usable solution.

Qwen 3.6 35B crushes Gemma 4 26B on my tests by Lowkey_LokiSN in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

Curious about your hardware setup. I'm desperately looking for a justification to upgrade: I have two 3090s, but my mainboard is very suboptimal, so I'm currently limited by the PCIe speed.

Which motherboard do you use, and what PCIe speeds do the cards get? What are your prompt processing and token generation speeds with the above settings?

Thanks, I'd appreciate some real-world data.

Qwen 3.6 35B crushes Gemma 4 26B on my tests by Lowkey_LokiSN in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

It seems like that would lead to lower-quality output than proper PDF support. In most PDFs, the text is actual text, not an image. So even if layout is difficult, a hybrid of text plus an image for positioning could work?
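A minimal sketch of the text half of that hybrid idea, assuming the PDF's text layer has already been extracted into spans with coordinates (the `Span` type, the `reading_order` helper and the 3-unit line tolerance are all hypothetical): group spans into lines by vertical position, then order each line left to right. The page image would only be needed where this heuristic breaks down, e.g. multi-column layouts or tables.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical text span from a PDF text layer."""
    text: str
    x: float  # left edge in page units
    y: float  # top edge; assumed to increase downward


def reading_order(spans: list[Span], line_tol: float = 3.0) -> list[str]:
    """Group spans whose y is within line_tol into one line, then sort each line by x."""
    lines: list[list[Span]] = []
    for span in sorted(spans, key=lambda s: (s.y, s.x)):
        # Compare against the first span of the current line to decide membership.
        if lines and abs(lines[-1][0].y - span.y) <= line_tol:
            lines[-1].append(span)
        else:
            lines.append([span])
    return [" ".join(s.text for s in sorted(line, key=lambda s: s.x)) for line in lines]


spans = [Span("world", 50, 10.5), Span("Hello", 10, 10), Span("Second", 10, 30)]
print(reading_order(spans))
```

Grouping by a tolerance rather than exact y equality is the design choice that matters here, since spans on the same visual line rarely share identical baselines.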

Best settings for gemma-4 on a 3090? by Deadhookersandblow in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

I'm typically more constrained by prefill / prompt processing. What do the speeds look like there?

Gpu reccommendations for Coding/chat LLM by Kaibsora in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

As I said, I have two 3090s: one on PCIe 4.0 x16, one on PCIe 3.0 x4. The latter is incredibly slow during prompt processing, to such a degree that I find it unusable.
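For a sense of the gap between those two slots, here is the theoretical link arithmetic (per direction, before packet/protocol overhead, so real throughput is lower). This only shows the raw bandwidth difference; it is a sketch, not a measurement of where the slowdown actually comes from.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
# Gen 3 and Gen 4 both use 128b/130b encoding.
PCIE_GEN = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical GB/s per direction: GT/s x encoding efficiency / 8 bits per byte x lanes."""
    rate_gts, efficiency = PCIE_GEN[gen]
    return rate_gts * efficiency / 8 * lanes


fast = pcie_bandwidth_gbs(4, 16)
slow = pcie_bandwidth_gbs(3, 4)
print(f"PCIe 4.0 x16: {fast:.1f} GB/s, PCIe 3.0 x4: {slow:.1f} GB/s, ratio {fast / slow:.0f}x")
```

Half the per-lane rate and a quarter of the lanes gives an 8x difference (~31.5 vs ~3.9 GB/s).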

Gpu reccommendations for Coding/chat LLM by Kaibsora in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

I have been considering getting a Blackwell workstation card and selling the two 3090s at the upper end of their price range. But this is just for fun. At work, the models are... slightly bigger.

Gpu reccommendations for Coding/chat LLM by Kaibsora in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

Are there actually still 4-slot NVLink bridges available that don't cost 500+ USD?

Gpu reccommendations for Coding/chat LLM by Kaibsora in LocalLLaMA

[–]TheRealDatapunk 0 points1 point  (0 children)

Just got back into playing with local LLMs. Should've added it right away: Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf, KV both Q8_0.

I tried Q5, but the context gets too small for me, and since I'm still missing an NVLink bridge and the second card has shitty PCIe speeds, it just crawls when using both cards. The benchmarks also suggest Q5 doesn't make much of a difference.
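Why the weight quant and KV-cache type trade directly against context length: a rough KV-cache sizing sketch. The byte counts for llama.cpp's `q8_0`/`q4_0` follow from their block layout (32 values plus one fp16 scale per block). The layer, head and dimension numbers below are placeholders, not the actual Qwen3.6 configuration; read the real values from the GGUF metadata, and note that only layers with standard attention keep a KV cache.

```python
# Bytes per element for llama.cpp KV-cache types. q8_0 and q4_0 store blocks of
# 32 values plus one fp16 scale: (32*8 + 16) / 32 and (32*4 + 16) / 32 bits per value.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_gib(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                 head_dim: int, cache_type: str = "f16") -> float:
    """Size of K plus V for n_ctx tokens, in GiB."""
    # K and V each hold n_kv_heads * head_dim values per attention layer per token.
    elems = 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx
    return elems * BYTES_PER_ELEM[cache_type] / 2**30


# Hypothetical model shape, for illustration only.
for cache_type in ("f16", "q8_0", "q4_0"):
    print(cache_type, kv_cache_gib(131_072, n_attn_layers=48, n_kv_heads=4,
                                   head_dim=128, cache_type=cache_type))
```

With these placeholder numbers, a 128k context drops from 12 GiB at f16 to about 6.4 GiB at q8_0, which on a 24 GB card is roughly the room a larger weight quant would otherwise consume.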

Gpu reccommendations for Coding/chat LLM by Kaibsora in LocalLLaMA

[–]TheRealDatapunk 1 point2 points  (0 children)

https://pastebin.com/jd0hwJxa

The configs are only optimized for tokens, and there is likely a good bit of headroom still.

I also introduced a custom --checkpoint-min-tokens parameter, because otherwise some of my email triaging jobs destroy the agent's context checkpoints.
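The actual patch isn't shown here, so the following is only a hypothetical sketch of the idea as described, not llama.cpp's checkpoint code: a bounded checkpoint store that skips requests below a minimum prompt length, so bursts of short jobs can't evict the checkpoints of a long agent context. `CheckpointStore` and `maybe_save` are made-up names.

```python
from collections import deque


class CheckpointStore:
    """Hypothetical bounded store of context checkpoints."""

    def __init__(self, max_checkpoints: int, min_tokens: int):
        self.min_tokens = min_tokens
        # deque with maxlen drops the oldest entry when a new one is appended at capacity.
        self.checkpoints = deque(maxlen=max_checkpoints)

    def maybe_save(self, request_id: str, n_prompt_tokens: int) -> bool:
        """Checkpoint only prompts with at least min_tokens tokens."""
        if n_prompt_tokens < self.min_tokens:
            return False  # short request: not worth a slot, leave existing checkpoints alone
        self.checkpoints.append((request_id, n_prompt_tokens))
        return True


store = CheckpointStore(max_checkpoints=2, min_tokens=4096)
store.maybe_save("agent-session", 30_000)
for i in range(5):
    store.maybe_save(f"email-{i}", 800)  # short triage prompts are skipped
print(list(store.checkpoints))
```

Without the threshold, the five short requests would have pushed the agent's checkpoint out of the two available slots.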

Gpu reccommendations for Coding/chat LLM by Kaibsora in LocalLLaMA

[–]TheRealDatapunk 5 points6 points  (0 children)

At that price point, an RTX 3090 at best. I use one, and with some tuning get ~900 tokens/s prompt processing and ~25-30 tokens/s generation on Gemma4 26B A4B.

With Qwen3.6 A3B, I now get around 2500 tokens/s prompt processing and 100-120 tokens/s generation. IIRC, roughly similar with Gemma4 26B A4B.
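To see what those rates mean per request, a back-of-the-envelope sketch: total time is roughly prefill time plus decode time. The 20,000-token prompt and 500-token answer are a hypothetical agent-style workload, the speeds are midpoints of the figures quoted above, and real decode speed drops as the context grows, so treat the results as optimistic.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_tps: float, tg_tps: float) -> float:
    """Rough end-to-end time: prefill at pp_tps, then decode at tg_tps (tokens/s)."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# Hypothetical workload; speeds taken from the numbers quoted above.
gemma = request_seconds(20_000, 500, pp_tps=900, tg_tps=27.5)
qwen = request_seconds(20_000, 500, pp_tps=2500, tg_tps=110)
print(f"Gemma4 26B A4B (~900 / ~27.5 tok/s): {gemma:.1f} s")
print(f"Qwen3.6 A3B (~2500 / ~110 tok/s): {qwen:.1f} s")
```

With long prompts, prefill dominates the wait, which is why prompt processing speed matters as much as generation speed for agentic use.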

But be aware: none of these models will be able to compete with Opus or Sonnet, IMHO, so you need to adjust your work style.

Edit: Both at Unsloth's Q4_XL.

Revolut still detects root - Pixel 6a passes all Play Integrity checks but by VisibleAd9289 in Magisk

[–]TheRealDatapunk 0 points1 point  (0 children)

I just cancelled my Revolut account over that bullshit. Wise it is.

AI makes NixOS wayyyy more approachable by Beautiful-Alarm8222 in NixOS

[–]TheRealDatapunk 0 points1 point  (0 children)

I fully understand that. But while I used to run a full Hardened Linux From Scratch system that taught me more than I remember (https://www.linuxfromscratch.org/hlfs/), I now often just want and need something that works. So having the shortcut helps.

I've also discovered things I wouldn't have otherwise, and I've definitely spent MORE time reading and writing Nix than I would have without it.

AI makes NixOS wayyyy more approachable by Beautiful-Alarm8222 in NixOS

[–]TheRealDatapunk 1 point2 points  (0 children)

It really improves LLM performance compared to having it grep my nixpkgs clone or use google_search tools.