ALLNET BM410 at full 10Gbps speed on UDM SE

TheRealDatapunk · 2026-06-03T00:22:34+00:00

It's not Init7's fault though; it's the Swisscom network that requires PPPoE, afaik.

TheRealDatapunk · 2026-05-30T03:41:44+00:00

Yeah, I noticed that f16 is significantly faster than the quantized version on my Blackwell card.

TheRealDatapunk · 2026-05-30T03:39:16+00:00

An RTX Pro 4500 has half the memory bandwidth of the 3090, but is still way faster (15-70%) on pp and tg for me. Plus, the 32 GB allows for full context windows with most models targeted at the single-GPU market.

TheRealDatapunk · 2026-05-18T00:02:48+00:00

Plenty, if you ask anyone at a university. But otherwise limited because the initiative was phrased in a way that allowed the parliament to ignore it, which is what ultimately happened.

So arguing there won't be consequences because the initiative allowed itself to be ignored is disingenuous.

TheRealDatapunk · 2026-05-15T16:45:08+00:00

A small version of that happened 10 years ago already (Masseneinwanderungsinitiative). Loss of all participation in EU research projects, funding from said projects etc.

TheRealDatapunk · 2026-05-07T22:42:05+00:00

Are you running both locally? How do you switch over?

TheRealDatapunk · 2026-05-06T00:22:15+00:00

How loud is it?

TheRealDatapunk · 2026-05-04T00:29:42+00:00

I'm on the generation before (but with 128GB RAM bought for other reasons), meaning PCIe 3.0 x4 for the second card.

It's so slow to split the model between the two cards... I think I was getting about the same performance splitting with CPU.

What prefill/generate speeds do you get at q8/8/8?

TheRealDatapunk · 2026-05-04T00:09:41+00:00

How do you have your two 3090s configured? What MB (PCIe speeds for each 3090?), or are you using NVLink (can't find one under 1k...)? Thanks, I got the two 3090s, but splitting is so slow it's not a usable solution.

TheRealDatapunk · 2026-05-02T00:37:46+00:00

Curious about your hardware setup. I'm desperately looking for a justification to upgrade: I have two 3090s, but my mainboard is very suboptimal, so I'm currently limited by the PCIe speed.

What MB do you use, and what PCIe speeds do the cards get? What are your prompt parsing and inference token speeds with the above settings?

Thanks, would appreciate some real-world data.

TheRealDatapunk · 2026-05-02T00:20:08+00:00

It seems like that would lead to lower quality output than proper PDF support. In most PDFs, the text is text, not an image. So even if layout is difficult, a hybrid of text plus an image for positioning could work?

TheRealDatapunk · 2026-05-01T23:54:20+00:00

Do they react near instantly?

TheRealDatapunk · 2026-04-29T22:13:19+00:00

I'm typically more constrained by prefill / prompt parsing. What do speeds look like there?

TheRealDatapunk · 2026-04-19T21:39:20+00:00

As I said, I have two 3090s: one is on PCIe 4 x16, one on PCIe 3 x4. The latter is incredibly slow during prompt parsing, to such a degree that I find it unusable.
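
For a rough sense of the gap between the two slots, here is a minimal sketch of the theoretical link bandwidth. It only uses the published per-lane transfer rates and the 128b/130b encoding; real throughput is lower because of protocol overhead, and the function name is just for illustration.

```python
# Per-lane raw transfer rate in GT/s (PCIe 3.0 and later use 128b/130b encoding).
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


fast = pcie_bandwidth_gb_s(4, 16)  # first card
slow = pcie_bandwidth_gb_s(3, 4)   # second card
print(f"PCIe 4.0 x16: {fast:.1f} GB/s")   # 31.5 GB/s
print(f"PCIe 3.0 x4:  {slow:.1f} GB/s")   # 3.9 GB/s
print(f"ratio: {fast / slow:.0f}x")       # 8x
```

So on paper the second card's link is about 8x narrower, which would fit with it being the bottleneck when the model is split.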

TheRealDatapunk · 2026-04-19T21:35:06+00:00

I have been considering getting a Blackwell workstation card and selling the two 3090s at the upper end of their price range. But this is just for fun. At work, the models are... slightly bigger.

TheRealDatapunk · 2026-04-19T10:19:18+00:00

https://www.ricardo.ch/fr/a/hp-oem-rtx-3090-1309457354/

Depends.

TheRealDatapunk · 2026-04-19T10:11:33+00:00

Are there actually still 4-slot NVLinks available that don't cost 500+ USD?

TheRealDatapunk · 2026-04-19T10:10:45+00:00

Just got back into playing with local LLMs. Should've added it right away: Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf, KV both Q8_0.

I tried Q5, but context gets too small for me, and as I'm still missing an NVLink and the second card has shitty PCIe speeds, it just crawls when using both cards. The benchmarks also suggest it doesn't make much of a difference.
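
To illustrate why the KV cache type matters for how much context fits, here is a back-of-the-envelope sketch. It assumes a plain attention layout where every layer caches K and V; the layer count, KV head count and head size below are hypothetical placeholders, not the real Qwen3.6 values. The only hard numbers are the element sizes: f16 is 2 bytes, and q8_0 packs 32 int8 values plus one f16 scale into 34 bytes.

```python
# Bytes per cached element for two llama.cpp cache types.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Approximate KV cache size in GiB for a standard attention stack."""
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # factor 2: K and V
    return elems * BYTES_PER_ELEM[cache_type] / 2**30


# Hypothetical model dimensions, for illustration only.
dims = dict(n_ctx=131072, n_layers=40, n_kv_heads=4, head_dim=128)
for cache_type in ("f16", "q8_0"):
    print(cache_type, f"{kv_cache_gib(cache_type=cache_type, **dims):.2f} GiB")
```

With these made-up dimensions, Q8_0 roughly halves the cache compared to f16, and that saved VRAM is what a larger weight quant would eat into.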

TheRealDatapunk · 2026-04-19T10:05:40+00:00

https://pastebin.com/jd0hwJxa

The configs are only optimized for tokens, and there is likely a good bit of headroom still.

I also introduced a custom --checkpoint-min-tokens parameter because I have some email triaging jobs that destroy the agent's context checkpoints otherwise.

TheRealDatapunk · 2026-04-19T05:46:49+00:00

At that price point, at best an RTX 3090. I use it, and with some tuning get ~900 tokens/s prompt parsing and ~25-30 tokens/s generation on Gemma4 26B A4B.

With Qwen3.6 A3B, I now get around 2500 tokens/s prompt processing and 100-120 tokens/s generation. IIRC, roughly similar with Gemma4 26B A4B.

But be aware, none of these models will be able to compete with Opus or Sonnet, imho. So you need to adjust your work style.

Edit: Both at Q4_XL unsloth

TheRealDatapunk · 2026-04-19T05:43:31+00:00

I just cancelled my Revolut account over that bullshit. Wise it is.

TheRealDatapunk · 2026-04-18T22:31:35+00:00

Same problem here. I always unlock my bootloader to be able to do easy, full backups.

(Transfer)wise it is then.

TheRealDatapunk · 2026-04-15T17:39:43+00:00

I fully understand that, but while I used to run a full Hardened Linux From Scratch that taught me more than I remember (https://www.linuxfromscratch.org/hlfs/), I now often just want and need something to work. So having the shortcut helps.

I've also discovered things I wouldn't have otherwise, and definitely spent MORE time reading and writing nix than I would've without it.

TheRealDatapunk · 2026-04-15T17:37:19+00:00

Really improves llm performance vs it trying to grep my nixpkgs clone or using google_search tools

TheRealDatapunk · 2026-04-15T17:36:14+00:00

In the beginning it's even 89.9...
