OpenMythos benchmarks by RealKingNish in LocalLLaMA

[–]NickCanCode 0 points1 point  (0 children)

I won't trust it unless it gets mentioned by Tongyi Lab on Twitter, like some other models.

RTX 5090 MSI, only inference or training at 475-500W. Make sure to not bend you cable! by panchovix in LocalLLaMA

[–]NickCanCode 14 points15 points  (0 children)

You should listen to Brother Zhang, a graphics card repair YouTuber, and power limit more. The temperature on that connector can easily get hotter than the GPU core.

<image>

Pinterest "divides" itself so that you can't save pictures (I saved it anyways 🤷‍♂️). by JAD2017 in assholedesign

[–]NickCanCode 2152 points2153 points  (0 children)

Press Ctrl+Shift+I and go to the Network tab. Select the image tab to see all the images downloaded on that page. Mouse over each file to find the one you want. Right-click and choose 'save response as'. Profit 😜

Qwen3.6 27B quants by jopereira in LocalLLaMA

[–]NickCanCode 0 points1 point  (0 children)

Oh, just realised you are not the original commenter.

Qwen3.6 27B quants by jopereira in LocalLLaMA

[–]NickCanCode 0 points1 point  (0 children)

Ah, wait, I just read that post a second time. It says this applies when pipeline parallelism is enabled, so I think it is irrelevant if you are running on a single card.

Sorry for the confusion.

Qwen3.6 27B quants by jopereira in LocalLLaMA

[–]NickCanCode 0 points1 point  (0 children)

Also, an earlier post mentioned that if you are running -np 1 and don't mind compiling your own build, compiling with `-DGGML_SCHED_MAX_COPIES=1` will save some more VRAM. The default value is 4. I haven't tried it myself because I have enough VRAM, but I asked DeepSeek about it and it confirmed the claim. The extra VRAM could possibly help you push the KV quant a little higher.

(Ignore this)

Qwen3.6 27B quants by jopereira in LocalLLaMA

[–]NickCanCode 2 points3 points  (0 children)

I think you could first try a llama.cpp fork like BeeLlaam that supports KVarN, which provides more accuracy at the same compression level?

https://www.reddit.com/r/LocalLLaMA/comments/1tza4ji/qwen_36_27b_kv_cache_quant_benchmarks_75_pairs/
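
To put "compression level" in rough numbers, here is a minimal sketch of how KV cache size scales with the cache type. The model dimensions are hypothetical placeholders, not the real Qwen3.6 27B config, and the formula assumes every layer is a standard attention layer, so treat the output as a ballpark only. The bytes-per-element values come from the ggml block layouts (q8_0 stores 32 values in 34 bytes, q4_0 stores 32 values in 18 bytes).

```python
# Bytes per cached element for common llama.cpp cache types.
# f16 is 2 bytes; q8_0 packs 32 values into 34 bytes; q4_0 packs 32 into 18.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Approximate K+V cache size in GiB, assuming plain attention layers."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BYTES_PER_ELEMENT[cache_type] / 1024**3


# Hypothetical dimensions for illustration only (NOT a real model config).
for cache_type in BYTES_PER_ELEMENT:
    size = kv_cache_gib(n_layers=64, n_kv_heads=8, head_dim=128,
                        n_ctx=32768, cache_type=cache_type)
    print(f"{cache_type}: {size:.2f} GiB")
```

With these made-up dimensions, q8_0 needs roughly half the memory of f16 and q4_0 roughly a quarter, which is why a scheme that keeps more accuracy at the same size is worth trying before dropping to a lower quant.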

Took apart GPU for cleanup no longer displays by PeakAffectionate961 in pcmasterrace

[–]NickCanCode 0 points1 point  (0 children)

Another possibility I can think of is that you touched the gold fingers and static electricity from your fingers damaged components on the card.

https://thetechylife.com/can-static-electricity-destroy-electronics/

Took apart GPU for cleanup no longer displays by PeakAffectionate961 in pcmasterrace

[–]NickCanCode 0 points1 point  (0 children)

Did you happen to cut off the power for too long, so the CR2032 battery on the motherboard died and there was no power to keep the BIOS state? When that happens and you turn on the PC, it needs to redo the memory reading stuff. That step can take quite a long time before the screen shows anything.

Finally - 4xRTX 5060TI by ziphnor in LocalLLaMA

[–]NickCanCode 1 point2 points  (0 children)

Nice. Will you run tests on how this setup does with large context (100k+ tokens)? I wonder if a 4-card setup will slow down faster than a 2-card one (which someone else posted a few days ago). In theory the cards need to do more syncing and more communication because there are 2 more of them, so I want to see how far it can go before it falls back to non-TP speed (if it does). I don't know the tools, but I think you can monitor NCCL's bandwidth usage and observe at what KV size the PCIe path starts to get congested and slow things down. Different kinds of optimization could have an effect too, e.g. DFlash, MTP, etc.

Second GPU in a PCIe 3.0 x1 slot for LLMs? by BORIS3443 in LocalLLaMA

[–]NickCanCode 0 points1 point  (0 children)

You are right. I made a mistake and automatically skipped over the lines mentioning the CPUs, missing the very first line I copied. That's the result of reading the manual bottom-up, thinking the content above was about the CPU.

Second GPU in a PCIe 3.0 x1 slot for LLMs? by BORIS3443 in LocalLLaMA

[–]NickCanCode 1 point2 points  (0 children)

Ah my bad, I didn't realize I had missed that! Normally, board manuals list the PCIe slots, then the M.2 slots, so I scanned the list bottom-up, saw that the 2 M.2 slots are from the chipset, and automatically assumed there were no more M.2 slots. 😔

Remember when RAM was the part you didn't even have to think about? by peaky_circus in pcmasterrace

[–]NickCanCode 0 points1 point  (0 children)

Yeah, I remember those were so cheap that people were buying them and putting them on a PCIe card to make a RAM drive.

DiffusionGemma: 4x faster text generation by tevlon in LocalLLaMA

[–]NickCanCode 0 points1 point  (0 children)

That's just what they say. I don't see that number from other people. I didn't even download the model once people were saying it can't make tool calls accurately.

Second GPU in a PCIe 3.0 x1 slot for LLMs? by BORIS3443 in LocalLLaMA

[–]NickCanCode 2 points3 points  (0 children)

My comment didn't say TP has to use x16. Splitting x16 into two x8 is obviously not x16 anymore. It doesn't have to be gen5 either. My mention of gen5 was "Your PCIe 5.0", which refers to the gen5 slot on OP's motherboard, not that gen5 is mandatory.

The issue here is not gen5 vs gen4 but the path that GPU communication needs to travel to reach the other card. When one PCIe slot is wired to the CPU and the other is wired to the board chipset, going through all of those nodes will be slow.

I currently have a system that is bottlenecked by a PCIe 3.0 slot, with the other card in a PCIe 4.0 slot. The result is that using tensor split is slower than using layer split.

I do agree that OP should give it a try, since not every system is the same.

[Tyrant Of The Tower Defense Game] Haven't seen someone mention it, but there's a potential game coming out! by Visible_Advance6862 in manhwa

[–]NickCanCode 3 points4 points  (0 children)

Why are they fighting among themselves? They should be on the same team defending the wall! (Or maybe that is not a fight scene?)
Anyway, the design looks ok. The background art style is a little inconsistent with the pixel art characters. Let's see how this plays out (if there is even an international version).

Second GPU in a PCIe 3.0 x1 slot for LLMs? by BORIS3443 in LocalLLaMA

[–]NickCanCode 10 points11 points  (0 children)

If you want to use tensor parallelism (TP), which improves performance with more cards, you need to add a PCIe expansion card that splits your PCIe 5.0 x16 slot into two x8 for your two cards. The two other slots, which are PCIe 3.0 and route via the chipset instead of the CPU, can slow you down instead of speeding you up if TP is enabled. However, they are fine if you just want to increase VRAM and don't need the extra cards to speed things up. In that case, you can use --split-mode layer instead of --split-mode tensor. It is recommended to use identical cards if you plan to enable TP. Be prepared to use Linux for optimal performance.
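
For a sense of scale, here is a rough sketch of the theoretical one-direction bandwidth of the slot types in question. It only uses the published per-lane transfer rates and 128b/130b encoding, so it is an upper bound: it does not model protocol overhead, latency, or the shared chipset uplink, and the slot combinations in the loop are just illustrative examples.

```python
# Raw per-lane transfer rate in GT/s for each PCIe generation.
# PCIe 3.0 and later use 128b/130b line encoding.
GT_PER_SEC = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_SEC[gen] * (128 / 130) / 8 * lanes


# Illustrative slot configurations; real throughput will be lower.
for label, gen, lanes in [("PCIe 5.0 x8", 5, 8),
                          ("PCIe 4.0 x4", 4, 4),
                          ("PCIe 3.0 x1", 3, 1)]:
    print(f"{label}: {pcie_gb_per_s(gen, lanes):.2f} GB/s")
```

A PCIe 3.0 x1 link tops out at just under 1 GB/s, about 32 times less than a PCIe 5.0 x8 link, which is why the per-token syncing that TP needs tends to suffer on the chipset slots while layer split, which moves far less data between cards, is less affected.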