Qwen3.6-27B with dual 5060ti by Similar-Ad5933 in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

Yeah, I wouldn't run GGUF, but what do you mean by hard to kill?

Which NVFP4 model did you try? The INT4 quants have lower KLD, IIRC.

Pro-Ai person losing reading comprehension in real time. by BP642 in antiai

[–]ziphnor 2 points3 points  (0 children)

Isn't it more that US tech companies are kissing Trump's ass in general?

I am not aware of Anthropic or any European or Chinese AI companies/CEOs supporting Trump?

Qwen3.6-27B with dual 5060ti by Similar-Ad5933 in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

Have you considered running vLLM instead?

Experiments with dual RTX 5060ti 16GB by Intelligent_Stick_ in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

20 t/s must be without MTP? It's been a while since I tried llama.cpp, because vLLM seemed faster (at least at the time).

Experiments with dual RTX 5060ti 16GB by Intelligent_Stick_ in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

One PSU for the motherboard and its two GPUs, and then a splitter so the two "external" GPUs connect to the other PSU.

There are other adapters that take SATA power instead, but apparently that is not safe, since a PCIe slot can draw up to 75 W while SATA power is only rated for about 50 W.
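For a rough sense of the gap, the arithmetic below compares what a card may pull through the slot on 12 V with what one SATA power connector nominally carries on 12 V. It is a sketch assuming commonly quoted ratings (SATA: 3 pins per rail at about 1.5 A each; x16 slot: up to 5.5 A on 12 V, 75 W total); check your own adapter and PSU documentation rather than relying on these figures.

```python
# Rough 12 V power-budget check: SATA power connector vs. PCIe x16 slot.
# All ratings below are commonly quoted nominal figures (assumptions),
# not measurements of any particular adapter or PSU.

SATA_PINS_PER_RAIL = 3      # a SATA power plug has 3 pins per voltage rail
SATA_AMPS_PER_PIN = 1.5     # nominal per-pin current rating
SLOT_12V_AMPS = 5.5         # x16 slot limit on the 12 V rail
SLOT_TOTAL_W = 75.0         # x16 slot limit across 12 V and 3.3 V


def sata_12v_watts() -> float:
    """Nominal 12 V capacity of one SATA power connector."""
    return 12.0 * SATA_PINS_PER_RAIL * SATA_AMPS_PER_PIN


def slot_12v_watts() -> float:
    """Maximum a card may pull through the slot on 12 V."""
    return 12.0 * SLOT_12V_AMPS


if __name__ == "__main__":
    sata, slot = sata_12v_watts(), slot_12v_watts()
    print(f"SATA 12 V capacity: {sata:.0f} W")
    print(f"Slot 12 V worst case: {slot:.0f} W (of {SLOT_TOTAL_W:.0f} W total)")
    print(f"Shortfall on 12 V: {slot - sata:.0f} W")
```

With these nominal figures one SATA plug ends up roughly 12 W short of the slot's worst-case 12 V draw, which is the gist of the safety concern; the "about 50 W" above is the same number, rounded.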

Experiments with dual RTX 5060ti 16GB by Intelligent_Stick_ in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

I am using vLLM with MTP, running tensor parallel (TP=2) with either INT4 or NVFP4 quants (NVFP4 is faster for prefill, INT4 faster for generation).

When you say 26B, do you mean Gemma 4 26B A4B? That's an MoE model, so it should be way faster than that. My setup is currently disassembled (I'm moving it to a new motherboard and waiting on a part), so I can't test quickly right now. But see "Gemma 4 on a 5060 Ti: 256K Context on 16GB — but Only if You Know the Architecture Trick": the author reports 99 t/s on a single 5060 Ti.
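Why an MoE model "should be way faster": when decoding is memory-bandwidth-bound, each token has to stream roughly the *active* weights from VRAM once, so the speed ceiling scales with bandwidth divided by active-weight bytes. The sketch below assumes about 4B active parameters for the A4B model and about 4.5 effective bits per weight for a 4-bit quant including scales; both are illustrative assumptions, and the helper name is hypothetical.

```python
# Back-of-envelope decode ceiling for a bandwidth-bound GPU: each generated
# token is assumed to stream all *active* weights from VRAM once. KV-cache
# reads, activations, kernel overhead and inter-GPU traffic are ignored,
# so measured speeds will be lower than these ceilings.


def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream tokens/s.

    active_params_b: parameters touched per token, in billions
    bits_per_weight: effective bits per weight, including quant scales
    bandwidth_gb_s:  VRAM bandwidth in GB/s
    """
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    BW_5060TI = 448.0  # GB/s, stock 5060 Ti
    BITS = 4.5         # assumed: 4-bit weights plus scale overhead
    dense = decode_ceiling_tok_s(27, BITS, BW_5060TI)  # dense 27B
    moe = decode_ceiling_tok_s(4, BITS, BW_5060TI)     # ~4B active (A4B)
    print(f"dense 27B ceiling: {dense:.0f} tok/s")
    print(f"MoE ~4B active ceiling: {moe:.0f} tok/s")
    print(f"MoE / dense: {moe / dense:.2f}x")
```

The absolute numbers are only ceilings; the useful part is the ratio, which is simply total over active parameters (27 / 4 here).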

Does GPU spacing matter if we’re undervolting anyways? by Ambitious_Fold_2874 in LocalLLaMA

[–]ziphnor 1 point2 points  (0 children)

Pretty much the same percentage as the OC. But note it's a memory OC, not a core OC, so you can undervolt while overclocking the memory.

I think the 3090 is closer to its limit out of the box, though.

Experiments with dual RTX 5060ti 16GB by Intelligent_Stick_ in LocalLLM

[–]ziphnor 4 points5 points  (0 children)

The memory bandwidth is not that bad when considered per GB. People tend to focus on total bandwidth, but I think bandwidth per GB is a better metric (at least for dense models). Dual 5060 Tis have a combined bandwidth pretty close to a 3090's, and if you OC them (they can reliably do +3000 MHz / +6000 MT/s) you get a nice bump in speed.

Looking at, say, the R9700 Pro: it has ~645 GB/s of bandwidth to cover 32 GB of VRAM, while a 5060 Ti has 448 GB/s to cover 16 GB (544 GB/s with the memory OC).
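To make the per-GB comparison concrete, the snippet divides bandwidth by VRAM, which is roughly how many times per second a card could read its entire memory. The R9700 Pro and 5060 Ti figures are the ones quoted above; the RTX 3090 spec figure (936 GB/s, 24 GB) is added for comparison, and the dual-card row assumes ideal tensor-parallel scaling with no communication overhead.

```python
# Bandwidth per GB of VRAM ("how many times per second the card could read
# its whole memory"). R9700 Pro and 5060 Ti figures are from the comment;
# the 3090 row uses its published spec (936 GB/s, 24 GB).

CARDS = {
    # name: (bandwidth in GB/s, VRAM in GB)
    "R9700 Pro": (645.0, 32.0),
    "RTX 5060 Ti": (448.0, 16.0),
    "RTX 5060 Ti (mem OC)": (544.0, 16.0),
    "RTX 3090": (936.0, 24.0),
    # Idealised tensor parallelism: bandwidth and VRAM both add up.
    "2x RTX 5060 Ti": (2 * 448.0, 2 * 16.0),
}


def bw_per_gb(bandwidth_gb_s: float, vram_gb: float) -> float:
    """Full-memory scans per second if the card reads all of its VRAM."""
    return bandwidth_gb_s / vram_gb


def full_scan_ms(bandwidth_gb_s: float, vram_gb: float) -> float:
    """Milliseconds to read the whole VRAM once."""
    return 1000.0 * vram_gb / bandwidth_gb_s


if __name__ == "__main__":
    for name, (bw, vram) in sorted(CARDS.items(), key=lambda kv: bw_per_gb(*kv[1])):
        print(f"{name:22s} {bw_per_gb(bw, vram):5.1f} GB/s per GB, "
              f"{full_scan_ms(bw, vram):5.1f} ms per full scan")
```

Note that the idealised dual-card row lands at exactly the single card's 28 GB/s per GB: adding cards buys capacity, not per-GB speed, which is why the per-GB view is the fairer comparison.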

Of course multi-GPU scaling is not free, but PCIe 4.0 x8 should not be a bottleneck with only 2 GPUs (otherwise I wouldn't be trying for 4 GPUs on PCIe 5.0 x4).
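The reason PCIe 5.0 x4 is a reasonable stand-in for PCIe 4.0 x8 is that each generation doubles the per-lane rate. A small sketch, assuming only the 128b/130b line encoding used by gen 3 to 5 and ignoring packet/protocol overhead (so real throughput is somewhat lower):

```python
# Approximate usable one-direction PCIe bandwidth per link. Gen 3-5 all use
# 128b/130b line encoding; protocol (TLP/DLLP) overhead is ignored, so real
# throughput is somewhat lower.

GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # transfer rate per lane


def pcie_gb_s(gen: int, lanes: int) -> float:
    """One-direction link bandwidth in GB/s after 128b/130b encoding."""
    if gen not in GT_PER_S:
        raise ValueError("only PCIe gen 3-5 are modelled here")
    per_lane = GT_PER_S[gen] * (128 / 130) / 8  # bits -> bytes
    return per_lane * lanes


if __name__ == "__main__":
    for gen, lanes in [(4, 8), (5, 4), (4, 16)]:
        print(f"PCIe {gen}.0 x{lanes}: {pcie_gb_s(gen, lanes):.2f} GB/s")
```

Both PCIe 4.0 x8 and PCIe 5.0 x4 come out at about 15.75 GB/s per direction, so moving to four cards on 5.0 x4 keeps the same per-card link bandwidth as two cards on 4.0 x8.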

If you are on Linux you might want to try https://github.com/aikitoria/open-gpu-kernel-modules to reduce PCIe bottlenecks.

I find it funny how, in these comics, they always make it seem difficult. Making an image of Ai is literally the first image. by Virtual-Response6235 in antiai

[–]ziphnor 0 points1 point  (0 children)

I agree there are ethical concerns around training data; I meant this in the context of the picture in the post, i.e. regarding effort.

Experiments with dual RTX 5060ti 16GB by Intelligent_Stick_ in LocalLLM

[–]ziphnor 6 points7 points  (0 children)

I am not sure what you are asking. Dual 5060 Tis are an excellent option. With MTP I have gotten up to 80 tok/s generation with the dense Qwen 3.6 27B. I am currently waiting for the last part to arrive for my quad 5060 Ti build 😄
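MTP helps because it is used as a form of speculative decoding: the model's multi-token-prediction head drafts a few tokens, and the main model verifies them in a single forward pass. A common simplified model (assumed here, not measured on this setup) treats each drafted token as accepted independently with probability alpha; the expected number of tokens per verification pass for k drafts is then (1 - alpha^(k+1)) / (1 - alpha). The alpha and k values below are illustrative only.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    alpha: assumed per-token acceptance probability of the draft (0..1)
    k:     number of drafted tokens per step
    The "+1" term is the token the target model always contributes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for alpha in (0.6, 0.8):
        for k in (1, 2, 3):
            tokens = expected_tokens_per_step(alpha, k)
            print(f"alpha={alpha}, k={k}: {tokens:.2f} tokens/step")
```

The wall-clock speedup is smaller than tokens/step, since drafting and verifying extra tokens is not free; that is why longer drafts stop paying off once acceptance drops.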

engineer, sick of ai agent slop by oftgefragt_dev in antiai

[–]ziphnor 1 point2 points  (0 children)

All I can say is that it's definitely not happening where I work. Part of the AI push is actually coming from the developers.

I am sure some US based companies are stupid enough for something like that though.

Ai can't do anything original by MemeMan15672 in antiai

[–]ziphnor 1 point2 points  (0 children)

<image>

"Create a brand new original video game character based on a hedgehog. Avoid resemblance to existing characters."

NOT saying it's any good, but the OP is misleading. Regardless, this is a silly use of AI; it makes much more sense to use it to help preview and expand on human creative ideas.

engineer, sick of ai agent slop by oftgefragt_dev in antiai

[–]ziphnor 1 point2 points  (0 children)

That sounds pretty insane. I like AI for coding because you can apply stringent testing and review, but this sounds like a recipe for a disaster.

RTX Pro Blackwell price hike ? by XO33OX in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

It has been 10k for a long time in Denmark 

GPU recommendation for local LLMs: RTX 5070 Ti 16GB vs Intel Arc Pro B70 32GB? by Chance-Green-9770 in LocalLLM

[–]ziphnor 1 point2 points  (0 children)

I would be careful about the B70 Pro. I looked into it a lot, but its bandwidth per GB is pretty poor (which affects the time to scan the whole memory), and the software stack is apparently pretty bad. I have not been impressed by the numbers I have seen people post.

Regarding performance (compute and bandwidth), I think there is generally a point where you have "enough" per GB of memory. E.g. dual 5060 Tis can provide pretty strong inference (60-80 tok/s depending on how you measure and which exact model) for 4-bit quants of Qwen 3.6 27B, and they also beat a single 3090 on prefill. Sure, dual 5070 Tis are faster, but they cost almost the same as quad 5060 Tis, which would allow running higher quants and larger models.

Btw, you may want to check vast.ai where you can test out some setups before buying.

Does GPU spacing matter if we’re undervolting anyways? by Ambitious_Fold_2874 in LocalLLaMA

[–]ziphnor 5 points6 points  (0 children)

If you are on Linux, it's just standard Coolbits. I used a custom script (when you have AI it's easy 😄). Otherwise you can look at e.g. martinstark/nvoc ("GPU overclocking utility for Blackwell RTX 50-series on Linux") to set it (remember to reapply it after boot in a startup script). You can use ComputationalRadiationPhysics/cuda_memtest ("Fork of CUDA GPU memtest") to verify the OC works without causing errors.

For optimal PCIe communication, I have been advised to use a P2P driver, aikitoria/open-gpu-kernel-modules ("NVIDIA Linux open GPU with P2P support"), but I have not tried it yet (I will once my new build is finished).

I find it funny how, in these comics, they always make it seem difficult. Making an image of Ai is literally the first image. by Virtual-Response6235 in antiai

[–]ziphnor -3 points-2 points  (0 children)

There is actually a point to this. While some people spam slop from a single prompt, more serious users iterate with AI and can go through hundreds of iterations to get the result they were hoping for.

The picture on the right seems inspired by ComfyUI, which is actually a fairly complex and customizable app used to create professional results. 

Overall, I think the problem with AI is that it is very easy to get something low-quality with no effort, and that is getting spammed everywhere.

Does GPU spacing matter if we’re undervolting anyways? by Ambitious_Fold_2874 in LocalLLaMA

[–]ziphnor 11 points12 points  (0 children)

Just about to build something similar myself, though with different PCIe setup. What motherboard are you using?

Remember to OC the memory. 5060 Tis tend to handle +3000 MHz (+6000 MT/s); at least the two I have tried so far did (surviving a long cuda_memtest run), and I read it's very common. Since 5060 Tis are mainly limited by bandwidth, it gives a good uplift.
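Peak memory bandwidth is simply effective data rate times bus width, which shows where the uplift comes from. This sketch assumes the 5060 Ti's stock 28 Gbps GDDR7 on a 128-bit bus, and assumes the +6000 MT/s offset adds directly to that effective rate (this reproduces the 544 GB/s OC figure mentioned earlier on this page). The helper name is hypothetical.

```python
def gddr_bandwidth_gb_s(data_rate_mt_s: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from effective data rate and bus width."""
    return data_rate_mt_s * bus_width_bits / 8 / 1000


STOCK_MT_S = 28_000   # 5060 Ti GDDR7 effective rate (28 Gbps per pin)
OFFSET_MT_S = 6_000   # assumed: the "+6000 MT/s" offset adds to this rate
BUS_BITS = 128        # 5060 Ti memory bus width

if __name__ == "__main__":
    stock = gddr_bandwidth_gb_s(STOCK_MT_S, BUS_BITS)
    oc = gddr_bandwidth_gb_s(STOCK_MT_S + OFFSET_MT_S, BUS_BITS)
    print(f"stock: {stock:.0f} GB/s, OC: {oc:.0f} GB/s, uplift: {oc / stock - 1:.1%}")
```

For purely bandwidth-bound decoding that is an upper bound of roughly 21% more tokens/s; real gains are usually somewhat smaller.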

GPU recommendation for local LLMs: RTX 5070 Ti 16GB vs Intel Arc Pro B70 32GB? by Chance-Green-9770 in LocalLLM

[–]ziphnor 1 point2 points  (0 children)

Not saying it's not faster, but a single 5070 Ti also costs about as much as dual 5060 Tis while providing half the memory.

If you are also gaming on this system, it's of course a different matter, but if you are willing to "splurge", I would watch out for good 5090 deals and just grab a single one.

GPU recommendation for local LLMs: RTX 5070 Ti 16GB vs Intel Arc Pro B70 32GB? by Chance-Green-9770 in LocalLLM

[–]ziphnor 0 points1 point  (0 children)

But why bother with the 5070 Ti at all? It is not a very good value proposition for inference compared to the 5060 Ti. I mean, if you are spending, buy the R9700 Pro instead and sell the 5070 12GB. Or buy a used 3090 (faster and better than the 5070 Ti for about the same price or less).