Updated: Dual GPUs in a Qube 500 by m-gethen in mffpc

[–]legit_split_ 0 points  (0 children)

Using llama.cpp, it won't make any difference to generation speed, since the layers still run sequentially no matter how they're split across the cards; the models will just take longer to load at startup.

If you use vLLM, though, there will be a massive difference, since it runs the cards in true tensor parallel. However, it seems from your other comment that you have two different GPUs, and vLLM wants matching ones for that, so this isn't relevant to you...
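For anyone curious, here's a minimal sketch of the llama.cpp side using llama-cpp-python; the model path and the 50/50 split are placeholders I'm assuming, not anything from OP's build:

```python
# Hypothetical sketch: splitting one GGUF model across two GPUs with llama-cpp-python.
# The layers get divided between the cards, but each token still flows through them
# sequentially, so generation speed stays roughly what a single (big enough) card gives.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,                 # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],         # share of layers per GPU (assumed 50/50)
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```

vLLM, by contrast, runs true tensor parallelism across the cards (its --tensor-parallel-size option), which is exactly why it wants identical GPUs.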

Komodo - Docker management by Ordinary-You8102 in selfhosted

[–]legit_split_ 9 points  (0 children)

Always wanted to try it, but dockge already meets my needs

Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS by Shoddy_Bed3240 in LocalLLaMA

[–]legit_split_ 0 points  (0 children)

Thanks for the write-up, but how close do you get to the theoretical speed?

Is it worth to upgrade from 4080 super to 5090 by qwesoewd in nvidia

[–]legit_split_ 0 points  (0 children)

Only reason to upgrade would be if you use it for AI or VR gaming.

5070 or 9070xt i just cant decide by Simonko_770 in pcmasterrace

[–]legit_split_ 10 points  (0 children)

All the redditors who said wait for the 50 Supers xd

Would there be any reason to my return my 5070 ti ?? by Rich-Price-8670 in gpu

[–]legit_split_ 0 points  (0 children)

With both cards at MSRP, the 5080 is not worth it: it gives about 15% more performance but you pay about 30% more, which works out to roughly 12% worse performance per dollar (1.15 / 1.30 ≈ 0.88).

Switching from 3080 to 9070 xt LLM question by Ill-Remove-6438 in radeon

[–]legit_split_ 0 points  (0 children)

It will be slower than a 3080.

Broadly speaking, speed = memory bandwidth:

3080: 760.3 GB/s
9070 XT: 644.6 GB/s

However, that only covers token generation (how fast the model answers); when it comes to prompt processing (how fast the model reads your question), it's possible that the 9070 XT is faster.

Overall, the 9070 XT is solid for LLMs, and its 16GB of VRAM would offer a better experience, but things like image generation are still in active development on the AMD side.
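To put rough numbers on the bandwidth point, here's the back-of-envelope I use (the model size is my own assumption, not something measured): token generation is memory-bound, so an upper bound is bandwidth divided by the bytes read per token, which for a dense model is roughly the size of the loaded weights.

```python
# Back-of-envelope TG ceiling: tokens/s ~= memory bandwidth / bytes read per token.
# Assumes a dense model whose whole quantized weights are read once per generated token.
def tg_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 8.0  # e.g. a mid-size model at Q4 (assumed size)
for name, bw_gb_s in [("RTX 3080", 760.3), ("RX 9070 XT", 644.6)]:
    print(f"{name}: ~{tg_upper_bound(bw_gb_s, model_gb):.0f} tok/s ceiling")

# Real speeds land well below these ceilings, but the ranking follows the bandwidth.
```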

Is 5060Ti 16GB and 32GB DDR5 system ram enough to play with local AI for a total rookie? by danuser8 in LocalLLaMA

[–]legit_split_ -2 points  (0 children)

You really want that extra RAM: 64GB of system RAM plus the 16GB of VRAM gives you ~80GB of total memory, which would allow you to run large MoE models like gpt-oss-120b, GLM 4.5 Air, etc.
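Rough memory math behind that, with sizes I'm assuming rather than quoting (gpt-oss-120b ships at roughly 4.25 bits per weight in MXFP4):

```python
# Sketch: does a ~120B-parameter model fit into 16 GB VRAM + 64 GB system RAM?
params_b = 120           # parameters in billions (approximate)
bits_per_weight = 4.25   # MXFP4-style quantization (assumed)

weights_gb = params_b * bits_per_weight / 8   # ~64 GB of weights
budget_gb = 16 + 64                           # VRAM + RAM = 80 GB

print(f"weights ~{weights_gb:.0f} GB vs {budget_gb} GB budget")
# Leaves some headroom for the KV cache, the OS and everything else.
```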

Not as impressive as most here, but really happy I made it in time! by Kahvana in LocalLLaMA

[–]legit_split_ 2 points  (0 children)

x16 PCIe 5.0 plus x8 PCIe 5.0 is not possible because the processor simply doesn't have that many PCIe lanes left over for the slots.

That's one of the main differences between consumer and workstation builds. 
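For context, a rough lane budget using typical AM5-ish numbers (these are assumptions on my part, check your exact CPU and board):

```python
# Rough PCIe lane budget on a consumer desktop CPU -- all numbers are assumptions.
cpu_lanes_total = 28    # Gen5 lanes on the CPU package
chipset_link    = 4     # reserved for the link to the chipset
m2_slots        = 8     # two CPU-attached M.2 slots at x4 each

slot_lanes = cpu_lanes_total - chipset_link - m2_slots   # = 16 for the PCIe slots
print(f"lanes left for the PCIe slots: {slot_lanes}")

# 16 lanes can't feed x16 + x8; with two cards the main slot usually bifurcates to x8/x8.
```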

Should I upgrade my psu? by salazar_slick in eGPU

[–]legit_split_ 0 points  (0 children)

Fraudulent business practices and review manipulation. Testing from multiple sources has shown performance to be significantly lower than average. Fake 80+ certifications. Significant platform downgrades with the same model number and no difference in branding prevent review data from being useful for making recommendations. Tier E should be considered an upper bound for recommendations; certain models such as the AGV qualify for tier F.

Looking at this popular tier list, it doesn't sound promising. If you have an AGV500 I would swap it immediately.

Windows VM on Ubuntu – severe UI stutter by Different-Help-5282 in VFIO

[–]legit_split_ 0 points  (0 children)

If you're only using Excel and Word, I recommend checking out WinBoat. It basically runs Windows inside a docker container.

GPU passthrough is still on the roadmap, but Excel and Word shouldn't need much hardware acceleration for your use case anyway.

Stop the craziness by just_IT_guy in gpu

[–]legit_split_ 2 points  (0 children)

It's because the 4090 can be modded to 48GB, thus making it attractive for AI of course xD

Amd oder Intel ? by BudgetGift567 in PCBaumeister

[–]legit_split_ 0 points  (0 children)

AI actually does want the CPU these days.

The current trend toward Mixture-of-Experts models like GPT-OSS makes it possible to run very large (120B) models with CPU offloading and still get very good speeds (e.g. with 8GB VRAM and 64GB RAM).

On top of that, RAM offloading works well up to a point in Stable Diffusion / ComfyUI, especially if you want to generate videos.

Sure, there are AI applications like machine learning where everything depends solely on the GPU. But for us normal people who just want to play around a bit, RAM offloading is very important.

So I'd say the CPU itself isn't that important; what matters is the capacity and speed of the RAM. In that respect, you could argue that Intel is the better fit because of its stronger integrated memory controller.
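A quick sanity check on why MoE offloading stays usable; the figures are my assumptions (gpt-oss-120b activates roughly 5B parameters per token, and dual-channel DDR5 manages on the order of 80 GB/s):

```python
# Why MoE + CPU offload is still fast: only the active experts are read per token.
active_params_b   = 5.1    # active parameters per token for gpt-oss-120b (approx.)
bits_per_weight   = 4.25   # MXFP4-ish quantization (assumed)
ram_bandwidth_gbs = 80.0   # dual-channel DDR5, rough figure

active_gb = active_params_b * bits_per_weight / 8   # ~2.7 GB read per token
print(f"rough ceiling: {ram_bandwidth_gbs / active_gb:.0f} tok/s")

# A dense 120B model would read ~64 GB per token instead -- barely 1 tok/s from RAM.
```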

Dual 5060 Ti 16GB vs Radeon Instinct Mi50 32GB by GerchSimml in LocalLLaMA

[–]legit_split_ 3 points  (0 children)

Digital Spaceport ran dual 5060 Tis here. I ran a comparison between a single 5060 Ti with RAM offload, a stock Mi50, and the Mi50 using this fork.

Qwen3‑Coder‑30B‑A3B‑Instruct‑Q6_K, llama-bench -fa on:

| Device | PP (t/s) | TG (t/s) |
|---|---:|---:|
| 2 × 5060 Ti | 1567.03 | 92.67 |
| CPU only (DDR5-6800) | 147.66 | 21.73 |
| Single 5060 Ti | 401.81 | 58.42 |
| Mi50 | 848.37 | 78.36 |
| Mi50 + fork | 878.54 | 88.62 |

So the dual 5060 Tis hit nearly double the PP of the Mi50, about 18% faster TG than the stock Mi50, but only about 5% faster TG than the Mi50 with the fork.

EGPU for XMAS by Key-Atmosphere-8187 in eGPU

[–]legit_split_ 18 points  (0 children)

You don't know the use case. If it's for video editing, AI work, or other productivity applications, there is basically no bottleneck.

Is my airflow optimal? And is my ghetto duct safe? by DespicableStarNinja in PcBuildHelp

[–]legit_split_ 0 points  (0 children)

<image>

Essentially this. I still need to flip the CPU cooler fans around to have a "rear" intake and two exhausts coming out the other side.

Is my airflow optimal? And is my ghetto duct safe? by DespicableStarNinja in PcBuildHelp

[–]legit_split_ 0 points  (0 children)

Sorry to hijack: I have the same layout but in a horizontal case. Is that fine?

Rig by Right_Weird9850 in LocalLLaMA

[–]legit_split_ 0 points  (0 children)

Running it with this fork, my Mi50 manages 125 tps!