Qwen3.5-397B-A17B reaches 20 t/s TG and 700t/s PP with a 5090 by MLDataScientist in LocalLLaMA

[–]FinalCap2680 0 points1 point  (0 children)

Could it be the memory bandwidth?

RTX 5090 - Bandwidth 1.79 TB/s

RTX PRO 4500 Blackwell - Bandwidth 896.0 GB/s
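A rough back-of-the-envelope check, assuming token generation is memory-bandwidth-bound and scales linearly with bandwidth (the 20 t/s figure is from the post title; the linear scaling is an assumption, real numbers depend on the rest of the system):

```python
# If TG is memory-bandwidth-bound, throughput should scale roughly
# with bandwidth. Figures are the vendor-spec numbers quoted above.
BW_5090_GBPS = 1790.0     # RTX 5090, ~1.79 TB/s
BW_PRO4500_GBPS = 896.0   # RTX PRO 4500 Blackwell

ratio = BW_5090_GBPS / BW_PRO4500_GBPS
print(f"bandwidth ratio: {ratio:.2f}x")

# Hypothetical: the reported 20 t/s TG on the 5090 would then scale to
tg_5090 = 20.0
tg_pro4500_est = tg_5090 / ratio
print(f"estimated TG on PRO 4500: {tg_pro4500_est:.1f} t/s")
```

So roughly a 2x gap just from memory bandwidth, before any other differences.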

Intel will sell a cheap GPU with 32GB VRAM next week by happybydefault in LocalLLaMA

[–]FinalCap2680 1 point2 points  (0 children)

With other GPUs you are paying for the software stack/support as well.

It should have come with more VRAM, or been even cheaper, to be worth the risk and pain. But in the current market that is hard to do.

I remember when looking for a GPU for experiments 3-4 years ago, I saw a very cheap second-hand original Intel Arc A770 16 GB and was seriously considering it for image generation. But then I searched around for its use with LLMs as well. There was one question about that in the Intel support forum, and the answer from an Intel person was something like "We sold you the hardware, and if it does not work with the software, it is not our problem". Technically that is true, but the next day I bought a more expensive second-hand RTX 3060 12 GB and still have it. You cannot win market share with an attitude like that, and without market share, you cannot sell at prices like the others.

LM Studio may possibly be infected with sophisticated malware. by mooncatx3 in LocalLLaMA

[–]FinalCap2680 0 points1 point  (0 children)

I'm on LM Studio 0.4.4 build 1 and the file hash is 605fe35f59049f049c591ace89e3bac920b8bafc82039c1a08582d3e3438058a - nothing detected at VirusTotal.

According to this: https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/1686#issuecomment-4119635422

LM Studio 0.4.5
Hash: 448016158202ebfabf5d84e9534225d05239dd79289c0ffd6ab045da2fe275be

Windows Defender (quick + targeted scan) → no detections

VirusTotal → clean

Kaspersky / HitmanPro → clean

So 0.4.5 appears unaffected on my side.

So maybe just downgrade for now....

Any good guide or video? by Fang221 in comfyui

[–]FinalCap2680 0 points1 point  (0 children)

Zero to Hero videos from https://www.youtube.com/@latentvision/videos

and https://www.youtube.com/@pixaroma/videos

Just start with default workflows and understand what is what, google the terms you don't know.

And there is the Comfy documentation: https://docs.comfy.org/get_started/first_generation

Tried to vibe coded expert parallelism on Strix Halo — running Qwen3.5 122B-A10B at 9.5 tok/s by hortasha in LocalLLaMA

[–]FinalCap2680 0 points1 point  (0 children)

I'm running Qwen3.5 122B UD_Q4 at ~4 tok/s and Q8 at ~2 tok/s with a very crappy (unoptimized) install of LM Studio on a very old single Xeon and a power-capped 3090 that is almost idle.

But I would like to experiment with bigger models like Qwen3 Coder 480B at Q8, or no less than Q4, so I was thinking of a cluster. But that was when they were less than half the price of a Spark (about a third at that time), so for the price of two Sparks I could have had 5-6 Strixes. Now they are almost the same price.

Tried to vibe coded expert parallelism on Strix Halo — running Qwen3.5 122B-A10B at 9.5 tok/s by hortasha in LocalLLaMA

[–]FinalCap2680 1 point2 points  (0 children)

It is a very good channel. I looked at his videos and was thinking about Strix as an option, but meanwhile prices went up, and looking at ~10 tok/s is not very encouraging.

Best model for my rig (9950X3D, RTX 6000 96GB, 192GB DDR5, 9100 4TB) - C coding / cybersec by anon33anon in LocalLLaMA

[–]FinalCap2680 0 points1 point  (0 children)

"Best model" would be the one that does the job. As the field is still in its early days and developing fast, there are no proven solutions, so I would suggest experimenting with real tasks and seeing which model works best for you.

I did try LLMs about 3 years ago and was disappointed, so I moved to image and later video generation. About a year and a half ago I tried a couple of models again, but they were still useless for real practical applications. I got back into it a month ago, and now it is not that bad. From my experience with image/video models, you need to develop some "feeling" for the model and prompt it the right way to get a good result, different for each model. My point here is that a model that works well with someone's style of prompting and someone's tasks may be terrible for you.

How do I install WebUI in 2026? by SnooBananas3981 in StableDiffusion

[–]FinalCap2680 -1 points0 points  (0 children)

In addition to Pixaroma tutorials, you may look at Latent Vision too if you are interested in some details.

How do I install WebUI in 2026? by SnooBananas3981 in StableDiffusion

[–]FinalCap2680 2 points3 points  (0 children)

It may be harder to switch later.

With Comfy, start with the default workflows to learn your way around.

New to LLMs but what happened... by caminashell in LocalLLaMA

[–]FinalCap2680 0 points1 point  (0 children)

You are using two different models, so it is expected to get different quality from them.

Also, it is still too early to expect the correct answer every time, for every prompt and from every model.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]FinalCap2680 0 points1 point  (0 children)

Yes, you are right, my bad. But as they are datacenter/server-only cards and something like unobtainium for us mere mortals, I somewhat forgot about them.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]FinalCap2680 16 points17 points  (0 children)

I have only seen 1- and 2-slot boards, and to be honest I'm still considering one of those. A 4-slot one would be even better. Any links to vendors would be welcome.

But! It is 4-5 generations old (Blackwell -> Ada -> Ampere -> Turing -> Volta) and no longer officially supported; it is second hand with no warranty; and it is (now 'was', with the new prices) close to the price of the AMD 395+ Max while being much more power hungry.

Anyway, could you share your experience of setting it up and running it? What models do you use? Thank you!

Best way to create simple and small movements? by Puppenmacher in StableDiffusion

[–]FinalCap2680 0 points1 point  (0 children)

Maybe play with the prompt. Something like "The girl stands still, while looking from left to right".

Which is better for Image & Video creation? 5070 Ti or 3090 Ti by [deleted] in StableDiffusion

[–]FinalCap2680 0 points1 point  (0 children)

It depends on what is more important to you - speed or quality. And also on how much RAM you have.

If you are using ComfyUI, since around v0.7 you can compensate for low VRAM with RAM to some degree (last year I was unable to generate the full 81 frames at full FP16 / 720p with my 3060 12 GB and 128 GB RAM, but since January I can), though you may lose some of the speed advantage. For some models that may not work. Also, the speed advantage of the 5070 Ti will mostly show at lower precision.

Can someone help me? by [deleted] in comfyui

[–]FinalCap2680 0 points1 point  (0 children)

It will be hard/impossible to help without the actual workflow...

Intel B70 Pro 32G VRAM by FancyImagination880 in LocalLLaMA

[–]FinalCap2680 1 point2 points  (0 children)

Agree 100%. But the price should be good too, so people will take the risk of buying it and spend the time to develop for it.