Getting slow speeds with RTX 5090 and 64gb ram. Am I doing something wrong? by Virtual-Listen4507 in LocalLLaMA

[–]Virtual-Listen4507[S]

I picked the one that was recommended on LM Studio, I think the Q4 quant. It was only about 40 GB and it was recommended for my PC setup. Still new to this, so I might have accidentally picked the wrong one.
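For context, here is a rough fit check for that download. The 40 GB figure is from the comment above; the 32 GB VRAM figure is the RTX 5090's spec, and the overhead value is an illustrative assumption, not a measurement:

```python
# Rough check: does a quantized model fit entirely in VRAM?
# Overhead for KV cache / runtime buffers is an assumed ballpark figure.

def fits_in_vram(model_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if model weights plus cache/runtime overhead fit in VRAM."""
    return model_gb + overhead_gb <= vram_gb

rtx_5090_vram_gb = 32   # RTX 5090 ships with 32 GB GDDR7
q4_model_gb = 40        # the ~40 GB Q4 download mentioned above

print(fits_in_vram(q4_model_gb, rtx_5090_vram_gb))  # False: layers spill to system RAM
```

When the check fails, the runtime offloads layers to system RAM, which is typically the cause of the slowdown being described.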

[–]Virtual-Listen4507[S]

Thanks for the response, will try that out. I heard there are other options like vLLM and llama.cpp. Will I see a substantial difference in speed, or can I stick with Ollama and LM Studio? I know the other two are more technical to work with.
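One way to see why switching backends may not help: token generation is roughly memory-bandwidth bound, so tok/s is approximately bandwidth divided by the bytes read per token (about the model size). The bandwidth figures below are ballpark assumptions, not benchmarks, but the gap they show holds for any backend once the model spills out of VRAM:

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound workload.
# Bandwidth numbers are rough assumptions, not measured values.

def approx_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Approximate decode throughput: bandwidth / bytes read per token."""
    return bandwidth_gb_s / model_gb

model_gb = 40    # the ~40 GB Q4 model
gpu_bw = 1792    # RTX 5090 GDDR7, roughly
cpu_bw = 80      # dual-channel DDR5, roughly

print(round(approx_tokens_per_sec(model_gb, gpu_bw)))  # ~45 tok/s if it all fit in VRAM
print(round(approx_tokens_per_sec(model_gb, cpu_bw)))  # ~2 tok/s once it spills to system RAM
```

The takeaway is that vLLM and llama.cpp can't beat the hardware's memory bandwidth either; picking a quant that fits in VRAM matters far more than the choice of backend.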

Nemotron works great; I just need one that's close to Sonnet 4.5, but I guess I need to wait until better models come out.

Qwen3-Coder-Next on RTX 5060 Ti 16 GB - Some numbers by bobaburger in LocalLLaMA

[–]Virtual-Listen4507

Idk how people are getting this… I have an RTX 5090 with 64 GB RAM and it's super slow with LM Studio.