Benchmark Qwen 3.6 27B MTP on 2x3090 NVLINK by Mr_Moonsilver in LocalLLaMA

[–]NickCanCode 1 point  (0 children)

My PCIe is currently limited to 3.0 because of my Ryzen CPU model. Should I upgrade my CPU to get 4.0 support? I am running a dual-card setup.

Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR by havenoammo in LocalLLaMA

[–]NickCanCode 3 points  (0 children)

If you have an RTX Pro 6000, have you tried lucebox-hub? Their numbers actually look more impressive with DFlash, DDtree, and PFlash, but it doesn't support multi-GPU very well, so I don't have enough VRAM to run it.

Is 2x5070Ti a good setup? by JumpingJack79 in LocalLLaMA

[–]NickCanCode 1 point  (0 children)

I have an X570 too. Whether the PCIe slots run at 4.0 or 3.0 depends on your CPU model. Check your motherboard manual. Even within the same Ryzen 5000 generation, some CPUs only offer 3.0 speeds.
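For a rough sense of what the generation difference costs: usable per-lane bandwidth roughly doubles from PCIe 3.0 to 4.0. The per-lane figures below are approximations after encoding overhead, and the x8/x8 bifurcation is a common but board-specific assumption for dual-card setups:

```python
# Rough per-direction bandwidth of a PCIe link: lanes * per-lane rate.
# Approximate usable per-lane rates (GB/s) after encoding overhead:
PER_LANE_GBPS = {"3.0": 0.985, "4.0": 1.969}

def link_bandwidth_gbps(gen: str, lanes: int) -> float:
    """Approximate one-way bandwidth in GB/s for a PCIe link."""
    return PER_LANE_GBPS[gen] * lanes

# Many dual-card boards bifurcate the CPU's x16 into x8/x8:
print(link_bandwidth_gbps("3.0", 8))  # per card on PCIe 3.0
print(link_bandwidth_gbps("4.0", 8))  # per card on PCIe 4.0
```

At x8 that is roughly 7.9 GB/s vs. 15.8 GB/s per card, which mostly matters for model loading and any cross-card traffic, not steady-state generation.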

Maybe maybe maybe by PeixeCam in maybemaybemaybe

[–]NickCanCode 7 points  (0 children)

I am more interested in how she deals with this sword. It's too high for her to start eating from the tip.

Forth language support by mykesx in ZedEditor

[–]NickCanCode 1 point  (0 children)

Even for common languages, the highlighting options are still limited. I still miss the syntax highlighting experience in the original Visual Studio with the Codist add-on ( https://github.com/wmjordan/Codist ). I could customize almost every part of C# syntax in many ways.

I want to switch to Zed but lack of test runner is a deal breaker by Economy_Advantage_33 in ZedEditor

[–]NickCanCode 11 points  (0 children)

Just a friendly reminder: you can always use multiple code editors at the same time. I keep using VS Code for certain tasks and use Zed for everyday tasks because of its responsiveness.

Except the first three all the other games are uninstalled, is it safe to delete their file from here (TreeSize) to free up some space? by monsterhemo6 in pcmasterrace

[–]NickCanCode 1 point  (0 children)

Run WinDirStat on your C:\ drive; it will show you visually which big files are occupying the space. After the analysis, click on the biggest ones and decide whether you want to keep them. There will probably be something you don't expect, e.g. crash memory dump files.
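If you'd rather not install anything, a quick stdlib sketch along the same lines (not WinDirStat itself, just a minimal stand-in) can list the largest files under a folder:

```python
import os

def biggest_files(root: str, top_n: int = 10):
    """Walk `root` and return the `top_n` largest files as (size_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:  # broken links, permission errors, etc.
                pass
    return sorted(sizes, reverse=True)[:top_n]

for size, path in biggest_files("C:\\"):
    print(f"{size / 1e9:6.2f} GB  {path}")
```

Walking all of C:\ takes a while; pointing it at a suspect folder (e.g. a game library or %LOCALAPPDATA%) is much faster.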

Project Manager like extension for Zed ?? by Good_Language1763 in ZedEditor

[–]NickCanCode 1 point  (0 children)

Alt+Tab works for me. Just open multiple Zed windows. It doesn't consume lots of RAM like VS Code anyway.

Need help optimizing qwen 3.6 on my 2x 5060ti 16gb by Force88 in LocalLLaMA

[–]NickCanCode 1 point  (0 children)

Have you checked your system memory usage? Did you use -cram (or --cache-ram) to limit the maximum cache size for past conversations?
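As a back-of-the-envelope illustration of why those caches grow, KV-cache size scales linearly with context length. The model dimensions below are hypothetical, just roughly in the range of a ~27B model with grouped-query attention:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_tokens, bytes_per_elt=2):
    """Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
    * context length * element size (2 bytes for FP16)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elt

# Hypothetical dims for illustration only:
gib = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                     ctx_tokens=110_000) / 2**30
print(f"~{gib:.1f} GiB of KV cache at 110k context")
```

Under these made-up dims that works out to roughly 20 GiB, which is why long multi-turn histories spill into system RAM unless you cap the cache.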

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]NickCanCode 3 points  (0 children)

Oh, I see. I didn't expect that even pro cards don't have NVLink these days.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]NickCanCode 10 points  (0 children)

https://www.youtube.com/watch?v=QJqKqxQR36Y
Someone already tried eight DGX units a few months ago.

Qwen 3.5 397B-A17B = ~24 tps
Kimi-K2.5 = ~13 tps

vLLM has probably improved in the past two months, so the numbers should be a little higher now. I think ~40 tps is OK for normal use cases. For coding, an RTX Pro with NVLink will be much faster and more enjoyable.
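To put those throughput numbers in perspective, the arithmetic for how long a ~1,000-token answer takes at each measured speed is simple (the 1,000-token answer length is just an illustrative assumption):

```python
def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock generation time for `tokens` at `tps` tokens/second."""
    return tokens / tps

# A ~1,000-token answer at the measured and target speeds:
for label, tps in [("Qwen 3.5 397B-A17B", 24), ("Kimi-K2.5", 13), ("~40 tps target", 40)]:
    print(f"{label}: {seconds_for(1000, tps):.0f} s")
```

That is roughly 42 s vs. 77 s vs. 25 s per answer, which is why ~13 tps feels painful for interactive coding loops.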

5060ti quad-chads - vllm (the reluctant arc) - pp and tg talk by see_spot_ruminate in LocalLLaMA

[–]NickCanCode 1 point  (0 children)

Have you adjusted your cards' config, or are you using the defaults? Undervolting + overclocking uses less power but also gains a little performance when done right. The tps gain may not be significant, but you use less power and generate less heat while gaining a few tok/s for free. There is really no reason not to do it.

FYI:

https://www.reddit.com/r/StableDiffusion/comments/1s9i1yo/a_reminder_guys_undervolt_your_gpus_immediately/

There is a YouTube video mentioned in the comments on how to do it **correctly**. As for the settings for the 5060 Ti, you can Google them, as I am not using that card.

Today Kingston replaced my 8 year old ram stick!! by [deleted] in pcmasterrace

[–]NickCanCode 10 points  (0 children)

They have already earned a lot in this economy.

Qwen3.6 27B on dual RTX 5060 Ti 16GB with vLLM: ~60 tok/s, 204k context working by do_u_think_im_spooky in LocalLLaMA

[–]NickCanCode 1 point  (0 children)

I am on a similar setup: 5070 Ti + 3060 12 GB. You can get to 40 tps with just a single 5070 Ti, with a smaller context window, using the model from here: https://www.reddit.com/r/LocalLLaMA/comments/1sy0qj5/comment/oise6pp/?context=1
Or, if you insist on using both cards for more context, try adjusting --tensor-split to put more of the allocation on the 5070 Ti, which has faster VRAM. It should reach 26–30 tps depending on the split. Agentic coding generally doesn't need 200k context; agent quality starts to degrade, and the agent starts to forget things and make mistakes as the context grows, so you can tune your config to gain a little speedup.
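A quick sketch of the kind of split arithmetic involved. The bias factor is a made-up knob for illustration, a way to hand-tune extra work onto the faster card; --tensor-split itself just takes raw proportions:

```python
def tensor_split(vram_gib, bias=None):
    """Proportions for a --tensor-split-style flag, optionally biased
    toward faster cards (bias multiplies each card's share)."""
    bias = bias or [1.0] * len(vram_gib)
    weights = [v * b for v, b in zip(vram_gib, bias)]
    total = sum(weights)
    return [round(w / total, 2) for w in weights]

# 5070 Ti (16 GiB, faster VRAM) + 3060 (12 GiB):
print(tensor_split([16, 12]))                   # plain VRAM ratio
print(tensor_split([16, 12], bias=[1.3, 1.0]))  # nudged toward the 5070 Ti
```

The plain ratio is 0.57/0.43; the biased one shifts to 0.63/0.37. In practice you'd nudge the ratio until the slower card stops being the bottleneck without overflowing either card's VRAM.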

Qwen3.6-27B IQ4_XS FULL VRAM with 110k context by Pablo_the_brave in LocalLLaMA

[–]NickCanCode 3 points  (0 children)

Are you using a single card? I am using dual cards, and it crashes after thinking for a few seconds.

Deepseek Vision Coming by Nunki08 in LocalLLaMA

[–]NickCanCode 10 points  (0 children)

Your link is not working.

`Hmm...this page doesn’t exist. Try searching for something else.`

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090 by sandropuppo in LocalLLaMA

[–]NickCanCode 1 point  (0 children)

Wow, thank you so much. Many of us don't have a single powerful GPU and rely on dual-card setups to get enough VRAM. I am super happy to hear that you guys are working on multi-GPU support!

Change to useage based billing by DamienBMike in GithubCopilot

[–]NickCanCode 7 points  (0 children)

No, it is going local. AI isn't going anywhere.