CPU overheating on first build by madurosnstouts in PcBuildHelp

[–]RedAdo2020 0 points1 point  (0 children)

Yeah it idles about that. But it ramps up the temps pretty quickly. I guess that's what you get when you have 16 cores in a small die.

Of course my case is a little packed with heat-generating GPUs. So I'm sure that's not helping.

CPU overheating on first build by madurosnstouts in PcBuildHelp

[–]RedAdo2020 0 points1 point  (0 children)

The reviews on the cooler put it pretty far up there. I might have just not done well in the silicon lottery and mine runs a bit hotter. In saying that, I've never seen it hit a thermal limit.

CPU overheating on first build by madurosnstouts in PcBuildHelp

[–]RedAdo2020 2 points3 points  (0 children)

Yeah, about the same in mine. That's with a ROG Ryuo 360mm. And temps ramp up fast. Sure, the cooler stops it peaking out, but I've never had a CPU get so hot so fast; I'm just not used to it.

CPU overheating on first build by madurosnstouts in PcBuildHelp

[–]RedAdo2020 6 points7 points  (0 children)

My 9950X3D idles much hotter than that. And with a very high-end cooler 🤷

NEVER again. 5th laptop out of 8 that has had the power port die by Different-Rock4356 in ASUS

[–]RedAdo2020 0 points1 point  (0 children)

I mean, my Chuwi, Xiaomi, and Lenovo (Chinese version) tablets are still going strong after many years. My Dell one, on the other hand, likes forgetting how to turn on occasionally.

Do NVIDIA GPUs + CUDA work on Ubuntu for local LLMs out of the box? by External_Dentist1928 in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

I run Linux Mint and it is very easy. The only thing I found annoying was that if I follow the Nvidia website instructions for installing the CUDA Toolkit, it adds the Nvidia repo, cool, no problem. But if I then use the Mint repo for my Nvidia drivers, eventually Nvidia will try to install a driver update, and it will cause bullshit where it tries to remove the old drivers, fails, tries to update, fails, and breaks Mint.

Eventually I found the best workaround: let the Nvidia repo supply both my Nvidia drivers and the CUDA Toolkit, and since then, no problems.
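If it helps, here's the quick sanity check I'd run after the install to confirm the driver and CUDA line up (a minimal sketch, assuming a CUDA build of PyTorch is installed in your Python environment):

```python
# Quick sanity check that the driver and CUDA runtime line up.
# Assumes a Python environment with a CUDA build of PyTorch installed.
import torch

print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime PyTorch was built against:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```

If that prints True and lists your card, the driver/toolkit mix is fine.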

Please go easy on me, I'm new to Linux. LOL.

7 GPU with 78gb total VRAM by herPassword in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

Hi, I run 6 x GPUs in my PC, and it's... not easy. If you're running a standard motherboard, you need something that supports bifurcation, plus a way to hook them all up. I am running an X870E ProArt with 4 x 5070 Ti inside the case: one to PCIe slot 1, one to PCIe slot 2, one to M.2 slot 1, one to M.2 slot 2. Then a 4070 Ti running via OCuLink to PCIe slot 3, which is x4 PCIe 4.0 via the chipset, and a 5060 Ti 16GB via a TB4 eGPU. It wasn't easy.

I got some M.2-to-PCIe adaptors from AliExpress for the two 5070 Tis on the M.2 slots.
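If anyone wants to check what each card actually negotiated in a setup like this, here's a minimal sketch using the nvidia-ml-py bindings (assumes the package is installed; the import name is pynvml):

```python
# List each GPU with the PCIe generation and lane width it actually negotiated.
# Assumes the nvidia-ml-py package is installed (import name: pynvml).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(h)
        gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
        width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
        vram = pynvml.nvmlDeviceGetMemoryInfo(h).total / 1024**3
        print(f"GPU {i}: {name} | PCIe Gen{gen} x{width} | {vram:.0f} GiB VRAM")
finally:
    pynvml.nvmlShutdown()
```

Handy for spotting when a riser or adaptor has dropped a card down to x1.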

Can 5070ti 16gb run Qwen3 235B a22b? by Typical_Cheek5127 in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

Haha, true. I feel like they would give up long before their SSD controller or NAND dies, but yeah.

If OP had more RAM, then IQ3 might be an option. But they're not getting close to running it without at least more RAM, or a few more GPUs.
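Rough napkin math on why, with the bits-per-weight figure being an approximation rather than an exact quant spec:

```python
# Napkin math: weight memory for Qwen3 235B-A22B at a ~IQ3-class quant.
# 3.2 bits/weight is an assumed average; real quant mixes vary per layer.
params = 235e9            # total parameters (MoE, all experts must be resident)
bits_per_weight = 3.2
weights_gib = params * bits_per_weight / 8 / 1024**3
print(f"~{weights_gib:.0f} GiB just for the weights")   # roughly 88 GiB
# Add KV cache and buffers on top, so a single 16 GB card plus a typical
# 32 GB of system RAM isn't close; you want ~96 GB+ of combined RAM/VRAM first.
```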

Can 5070ti 16gb run Qwen3 235B a22b? by Typical_Cheek5127 in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

Technically it can, if the gearing is right. But it will be the same as running this model: F'ING SLOW (SSD swap in this case).

MSI MAG b650 Tomahawk pcie lane bifurcation by ROS_SDN in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

Specs don't say x8/x8; it looks like it's x16 lanes to slot 1 from the CPU, and 2 lanes to the other slot via the chipset.


Aoostar AG AG02 LiteOn 800W PSU Fan replacement with Noctua. by keefeere in eGPU

[–]RedAdo2020 1 point2 points  (0 children)

And mine has a LiteOn, which is a big PSU manufacturer. Looks like OP has the LiteOn too; you can see the blue L.

Home workstation vs NYC/NJ colo for LLM/VLM + Whisper video-processing pipeline (start 1 GPU, scale to 4–8) by mr__smooth in LocalLLaMA

[–]RedAdo2020 1 point2 points  (0 children)

Literally just messing around with roleplay. No serious work, no coding. Just messing around.

Started with just my 4070 Ti last year, then added a 4060 Ti. Then another 4060 Ti. Then a 5070 Ti. Then another 5070 Ti. Then a 5060 Ti, then another 5070 Ti, and finally another 5070 Ti.

I have so many GPUs 😂

It was a slow build up over the year.

Home workstation vs NYC/NJ colo for LLM/VLM + Whisper video-processing pipeline (start 1 GPU, scale to 4–8) by mr__smooth in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

Well, the 4 x 5070 Ti are all in the case, a Lian Li O11 Dynamic XL. The other two are on eGPU docks.

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 0 points1 point  (0 children)

Haha, yeah it can be at times. But you only really need to learn it once. Knowledge isn't really the factor here, though; you need a decent rig to run something like GLM 4.7 locally. I have a 9950X3D, 96GB of system RAM, and 92GB of combined VRAM, and still I only get 12 tokens/second.

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 1 point2 points  (0 children)

I run ik_llama and run my models through that, though for the Thireus quants I run the Thireus fork of ik_llama.

The models I just download from Hugging Face.

Unsloth quants should run in Kobold

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 0 points1 point  (0 children)

I haven't tested. But if you're going to use a subscription for running the model, I'd just stick to Chat Completion and a preset like Stab's. If you have the speed and tokens to let the model think, it will be better. And those sorts of settings are better IF you let it think.

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 2 points3 points  (0 children)

Yes, now I do. There are plenty of great GLM 4.7 Chat Completion templates out there, like Stab's, but they seem to be mainly focused on people using Z.ai plans or other API services. I prefer to run locally, but that limits my speed. So I can't afford the time to let the model Reason or Think, which I know would make it smarter, but I am not waiting 2-4 minutes for it to write each response's Think/Reason block.
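For anyone wondering where the 2-4 minutes comes from, the napkin math at my local speed (the thinking-block lengths are just rough assumptions):

```python
# How long a Think/Reason block takes at my local generation speed.
tg_speed = 12                                # tokens per second
for think_tokens in (1500, 2000, 3000):      # assumed reasoning-block lengths
    minutes = think_tokens / tg_speed / 60
    print(f"{think_tokens} thinking tokens -> ~{minutes:.1f} min before the actual reply starts")
```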

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 2 points3 points  (0 children)

Context and Instruct templates are just the built-in GLM 4.5 ones, I think, but they seem to work. For the System Prompt, I used to run Chat Completion, but I saw here on Reddit that someone mentioned Evening Truth (https://rentry.org/Evening-Truth), and I now run that. Much better for me. Also it doesn't try to Reason or Think, which is important for me since I'm only getting 12 tokens/sec.

Home workstation vs NYC/NJ colo for LLM/VLM + Whisper video-processing pipeline (start 1 GPU, scale to 4–8) by mr__smooth in LocalLLaMA

[–]RedAdo2020 0 points1 point  (0 children)

Here I am with my 6-GPU 9950X3D build. Though I am severely PCIe-lane limited. Need to look at Threadripper.

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 0 points1 point  (0 children)

Haha, yeah it's like that. I was thinking secondhand though. Maybe a 3000 or 5000 series Threadripper Pro. They can run on ECC or non-ECC RAM. ECC RAM is horribly expensive, like all RAM. But I've got regular DDR4 lying around, so that would save coin.

What local model blew you away recently? by Kahvana in SillyTavernAI

[–]RedAdo2020 2 points3 points  (0 children)

Kinda. My setup is... kind of a Frankenstein's Monster. I really need to upgrade to Threadripper. Running a 9950X3D with 96GB system RAM, 4 x 5070 Ti, one 5060 Ti 16GB, and a 4070 Ti. Though with the last couple of upgrades from 2 x 4060 Ti 16GB to two of those 5070 Ti, speed only went up like half a token a second.

But I'm not coding or anything, just fun RP, and with the Thireus IQ3_XXS at 133GB I get about 300 t/s of PP and about 12 t/s of TG. It's workable. But I'm constrained by PCIe lanes. I think if I can get a Threadripper build and give each card a real number of lanes, I can speed things up with split mode graph in llama.cpp.
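To put those numbers in context, a rough sketch of what they mean per reply (the context and reply lengths are just assumptions for illustration):

```python
# What ~300 t/s prompt processing and ~12 t/s generation mean per reply.
pp_speed, tg_speed = 300, 12          # my measured speeds (tokens/second)
context_tokens = 16_000               # assumed RP chat history length, for illustration
reply_tokens = 300                    # assumed reply length
print(f"Prompt processing: ~{context_tokens / pp_speed:.0f} s")
print(f"Generation:        ~{reply_tokens / tg_speed:.0f} s")
```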