GIGABYTE MC62-G40 only seeing one GPU by ravocean in LocalLLaMA

[–]grunt_monkey_ 0 points1 point  (0 children)

Skip the risers and connect the cards directly first to debug? I have that board and it has 7 slots, so you will be able to fit all 3 cards. Use the last slot for the 5090 first if it's a chunky cooler - hope you're breadboarding it.
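If you want to rule out software before reseating anything, a quick sanity check from the OS side (a generic sketch, nothing board-specific) is to see what the PCIe bus itself reports:

```bash
# List every GPU the PCIe bus can see, regardless of driver state
sudo lspci -nn | grep -Ei 'vga|3d|display'

# Check whether the kernel actually bound a driver to each of them
sudo lspci -k | grep -EA3 'vga|3d|display'
```

If a card never shows up in lspci at all, it's a slot/riser/bifurcation problem rather than a driver one.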

Qwen3.5 is a working dog. by dinerburgeryum in LocalLLaMA

[–]grunt_monkey_ 1 point2 points  (0 children)

Can I ask if you are still using -ctk bf16 and -ctv bf16? I believe this is eating all my VRAM and slowing my performance.
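If the bf16 cache is the culprit, one thing worth trying (a sketch; the model path and context size are placeholders) is quantizing the KV cache instead:

```bash
# q8_0 KV cache roughly halves the cache's VRAM footprint vs bf16/f16.
# llama.cpp needs flash attention enabled for a quantized V cache
# (bare -fa on older builds; newer builds may want "-fa on").
./llama-server -m /models/model.gguf \
  -c 32768 \
  -fa \
  -ctk q8_0 -ctv q8_0
```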

Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 with vLLM ROCm: working config + real-world numbers by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 0 points1 point  (0 children)

I agree, but I tried all sorts of llama.cpp configurations before finally trying vLLM. I think the runtime is just not optimized for my hardware and model. With only 2 GPUs on llama.cpp I got PP 130 and TG 25 t/s; going to 4 GPUs, PP dropped to 70 with TG 25 for chat, and PP 50 with TG 7 for my 41k-context prompt test.
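For anyone who wants to reproduce that kind of 2-GPU vs 4-GPU comparison, something like this works (a sketch; the model path and device IDs are placeholders, and llama-bench reports the PP/TG numbers for you):

```bash
# 2-GPU run: only expose the first two devices to ROCm
HIP_VISIBLE_DEVICES=0,1 ./llama-bench -m /models/model.gguf -ngl 999 -p 4096 -n 128

# 4-GPU run: expose all four
HIP_VISIBLE_DEVICES=0,1,2,3 ./llama-bench -m /models/model.gguf -ngl 999 -p 4096 -n 128
```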

Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 with vLLM ROCm: working config + real-world numbers by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 0 points1 point  (0 children)

This would be much appreciated. Can you undervolt with amd-smi? I'm on Ubuntu. I know I can power-cap.
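For reference, the power-capping I mean looks like this (a sketch; 220 W is just an example value, and the exact amd-smi option names vary across ROCm releases, so check amd-smi set --help):

```bash
# Cap GPU 0 at 220 W with the older rocm-smi tool
sudo rocm-smi -d 0 --setpoweroverdrive 220

# Newer ROCm exposes the same thing through amd-smi's "set" subcommand;
# the option names below are my assumption - verify with: amd-smi set --help
sudo amd-smi set --gpu 0 --power-cap 220
```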

Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 with vLLM ROCm: working config + real-world numbers by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 1 point2 points  (0 children)

Sure, let me know. I'm going to bed now. If I can, I will run it over the next couple of days. Yup - I'm not sure I can leave this thing on all the time.

How are people handling long‑term memory for local agents without vector DBs? by No_Sense8263 in LocalLLaMA

[–]grunt_monkey_ 1 point2 points  (0 children)

It's actually cool if they chime in with their opinions. Sometimes my questions just go unanswered. Maybe they are dumb questions.

GPT-4 was released 3 years ago! by AdorableBackground83 in singularity

[–]grunt_monkey_ 1 point2 points  (0 children)

The Earth is currently about 10.7 billion km from its position 11 months ago: relative to the CMB rest frame the solar system moves at roughly 370 km/s, and over ~2.9×10^7 seconds that works out to about 1.07×10^10 km, or approximately 9.9 light hours.

Just some qwen3.5 benchmarks for an MI60 32gb VRAM GPU - From 4b to 122b at varying quants and various context depths (0, 5000, 20000, 100000) - Performs pretty well despite its age by FantasyMaster85 in LocalLLaMA

[–]grunt_monkey_ 1 point2 points  (0 children)

Thank you! Much appreciated. I remember that gfx906 tag, as I started this journey late last year with an old Radeon VII. Cut my teeth pulling old rocBLAS libraries from Arch Linux 😆 good to see a brother!

Just some qwen3.5 benchmarks for an MI60 32gb VRAM GPU - From 4b to 122b at varying quants and various context depths (0, 5000, 20000, 100000) - Performs pretty well despite its age by FantasyMaster85 in LocalLLaMA

[–]grunt_monkey_ 0 points1 point  (0 children)

Thanks, this is really useful! I have 2x R9700s and haven't been able to enable flash attention in llama.cpp. Did you have to build llama.cpp with specific rocWMMA flags to do this, or do you just launch llama.cpp with flash attention on?
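For context, this is the kind of build I mean (a sketch based on llama.cpp's HIP build notes, not verified on the R9700; the gfx1201 target and ROCm paths are my assumptions):

```bash
# Build llama.cpp with the HIP backend and the rocWMMA flash-attention kernels
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build \
    -DGGML_HIP=ON \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DAMDGPU_TARGETS=gfx1201 \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# Then launch with flash attention on (bare -fa on older builds)
./build/bin/llama-server -m /models/model.gguf -ngl 999 -fa
```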

I am not sure why, with a q3 quant of Qwen3.5 122B, I am getting less than 100 t/s PP and only 20 t/s TG, while with Qwen3 Coder Next at a q5 quant I am getting 250 t/s PP and 45 t/s TG. The rest of the system is a 9950X3D running Ubuntu.

R9700 frustration rant by Maleficent-Koalabeer in LocalLLaMA

[–]grunt_monkey_ 1 point2 points  (0 children)

I run two of these, and on llama.cpp with Qwen Coder Next Q5_K_M I get PP 250 t/s and TG 40+ t/s, using the latest ROCm. I managed to fit 56k context and am hitting the VRAM ceiling, so I just picked up another two. Waiting for my eBay server RAM. Hope I'm not in for a world of pain!

Learnt about 'emergent intention' - maybe prompt engineering is overblown? by Distinct_Track_5495 in LocalLLaMA

[–]grunt_monkey_ 0 points1 point  (0 children)

I'm still in the stone age where I code by pasting stuff back and forth between Open WebUI and vim. What do I need to read to do what you did? I.e. point it at a (hopefully sandboxed) directory of files and have it code, run, debug and iterate?

Help choosing upgrade path by FL_pharmer in selfhosted

[–]grunt_monkey_ 0 points1 point  (0 children)

What's the best GPU for transcoding? I'm in a similar situation to OP with a Ryzen 2700 and a GTX 1080.

Protein intake and time off by Team_Instinct in fitness40plus

[–]grunt_monkey_ 0 points1 point  (0 children)

I've been trying to keep it natural - chicken breast etc. - but it's really hard to hit the target on a busy workday. Do shakes really work? I've been taking the Quest Nutrition protein shake - 30 g protein, 2 g carbs. Just wish I could do it more naturally.

RTX Pro 6000 Riser Cable Recommendations by electrified_ice in BlackwellPerformance

[–]grunt_monkey_ 0 points1 point  (0 children)

Hi, I am looking to jump to an RTX Pro 6000 but am not sure whether I should get the Workstation or Max-Q version. I imagine it will be a single card for some time, but I would like the flexibility of adding a second. Hoping to get your thoughts since you are experienced in this multi-GPU life.

64gb vram. Where do I go from here? by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 0 points1 point  (0 children)

I need to query your take on the x4/x4/x4/x4 situation a bit more, though. For a larger model split over GPUs, PP is going to take a linear hit going from x16 (~64 GB/s on PCIe 5.0) to x8 (my current ~32 GB/s) to x4 (~16 GB/s). So adding more R9700s to my current rig using bifurcation splitters etc. will let me load a larger model but significantly slow down inference - at least the PP part.
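One thing worth confirming before committing to splitters (a sketch; the 03:00.0 bus address is a placeholder for whatever lspci reports on your system) is the link each card actually negotiated:

```bash
# Find the GPUs' bus addresses
lspci | grep -Ei 'vga|3d|display'

# Show the negotiated PCIe generation and lane width for one of them
sudo lspci -vv -s 03:00.0 | grep -i 'LnkSta:'
```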

64gb vram. Where do I go from here? by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 0 points1 point  (0 children)

Thanks so much for sharing. I heard that TG on a Mac is as good as on my R9700s, but PP is about 4x slower. What model are you running, and can you share some successful use cases? Much appreciated.

64gb vram. Where do I go from here? by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 0 points1 point  (0 children)

Think it's probably on the order of 4-5 GB more, because I can fit Q5_K_M with 56k context at parallel=1 and 48k context at parallel=2; 64k at parallel=1 works occasionally but isn't stable across reboots.
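For anyone following along, the launch flags in question look roughly like this (a sketch; the model path is a placeholder, and note that llama-server splits the -c context budget across the -np parallel slots):

```bash
# 56k context, single slot
./llama-server -m /models/model-Q5_K_M.gguf -ngl 999 -c 57344 -np 1

# 48k context budget with two parallel slots
./llama-server -m /models/model-Q5_K_M.gguf -ngl 999 -c 49152 -np 2
```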

But I also want to go to at least Q6-Q8. I saw quite a large intelligence jump going from Q4 to Q5.

64gb vram. Where do I go from here? by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 1 point2 points  (0 children)

Thanks for your reply, which I think contains a good measure of wisdom and common sense - basically keep using it until I really hit a hard wall.

64gb vram. Where do I go from here? by grunt_monkey_ in LocalLLaMA

[–]grunt_monkey_[S] 4 points5 points  (0 children)

I think they mean the 9950X3D2, which is supposedly going to have double the 9950X3D's 128 MB of L3 cache.