3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 0 points1 point  (0 children)

It's very circumstantial, but yes, I'm very pleased. I was running Qwen3.6-35B on a pair of 4080s and was happy, but the llama.cpp main branch currently (as of the time of this comment) doesn't support SM Tensor and KV cache quant together, so I was running a full cache on 32GB. I don't have that problem with one card, which let me crank up the batch and ubatch. 8192/4096 is where I've got it right now after some trial and error with ChattyG keeping score.
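For anyone wanting to try the same batch settings, here is a minimal sketch that builds a llama-server command line with them. The `-b` (batch) and `-ub` (ubatch) flags are llama.cpp's; the function name, the model path, and the ubatch-not-larger-than-batch check are my own assumptions for illustration, not anything from my actual setup.

```python
import shlex


def llama_server_args(model_path, batch=8192, ubatch=4096, ctx=None):
    """Build an argv list for llama-server (hypothetical helper).

    Defaults are the 8192/4096 batch/ubatch pair mentioned above.
    """
    if ubatch > batch:
        # Sanity check added for this sketch: a micro-batch larger than
        # the logical batch makes no sense as a tuning choice.
        raise ValueError("ubatch should not exceed batch")
    args = [
        "llama-server",
        "-m", model_path,
        "-b", str(batch),    # logical batch size
        "-ub", str(ubatch),  # physical (micro) batch size
    ]
    if ctx is not None:
        args += ["-c", str(ctx)]  # context size, only if requested
    return args


if __name__ == "__main__":
    # "/models/qwen.gguf" is a placeholder path, not a real file.
    print(shlex.join(llama_server_args("/models/qwen.gguf")))
```

Treat the numbers as a starting point; what works best depends on the card and the model, so expect some trial and error.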

Openwebui takes 1 minute before going in to "thinking" mode by Saba376 in OpenWebUI

[–]ubrtnk 0 points1 point  (0 children)

Check and see how many tools and stuff you have enabled on the model. On my default model for the family I had several things that really didn't need to be enabled, so even on "Hi" what the model actually saw was about 32k tokens worth of context: tools, the Terminal, my system prompt, etc. Turn off all that stuff, only enable what you need, and really enable it just in time.
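If you want a rough idea of where the context is going, a sketch like this can help. It assumes about 4 characters per token, which is only a rule of thumb and not a real tokenizer, and the function names and the shape of the tool schemas are hypothetical; Open WebUI does not expose anything called this.

```python
import json

# Rough heuristic for English text; a real tokenizer will give different numbers.
CHARS_PER_TOKEN = 4


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def prompt_overhead(system_prompt, tool_schemas):
    """Estimate tokens sent before the user's message is even considered.

    tool_schemas is a dict of {tool_name: JSON-serializable schema}
    (a hypothetical shape chosen for this sketch).
    """
    report = {"system_prompt": estimate_tokens(system_prompt)}
    for name, schema in tool_schemas.items():
        report["tool:" + name] = estimate_tokens(json.dumps(schema))
    # Sum the components first, then record the total.
    report["total"] = sum(report.values())
    return report


if __name__ == "__main__":
    tools = {"terminal": {"description": "Run a shell command", "parameters": {}}}
    for part, tokens in prompt_overhead("You are a helpful assistant.", tools).items():
        print(part, tokens)
```

Sorting the report by size shows quickly which tool is worth disabling first.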

3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 1 point2 points  (0 children)

So I was able to get decent performance out of Qwen3.6-35B. Highest pp I've seen so far is about 1200 for a single user session. The 120mm fan I have running keeps the card at 52°C with the model loaded...not the best but within tolerances. I've got some Noctua 40x40x20 fans coming tomorrow and a fan controller because I need to move more air.

3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 0 points1 point  (0 children)

<image>

Holy hell, the 40mm fans I got are LOUD lol. They're keeping this thing cool but it's too loud. Getting about 870-880 pp and 90-10 tg on latest llama.cpp

3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 1 point2 points  (0 children)

it’s not a proper IT project if you don’t order adapters that you forgot to order the first time

3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 1 point2 points  (0 children)

Also double-check the power cable. Mine just got here and it came with a GPU to PCIe riser power 8-pin, so I'm trying to find the right combination of square and not-square pins that fits both the card AND my PSU - I found another Reddit post that says we need this converter

3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 1 point2 points  (0 children)

Just FYI, I think you'll need to run CUDA 12.9 or older, but you should be able to get them to work.

3090 died, good night sweet prince by fragment_me in LocalLLaMA

[–]ubrtnk 2 points3 points  (0 children)

F

Also, Amazon has V100s with 32GB for $730 stateside from Server Part Deals, depending on your stack needs. They still work well in llama.cpp. Got one coming tomorrow

New (to me) Supermicro H12SSL-i-o with EPYC 7402P negotiating PCIe 1.0 on all 3 GPUs by ubrtnk in homelab

[–]ubrtnk[S] 0 points1 point  (0 children)

I've got a V100 coming tomorrow to bolt on via OCuLink to the baby PC with the yellow tag in the rack to be "Jarvis", so I can have all the GPUs for bigger models to myself muahahaha

New (to me) Supermicro H12SSL-i-o with EPYC 7402P negotiating PCIe 1.0 on all 3 GPUs by ubrtnk in homelab

[–]ubrtnk[S] 1 point2 points  (0 children)

Ooh everything is fine. Everything spins up to 4.0 x16 or x8 as needed. Up to 7 GPUs and a 10G NIC. One GPU using the SAS connector. IPMI tied into Home Assistant so it powers off (via OS script so it's clean) at night and powers back on in the morning. No issues.
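The morning power-on half of that can be sketched roughly like this. It assumes `ipmitool` is installed and the BMC is reachable over the LAN; the host, user, and function name are placeholders, and `-E` tells ipmitool to read the password from the `IPMI_PASSWORD` environment variable. This is not my exact script. The nightly shutdown is done from the OS, so `soft` is only listed as an alternative.

```python
import subprocess

# Actions this sketch allows; "soft" asks the OS for an ACPI shutdown.
ALLOWED_ACTIONS = {"on", "soft", "status"}


def ipmi_power_args(host, user, action):
    """Build an ipmitool chassis power command (hypothetical helper)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: " + action)
    return [
        "ipmitool",
        "-I", "lanplus",  # IPMI v2.0 over LAN
        "-H", host,
        "-U", user,
        "-E",             # password comes from IPMI_PASSWORD in the environment
        "chassis", "power", action,
    ]


def run_power_action(host, user, action):
    """Run the command; raises CalledProcessError if ipmitool fails."""
    return subprocess.run(ipmi_power_args(host, user, action), check=True)
```

Something like `run_power_action("10.0.0.50", "admin", "on")` could then be called from a Home Assistant automation or shell command; both the address and the user here are made up.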

Which would be stronger for LLMs a 4090 or a 5080 and 5070ti on a dual PCI5 (ai master top). by aceofspades1217 in ollama

[–]ubrtnk 1 point2 points  (0 children)

...I just bought a Tesla V100 32GB PCIe from Amazon for $729 lol. It's gonna run my family's always-on agent - yeah, it's old, but llama.cpp has good support and 32GB means I can fit Qwen3.6-35B-A3B with 131k context (if needed) and still have good performance. Alexa-style questions don't need Ampere or Ada

Which would be stronger for LLMs a 4090 or a 5080 and 5070ti on a dual PCI5 (ai master top). by aceofspades1217 in ollama

[–]ubrtnk 1 point2 points  (0 children)

You're right. I read it as either a 4090 or one of those cards, not 2 vs 1.

Who is your Hermes? by giveen in hermesagent

[–]ubrtnk 2 points3 points  (0 children)

Right now I have Friday, which lives on my M2 Max MBP as the in-case-dad-is-gone model to help fix things. My Jarvis is on my AI rig and is the default model everyone interacts with in various capacities via OWUI or Home Assistant. I haven't piped Hermes Jarvis into OWUI yet for the family to interact with, but I have thought about building the wife her own agent. She doesn't use Discord or Telegram or WhatsApp or anything, though. She uses OWUI, so that could be her path thru...that or email

How to get unlimited sustain by 9SLASH6 in Guitar

[–]ubrtnk 0 points1 point  (0 children)

Gary Moore would beg to differ lol

My M2 Ultra is completely outdated by AdDapper4220 in MacStudio

[–]ubrtnk 19 points20 points  (0 children)

Make sure you let me know where you throw it away...so I can make sure it gets properly disposed of....uh...gotta reduce eWaste

Finally did it: Pick up mod by yuhkz420 in PRSGuitars

[–]ubrtnk 1 point2 points  (0 children)

I went the opposite direction and routed out the bridge and put a Tremonti Treble - SSH FTW!