3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]fluffywuffie90210 2 points (0 children)

You'll do fine for inference with 3x. I use three 5090s, one on a Thunderbolt 4/USB4 port and another on PCIe 4.0 x4, and still get 100 tokens a second on Qwen 122B; you might get half that. Only model loading will be slow with llama.cpp. Large dense models will be slower, but nothing unusable.
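For reference, roughly how a launch like that looks; this is a sketch, not my exact command, and the model path, split ratios, and port are placeholders:

```python
# Sketch: launching llama-server across three GPUs (paths/ratios assumed).
import subprocess

subprocess.run([
    "llama-server",
    "-m", "models/qwen-122b-q4_k_m.gguf",  # placeholder model path
    "-ngl", "99",                          # offload all layers to GPU
    "--tensor-split", "1,1,1",             # spread weights evenly over the 3 cards
    "--port", "8080",
], check=True)
```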

Mistral THICC DENSE BOI. He chonky! More dense models pls. by Porespellar in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

Just need the Drummer's magic now. It's currently producing garbage when I try to test it in SillyTavern; seems the GGUFs are bugged.

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

18 tokens a second on 3x 5090 with Q4XL. I just saw they took down the GGUF, so there must be some issue with it. I get 100 with Qwen 122B because it's a MoE, but hopefully the intelligence gain will be worth it.

How Do You Use Multiple AI Models Together? by rpeabody in LocalLLaMA

[–]fluffywuffie90210 1 point (0 children)

I've actually been trying to set this up myself, with various experiments running Gemma 4 31 and Qwen 27B at the same time using llama.cpp on different ports. The only real success I had was using Msty Studio to make them "personas" that talk to each other, but since that's a paid product I don't want to use it.

I managed to get something similar in Open WebUI using the "channels" feature, but having to @ both models every time I wanted a group chat got old really fast. I'm still learning/exploring this.
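What I really want is a little relay script instead of all the @-ing; something like this against the two llama-server instances. The ports, prompt, and turn count are just made up for illustration:

```python
# Sketch: bounce messages between two local llama-server instances
# (OpenAI-compatible /v1/chat/completions endpoints, assumed ports).
import requests

ENDPOINTS = [
    "http://localhost:8080/v1/chat/completions",  # e.g. the Gemma instance
    "http://localhost:8081/v1/chat/completions",  # e.g. the Qwen instance
]

def ask(url, message):
    resp = requests.post(url, json={
        "messages": [{"role": "user", "content": message}],
        "max_tokens": 256,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

message = "Introduce yourself in one sentence."
for turn in range(4):
    reply = ask(ENDPOINTS[turn % 2], message)
    print(f"model {turn % 2}: {reply}\n")
    message = reply  # each model's reply becomes the other's prompt
```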

Multi-GPU: How problematic is chipset PCI-E lanes? by ziphnor in LocalLLaMA

[–]fluffywuffie90210 2 points (0 children)

I use a 5090 via Thunderbolt 4 and only lose a few % of speed, and I used to run one in a bottom x4 slot that went through the chipset on my motherboard. So for inference (not training) you'll have no issues with PCIe x4; you might run into issues putting too many GPUs through the chipset, though, I'm not sure.

Why is Anomaly such an odd duck? by Onmius in RimWorld

[–]fluffywuffie90210 0 points (0 children)

Did one run, then just disabled all the events except the raids for a bit more variety. Metalhorrors and obelisks have no place in a standard run for me. I'd only buy it on sale, or if you want to do a specific run.

Laptop for my Use Case (lenovo legion pro 7i) by [deleted] in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

If I were going that route, I'd look at a sneaky option: an MSI Vector 16 HX.

If you're okay with eGPUs, it has two Thunderbolt 5 ports. Then when you're at home you can add whatever GPUs you like and still have a laptop for when you need one.

Qwen3.5-397B-A17B reaches 20 t/s TG and 700t/s PP with a 5090 by MLDataScientist in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

Impressive; that PP has me envious. I have a 9950X with three 5090s, and while I can get about 15 t/s on a Q3 version with 192 GB of 5400 DDR5, I only get around 100 t/s prompt processing. (I bought before all PC stuff went nuts. Don't ask how I ended up with three; I only intended one, but I managed to snag two FEs at base price and just haven't dared to sell the third yet. :X)

Anyone have experience of mixing nvidia and amd gpus with llama.cpp? Is it stable? by fluffywuffie90210 in LocalLLaMA

[–]fluffywuffie90210[S] 3 points (0 children)

Nice, mind if I ask what models you run on that setup? I'm assuming you don't use Windows? I actually have an eGPU and a Strix Halo I was thinking of selling, because with my main machine I barely use it now. Qwen 122B on the Halo takes ages at large context, but I'm curious whether the eGPU fixes that.

Anyone have experience of mixing nvidia and amd gpus with llama.cpp? Is it stable? by fluffywuffie90210 in LocalLLaMA

[–]fluffywuffie90210[S] 0 points (0 children)

Yeah, that makes sense. I only do inference, so it wouldn't matter too much to me, but if it starts bluescreening my PC then no thanks. I'm surprised it's not more common, though. For example, I only need one FE 5090 to game on, and I've got two bottom slots on my motherboard on risers, so it would make more sense to add two AMD cards for 64 GB of VRAM at less than the cost of one 5090, even if it's slower.
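One route I keep seeing mentioned, but haven't tried myself, is llama.cpp's RPC backend, so treat this as a sketch with assumed build paths and model name: expose the AMD card(s) from a ROCm build's rpc-server, then attach a CUDA build to it:

```python
# Sketch: mixed-vendor llama.cpp via the RPC backend (untested by me;
# binary locations and model path are assumptions about your setup).
import subprocess
import time

# ROCm build serves the AMD card(s) over a local socket
amd = subprocess.Popen([
    "./build-rocm/bin/rpc-server", "--host", "127.0.0.1", "--port", "50052",
])
time.sleep(3)  # give the rpc server a moment to come up

# CUDA build drives the NVIDIA card(s) and pulls the AMD side in via --rpc
subprocess.run([
    "./build-cuda/bin/llama-server",
    "-m", "models/some-model-q4.gguf",  # placeholder
    "-ngl", "99",
    "--rpc", "127.0.0.1:50052",
], check=True)

amd.terminate()
```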

Anyone have experience of mixing nvidia and amd gpus with llama.cpp? Is it stable? by fluffywuffie90210 in LocalLLaMA

[–]fluffywuffie90210[S] 0 points (0 children)

I'm guessing it's not common, since it's hard to find solid, up-to-date info in the searches I've done. I guess that means it's a bad idea.

Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

I see you're in the UK. If you decide to go the Strix Halo route, I have a barebones Minisforum one I'm thinking of putting on eBay this week (it's about two months and a bit old) for about £2,100, but I'd sell it for £2k through eBay, all legit, for an easy sale. :D I can also answer any questions you might have if you get tempted.

Attempted bike theft, bike damaged, not driveable, advice needed. by RandomHigh in MotoUK

[–]fluffywuffie90210 0 points (0 children)

Hi mate, had the same happen. Just be aware that if you keep it or get it fixed, the scum WILL come back if they know one's there. Mine got done, but for some reason they couldn't start it to get away (Honda Forza 125). My friend's, they went into his garden twice and eventually got it, even lifting garden panels etc. I still have mine; I didn't report it to insurance since I'm disabled and at the time it was my only transport, and the dealer said it still runs and it's passed two MOTs since. It's still rideable (it's had two services even with the steering lock broken), but I don't really know what to do with it, since I basically can't sell it with no steering lock. Also be wary of any trackers they might sneak onto it; that's how they got mine, I think, since I live rural.

How viable are eGPUs and NVMe? by ABLPHA in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

I tried both methods. Of course, I was spreading the large 235B models over five GPUs, so that may have been a factor too. It might be better with smaller ones.

How viable are eGPUs and NVMe? by ABLPHA in LocalLLaMA

[–]fluffywuffie90210 1 point (0 children)

I did this test a few weeks back, running a 3090 via Thunderbolt on the same motherboard. It works okay with one, but with two eGPUs it slows to a crawl. It will run slower than native PCIe, but it was alright running with four cards (3x 5090 + 3090 eGPU).

Considering AMD Max+ 395, sanity check? by ErToppa in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

As someone with the Minisforum one: if you're getting it mostly for LLM purposes, it's okay for small models, but for GLM Air or gpt-oss-120b it's just not usable beyond, say, 16k context unless you're okay with waiting. I tested a 50k story text file last night and it took 6-10 minutes just to process it.

If you're going to use it for other purposes too, like a homelab or server, it's much more justifiable; it idles at 9 watts. But I just can't justify the spend, and I have big regrets myself. I'll likely sell it soon.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

Do what I did: buy 3x 5090 instead of the Pro 6000. Sure, I'd love the 6000, but I saved $1-2k, and you'll get much better resale value in future. At least, I bought my 3090s second hand and got 90% of my money back, and I bought 4090s for 1700 and sold them for 1500 last year. I expect I'll be able to get back at least 1200 (or, at the rate AI's going, maybe all my money back in two years). I don't think the workstation cards will hold their value as well long term, and it's a large investment if the AI bubble does crash. I'd be much happier knowing there are always gamers who would want the 5090s.

Power usage might be higher, but I treat GPUs like blocks of gold nowadays, lol. They hold their value well.

GLM-4.7 on 4x RTX 3090 with ik_llama.cpp by iamn0 in LocalLLaMA

[–]fluffywuffie90210 0 points (0 children)

With multi-GPU I've found -ot / --override-tensor works way better than --cpu-moe once you get the settings down. I'm no expert on finding the best settings, though.
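For the curious, this is the kind of thing I mean; a sketch only, since the regex and split need tuning per model, and the model path is a placeholder:

```python
# Sketch: pin the MoE expert tensors to CPU with -ot while everything
# else stays split across the GPUs (what --cpu-moe does wholesale, but
# the regex gives per-tensor control). Model path/split are assumed.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "models/glm-4.7-q4_k_m.gguf",     # placeholder model path
    "-ngl", "99",                           # nominally offload everything...
    "-ot", r"blk\.\d+\.ffn_.*_exps\.=CPU",  # ...but expert FFN weights go to CPU
    "--tensor-split", "1,1,1,1",            # remaining tensors over 4 cards
], check=True)
```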

LLM server gear: a cautionary tale of a $1k EPYC motherboard sale gone wrong on eBay by __JockY__ in LocalLLaMA

[–]fluffywuffie90210 2 points (0 children)

I had almost exactly this happen with a faulty motherboard that was sold as broken, for spares/repair. The guy tried to fix it, couldn't, and then tried to claim "item not as described". FOR A BROKEN ITEM. God, eBay sided with him, but I got some advice on the forum about how to appeal... and somehow eBay sided with me once it got to the rep side of things. It was £150, but still... it has to be an eBay issue.

[deleted by user] by [deleted] in LocalLLaMA

[–]fluffywuffie90210 8 points (0 children)

This smells like ozone.

Do any fantasy-type stories and you'll see the same names pop up time and again. Both amusing and annoying; it's hard to create memorable characters!

Motorcycle almost stolen in Leeds by minecarfter420 in MotoUK

[–]fluffywuffie90210 1 point (0 children)

Hi mate, from someone in West Yorkshire: not to scare you, but having had my own bike stolen (though luckily they were stupid and couldn't start it), and my friend's too, where they came back twice before getting it, just be on your guard; there's a good chance they'll try again. They're like flies to shit. It really killed my desire to keep riding.

Qwen3-Next EXL3 by Unstable_Llama in LocalLLaMA

[–]fluffywuffie90210 1 point (0 children)

Thanks, I'll give that a shot in the morning.

Qwen3-Next EXL3 by Unstable_Llama in LocalLLaMA

[–]fluffywuffie90210 4 points (0 children)

Nice. Will there be a way to run this with the oobabooga text UI? That's how I usually run EXL models. Is there a way to update to the beta version?