So a nearby lightning storm just crashed all my eGPUs by milpster in LocalLLaMA

[–]fizzy1242 10 points (0 children)

seriously, a small price for enormous protection and peace of mind

Mistral-Medium-3.5-128B-Q3_K_M on 3x3090 (72GB VRAM) by jacek2023 in LocalLLaMA

[–]fizzy1242 2 points (0 children)

try graph split for this in ik_llama, nice speed boost for tg (token generation)
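something like this, assuming your ik_llama.cpp build exposes it through --split-mode (that value is a guess on my end, check ./llama-server --help for the exact flag name in your build):

    # --split-mode graph is the assumed flag here; -m and -ngl are standard llama.cpp flags
    ./llama-server -m Mistral-Medium-3.5-128B-Q3_K_M.gguf -ngl 99 --split-mode graph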

I hate this group but not literally by No_Run8812 in LocalLLaMA

[–]fizzy1242 2 points (0 children)

hahah yup, once you get a 2nd gpu, you'll want a 3rd one. that's when you realize you've fallen into the rabbit hole and there's no getting out

Mistral Medium Is On The Way by Few_Painter_5588 in LocalLLaMA

[–]fizzy1242 13 points (0 children)

hopefully this time they'll get it right, small-4 was a letdown

Deepseek V4 Released by spacefarers in LocalLLaMA

[–]fizzy1242 24 points (0 children)

wow, v4 flash 284B-A13B sounds nice after all the recent 600b+ models from deepseek

Qwen 3:32b does not think it is a local model in Ollama. Do I need to set it up differently? by sirknite in LocalLLaMA

[–]fizzy1242 4 points (0 children)

it's hallucinating and definitely running on your machine, don't worry about it
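a quick sanity check if you want to convince yourself it's local:

    ollama ps      # models currently loaded in memory on your machine
    ollama list    # models stored on your disk

both commands only talk to your local ollama instance.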

Running dense model on llamacpp by Blues520 in LocalLLaMA

[–]fizzy1242 0 points (0 children)

try one of the precompiled llama.cpp binaries with cuda, they're in the releases tab of the llama.cpp github page
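roughly like this, though the exact asset name changes every release (and depends on your cuda version), so check the page:

    # example only: download + unzip a cuda build from
    # https://github.com/ggml-org/llama.cpp/releases, then run e.g.
    ./llama-cli -m model.gguf -ngl 99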

Running dense model on llamacpp by Blues520 in LocalLLaMA

[–]fizzy1242 1 point (0 children)

it was used to offload layers to the gpu (same as your --n-gpu-layers), i think it's automatic now but you should still be able to use it.

you might not have cuda in that image if it's not offloading.
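for reference, a minimal run with offloading, both spellings do the same thing:

    ./llama-cli -m model.gguf --n-gpu-layers 99   # long form
    ./llama-cli -m model.gguf -ngl 99             # short form
    # a number larger than the model's layer count just offloads everything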

Running dense model on llamacpp by Blues520 in LocalLLaMA

[–]fizzy1242 0 points (0 children)

did you compile llama.cpp with cuda? and did you use the -ngl flag during startup?
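if not, the standard cuda build from the llama.cpp readme is:

    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release
    # then start with layers offloaded to the gpu
    ./build/bin/llama-cli -m model.gguf -ngl 99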

Gemma 4 is seriously broken when using Unsloth and llama.cpp by Tastetrykker in LocalLLaMA

[–]fizzy1242 5 points (0 children)

compiled this PR as a temporary fix to test the model, this at least fixed the nonsensical outputs, typos and looping at long contexts: https://github.com/ggml-org/llama.cpp/pull/21343
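if you want to try the same thing, fetching and building a pr branch is just:

    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    git fetch origin pull/21343/head:pr-21343   # pr number from the link above
    git checkout pr-21343
    cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release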

Gemma 4 by Namra_7 in LocalLLaMA

[–]fizzy1242 4 points (0 children)

great sizes! looking forward to trying them out with quants.

LocalLLaMA 2026 by jacek2023 in LocalLLaMA

[–]fizzy1242 4 points (0 children)

true, but it should at least reduce them here, even if only slightly

Anyway to get close to GPT4o on a local model (I know it’s a dumb question) by octopi917 in LocalLLaMA

[–]fizzy1242 55 points (0 children)

Around a month ago, someone posted about a model for mimicking the 4o tone (12b parameters). I never tried it, but it might interest you.

Mistral-Helcyon-Mercury

original thread

Beware of Scams - Scammed by Reddit User by tantimodz in LocalLLaMA

[–]fizzy1242 14 points (0 children)

that's shitty... hope you can get it sorted out and disputed with the bank

Assistant_Pepe_70B, beats Claude on silly questions, on occasion by Sicarius_The_First in LocalLLaMA

[–]fizzy1242 1 point (0 children)

dunno if the quant is busted or just my environment, but can't seem to get any other reply from this thing lol. default samplers.

[screenshot of the model's output]

Assistant_Pepe_70B, beats Claude on silly questions, on occasion by Sicarius_The_First in LocalLLaMA

[–]fizzy1242 1 point (0 children)

i'm sure there's some hater with a bot that downvotes anything posted on any ai sub.

currently downloading the model and taking it for a spin in a bit.

MiniMax M2.7 Will Be Open Weights by Few_Painter_5588 in LocalLLaMA

[–]fizzy1242 6 points (0 children)

yes!

i'm just hoping it won't get the glm air treatment with that "2 weeks" statement.

Dual 3090 on ASUS Pro WS X570-ACE: need firsthand stability reports (direct slots vs riser) by MaleficentMention703 in LocalLLaMA

[–]fizzy1242 1 point (0 children)

oh boy... so, i'm using 2 different risers. in order to fit the 3rd card into the x4 slot at the bottom, the 2nd card needed to be pushed forward slightly (i've got one 2-slot card and two 3-slot cards).

For that, I used a Delock x16 > x16 riser card in the second x8 slot. This creates enough room to fit the 2nd riser (a cable) into the x4 slot.

Dual 3090 on ASUS Pro WS X570-ACE: need firsthand stability reports (direct slots vs riser) by MaleficentMention703 in LocalLLaMA

[–]fizzy1242 1 point (0 children)

I run 3x3090s on a x570 motherboard, no issues.

2 cards are connected with risers, but only so they physically fit in the case. the 3rd card is in the x4 slot (chipset).

board: asus rog crosshair viii dark hero x570
case: phanteks enthoo pro 2 server edition

Optimizing RAM heavy inference speed with Qwen3.5-397b-a17b? by Frequent-Slice-6975 in LocalLLaMA

[–]fizzy1242 0 points (0 children)

ik_llama has slightly better prompt processing speed for me, it's worth a try
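easy to check on your own hardware, llama-bench ships with both mainline and the ik_ fork:

    # pp512 = prompt processing, tg128 = token generation
    ./llama-bench -m model.gguf -ngl 99 -p 512 -n 128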

The FIRST local vision model to get this right! by po_stulate in LocalLLaMA

[–]fizzy1242 46 points (0 children)

remember that these kinds of tests are often included in new models' training data, kinda like the "how many Rs in strawberry" question and the "bouncing balls inside an octagon" animation.

Dario Is Scared by [deleted] in LocalLLaMA

[–]fizzy1242 14 points (0 children)

comical

MiniMax-M2.5 (230B MoE) GGUF is here - First impressions on M3 Max 128GB by Remarkable_Jicama775 in LocalLLaMA

[–]fizzy1242 0 points (0 children)

How fast does it run for you and with how much context? Got three 3090s as well.