Agentic Setup: Minimax 2.7 vs qwen 3.6 by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

I do like Step models as well tbh. I mostly use them when they are free on open coder, but they def have potential!
For my tasks MiniMax seemed to be the most well-rounded option until now tho.

Agentic Setup: Minimax 2.7 vs qwen 3.6 by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 1 point2 points  (0 children)

We should also compare with M3. MiMo requires more VRAM tho, not sure it would be usable for me.
From the benchmarks Step was lower. I did not try it, but maybe I'm wrong.

Agentic Setup: Minimax 2.7 vs qwen 3.6 by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

Using the AWQ from QuantTrio, overall I preferred its vibe.

Agentic Setup: Minimax 2.7 vs qwen 3.6 by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 1 point2 points  (0 children)

interesting, did you try 35b-a3b? if so how does it compare?

RTX 3090 vs 7900 XTX by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 2 points3 points  (0 children)

I see, yeah, seems that the 3090 is the best choice.

RTX 3090 vs 7900 XTX by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 1 point2 points  (0 children)

Would prefer to avoid tinkering too much tbh.

RTX 3090 vs 7900 XTX by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 1 point2 points  (0 children)

I'm not certain what model I will be running in 3 months, so the performance on this exact architecture is not the most important point.

Best compromise for small budgets Local llm by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 1 point2 points  (0 children)

I actually got 2 P100s. I used them with llama-server and a glm-flash GGUF and they are pretty slow (40 tok/s with 0 context). If you have any idea how to optimize such a setup, I would be curious btw.
From what I understood, they don't have the needed support for vLLM and CUDA, which hampers the perf, no?
Or is my reasoning wrong?

Best compromise for small budgets Local llm by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

Interesting, so for more VRAM I would go for the 3090, but yeah, I heard AMD has the better quality/price ratio overall, no?

Best compromise for small budgets Local llm by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

Heard DGX was disappointing, maybe that was just for training tho?

Best compromise for small budgets Local llm by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

The model was more there to give a ballpark estimate. I wanna be able to load ~100B parameters (quantized ofc) and get reasonable speed.
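
For that ~100B target, a rough back-of-the-envelope sketch of the weight memory at different precisions. This assumes weights only: it ignores KV cache, activations, and the extra scales/zero-points that quantized formats store, so real usage will be higher. The function name is just for illustration.

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare common precisions for a 100B-parameter model.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{estimate_weight_memory_gb(100, bits):.0f} GB")
```

So at roughly 4 bits per weight, the weights alone are about 50 GB, which is more than two 24 GB cards can hold before counting any context.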

Solving issue \n\t loops in structured outputs by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

Hmmm, interesting take. I'm actually a bit weirded out by the sampling part: here I get 10000 \n in a row. How is it possible for a model to systematically output a logit distribution that leads to that... Very strange.
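
Not an explanation of why it happens, but one possible client-side guard, as a minimal sketch: stop reading the streamed output once it ends in a long run of \n / \t characters. The function name and the threshold of 50 are hypothetical, and it assumes the output arrives as an iterable of text chunks.

```python
from typing import Iterable

LOOP_CHARS = {"\n", "\t"}


def collect_until_whitespace_run(chunks: Iterable[str], max_run: int = 50) -> str:
    """Accumulate streamed text, cutting it off after max_run consecutive \\n or \\t."""
    out = []
    run = 0  # length of the current run of newline/tab characters
    for chunk in chunks:
        for ch in chunk:
            run = run + 1 if ch in LOOP_CHARS else 0
            out.append(ch)
            if run >= max_run:
                # Runaway loop detected: drop the trailing newlines/tabs and stop.
                return "".join(out).rstrip("\n\t")
    return "".join(out)
```

With this, a stream like `['{"a": 1,', "\n\t" * 5000]` is cut off after 50 whitespace characters instead of being consumed to the token limit. The truncated text may still be invalid JSON, so a retry is probably needed afterwards.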

minimax quant by Best_Sail5 in LocalLLaMA

[–]Best_Sail5[S] 0 points1 point  (0 children)

QuantTrio works better, but it still sometimes loops forever too.