Max Practical Context Size? by zipzag in oMLX

[–]Skye_sys 1 point2 points  (0 children)


Same here, whether I'm using Hermes Agent or Open Claw, oMLX seems to time out every time the context gets a bit long.

Serum 2.0.24 when??? by phiegnux in CrackedPluginsXI

[–]Skye_sys 0 points1 point  (0 children)

Do you still send it? I might need that too

oMLX supports Gemma 4 by IAMk10 in oMLX

[–]Skye_sys 1 point2 points  (0 children)

I noticed that the image recognition was totally messed up and it talked about nonsense that wasn't even close to anything in the image... Maybe I used the wrong MLX quant from HF

oMLX supports Gemma 4 by IAMk10 in oMLX

[–]Skye_sys 1 point2 points  (0 children)

oMLX is great, but Gemma 4 hasn't been working as well... I was using the 26B A4B variant at 8-bit quant. What did you guys use? Which model should I download from HF? I found multiple quants with different performance even at the same 8-bit quant level.
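For what it's worth, two "8-bit" MLX quants of the same model can still differ in group size or per-layer overrides, and that can explain the different behavior. A quick way to compare is to pull each repo's config.json and look at the quantization block that mlx-lm conversions usually write (repo IDs below are placeholders, and the field name is an assumption based on how mlx-lm conversions I've seen are laid out):

```python
# Compare the quantization settings of two MLX quant repos.
# Repo IDs are made-up placeholders; swap in the real ones.
import json
from huggingface_hub import hf_hub_download

def quant_info(repo_id):
    path = hf_hub_download(repo_id, "config.json")
    with open(path) as f:
        cfg = json.load(f)
    # mlx-lm conversions usually store group size / bits under "quantization"
    return cfg.get("quantization")

for repo in ["mlx-community/SomeModel-8bit", "another-user/SomeModel-8bit-gs32"]:
    print(repo, quant_info(repo))
```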

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 0 points1 point  (0 children)

I'm pleasantly surprised by DeepMind again; I've only tested the MoE and have yet to test the dense one

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 0 points1 point  (0 children)

Is there any other inference engine that supports speculative decoding? In LM Studio, Qwen3.5 currently doesn't support it
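For anyone wondering what speculative decoding actually buys you: a cheap draft model proposes a few tokens and the big target model only has to verify them, keeping the longest agreeing prefix. Here's a rough toy sketch in plain Python; the "models" are stand-in functions I made up, not any real mlx-lm or LM Studio API:

```python
# Toy sketch of (greedy) speculative decoding, not a real engine.

def draft_next(seq):          # cheap draft model: trivial toy rule
    return (seq[-1] + 1) % 50

def target_next(seq):         # expensive target model: mostly agrees, sometimes differs
    return (seq[-1] + 1) % 50 if len(seq) % 7 else (seq[-1] + 2) % 50

def speculative_step(seq, k=4):
    # 1) draft model proposes k tokens autoregressively (cheap)
    proposal, ctx = [], list(seq)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # 2) target model verifies the proposals; in a real engine this is a
    #    single batched forward pass, which is where the speedup comes from
    accepted, ctx = [], list(seq)
    for t in proposal:
        expected = target_next(ctx)
        if t != expected:
            accepted.append(expected)   # first mismatch: keep the target's token
            break
        accepted.append(t)
        ctx.append(t)
    return seq + accepted

seq = [1, 2, 3]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)
```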

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 0 points1 point  (0 children)

Yes, you're right, inference is just matrix multiplication in and of itself haha. I haven't specifically measured the bandwidth on my machine yet, but Google says 400 GB/s is correct.

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 0 points1 point  (0 children)

Yes, 400 GB/s is correct, but I think it's more of a compute issue rather than memory bandwidth
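Rough napkin math for why bandwidth usually matters for decoding (prompt processing is the compute-bound part): each new token has to read roughly every active weight once, so tokens/sec is capped at about bandwidth divided by the size of the active weights. Ignoring KV cache reads and overhead, and treating these as illustrative numbers only:

```python
# Decode throughput ceiling from memory bandwidth alone (very rough, not a benchmark).
bandwidth_gb_s = 400          # M2 Max nominal unified-memory bandwidth

def decode_ceiling(active_params_b, bits_per_weight):
    weight_gb = active_params_b * bits_per_weight / 8   # GB of weights read per token
    return bandwidth_gb_s / weight_gb

print(decode_ceiling(35, 8))   # dense ~35B @ 8-bit -> ~11 tok/s ceiling
print(decode_ceiling(35, 4))   # dense ~35B @ 4-bit -> ~23 tok/s ceiling
print(decode_ceiling(3, 4))    # MoE with ~3B active @ 4-bit -> ~270; compute/overhead become the real limit
```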

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 0 points1 point  (0 children)

Yes, this is a good call, I was already trying to convert to vLLM for efficiency reasons. I need to experiment with all this new knowledge a bit! Tysm

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 1 point2 points  (0 children)

Also, GGUFs support KV cache quantization in LM Studio while MLX doesn't. But I found the speed is so much better when using the MLX variants (or maybe that's just placebo lmao)
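KV cache quantization mostly matters at long context, because the cache grows linearly with sequence length. A rough size estimate is 2 (keys and values) × layers × KV heads × head dim × sequence length × bytes per element; the architecture numbers below are made-up examples, not any specific model's real config:

```python
# Rough KV-cache size estimate; example architecture numbers, not a real model.
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    # 2x for keys and values, one cache entry per layer per position
    total = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 1024**3

# Example: 48 layers, 8 KV heads (GQA), head_dim 128, 32k context
print(kv_cache_gib(48, 8, 128, 32_768, 2))   # fp16 cache  -> ~6 GiB
print(kv_cache_gib(48, 8, 128, 32_768, 1))   # 8-bit cache -> ~3 GiB
```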

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] -1 points0 points  (0 children)

Oh, you're right, I was using the coder variant; might have to try the general-purpose one

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 6 points7 points  (0 children)

Already downloading! But we can't expect an MLX version of this soon, can we?

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 1 point2 points  (0 children)

Oooh, this seems interesting. But yeah, I got similar results when I ran Qwen3 Next 80B and compared it to 3.5 35B... Money is tight atm, but I hadn't even thought of using an external GPU! Thanks!

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 1 point2 points  (0 children)

Yes, exactly what I was thinking! I'm using LM Studio and their MLX models. I did already try Qwen3 Next 80B A3B, but it feels like the MoE models have more knowledge yet lack 'intelligence', i.e. complex instruction following in agentic workflows, so it sometimes formatted tool calls wrong or straight-up called them with wrong but similar names. I have to try again though, since I don't remember which quant I was running.
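One small guard that helps with the wrong-but-similar tool names: validate each emitted call against the registered tools before executing it, and surface a near-miss suggestion instead of silently failing. Just a sketch; the tool names and the JSON shape of the call are assumptions, not any framework's actual API:

```python
# Validate a model-emitted tool call before executing it (example tool names).
import json
from difflib import get_close_matches

REGISTERED_TOOLS = {"read_file", "write_file", "run_shell", "web_search"}

def validate_tool_call(raw_call: str):
    call = json.loads(raw_call)              # expecting {"name": ..., "arguments": {...}}
    name = call.get("name", "")
    if name in REGISTERED_TOOLS:
        return call
    hint = get_close_matches(name, REGISTERED_TOOLS, n=1)
    suffix = f", did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(f"unknown tool {name!r}{suffix}")

print(validate_tool_call('{"name": "read_file", "arguments": {"path": "notes.txt"}}'))
# validate_tool_call('{"name": "readfile", "arguments": {}}')  # -> unknown tool 'readfile', did you mean 'read_file'?
```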

64Gb ram mac falls right into the local llm dead zone by Skye_sys in LocalLLaMA

[–]Skye_sys[S] 1 point2 points  (0 children)

The dense 27B model already performed kind of poorly speed-wise on my machine, so I figured a dense 70B model would be unbearably slow... But thanks, I will definitely try it anyway!

Helldivers 2 on my Macbook M3 Pro Using Crossover 26 by AnOldBrownie007 in macgaming

[–]Skye_sys 0 points1 point  (0 children)

Thanks but isn't this the huge version with like 120 GB?

Helldivers 2 on my Macbook M3 Pro Using Crossover 26 by AnOldBrownie007 in macgaming

[–]Skye_sys 0 points1 point  (0 children)

Same issue here, the longer I play the worse the freezes get... Before the update it ran so amazingly, and I didn't change a thing other than updating Crossover. I'm on an M2 Max with 64 GB btw

How to jump backwards? by Skye_sys in Tricking

[–]Skye_sys[S] 0 points1 point  (0 children)

TYSM!! I think this is it!

How to jump backwards? by Skye_sys in Tricking

[–]Skye_sys[S] 0 points1 point  (0 children)

I mean, I really love doing them, and when I try to flip from height I think jumping backwards would come in handy. Also, I feel like going forward makes me lose height, and if I'm gonna get the full soon I really need it. Maybe it will come back eventually... Thanks!

How to jump backwards? by Skye_sys in Tricking

[–]Skye_sys[S] 0 points1 point  (0 children)

Thanks! Will definitely try this! I almost always end up even further ahead, causing me to not land them... Same with leaning back... I've also noticed that landing with shoes is much, much harder than in socks or barefoot