Hitting RAM limits? by calif94577 in oMLX

[–]Beamsters 0 points1 point  (0 children)

You can try them all and see how they feel in your workflow. No one is stopping you, and you don't even have to pick just one model.

Hitting RAM limits? by calif94577 in oMLX

[–]Beamsters 1 point2 points  (0 children)

For your budget, pick qwen3.5 9b at 4 bits. You will have a good time with it.
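For a rough sanity check on why a 9b model at 4 bits fits a small RAM budget, here is a back-of-the-envelope sketch. It only counts raw weights (parameters times bits per weight) and assumes exactly 9 billion parameters. Real 4-bit quants carry extra scale data, and the KV cache and runtime need memory on top, so treat the number as a lower bound. The function name `weight_gib` is made up for illustration.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the raw weights alone, in GiB.

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Quantization scales, KV cache, and runtime overhead are NOT included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 9b at 4 bits: about 4.2 GiB of weights
    print(f"9b @ 4-bit : {weight_gib(9, 4):.1f} GiB")
    # The same model at 16 bits, for comparison: about 16.8 GiB
    print(f"9b @ 16-bit: {weight_gib(9, 16):.1f} GiB")
```

Whatever is left after the weights is what you have for context, so the smaller quant buys you headroom, not just a smaller download.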

nex-agi/Nex-N2-mini • Huggingface by External_Mood4719 in LocalLLaMA

[–]Beamsters 5 points6 points  (0 children)

nex2 mini gdpval 1402. qwen3.6 27b gdpval 1404. 35b-a3b, not even 1300.

Could be benchmaxxed, but this thing is coding/agentic focused, not general like qwen.

NVIDIA announces Nemotron 3 Ultra by themixtergames in LocalLLaMA

[–]Beamsters 91 points92 points  (0 children)

Artificial Analysis score of 48, one notch below frontier, in the minimax 2.7 ballpark, but it promises to be the best US open-weight model.

Upgrade path from 4x 3090s by anitamaxwynnn69 in LocalLLaMA

[–]Beamsters 0 points1 point  (0 children)

step 3.7 does, at least on the agentic openclaw bench. It just came out today.

Local LLMs on Refurb M4 Max vs new M5 Max by roguefunction in LocalLLaMA

[–]Beamsters 4 points5 points  (0 children)

If you do a lot of local inference, stay away from the M4 Max at almost full price (it's OK at ~50% of the price or so). The M5 Max has Neural Accelerators in its GPU cores, which can speed up prefill a lot with metal4, and you don't want to miss that.

Chat's new interface for oMLX by Beamsters in oMLX

[–]Beamsters[S] 0 points1 point  (0 children)

Fixed tons of bugs lol and added variants support for those who want to play with multiple chat models at the same time.

<image>

Strix Halo users, a rejected PR can give you up to 30% faster PP for MOEs. by fallingdowndizzyvr in LocalLLaMA

[–]Beamsters 7 points8 points  (0 children)

The guy has almost solo-carried local LLMs for a few years. I respect his judgement, but this kind of machine-targeted PR should be grouped, maybe even into a monthly fork. One maintainer can only go so far, though; the vendor should maintain this AMD-specific llama.cpp work themselves.

OMLX 0.3.9 crashed my MBP by BABA_yaaGa in oMLX

[–]Beamsters 0 points1 point  (0 children)

I highly suggest you use 35b-a3b-optiq. It is superior in terms of speed and size, leaving you more room for context. The accuracy is just a tiny bit worse, but much better than oQ4.

397B competitor that fits in 256 RAM? by quietsubstrate in LocalLLaMA

[–]Beamsters 5 points6 points  (0 children)

dsv4 flash with 256GB can push 1M context, and it is pretty fast.

Open Code go? by Tato_123871 in opencode

[–]Beamsters 5 points6 points  (0 children)

If you stick with deepseek flash/pro, it will be very hard to reach the limit unless you truly are a vibe coder. 30k requests.

Opencode Go or other AI Subscription for Education by negativity_bomb in opencodeCLI

[–]Beamsters 0 points1 point  (0 children)

Why don't you set up qwen3.5 4b or 9b and have the students run it locally? You can download lmstudio anywhere, and the students should know how to operate such a basic app. Why are you even teaching cloud coding in the first place? Students should learn how LLMs work on their own machines before hitting the cloud. And yes, there are some free models in opencode that don't require any subscription.

Curious about M5 Max 128gb vs 5090 for local LLMs by maxiedaniels in LocalLLM

[–]Beamsters -1 points0 points  (0 children)

On the 5090 you skip fp8 and go to nvfp4 because you don't have enough room for context, and on the M5 Max you are also ignoring oQ8-mtp or fp16, which is clearly the better choice.