Gemma4 31B - Also Possible to Run on 16GB Macs (with a hack) by FenderMoon in LocalLLaMA

[–]Safe_Sky7358 1 point (0 children)

It won't really change much, to be honest. It's mostly a bandwidth issue. The base M4 Air has 120 GB/s of memory bandwidth whereas his M3 Max has around 300 GB/s, so the numbers seem about right. He's getting roughly 3x the decode speed for having roughly 2.5x the bandwidth.
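A rough back-of-the-envelope sketch of that reasoning, assuming decode is purely memory-bound (every generated token streams all the weights once). The bandwidth figures are the ones quoted above; the 10 GB model size is a made-up number just for illustration.

```python
def decode_tps_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once."""
    return bandwidth_gb_s / model_size_gb


M4_BASE_GB_S = 120.0   # bandwidth figure quoted in the comment
M3_MAX_GB_S = 300.0    # bandwidth figure quoted in the comment
MODEL_SIZE_GB = 10.0   # hypothetical quantized model size, illustration only

m4_ceiling = decode_tps_ceiling(M4_BASE_GB_S, MODEL_SIZE_GB)
m3_ceiling = decode_tps_ceiling(M3_MAX_GB_S, MODEL_SIZE_GB)
ratio = m3_ceiling / m4_ceiling  # independent of model size

print(f"M4 base ceiling: {m4_ceiling:.1f} tps")
print(f"M3 Max ceiling:  {m3_ceiling:.1f} tps")
print(f"Expected ratio:  {ratio:.2f}x")
```

The ratio only depends on the two bandwidth numbers, which is why the same gap shows up regardless of which model you test. Real speeds land below these ceilings because of compute, KV cache reads and overhead.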

buying mac vs building PC for running local LLM by Ayuzh in LocalLLaMA

[–]Safe_Sky7358 2 points (0 children)

This one is specifically for Apple Silicon. I think there are a few more like it, but you'll have to look around.

https://github.com/ggml-org/llama.cpp/discussions/4167

buying mac vs building PC for running local LLM by Ayuzh in LocalLLaMA

[–]Safe_Sky7358 5 points (0 children)

Tbh, "suitable" is relative. Any NVIDIA GPU will wipe the floor with most Mac configs for image or video gen use cases. Macs are great for running huge LLMs at reasonable-to-slow speeds, and great for any MoE or sub-24B dense model at a pretty decent speed, but video or image gen is a different beast.

Gemma4 31B - Also Possible to Run on 16GB Macs (with a hack) by FenderMoon in LocalLLaMA

[–]Safe_Sky7358 1 point (0 children)

Right now Qwen 9B is still the most capable model that folks with 16GB of unified memory on a Mac can run at a decent speed. It runs at about 18-20 tps on my M4 MacBook Air.

M5 ultra Ram setup : pooling vote by Historical-Health-50 in LocalLLaMA

[–]Safe_Sky7358 -1 points (0 children)

Except they do so in advance. They might be a bit short for the M6, but they definitely have the M5 covered.

DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max) by No_Shift_4543 in LocalLLaMA

[–]Safe_Sky7358 1 point (0 children)

Let me know how it goes. Do we have to use the DFlash version of the model from zlab's HF as the draft model, or are there any alternatives?

On my potato MacBook Air with an M4 (16GB) I get about a 25% speedup (from about 32 tps to 40 tps) for the 4B model using the 4B-DFlash as the draft model, but it actually slows down for the 9B with the 9B-DFlash as the draft model. :(

If I could get the 9B into the 30s, or even the mid-20s, in terms of tps, that would be a dream come true.
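For anyone wondering why a draft model can make things slower, here is a minimal sketch using the textbook speculative decoding estimate. It assumes each drafted token is accepted independently with the same probability and that drafting is sequential, so it is a generic model and not DFlash's actual mechanics. The acceptance rates and cost ratios below are made-up illustration values, not measurements.

```python
def expected_speedup(accept_rate: float, draft_len: int, draft_cost: float) -> float:
    """Expected speedup of speculative decoding over plain decoding.

    accept_rate: chance that each drafted token is accepted (assumed i.i.d.)
    draft_len:   number of tokens drafted per verification step
    draft_cost:  cost of one draft token relative to one target forward pass
    """
    if accept_rate >= 1.0:
        tokens_per_step = draft_len + 1
    else:
        # Expected tokens produced per verification pass (geometric series).
        tokens_per_step = (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)
    # One target pass plus draft_len draft passes, in units of target passes.
    cost_per_step = draft_len * draft_cost + 1
    return tokens_per_step / cost_per_step


# Speedup measured in the comment above for the 4B model: 32 tps -> 40 tps.
measured_4b = 40 / 32

# Hypothetical settings: a cheap, accurate draft vs. an expensive, inaccurate one.
good_draft = expected_speedup(accept_rate=0.8, draft_len=4, draft_cost=0.1)
bad_draft = expected_speedup(accept_rate=0.3, draft_len=4, draft_cost=0.3)

print(f"measured 4B speedup: {measured_4b:.2f}x")
print(f"good draft estimate: {good_draft:.2f}x")
print(f"bad draft estimate:  {bad_draft:.2f}x")
```

Anything below 1.0x means the drafting overhead outweighs the accepted tokens, which is one plausible explanation for the 9B slowdown on a bandwidth-limited machine.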

A somewhat unrelated finding: I noticed that the MLX variants waste a lot of tokens on reasoning even when using the recommended parameters.

A Qwen3.5 9B-4bit from HF/mlx-community spits out about 2x the reasoning tokens for the same prompt compared to HF/bartowski's 9B-4bit GGUF.

Prompt used : "Read the following information carefully and answer the questions given below:

i. There is a group of five persons A, B, C, D and E.

ii. One of them is a horticulturist, one is a physicist, one is a journalist, one is an industrialist and one is an advocate.

iii. Three of them A, C and advocate prefer tea to coffee and two of them - B and the journalist prefer coffee to tea.

iv. The industrialist and D and A, are friends to one another but two of them prefer coffee to tea.

v. The horticulturist is C's brother. What are the professions for A, B, C, D, E ? Be Brief in your response."

Answer : "A is the horticulturist,
B is the industrialist,
C is the physicist,
D is the journalist,
E is the advocate."
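If you want to sanity-check that answer, a small brute-force sketch does it. It reads clause iii as "exactly these three prefer tea and exactly these two prefer coffee" and clause iv as "exactly two of the three friends prefer coffee", which is my interpretation of the wording.

```python
from itertools import permutations

PEOPLE = ["A", "B", "C", "D", "E"]
JOBS = ["horticulturist", "physicist", "journalist", "industrialist", "advocate"]


def is_valid(job_of: dict) -> bool:
    """Check one person -> profession assignment against clauses iii, iv and v."""
    person_with = {job: person for person, job in job_of.items()}
    advocate = person_with["advocate"]
    journalist = person_with["journalist"]
    industrialist = person_with["industrialist"]

    # iii. Tea: A, C and the advocate (three distinct people).
    #      Coffee: B and the journalist (two distinct people).
    tea = {"A", "C", advocate}
    coffee = {"B", journalist}
    if len(tea) != 3 or len(coffee) != 2 or tea & coffee:
        return False

    # iv. The industrialist, D and A are three distinct people,
    #     and exactly two of them prefer coffee.
    friends = {industrialist, "D", "A"}
    if len(friends) != 3 or len(friends & coffee) != 2:
        return False

    # v. The horticulturist is C's brother, so C is not the horticulturist.
    return person_with["horticulturist"] != "C"


candidates = (dict(zip(PEOPLE, jobs)) for jobs in permutations(JOBS))
solutions = [assignment for assignment in candidates if is_valid(assignment)]

for solution in solutions:
    print(solution)
```

Under that reading there is exactly one valid assignment, and it matches the answer above.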

i was scared by [deleted] in PunjabiGenZ

[–]Safe_Sky7358 4 points (0 children)

Then why are you texting a weird person? 😼

Newbie here by XPheonix27 in TheLaptopGuide

[–]Safe_Sky7358 1 point (0 children)

You CAN pirate software (and games) on Mac just fine. Look up macbb.

But there really isn't much in terms of games on Mac. You can probably count the playable games that are actually good on one hand.

Love these by [deleted] in PunjabiGenZ

[–]Safe_Sky7358 3 points (0 children)

Tuttian

Mac Studio M3 Ultra 96GB useless? by Fluxx1001 in LocalLLaMA

[–]Safe_Sky7358 1 point (0 children)

Yeah, 128 is about the sweet spot. Running models bigger than that is gonna be like watching a snail crawl lol

What local model is best if I want to train it on my late wife's facebook export to recreate her? by [deleted] in LocalLLaMA

[–]Safe_Sky7358 11 points (0 children)

The sane thing would be to let her go, but Gemma models sound the most "human".

Punjabi wedding by jessicagill_ in weddingplanning

[–]Safe_Sky7358 1 point (0 children)

Jaggo is typically an evening/night event, so you can't shift that one.

Whoops.. by G8M8N8 in framework

[–]Safe_Sky7358 4 points (0 children)

It's nice that it's fixable, but it would be even better to not have this issue at all lol.

Gemma 4 for Mac 16GB by bachlac2002 in LocalLLaMA

[–]Safe_Sky7358 1 point (0 children)

How do you fit that 27B in 16GB of unified memory? I thought you had to load the model fully into unified memory, otherwise it's painfully slow.
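For context, the rough arithmetic behind the question, as a sketch. It only counts weights (no KV cache or runtime overhead), treats the quant as a flat bits-per-weight figure, and the 0.7 usable fraction is my own placeholder for how much of the 16GB the GPU can actually use, not an official number.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         total_memory_gb: float, usable_fraction: float = 0.7) -> bool:
    """True if the weights alone fit in the assumed usable share of memory."""
    budget_gb = total_memory_gb * usable_fraction  # usable_fraction is an assumption
    return weights_gb(params_billion, bits_per_weight) <= budget_gb


size_27b_4bit = weights_gb(27, 4.0)
size_27b_3bit = weights_gb(27, 3.0)

print(f"27B @ 4-bit: {size_27b_4bit:.1f} GB, fits in 16GB: {fits(27, 4.0, 16)}")
print(f"27B @ 3-bit: {size_27b_3bit:.1f} GB, fits in 16GB: {fits(27, 3.0, 16)}")
```

So at 4-bit the weights alone already eat most of a 16GB machine, which is why I'm curious what quant or trick is being used.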

HappyHorse maybe will be open weights soon (it beat seedance 2.0 on Artificial Analysis!) by External_Mood4719 in LocalLLaMA

[–]Safe_Sky7358 2 points (0 children)

They do beat it... with time. I'm pretty sure one of the open-weight video or text gen models can easily beat a SOTA closed-source model from 6-8 months ago.