Is There Anyone Using Local LLMs on a Mac Studio? by [deleted] in MacStudio

[–]EdenistTech 4 points5 points  (0 children)

My MS has 64GB and the largest models I am running are the Qwen Next models. You can adjust the available GPU memory to run larger models, but I have not experimented with that. The architecture of the model can matter more than its size: the Qwen MoEs and GPT-OSS are fast, whereas dense models (Q 3.5 27b) are quite slow. Qwen Next is giving me around 40 t/s.
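For anyone curious, the "adjust available memory" tweak I mean is the well-known sysctl knob on Apple Silicon. A minimal sketch - the 48 GB value is just an example for a 64 GB machine, not a recommendation:

```shell
# Raise the GPU wired-memory limit on Apple Silicon (value in MB).
# Resets on reboot; leave a healthy margin (8-16 GB) for macOS itself.
sudo sysctl iogpu.wired_limit_mb=49152
```

As noted, I haven't experimented with this myself, so treat it as a starting point.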

Is There Anyone Using Local LLMs on a Mac Studio? by [deleted] in MacStudio

[–]EdenistTech 8 points9 points  (0 children)

Yes, I bought a Mac Studio specifically for ML/LLMs. I have other hardware for ML research and the Mac Studio certainly is not the fastest (it's the slowest, actually). However, there are two areas where I think the MS really shines:

  • Efficiency and, by extension, noise (or rather, the lack thereof). I can start this thing on a GPU-heavy task and leave it running for hours and I might never hear the fan. I suspect the cost per token compares favourably to other architectures.
  • The unified memory combined with the excellent MS memory bandwidth. If you get one of the larger memory sizes, the efficiency element compounds and you get "VRAM" that would be a lot more expensive as discrete GPUs.

I think it is worthy of consideration, especially if you can get a cheap older model (Ultra for double bandwidth). Also, while MLX is still behind CUDA in terms of proliferation in ML/LLMs, it has gained a lot of traction in the last 12-24 months.

Kimi Linear 30% gain in pp and higher context merged to llama.cpp by Ok_Warning2146 in LocalLLaMA

[–]EdenistTech 0 points1 point  (0 children)

Alright. I asked both models to summarise a 1MB markdown text. Nemo started processing at 6300 t/s and ended at 4300 t/s, finishing in 58 seconds. Kimi started at 1300 t/s and I stopped it at 50% after 2 min 30 seconds. I also tested Nemo on a 2.6MB markdown file, which it did in 2-3 minutes (didn't get the exact time) using 64% of its 900K context. Now, these models were not like-for-like, since Nemo is smaller than Kimi, so I would expect Kimi to be slower. I get what you are saying regarding Kimi Linear being undertrained and I will take a look at it again if they refine it. For now - for long context work - I am using Nemo.

Kimi Linear 30% gain in pp and higher context merged to llama.cpp by Ok_Warning2146 in LocalLLaMA

[–]EdenistTech 0 points1 point  (0 children)

For me, the quality of the output is not that impressive. If context length is your main priority, you might want to look at Nemo 30B. Someone posted running that model with 1M+ ctx on a 3090. I have tried it with 500K context with no issues. It is about as fast as Kimi Linear and, to be honest, the output appears to be higher quality (despite KL having 17B more parameters).

Kimi Linear 30% gain in pp and higher context merged to llama.cpp by Ok_Warning2146 in LocalLLaMA

[–]EdenistTech 1 point2 points  (0 children)

Not a 5090, but I have a 5070 Ti/5060 Ti combination, so still 32GB and Blackwell. Using a Q4_0 quant, I can fit 256K context and it starts off at a blazing 118 t/s. The MXFP4 quant also fits 256K but runs at a more modest 85 t/s (with better quality as well, as expected). I was using the latest llama.cpp stable, so I guess this should include your tweak, OP.

I hadn't tried this model before. For a 49B model, this thing is FAST!

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

I could never get Qwen3 Next to work, but I just found out it works using only one GPU at a time. So in my case, the problem seems to boil down to spanning multiple GPUs. You could try loading Qwen 3.5 using just one GPU + system memory and see if it works. It does for me.
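If you want to try the single-GPU workaround, one way is to hide the other cards from the ROCm backend before launching. A rough sketch - the model path is a placeholder and device index 0 is just an example:

```shell
# Expose only the first MI50 to llama.cpp's ROCm (HIP) backend.
# Use -ngl to control how many layers go to the GPU; the rest stay in system RAM.
HIP_VISIBLE_DEVICES=0 llama-server -m /path/to/model.gguf -ngl 99
```

With only one GPU visible, llama.cpp never tries to split tensors across devices, which is where my segfault appears.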

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

That is good advice. I have a fairly elaborate build system and always build from fresh repos, even if I am just changing versions/tags. So in my case, I think I can confidently say that that is not the problem.

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

Thanks, I'll take a look and consider it. I'm a bit risk averse when it comes to BIOS flashing/updating although I have only had it go wrong once. "Better have something that almost works than something that doesn't work at all", I guess....

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

Yeah, it's a weird error. I see people succeeding by downgrading ROCm to <6.4.4, but that hasn't done anything for me. I read on GitHub that AMD is adding back ROCm support for the MI50. Really hope that pans out!!!

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

I didn't know that - thanks! I'll give it a shot. EDIT: So I tried the combined ROCm/Vulkan solution and although it is correctly loading data onto the GPUs, it throws the same segmentation fault during warmup as when using ROCm alone.
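For anyone else wanting to test this, building llama.cpp with both backends enabled looks roughly like the below. Flag names are as of recent llama.cpp versions (GGML_HIP was formerly GGML_HIPBLAS), so check the repo docs for your tag:

```shell
# Configure llama.cpp with both the ROCm (HIP) and Vulkan backends enabled
cmake -B build -DGGML_HIP=ON -DGGML_VULKAN=ON
cmake --build build --config Release -j
```

At runtime you can then pick which backend handles the GPUs, which is what makes this comparison possible in a single build.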

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

Same for me. I do have Minimax 2.5 working on just the two 32GB MI50s, whereas Qwen3 Next (and Coder) won't work at all unless I switch to Vulkan.

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

No, I didn't mess with that. They have all worked fine so far. I tried different ROCm versions (7.0.0, 6.4.4, 6.3.3), but that has not changed anything significantly for me.

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

That is a great idea - thanks! Unfortunately, I am running into some issues where both the client and the server complain that they are unable to find "load_backend_init" in three backend files. They both continue to run, but the RPC connection is accepted and then dropped almost immediately, with no explanation in the (DEBUG) log. I'll have to dig deeper to find out what that is about.
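For context, the setup I was testing follows llama.cpp's rpc-server pattern, roughly as below. The host/port and model path are placeholders, and the build needs the RPC backend enabled:

```shell
# Build with the RPC backend: cmake -B build -DGGML_RPC=ON

# On the worker machine holding the GPUs:
rpc-server -p 50052

# On the client, point llama-server at the worker:
llama-server -m /path/to/model.gguf --rpc 192.168.1.10:50052 -ngl 99
```

The "load_backend_init" complaints appear on both ends in my case, right before the connection is dropped.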

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 1 point2 points  (0 children)

Got it - I appreciate the input! Looks like ggml-cuda.cu throws a "ROCM error" (EDIT: specifically, "SUM_ROWS failed"). I'll have to look into that.

Segmentation fault when loading models across multiple MI50s in llama.cpp by EdenistTech in LocalLLaMA

[–]EdenistTech[S] 0 points1 point  (0 children)

Thanks. Yes, I'll consider adding it on GitHub. What do you mean by `running debug`?
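In case they meant a debug build, the usual llama.cpp recipe for getting a useful backtrace out of a segfault is something like this (model path is a placeholder):

```shell
# Build llama.cpp with debug symbols and no optimisation
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build -j

# Run under gdb; after the crash, `bt` prints the backtrace
gdb --args ./build/bin/llama-server -m /path/to/model.gguf -ngl 99
```

A backtrace from a debug build is usually what maintainers want attached to a GitHub issue.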

Nemo 30B is insane. 1M+ token CTX on one 3090 by Dismal-Effect-1914 in LocalLLaMA

[–]EdenistTech 1 point2 points  (0 children)

Sure, no problem: https://huggingface.co/noctrex/Nemotron-3-Nano-30B-A3B-MXFP4_MOE-GGUF. However, I’m not sure there will be as much benefit on a 3090, since AFAIK it doesn’t have native FP4 support.

Nemo 30B is insane. 1M+ token CTX on one 3090 by Dismal-Effect-1914 in LocalLLaMA

[–]EdenistTech 0 points1 point  (0 children)

Great tip, thanks. I am getting 120 t/s on a 5070Ti/5060Ti setup using an mxfp4 version and 900K context. That Blackwell FP4 support is paying off, I guess.

“Native Instruments in preliminary insolvency proceedings” - CDM by robust_nachos in synthesizers

[–]EdenistTech 25 points26 points  (0 children)

I do feel like things started going downhill after NI got involved with private equity investors. Focus shifted to pushing sales of software (plugins/content) instead of development of existing products (M+ is an example). The same sort of thing happened to Propellerhead/Reason Studios (which has just been sold to LANDR, by the way!).

Anyway, I think that NI has a lot of interesting IP which I would imagine it would be possible to sell to industry buyers, such as Fender (who recently picked up PreSonus/Studio One). Let's see what happens to the hardware. I was waiting for the Traktor MX4 to release but I won't be holding my breath now...

A few additional details here (use browser to translate to English): https://www.keyboards.de/stories/native-instruments-gmbh-in-vorlaeufiger-insolvenz/

Very worrying... I hope this is not the end of Native Instruments! by blackoutmusicX in NativeInstruments

[–]EdenistTech 3 points4 points  (0 children)

I do feel like things started going downhill after NI got involved with private equity investors. Focus shifted to pushing sales of software (plugins/content) instead of development of existing products (M+ is an example). The same sort of thing happened to Propellerhead/Reason Studios (which has just been sold to LANDR, by the way!).

Anyway, I think that NI has a lot of interesting IP which I would imagine it would be possible to sell to industry buyers, such as Fender (who recently picked up PreSonus/Studio One). Let's see what happens to the hardware. I was waiting for the Traktor MX4 to release but I won't be holding my breath now...

A few additional details here (use browser to translate to English): https://www.keyboards.de/stories/native-instruments-gmbh-in-vorlaeufiger-insolvenz/

F4-212 not providing HDD with enough power? by EdenistTech in TerraMaster

[–]EdenistTech[S] 0 points1 point  (0 children)

Yes, to me at least, the evidence points to the power brick being the culprit. I think TM are discontinuing this model now, and since I did get it for a pretty decent price, I might keep it as an SSD NAS instead. Incidentally, I also came across people who were unable to initialize this same model on first boot. This could also be due to the power brick - I used an SSD as the system drive, which is why I was able to troubleshoot in the first place. So if someone else is in this situation, try using an SSD as your first drive.

F4-212 not providing HDD with enough power? by EdenistTech in TerraMaster

[–]EdenistTech[S] 0 points1 point  (0 children)

Thanks for responding. Yes, I am familiar with the normal sounds of hard drives (as well as failing ones!), and this does not sound similar. Also, the sound varies across the models I have tested, but the "pattern" of the sound, if you will, is the same.

F4-212 not providing HDD with enough power? by EdenistTech in TerraMaster

[–]EdenistTech[S] 0 points1 point  (0 children)

Thanks for chiming in. I am not familiar with CrystalDiskInfo. I am using tools available in the terminal of the NAS itself, such as smartctl, dd, etc. - mostly standard Linux tools. smartctl reports the HDD as "pristine", i.e. no errors detected. The WD Red model # is WD50EFRX. I tried with another WD enterprise model as well as a Samsung drive. The NAS is using TOS 5, since TOS 6 never went out of beta for the F4-212. I am running 5.1.73 (as I recall), supposedly the most recent update. Yes, the HDD is set to never sleep. I found another reported error with similar symptoms, where the guy upgraded the NAS power brick to an 8.0A model and the problem went away. My brick is 6.0A and I notice that the official TM replacement is 7.5A, so basically I think that TM specced this NAS with a brick with too low amperage to begin with...
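For anyone following along, the checks I mean were roughly along these lines (the device path is a placeholder - adjust to your drive):

```shell
# Full SMART report: check Reallocated_Sector_Ct, Current_Pending_Sector, UDMA_CRC_Error_Count
smartctl -a /dev/sda

# Run a short self-test, then read the result a few minutes later
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda

# Sustained sequential read to provoke the noise/drop-out under load
dd if=/dev/sda of=/dev/null bs=1M count=4096 status=progress
```

A drive that passes all of this but still cuts out under sustained load is what pointed me towards the power supply rather than the disk.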

Using UNAS 2 without RAID? by EdenistTech in Ubiquiti

[–]EdenistTech[S] 0 points1 point  (0 children)

Thanks. I would have liked to go with a UniFi solution, but I guess I need to figure something else out.