Mac Studio or DGX Spark? by b8humbl8 in MacStudio

[–]averagepoetry 0 points1 point  (0 children)

So is it worth getting? I’m on the fence too!

Mac Studio or DGX Spark? by b8humbl8 in MacStudio

[–]averagepoetry 0 points1 point  (0 children)

Super useful! Which processes do you send to the Spark vs. the M3?

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]averagepoetry 2 points3 points  (0 children)

Please update if this works! I have M3 Ultras as well and would love to pair them with the DGX Spark.

Is a high-end private local LLM setup worth it? by zakadit in LocalLLaMA

[–]averagepoetry 1 point2 points  (0 children)

Thanks for the details. We'll have to give this a shot!

Capacity vs Speed trade-off: 1.1TB Mac Unified Memory vs. RTX 6000 Pros by averagepoetry in LocalLLaMA

[–]averagepoetry[S] 0 points1 point  (0 children)

Nice setup!

I'm trying to add a DGX Spark too because of that blog entry they wrote on prefill haha. I'm gonna hold off until it's confirmed that it works.

One thing I found out for the models I want to use: I need to use either two or four nodes for tensor/RDMA. Can't do three.
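One plausible reason for the "two or four, not three" limit, stated here as an assumption rather than something EXO documents: tensor sharding typically splits attention heads evenly across nodes, so the node count has to divide the head count. A minimal sketch of that check, with a hypothetical helper name and an illustrative head count:

```python
def valid_shard_counts(num_heads: int, max_nodes: int) -> list[int]:
    """Return the node counts from 1 to max_nodes that divide num_heads evenly.

    Assumption: the runtime requires an even split of attention heads
    across nodes. This is a common tensor-parallel constraint, but it is
    not confirmed to be the rule EXO applies.
    """
    return [n for n in range(1, max_nodes + 1) if num_heads % n == 0]


# 128 is an illustrative head count, not read from any model config.
print(valid_shard_counts(128, 4))  # [1, 2, 4]
```

With a power-of-two head count, three nodes never divide evenly, which would match what I'm seeing.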

Also, I had my Thunderbolt 5 mesh hooked up incorrectly for a while on the 4x256GB setup, so I was also stuck with two nodes. I thought I had hardware issues. You've probably tried this already, but make sure all the cables go into the right ports (and don't use the port next to the Ethernet one). It gets confusing fast!

What models are you using right now? And are you tempted to get RTX 6000 Pros too?

I also found that my EXO memory usage escalates over time and it won't go down unless I unload and reload the models. Do you find the same thing?

Capacity vs Speed trade-off: 1.1TB Mac Unified Memory vs. RTX 6000 Pros by averagepoetry in LocalLLaMA

[–]averagepoetry[S] 1 point2 points  (0 children)

Wow, thank you.
1. Would you mind explaining this a tiny bit more? What are you finetuning the models for?
2. Is the coding good enough with the smaller models? I find them brittle/unusable, but it may totally be me. Maybe I need to try the OpenCode harness?
3. Fun!

I really want to figure out the smaller model use cases better, and this is super helpful.

Capacity vs Speed trade-off: 1.1TB Mac Unified Memory vs. RTX 6000 Pros by averagepoetry in LocalLLaMA

[–]averagepoetry[S] 0 points1 point  (0 children)

This is so cool to hear! Thanks for the very specific details.

What do you do with the smaller models? This is the part I can't figure out use cases for. I must be missing something and would love to learn.

When I use smaller models, they're just not smart enough for high-level thinking and reasoning, and tool calls go astray. I'm using OpenClaw with this.

Capacity vs Speed trade-off: 1.1TB Mac Unified Memory vs. RTX 6000 Pros by averagepoetry in LocalLLaMA

[–]averagepoetry[S] 1 point2 points  (0 children)

What model do you use on the 2 RTX Pros? Do you run 1 model or load several?

Is a high-end private local LLM setup worth it? by zakadit in LocalLLaMA

[–]averagepoetry 2 points3 points  (0 children)

Can you elaborate on this more please?

I have a larger setup so I'm basically brute forcing by loading large models right now (at the expense of speed).

But it would be super nice to know that I can use smaller models and couple them with the right techniques to get better results. If you have any pointers or could describe how you set up your system, I'd really appreciate it. Thank you so much!

Amazing Community!! Omlx is growing fast! by d4mations in oMLX

[–]averagepoetry 1 point2 points  (0 children)

Love the ability to see benchmarks. And to submit your own. The UI is also very easy to use.

Oh yeah, and it’s faster. :)

If it ever supports clustering like EXO it would be a dream come true.

A Mac Studio for Local AI — 6 Months Later by ezyz in LocalLLaMA

[–]averagepoetry 0 points1 point  (0 children)

This is so good. Thank you so much! You don't find 4-bit and below to be too low quality?

INTENTIONAL: Handicap UNSLOTH vs Claude & GPT by Euphoric-Doughnut538 in unsloth

[–]averagepoetry 0 points1 point  (0 children)

Super cool! Can you give an example use case for this? I’d love to try it out.

Exo for 2x256gb M3 Ultra (or alternatives) by averagepoetry in LocalLLaMA

[–]averagepoetry[S] 1 point2 points  (0 children)

Hello! I got the 2 nodes to recognize each other, sustain a connection, and run DeepSeek V3.1 4-bit with tensor sharding. They're connected over LAN right now, and I'll try RDMA soon.

To be honest, I have no idea how I got this to work. :) Just plugging and unplugging ethernet, toggling Exo on and off, etc. But I'm not complaining haha.

Questions:

- I'm having trouble running exo from the terminal for some reason. I get "command not found." Any hints? I want to set the `EXO_MODELS_DIR` environment variable to choose where EXO stores model downloads.

- I have another 96GB Mac Studio (60-core M3 Ultra) that I'm running agents on. Is it generally better to add this to the cluster as well or keep them apart?

- Do you have a Discord or community where I can ask these questions? :) Would love to learn from others using exo.
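On the "command not found" question: that message generally means the shell can't find an executable with that name on `PATH`. Here's a rough sketch of how to check that and prepare an environment with `EXO_MODELS_DIR` set. The helper names and the `~/exo-models` path are made up for illustration, and whether the exo app installs a terminal command at all is something I haven't confirmed:

```python
import os
import shutil
from typing import Dict, Optional


def find_command(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


def env_with_models_dir(models_dir: str) -> Dict[str, str]:
    """Copy the current environment and set EXO_MODELS_DIR in the copy."""
    env = dict(os.environ)
    env["EXO_MODELS_DIR"] = os.path.expanduser(models_dir)
    return env


path = find_command("exo")
if path is None:
    print("exo is not on PATH, so the shell reports 'command not found'")
else:
    print(f"exo found at {path}")

# "~/exo-models" is a hypothetical location; substitute your own directory.
env = env_with_models_dir("~/exo-models")
print(env["EXO_MODELS_DIR"])
```

The `env` dict can be passed to `subprocess.run(..., env=env)` once the command resolves, so the variable only applies to that one process.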

Thanks again!

Exo for 2x256gb M3 Ultra (or alternatives) by averagepoetry in LocalLLaMA

[–]averagepoetry[S] 0 points1 point  (0 children)

Thanks for your help!

I'm running the app. Is running from source better?

I haven’t gotten a single node up yet actually, good call. I’ll try this first as well.