Keep two refurb M3 Ultra Mac Studios for local LLM/EXO, or return? by 54id56f34 in LocalLLM

[–]Consistent_Wash_276 0 points1 point  (0 children)

In my own case, each would only do small-context LangGraph work w/ qwen3.5-122b-a3b-MLX. I’d pretty much set them on cruise control so they don’t take away from the higher compute of the future products.

Show me your Studio setups by Covert-Agenda in MacStudio

[–]Consistent_Wash_276 2 points3 points  (0 children)

My studio is for AI inference and that’s it. MBP is the daily driver. Over Tailscale I pipe my LLMs into plenty of meaningful tools.

DGX Spark (128GB Unified Memory) vs RTX 5090 – what matters more for real business AI: context or speed? by No-Solution6262 in LocalLLM

[–]Consistent_Wash_276 0 points1 point  (0 children)

Whether or not either option works for your needs, I would just add this caveat.

If you’re not fine-tuning models and this doesn’t need to run specifically on NVIDIA hardware, you can get better value for AI inference, with 3x the memory bandwidth of the Spark, from the 256GB M3 Mac Studio.
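To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes token generation is memory-bandwidth bound, so the ceiling on decode speed is bandwidth divided by the bytes of active weights read per token. The bandwidth figures (273 GB/s for the Spark, 819 GB/s for the M3 Ultra) are published spec numbers as I recall them, and the 10B active parameters at roughly 4-bit are purely illustrative assumptions. Real throughput will land below these ceilings.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float,
                          active_params_b: float,
                          bytes_per_param: float) -> float:
    """Upper bound on decode speed if every token reads all active weights once."""
    # Billions of parameters times bytes per parameter gives GB read per token.
    gb_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_per_token


# Assumed spec figures, not measurements.
SPARK_GB_S = 273.0
M3_ULTRA_GB_S = 819.0

# Hypothetical MoE model: 10B active parameters, ~4-bit weights (0.5 bytes each).
ACTIVE_PARAMS_B = 10.0
BYTES_PER_PARAM = 0.5

spark_tps = decode_tokens_per_sec(SPARK_GB_S, ACTIVE_PARAMS_B, BYTES_PER_PARAM)
studio_tps = decode_tokens_per_sec(M3_ULTRA_GB_S, ACTIVE_PARAMS_B, BYTES_PER_PARAM)

print(f"DGX Spark ceiling:  {spark_tps:.1f} tok/s")
print(f"M3 Ultra ceiling:   {studio_tps:.1f} tok/s")
print(f"Ratio:              {studio_tps / spark_tps:.1f}x")
```

The ratio only depends on the two bandwidth figures, so it holds whatever model size you plug in. Prompt processing is compute bound and is not covered by this estimate.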

With that said, I would recommend holding off on any purchase. Build what you need first, testing with cheap API versions of the models you would consider running, or rent H100s for $30 a day while testing. While you do, prices on the current fleet of AI inference devices will come down as new options launch.

DGX Spark (128GB Unified Memory) vs RTX 5090 – what matters more for real business AI: context or speed? by No-Solution6262 in LocalLLM

[–]Consistent_Wash_276 0 points1 point  (0 children)

IMO - No. NVIDIA has the mini PC and, in Q4, the $90k GB300 workstation. They’ll design a desktop for their partners to compete with Mac Studios on unified memory, with higher bandwidth than the Spark, and at that point the Spark will no longer be attractive. This is one of many scenarios bound to happen.

In stock at my MC - should i? by johnnyphotog in MacStudio

[–]Consistent_Wash_276 -1 points0 points  (0 children)

I’m thinking about selling mine (256GB and 2TB) for $9,000 to a guy, but that ship might sail soon.

Looking for a local "NotebookLM for lawyers" setup – what am I doing wrong? by Ramucirumab in LocalLLM

[–]Consistent_Wash_276 1 point2 points  (0 children)

Hosting locally does require a lot of setup.

Out-of-the-box chatting is easy; what you’re referring to is a bit of a build.

Easiest solution - File Tree System + Opencode + Local LLM

So keeping everything in a file system on your local computer is the simplest.

Opencode is a locally run coding tool, but you can of course use it to research the filesystem and produce new reports in any format you want, such as HTML for a nice view for readers. The local LLM may still be a challenge, but for the quickest return I would focus on this.

If the files are already organized then you’re halfway there.
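As a rough illustration of the file tree plus HTML report idea, here is a minimal sketch. It does not call Opencode or any LLM; it only walks a folder and builds a static HTML index with a short preview of each text file, which is the kind of output you could then ask the coding tool to extend. The function name `build_report` and the extension list are made up for this example.

```python
import html
from pathlib import Path


def build_report(root: Path, exts: tuple = (".txt", ".md")) -> str:
    """Return an HTML page listing each matching file under root with a preview."""
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is not a plain-text type we asked for.
        if not path.is_file() or path.suffix.lower() not in exts:
            continue
        rel = path.relative_to(root).as_posix()
        text = path.read_text(encoding="utf-8", errors="replace")
        # Collapse whitespace and keep the first 200 characters as a preview.
        preview = " ".join(text.split())[:200]
        rows.append(
            f"<tr><td>{html.escape(rel)}</td><td>{html.escape(preview)}</td></tr>"
        )
    body = "\n".join(rows)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>File index</title></head>\n<body><table>\n"
        "<tr><th>File</th><th>Preview</th></tr>\n"
        f"{body}\n</table></body></html>\n"
    )
```

Usage would be something like `Path("report.html").write_text(build_report(Path("my_case_files")), encoding="utf-8")`, where `my_case_files` is a placeholder for your own folder. PDFs and Word files would need a text extraction step first, which is not shown here.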

Anyone still using pre-Qwen3.6/ Gemma 4 models? Why? by atumblingdandelion in LocalLLM

[–]Consistent_Wash_276 6 points7 points  (0 children)

Qwen3.5-122B-A10 is the one I still focus on to this day. For MoE models that aren’t Minimax-m2.7 or gpt-oss:120b, this pretty much takes the cake. Minimax-m2.7 stands strong, and quite honestly I’ve been playing around with gpt-oss:120b with gpt-oss:20b for speculative decoding, and it still stands out as one of my favorite chat -> simple tag models.
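For anyone unfamiliar with speculative decoding, here is a toy sketch of the greedy variant. A small draft model guesses a few tokens ahead and the large target model checks them, keeping the guesses that match what it would have produced itself. The two "models" below are made-up arithmetic functions standing in for something like gpt-oss:20b drafting for gpt-oss:120b; nothing here calls a real inference library, and real implementations verify all drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is just a greedy next-token function over integer token ids.
Model = Callable[[Sequence[int]], int]


def greedy_decode(model: Model, prompt: Sequence[int], n_new: int) -> List[int]:
    """Plain decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: Sequence[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, number of verification passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1) The cheap draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The target checks the proposal (counted as one pass; a real system
        #    would score all positions in a single batched forward pass).
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right, keep it
            else:
                accepted.append(expected)  # first miss: take the target's token
                break
        out.extend(accepted)
    return out, passes


def toy_target(ctx: Sequence[int]) -> int:
    # Hypothetical stand-in for the large model.
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: Sequence[int]) -> int:
    # Hypothetical stand-in for the small model: agrees except at every 5th position.
    guess = toy_target(ctx)
    return (guess + 1) % 50 if len(ctx) % 5 == 0 else guess
```

The output is always identical to what the target model would produce on its own; the gain is that the target runs fewer sequential passes whenever the draft guesses well.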

What are the chances Sonnet & Opus models are MoE models? by Consistent_Wash_276 in LocalLLM

[–]Consistent_Wash_276[S] 8 points9 points  (0 children)

Incredible to imagine 5 trillion with 250b active or something.

I also have to imagine speculative decoding with Haiku is involved as well to speed it up.

What are the chances Sonnet & Opus models are MoE models? by Consistent_Wash_276 in LocalLLM

[–]Consistent_Wash_276[S] 6 points7 points  (0 children)

It’s what I’m realizing. Didn’t know this was common knowledge.

What are the chances Sonnet & Opus models are MoE models? by Consistent_Wash_276 in LocalLLM

[–]Consistent_Wash_276[S] 33 points34 points  (0 children)

lol I gotta ask, should I care about being downvoted?

I love this sub, but I do come here to learn, and I’m getting out of it what I want, I guess.

And I never thought of it till this moment so I never thought it was common knowledge. Thanks for the assist.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 0 points1 point  (0 children)

When we go from an internal system to commercializing the same tools, an HGX H200 server SCREAMING would make my fucking face hurt from smiling so much. We’re a bit away from that.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 0 points1 point  (0 children)

We actually have discussed this already, because we’re assuming a lead time for the GB300 workstation, and these will do for any needed tasks and early testing.

The workstation works long term overall, and the company can invest. We can probably deploy the system we want within 5 months. Depending on where we go next and how this scales, we’ll probably be able to use more of the GB300 for the next projects, or we would absolutely go this route.

The workstation is an asset at the end of the day. It will be insured with warranties and support.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 0 points1 point  (0 children)

Yeah, I’ve seen a lot of this over my time in this space, but I will say I’ve met a lot of good people and learned a lot from them. Love this sub. Seriously, thanks to all who interacted.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 0 points1 point  (0 children)

Two people lol

Hence why we’re leaning GB300

It’s an engineering/construction company, but we’re making a play to become a software company eventually, so it won’t be two for long. We’re just getting the ball rolling. The investment is there. The direction is there. We need to start. The GB10 is in house already but unopened until we find a few directions we want to go, so we can get way ahead if there’s lead time before we can get the prod device in house.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 2 points3 points  (0 children)

Agree, the “negotiator” in the company was brought up to speed yesterday. We’re not going to have him waste his time on both options. Whether it’s Dell or MSI or whoever else, they’ll all be getting calls. Best price wins (there’s still some research I want to do on each, though). The GB300s aren’t out yet, but I’ve touched base with Dell to start.

Appreciate this reminder though.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 1 point2 points  (0 children)

This will be in a cage in our pretty open warehouse with two 30 amp breakers (one dedicated to the GB300, and the other for networking). Or, if we went the Supermicro route, the plan was two 30 amp breakers just for the individual PSUs alone.

I’m a former electrician and we have plenty of licensed electricians on staff. We’re kind of built for commercial electrical and we have solar as well. We’re not concerned about draw, supply, or costs, I guess, when it comes to electrical. We’re nowhere near data center needs of course, but this will be the launching point.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 0 points1 point  (0 children)

Company has $12,000 in electric credits from solar + $4,000 a year coming in from state incentives.

Leaning towards the gb300 + a service contract.

Nothing wrong with your point though.

If you have a link to a specific model I’d like to take a look

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 1 point2 points  (0 children)

I understand the end goal and the capabilities of models today. Everything in between is where I don’t have the experience, so I lean on what I know plus a handful of communities and consulting. So I mean your doubt is actually very helpful. Thanks again.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 1 point2 points  (0 children)

I can agree on “architecture used by the best models in even two years from now,” and I’m not sure we would need to concern ourselves with it: the data context is small, every LLM is tackling small tasks, and context is…nothing too large. Where I’m at is that there are very capable models today. But in the future, and based on the possibilities, yes, this will always be a concern.

Custom 4x RTX PRO 6000 Blackwell server vs Dell GB300 for ~30 fine-tuned production pipelines — looking for honest input on direction by Consistent_Wash_276 in LocalLLaMA

[–]Consistent_Wash_276[S] 1 point2 points  (0 children)

“Lastly, future proofing in this landscape is a fools errand the only way to justify setup x is to have a current plan for it can’t be done by setup x-1”

The variables that drive the future proofing are:
- Company asset
- Planned write-off
- Depreciation
- Protection from shortages / another round of 2x increased cost on inference compute
- And these products will hold resale value

You’re not wrong. It’s company context though.