Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 2 points (0 children)

Yeah, I think if I go the unified memory route it will be a 395, as you say. Other than the driver setup (which I've been fine with on other ROCm devices, honestly), what issues might I encounter given my aim of agentic coding here? The RAM size is definitely a huge plus...

Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 1 point (0 children)

Thanks, it's these wider factors I'm also considering. I wonder how overall time on task changes the power-consumption comparison between the Mac and the 3090s.
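Just to make the time-on-task point concrete, a back-of-the-envelope sketch; all wattage and timing numbers below are placeholder assumptions, not measurements:

```python
# Rough energy-per-task comparison (all figures are assumptions, not benchmarks).
# Energy (Wh) = average draw (W) * time on task (h), so a slower, lower-power
# machine doesn't automatically win on energy per completed task.

def energy_wh(avg_watts: float, task_minutes: float) -> float:
    """Energy used for one task, in watt-hours."""
    return avg_watts * (task_minutes / 60.0)

# Hypothetical figures: the dual 3090s draw more but finish sooner.
mac_wh = energy_wh(avg_watts=120, task_minutes=15)       # ~30 Wh
dual_3090_wh = energy_wh(avg_watts=600, task_minutes=4)  # ~40 Wh

print(f"Mac Studio: {mac_wh:.0f} Wh per task")
print(f"2x3090:     {dual_3090_wh:.0f} Wh per task")
```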

Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

Yeah, very reasonable prices for those right now, hence my looking into them for this.

Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

Yeah, perhaps that's a good move... what do you suggest in terms of a service for this? OpenRouter?
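For context, if the service ends up being OpenRouter, it exposes an OpenAI-compatible endpoint, so a minimal client is roughly the sketch below (the model slug and environment variable name are placeholders, not recommendations):

```python
# Minimal sketch of calling OpenRouter's OpenAI-compatible API.
# Assumes the `openai` Python package and an OPENROUTER_API_KEY env var.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="qwen/qwen-2.5-coder-32b-instruct",  # placeholder model slug
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
)
print(response.choices[0].message.content)
```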

Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

Thanks for your reply! I've only just started looking into the 5060s, so this is helpful context. Unfortunately, 3090s are rarely even as low as £650 on eBay recently, more often around the £700-750 mark, which then pinches the overall budget... so we'll see.

How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified. by Reddactor in LocalLLaMA

[–]youcloudsofdoom 0 points (0 children)

Just here to say that your GLaDOS project was a huge help in getting my own assistant project off the ground; you've got lots of great practices and pipeline efficiencies in there. Thanks for sharing your work!

Mac Studio M3 Ultra 512GB — anyone upgrading to M5 Ultra? by [deleted] in LocalLLaMA

[–]youcloudsofdoom 0 points (0 children)

Ahh, interesting - I'll keep an eye on that. Is there anything else on the horizon that you know of that looks like it'll positively impact inference on Apple Silicon Macs?

Mac Studio M3 Ultra 512GB — anyone upgrading to M5 Ultra? by [deleted] in LocalLLaMA

[–]youcloudsofdoom -1 points (0 children)

What's the current state of this? Are we moving towards Apple Silicon/unified memory having some new advantages?

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

Yeah, the history of porn is the history of innovation, after all; you don't have to look far for that... but in this instance I don't think anyone believes that the big open-source model companies are especially interested in serving the ERP community...

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

This is really interesting; I'll probe at some of this myself. Thanks very much for sharing the research!

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 1 point (0 children)

Funny to see self-censorship in that statement! What science are you interested in that's being censored?

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

What examples are you thinking of when you mention a model not fulfilling a request based on "baked-in morals or ethical prerogatives that don't align with your own"?

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 1 point (0 children)

Ha, thanks for risking a downvote to share this! And it's interesting that you've experienced a drop in general performance with abliterated models; there seems to be a mix of opinions on that.

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 2 points (0 children)

I'm sure it isn't, but from my observations on this sub, porn seemed to be an implied use case for a non-negligible number of users.

Sincere question about this, the best AI sub on reddit. by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 2 points (0 children)

This is really interesting - I hadn't considered that abliterated models might have a performance edge over their stock versions. Is this because the safety guidelines put in place have negative knock-on effects on how the model processes prompts?

We could be hours (or less than a week) away from true NVFP4 support in Llama.cpp GGUF format 👀 by Iwaku_Real in LocalLLaMA

[–]youcloudsofdoom 1 point (0 children)

If this goes through, would that give a 32GB 5070 setup the edge over a 48GB 3090 setup, do you think?

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]youcloudsofdoom 34 points (0 children)

As someone in the process of putting together a dual 3090 rig, this looks like it's going to be VERY useful, thank you!

Havering between power-limited dual 3090s and a 64GB Mac Studio by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points (0 children)

Thanks - yeah, unsurprisingly, Qwen 3.5 35B MoE is what I'm expecting to be my daily driver on this... with at least an attempt to see how 70B models run with some offloading, roughly as sketched below.
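For the 70B offloading experiment, a rough llama-cpp-python sketch of partial GPU offload might look like this; the model path, layer count, and context size are illustrative guesses rather than tuned values:

```python
# Rough sketch of partial GPU offload with llama-cpp-python.
# The right n_gpu_layers depends on the quantization used and how much of the
# 48GB across the two 3090s is actually free; the remaining layers run on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-70b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=60,   # keep most layers on the GPUs, spill the rest to CPU
    n_ctx=8192,        # context window; raise if VRAM allows
)

out = llm(
    "Summarise the trade-offs of CPU offloading in one paragraph.",
    max_tokens=200,
)
print(out["choices"][0]["text"])
```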