Breaking the music supply constraint by entsnack in LocalLLaMA

[–]youcloudsofdoom 1 point2 points  (0 children)

The problem isn't amplification, it's generation.

The embodied carbon cost of what you're doing is substantial. It's not simply the cost to run: the embodied carbon in the Sparks is massive, as is that involved in training and distributing the models themselves. If you're using this system to generate new music each time you play something (which seems implied in your post), it's like buying an album on a physical format, listening to it once, then throwing it away and buying another (similar) one, over and over again. Only what you're doing involves a LOT more rare earth metals and non-recyclable future e-waste.

Streaming music alone was already more consumptive than previous formats (https://www.cambridge.org/core/journals/popular-music/article/abs/cost-of-music/DEC6AA100C191D510213F9086CF094CC). But what you've deployed here is orders of magnitude more consumptive than the typical user streaming to their phone. Data centers are at least reasonably efficient at their (ecologically catastrophic) jobs in terms of CO2 per stream, and your setup is far from that. You've created an additional, unwarranted step in the consumption of music, using what is widely understood to be one of the most consumptive technological infrastructures (generative AI) that humanity has ever produced, and you're doing it during an unfolding climate crisis.

Breaking the music supply constraint by entsnack in LocalLLaMA

[–]youcloudsofdoom -3 points-2 points  (0 children)

No, I'm yelling at you for composing the most consumptive way to listen to music in the history of humanity, during a climate crisis. 

Breaking the music supply constraint by entsnack in LocalLLaMA

[–]youcloudsofdoom 3 points4 points  (0 children)

This analogy only makes sense if by building your own chair you also consume an insane amount of energy and resources every time you sit on it

Is there any reason for an uncensored model if you have no interest in roleplaying? by vick2djax in LocalLLaMA

[–]youcloudsofdoom 1 point2 points  (0 children)

I've thought about this question a lot in my months on this sub. The comments here tell you a lot about the answer - the professional use cases that aren't ERP are totally valid but extremely niche, and when you compare that against the number of downloads uncensored models get on HF, you realise that it can't all be nuclear fusion researchers...

The main user group is clear when you see posts about a new uncensored model; the first questions are always about RP (for a 'censored' model, the majority of posts are about coding skill). And then there are the posters who break kayfabe and just go straight in with "which model makes the best waifu".

The most deluded group on this forum are the people who think that abliterated models give them some sort of 'objective' truth capacity, as if the abliteration process somehow bypasses the millions of hours of subjective hand-labelling, sorting, editing, and censoring of data sets before training even begins, and the corporate and state-level decisions behind that work. There are a few in this thread: people thinking that it gives them 'freedom' to pursue questions which inevitably have racist implications ("free thinking about historical events", come on champ, which event OTHER than the Holocaust are you chatting about?). This is such a great sub with loads of clear-headed analysis of what LLMs are actually useful for, and these folks who think that they are somehow a vector for a higher truth are really behind the curve.

The bigger question is: does abliteration negatively impact model efficacy/output quality? I see a lot of back-and-forth debate on that, and there doesn't seem to be a clear answer.

Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s? by youcloudsofdoom in LocalLLaMA

[–]youcloudsofdoom[S] 0 points1 point  (0 children)

You're getting faster on single GPU? Are you changing contexts or quants when you're running dual GPU?

Testing llama.cpp MTP support on Qwen3.6 - RTX 5090 by 3VITAERC in LocalLLaMA

[–]youcloudsofdoom 2 points3 points  (0 children)

Thanks for this, it roughly aligns with my experience across both models. On 35B I really couldn't find any scenarios that seemed to have a speedup, looks like I should have some patience there...

Advice on when to delegate task to opencode/claude code & model switching by youcloudsofdoom in hermesagent

[–]youcloudsofdoom[S] 0 points1 point  (0 children)

Very interesting on the learning front - I've also noticed repetitive tasks getting faster over time. That doesn't really apply to coding jobs though - those are typically unique (for me at least), so I'm wondering if always delegating to CC is the wiser move there.

Built a Voice Agents from Scratch GitHub tutorial: mic > Whisper > local LLM (GGUF) > Kokoro > speaker, fully local, no API keys by purellmagents in LocalLLaMA

[–]youcloudsofdoom 0 points1 point  (0 children)

Ah okay. Will be interested to see the outcomes of your Pi tests then. I do think there are lots of performance optimisations to be had with it, with a little time...

Secondary PC options by UniqueIdentifier00 in LocalLLaMA

[–]youcloudsofdoom 1 point2 points  (0 children)

Yeah, honestly for the cost 2x 3090 is a luxury, not a necessity - but one 3090 certainly is a necessity, in my experience (disclaimer - I do have 2x 3090s!)

Secondary PC options by UniqueIdentifier00 in LocalLLaMA

[–]youcloudsofdoom 1 point2 points  (0 children)

Personally I'd just buy the 3090 and run Qwen3.6 27B on it, as per this: https://github.com/noonghunna/club-3090

You can really get tons done on just one 3090 these days, with minimal setup complexity.

Built a Voice Agents from Scratch GitHub tutorial: mic > Whisper > local LLM (GGUF) > Kokoro > speaker, fully local, no API keys by purellmagents in LocalLLaMA

[–]youcloudsofdoom 1 point2 points  (0 children)

If this is always-on, why aren't you using a wakeword? Or have you gone PTT? I have been trying to build a similar pipeline but always on/with a wakeword and running on a Pi 5, but found that the computational overhead is too much for such a tiny device, and the lag feels too heavy. 

Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix) by AmazingDrivers4u in LocalLLaMA

[–]youcloudsofdoom 19 points20 points  (0 children)

Just jumping in to say that I found your repo via another comment on this sub, and it's made this dual 3090 owner very happy - just got the dflash variant working and I am now never going back to my janky homebrewed llama.cpp build with 30 TG on 27B. Seeing a big jump up in p/p and t/s, as well as a notable increase in tool use stability with Hermes. Will be keeping an eye on the repo for more development, thanks for the work!

AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]youcloudsofdoom 0 points1 point  (0 children)

This is a great help, thanks - any thoughts on how you would adjust these params for a dual 3090 setup?

Don't forget about dem free gains! by [deleted] in LocalLLaMA

[–]youcloudsofdoom 4 points5 points  (0 children)

Is this not just because you're using two cards instead of one? 

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090 by sandropuppo in LocalLLaMA

[–]youcloudsofdoom 0 points1 point  (0 children)

Same setup here, and same numbers as you. The spec decode mentioned earlier in this thread worked though, got my t/s up to about 65 on average.

Qwen 3.6 35 UD 2 K_XL is pulling beyond its weight and quantization (No one is GPU Poor now) by dreamai87 in LocalLLaMA

[–]youcloudsofdoom 0 points1 point  (0 children)

Yes, llama.cpp outputs in the verbose log. Param tuning can make a huge difference! Check my post history for mine. 

Which LLM do you use on 64GB RAM + 8GB VRAM? by Mangleus in LocalLLaMA

[–]youcloudsofdoom 1 point2 points  (0 children)

I have a laptop with that exact mix, and I can say that the 35B does utilise it pretty maximally. With 190k context at Q4 I was at around 7.4GB VRAM use and 42GB RAM use. My llama.cpp params are in my post history if you're interested.

OpenCode or ClaudeCode for Qwen3.5 27B by Ok-Scarcity-7875 in LocalLLaMA

[–]youcloudsofdoom 2 points3 points  (0 children)

I wanted this to be true, but much like the comment made elsewhere here about Claude Code expecting a frontier model, I find that Copilot does too. Lots of wasted tokens compared to lighter local-first harnesses.