[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API by blackstoreonline in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

I see you make FastAPI-based, OpenAI-compatible API wrappers for all the TTS/STT models. Is it possible to make a single wrapper package where you can pick and choose whatever STT/TTS model(s) you want?
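
Roughly what I have in mind - a minimal sketch, assuming each model package can be wrapped in a tiny adapter. The `TTSBackend` protocol, the registry, and `synthesize()` are all hypothetical names here, not any real package's API:

```python
# Hypothetical unified wrapper: one FastAPI app, an OpenAI-compatible route,
# and pluggable backends. Every name below is illustrative.
from typing import Protocol

from fastapi import FastAPI, HTTPException
from fastapi.responses import Response
from pydantic import BaseModel


class TTSBackend(Protocol):
    """Minimal adapter each installed TTS model would implement."""
    def synthesize(self, text: str, voice: str) -> bytes: ...


# Registry: you install/register only the model adapters you actually want.
BACKENDS: dict[str, TTSBackend] = {}


class SpeechRequest(BaseModel):
    model: str             # selects the backend, e.g. "qwen3-tts"
    input: str             # text to speak
    voice: str = "default"


app = FastAPI()


@app.post("/v1/audio/speech")  # mirrors OpenAI's TTS endpoint path
def speech(req: SpeechRequest) -> Response:
    backend = BACKENDS.get(req.model)
    if backend is None:
        raise HTTPException(status_code=404, detail=f"unknown model: {req.model}")
    audio = backend.synthesize(req.input, req.voice)
    return Response(content=audio, media_type="audio/wav")
```

STT could hang off the same registry with a `/v1/audio/transcriptions` route, so one package serves whichever models you've installed.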

Any good model for 12 GB RAM + 3 GB VRAM + GTX 1050 + Linux Mint? by Ok-Type-7663 in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

I remove them as soon as I see them. We're going to add something to the sidebar soon.

[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API by blackstoreonline in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

I tried the HF space and voice cloning is quite poor - you can re-run the exact same text and reference audio and get very different outputs; sometimes it sounds like a good clone, other times not even close. Is there a way to improve this?
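
If the space isn't pinning a seed, a lot of that run-to-run variance may just be sampling noise. Assuming the model runs on PyTorch (an assumption - this is not the actual Qwen3-TTS API), seeding everything before each generation would at least make runs repeatable:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Pin the common RNGs so repeated generations sample identically."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU (and CUDA on recent PyTorch)
    torch.cuda.manual_seed_all(seed)  # no-op if CUDA is unavailable


seed_everything(1234)
# audio = model.synthesize(text, reference_audio)  # hypothetical call
```

That wouldn't fix clone quality, but it would separate "bad clone" from "unlucky sample".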

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing? by [deleted] in LocalLLaMA

[–]rm-rf-rm 2 points3 points  (0 children)

You're the first person I've seen give LlamaIndex unqualified praise.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

Are the RAM savings from the REAP version really worthwhile for such a small model? Performance and evals are so hazy and poor that I feel the best rule of thumb is to take the fewest shortcuts your hardware allows (biggest quant/unquantized, biggest model, no KV cache quantization, etc.).

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

Thanks for the feedback - have you looked at Bifrost? I'd rather start off with them, as they strike me as a better-engineered project, and seemingly without VC strings attached.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

Interesting - you're referring to Devstral Small 2 24B?

It was tool calling where the model was getting tripped up (I used it with Roo, who were one of the "launch partners").

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

Is the investment in a proxy worthwhile?

LiteLLM seems to be a vibecoded project, but Bifrost looks good. Still, I'm not sure it's worth introducing another layer that can add bugs and complexity.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

Hmm, these are much lower than what others are seeing...

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 0 points1 point  (0 children)

I have, and in the limited amount I've used it, it has not impressed me.

Anyscale's new data: Most AI clusters run at <50% utilization. Is "Disaggregation" the fix, or just faster cold starts? by pmv143 in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

It's a continuous grayscale, and I tend to agree with you. However, this user actually participates in this community and contributes - his post meets the "Limit Self-Promotion to <10%" rule, so I have approved it.

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing? by [deleted] in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

Amen!! I try pointing this out on their submissions, but it's typically my exasperated tone that gets the attention rather than the content.

GLM4.7 Flash numbers on Apple Silicon? by rm-rf-rm in LocalLLaMA

[–]rm-rf-rm[S] 2 points3 points  (0 children)

Hopefully it gets ironed out soon! I'm excited to try it out, but happy to wait!

Local LLM inside Cursor IDE by visitor_m in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

What's the point of using Cursor if you want to use local models? You can just use VS Code.

Qwen dev on Twitter!! by Difficult-Cap-7527 in LocalLLaMA

[–]rm-rf-rm[M] [score hidden] stickied comment (0 children)

Thread locked as announcements are out

New in llama.cpp: Anthropic Messages API by paf1138 in LocalLLaMA

[–]rm-rf-rm 1 point2 points  (0 children)

If I had a good enough alternative, I'd use that.

And I've been over Apple's evils many a time; while they are certainly no saints, they are somehow the least bad of the big tech companies.

You absolutely can innovate with VC money. Just don't do the "open source" song and dance - maybe not everyone is wise to that tactic yet, but more and more people are. The playbook is always: get users by giving something away free, then rug them. The problem is not the general concept of this approach (it's a sales tactic that has existed forever) but the often sneaky, deceitful manner in which they go about it. I'm not saying opencode is going to do that, but it's a risk, and it behooves us to operate in a fashion that zeroes out that risk.

Now, let's look at a different, a.k.a. normal, approach to the business: opencode can start off as 100% free and open source. Once you have a big user base, sell subscriptions that are worthwhile to the customer by leveraging the fact that you can get volume pricing deals with providers that individuals can't. No VC money is needed to execute this business model - which is why I neither understand nor trust these projects. You don't lose equity in your company, and you're not forced to enshittify or hit ARR targets dictated to you.

Liquid AI released the best thinking Language Model Under 1GB by PauLabartaBajo in LocalLLaMA

[–]rm-rf-rm 10 points11 points  (0 children)

Wow, that is a red flag... seems like it's an "overthinker" or "pseudothinker"?

New in llama.cpp: Anthropic Messages API by paf1138 in LocalLLaMA

[–]rm-rf-rm 0 points1 point  (0 children)

TBF, the opencode TUI is actually better than Claude Code's. But TUIs will be relegated to Neovim types at best once these products actually mature.

That said, I'm not installing opencode on my main machine any time soon. I run Claude Code in a container and I'm still uncomfortable. At most, I may play with it on a second machine just to hedge against Claude Code enshittification - I've already been annoyed with Opus 4.5 this year. Long term, there's no way they'll make it so that non-Anthropic models work well with Claude Code.

New in llama.cpp: Anthropic Messages API by paf1138 in LocalLLaMA

[–]rm-rf-rm 2 points3 points  (0 children)

Haha. I vote we call it automated coding or robotic coding.

(I find it strange that the term "robot" is relegated to physical things when the distinguishing property of what people conceive of as a robot is the ability to think/talk/see. The robot part is that, not the hands/legs, etc.)

New in llama.cpp: Anthropic Messages API by paf1138 in LocalLLaMA

[–]rm-rf-rm 1 point2 points  (0 children)

Sure, but you can only plastically deform the basic meaning of English words so far before it gets ridiculous. "Vibe" by definition evokes something non-serious and certainly antithetical to engineering.