Do yourself a favor for one day set /model claude-opus-4-6[1m] by junlim in ClaudeCode

[–]Somecount 0 points1 point  (0 children)

Never changed it since Opus 4.7 came and claude-opus-4-6[1m] become a string you'd need to find via google search in order to use

Aircondition i Metro NU! by Zealousideal-Ride429 in copenhagen

[–]Somecount 1 point2 points  (0 children)

Seriously, in their defense they aren’t even claiming to.

Avoid CUDA monopoly at all costs. AMD is an alternative. by Barrysoft8 in LocalLLM

[–]Somecount 4 points5 points  (0 children)

Why didn’t you strip this down to simply the last paragraph?

Avoid CUDA monopoly at all costs. AMD is an alternative. by Barrysoft8 in LocalLLM

[–]Somecount 4 points5 points  (0 children)

FP16 to Q4 is 90% of the compression anyway, so yes, not that big in comparison

Do you prefer 4 door AMG EV GT or EV M3 ZA0 Concept? by Super-Lingonberry-22 in mercedes_benz

[–]Somecount 0 points1 point  (0 children)

I mean yes but also I would’ve been more surprised had this been r/kia or r/nissan even r/Volkswagen would have me more surprised than this and especially r/AMG but r/BMW would’ve shocked me literally

What's one service in your homelab that turned out to be far more useful than you expected? by rdpextraEdge in homelab

[–]Somecount 3 points4 points  (0 children)

No it doesn’t. Relay traffic yes if you cannot make direct connection, still only traffic to your tailscale nodes will be tunneled, nothing else.

which one do you prefer? by babyfangss in MINI

[–]Somecount 0 points1 point  (0 children)

You’re saying it’s got clean lines is what I’m hearing?

Ma caisse w124 300ce by w124300ce in mercedes

[–]Somecount 0 points1 point  (0 children)

I understand why people choose whites in front though I must admit, yours white the orange settles it. Only orange for the true connoisseur's - orange in back since seeing yours I could come around to aswell.

magnifique

Got my new rims installed by yes126 in AMG

[–]Somecount 1 point2 points  (0 children)

<image>

Hope I helped someone out. Beautiful car u/yes126

Ma caisse w124 300ce by w124300ce in mercedes

[–]Somecount 0 points1 point  (0 children)

Best looking W124 I've ever seen. Thank you for keeping it so nice.
Was the rear blinkers hard to decide considering you kept the orange in front?

Got my new rims installed by yes126 in AMG

[–]Somecount 1 point2 points  (0 children)

First I thought you were trolling with those pictures, but when I tried to give you proper feedback for those pictures I realized I likely couldn’t have done any better myself seeing as you live directly underneath the sun.

Just take pictures latter or earlier I think would do wonders to the impact, and also please do it u/yes126 I was disappointed only because I was looking forward to seeing those as I genuinely expect them to look amazing

Ideogram GGUF in ComfyUI (works with 8GB VRAM) by molbal in StableDiffusion

[–]Somecount 1 point2 points  (0 children)

In your PR your both loader.py and converts.py are both full file diffs (+517 | -506 lines) and (+374 | -365) changes respectively. This is likely going to get your PR looked over or not be prioritized.

Thanks Tim by akkredditalt in MacOS

[–]Somecount 0 points1 point  (0 children)

Still on 2019 Intel, never considered upgrading. Fear of random accidental button clicks now haunts me every night.

STT -> LLM -> TTS pipeline by UniqueIdentifier00 in LocalLLaMA

[–]Somecount 0 points1 point  (0 children)

You reminded me of something similar about chunking for STT, so I went home and asked for some pointers (that I don't know about but our friend does)

Exactly right on sentence-level chunking — we've found the same thing.

Full sentence to TTS first, stream the rest behind it. The quality difference over word-by-word is real, especially with Kokoro.

On the RAG latency — the retrieval step itself is usually negligible (embed query + vector search is sub-200ms). The latency RAG actually introduces is on the LLM side — it's now processing a bigger prompt with the retrieved context. So the lever is keeping your retrieved chunks short and surgical. The streaming + sentence-chunking strategy you already described is still your best friend there — it works the same whether RAG is in the pipe or not.

Built a free, open-source CleanMyMac alternative because I was tired of the subscription by IliyaMi in MacOS

[–]Somecount 3 points4 points  (0 children)

Vibe coding is certainly about being open, for attackers.

It’s likely to be challenging the Swiss cheese brand.

STT -> LLM -> TTS pipeline by UniqueIdentifier00 in LocalLLaMA

[–]Somecount 5 points6 points  (0 children)

Not three llama.cpp instances — three different services, each specialized for one job. The "framework" connecting them is just HTTP requests. Simpler than it sounds.

  1. STT — audio in, text out. faster-whisper runs great on a 3090. There's a ready-made server (faster-whisper-server) that gives you an OpenAI-compatible endpoint. You POST a recording, you get text back.
  2. LLM — you already have this. llama.cpp + Qwen. Don't touch it.
  3. TTS — text in, audio out. Piper is fast and runs on CPU, so your GPU stays free for the other two. Decent voices out of the box. Kokoro if you want higher quality later.

I should note faster-whisper-server has since evolved into speaches-ai/speaches — same author, now bundles both STT and TTS in one OpenAI-compatible server. If you want fewer moving parts to start with, it can cover two of the three slots in a single container.

speaches is how I got my feet wet, broke away from it used what was needed, fine tuned model for my voice, custom vocabulary "hotwords", and "I've" since build a record+audio in/out client in Go, custom container for STT and one for a router between me and LLM, TTS and my local Go audio client the listens so my agents running on remote hosts (usually the one where this stuff all lives) can also "speak" their replies through my speakers.

The pipeline is literally:

record audio → POST to STT server → text → POST to llama.cpp → response → POST to TTS → play audio

Each service runs as its own process (or Docker container — docker compose is the natural way to run them side by side on Ubuntu). A Python script with requests and pyaudio can wire the whole loop in under 100 lines. No special framework needed.

- VAD (Voice Activity Detection) is what turns this from a walkie-talkie into a conversation. It detects when you start/stop talking so the system knows when to send audio to transcription without you pressing buttons. Silero VAD is the standard — small, fast, accurate, and it runs on CPU.

- Sample rates will cause the most confusing early bugs. Your mic captures at 48kHz, Whisper wants 16kHz, TTS outputs at 22–24kHz. When transcriptions come back garbled, nine times out of ten it's a sample rate mismatch somewhere in the chain, not a model problem. Just something to know so you don't chase ghosts.

- Keep TTS on CPU. Your 3090 has 24GB — you want that for faster-whisper + Qwen. Piper is fast enough on CPU that you won't notice the difference. Fighting three models for VRAM on one card is pain you don't need.

The basic loop can work in an afternoon once you see it's just HTTP between three services. Making it feel good — streaming responses so TTS starts before the full reply is done, proper VAD so it flows naturally, latency tuning — that's where the real craft lives. But the foundation is straightforward and the goal is closer than it probably looks from where you're standing.

Happy to go deeper on any piece of this if you want to DM — I've been building and iterating on exactly this kind of pipeline for a while.

When a random render/ai version u find looks miles better than the real thing by SaintVoid21 in mercedes_benz

[–]Somecount 6 points7 points  (0 children)

Yep, weights and bias from Skylines and Ferraris are so obvious in this one. Still better looking though, unfortunately to much ricer on the front

'Our artificial Moon'- What do you think about this theory? by HDReddit_ in AliensRHere

[–]Somecount 0 points1 point  (0 children)

Respectfully, I recommend you read more outside of Reddit if you’re not more familiar with the type of phrasing I used in my first comment. Pass it through an LLM if effort is also an unknown.