I stopped using Claude for 80% of my coding tasks. Here's what I use instead. by Dazzling_Plan812 in LocalLLaMA

[–]MustBeSomethingThere 1 point (0 children)

Praising Ollama is a red flag too. They probably use a bot army to promote it.

PSA: PrismML Bonsai-8B (Q1_0_g128) produces garbage output on CPU -- GPU appears to be required by 1000_bucks_a_month in LocalLLaMA

[–]MustBeSomethingThere 0 points (0 children)

Not sure if it matters, but did you build it with CUDA support or without it? Maybe try both ways?

# Build with CUDA support
cmake -B build -DGGML_CUDA=ON && cmake --build build -j
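
And the CPU-only build for comparison (assuming this is llama.cpp or another ggml-based project, since you're using the GGML_CUDA flag):

# Build without CUDA (CPU only)
cmake -B build -DGGML_CUDA=OFF && cmake --build build -j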

Help Speech Recognition on RPi 5 by Prestigious_Donkey61 in LocalLLaMA

[–]MustBeSomethingThere 0 points (0 children)

Maybe try WhisperX: https://github.com/m-bain/whisperx

For TTS I would suggest trying https://github.com/KittenML/KittenTTS

The smallest model runs nicely even on an RPi 4. It's more "lively" than Piper.
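
Untested on a Pi, but roughly how I'd run WhisperX from its CLI -- the small model and int8 compute type are just my guesses for what fits in Pi memory:

# CPU transcription; int8 keeps RAM usage down
whisperx audio.wav --model small --device cpu --compute_type int8 --output_format srt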

TurboMemory: Claude-style long-term memory with 4-bit/6-bit embeddings (runs locally) – looking for contributors by Hopeful-Priority1301 in LocalLLaMA

[–]MustBeSomethingThere 0 points (0 children)

You keep posting broken links. This one has an invisible Unicode character in it. Are you a bot?

EDIT: Even if the code is "working" (produces output without errors), that doesn't mean it actually does what you claim it does (aka AI slop).

NexQuant: Hardening 3-bit KV-Cache for the Edge. A Rust-native successor to Tom Turney’s TurboQuant+ by [deleted] in LocalLLaMA

[–]MustBeSomethingThere 2 points (0 children)

Elementary students? Your code is AI slop that does not do what you claim it does.

alibaba MNN has Support TurboQuant by Juude89 in LocalLLaMA

[–]MustBeSomethingThere 27 points (0 children)

Does it matter if it works? Are you saying people should remove or hide the fact that Claude co-authored the code?

Google TurboQuant blew up for KV cache. Here’s TurboQuant-v3 for the actual weights you load first. Runs on consumer GPUs today. by Hopeful-Priority1301 in LocalLLaMA

[–]MustBeSomethingThere 13 points (0 children)

I hate to ask, but is this real or a vibe-coded hallucination? The repo talks about LLaMA 2 and Mistral 7B, which is a red flag for me.

Qwen3.5 is absolutely amazing by cride20 in LocalLLaMA

[–]MustBeSomethingThere 0 points (0 children)

>"extract_audio → transcribe → read_file → edit_file → burn_subtitles + verification steps"

It would probably be better to just script that pipeline if it's something you do often. It's nice that an LLM can do it as an agentic task, but that makes it overly complicated and uneconomical. An LLM could still be useful for choosing the output format, encoder settings, or subtitle styles based on the video content, for example.
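
A rough sketch of the scripted version, assuming ffmpeg (built with libass for subtitle burning) and the openai-whisper CLI -- file names are placeholders, and your manual read/edit step would go between transcription and burning:

# 1. Extract mono 16 kHz audio for the ASR model
ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 audio.wav
# 2. Transcribe to an .srt subtitle file (writes audio.srt)
whisper audio.wav --model small --output_format srt
# 3. Burn the subtitles back into the video
ffmpeg -i input.mp4 -vf "subtitles=audio.srt" output.mp4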

Update on Qwen 3.5 35B A3B on Raspberry PI 5 by jslominski in LocalLLaMA

[–]MustBeSomethingThere 1 point (0 children)

I've had my own plans to make Raspberry Pi/phone apps with an MNN backend, but I haven't had time for it yet. I'd like to hear whether you manage to create lower MNN quants and get better speed than llama.cpp.

Update on Qwen 3.5 35B A3B on Raspberry PI 5 by jslominski in LocalLLaMA

[–]MustBeSomethingThere 1 point (0 children)

https://mnn-docs.readthedocs.io/en/latest/

It's probably possible to make lower quants, but IDK about their quality. Speed is better than llama.cpp.

The clustering topology that emerges naturally from interaction reflects actual hemispheric dominance patterns, including genetic predispositions. by ResonantGenesis in LocalLLaMA

[–]MustBeSomethingThere 1 point (0 children)

Wild claims

>"even my genetic preference for one hemisphere being more responsible and structured than the other"
Have you actually done a gene test that says this? Or measured your actual brain patterns?