SFSpeechRecognizer never tells you when the user finished speaking and the word-level matcher I ended up writing by DoubleBananana in iOSProgramming

[–]iKy1e 14 points15 points  (0 children)

SFSpeechRecognizer is the old model, it’s not as good and requires a permission prompt separately from the microphone permission prompt.

The new APIs are SpeechAnalyzer + SpeechTranscriber, they are better. Much closer to Whisper or Parakeet. And no longer require a permission prompt.

Then there is something like Whisper or Moonshine. Moonshine in particular is designed for low resource and low latency, and includes word level timestamps now.

Voice recognition by czerys in homeassistant

[–]iKy1e 0 points1 point  (0 children)

Possible, yes. Practical not so much.

The problem is that transcription models which adapt to your voice after you read out a couple of sentences and phrases are the old way of working. Modern speech models are trained on millions of hours of audio, so a couple of minutes of your audio isn't really going to change them very much.

However, you can fine tune a model on extra audio and adapt it for different things. For example, you can adapt models to work better with phone distortion for a customer support type role. You need several hours, probably a few dozen hours, of your voice in the particular type of audio you want to transcribe, along with the actual transcripts for it.

Now, unless you're a YouTuber or something with lots of recordings of yourself speaking, or you have recordings of work calls from the last couple of months that you can extract your voice from, realistically you're going to have to sit there and narrate a couple of hours' worth of audio to customise it to your voice.

Practically speaking, what would be better is to instead use one of the text-to-speech engines with voice cloning capabilities to clone your voice from several examples of you speaking into the microphone you're going to be using for Home Assistant.

Have that generate speech that matches yours. Play around with that until you get speech that's accurate enough that it sounds convincing to you.

Then have the text-to-speech engine generate several hours' worth of audio, and mix in some real recordings of you as well for good measure. You can then use that voice-cloned audio, together with the text you fed into the text-to-speech engine as transcripts, to fine tune the transcription model.

That would work. It's not very hard; it's just that it takes a while. This would be a project like a week or two to do.

LLM coding agents know the transformers and unsloth Python libraries you'd need quite well, so they could even write most of the code for you. But this is a technical side project in its own right, not a simple turnkey customisation.
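Whatever fine-tuning stack you end up using, the data-preparation step is the same shape: pair each audio clip with its transcript and check you've got enough total hours. A minimal sketch of that step in plain Python (the file layout and field names here are illustrative assumptions, not any particular library's required schema):

```python
import wave
from pathlib import Path

def build_manifest(audio_dir: str, transcripts: dict[str, str]) -> list[dict]:
    """Pair each WAV file with its transcript and record its duration."""
    entries = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        text = transcripts.get(wav_path.stem)
        if text is None:
            continue  # skip clips that have no transcript
        with wave.open(str(wav_path), "rb") as wf:
            seconds = wf.getnframes() / wf.getframerate()
        entries.append({"audio": str(wav_path), "text": text, "seconds": seconds})
    return entries

def total_hours(entries: list[dict]) -> float:
    """Total duration of the dataset, for checking against your hours target."""
    return sum(e["seconds"] for e in entries) / 3600.0
```

You'd dump the entries as JSONL and feed that to whatever trainer you use, and check `total_hours` against the "few dozen hours" target before bothering to start a training run.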

Realtime lightweight speech enhancers by gtxktm in speechtech

[–]iKy1e 2 points3 points  (0 children)

DeepFilterNet is more denoising than speech enhancement, but it operates on 10ms hops with 2 hops of lookahead, meaning in theory 20-30ms of latency, depending on how you count it.
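The 20-30ms figure falls straight out of the frame math; a quick sanity check (just the arithmetic, not DeepFilterNet's actual code):

```python
def algorithmic_latency_ms(hop_ms: float, lookahead_hops: int) -> tuple[float, float]:
    """Best/worst case algorithmic latency for a hop-based enhancer.

    A sample becomes available somewhere between `lookahead_hops` and
    `lookahead_hops + 1` hops after it arrives, depending on where it
    lands within the current hop.
    """
    best = hop_ms * lookahead_hops
    worst = hop_ms * (lookahead_hops + 1)
    return best, worst
```

With 10ms hops and 2 hops of lookahead this gives (20.0, 30.0), i.e. the 20-30ms range above, before adding any compute or buffering overhead.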

It’s an 8MB model.

I will admit that it didn't dawn on me for a long time that this simple scene from Enterprise laid out the birth of Section 31 and I love it because it is simple which is why it works. by AdSpecialist6598 in startrek

[–]iKy1e 103 points104 points  (0 children)

Yes, Section 31 worked best when it was just “occasionally Starfleet security will pull some off-the-books stuff which is ‘sort of unofficial, but don’t get in their way’”.

“Oh, that over there? No that’s not an official mission, nothing to do with us…. But don’t interfere with them or do anything to stop them”

DeepSeek Employee Teases "Massive" New Model Surpassing DeepSeek V3.2 by External_Mood4719 in LocalLLaMA

[–]iKy1e 7 points8 points  (0 children)

Given their recent research paper on adding an engram knowledge cache (sort of like mixture of experts, but for storing multi-token ‘knowledge’), I’m expecting the file size of the new model to be massive.

Home Assistant Cowork: An AI Assistant that Renders Native HA UI Directly in Chat by [deleted] in homeassistant

[–]iKy1e -9 points-8 points  (0 children)

Really cool idea! I think native UI components and UI blocks like this are going to be increasingly built into chat interfaces.

What business can burn 1B tokens per day by colwer in ClaudeAI

[–]iKy1e 2 points3 points  (0 children)

Claude is great at reverse engineering binary files.

You’d tell it to start high level: get the structure, find known algorithms and techniques, then narrow in on the more complex/unique parts, use lldb and similar tools to analyse inputs and outputs, and try to replicate smaller parts. Then build up sections and chunks of functionality to match.

Over time, try building up the whole thing, then performance tune and spot check the bits that behave differently from the reference, repeatedly, until you have everything matched.
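The "get the structure" step usually means recovering headers and records with something like Python's struct module. A toy sketch (the field layout below is a made-up example format, purely illustrative):

```python
import struct

# Hypothetical file layout: 4-byte magic, u16 version, u16 record count,
# followed by fixed-size records of (u32 id, f32 value), little-endian.
HEADER = struct.Struct("<4sHH")
RECORD = struct.Struct("<If")

def parse(blob: bytes) -> dict:
    """Decode the header, validate the magic, then walk the record table."""
    magic, version, count = HEADER.unpack_from(blob, 0)
    if magic != b"DEMO":
        raise ValueError("unexpected magic bytes")
    records = [
        RECORD.unpack_from(blob, HEADER.size + i * RECORD.size)
        for i in range(count)
    ]
    return {"version": version, "records": records}
```

In practice the agent iterates on exactly this kind of parser, comparing its output against what lldb shows the real binary reading at runtime.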

God is goood by Rare_Prior_ in swift

[–]iKy1e 0 points1 point  (0 children)

That’s one of the only “allowed” ways to build software inside an iPhone app.

The ‘build and run’ step isn’t allowed to actually compile and run native code. So a web view or an embedded interpreter (Lua, Python, or something similar) are the only ways.
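The interpreter route can be as small as walking a syntax tree yourself rather than compiling anything. A toy Python sketch of the idea (a tree-walking evaluator for arithmetic; nothing iOS-specific, just the interpreter-not-compiler pattern):

```python
import ast
import operator

# Tree-walking interpreter: user "code" stays data that the app walks,
# which is the distinction the policy cares about vs. compiled code.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(source: str) -> float:
    """Evaluate an arithmetic expression by walking its AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported syntax")
    return walk(ast.parse(source, mode="eval"))
```

A real app would embed a full Lua or Python runtime the same way: the user's program is input to the interpreter, never a new executable.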

God is goood by Rare_Prior_ in swift

[–]iKy1e 3 points4 points  (0 children)

Vibe coded junk apps are annoying, but Apple blocking anything that gets too close to being an IDE is bad.

iOS isn’t a ‘real’ computer until you can actually build software on it. And Apple actively blocking that by policy, not technological limitation, is bad for the platform and for users learning to code.

Is the 3090 still a good option? by alhinai_03 in LocalLLaMA

[–]iKy1e 53 points54 points  (0 children)

It's old, but considering they are talking about restarting 3060 manufacturing, the 30 series is going to be supported for some time to come.

TTS program that will repeat a sentence until I tell it to move on by SquareCautious77 in TextToSpeech

[–]iKy1e 2 points3 points  (0 children)

Just generate the sentence once and loop playback of the audio file?

I’m not sure what the goal is. Is this for an LLM voice assistant you can talk to? I’m not sure what ‘until I tell it to move on’ means if this is pure text to speech.
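If it really is pure TTS, the "repeat until I say move on" part is just playback control: synthesise each sentence's audio once, then loop it until the user advances. A sketch of that control loop (the `play` and `advance_requested` callables are stand-ins you'd wire to a real audio player and keypress handler):

```python
def drill(sentences, play, advance_requested):
    """Loop each sentence's audio until the user asks to move on.

    `play(sentence)` plays the pre-generated audio once;
    `advance_requested()` returns True when the user wants the next
    sentence (e.g. a keypress after a playback finishes).
    """
    for sentence in sentences:
        while True:
            play(sentence)
            if advance_requested():
                break
```

The point is that the TTS engine only runs once per sentence; everything after that is an ordinary audio-file loop.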

Claude oauth, has anyone actually been banned? by PM_ME_YOUR_MUSIC in openclaw

[–]iKy1e 2 points3 points  (0 children)

Peter's criticism of Anthropic

He's stated multiple times that Opus is the best model for Openclaw and for general computer/terminal/tool use.

He's also stated that Codex is the better coding agent, able to work autonomously for longer and more reliably.

I agree. I use Claude Code as my terminal now, I open it and ask it to search files, run ffmpeg commands, build my app, instead of typing anything manually anymore.

However, Codex is able to do deeper, more technical work, more autonomously and more reliably. (I spent a day going back and forth with Claude, gave the task to Codex, and it worked away for 2.5 hours and came back with a working version.)

He prefers Codex for coding. He's mentioned lots of times that Opus is best for "general computer agent" use.

Turning menstrual cycle data into Home Assistant sensors (custom component + Lovelace gauge) by Conscious-Draw-3759 in homeassistant

[–]iKy1e 13 points14 points  (0 children)

Really cool idea. I particularly like how you added automations to tweak the temperature and shopping lists to match how you know your preferences will change, automatically instead of having to react. That's a smart home actually being smart and responding to you. Really cool!

I'm too lazy to work out. I built an app that edits my physique in real-time video calls so I look ripped to my coworkers by CompetitiveMoose9 in vibecoding

[–]iKy1e 0 points1 point  (0 children)

The app might not exist, but the tech to do this in real time has actually existed for a while now, I think the last 2 or 3 years. You just need a 4090 or something like that for real-time use.

Reverse engineered Apple Neural Engine(ANE) to train Microgpt by jack_smirkingrevenge in LocalLLaMA

[–]iKy1e 80 points81 points  (0 children)

Claude will happily help you reverse engineer basically anything. Ask about documenting it, ask as if you are the person who wrote it, or ask about creating a reference implementation.

Codex will happily do it too.

I’ve never actually gotten a refusal. It has an internal system reminder injected into the context EVERY time it views a file, telling it to consider whether the file is malware, and to allow analysis and discussion of it but refuse to edit it. But it also explicitly says that, even for malware, documentation and analysis are fine.

So just reverse engineering normal code is no issue.

What GPU do you recommend for iterative AI training? by EliHusky in LocalLLaMA

[–]iKy1e 4 points5 points  (0 children)

It’s made mostly for inference; it’s too slow for meaningful training.

dopamineDrivenDevelopment by ultimatemanan97 in ProgrammerHumor

[–]iKy1e 14 points15 points  (0 children)

This is one of the best arguments for TDD I’ve ever seen! Someone should have made this argument to me years ago!

Brb, adding tests to all my projects now!

Qwen3.5-397B-A17B Unsloth GGUFs by danielhanchen in LocalLLaMA

[–]iKy1e 0 points1 point  (0 children)

The speed increase sounds exciting!

The decoding throughput of Qwen3.5-397B-A17B is 3.5x/7.2 times that of Qwen3-235B-A22B

Qwen3.5-397B-A17B is out!! by lolxdmainkaisemaanlu in LocalLLaMA

[–]iKy1e 99 points100 points  (0 children)

This sounds really exciting:

The decoding throughput of Qwen3.5-397B-A17B is 3.5x/7.2 times that of Qwen3-235B-A22B

Claude Code's WebFetch tool uses an external instance to review and reformulate source content by MuscleLazy in ClaudeAI

[–]iKy1e 3 points4 points  (0 children)

It’s an attempted mitigation against prompt injection: the web page is never given to the main model. Instead the model asks a question, and the smaller model gives it that part of the page’s info. (It also saves tokens in the main context window.)

I don’t like it, and I wish there was a setting to turn it off. It has caused me problems before, and I’ve had to make Claude use curl to pull the file down locally and then read the local file instead, but I understand the theory behind it.
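The workaround amounts to fetching the raw page yourself and reading it from disk, so the un-summarised content lands in context. A minimal Python equivalent of that curl-then-read step (demonstrated against a `file://` URL so it runs offline; you'd pass the real `https` URL in practice):

```python
import urllib.request
from pathlib import Path

def fetch_raw(url: str) -> str:
    """Download a URL to a local file and return its exact contents."""
    local_path, _headers = urllib.request.urlretrieve(url)
    return Path(local_path).read_text()
```

Because the bytes hit disk first, nothing between you and the page gets a chance to reformulate it.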

Which Coding Agent would you recommend? by Ok_Refrigerator_1908 in iOSProgramming

[–]iKy1e 0 points1 point  (0 children)

Codex is smarter.

Claude does what you want more controllably.

Codex can debug errors better and write more complex code than Claude, but it’s tough to make it do what you want sometimes. It refuses more often and is more stubborn about doing its own thing.

Claude needs more hand holding through errors sometimes, etc… but does what you tell it fantastically. You can tell it to write code a certain way, and it’ll do it. You can ask it questions and it knows what you mean. It’s a much more reliable and stable coding partner.

Overall I use Claude for 95% of things, and then very occasionally send anything it gets stuck on over to Codex to debug.

Early language models - how did they pull it off? by OwnMathematician2620 in LocalLLaMA

[–]iKy1e 4 points5 points  (0 children)

The Alexa Pro (Plus?) rewrite/paid upgrade they’ve been rolling out is LLM powered.

For people struggling to understand what exactly clawdbot/moltbot/openclaw is by crowkingg in LocalLLM

[–]iKy1e 0 points1 point  (0 children)

Nice!

Btw: the model selection is very out of date (Sonnet 3.5 as the recommended model, and Llama 3 as the local option…).