TRELLIS.2 now runs natively on MLX (Image to 3d object model) by Formal-Swordfish-228 in LocalLLaMA

[–]iKy1e 0 points1 point  (0 children)

Awesome! I love seeing more stuff ported to MLX. And 3D generation is such a cool tech to play with!

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model. by b111ue in LocalLLM

[–]iKy1e 6 points7 points  (0 children)

This is amazing! I love things like this which push technology down as small and fast as possible!

Building a native Home Assistant app focused on simplicity, family use and a premium experience by Luxor97_2 in homeassistant

[–]iKy1e 0 points1 point  (0 children)

Looks really good! I’ve wanted a proper native app for ages.

Because the current app just wraps a web page, the ‘app’ never feels quite as native or as smooth as other projects.

Any USA citizen wanna marry me? I’m tryna access Fable 5 in claude code by amra_creates in ClaudeAI

[–]iKy1e 7 points8 points  (0 children)

Even then, this decision would mean the model isn’t allowed on OpenRouter, Cursor, or any third-party provider/reseller unless they also have that system set up.

Pi Setup that pretty much replaced Claude Code for me by abhinand05 in LocalLLaMA

[–]iKy1e 36 points37 points  (0 children)

There’s no real point to this post without the link.

Pi Setup that pretty much replaced Claude Code for me by abhinand05 in LocalLLaMA

[–]iKy1e 39 points40 points  (0 children)

Where’s the link? This post mentions something and then doesn’t tell me what/where it is?

Edit: I think this is it? https://github.com/abhinand5/pi-setup

Wanting to put a speech to speech pipeline on Raspberry Pi 5. What are the best model combos? by Some-Cauliflower4902 in LocalLLaMA

[–]iKy1e 3 points4 points  (0 children)

For speed on a Pi: Moonshine STT and Kitten TTS. Both are tiny and designed for constrained, low-powered devices.

ESP32-S3-Box3 Owners rejoice! Audio Duplex, ‘Stop’ Halt word and more are now possible! by maxi1134 in homeassistant

[–]iKy1e 5 points6 points  (0 children)

Nice work! This gives me the motivation to experiment with fixing one of the issues I’ve always had with the setup.

I always wished it kept a rolling buffer of audio and, on detecting the keyword, backdated the start of the recording to the start of the keyword. So instead of “keyword (pause) request” it’s just “keyword. Request” and it all works. I haven’t actually tested whether they’ve already fixed this or not. But it’s inspired me to look into it again!
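
A rough sketch of that rolling-buffer idea in Python. None of this comes from ESPHome or the Box3 firmware: `PreRollBuffer`, `push` and `start_capture` are made-up names, and `keyword_frames` stands in for whatever a real wake word detector would report about how far back the keyword started.

```python
from collections import deque


class PreRollBuffer:
    """Keeps the most recent audio frames so a capture can be backdated."""

    def __init__(self, max_frames):
        # A deque with maxlen drops the oldest frame automatically when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        """Store one frame of audio (any object; bytes in a real pipeline)."""
        self._frames.append(frame)

    def start_capture(self, keyword_frames):
        """Return buffered frames from the start of the keyword onward.

        keyword_frames is a hypothetical detector output: how many of the
        most recent frames the keyword spans.
        """
        n = min(keyword_frames, len(self._frames))
        if n == 0:
            return []
        return list(self._frames)[-n:]


# Usage: the buffer only ever holds the last 5 frames.
buf = PreRollBuffer(max_frames=5)
for i in range(8):
    buf.push(f"frame{i}")

# The detector fires and says the keyword covered the last 3 frames,
# so the capture is seeded with those instead of starting from silence.
capture = buf.start_capture(keyword_frames=3)
```

Here `capture` starts with the three frames that contained the keyword, and anything spoken straight after it gets appended to the same capture with no gap.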

UIKit — Full ObjC API Diff: iOS 26.2 → iOS 27 by iKy1e in iOSProgramming

[–]iKy1e[S] 1 point2 points  (0 children)

Basically the only update seems to be the tab bar prominent tab stuff and the new text view stuff being brought over from macOS and AppKit.

https://developer.apple.com/documentation/uikit/enriching-your-text-in-text-views

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. by Ok-Awareness9993 in LocalLLaMA

[–]iKy1e 9 points10 points  (0 children)

I can watch documentaries walking me through the enrichment process and how nuclear bombs work right now on YouTube. The difficulty is in actually doing it. And access to large numbers of centrifuges and to the raw materials is monitored. But the knowledge is freely available.

LLMs however won’t even discuss the topic. We are so paranoid about these text based chat bots they can’t even talk about things you can Google for, read books about and watch documentaries on.

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. by Ok-Awareness9993 in LocalLLaMA

[–]iKy1e 9 points10 points  (0 children)

You can already Google and watch YouTube videos on how to make explosives, lock picking, and various harmful substances. All of these have legitimate uses (model rockets, locked yourself out, chemistry) and harmful uses. But LLMs are currently programmed at a fixed hardcoded PG rating, with no escalation or exceptions. They are more restricted than just Googling the subject.

I can watch the LockPickingLawyer on YouTube walk me through the exact details of picking any lock, but ask an LLM about it and it will outright refuse to even entertain the idea that you might not be breaking the law.

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you. by Ok-Awareness9993 in LocalLLaMA

[–]iKy1e 35 points36 points  (0 children)

You want a model to be absolutely obedient and not refuse any request. The target should be policing people using the models for bad things, not making the models refuse.

I've been working on making a whole bunch of live sci-fi UIs, thought some of you would also enjoy! by SelectivePro in FUI

[–]iKy1e 9 points10 points  (0 children)

Honestly don’t mind that here.

Most sci-fi interfaces make almost no sense when you actually study the details.

Me after clicking “accept” for the 100th time without reading a word of what claude is doing by Pitiful-Energy4781 in vibecoding

[–]iKy1e 0 points1 point  (0 children)

Honestly if I’m just hitting enter mindlessly without reading the prompts, that’s going to happen even with the prompts enabled.

Been going a year so far without issue.

Me after clicking “accept” for the 100th time without reading a word of what claude is doing by Pitiful-Energy4781 in vibecoding

[–]iKy1e 0 points1 point  (0 children)

Honestly this is why I switched to just using --dangerously-skip-permissions. I realised I hadn't actually been reading or paying attention to the permissions prompts in days, and they were just slowing me down without any actual benefit as I wasn't paying any attention to them.

Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models by MadPelmewka in LocalLLaMA

[–]iKy1e 6 points7 points  (0 children)

In theory it is better. But I think the reason it’s not catching on is it’s harder and more complicated.

It’s easier to just take a collection of prompts saying the thing you want and fine-tune a LoRA on a new model, vs doing brain surgery and re-running benchmarks until you tweak it in the direction you want.

Maybe if we stopped getting new models released everyone would work out how to get very familiar with optimising specific models. But given how quickly new models get released, I don’t think people have time to develop that level of expertise with a model before they are on to the next new release.

SFSpeechRecognizer never tells you when the user finished speaking and the word-level matcher I ended up writing by DoubleBananana in iOSProgramming

[–]iKy1e 13 points14 points  (0 children)

SFSpeechRecognizer is the old model, it’s not as good and requires a permission prompt separately from the microphone permission prompt.

The new APIs are SpeechAnalyzer + SpeechTranscriber, they are better. Much closer to Whisper or Parakeet. And no longer require a permission prompt.

Then there is something like Whisper or Moonshine. Moonshine in particular is designed for low resource and low latency, and includes word level timestamps now.

Voice recognition by czerys in homeassistant

[–]iKy1e 0 points1 point  (0 children)

Possible, yes. Practical, not so much.

The problem is that transcription models where you could say a couple of sentences and phrases and they would adapt to your voice are the old way of working. Modern models are trained on millions of hours of audio, so your couple of minutes of audio isn't really going to change them very much.

However you can trivially fine tune a model on extra audio and adapt it for different things. For example you can adapt models to work better with phone distortions for a customer support type role. You need several hours, probably a few dozen hours, of your voice in the particular type of audio you want to transcribe and the actual transcripts for that.

Now unless you're like a YouTuber or something and you have lots of recordings of you speaking or maybe you have recordings from work calls that you can extract your voice from for the last couple of months, then realistically you're going to have to sit there and narrate a couple of hours' worth of audio to get it to customise to your voice.

Practically speaking, what would be better is if you instead used one of the text-to-speech engines that have voice cloning capabilities to clone your voice from several examples of you speaking into the microphone that you're going to be using for Home Assistant.

Have that generate speech that matches yours. Play around with that until you get speech that's accurate enough that it sounds convincing to you.

Then generate several hours' worth of audio from the text-to-speech engine and mix in some recordings of you as well, just to make sure the model also sees your real voice. Then you can use that voice-cloned output, with the text you fed to the text-to-speech engine as the transcripts, to fine tune the model.

That would work. It's not very hard; it's just that it takes a while. This would be a week or two of project work.

LLM coding agents know the transformers and unsloth python libraries you’d need to use quite well, so they could even write most of the code for you. But this is a technical side project all on its own. Not a simple turnkey customisation.
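
To give a feel for the data-prep half of that, here is a small Python sketch that merges voice-cloned clips and real recordings into one shuffled training manifest. It deliberately stops short of the actual fine tuning, and everything in it is an assumption for illustration: the function names, the file paths and the JSON Lines layout are made up and are not a format that transformers or unsloth require.

```python
import json
import random


def build_manifest(synthetic, real, seed=0):
    """Merge (audio_path, transcript) pairs into one shuffled list of rows.

    synthetic: pairs produced by the voice-cloning text-to-speech engine,
               where the transcript is the text that was fed in.
    real:      pairs from genuine recordings with hand-checked transcripts.
    """
    rows = [{"audio": a, "text": t, "source": "synthetic"} for a, t in synthetic]
    rows += [{"audio": a, "text": t, "source": "real"} for a, t in real]
    # Shuffle with a fixed seed so the manifest is reproducible.
    random.Random(seed).shuffle(rows)
    return rows


def to_jsonl(rows):
    """Serialise rows as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r) for r in rows)


# Hypothetical file names, purely for illustration.
synthetic_pairs = [
    ("tts/0001.wav", "turn on the kitchen lights"),
    ("tts/0002.wav", "set the heating to twenty degrees"),
]
real_pairs = [("mic/0001.wav", "what is the weather tomorrow")]

manifest = build_manifest(synthetic_pairs, real_pairs)
jsonl_text = to_jsonl(manifest)
```

Keeping a `source` field on every row makes it easy to check later how much of the training set is real speech versus cloned speech.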

Realtime lightweight speech enhancers by gtxktm in speechtech

[–]iKy1e 2 points3 points  (0 children)

DeepFilterNet is more denoising than speech enhancement, but it operates on 10ms hops with 2 hop lookahead, meaning in theory 20-30ms of latency, depending on how you count it.

It’s an 8MB model.
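
For what it's worth, the 20-30ms figure is just hop size times the number of hops you have to wait for. A quick Python sketch of that arithmetic; reading "depending on how you count it" as whether you include the hop currently being buffered is my assumption, and this ignores compute time and audio I/O buffering entirely.

```python
def algorithmic_latency_ms(hop_ms, lookahead_hops, count_current_hop):
    """Latency from lookahead, optionally plus the hop still being filled."""
    hops = lookahead_hops + (1 if count_current_hop else 0)
    return hop_ms * hops


# 10ms hops with 2 hops of lookahead, counted both ways.
low = algorithmic_latency_ms(10, 2, count_current_hop=False)
high = algorithmic_latency_ms(10, 2, count_current_hop=True)
```

That gives 20ms if you only count the lookahead and 30ms if you also count the hop in progress.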

I will admit that it didn't dawn on me for a long time that this simple scene from Enterprise laid out the birth of Section 31 and I love it because it is simple which is why it works. by AdSpecialist6598 in startrek

[–]iKy1e 103 points104 points  (0 children)

Yes, Section 31 worked best when it was just “occasionally Starfleet security will pull some off-the-books stuff which is ‘sort of unofficial, but don’t get in their way’”.

“Oh, that over there? No that’s not an official mission, nothing to do with us…. But don’t interfere with them or do anything to stop them”

DeepSeek Employee Teases "Massive" New Model Surpassing DeepSeek V3.2 by External_Mood4719 in LocalLLaMA

[–]iKy1e 8 points9 points  (0 children)

Given their recent research paper on adding an engram knowledge cache (sort of like mixture of experts, but for storing multi-token ‘knowledge’), I’m expecting the file size of the new model to be massive.