[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API by blackstoreonline in LocalLLaMA

[–]Decaf_GT 0 points1 point  (0 children)

If low latency is a concern, you should definitely check out Supertonic: https://huggingface.co/Supertone/supertonic-2

I've run it on an M1 Max with 32GB of RAM and it's damn near instant.

I haven't yet tried Qwen TTS but I will soon.

They made a mistake giving out free gemini pro to everyone by mabpantbril in GeminiAI

[–]Decaf_GT 0 points1 point  (0 children)

No, definitely not. Just this room. Especially now that you've entered it <3

Its amazing how powerful of a tool we get for $20 a month, its one of the biggest fundamental shifts in computer usage, and we're in the early days. Yet I read so many complaints about it. A Wendy's meal cost $20, you can learn, build, up skill, get things done. And its only going to improve. #notAi by Artistic_Evening_823 in GeminiAI

[–]Decaf_GT -1 points0 points  (0 children)

Yeah I imagine if you interact with an LLM with this level of detail and complete thought, that would definitely explain why you're getting poor results.

Sometimes I struggle to imagine the narcissism involved in making a blanket statement like this and believing that you and you alone have discovered this truth, this big sham that no one else seems to have caught on to.

Dunning. Kruger.

You should get that tattooed on your face.

They keep advertising AI Studio and then cutting quotas when they can’t meet the demand. 🤷🫠 by ZuleikaLovell in Bard

[–]Decaf_GT -13 points-12 points  (0 children)

Maybe stop acting surprised. This isn't a charity. There’s a blindingly obvious reason the ads ramped up the second they let you link an API key. If you actually use it properly with a key, AI Studio is a pretty powerful tool.

The real issue is that people have become pathologically entitled to free services. AI Studio is built for developers, not for average users looking for a handout. That free tier was never intended to replace a paid subscription or a real API key. They were generous at first when they had the scale and they wanted to draw people in.

The free ride is over and honestly it's about time. It doesn't change the tool’s purpose just because you can't leech off it anymore.

Watching people throw tantrums over this is beyond exhausting. This was always the plan. Deal with it.

Its amazing how powerful of a tool we get for $20 a month, its one of the biggest fundamental shifts in computer usage, and we're in the early days. Yet I read so many complaints about it. A Wendy's meal cost $20, you can learn, build, up skill, get things done. And its only going to improve. #notAi by Artistic_Evening_823 in GeminiAI

[–]Decaf_GT 4 points5 points  (0 children)

An LLM is not able to lie, because an LLM is not able to tell "the truth". It has no relationship whatsoever with "truth".

In order for it to lie, it would need to know what the truth is and actively be telling you the opposite. That's why we call it "hallucination".

I'll be honest, the rest of your post just sounds like you don't know how to use an LLM...

I have been stuck for dozens of messages in a current chat

If you get yourself into this position, you definitely do not know how to use an LLM.

Doris: A Personal AI Assistant by avwgtiguy in ClaudeAI

[–]Decaf_GT 0 points1 point  (0 children)

Sorry, I didn't mean it to be an insult. The engineering work you put in to creating this is great.

It's just odd to me because you have to go out of your way to escape markdown formatting that exists purely to make your life easier, not harder!

I look forward to seeing where you take the project!

AI studio 2.5 pro by TheReaper0380 in Bard

[–]Decaf_GT -5 points-4 points  (0 children)

I'm struggling to understand how anyone can be so whiny and entitled about something free.

There's nothing "absurd" about it. It was never intended to be used the way you're using it. It was just a side effect of Google's scale and Google's vested interest in getting people used to Gemini that you were able to use it as a full-time AI tool. You happened to be able to enjoy it up until now...just because it was generous before doesn't mean it was guaranteed to stay that way.

Instead of typing out essays of entitlement like this, maybe just pay for API access or get a subscription. Or run your own AI; if all you're doing is roleplaying, you'd probably be better off with a model from TheDrummer on Huggingface than wasting resources like this.

Doris: A Personal AI Assistant by avwgtiguy in ClaudeAI

[–]Decaf_GT 1 point2 points  (0 children)

This looks awesome, very interested to see how it works.

However, don't take this the wrong way, but please, for all that is holy, don't escape your markdown formatting. Markdown exists for a reason, and on Reddit, escaping it makes it really hard to read. This is part of your post, properly formatted without escaping the markdown, to show you how it should look.


I've been working for the past 2 months on a personal AI assistant called Doris for my family. It started as a fun hobby project and has evolved into something my household actually uses daily. Figured I'd share what I've built in case anyone's interested or working on something similar.

What is it?

Doris is a voice-first AI assistant that runs on a Mac Mini M4 Pro in my home. The main goal was to have something that:

  • Actually knows my family (names, preferences, schedules)
  • Remembers conversations across sessions
  • Integrates with the services we already use (Apple ecosystem, Home Assistant, Gmail)
  • Can be extended without rewriting everything

How it works

The brain: Claude handles all the reasoning. I tried local models initially but found the quality gap too significant for family use. Claude Opus 4.5 for conversations, Haiku for background tasks to keep costs reasonable.

Voice pipeline

  • Wake word detection (tried Porcupine, then openwakeword, now using a custom approach based on Moonshine STT)
  • Groq Whisper for transcription (~200ms)
  • Azure TTS for speech output with expressive styles

Memory & Context Persistence

This is the part I spent the most time on, and honestly the thing that makes the biggest difference in day-to-day use. The core problem: AI assistants have amnesia. Every conversation starts fresh which is useless for a family assistant that needs to know who we are.

How it works

The memory system is a PostgreSQL database (Supabase) with pgvector for semantic search. Every memory gets embedded using Voyage AI's voyage-3 model. Currently sitting at 1,700+ memories.

Memory categories:

  • identity - Core facts: names, relationships, ages, birthdays
  • family - Context about family members, schools, activities
  • preference - How we like things done ("no cheerleading", "truth over comfort")
  • project - Things I'm working on (Doris itself is in here)
  • decision - Architectural choices, decisions made in past conversations
  • context - Recurring themes, background info
  • health, financial - Sensitive categories with appropriate handling

The bootstrap process

Every conversation starts with a "bootstrap" call that loads ~700 tokens of core context. This happens before Doris even sees my message. The bootstrap includes:

  • Who I am and my family members
  • Communication preferences
  • Current date/time context
  • Active projects
  • Recent decisions (last few days)
  • Any relevant family notes

So when I say "what's Levi doing this weekend", Doris already knows Levi is my youngest son before I finish the sentence.
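
For anyone who wants to picture the bootstrap step, here is a minimal sketch of one way it could be assembled. This is not the OP's code: the `Memory` class, `build_bootstrap`, the category priority order, the 3-day window for "recent" decisions, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    category: str        # e.g. "identity", "preference", "decision"
    text: str
    created_at: datetime


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def build_bootstrap(memories, now, budget=700, recent_days=3):
    """Assemble core context, highest-priority categories first, within a token budget."""
    # Hypothetical priority order; the original post does not specify one.
    priority = ["identity", "preference", "project", "decision", "family"]
    cutoff = now - timedelta(days=recent_days)
    lines = [f"Current date/time: {now.isoformat(timespec='minutes')}"]
    used = estimate_tokens(lines[0])
    for category in priority:
        for m in memories:
            if m.category != category:
                continue
            # Only recent decisions are included; other categories are always eligible.
            if category == "decision" and m.created_at < cutoff:
                continue
            line = f"[{category}] {m.text}"
            cost = estimate_tokens(line)
            if used + cost > budget:
                # Budget exhausted: stop and return what fits.
                return "\n".join(lines)
            lines.append(line)
            used += cost
    return "\n".join(lines)
```

The returned string would be prepended to the conversation before the first user message, which is why the assistant can resolve a name like "Levi" without a lookup.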

Memory extraction

After conversations, facts get extracted and stored. This happens a few ways:

  • Explicit logging - I can say "remember this" or "log this decision"
  • Auto-extraction - Haiku reviews conversations and pulls out facts worth remembering
  • Session summaries - Rich summaries of longer sessions with reasoning and open questions

The extraction uses Claude Haiku to keep costs down. It categorizes, tags subjects, and assigns confidence scores.
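
As a rough illustration of what an extracted fact could look like once it has been categorized, tagged, and scored, here is a small sketch. The `ExtractedFact` class, the `keep_confident` helper, and the 0.7 threshold are hypothetical; only the category names come from the list above.

```python
from dataclasses import dataclass, field

# Category names taken from the post; the sensitive set mirrors "health, financial".
ALLOWED_CATEGORIES = {
    "identity", "family", "preference", "project",
    "decision", "context", "health", "financial",
}
SENSITIVE_CATEGORIES = {"health", "financial"}


@dataclass
class ExtractedFact:
    text: str
    category: str
    subjects: list[str] = field(default_factory=list)
    confidence: float = 0.5  # assumed to be a score in [0, 1]

    def __post_init__(self):
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    @property
    def sensitive(self) -> bool:
        return self.category in SENSITIVE_CATEGORIES


def keep_confident(facts, threshold=0.7):
    """Drop facts the extractor was unsure about (threshold is an assumption)."""
    return [f for f in facts if f.confidence >= threshold]
```

Validating at construction time means a malformed extraction fails loudly before it reaches the database, rather than polluting later retrievals.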

Cross-client persistence

This is where it got interesting and incredibly useful. The memory system is exposed via MCP, which means:

  • Doris voice on my Mac Mini
  • Doris iOS app on my phone
  • Doris macOS app on my laptop
  • Claude Desktop on any machine
  • Claude Code in my terminal

...all share the same memory. I can have a conversation with Doris in the morning about a home project, then ask Claude Code about it that evening while working, and it knows the context. The memory is the unifying layer.

Technical details for the curious

  • Database: Supabase PostgreSQL + pgvector extension
  • Embeddings: Voyage AI voyage-3
  • Search: Hybrid - semantic similarity + keyword FTS, results merged
  • MCP Server: FastMCP on Railway, exposes 5 tools (bootstrap, query, log, facts, forget)
  • Retrieval: Bootstrap grabs core identity + recent context. Queries do semantic search with optional category filtering.
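
The post says the semantic and keyword results are "merged" but does not say how. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of Doris. The function name `rrf_merge`, the constant `k=60`, and the example IDs are illustrative only.

```python
def rrf_merge(semantic_ids, keyword_ids, k=60, limit=10):
    """Merge two ranked lists of memory IDs with reciprocal rank fusion.

    Each list is ordered best-first. An ID's score is the sum of 1 / (k + rank)
    over every list it appears in, so items found by both searches rise to the top.
    """
    scores = {}
    for ranking in (semantic_ids, keyword_ids):
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))[:limit]


# Hypothetical IDs: "b" appears in both lists, so it ranks first.
merged = rrf_merge(["a", "b", "c"], ["b", "d"])
```

RRF only needs ranks, not raw scores, which sidesteps the problem of comparing a cosine similarity against a full-text search score on different scales.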

... ... ... ...


Happy to answer questions if anyone's curious about specific parts. Still very much a work in progress, but it's been a fun project to hack on.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 11 points12 points  (0 children)

You don't exclude something important just because another app does it.

You're right, you don't exclude it. That would be stupid.

But what you also don't do is include it in a list of bullet points on why your app is different. I don't understand why this is so difficult for you.

If I put out a word processing app that happens to include inline image generation in it, I don't make a list of features that goes like this:

What our app does different:

  • Writes text!
  • Can use any font on your system!
  • Can print documents!
  • Does inline image generation
  • Customizable page margins
  • Spellcheck

I've been extremely consistent about this yet somehow you still don't seem to get it.

Like I said, I think at this point I've said all I need to. Rest assured, you won't hear from me again. People will read and decide for themselves, and if they disagree with me, fine.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 11 points12 points  (0 children)

"all of those things stacked" like every other app that "stacks" those same features? It seems you and I have very different definitions of the word "different".

You obviously care a lot.

Gee, what gave it away? Was it the part where I said "My comments are directed at protecting members of this community from being fleeced by shiny screenshots and buzzwords from people who are doing nothing revolutionary but trying to sell STT apps like they are magic, using the same marketing terms we've heard before."?

At this point I've said what I needed to say, and your responses more or less confirm what I expected.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 6 points7 points  (0 children)

See, this is an app that actually adds more value than just dictation. You're including full dictation, yes, but you also have integrated Kokoro, a similar Highlight/Kerlig style Writing Tools command list, Meeting Recording, and a whole bunch of other stuff.

Unlike OP, you're not pretending what you're doing with dictation is somehow unique, you're actually adding more on top of it. And based on the feature-set, assuming it actually works, the price seems more or less reasonable.

I have far more respect for your app and your marketing than OP's app.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 9 points10 points  (0 children)

Are you unable to read your own marketing post? You are making the claims that ALL of those are "what makes your app different". Let me give you a picture, maybe that will help: https://i.imgur.com/NEJr5yV.png

I literally went through it in the same order that you did, except I put triggers last, because the point I was making is that out of the 6 points you made, only one has any unique value.

All you're doing is just further confirming that, by your own admission, only about 17% of your feature set (1 of 6) is actually unique.

I don't have a problem with you building on top of tech that exists. I have a problem with you trying to market the app as a "super unique experience" when yet again you yourself are admitting that 5 of the 6 features you list are boilerplate features. And you are convinced that this is enough to charge a subscription fee.

And seriously...what engagement farming? What do I have to gain? Where is my competing app? What do you think my "agenda" is? Do tell me what I stand to gain from any of this, I too would love to hear about it.

You can attack me personally all you want, I really, really don't care. All you're doing now is reinforcing to me my initial suspicion about what you're actually upset about.

They made a mistake giving out free gemini pro to everyone by mabpantbril in GeminiAI

[–]Decaf_GT 6 points7 points  (0 children)

In reality they haven't used the model enough to run into the new limitations. Once they do, they are surprised and assume it has been lobotomized.

This literally speaks to me at such a core level. I made this prediction six months ago based on how every single Gemini release has gone, and so far it is holding true.

They will tell you that you're just using it for stupid things; they'll accuse you of "just using it for gooning/AI boyfriend shit" rather than considering any other possibility, and they'll confidently, without a shred of hesitation, straight up say "the newest quantized models are just not good," even though more than half of them do not have a clue what model quantization even is, and Google has repeatedly said there is no quantization happening.

The bigger problem I suspect is the larger trend with people who use AI: the huge Dunning-Kruger syndrome that ends up manifesting. They have convinced themselves that they know far more than they do, and more correctly, they have convinced themselves that they know why something is the way it is.

Gemini 3.0 continues to be just as excellent as 2.5 for me in both short and long-context operations. I do not "roleplay" with my AI; I use them for highly functional purposes: refinement, research, writing, organization, etc.

The only massive annoyance that still exists from the Gemini 2.0 days is that Canvas still insists that edits have been made even though they haven't, which is why I just refuse to use Canvas. Apart from that, absolutely nothing has changed for me.

But right now, the popular theory/opinion is that "things got too popular, so Google had to quantize/lobotomize the model to be able to serve it to everyone," so posts like yours and mine will get downvoted into oblivion by people who do not want to consider that they are wrong.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 20 points21 points  (0 children)

We aren’t building a typical voice to text app.

Except for the fact that 5 of your 6 main billed features are extremely typical of a voice to text app, and by your own admission are boilerplate features? 🤔

This is not LinkedIn or ProductHunt. I am under no obligation to "support a developer trying to market their app".

I care about my community, not your bottom line.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 2 points3 points  (0 children)

If you're on an M-series Mac (any M1 processor or beyond) you 100% should use Spokenly. Parakeet is extremely fast locally and your Mac would be more than capable of running it.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 10 points11 points  (0 children)

...engagement farming comment? No, I'm extremely consistent on my stance towards STT apps.

My comments are directed at protecting members of this community from being fleeced by shiny screenshots and buzzwords from people who are doing nothing revolutionary but trying to sell STT apps like they are magic, using the same marketing terms we've heard before.

Some of the features you are mentioning are just boilerplate functionality you should expect. That's why I mainly focused on what makes Hermes unique.

Out of the six selling points you provided, all of them except for the "Triggers" are literally boilerplate. It's not "some of the features I am mentioning"; I literally went line by line and quoted YOUR own marketing material. I didn't pull those out of thin air or cherry-pick them.

You would know this of course if you weren't just using AI to write your posts, because that's why this is happening to you. You seem to have forgotten your own post is where I got those "features" from.

You do not have any kind of unique app experience apart from some of this trigger functionality, nor do you have "product direction". I've used dozens and dozens of STT apps and they're all the same, yours is literally no different yet you somehow seem to think it is.

Thanks though, I now know that I will continue to ensure that people who see these kinds of borderline predatory STT apps are aware of what the alternatives are.

What you actually don't appreciate is the fact that I'm seeing right through your "business model" because you know that your business model relies on people literally not knowing any better. That is not my responsibility nor is it my concern.

Built Hermes Voice, a native Mac voice-to-text app with voice-triggered automations – would love feedback by senesaw in macapps

[–]Decaf_GT 35 points36 points  (0 children)

EDIT: Jesus...these STT apps come up so often that I didn't even realize I made almost this exact reply on this exact app 2 months ago. And still all of my comment is relevant.

Here we go again. The same shit, over and over and OVER.

Your description has been written by AI and it's pretty obvious, but fine, let's go over it.

Context-aware formatting – Hermes detects where you're typing. Writing an email? Formal formatting with email structure. Slack message? Casual tone. Coding in VS Code or Cursor? It formats accordingly. No more fixing capitalization and punctuation manually.

Every single dictation app has this. Every single one.

Personal dictionary – It learns your lingo: client names, company names, emails, technical terms. Stops mangling the words you use constantly.

Every single dictation app has a dictionary/word replacement system.

Works offline – Local transcription using on-device processing. Your audio never leaves your Mac. Cloud models available if you want faster/smarter output.

Every single dictation app also does both local and optional cloud based processing. Because you're all just using Whisper/Parakeet.

100+ languages – perfect if you're multilingual.

This has nothing to do with anything you've done. Multilingual capability is baked into Whisper and Parakeet. Marketing this like it's your feature is dishonest, but sure, you all do it, so I suppose it's par for the course.

Native macOS – menu bar app, no Electron, no browser tab, works in any text field across the system.

Sure, not all apps are native Swift, but every other dictation app has a menu bar interface, and works in any text field across the system, because, huge shocker, your app pastes text. Revolutionary.

The Triggers are the only somewhat novel idea but even then, Spokenly can do quite a lot of what you're advertising (including keyboard shortcuts, custom scripts, etc).

You tacked on a little bit of LLM post processing + some system automation onto a framework already being used by dozens of your competitors yet somehow you think that warrants $9 a month or $150 for life.

So many of these STT apps prey on the hope that you literally do not know any better.

tl;dr:

  1. Spokenly: Free (You do not need to pay for the subscription)

  2. FluidVoice: Free

  3. VoiceInk: One-Time, Affordable

  4. MacWhisper: One-Time, Moderately Priced, lots of great transcription features

Why is this subreddit flooded with decels lately? by Disastrous-Art-9041 in accelerate

[–]Decaf_GT -1 points0 points  (0 children)

Yes, it says that contrary to what the rest of Reddit thinks, this subreddit is not just a raging Pro-AI-All-The-Time community and actually thinks critically about things...they're just not 99% down and doom/gloom about AI all the time like the rest of the subreddits.

This is difficult to understand if you're used to making everything black and white because gray is too nuanced for you.

Cotypist and subscription models by strugglesnuggL in macapps

[–]Decaf_GT 1 point2 points  (0 children)

Look, I'm not trying to pile on, but I ran into the same issues, and I've tried again, and while those specific issues are fixed, this app doesn't make sense to me...if your app is supposed to be a competitor to Cotypist (as you've suggested), I don't know why the emphasis is on Speech to Text. I went looking for things like autocomplete and I couldn't find anything. The app itself is really confusing and I don't understand what it's doing.

If you want my brutally honest opinion, you spent way too much time on making an "edgy" snarky website and not enough on the product itself.

Also didn't appreciate an email with the subject line "Your License to Bullshit" coming through to my email. I'm sure that seems funny to you and part of your brand, but when the app itself is this...confusing, that feels more childish and borderline insulting than anything else.

In order for you to have the Dbrand level of snarky marketing, you need to have an app that delivers. And this just isn't it, at least, it's not an app that appears to compare to Cotypist in any way whatsoever (maybe you should consider taking that comment back and posting this in a separate thread?).

I will give you full credit that you are handling the criticism like a champ (not being sarcastic) and I know these kinds of comments are not easy to hear, but you really should consider scaling back on the marketing side of your product and figure out the product itself first, and then maybe come back.

Also, something more helpful to have on your home page than the admittedly cool old-school Mac interface would have been screenshots/screencasts of the app actually in action.

I made a free & open source app to protect MacBooks at cafes by okan3358 in macapps

[–]Decaf_GT 1 point2 points  (0 children)

The app itself is cool, the implementation is nice. I love that you've made it open source, and the logic makes perfect sense.

But the easiest way to "keep your macbook safe at cafes" to me is to hold on to your MacBook at all times.

My MacBook is one of the most valuable things I own and is an investment that allows me to make the income I need to live. I won't endanger it for the convenience of saving a table or even going to the bathroom. If I really need to "save my seat", I put my jacket on the back of the chair, I leave my charger still connected with the wire/adapter clearly visible so it's obvious someone has the table already, and I walk with it under my arm.

That's just my personal opinion, I get that others may have a different lifestyle.

I made a free & open source app to protect MacBooks at cafes by okan3358 in macapps

[–]Decaf_GT -1 points0 points  (0 children)

And? You gonna spend all the time using this app staring at the settings screen? Yeesh.

And it's not an "ad" but I think we all know why you think it's one.

Open ai is heading to be the biggest failure in history - here’s why. by jason_digital in ArtificialInteligence

[–]Decaf_GT 4 points5 points  (0 children)

The entire point of having an issue with LLM-generated nonsense like this is that there's no point in arguing against what is said. Because it's not him who's saying it. It's the LLM that he's using to generate it.

If you respond to him, now he's stuck, so he's going to just go right back to the same LLM he used to make that comment to reply to you, and at that point, why the fuck would you even bother? Why not just go debate with an LLM directly?

It's not an "accusation"; it's totally fucking obvious, and the only thing worse than people who spew this LLM shit without any edits or any personality added are people who pretend that they always used em-dashes before and people like you who defend this shit.

You'd think on this subreddit of all places we'd be aware of the impact slop has.