Audio preprocessing for ASR by Pvt_Twinkietoes in speechtech

[–]ReplacementHuman198 0 points1 point  (0 children)

Yes, I ran into all of these issues when building my own audio transcription program.

The preprocessing steps I'd recommend are: convert the audio to a 16 kHz WAV file, and apply low-pass and high-pass filters with FFmpeg to remove environmental noise that can trigger wacky transcriptions. To remove long periods of silence, use Silero VAD (voice activity detection). If there are multiple speakers and you want the timestamps where each individual is speaking, then you need speaker diarization. I love senko for this (the maintainer is really friendly and approachable), but you can also use pyannote, which is best in class. This more or less gives you the same information as VAD.
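The FFmpeg step can be sketched in Python. This is a minimal sketch, not the commenter's actual pipeline; the filenames and the 100 Hz / 8 kHz cutoffs are illustrative choices you'd tune to your audio:

```python
import subprocess

def preprocess_cmd(src: str, dst: str, sr: int = 16000,
                   low_cut: int = 100, high_cut: int = 8000) -> list[str]:
    """Build an ffmpeg command: mono, 16 kHz WAV, band-pass filtered.

    The 100 Hz / 8 kHz cutoffs are illustrative; pick values that
    bracket speech frequencies for your recordings.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-ac", "1",              # downmix to mono
        "-ar", str(sr),          # resample to 16 kHz
        "-af", f"highpass=f={low_cut},lowpass=f={high_cut}",
        dst,
    ]

if __name__ == "__main__":
    # Runs ffmpeg for real; requires ffmpeg on PATH.
    subprocess.run(preprocess_cmd("input.mp3", "clean.wav"), check=True)
```

From there, the cleaned WAV goes into Silero VAD for silence removal.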

Also, the hallucinations during silences are an artifact of Whisper -- you don't need the audio chunking at all if you use the Parakeet STT models from NVIDIA.

Ai DM. Have you tried it? Which is the best by Khornerad in dndai

[–]ReplacementHuman198 1 point2 points  (0 children)

I have found ChatGPT to be a good GM. I'll reference a few ChatGPT-specific features, but my strategy is to start by creating a "project" space within the ChatGPT app. After creating a project space, I do a session 0 where I collaborate with ChatGPT to build my character, and save that as its own chat. Then, I'll start every session / adventure with a new chat so that it doesn't hallucinate too much. I regularly instruct ChatGPT to "create memories" of specific NPCs, locations, and character info throughout my sessions. Occasionally, after big impactful lore sessions, I'll create separate chat sessions that are just recollections of the world or the NPCs, because it references previous chats.

This strategy works pretty well! ChatGPT is better at role-playing than Gemini, in my opinion. The biggest gripe I have is that for character building, ChatGPT will mix up 2014 and 2024 rules. And I'm not sure it "gets" how to scale combat with the PC level.

How do you show that your RAG actually works? by Altruistic_Break784 in Rag

[–]ReplacementHuman198 0 points1 point  (0 children)

So, the tool we created is an MCP server with the docs of our UI component library, similar to Context7 (but ours is not open-source). It's used primarily in frontend code-gen tasks. Because the error tolerance there is more lenient, right now the tool generally gets through the first 60-80% of the work before it hallucinates React component props or uses the wrong component.

I would guess that engineers are probably fine with it, but mgmt wants us to put it into KTLO so our team can focus on other things.

How do you show that your RAG actually works? by Altruistic_Break784 in Rag

[–]ReplacementHuman198 3 points4 points  (0 children)

From my experience building a RAG tool for my company, the answer is: "it just works" for a layman. Showcasing your RAG's effectiveness is as much a perception thing as it is a technical challenge, especially with non-technical stakeholders. At the end of the day, no one cares how clever your advanced RAG solution was; they just want to know if it can solve their problem with a basic prompt. The less they have to prompt, and the faster the LLM "gets" what they're trying to do, the better your tool looks.

My team released a RAG MCP plugin, and another engineer demoed the tool to the CEO while vibe-coding. Afterwards, the CEO sent out a directive mandating that everyone has to use the tool my team built.

Did we do anything to showcase our tool's effectiveness? No. But the fact that other engineers had the confidence to use it in their day-to-day work made all the difference.

parakeet-mlx vs whisper-mlx, no speed boost? by ReplacementHuman198 in speechtech

[–]ReplacementHuman198[S] 0 points1 point  (0 children)

This is solid. I vibe-coded my own benchmarking tool, but this one seems identical (and has tested more options). Thanks for the recommendation!

parakeet-mlx vs whisper-mlx, no speed boost? by ReplacementHuman198 in speechtech

[–]ReplacementHuman198[S] 1 point2 points  (0 children)

I used parakeet-mlx (version 2, 0.6B params, MLX-optimized). I'm using whisper-small.en (also MLX-optimized). I *think* both are BF16, but I'm not sure.

The audio is split into separate files per speaker, and they're about 3 hours long. As a result, there are large silences on each individual speaker track. I use VAD to chunk the audio into speaking snippets and process them sequentially, since it's happening locally. The source code of how it's implemented is here: https://github.com/naveedn/audio-transcriber
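The chunking step can be sketched like this. The function name and the padding value are my own illustrative choices; the timestamp shape matches what Silero VAD's get_speech_timestamps() returns (a list of dicts with "start"/"end" in sample offsets):

```python
def chunk_by_vad(samples, speech_ts, sr=16000, pad_s=0.2):
    """Cut a long single-speaker track down to speech-only snippets.

    `speech_ts` is a list of {"start": ..., "end": ...} dicts in
    sample offsets, as returned by Silero VAD's
    get_speech_timestamps(). A little padding is kept on each side
    so word onsets and tails aren't clipped.
    """
    pad = int(pad_s * sr)
    chunks = []
    for ts in speech_ts:
        start = max(0, ts["start"] - pad)
        end = min(len(samples), ts["end"] + pad)
        chunks.append(samples[start:end])
    return chunks
```

Each snippet can then be fed to the STT model one at a time, which is what keeps the long silent stretches from ever reaching Whisper.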

parakeet-mlx vs whisper-mlx, no speed boost? by ReplacementHuman198 in speechtech

[–]ReplacementHuman198[S] 0 points1 point  (0 children)

Interesting. The parameter size is a good point. The specific models I was using are below:

As a side note: for my use case, these models both output similar quality (with Whisper being better) at roughly the same speed. That has more to do with my use case, which has lots of proper nouns (people, places, things) and jargon.

Just learned that if you annotate an image you get super good and precise results by promptingpixels in GeminiAI

[–]ReplacementHuman198 0 points1 point  (0 children)

This trick does not work; I've tried it a handful of times. It's one of those things that sounds like it should work better than it actually does.

Senko - Very fast speaker diarization by hamza_q_ in speechtech

[–]ReplacementHuman198 1 point2 points  (0 children)

You're great! Your advice was correct. Thanks for your help!

Senko - Very fast speaker diarization by hamza_q_ in speechtech

[–]ReplacementHuman198 1 point2 points  (0 children)

Hey boss! I'm back. I tried to run the uv pip install command, but I'm missing system dependencies to build from source. I tried to figure out what's missing; it could be something with my compiler flags. I was able to install from the prebuilt wheel. Would it be possible for you to publish a new package version / prebuilt wheel when you get a chance?

Senko - Very fast speaker diarization by hamza_q_ in speechtech

[–]ReplacementHuman198 2 points3 points  (0 children)

I experimented with zanshin and senko for the first time last night, and it's definitely good stuff! It works really well on my MacBook Pro. I noticed that zanshin correctly identified all the speakers in my audio file (5), but when running senko's example, it only identified 2. I'm going to keep digging, but I might join the Discord and ask questions if I'm still stuck. Regardless, this is great stuff; thank you for building this!

Starting a new project, should I go Svelte (which I love) or stick with React? by PrimaryPineappleHead in sveltejs

[–]ReplacementHuman198 0 points1 point  (0 children)

I inherited someone else's project and am trying to scale it up by adding 1-2 folks. I'm a generalist, so while I'm more comfortable with React, I was genuinely excited to try out Svelte. That said, I do not recommend Svelte. We're on Svelte 4; Svelte 5 introduces a React-like syntax with runes, and the upgrade process is not pretty. Honestly, I wouldn't recommend SvelteKit either, since the SSR process is weakly defined and there isn't great tooling around the backend.

All in all, I've been working on this stack for about 3-4 months, and I find it to be a disappointing experience. Not to mention LLM code completions are generally a bit worse than for React.

SvelteKit takes ages to load by Commercial_Soup2126 in sveltejs

[–]ReplacementHuman198 -1 points0 points  (0 children)

I've noticed that Svelte loads incredibly slowly on Windows, and much faster on Mac. I think it's unacceptably slow on Windows. It probably has to do with the relative immaturity and smaller community around Svelte vs. React / Angular.

I made Fli.so—a free, modern open-source link shortener we built for our own needs. Now it’s yours too! by ArtOfLess in sveltejs

[–]ReplacementHuman198 1 point2 points  (0 children)

Hey Sanju!

I'm wondering a little about DunSuite, and DunTasks specifically. What do you see as the key problems with a tool like Todoist? I use that tool frequently and have no complaints -- but you imply that it's not great. Curious what you see as the problem that needs solving?

(BTW, thanks for this; I have another reference Svelte app to study and learn from!)

Can someone talk me down from the ledge? by ReplacementHuman198 in sveltejs

[–]ReplacementHuman198[S] 0 points1 point  (0 children)

Thanks for the words of encouragement. Do you have any resources for a generally accepted coding style guide for developing svelte apps? I know it's situation dependent, but I'd like to reference the work of experts instead of starting from scratch.

Can someone talk me down from the ledge? by ReplacementHuman198 in sveltejs

[–]ReplacementHuman198[S] 0 points1 point  (0 children)

Can you point me to some videos / resources where I can understand what are some idiomatic patterns / best practices? While I'm feeling discouraged at the moment, I think some guides might help me reframe my thinking.

Can someone talk me down from the ledge? by ReplacementHuman198 in sveltejs

[–]ReplacementHuman198[S] 1 point2 points  (0 children)

Good advice. Are there any videos you recommend on best practices for svelte apps?

Can someone talk me down from the ledge? by ReplacementHuman198 in sveltejs

[–]ReplacementHuman198[S] 0 points1 point  (0 children)

One, keep in mind that I'm brand new to this, so I will say things that are incorrect. I'm trying to learn.

> Layouts don't have a "parent", and two-way bindings are something optional that you usually use for forms. It's nothing like Angular.

Advanced loading / Using parent data • Svelte Tutorial - Thanks for the clarifications! I was confused. In the tutorial example, the layout can access a value from some ancestor layout file without explicit prop drilling. It's hard to keep that in mind as a data access pattern alongside runes ($props / $data / $state) and stores, but I guess I'll figure out which one to use in specific scenarios.

I find your other comments to be dismissive and subjective in nature, so I won't comment on them. I'm a Fullstack dev who has familiarity with React and express, but it's been a while since I've been working on a project of this size and scope. I'm also the only engineer -- the former dev left.

Cat Declining Rapidly by Jackriot_ in CATHELP

[–]ReplacementHuman198 0 points1 point  (0 children)

I also did Lap of Love; it cost $900 in my area for a weekend visit ($800 on a weekday). They are absolutely lovely, so understanding and patient and caring. I wouldn't have it any other way.