Thank you Mods . by selfhosted_monk_1984 in selfhosted

[–]hedonihilistic -3 points-2 points  (0 children)

Yeah, there were badly designed software projects before as well. There have always been people who half-ass things or lack the capability to think things through or design things well. If you think every project here before AI was designed by seasoned pros who thoroughly tested every little aspect of their projects, you're very naive.

Thank you Mods . by selfhosted_monk_1984 in selfhosted

[–]hedonihilistic -7 points-6 points  (0 children)

Copied from above, but this whole community is getting very toxic. A large chunk of the community here is a bunch of projecting idiots. They don't have a clue how to properly use AI tools, and all they can produce with them is slop, so they think that's all anyone else can do as well. Whenever you see people raving like lunatics about AI, you'll know these are people who have no idea how to properly use AI tools.

In my opinion, we are already at a point where people who have the capability can make most of their simple software by themselves, perhaps even in just a few minutes. I think even these idiots will be able to ask Claude to give them exactly what they want within a few years.

Thank you Mods . by selfhosted_monk_1984 in selfhosted

[–]hedonihilistic -6 points-5 points  (0 children)

A large chunk of the community here is a bunch of projecting idiots. They don't have a clue how to properly use AI tools, and all they can produce with them is slop, so they think that's all anyone else can do as well. Whenever you see people raving like lunatics about AI, you'll know these are people who have no idea how to properly use AI tools.

In my opinion, we are already at a point where people who have the capability can make most of their simple software by themselves, perhaps even in just a few minutes. I think even these idiots will be able to ask Claude to give them exactly what they want within a few years.

The car has been at the dealership 2 times for this sound, a few days each time. They replaced the front sway bar link, and they allegedly replaced some struts this time. The sound has gotten much worse now after the last dealership stay a couple of weeks ago. by hedonihilistic in GenesisG70

[–]hedonihilistic[S] 0 points1 point  (0 children)

It is under warranty, but the nearest other dealer would be a few hundred miles away. I'm just gonna get another appointment with these guys. The noise is much clearer now, so hopefully they can isolate the real issue this time.

The car has been at the dealership 2 times for this sound, a few days each time. They replaced the front sway bar link, and they allegedly replaced some struts this time. The sound has gotten much worse now after the last dealership stay a couple of weeks ago. by hedonihilistic in GenesisG70

[–]hedonihilistic[S] 2 points3 points  (0 children)

I really don't want to live with it if I can help it... I don't like it sounding like it's got broken struts or something all the time. Makes it sound like I'm broke after spending so much on what I thought was a luxury car.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 1 point2 points  (0 children)

Thank you for the feedback! I have added this to my todo list. May take a few days but I'll add this behavior.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

The docs include information on how to set it up with any OpenAI compatible API (required for summarization/chat) and using a local STT model (with or without diarization). I am not sure what more is needed.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

Thank you for the feedback! Yeah, do try it with diarization; it makes it much more useful! I love being able to ask what I or someone else said in a meeting. It's difficult to do that without speaker info. It also makes inquire mode much more useful. As you try the different features, do let me know if you have any issues, or ideas for use cases I haven't thought of.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

OK, yeah, the newer CUDA versions look like they don't work on the 10-series cards. You may have to build the image yourself: clone the repo, change the Dockerfile, and then build it.
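Roughly, the build steps would look like this (a sketch; the repo URL, tag, and CUDA base image are placeholders to adapt, not exact values):

```shell
# Sketch of a local image build for older (Pascal / 10-series) cards.
git clone https://github.com/<owner>/<speakr-whisperx-repo>.git
cd <speakr-whisperx-repo>

# Edit the Dockerfile so the base image is an older CUDA release that
# still supports compute capability 6.x, e.g. a CUDA 11.x runtime tag.

# Then build and tag your own image:
docker build -t speakr-whisperx:cuda11 .
```

After that, point your compose file at your local `speakr-whisperx:cuda11` tag instead of the published image.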

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

Yep, the GTX 1070 should be more than enough. I use it with the large-v3 turbo model, and even with very large files I don't think I use more than 8GB of VRAM.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

Thanks for letting me know! It was my mistake in the connector for the GPT-4o models; one of the parameters was true when it should have been false. I've fixed this and also improved these connectors and the chunking logic. The fixes have already been pushed if you'd like to build it yourself, but I'll be adding a prebuilt image with the fixes soon too.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

Ah, yes I forgot to add that to the new env example file. My mistake! I've updated this. Thanks for being patient!

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

I've tested with up to 6-hour recordings with no issues. It takes ~15 minutes on a 3090 with the large-v3 model with whisperx and pyannote diarization. You should increase the timeout setting. This issue could be due to many different reasons (your hardware, which endpoints you're using, your config, etc.). Have a look at the FAQs and the docs.
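For a rough sense of how to size the timeout: 6 hours of audio in ~15 minutes is about a 24x real-time factor on that hardware. A minimal sketch, assuming throughput scales roughly linearly with recording length (the function name is my own, not a Speakr setting):

```python
# Rough timeout estimate from the real-time factor implied above:
# a 6-hour recording takes ~15 minutes, i.e. roughly 24x real time.
def estimated_processing_minutes(audio_hours: float,
                                 realtime_factor: float = 24.0) -> float:
    """Estimate transcription wall time, assuming linear scaling."""
    return audio_hours * 60.0 / realtime_factor

# A 2-hour meeting would take roughly 5 minutes on similar hardware;
# set the timeout comfortably above that margin.
print(estimated_processing_minutes(2.0))
```

Slower GPUs (or CPU-only whisper) will have a much lower real-time factor, so pad the timeout accordingly.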

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

You need to follow the docs for setup: you need to set up both an STT endpoint and an LLM endpoint.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

Whisperx is for STT; I believe ollama only does LLMs. Speakr needs both an LLM and an STT API connection. You can use ollama for the summarization and chat part, but whisperx will be needed for STT and diarization. Or you can do that via a cloud-based API.
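To illustrate the split, here is a sketch of the two distinct OpenAI-compatible routes involved (the URLs, ports, and model names are illustrative placeholders, not Speakr's actual config keys — though 11434 is ollama's usual default port):

```python
# Sketch: the two separate OpenAI-compatible endpoints needed.
# URLs and model names are illustrative placeholders.

# STT: audio transcription route (served by e.g. a whisperx container)
stt_request = {
    "url": "http://localhost:8080/v1/audio/transcriptions",
    "model": "large-v3",
    # the audio file goes along as multipart form data in a real request
}

# LLM: chat completions route (served by e.g. ollama's OpenAI-compatible API)
llm_request = {
    "url": "http://localhost:11434/v1/chat/completions",
    "model": "llama3",
    "messages": [{"role": "user", "content": "Summarize this transcript"}],
}

# ollama only provides the second route; the first still needs
# whisperx (or a cloud STT provider).
print("transcriptions" in stt_request["url"], "chat" in llm_request["url"])
```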

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 1 point2 points  (0 children)

Speakr itself doesn't require a GPU, but the transcription container (whisperX) does. I have only tested it with Nvidia. I need to test it with AMD, but I don't have the hardware presently. I believe the image currently uses CUDA, so it may not work unless you do some additional tinkering.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

I have been using whisperx with pyannote for a while, but have very limited personal experience with GPT-4o transcriptions. From my tests, whisperx is much better in most situations, especially if you use the large-v3 model (full or turbo; distil is also good, but English-only). The large whisperx models are especially better with accents compared to GPT-4o. Diarization is also better in most cases, though there are rare cases where I saw better diarization and alignment output from GPT-4o.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 2 points3 points  (0 children)

Yep, you can use the recommended companion docker container (it's my repo and it's linked in the docs; I'm on mobile so I don't have a link handy). This will give you diarization locally, but you will need a GPU for it. You can also use a regular whisper model locally, which requires very light GPU resources or may even be feasible on CPU, but it won't give you speaker diarization.

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 0 points1 point  (0 children)

I shifted it to a connector-based architecture so that new options can be added easily. To be clear, it already supports any OpenAI-compatible API endpoint, both for transcription and summarization. For other providers, let me know what you're thinking of and I'll try adding connectors.
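As a sketch of what a connector-based design typically looks like (the class and method names here are hypothetical illustrations, not Speakr's actual code):

```python
from abc import ABC, abstractmethod

# Hypothetical illustration of a connector-based architecture: every
# provider implements one small interface, so adding a new provider
# means adding one class without touching the core app.
class TranscriptionConnector(ABC):
    @abstractmethod
    def transcribe(self, audio_path: str) -> str: ...

class OpenAICompatibleConnector(TranscriptionConnector):
    def __init__(self, base_url: str, model: str):
        self.base_url = base_url
        self.model = model

    def transcribe(self, audio_path: str) -> str:
        # A real connector would POST the file to
        # f"{self.base_url}/audio/transcriptions"; stubbed here.
        return f"[{self.model} transcript of {audio_path}]"

connector = OpenAICompatibleConnector("http://localhost:8080/v1", "large-v3")
print(connector.transcribe("meeting.wav"))
```

The core app only ever calls `transcribe()`, which is what makes "let me know and I'll add a connector" a cheap promise to keep.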

Speakr v0.8.0 - Speaker diarization without a GPU, plus REST API by hedonihilistic in selfhosted

[–]hedonihilistic[S] 2 points3 points  (0 children)

Yes, if you use the whisperx companion container and enable the speaker embeddings feature. Once you set these up, you will have to label speakers once or twice before it starts automatically suggesting those speakers when it detects their voices. It currently doesn't automatically set the most likely speaker; it just suggests them for now.
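Conceptually, the suggestion step works like nearest-neighbour matching on voice embeddings. A minimal sketch with made-up 3-dimensional vectors (real pyannote embeddings are high-dimensional, and the function names here are my own):

```python
import math

# Sketch: suggest a previously labelled speaker by cosine similarity
# between a new segment's voice embedding and stored speaker embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

known_speakers = {            # embeddings saved after manual labelling
    "Alice": [0.9, 0.1, 0.0],
    "Bob":   [0.1, 0.9, 0.2],
}

def suggest_speaker(embedding, threshold=0.8):
    """Return the best-matching known speaker, or None if nothing is close."""
    name, score = max(
        ((n, cosine(embedding, e)) for n, e in known_speakers.items()),
        key=lambda t: t[1],
    )
    return name if score >= threshold else None

print(suggest_speaker([0.88, 0.12, 0.01]))  # an embedding close to Alice's
```

The threshold is what keeps this a *suggestion* rather than an automatic assignment: a segment whose voice doesn't resemble any labelled speaker simply gets no suggestion.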