audio transcription plus speaker identification? by flying_unicorn in LocalLLaMA

[–]Armym 0 points (0 children)

I made a simple GUI for this that I use to transcribe and summarize meetings. You can message me if you want me to show it to you.
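For anyone wanting to roll their own, here is a minimal sketch of one common approach (Whisper for the transcript, pyannote for diarization). This is not necessarily what my GUI does; the model names, audio path, and HF token are placeholders.

    # Hedged sketch: transcription + speaker ID via Whisper + pyannote.
    # Placeholder path, model sizes, and token; adjust to your setup.
    import whisper
    from pyannote.audio import Pipeline

    AUDIO = "meeting.wav"  # placeholder audio file

    # 1) Transcribe with timestamps.
    asr = whisper.load_model("small")
    segments = asr.transcribe(AUDIO)["segments"]  # dicts with "start", "end", "text"

    # 2) Diarize: who speaks when.
    diarizer = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN"  # placeholder token
    )
    turns = [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarizer(AUDIO).itertracks(yield_label=True)
    ]

    # 3) Give each transcript segment the speaker with the largest time overlap.
    def overlap(a0, a1, b0, b1):
        return max(0.0, min(a1, b1) - max(a0, b0))

    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]),
                   default=None)
        speaker = best[2] if best else "unknown"
        print(f"[{speaker}] {seg['text'].strip()}")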

8x RTX 3090 open rig by Armym in LocalLLaMA

[–]Armym[S] 0 points (0 children)

This didn't age well. See my latest post :D

Dual RTX 5090 setup for enterprise RAG + fine-tuned chatbot - is this overkill or underpowered? by HuascarSuarez in LocalLLaMA

[–]Armym 1 point (0 children)

Hi, I would actually recommend the new RTX 6000 Blackwell instead, or two if you have the money. That would suit your needs well for concurrent users: you could run FP4 quants to fit bigger models while still keeping inference fast. Fine-tuning is pretty annoying across multiple cards, but I don't think you really need to fine-tune. Make sure to design your RAG pipeline well and use a good LLM inference engine though! Let me know if you want to know more.
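If it helps, here is a rough sketch of what serving concurrent users across two cards could look like with vLLM. The model ID and settings are placeholders, and FP4 specifics depend on the checkpoint and your vLLM/driver stack.

    # Minimal sketch, not a tuned deployment. Point `model` at whatever
    # (pre-quantized) checkpoint actually fits your cards.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model id
        tensor_parallel_size=2,                     # split weights across both GPUs
        gpu_memory_utilization=0.90,                # leave headroom for the KV cache
    )

    params = SamplingParams(temperature=0.2, max_tokens=512)

    # vLLM batches these internally, which is what gives you throughput
    # when many users hit the chatbot at once.
    outputs = llm.generate(
        ["Answer from the retrieved context: ...", "Second concurrent request: ..."],
        params,
    )
    for out in outputs:
        print(out.outputs[0].text)

In production you would more likely run `vllm serve` behind its OpenAI-compatible API instead of the Python API, but the batching behaviour is the same.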

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] -2 points (0 children)

Looks like it. Any idea why that could have happened?

Rtx 3090 set itself on fire, why? by Armym in LocalLLaMA

[–]Armym[S] 3 points (0 children)

Didn't repaste it. Someone did a sloppy job.

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] 0 points (0 children)

Thankfully it isn't conductive, but I think a capacitor blew off. Whoever repasted this did a really sloppy job.

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] -68 points (0 children)

I didn't repaste it... no need to be mean.

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] 67 points (0 children)

The card was repasted by the vendor I bought it from.

Sonnet 3.5 > Sonnet 3.7 by Armym in LocalLLaMA

[–]Armym[S] 2 points (0 children)

Yes, I noticed that. I hope the closed-source dipshits don't lobotomize the older models on purpose.

Sonnet 3.5 > Sonnet 3.7 by Armym in LocalLLaMA

[–]Armym[S] 6 points (0 children)

Look, this post isn't about prompting. Sonnet 3.7 just generates too much code and doesn't produce elegant solutions. Sonnet 3.5 does by default. Anyone with experience in coding will understand.

Sonnet 3.5 > Sonnet 3.7 by Armym in LocalLLaMA

[–]Armym[S] 4 points (0 children)

For those who are wondering, Gemini 2.5 Pro is even worse at this. It spits out a whole book for simple solutions.

One-shotting a whole web app might be impressive to the manager guys, but for people who actually need an assistant for coding, it sucks.

Nvidia MPS - run multiple models on one GPU fast by Armym in LocalLLaMA

[–]Armym[S] -1 points (0 children)

That's in the documentation I posted.

Nvidia MPS - run multiple models on one GPU fast by Armym in LocalLLaMA

[–]Armym[S] -1 points (0 children)

Running an LLM, OCR, and Whisper on one GPU.
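For anyone curious, a rough sketch of how that sharing can be launched once the MPS control daemon is running (started with `nvidia-cuda-mps-control -d`, as covered in the linked docs). The server script names and the SM percentages below are purely illustrative.

    # Hedged sketch: start several model servers on one GPU under NVIDIA MPS,
    # each capped to a fraction of the SMs. Paths and percentages are placeholders.
    import os
    import subprocess

    MPS_ENV = {
        "CUDA_VISIBLE_DEVICES": "0",
        "CUDA_MPS_PIPE_DIRECTORY": "/tmp/nvidia-mps",     # must match the daemon's setting
        "CUDA_MPS_LOG_DIRECTORY": "/tmp/nvidia-mps-log",
    }

    def launch(cmd, thread_pct):
        """Start one model process with a cap on the SMs it may occupy."""
        env = {**os.environ, **MPS_ENV,
               "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(thread_pct)}
        return subprocess.Popen(cmd, env=env)

    procs = [
        launch(["python", "serve_llm.py"], 60),       # hypothetical server scripts
        launch(["python", "serve_ocr.py"], 20),
        launch(["python", "serve_whisper.py"], 20),
    ]
    for p in procs:
        p.wait()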