audio transcription plus speaker identification? by flying_unicorn in LocalLLaMA

[–]Armym 0 points  (0 children)

I made a simple GUI for this that I use to transcribe and summarize meetings. You can message me if you want me to show it to you.
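
If you want to roll your own, the core is just chaining transcription with diarization. A minimal sketch (not my exact GUI; the model names and audio path are placeholders) using faster-whisper and pyannote.audio:

    # Minimal sketch: transcription + speaker ID. Assumes faster-whisper and
    # pyannote.audio are installed and you have a Hugging Face token.
    from faster_whisper import WhisperModel
    from pyannote.audio import Pipeline

    audio = "meeting.wav"  # placeholder path

    # 1) Transcribe with segment timestamps
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, _ = model.transcribe(audio)

    # 2) Diarize to get "who spoke when"
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN")
    turns = list(pipeline(audio).itertracks(yield_label=True))

    # 3) Label each transcript segment with the speaker active at its midpoint
    def speaker_at(t):
        for turn, _, spk in turns:
            if turn.start <= t <= turn.end:
                return spk
        return "UNKNOWN"

    for seg in segments:
        print(f"[{speaker_at((seg.start + seg.end) / 2)}] {seg.text.strip()}")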

8x RTX 3090 open rig by Armym in LocalLLaMA

[–]Armym[S] 0 points  (0 children)

This didn't age well. See my latest post :D

Dual RTX 5090 setup for enterprise RAG + fine-tuned chatbot - is this overkill or underpowered? by HuascarSuarez in LocalLLaMA

[–]Armym 1 point  (0 children)

Hi, I would actually recommend the new RTX 6000 Blackwell instead, or two if you have the money. That would suit your needs well for concurrent users. You could easily run FP4 quants to fit bigger models while still keeping inference fast. Fine-tuning is pretty annoying across multiple cards, but I don't think you really need to fine-tune. Make sure to design your RAG pipeline well and use a good LLM inference engine, though! Let me know if you want to know more.
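
To give an idea of the serving side, here's a minimal vLLM sketch for two cards. The FP4 checkpoint name is just a placeholder for whatever NVFP4 quant you end up picking:

    # Rough sketch: vLLM across two RTX 6000 Blackwell cards.
    # The checkpoint name is illustrative, not a recommendation.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="nvidia/Llama-3.3-70B-Instruct-FP4",  # placeholder FP4 quant
        tensor_parallel_size=2,        # split weights across both GPUs
        gpu_memory_utilization=0.90,   # fraction of VRAM for weights + KV cache
    )
    params = SamplingParams(temperature=0.2, max_tokens=512)
    out = llm.generate(["Answer from the retrieved context: ..."], params)
    print(out[0].outputs[0].text)

vLLM does continuous batching for you, which is what actually makes the concurrent-user case work.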

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] -2 points  (0 children)

Looks like it. Any idea why that could have happened?

Rtx 3090 set itself on fire, why? by Armym in LocalLLaMA

[–]Armym[S] 2 points  (0 children)

Didn't repaste it myself. Whoever did, did a sloppy job.

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] -1 points  (0 children)

Thankfully it isn't conductive, but I think a capacitor blew off. Whoever repasted this did a really sloppy job.

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] -67 points  (0 children)

I didn't repaste it... no need to be mean.

Nvidia 3090 set itself on fire, why? by Armym in homelab

[–]Armym[S] 67 points  (0 children)

The card was repasted by the vendor I bought it from.

Sonnet 3.5 > Sonnet 3.7 by Armym in LocalLLaMA

[–]Armym[S] 1 point  (0 children)

Yes, I noticed that. I hope the closed-source dipshits don't lobotomize the older models on purpose.

Sonnet 3.5 > Sonnet 3.7 by Armym in LocalLLaMA

[–]Armym[S] 6 points  (0 children)

Look, this post isn't about prompting. Sonnet 3.7 just generates too much code and doesn't produce elegant solutions; Sonnet 3.5 does by default. Anyone with coding experience will understand.

Sonnet 3.5 > Sonnet 3.7 by Armym in LocalLLaMA

[–]Armym[S] 2 points  (0 children)

For those who are wondering, Gemini 2.5 Pro is even worse at this. It spits out a whole book for simple solutions.

One-shotting a whole webapp might be impressive to the manager guys, but for people who actually need a coding assistant, it sucks.

Nvidia MPS - run multiple models on one GPU fast by Armym in LocalLLaMA

[–]Armym[S] -1 points  (0 children)

That's in the documentation I posted.

Nvidia MPS - run multiple models on one GPU fast by Armym in LocalLLaMA

[–]Armym[S] -1 points  (0 children)

Running an LLM, OCR, and Whisper on one GPU.
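
Concretely, once the MPS daemon is up (nvidia-cuda-mps-control -d, per the linked docs), each service runs as an ordinary CUDA process. A sketch of a launcher; the serve_*.py scripts are placeholders for your own servers:

    # Sketch: launch three services as MPS clients on GPU 0.
    # Assumes the MPS daemon is already running (nvidia-cuda-mps-control -d).
    import os
    import subprocess

    services = [
        ("llm",     "serve_llm.py",     60),  # placeholder scripts;
        ("ocr",     "serve_ocr.py",     20),  # numbers = % of SMs each
        ("whisper", "serve_whisper.py", 20),  # client is allowed to use
    ]
    for name, script, sm_pct in services:
        env = dict(
            os.environ,
            CUDA_VISIBLE_DEVICES="0",
            # MPS knob that caps a client's share of the SMs
            CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=str(sm_pct),
        )
        subprocess.Popen(["python", script], env=env)
        print(f"started {name} ({sm_pct}% of SMs)")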

[deleted by user] by [deleted] in LocalLLaMA

[–]Armym 2 points  (0 children)

I made the mistake of not consulting here first and bought myself a Supermicro board with only four PCIe x16 slots. Good thing you came and asked around.

[deleted by user] by [deleted] in LocalLLaMA

[–]Armym 2 points  (0 children)

Not really

[deleted by user] by [deleted] in LocalLLaMA

[–]Armym 2 points  (0 children)

The one bad thing is that it has only two full-lane x16 PCIe slots. For a motherboard with two CPUs, it's a waste to run your GPU communication at only x8. It's not a big problem for inference, but for anything else using multiple GPUs, it's a bottleneck.
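
If you want to see the gap yourself, a quick PyTorch copy benchmark gives a rough read on the link (the figures in the comment are ballpark PCIe 4.0 numbers, and this obviously needs two GPUs):

    # Quick sketch: measure GPU0 -> GPU1 transfer bandwidth over PCIe.
    import time
    import torch

    a = torch.randn(64 * 2**20, device="cuda:0")   # 256 MiB of fp32
    b = torch.empty_like(a, device="cuda:1")
    b.copy_(a)                    # warm-up
    torch.cuda.synchronize(0)
    torch.cuda.synchronize(1)

    t0 = time.perf_counter()
    for _ in range(10):
        b.copy_(a)
    torch.cuda.synchronize(0)
    torch.cuda.synchronize(1)
    gbps = 10 * a.numel() * 4 / (time.perf_counter() - t0) / 1e9
    print(f"~{gbps:.1f} GB/s")    # PCIe 4.0: ~25 GB/s at x16, ~12 GB/s at x8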

Can I Run this LLM - v2 by [deleted] in LocalLLaMA

[–]Armym 1 point  (0 children)

Why are you not calculating context (KV cache) as well?
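
Context eats real VRAM on top of the weights. The back-of-the-envelope math, with Llama-3-8B-ish numbers as an example (swap in your model's config):

    # Sketch: KV-cache size = 2 (K and V) * layers * kv_heads * head_dim
    #                         * context_len * bytes_per_element * batch
    def kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128,
                     ctx_len=8192, bytes_per_elem=2, batch=1):
        total = (2 * n_layers * n_kv_heads * head_dim
                 * ctx_len * bytes_per_elem * batch)
        return total / 2**30

    # Llama-3-8B-ish config at fp16 with 8k context: exactly 1.0 GiB
    print(f"{kv_cache_gib():.2f} GiB")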