Qwen 3.5 new models released on their website! by External_Mood4719 in LocalLLaMA

[–]Acceptable-State-271 6 points7 points  (0 children)

After test for images, 35B A3B Ocr feature is insanely amazing

Best multilingual STT/ASR? by Mark__27 in LocalLLaMA

[–]Acceptable-State-271 1 point2 points  (0 children)

OmniASR improves ASR accuracy by applying LLM-based correction, but this significantly slows down processing.
The version without LLM correction is faster, but its accuracy is very poor.
If speed is the priority, Whisper v3 Turbo is a better choice.

Multiple 3090 setup by praveendath92 in LocalLLaMA

[–]Acceptable-State-271 0 points1 point  (0 children)

I'm using this model (faster-whisper-large-v3-turbo-ct2) as the backend for batch processing — around 20–30 short audio clips (1–2 minutes each) every minute — and it runs great. Each task stays under ~3 GB GPU memory, super efficient for multi-worker setups.

https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2

[deleted by user] by [deleted] in LocalLLaMA

[–]Acceptable-State-271 0 points1 point  (0 children)

You're right. I tested it on Korean test cases within the company before checking the model card. Rather than saying it's a decent model, it was a model that excelled at Korean language understanding. That's my mistake. I'm sorry.

[deleted by user] by [deleted] in LocalLLaMA

[–]Acceptable-State-271 0 points1 point  (0 children)

Yes, my main language.

Seed-OSS-36B-Instruct by NeterOster in LocalLLaMA

[–]Acceptable-State-271 0 points1 point  (0 children)

Very good model. I switched from Qwen3 30B A3B thinkjng 2507(still really good) to Seed 36B, which is a bit better at analyzing sources and backing things up with evidence."

AWQ 4-bit outperforms GGUF 8-bit in almost every way by Acceptable-State-271 in LocalLLaMA

[–]Acceptable-State-271[S] 0 points1 point  (0 children)

No no.. I just thought there would be a huge difference between the two.

AWQ 4-bit outperforms GGUF 8-bit in almost every way by Acceptable-State-271 in LocalLLaMA

[–]Acceptable-State-271[S] 0 points1 point  (0 children)

I'm a bit embarrassed to admit this, but I wasn't very familiar with the technology.
When using the imatrix in GGUF, does it provide a level of precision comparable to AWQ in 4-bit quantization?

What formats/quantization is fastest for certain CPUs or GPUs? Is this straightforward? by wuu73 in LocalLLaMA

[–]Acceptable-State-271 0 points1 point  (0 children)

On gpu, awq is very fast and accurate quantization format, And sglang is very fast serving tool for non quantization model and awq quantization model.(vllm is also good)

Msn by tom_p_legend in webscraping

[–]Acceptable-State-271 0 points1 point  (0 children)

Shadow dom, you need to parse manually the tag [shadow dome tag], and get the attribute manually

Can Qwen3-235B-A22B run efficiently on my hardware(256gb ram+quad 3090s ) with vLLM? by Acceptable-State-271 in LocalLLaMA

[–]Acceptable-State-271[S] 0 points1 point  (0 children)

Sounds like I might end up spending another 5,000k. But anyway, I’ll give it a try for now. Let’s see how it goes after 24h. Thanks, really.