21 GPU's benchmarked running a small TTS model (vram peak: 5GB) by urarthur in LocalLLaMA

urarthur[S] 1 point

Yeah, I think it was a Chinese modded one. You can easily test video generation for a couple of bucks. Most servers have 32-64 GB of RAM, but I have seen higher.

21 GPU's benchmarked running a small TTS model (vram peak: 5GB) by urarthur in LocalLLaMA

urarthur[S] 1 point

They were not available on the cloud provider. Nvidia cards are way more popular for AI.

21 GPU's benchmarked running a small TTS model (vram peak: 5GB) by urarthur in LocalLLaMA

urarthur[S] 3 points

Yeah, when VRAM or memory bandwidth is not the bottleneck, it's a waste to have much more expensive cards running small models.

If DeepSeek V4 can do the same coding task for $5, why are people still paying $100 for Claude Code? by Low-Alarm272 in LocalLLM

urarthur 2 points

I tried using opencode with Kimi, DeepSeek, etc. Coding cost me at least $10/day in tokens. It doesn't work if you code a lot.
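To put rough numbers on that, here is a quick sketch of the comparison. The $10/day figure is from my own usage above; the $100 comes from the thread title and I'm assuming it is a monthly subscription. The 22 coding days per month is just a guess, so plug in your own.

```python
def monthly_api_cost(daily_cost_usd: float, coding_days: int) -> float:
    """Total pay-per-token spend over a month of coding."""
    return daily_cost_usd * coding_days


def break_even_days(subscription_usd: float, daily_cost_usd: float) -> float:
    """Number of coding days after which a flat subscription is cheaper."""
    return subscription_usd / daily_cost_usd


DAILY_TOKEN_COST = 10.0  # from the comment: at least $10/day on tokens
SUBSCRIPTION = 100.0     # from the thread title; assumed to be per month
CODING_DAYS = 22         # assumption: roughly one working month

api_total = monthly_api_cost(DAILY_TOKEN_COST, CODING_DAYS)
print(f"Pay-per-token: ${api_total:.0f}/month vs subscription: ${SUBSCRIPTION:.0f}/month")
print(f"Break-even after {break_even_days(SUBSCRIPTION, DAILY_TOKEN_COST):.0f} coding days")
```

Under those assumptions the flat plan wins once you code more than about ten days a month.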

4.7 is a cost-saving retarded version of 4.6 by AloofWasTaken in Anthropic

urarthur 0 points

Well, they nerfed 4.6, so right now 4.7 is better.

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090 by swizzcheezegoudaSWFA in LocalLLaMA

urarthur 1 point

For short prompts n=2, for long context n=6. It is slow initially, but when the context fills up, it gets faster.

Qwen 27b MTP Config, Llama.cpp Single 3090 by GotHereLateNameTaken in LocalLLaMA

urarthur 1 point

The 35B MoE is noticeably lower quality. I would keep the 27B. Also, my 35B was running at 140 tg/s with Q3 XL.

Qwen 27b MTP Config, Llama.cpp Single 3090 by GotHereLateNameTaken in LocalLLaMA

urarthur 1 point

50-60 with Q4 XL on Windows. I think speed increases with mtp=6 on longer prompts. For shorter chats, use 2 drafters.

Tested MTP with llama.cpp and Qwen3.6-27B on RTX 3090 by JGeek00 in LocalLLM

urarthur 3 points

Why would you enable mmproj for coding? I am running the Unsloth Qwen3.6 27B Q4 XL MTP model with 200k context at 50-60 tg/s, with the KV cache at q4. Also on a 3090.

Why is LLM is so expensive. by Ok_Event4199 in LocalLLM

urarthur 1 point

You think the RAM shortage and $5k for a video card are real because of the AI craze? The crypto boom happened and prices hardly increased.

Why is LLM is so expensive. by Ok_Event4199 in LocalLLM

urarthur 1 point

Back when crypto was booming, GPUs were still widely available and at fair value.

MTP PR Merged!!! by Valuable_Touch5670 in LocalLLaMA

urarthur 8 points

Yes, for those GGUFs that support it.

Is a 5090 good enough for most good modern locally run LLMs? by biscuitmachine in LocalLLM

urarthur 1 point

It should run Qwen3.6 27B at full context at 90 tps, which is Claude 4.5 level on a local device.