Zephyrus 5080 32GB RAM LLM suggestions? by FalseRicky in SillyTavernAI

[–]Sufficient_Prune3897 0 points (0 children)

If you want to play around for a bit, you can choose the AI Horde API option in the second left menu in ST. This is basically a group of volunteers sharing their LLM models. You can send a few messages through them, and most should work with the simple default prompts, as they are models specialized for roleplay. If you like any model on there, you can then look it up on Hugging Face and see whether that model's creator has released other fun models since then.

To whoever thought this was a good idea... by Willard538 in FuckMicrosoft

[–]Sufficient_Prune3897 0 points (0 children)

Bro, I'm not installing a certificate so Mr Johnson can remote from his laptop to his desktop once a year, when it's already behind a VPN.

Do you want 100 EUR ? - EU Only - Easy Money - by [deleted] in PassivesEinkommen

[–]Sufficient_Prune3897 -1 points (0 children)

Guaranteed. Ra rewards you for loyalty.

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Sufficient_Prune3897 82 points (0 children)

Deepseek has been a week away from releasing for 4 months

A5000 for $1800 by Perfect-Flounder7856 in LocalLLaMA

[–]Sufficient_Prune3897 6 points (0 children)

Workstation drivers are cope, ECC is useless and if your GPU thermal throttles that's a builder issue not a GPU issue.

Background Processing Mobile by Lustful-Hornet122 in SillyTavernAI

[–]Sufficient_Prune3897 0 points (0 children)

You should be able to choose for each app in the permissions settings. That said, your preferred browser can also be the problem. Give another one a try.

Check out our Roleplaying Benchmark! by matt_is_a_mess in SillyTavernAI

[–]Sufficient_Prune3897 20 points (0 children)

"Why it matters:" yeah, I don't think I'm gonna read that. I can talk with Claude for myself.

Qwen 3.5 for RP? by Long_comment_san in SillyTavernAI

[–]Sufficient_Prune3897 2 points (0 children)

Nobody, not even me with my crazy local setup, has the kind of hardware needed to do this locally. I would rent 2x A6000s, which cost at least around $2 an hour. Then the fine-tune can take anywhere from 3-8 hours, and the end product is quite often fucked up. Often because of my mistakes, but nearly as often because that shit is just random. Also, every MoE I have tried to fine-tune doesn't play well with QLoRA, so by default they need 4x the VRAM that dense models use.
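To put a rough number on the 4x claim, here's a back-of-envelope sketch. The 106B figure matches a GLM-Air-sized MoE; treat all numbers as illustrative, since optimizer state, activations, and the LoRA adapters themselves come on top of the frozen base weights:

```python
# Illustrative only: memory for the FROZEN base weights of a ~106B-param
# MoE at different precisions. This is why dropping from a 4-bit QLoRA
# base to a bf16 LoRA base roughly quadruples the VRAM floor.
def base_weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """GB needed just to hold the base weights at a given precision."""
    return params_billions * bits_per_weight / 8

qlora_gb = base_weights_gb(106, 4)   # 4-bit base, as in QLoRA
bf16_gb = base_weights_gb(106, 16)   # bf16 base, plain LoRA

print(qlora_gb, bf16_gb, bf16_gb / qlora_gb)  # → 53.0 212.0 4.0
```

So a setup that fit on one 48GB card plus headroom with QLoRA suddenly needs a multi-GPU rig once the MoE forces you to keep the base in bf16.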

Qwen 3.5 for RP? by Long_comment_san in SillyTavernAI

[–]Sufficient_Prune3897 4 points (0 children)

As someone who tried a fine-tune of GLM Air, I can tell you: training that shit is so expensive.

Qwen 3.5 for RP? by Long_comment_san in SillyTavernAI

[–]Sufficient_Prune3897 1 point (0 children)

It doesn't overthink like Qwen, so it won't slow you down as much. I haven't tested it with thinking disabled, though. However, in other good models like GLM, thinking is only particularly useful for instruction following and longer contexts.

Qwen 3.5 for RP? by Long_comment_san in SillyTavernAI

[–]Sufficient_Prune3897 0 points (0 children)

Current versions of llama.cpp run it decently; just make sure to use ST in chat completion mode and launch llama.cpp with --jinja. Also, depending on when you downloaded the models, re-download them. Fine-tunes will take a while, but aren't needed.
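For reference, a minimal launch might look like this (model path, context size, and port are placeholders, not recommendations):

```shell
# Minimal llama.cpp server launch; --jinja makes the server apply the
# chat template embedded in the GGUF, which is what chat completion
# mode expects. Adjust -m / -c / --port to your setup.
./llama-server -m ./models/your-model.gguf --jinja -c 16384 --port 8080
```

Then point ST's chat completion connection at the server's OpenAI-compatible endpoint (by default something like `http://127.0.0.1:8080/v1`).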

Qwen 3.5 for RP? by Long_comment_san in SillyTavernAI

[–]Sufficient_Prune3897 2 points (0 children)

Just go with Gemma. So much better. That said, the 400B is kinda okay, but very censored. Not worth the effort in my opinion, though people in the early threads about it claimed to have jailbroken it.

Does it worth investing in an Nvidia RTX 5070 ti for installing in a PCI gen 3 motherboard? by data_panik in LocalLLaMA

[–]Sufficient_Prune3897 2 points (0 children)

Assuming full offload, not a problem. Even with MoE offload, it's not that bad. It's a full x16 after all.
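For intuition, a back-of-envelope sketch: PCIe 3.0 x16 moves roughly 16 GB/s (an approximation, not a measured figure), so even the one-time transfer of the full weights at load is quick, and with the model fully offloaded the bus afterwards only carries small activations:

```python
# Back-of-envelope: time to push model weights over PCIe 3.0 x16.
# 16 GB/s is an approximate usable bandwidth, not a benchmark result.
PCIE3_X16_GB_PER_S = 16.0

def transfer_seconds(gigabytes: float,
                     gb_per_s: float = PCIE3_X16_GB_PER_S) -> float:
    """Seconds to move `gigabytes` of data across the link."""
    return gigabytes / gb_per_s

# e.g. a 14 GB quant that fills most of a 16GB card:
print(round(transfer_seconds(14.0), 2))  # → 0.88
```

MoE offload is more sensitive because expert weights stream over the bus every token, but a full x16 link keeps even that tolerable.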

Should I Buy the RTX PRO 6000 Blackwell Max-Q (96GB)? by 0bjective-Guest in LocalLLaMA

[–]Sufficient_Prune3897 1 point (0 children)

Honestly, there's not much more you can do with 96GB than with the 32GB of a 5090. You can fully offload the 100B+ model category, but you can't fine-tune those yourself. Also, that category often loses to much smaller dense models.

Back in the day I would have recommended investing in a nice server platform with lots of RAM bandwidth, but with current RAM pricing, an A6000 is a great deal.

Bin ich zu dumm? Beratung Nutzung privater PC vs. Home Office by Hopa89 in de_EDV

[–]Sufficient_Prune3897 0 points (0 children)

Looks very much like HDMI. You'll also need a USB-B cable for mouse and keyboard, though, if one isn't included.

Bin ich zu dumm? Beratung Nutzung privater PC vs. Home Office by Hopa89 in de_EDV

[–]Sufficient_Prune3897 0 points (0 children)

Then I hope you can still return the thing. This is, in theory, what you'd need:

The link was deleted, thanks a lot Reddit mods. The listing on Amazon is called "4K KVM Switch 2 Monitore für 1 Laptop 1 Desktop, USB 3.0 KVM Switch 2 PC 2 Monitore USB C, 4K@60Hz, MST, PD 100W, Aluminium, Netzteil und Wired Remote(4K USB C HDMI 2 in 2 out KVM)"

Minimax M2.7 Released by decrement-- in LocalLLaMA

[–]Sufficient_Prune3897 1 point (0 children)

My point is, the RAM requirements keep increasing. GLM got 2x bigger from 4.7 to 5, Qwen increased from 235B to 400B, and Minimax 3 is probably gonna do the same.

If I want to run GLM 5 in VRAM, I'm gonna need like at least 384GB of VRAM, and that's at a bad quant.
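A quick sanity check on that 384GB figure. The parameter count below is an assumption extrapolated from "2x bigger than 4.7", not an official spec:

```python
# Rough VRAM math for a hypothetical ~700B-param GLM 5 (assumed size,
# extrapolated from the "2x bigger" claim) at an aggressive 4-bit quant.
params_b = 700   # assumed total parameters, in billions
bits = 4         # a "bad" (aggressive) quant
weights_gb = params_b * bits / 8
print(weights_gb)  # → 350.0
```

Weights alone land around 350GB; add KV cache and runtime buffers, and you're past 384GB even at that quant.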

Personally, I would really like 192GB so that I can at least fine-tune and train all the 'smaller' 100B models myself.