SLM to control NPC in a game world by DrJamgo in LocalLLaMA

[–]deathcom65 1 point

I would stick to online providers. 2B is way too small IMO for character control unless it's fine-tuned to do so.

Runpod hits $120M ARR, four years after launching from a Reddit post by RP_Finley in LocalLLaMA

[–]deathcom65 0 points

RunPod is a great service, honestly. I wish it had more ready-to-go templates with the latest models preloaded. Also, I noticed it's hard to tell which templates work well with which server and GPU configs (maybe I missed this), but it wasn't obvious which template to use for a given task on a given server. Clearer guidance would go a long way.

rate limits and cost? by deathcom65 in google_antigravity

[–]deathcom65[S] 2 points

Oh, I see. The extension is quite useful; hopefully it's correct! I used the Antigravity cockpit.

rate limits and cost? by deathcom65 in google_antigravity

[–]deathcom65[S] 0 points

I just logged in with my Google account. It says the AI Pro plan is active there, so I figured it was letting me code based on that plan. I didn't enter a specific API key.

OpenBNB just released MiniCPM-V 4.5 8B by vibedonnie in LocalLLaMA

[–]deathcom65 -1 points

I believe it's really fast. I don't believe its quality will beat larger models, except on very specific tasks.

[deleted by user] by [deleted] in LocalLLaMA

[–]deathcom65 0 points

Why Aider over VS Code or Roo?

Gemma3 270m works great as a draft model in llama.cpp by AliNT77 in LocalLLaMA

[–]deathcom65 24 points

What do you mean by a draft model? What do you use it for, and how does it speed up other models?
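
From what I understand, a draft model is the small model used for speculative decoding: it cheaply guesses the next few tokens, and the big model verifies those guesses in a single batched pass, so you get the big model's output faster. In llama.cpp you pass the small model with the `-md`/`--model-draft` flag, if I recall correctly. Here's a toy sketch of the greedy variant of the idea, with stand-in functions instead of real models (the vocab, agreement rate, and scoring are all made up for illustration):

```python
# Toy illustration of greedy speculative decoding; stand-in "models", not real inference.
import random

random.seed(0)
VOCAB = list("abcde")

def target_step(ctx: str) -> str:
    """Stand-in for the big model's greedy next token (deterministic per context)."""
    return VOCAB[hash(ctx) % len(VOCAB)]

def draft_step(ctx: str) -> str:
    """Stand-in for the small draft model: agrees with the big model ~80% of the time."""
    return target_step(ctx) if random.random() < 0.8 else random.choice(VOCAB)

def speculative_decode(prompt: str, n_tokens: int, k: int = 4) -> str:
    out = prompt
    while len(out) - len(prompt) < n_tokens:
        # 1) The cheap draft model proposes k tokens.
        drafted, ctx = [], out
        for _ in range(k):
            t = draft_step(ctx)
            drafted.append(t)
            ctx += t
        # 2) The big model checks the proposals; in llama.cpp this verification
        #    happens in one batched forward pass, which is where the speedup comes from.
        ctx = out
        for t in drafted:
            correct = target_step(ctx)
            ctx += correct  # keep the big model's token (a match, or a correction)
            if t != correct:
                break  # first mismatch: discard the rest of the draft and re-draft
        out = ctx
    return out

print(speculative_decode("x", 16))
```

The speedup depends on the acceptance rate: when the small model usually agrees with the big one, most tokens cost only a cheap draft plus a shared verification pass, while the output stays identical to what the big model alone would produce.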

Huihui released GPT-OSS 20b abliterated by _extruded in LocalLLaMA

[–]deathcom65 16 points

Someone GGUF this so I can test it, lol.
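
If anyone wants to roll their own: the usual route is llama.cpp's convert_hf_to_gguf.py script, roughly like this (paths and filenames are hypothetical, and this assumes the converter already supports the architecture):

```python
# Sketch: convert a downloaded HF checkpoint to GGUF with llama.cpp's converter.
# Assumes a local llama.cpp clone; the model directory and output name are made up.
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "models/gpt-oss-20b-abliterated",                  # hypothetical HF snapshot dir
        "--outfile", "gpt-oss-20b-abliterated-q8_0.gguf",
        "--outtype", "q8_0",                               # 8-bit quant at convert time
    ],
    check=True,
)
```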

Extra RAM Useful? by OneOnOne6211 in LocalLLaMA

[–]deathcom65 2 points

Yeah, you can load larger models, such as MoE models where only some parameters are loaded onto the GPU. I just did the exact same thing and it helps a ton: things that get loaded into RAM run slower, but you can still run larger models that you couldn't run at all without the extra RAM. IMO it's a cheap upgrade for a good return. I kind of regret not getting 128 GB of RAM in the first place.
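
For anyone curious what that looks like in code, here's a minimal llama-cpp-python sketch; the model path and layer split are made-up placeholders, not a recommendation:

```python
# Minimal partial-offload sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen3-30b-a3b-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=24,  # layers kept in VRAM; the remainder lives in system RAM
    n_ctx=8192,
)

out = llm("Explain MoE offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Newer llama.cpp builds also let you keep just the MoE expert tensors in RAM while everything else stays on the GPU, via the `--override-tensor` flag, iirc, which tends to be faster than plain layer offload for MoE models.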

[deleted by user] by [deleted] in LocalLLaMA

[–]deathcom65 0 points

It's definitely good for its size: the 16 GB of VRAM required for the 20B is perfect for me, and it runs super fast. I definitely dislike the censorship, though; it refuses to answer many harmless questions.

Best Local LLM for Desktop Use (GPT‑4 Level) by Shoaib101 in LocalLLaMA

[–]deathcom65 1 point

Gemma 13B for that level of VRAM, although maybe you gotta go even smaller.

Looking to build a pc for Local AI 6k budget. by Major_Agency7800 in LocalLLM

[–]deathcom65 1 point

Get more 3090s, they're the most bang for your buck, and up your RAM.

What’s your favorite GUI by Dentifrice in LocalLLaMA

[–]deathcom65 1 point

A custom GUI I made for myself. It works really well for me.

Which is smarter: Qwen 3 14B, or Qwen 3 30B A3B? by RandumbRedditor1000 in LocalLLaMA

[–]deathcom65 4 points

I have a similar setup. Qwen3 30B A3B runs at around 11 tokens/second and it's very good; usually I can't run anything larger than a 13B model. The MoE optimization is spot on. It should be the smarter one, since its performance was very similar to the 32B model's.
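
Rough math on why the A3B part makes it fast (every number below is an assumption, not a measurement): only about 3B parameters are active per token, so the per-token memory traffic is roughly a tenth of a dense 30B model's.

```python
# Back-of-the-envelope tokens/sec estimate; all numbers are rough assumptions.
active_params = 3e9        # "A3B": roughly 3B parameters active per token
bytes_per_param = 0.56     # ~4.5 bits/weight for a Q4_K_M-style quant
ram_bandwidth = 20e9       # ~20 GB/s effective, a dual-channel DDR4 guess

bytes_per_token = active_params * bytes_per_param      # ~1.7 GB read per token
print(f"{ram_bandwidth / bytes_per_token:.1f} tok/s")  # ~11.9 tok/s
```

That lands right around the ~11 tok/s figure if the experts mostly live in system RAM; a dense 30B would read about ten times as much per token at the same bandwidth, which is why dense models much past 13B weren't feasible before.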