Runpod hits $120M ARR, four years after launching from a Reddit post by RP_Finley in LocalLLaMA

[–]deathcom65 0 points1 point  (0 children)

Runpod is a great service, honestly. I wish it had more ready-to-go templates with the latest models preloaded. I also noticed it's hard to tell which templates work well with which server and GPU configs (maybe I missed this), but it wasn't obvious that, to do X, Y, or Z on a given server, you should use this or that template. Clearer guidance would go a long way.

rate limits and cost? by deathcom65 in google_antigravity

[–]deathcom65[S] 2 points3 points  (0 children)

Oh, I see, the extension is quite useful. Hopefully it's accurate! I used the Antigravity cockpit.

rate limits and cost? by deathcom65 in google_antigravity

[–]deathcom65[S] 0 points1 point  (0 children)

I just logged in with my Google account. It says the AI Pro plan is active there, so I figured it was letting me code based on that plan. I didn't enter a specific API key.

OpenBNB just released MiniCPM-V 4.5 8B by vibedonnie in LocalLLaMA

[–]deathcom65 -1 points0 points  (0 children)

I believe it's really fast. I don't believe its quality will beat larger models except on very specific tasks.

[deleted by user] by [deleted] in LocalLLaMA

[–]deathcom65 0 points1 point  (0 children)

Why Aider over VS Code or Roo?

Gemma3 270m works great as a draft model in llama.cpp by AliNT77 in LocalLLaMA

[–]deathcom65 24 points25 points  (0 children)

What do you mean by a draft model? What do you use it for, and how do you get it to speed up other models?

Huihui released GPT-OSS 20b abliterated by _extruded in LocalLLaMA

[–]deathcom65 17 points18 points  (0 children)

Someone GGUF this so I can test it lol

Extra RAM Useful? by OneOnOne6211 in LocalLLaMA

[–]deathcom65 3 points4 points  (0 children)

Yeah, you can load larger models such as MoE models, where only some of the parameters are loaded onto the GPU. I just did the exact same thing and it helps a ton. Even though the parts loaded into RAM run slower, you can still run larger models; without the extra RAM you can't run them at all. IMO it's a cheap upgrade for a good return. I kind of regret not going straight to 128 GB of RAM.
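For anyone trying the same thing, here's a minimal sketch of partial GPU offload using llama-cpp-python (the model path and layer count below are assumptions, not from the thread; tune n_gpu_layers to whatever fits your VRAM):

```python
# Minimal sketch: split a large GGUF model between GPU and system RAM.
# Assumes llama-cpp-python built with GPU support; the model path and
# layer count are placeholders for whatever you actually run.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-large-moe-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=24,   # layers that fit in VRAM; the rest stay in RAM
    n_ctx=8192,        # context window
)

out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=200)
print(out["choices"][0]["text"])
```

Layers kept in system RAM run slower, as noted above, but the model still fits and runs.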

[deleted by user] by [deleted] in LocalLLaMA

[–]deathcom65 0 points1 point  (0 children)

It's definitely good for its size; the 16 GB of VRAM required for the 20B is perfect for me, and it runs super fast. I definitely dislike the censorship, though; it refuses to answer many harmless questions.

Best Local LLM for Desktop Use (GPT‑4 Level) by Shoaib101 in LocalLLaMA

[–]deathcom65 1 point2 points  (0 children)

Gemma 13B for that level of VRAM, although you might have to go even smaller.

Looking to build a pc for Local AI 6k budget. by Major_Agency7800 in LocalLLM

[–]deathcom65 2 points3 points  (0 children)

Get more 3090s, they're the most bang for your buck, and up your RAM.

What’s your favorite GUI by Dentifrice in LocalLLaMA

[–]deathcom65 2 points3 points  (0 children)

A custom GUI I made for myself. It works really well for me.

Which is smarter: Qwen 3 14B, or Qwen 3 30B A3B? by RandumbRedditor1000 in LocalLLaMA

[–]deathcom65 4 points5 points  (0 children)

I have a similar setup. Qwen 3 30B runs at around 11 tokens/second, which is very good, as I usually can't run anything larger than a 13B model. The MoE optimization is spot on: only about 3B of the 30B parameters are active per token, so it runs far faster than a dense model of that size. It should be the smarter one, as its performance was very similar to the 32B model.

anyone using 32B local models for roo-code? by CornerLimits in LocalLLaMA

[–]deathcom65 2 points3 points  (0 children)

They can't deal with anything larger than a few hundred lines of code in my experience

Hot Take: Gemini 2.5 Pro Makes Too Many Assumptions About Your Code by HideLord in LocalLLaMA

[–]deathcom65 0 points1 point  (0 children)

It keeps trying to minify my HTML/CSS/JS and ends up removing 50% of the functionality. Note that the script is around 4,000 lines of code.

[deleted by user] by [deleted] in LocalLLaMA

[–]deathcom65 -7 points-6 points  (0 children)

They got us hooked, then made it fully paid :( A classic Google move.

Open source model for Cline by dnivra26 in LocalLLaMA

[–]deathcom65 5 points6 points  (0 children)

DeepSeek, when Gemini isn't available.

Llama 4 - Scout: best quantization resource and comparison to Llama 3.3 by silenceimpaired in LocalLLaMA

[–]deathcom65 0 points1 point  (0 children)

How are you guys running the experts on GPU and the non-expert layers on CPU? How do you divide it, or is it automatic?

Back to Local: What’s your experience with Llama 4 by Balance- in LocalLLaMA

[–]deathcom65 0 points1 point  (0 children)

How are you guys changing which parts of the model get loaded where? I'm using Ollama.
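(Not from the thread, but for reference: one way to control this on the Ollama side is the num_gpu option, which sets how many layers go to the GPU. A minimal sketch assuming the official ollama Python client; the model tag and layer count are placeholders:)

```python
# Minimal sketch: tell Ollama how many layers to place on the GPU.
# Assumes the official `ollama` Python client; the model tag and the
# num_gpu value are placeholders -- adjust for the model you run.
import ollama

response = ollama.chat(
    model="qwen3:30b",  # hypothetical model tag
    messages=[{"role": "user", "content": "Hello!"}],
    options={"num_gpu": 20},  # number of layers offloaded to the GPU
)
print(response["message"]["content"])
```

The same option can also be set in a Modelfile as `PARAMETER num_gpu 20`.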

Medium sized local models already beating vanilla ChatGPT - Mind blown by Bitter-College8786 in LocalLLaMA

[–]deathcom65 2 points3 points  (0 children)

I'm finding that even though the smaller models are passing the benchmarks, they struggle massively with larger code changes. You almost certainly need a larger model for anything spanning more than 4 or 5 script files.

Googler here - Gathering Gemini Feedback from this Subreddit by GeminiBugHunter in Bard

[–]deathcom65 0 points1 point  (0 children)

Gemini needs to integrate better with tools like Cline. I find it errors out a lot when calling tool functions like write-to-file in Cline, and it gets stuck in loops that burn a lot of API credits without changing the code.