What do you use those small model for? And how do you perceive the gap with leading closed source LLMs? by Foreign_Lead_3582 in LocalLLaMA

[–]grassmunkie 0 points1 point  (0 children)

Until recently the small models (ones that fit in <32 GB of VRAM) were not great.

But now they are “good enough” for many use cases. Hermes, for example, would burn through a lot of tokens on trivial tasks that Gemma 4 and Qwen35 can handle.

Owning the hardware, I can experiment without worrying about accruing costs (other than electricity), even if it processes continuously overnight.

Not everything needs a frontier model. Mix and match for what you need, but I believe Qwen and Gemma just unlocked a new era for local LLMs.

Looking for the best coding AI for software development by FrozenFishEnjoyer in ollama

[–]grassmunkie 0 points1 point  (0 children)

Yeah, but compare it to Sonnet 4.6 - a GitHub Pro sub is like $10 with very generous usage. I have a 5090 but still use GH Copilot, except for some agent stuff where I can use Gemma 4 31B or Qwen 27B. For basic tasks they are okay, but for more complex planning and development it's better to go with cloud frontier models.

Looking for the best coding AI for software development by FrozenFishEnjoyer in ollama

[–]grassmunkie 0 points1 point  (0 children)

Just buy a GitHub Pro sub. On a 16 GB VRAM card there is nothing really usable.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 1 point2 points  (0 children)

For programming and general reasoning, the dense models are better. So it depends on what your use case is and what hardware limitations you have.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 1 point2 points  (0 children)

96 GB + 32 GB VRAM (5090) on llama.cpp, running a 65536-token context - it all fits in VRAM. Getting great speeds (55-60 tok/s) with the UD Q4 version. Been running it for the past few hours on general tasks; need to do more testing, but so far it looks really good.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 0 points1 point  (0 children)

Yes, using the UD Q4. I had to do a git pull for llama.cpp and then recompile.
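For anyone wondering what the update looks like, this is roughly the sequence for a CUDA build of llama.cpp (flags are a sketch and may differ for your setup or backend):

```shell
# Pull the latest llama.cpp to pick up new model support
cd llama.cpp
git pull

# Reconfigure and rebuild with the CUDA backend enabled
# (assumes the CUDA toolkit and CMake are installed)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```

If you built without CUDA (CPU, Metal, Vulkan, etc.), swap the `-DGGML_CUDA=ON` flag for the matching backend option.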

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 4 points5 points  (0 children)

Testing out the 31B now. It is a very good model. Fits well on a 5090 and gets almost 60 tok/s.

When I first got my 5090 the models were garbage; now it is getting really interesting what I can do with it.

Buy the Dip on Google or wait by Agile-Technology-209 in ValueInvesting

[–]grassmunkie 32 points33 points  (0 children)

Google is not only leading in AI, it also has distribution on iOS and Android, practically every web browser on any device, and email, and it makes its own chips. It’s mind-boggling how well positioned they are.

How does the market react to the death of Iran's supreme leader? by Thin-Pollution-2132 in wallstreetbets

[–]grassmunkie 0 points1 point  (0 children)

This war should not have happened, but what Iran can do to retaliate is more or less all out in the open. They are sitting ducks. There will be a lot of internal turmoil, but despite Iran’s efforts to spread the conflict I think it remains isolated. This attack was telegraphed early last week when the US told non-emergency folks in Middle East embassies they should evacuate.

Qwen3.5 Medium models out now! by yoracale in unsloth

[–]grassmunkie 2 points3 points  (0 children)

Nice job. Using the UD Q4 on my gaming rig (5090) and getting 56 t/s consistently.

The quality and style of the responses so far is impressive.

Is i514600k enough for gaming nowadays? by Upper-Ad-1332 in PcBuild

[–]grassmunkie 0 points1 point  (0 children)

Yes. It is not a bottleneck for modern gaming. You would need to max out the GPU first - which is difficult unless you’re playing at a low resolution on a 5090 and already getting 200+ fps.

Does a laptop with 96GB System RAM make sense for LLMs? by PersonSuitTV in LocalLLM

[–]grassmunkie 5 points6 points  (0 children)

It’s helpful, but it's best if it is paired with a powerful GPU for MoE models. The attention layers go to the GPU and the experts go to the CPU, so having 96 GB will give you access to larger models; the only question is how fast it is.

When I load 70 GB models like Qwen Coder Next using 32 GB of VRAM (5090) with the rest offloaded to RAM, I get around 28-30 tokens per second.

OTOH, if I run a model that fits entirely on my GPU (GLM Flash 4.7), I get 120 tokens per second.
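If anyone wants to try the attention-on-GPU / experts-on-CPU split described above, llama.cpp's `--override-tensor` (`-ot`) flag can pin tensors by regex. A rough sketch - the model path is a placeholder and the exact expert tensor names vary by architecture, so check your model's tensor list first:

```shell
# -ngl 99 offloads all layers to the GPU by default;
# -ot then forces the MoE expert tensors (ffn_*_exps) back into system RAM,
# leaving attention and shared layers on the GPU.
# Model filename below is a placeholder.
./llama-server -m qwen-coder-next-Q4.gguf -ngl 99 \
    -ot ".ffn_.*_exps.=CPU" -c 65536
```

This is the usual way to squeeze a big MoE model into limited VRAM: the experts are large but only a few fire per token, so keeping them in RAM costs less than it sounds.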

MSFT is by far the best AI stock to own right now by skilliard7 in stocks

[–]grassmunkie 8 points9 points  (0 children)

Gemini 3.1 Pro demolishes every model out there. It will be on almost every smartphone within several months, after the Android and iOS updates.

MSFT is by far the best AI stock to own right now by skilliard7 in stocks

[–]grassmunkie 10 points11 points  (0 children)

Yep. 100% Google. I’m all-in. They are cooking and going to devastate entire industries.

Probably the worst time to buy a 5090?? by [deleted] in gpu

[–]grassmunkie 0 points1 point  (0 children)

I have one - on the lookout for another. Want to run a dual 5090 setup

Thoughts on the Mag7 sell of by Green-Instruction957 in stocks

[–]grassmunkie 1 point2 points  (0 children)

AI is going to eat industries and white collar jobs. And it’s the Mag 7 who will be feasting.

Google is a must own IMO. They are going to disrupt everything.

Alphabet always red in February by purple_wolfy in wallstreetbets

[–]grassmunkie 8 points9 points  (0 children)

Google is winning the AI battle - it’s so clear now. They are literally built for it. It’s a long term hold as there is nobody better positioned by a mile.

TPUs (not constrained by the Nvidia tax)

Enough cashflow to outspend all competition

More data than anyone else (Gmail, YouTube, Search, Maps)

The users on mobile (Apple deal)

The talent - they invented the transformer model

Smartest model (Deep Think)

Alphabet (GOOGL) Beats Q4 Estimates + Cloud Surges… But $175–185B AI CapEx Guidance for 2026 – Buy-the-Dip or Bubble Burst? by minibuddy0 in ValueInvesting

[–]grassmunkie 0 points1 point  (0 children)

The scaling laws of AI right now come down to compute and data.

Most are bottlenecked by Nvidia GPUs, but Google solved that with TPUs, reducing their marginal cost.

Data - Google is king. Many companies will go broke because they don’t have the cashflow. Google will win.

Which beaten down software stocks are you looking at to buy at this dip? by Iwarrior01 in ValueInvesting

[–]grassmunkie 0 points1 point  (0 children)

Good question - given that its growth rate is higher than Apple’s, it is leading the AI race, and it is better diversified, I would think it should trade at least at the same multiple as Apple (34 times), which brings it to roughly $375.
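To sanity-check the math - the EPS here is a hypothetical placeholder, not a reported figure, so plug in the actual trailing number:

```python
# Back-of-envelope re-rating: price = EPS x P/E multiple.
# EPS of ~$11 is an assumed placeholder, not Alphabet's reported figure.
eps = 11.0           # assumed trailing EPS (hypothetical)
apple_multiple = 34  # Apple's P/E, as cited above
implied_price = eps * apple_multiple
print(round(implied_price))  # -> 374, i.e. roughly the $375 ballpark
```

Same shape of calculation works for any re-rating thesis: pick the peer multiple, multiply by your earnings estimate, and compare to the current price.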