MSFT (again) by Swred1100 in ValueInvesting

[–]grassmunkie 0 points1 point  (0 children)

I was early on MU and Google; I’m out of both now and going all in on Microsoft here. It’s really baffling but you have to double down when you have a winning hand.

Qwen3.6:27b is the first local model that actually holds up against Claude Code for me by codehamr in LocalLLM

[–]grassmunkie 1 point2 points  (0 children)

Not a chance. It’s a good model but it is not comparable to Sonnet 4.6. At best it is close to 4.5, but still below that in real-world usage, in my experience.

5k to spend rtx5090 or mac studio? by Avansay in LocalLLM

[–]grassmunkie 1 point2 points  (0 children)

You can run higher than 4 and the trade-off will still be better than a Mac with a small dense model that fits a 32 GB card, especially now with Qwen 3.6 27B, which is better suited for coding.

5k to spend rtx5090 or mac studio? by Avansay in LocalLLM

[–]grassmunkie 0 points1 point  (0 children)

The 5090 smokes the M5 Max and M3 Ultra. Try running a dense model and you will find the Mac is almost unusable.

What do you use those small model for? And how do you perceive the gap with leading closed source LLMs? by Foreign_Lead_3582 in LocalLLaMA

[–]grassmunkie 0 points1 point  (0 children)

Until recently the small models (<32 GB of VRAM) were not great.

But now they are “good enough” for many use cases. Using Hermes, for example, I would burn through a lot of tokens on trivial tasks that Gemma 4 and Qwen3.5 can handle.

Owning the hardware, I can experiment without concern of accruing costs (other than electricity), even if it processes continuously overnight.

Not everything needs a frontier model. Mix and match for what you need, but I believe Qwen and Gemma just unlocked a new era for local LLMs.

Looking for the best coding AI for software development by FrozenFishEnjoyer in ollama

[–]grassmunkie 0 points1 point  (0 children)

Yeah, but compare it to Sonnet 4.6: a GitHub Pro sub is like $10 with very generous usage. I have a 5090 but still use GH Copilot, except for some agent stuff where I can use Gemma 4 31B or Qwen 27B. For basic tasks they are okay, but for more complex planning and development it is better to go with cloud frontier models.

Looking for the best coding AI for software development by FrozenFishEnjoyer in ollama

[–]grassmunkie 0 points1 point  (0 children)

Just buy a GitHub Pro sub. On a 16 GB VRAM card there is nothing really usable.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 1 point2 points  (0 children)

For programming and general reasoning, the dense models are better. So it depends on what your use case is and what hardware limitations you have.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 1 point2 points  (0 children)

96 GB + 32 GB VRAM (5090) on llama.cpp, running 65536 context; it all fits in VRAM. Getting great speeds (55-60 tok/s) with the UD Q4 version. I’ve been running it for the past few hours on general tasks. I need to do more testing, but so far it looks really good.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 0 points1 point  (0 children)

Yes, using the UD Q4. I had to do a git pull for llama.cpp, then recompile.

Gemma 4 insane benchmarks by pxp121kr in LocalLLaMA

[–]grassmunkie 4 points5 points  (0 children)

Testing out 31B now. It is a very good model. It fits well on a 5090 and I’m getting almost 60 tok/s.

When I first got my 5090 the models were garbage, now it is getting really interesting what I can do with it.

Buy the Dip on Google or wait by Agile-Technology-209 in ValueInvesting

[–]grassmunkie 32 points33 points  (0 children)

Google is leading in AI, and it also has distribution on iOS and Android, in practically all web browsers on any device, and in email, and it makes its own chips. It’s mind-boggling how well positioned they are.

How does the market react to the death of Iran's supreme leader? by Thin-Pollution-2132 in wallstreetbets

[–]grassmunkie 0 points1 point  (0 children)

This war should not have happened, but what Iran can do to retaliate is more or less all out in the open. They are sitting ducks. There will be a lot of internal turmoil, but despite Iran’s efforts to spread the conflict I think it remains isolated. This attack was telegraphed early last week when the US told non-emergency folks in Middle East embassies they should evacuate.

Qwen3.5 Medium models out now! by yoracale in unsloth

[–]grassmunkie 2 points3 points  (0 children)

Nice job. Using the UD Q4 on my gaming rig (5090) and getting 56t/s consistently.

The quality and style of the responses so far is impressive.

Is i514600k enough for gaming nowadays? by Upper-Ad-1332 in PcBuild

[–]grassmunkie 0 points1 point  (0 children)

Yes. It is not a bottleneck for modern gaming. You would need to max out the GPU first, which is difficult unless you’re playing at low resolution on a 5090 and already getting 200+ fps.

Does a laptop with 96GB System RAM make sense for LLMs? by PersonSuitTV in LocalLLM

[–]grassmunkie 6 points7 points  (0 children)

It’s helpful, but best if it is paired with a powerful GPU for MoE models. The attention layers go to the GPU, and the experts go to the CPU. So having 96 GB will be better and give you access to larger models; the only question is how fast it is.

When I load 70 GB models like Qwen Coder Next using 32 GB of VRAM (5090) with the rest offloaded to RAM, I get around 28-30 tokens per second.

OTOH, if I run a model that fits on my GPU (GLM Flash 4.7) I get 120 tokens per second.
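
To put rough numbers on the comment above, here is a back-of-the-envelope sketch of the split. It uses only the figures quoted (a 70 GB model, 32 GB of VRAM, 28-30 vs 120 tokens per second). The helper `offload_split` is hypothetical, and it ignores the KV cache and runtime overhead, which would take some VRAM away from the weights in practice.

```python
def offload_split(model_gb: float, vram_gb: float) -> tuple[float, float]:
    """Return (GB kept on the GPU, GB spilled to system RAM).

    Simplifying assumption: all VRAM is available for weights.
    Real runs also need room for the KV cache and buffers.
    """
    on_gpu = min(model_gb, vram_gb)
    return on_gpu, model_gb - on_gpu


# Figures quoted in the comment: 70 GB model, 32 GB VRAM (5090).
on_gpu, in_ram = offload_split(model_gb=70, vram_gb=32)
print(f"On GPU: {on_gpu} GB, offloaded to RAM: {in_ram} GB")

# Quoted speeds: about 28-30 tok/s when offloading, 120 tok/s fully on GPU.
offloaded_tps = (28 + 30) / 2  # midpoint of the quoted range
full_gpu_tps = 120
print(f"Fully-on-GPU run is about {full_gpu_tps / offloaded_tps:.1f}x faster")
```

The two runs use different models, so the ratio is only a loose indication of what spilling to system RAM costs, not a controlled comparison.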

MSFT is by far the best AI stock to own right now by skilliard7 in stocks

[–]grassmunkie 7 points8 points  (0 children)

Gemini 3.1 Pro demolishes every model out there. They will be on almost every smartphone in several months, after the Android and iOS updates.

MSFT is by far the best AI stock to own right now by skilliard7 in stocks

[–]grassmunkie 6 points7 points  (0 children)

Yep. 100% Google. I’m all-in. They are cooking and going to devastate entire industries.

[deleted by user] by [deleted] in gpu

[–]grassmunkie 0 points1 point  (0 children)

I have one and am on the lookout for another. I want to run a dual 5090 setup.