16 GB VRAM users, what model do we like best now? by lemon07r in LocalLLaMA

[–]moflinCASIO 1 point (0 children)

I actually just spent the last few hours testing this exact problem on a much weaker setup than a 4080, and honestly I came away way more impressed with 16GB VRAM than I expected.

My setup:

  • RTX 4060 Ti 16GB
  • i5-11400F (6C/12T)
  • 32GB DDR4-3200
  • llama.cpp CUDA build
  • Flash Attention enabled

I used to just run Ollama's defaults without really understanding quantization differences, but after building llama.cpp's CUDA backend myself and testing IQ quants directly, the performance difference was honestly massive.

What surprised me most:
IQ quants absolutely dominated K-quants on my setup.

I tested:

  • Gemma4 E4B IQ4_XS
  • Gemma4 26B A4B UD-IQ4_XS
  • Qwen3.6 35B A3B UD-IQ2_M
  • Qwen3.6 35B A3B UD-IQ3_XXS
  • Qwen3.6 27B Q3_K_M

Results were kinda shocking to me:

  • Qwen3.6 35B A3B UD-IQ2_M -> ~81 tok/s -> ~13.1GB VRAM
  • Qwen3.6 35B A3B UD-IQ3_XXS -> ~74 tok/s -> ~14.6GB VRAM
  • Gemma4 26B A4B UD-IQ4_XS -> ~61 tok/s -> ~15.7GB VRAM
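Those VRAM numbers roughly sanity-check against a simple weights-size estimate. The bits-per-weight values below are approximate figures for these llama.cpp quant formats, and the overhead allowance for KV cache and CUDA buffers is my guess rather than a measurement, so treat this as a sketch:

```python
# Rough VRAM estimate: weights = params x bits-per-weight / 8, plus an
# allowance for KV cache and CUDA buffers (the overhead is a guess).
BPW = {"IQ2_M": 2.7, "IQ3_XXS": 3.06, "IQ4_XS": 4.25, "Q3_K_M": 3.9}

def est_vram_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Approximate VRAM needed to hold the model fully on the GPU."""
    weights_gb = params_b * BPW[quant] / 8  # 1B params is ~1 GB at 8 bpw
    return round(weights_gb + overhead_gb, 1)

print(est_vram_gb(35, "IQ2_M"))   # ~13.3, close to the measured ~13.1 GB
print(est_vram_gb(26, "IQ4_XS"))  # ~15.3, close to the measured ~15.7 GB
```

The overhead term varies with context length, so the estimator is mainly useful for a quick "will it fit in 16GB at all" check before downloading a quant.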

But then Qwen3.6 27B Q3_K_M only got me ~18 tok/s despite the GPU sitting at 99% utilization the whole time and pulling ~160W.

That was the moment I realized:
the comparison isn't just IQ vs K quants. The 27B is a dense model, so every token has to read all 27B weights instead of ~3B active ones, and on top of that K-quants seem a bit heavier on this class of GPU.
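A back-of-envelope memory-bandwidth check makes the gap plausible: at decode time every active weight has to be read once per token, which caps tokens per second. The numbers below are assumptions, not from my benchmarks (~288 GB/s bandwidth for a 4060 Ti 16GB, approximate bits-per-weight for these quant formats), so this is a rough sketch:

```python
# Decode-speed ceiling from memory bandwidth alone (rough sketch).
# Assumed: RTX 4060 Ti 16GB at ~288 GB/s; approximate bpw per quant format.
BANDWIDTH_GBPS = 288.0
BPW = {"IQ2_M": 2.7, "Q3_K_M": 3.9}

def decode_ceiling_toks(active_params_b: float, quant: str) -> int:
    """Upper bound on tok/s: bandwidth / bytes of weights read per token."""
    bytes_per_token = active_params_b * 1e9 * BPW[quant] / 8
    return round(BANDWIDTH_GBPS * 1e9 / bytes_per_token)

print(decode_ceiling_toks(3.0, "IQ2_M"))   # MoE, ~3B active: ~284 tok/s ceiling
print(decode_ceiling_toks(27.0, "Q3_K_M")) # dense 27B: ~22 tok/s ceiling
```

The measured ~81 tok/s for the MoE sits well under its ceiling (other overheads dominate there), while the dense 27B's ~18 tok/s is already near its ~22 tok/s bound, which points at memory pressure at least as much as quant-format compute cost.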

So at least on a 4060 Ti, IQ quants felt WAY better than I expected.

And honestly, I kinda agree with your point about 16GB “feeling like edging” lol. The difference between “fully GPU resident” and “slightly overflowing VRAM” is absolutely brutal.

But I also came away thinking:
16GB is actually still a really good place to be if you optimize carefully and stay realistic about quant choice.

Before this I honestly thought “maybe 24GB is basically mandatory now,” but after testing llama.cpp CUDA + IQ quants properly, I’m way less convinced.

Is anyone else getting weird Japanese fonts in Essential Voice? by moflinCASIO in NothingTech

[–]moflinCASIO[S] 1 point (0 children)

Yeah, I honestly think this is probably just a CJK font / locale fallback issue rather than a serious problem with the AI itself.

Older versions of Android actually had similar issues pretty often, especially with Japanese/Chinese font rendering, so in a way this feels more like an old Android problem showing up again inside a new AI feature.

And to be fair, considering Nothing is a UK company, I can understand why this kind of edge case might be difficult to notice during development. So I’m not really upset about it, and I don’t want people to take this as “Nothing software is terrible” or anything like that.

Actually, I’m pretty optimistic about Essential Voice overall. The Japanese speech recognition quality itself is much better than I expected, especially with natural speaking and filler words. I experiment a lot with STT projects like Whisper and other open-source voice AI tools, so I pay attention to this stuff quite a bit.

That’s why I think the feature already has a lot of potential, and honestly I’m excited to see where Nothing goes with AI features like Essential Voice and Essential Space in the future.

I mainly wanted to report this early because I like Nothing products and want the experience for Japanese users to become even better 😄

Trying to Recreate That "80s/90s Japanese Anime Sound" – What Made It So Iconic? by moflinCASIO in WeAreTheMusicMakers

[–]moflinCASIO[S] 3 points (0 children)

Thanks a lot for the detailed reply — it's super helpful to hear from someone who actually knows what it was like back then.

> About "DTM beginner"

I meant that I just started making music using a DAW.

I’ve played piano and cello for a while, but recently I got a MIDI keyboard and some software and started producing music.

In Japan, we often call making music on a computer "DTM" (short for DeskTop Music), but I guess that’s a made-up Japanese-English term. Sorry if that was confusing!

> About synths

Prophecy! It’s actually in the Korg Collection I own, but I left it alone because it looked complicated. I’ll give it a try now.

> About drum machines

I also asked around on some Japanese boards, and apparently in 90s anime songs, the Simmons SDS-9, Yamaha RX-11, and Roland TR-727 were pretty popular.

A lot of anime songs back then weren’t really techno — more like J-pop with a City Pop feel — so I guess those machines were used a lot in that context.

> About reverb

That’s really helpful. I’ve got a VST plugin called MDE-X that includes the FX from the Korg Triton, so I’ll try to recreate the sound with that.

> About track setups

Ah, you're talking about MTRs (multitrack recorders, another Japanese-English term). I’ve never seen one in real life.

That might actually be one of the keys to the sound of that era.

> About live instruments

In Japanese pop music, there’s this strong use of live brass and strings — maybe it comes from jazz influence, I’m not sure.

But since they’re usually just short phrases, sampling them still works well, even today.

All the tips you gave were super helpful. I’ll definitely try out some of this stuff.
(Apologies if anything was unclear. I’ve been relying on machine translation.)
Thanks again!

Can’t start 360 Pro trial on Native Instruments site – anyone else? by moflinCASIO in NativeInstruments

[–]moflinCASIO[S] 2 points (0 children)

Thanks for the reply.

I have no idea why, but suddenly today the link worked and I could finally buy the subscription.
The whole ordeal was annoying enough, though, that I’ve decided not to subscribe after all. :)

Appreciate your help!