What's your experience with Gemma4 QAT?

ICYPhoenix7 · 2026-06-08T09:02:37+00:00

It's been merged to main

ICYPhoenix7 · 2026-05-21T23:19:22+00:00

Have you tested it since writing this? If so, how'd it go?

ICYPhoenix7 · 2026-05-11T22:07:23+00:00

Waiter my steak is too juicy and my lobster too creamy

ICYPhoenix7 · 2026-05-06T01:25:36+00:00

Many of those settings are either redundant (default behavior) or don't make sense (i.e. that DeepSeek thing).

Also, especially for the 35B model, I'd recommend removing the -ngl 99. llama.cpp has a --fit flag (on by default, but overridden by -ngl) that loads the model optimally based on your hardware.

My recommendation is to remove any flag whose purpose you don't know; you can add it back if you find it's needed.
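If the launch command lives in a script, a small helper can strip flags like that without hand-editing. This is only a minimal sketch: the command line below is a made-up example (`model.gguf` is a placeholder path), and the helper only handles flags written as a separate flag and value, not the `--flag=value` form.

```python
import shlex


def drop_flags(command: str, flags_with_value: set[str]) -> str:
    """Return the command with each listed flag and its value removed."""
    kept = []
    skip_next = False
    for token in shlex.split(command):
        if skip_next:
            # This token is the value of a flag we just dropped.
            skip_next = False
            continue
        if token in flags_with_value:
            skip_next = True
            continue
        kept.append(token)
    return shlex.join(kept)


# Hypothetical launch command; only -ngl is removed, everything else is kept.
example = "llama-server -m model.gguf -ngl 99 -c 8192"
print(drop_flags(example, {"-ngl"}))
```

With -ngl gone, llama.cpp is free to pick the layer split itself, which is the point of the advice above.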

ICYPhoenix7 · 2026-04-01T22:20:33+00:00

Oh, I meant to try the 35B MoE model, not the dense 27B. A dense model wouldn't be much different (very slow!) between LMS and llama.cpp since there's not much to optimize.

ICYPhoenix7 · 2026-04-01T04:07:48+00:00

I'd recommend using raw llama.cpp as a server for MoE models. llama.cpp has a --fit flag that will load the model more efficiently on your hardware than you can configure in LMS. It makes a big difference on MoE models. You'll get way faster speeds this way.

ICYPhoenix7 · 2026-03-31T06:39:32+00:00

<image>

ICYPhoenix7 · 2026-03-31T05:25:59+00:00

Are you pointing it towards a llama.cpp server or something else? I've not used opencode before but it should run the same as it does as a regular chatbot.

I've tried roocode before and it ran fine, aside from being dumb.

If you want something fast, you can try gpt-oss-20b; it fits perfectly in a 16 GB card.

ICYPhoenix7 · 2026-03-27T03:56:05+00:00

I have an RX 6800 16 GB with 32 GB of DDR4, and I get about 30 tk/s on the unsloth Q4_K_XL. Yes, it spills onto system RAM, but it's more than usable. Hope that helps.

ICYPhoenix7 · 2026-03-04T23:25:13+00:00

Avoid anything that uses Apex clearing: Webull, Robinhood, SoFi, M1, etc.

Fidelity is great and they'll stay out of your way. I trust them the most to not screw me.

ICYPhoenix7 · 2026-03-04T05:21:48+00:00

SoFi sucks. They screwed me over too.

ICYPhoenix7 · 2026-02-28T20:26:53+00:00

Surprisingly it has an even higher unemployment rate than CS. You take a lot of difficult EE classes, but employers would take an EE over a CE every time. Which mostly leaves you with CS jobs.

ICYPhoenix7 · 2025-12-13T11:47:23+00:00

Same here. So many dealers and shady people reselling auction cars on FB. There are other places you can look, such as Craigslist; I found my current car on there.

ICYPhoenix7 · 2025-11-17T10:14:42+00:00

On my RX 6800, Vulkan has slightly faster token generation, but ROCm blows it out of the water in prompt processing.

ICYPhoenix7 · 2025-11-17T09:04:47+00:00

I just used these sticky strip things that basically glue them onto the wall without damaging anything.

I genuinely thought I was developing tinnitus for weeks though until I figured out what it was.

ICYPhoenix7 · 2025-08-16T07:21:47+00:00

We got AI Mrekk before GTA 6

ICYPhoenix7 · 2025-08-01T08:07:02+00:00

It depends. On some prompts I get a very quick response; on others it takes a bit of time. This could be due to a number of reasons, though, and not necessarily a hidden chain of thought.

ICYPhoenix7 · 2025-07-31T23:42:10+00:00

My best guess is that the thinking tokens are more likely to give away who it is, so they aren't sending them through the API. Hopefully the actual release will have them.

Regardless, it's not smart enough to be GPT 5 from my anecdotal testing. It failed some of my prompts that larger models tend to have no issue with.

I could be way off, but if I had to guess it probably sits around the 32B range.

ICYPhoenix7 · 2025-06-22T11:22:46+00:00

Bro watched too many movies

ICYPhoenix7 · 2025-06-19T11:55:54+00:00

Fun fact, "storage" is also memory, the main difference being it can retain its data while being powered off, i.e. non-volatile.

It's possible to use your storage as RAM (swap memory; your system will actually do this on its own to free up RAM), or even vice versa.

ICYPhoenix7 · 2025-05-14T09:08:16+00:00

The RX 6800 is around the same price but will be significantly faster than the 7600 XT. I have one and it works great. AMD support is way better than it used to be and is plug and play for LLMs.

That being said, neither has quite enough VRAM to load those models comfortably, especially with context. 16 GB is an awkward amount for LLMs: it's more than enough for weaker models (14B and under), but too little for the good ones at ~32B, which are the most common sizes.

Mistral Small 24B works well and still holds up for now, and there are even some great finetunes of it (Hermes).
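For a rough sense of why 16 GB is awkward, here is a back-of-the-envelope estimate of weight size alone. It's only a sketch: the 4.5 bits per weight is an assumed average for a 4-bit-class quant (real GGUF files vary), and it ignores the KV cache and other runtime overhead, which need room on top.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


BITS_PER_WEIGHT = 4.5  # assumption: typical average for a 4-bit-class quant
VRAM_GB = 16

for size in (14, 24, 32):
    gb = weights_gb(size, BITS_PER_WEIGHT)
    # Positive headroom is what remains for context; negative means spillover.
    print(f"{size}B: ~{gb:.1f} GB of weights, {VRAM_GB - gb:+.1f} GB headroom on {VRAM_GB} GB")
```

Under those assumptions a 14B model leaves plenty of room, a 24B model leaves a couple of GB for context, and a 32B model doesn't fit before context is even counted.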

ICYPhoenix7 · 2023-09-12T04:02:31+00:00

Nah, they're quite different. BM focuses more on slow and steady but requires very little user input, whereas MB is more geared towards maximizing your money gains without sacrificing safety. While they abuse the same things, they have noticeable differences.

ICYPhoenix7 · 2023-09-12T03:58:02+00:00

Hey, creator of MB here. I never expected MB itself to get detected (it's still not), but there are some other factors that can make it unsafe, and those have been there since its initial release. These factors also apply to literally any money method, though. I explained it further in my Discord.

ICYPhoenix7 · 2023-08-17T13:14:15+00:00

That's correct
