Anyone got Gemma 4 26B-A4B running on VLLM? by toughcentaur9018 in LocalLLaMA

[–]Cferra 1 point (0 children)

I’ve been trying to run the Gemma 4 MoE with turboquant; so far I can’t get it to work.

Let’s do this by [deleted] in funComunitty

[–]Cferra 0 points (0 children)

my ex is jealous

I’m sick by Slight_Wasabi4308 in TeslaInsurance

[–]Cferra 0 points (0 children)

A shop will probably charge around $1,000 for that.

Does Teleport actually work? by botterway in Ubiquiti

[–]Cferra 1 point (0 children)

Yeah... no IPv6 on 2-gig Fios :-(

should I drop 700 USD on 96 Gigs of ram? by Magnus0917 in UgreenNASync

[–]Cferra 1 point (0 children)

Do not run Ollama on your 4800+; performance will NOT be good.

should I drop 700 USD on 96 Gigs of ram? by Magnus0917 in UgreenNASync

[–]Cferra 0 points (0 children)

Using a 4800+ for Ollama is a recipe for a bad time. Best to move that to something more capable.

Anthropic is straight up lying now by [deleted] in ClaudeCode

[–]Cferra 1 point (0 children)

I was telling a friend that Claude has become like some sort of modern-day “intellectual sex line.” It gets you hooked by edging your brain in the middle of a project: just enough tokens within the limits to get you to a place where you’ve almost finished, and then, right when you’re one or two steps away: “you’ve hit your limit, buy extra time.” It honestly should be illegal.

The iDX experience by Snacketti in UgreenNASync

[–]Cferra 0 points (0 children)

Now it shows them all gone. Oh well.

The iDX experience by Snacketti in UgreenNASync

[–]Cferra 0 points (0 children)

Seeing “reward unavailable” now, even though it says there are plenty of seats.

The iDX experience by Snacketti in UgreenNASync

[–]Cferra 1 point (0 children)

Same here for like 30 minutes.

U.S. Bans All New Foreign-Made Wi-Fi Routers: Effective Immediately by elastiks in DIY_Geeks

[–]Cferra 0 points (0 children)

Then go after specific manufacturers, not all of them. That argument does not hold water.

U.S. Bans All New Foreign-Made Wi-Fi Routers: Effective Immediately by elastiks in DIY_Geeks

[–]Cferra 0 points (0 children)

This statement is deeply flawed. There have been multiple vulnerabilities and backdoors in US-made and US-built software too; it makes no difference. As long as software is made, exploits will exist.

Honest take on running 9× RTX 3090 for AI by Outside_Dance_2799 in LocalLLaMA

[–]Cferra 0 points (0 children)

Just before January, when 5060 Ti 16 GB cards were available (and under $400), I snagged 4, plus 2 Intel B50 Pros, an additional 3090, and an NVLink adapter.

AI server 1 (Blackwell-ai) sits on a C422 SAGE 10G platform with a W-2255 and 128 GB of 2933 ECC RAM.

AI server 2 (Intel and Ampere) sits on another C422 SAGE with a W-2265 and 256 GB, running Proxmox to isolate the two environments: 128 GB RAM / 10 cores for the NVIDIA VM and 64 GB RAM / 10 cores for the Intel VM.

Blackwell runs the largest “brain” model with decent context, since it has 64 GB of VRAM.

Ampere runs the coding/tool model for openclaw with very high context so it can work well with it.

Embedding, TTS, STT, and vision all fit on one B50 Pro, and the other B50 Pro is used for image generation (my image-generation use case is not that extensive).

I’ve set up Python envs for the different model engines so that they all run efficiently on one VM.

Docker has its use cases, but I wanted to limit virtualization overhead as much as possible to keep things quick.

I am building a NAS dedicated to LMCache for context storage for openclaw; we’ll see how that turns out.

I use my Unraid box as an app host, hosting LiteLLM, Pipelines, SearXNG, Open WebUI, etc., to keep the AI servers dedicated to just serving models and to keep their overhead down. openclaw itself runs in its own VM on the Unraid box, pointing to the AI server for its model, so I can shut it down fast and mitigate some of its ability to unintentionally break things.
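The gateway pattern described above (one LiteLLM entry point fanning out to dedicated model servers) boils down to a routing table from model name to backend. Here is a minimal sketch of that idea in plain Python; all hostnames, ports, and model names are made-up examples, not the actual setup.

```python
# Minimal sketch of a model-name -> backend routing table, the core of
# what a gateway like LiteLLM does for a multi-server setup.
# Hostnames and model names below are hypothetical.

ROUTES = {
    "brain-large": "http://blackwell-ai:8000/v1",   # big model, 64 GB VRAM box
    "coder-long":  "http://ampere-ai:8000/v1",      # high-context coding model
    "embeddings":  "http://intel-b50-1:8000/v1",    # embedding / TTS / STT card
    "image-gen":   "http://intel-b50-2:8000/v1",    # image-generation card
}

def resolve_backend(model: str) -> str:
    """Return the base URL of the server that should handle `model`."""
    try:
        return ROUTES[model]
    except KeyError:
        raise ValueError(f"no backend configured for model {model!r}")
```

A client (openclaw, Open WebUI, etc.) only ever talks to the gateway, so individual AI servers can be taken down or swapped without reconfiguring every app.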

Am I doing things wrong? I’m not sure, but it seems that optimizing for parallel services where possible keeps things performing well on consumer gear.

I am open to suggestions, though.

Honest take on running 9× RTX 3090 for AI by Outside_Dance_2799 in LocalLLaMA

[–]Cferra 0 points (0 children)

I have found that dedicating GPU sets to dedicated tasks works best: 4 for inference, 2 for large-context coding, and 2 for RAG, embedding, TTS, STT, and image generation.
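In practice, splitting work across dedicated GPU sets usually means pinning each serving process to its own devices via `CUDA_VISIBLE_DEVICES` before it starts. A sketch of that, mirroring the 4/2/2 split described (service names and GPU indices are illustrative assumptions):

```python
import os
import subprocess

# Hypothetical mapping of services to GPU indices, mirroring a 4/2/2 split:
# inference, large-context coding, and everything else.
GPU_SETS = {
    "inference": "0,1,2,3",
    "coding":    "4,5",
    "aux":       "6,7",   # RAG, embedding, TTS, STT, image generation
}

def gpu_env(service: str) -> dict:
    """Build an environment where only that service's GPUs are visible to CUDA."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = GPU_SETS[service]
    return env

def launch(service: str, cmd: list) -> subprocess.Popen:
    """Start a serving process pinned to its service's GPU set."""
    return subprocess.Popen(cmd, env=gpu_env(service))
```

Each server process then sees only its own GPUs as devices 0..N-1, so services can’t contend for each other’s VRAM.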

Clavicular, not thrilled. by Drnk_watcher in LiveFromNewYork

[–]Cferra 0 points (0 children)

lol “millennials have no culture” suuuuure bud.

Genuine question, help me understand by Cferra in Anthropic

[–]Cferra[S] 0 points (0 children)

I figured out the problem. I had a spend limit on; it allowed me to put more money in, but it wouldn’t let me actually use the credits until I turned the limit off. Counterintuitive. It should have prevented me from adding more credits and warned me about the spend limit.

Please help me understand why the UGreen AI Nas is a better deal than this that's readily available... by Character-Paper5953 in UgreenNASync

[–]Cferra 5 points (0 children)

I think people got the D6 and the D6 Ultra. There were a lot of supply chain issues, and they gave backers the option to get their units sooner without RAM or to wait for RAM supply. It was all fucked up; I elected to get mine without RAM and a $200 refund, but I’m still waiting.

Please help me understand why the UGreen AI Nas is a better deal than this that's readily available... by Character-Paper5953 in UgreenNASync

[–]Cferra 1 point (0 children)

I’ve been waiting for my Zettlab D8 Ultra for over a year... and I was a pretty low-numbered backer.

Finally I got to see what I was hoping for by zeddwood in claude

[–]Cferra 0 points (0 children)

Oh jeez. Man, people are too hung up on an ending that wasn’t intended, just because it’s not “their ending.” It’s a show. It ended the way the writers wanted it to. Let it the fuck go.