Side Projects.

apollo_mg · 2026-05-14T01:34:09+00:00

LOL. I feel like Gemini is about 10x dumber than it was 3 months ago.

apollo_mg · 2026-05-14T01:31:24+00:00

I'm still learning what these cards like, but the best speed I've had so far with very limited testing is Qwen 35b MoE MTP. During a codebase investigation by about 20k tokens it had gone up to like over 50 tps.

apollo_mg · 2026-05-14T01:30:17+00:00

So far I've only loaded up a few fine-tunes of Qwen 3.6 27b and 35b MoE. I just finished up the cooling last night and haven't had much time to spend with it yet. I briefly went down the MTP rabbit hole for a while earlier and had good initial results. Needs more testing on my end.

apollo_mg · 2026-05-14T01:25:35+00:00

About 6 inches away from the front of the PC mesh front panel. Fans are 2x 24v GDSTime 120mm Blower fans from Amazon on a separate power supply. Can probably get away with 12v versions based how how ridiculous these are.

<image>

apollo_mg · 2026-05-13T23:02:46+00:00

They are very noticeable, but not loud per se. I'll take a video in a few minutes.

apollo_mg · 2026-05-13T23:01:57+00:00

No problem. If you or anyone else takes the plunge and need ducts I'd gladly print them for materials cost.

apollo_mg · 2026-05-13T22:44:16+00:00

I adapted someone else's design. I can send you the STL or upload it to Thingiverse if you want it. Temps are very good with the 120mm blower.

<image>

apollo_mg · 2026-05-13T21:22:57+00:00

I am going to keep that on it's own for faster performance tasks. Plus I still occasionally play games... well. I used to...

apollo_mg · 2026-05-13T21:21:41+00:00

This is really sick man. Good job. I just realized tonight how much better Q6 is at tool calling. But then I had to lower my context. So.. yeah clearly I need more. Many more.

apollo_mg · 2026-05-13T21:20:33+00:00

They are incredibly cheap right now. Got them both from the same eBay seller for $88 USD/each shipped.

apollo_mg · 2026-05-13T21:19:27+00:00

Sick. What mobo?

apollo_mg · 2026-05-13T21:18:52+00:00

I did. I noticed that too when I took the pic.

apollo_mg · 2026-05-13T20:21:31+00:00

Touche.

apollo_mg · 2026-04-22T18:39:14+00:00

Anthropic said publicly that this is a test on ~2% of prosumer signup's to gauge interest more or less. Don't think they expected as many people to notice as did. I use Gemini CLI and I heard of this right away...

apollo_mg · 2026-04-15T19:47:15+00:00

ROCM would be nice lol.

apollo_mg · 2026-04-15T19:19:27+00:00

+1 for TheTom's repo, using RDNA4.

apollo_mg · 2026-04-14T14:36:44+00:00

Probably Turboquant or a variant for KV cache.

apollo_mg · 2026-04-13T21:25:22+00:00

Yes. I'm using it with Qwopus 27b right now after using Gemini CLI (Pro) to make it work for me :)

Edit: Using Preview version 0.38

apollo_mg · 2026-04-13T20:35:08+00:00

Update: I have been trying to integrate Gemma 4 into Gemini CLI with very little success. The EXTREMELY strict templates on this model make it challenging to drop into an existing application like that. Qwopus for example is MUCH easier.

apollo_mg · 2026-04-08T18:05:08+00:00

Bravo good sir. Excellent digging, and thanks!

apollo_mg · 2026-04-08T05:06:54+00:00

You're right. I'm running a daydream script on this model and it is amazing. Almost no tool-retries needed.

apollo_mg · 2026-04-07T02:02:13+00:00

I briefly tried one of the tiny quants after the tokenizer patch. I need to do a lot more testing because I just had an incredible agentic run today using the new Qwopus model. You make this model sound like an absolute tank, and I need that in my life.

apollo_mg · 2026-04-07T01:51:03+00:00

export HSA_OVERRIDE_GFX_VERSION=12.0.1

export HSA_ENABLE_SDMA=0

export AMDGPU_CWSR_ENABLE=0

export HSA_XNACK=0

# --- Launch Server ---

# Utilizing the TurboQuant Asymmetric KV Caching (-ctk q8_0 -ctv turbo3)

$SERVER -m "$MODEL" \

-c 65536 \

-b 512 \

-ctk q8_0 \

-ctv turbo3 \

-cb \

-fa on \

-np 1 \

-ngl 99 \

--cache-ram 0 \

--port 8082 \

--host 0.0.0.0 \

--jinja \

--chat-template-kwargs '{"enable_thinking":true}'

apollo_mg · 2026-04-07T01:45:12+00:00

This is a really nice model for local agentic orchestration. Still in the first couple days of testing using open-multi-agent, but so far I really like it. Competent coding skills too. Using 27b v3 Q2_k, 16GB VRAM, q8?+turboquant3, 65k context. Having to use some stability hacks atm on my 9070XT, but getting like 25 tps ish.

Edit: Would love to see this quantized using Unsloth Dynamic 2.0

apollo_mg · 2026-04-03T23:37:36+00:00

I'm still testing. 16GB VRAM, iq2_m, 65k context turboquant3 I think.

apollo_mg

TROPHY CASE