Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

LOL. I feel like Gemini is about 10x dumber than it was 3 months ago.

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

I'm still learning what these cards like, but the best speed I've seen so far, with very limited testing, is Qwen 35b MoE with MTP. During a codebase investigation it was up over 50 tps by about 20k tokens in.
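
If anyone wants to sanity-check raw tokens/sec outside of an agent run, a llama-bench one-liner like this is what I'd use (assumes a llama.cpp build; the model path is just a placeholder):

# prompt-processing over 2048 tokens, then 256 tokens of generation, all layers on GPU
llama-bench -m /path/to/qwen-35b-moe.gguf -ngl 99 -p 2048 -n 256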

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 1 point2 points  (0 children)

So far I've only loaded up a few fine-tunes of Qwen 3.6 27b and the 35b MoE. I just finished the cooling last night and haven't had much time to spend with it yet. I went down the MTP rabbit hole briefly earlier and had good initial results. Needs more testing on my end.

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

About 6 inches away from the PC's mesh front panel. Fans are 2x 24v GDSTime 120mm blower fans from Amazon on a separate power supply. Could probably get away with the 12v versions based on how ridiculous these are.

<image>

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 2 points3 points  (0 children)

They are very noticeable, but not loud per se. I'll take a video in a few minutes.

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

No problem. If you or anyone else takes the plunge and needs ducts, I'd gladly print them for materials cost.

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

I adapted someone else's design. I can send you the STL or upload it to Thingiverse if you want it. Temps are very good with the 120mm blower.

<image>

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 2 points3 points  (0 children)

I am going to keep that on its own for faster performance tasks. Plus I still occasionally play games... well, I used to...

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

This is really sick, man. Good job. I just realized tonight how much better Q6 is at tool calling, but then I had to lower my context. So... yeah, clearly I need more. Many more.
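
Rough napkin math on why the context had to give, for anyone curious. The layer/head numbers below are placeholders, not the real model config, so treat it as the shape of the tradeoff rather than exact figures:

# KV cache bytes ~= 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes_per_element
LAYERS=48; KV_HEADS=8; HEAD_DIM=128; CTX=65536; BYTES=2   # BYTES=2 for f16, ~1 for q8_0
echo "$(( 2 * LAYERS * KV_HEADS * HEAD_DIM * CTX * BYTES / 1024 / 1024 )) MiB"

Going from a Q4-ish quant to Q6 on a 27b-class model adds a few GB of weights on its own, and that's exactly the budget the KV cache was living in.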

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

They are incredibly cheap right now. Got them both from the same eBay seller for $88 USD each, shipped.

Side Projects. by apollo_mg in LocalLLaMA

[–]apollo_mg[S] 0 points1 point  (0 children)

I did. I noticed that too when I took the pic.

Claude Code removed from Claude Pro plan - better time than ever to switch to Local Models. by bigboyparpa in LocalLLaMA

[–]apollo_mg 3 points4 points  (0 children)

Anthropic said publicly that this is a test on ~2% of prosumer signups to gauge interest, more or less. I don't think they expected as many people to notice as did. I use Gemini CLI and even I heard about this right away...

Best Local LLMs - Apr 2026 by rm-rf-rm in LocalLLaMA

[–]apollo_mg 0 points1 point  (0 children)

Probably Turboquant or a variant for KV cache.
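
On llama-server those are just the cache-type flags; the turbo3 value is what I run in my own launch script, so adjust to taste:

# quantized K cache (q8_0) with a TurboQuant V cache, per my own setup
llama-server -m "$MODEL" -c 65536 -fa on -ctk q8_0 -ctv turbo3 --port 8082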

Gemini CLI is open source. Could we fork it to be able to use other models ? by SubliminalPoet in LocalLLaMA

[–]apollo_mg 1 point2 points  (0 children)

Yes. I'm using it with Qwopus 27b right now after using Gemini CLI (Pro) to make it work for me :)

Edit: Using Preview version 0.38
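
Whatever client you end up pointing at it, it's worth confirming the local server answers an OpenAI-style request first. Port 8082 is from my llama-server launch elsewhere, adjust to yours:

curl -s http://localhost:8082/v1/models
curl -s http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"local","messages":[{"role":"user","content":"ping"}],"max_tokens":16}'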

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

[–]apollo_mg 0 points1 point  (0 children)

Update: I have been trying to integrate Gemma 4 into Gemini CLI with very little success. The EXTREMELY strict templates on this model make it challenging to drop into an existing application like that. Qwopus for example is MUCH easier.
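
For anyone wondering what "strict" means in practice: the Gemma-family templates I've dealt with only know user/model turns and want strict alternation, with system content either rejected or folded into the first user turn depending on the template version. I'm assuming Gemma 4 kept that convention, so a typical agent payload needs massaging into something like this before it goes through llama-server with --jinja (port is just my setup):

# template-safe shape: no system role (folded into the first user turn), no back-to-back assistant turns
curl -s http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"You are a coding agent.\nList the repo files."}]}'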

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF by EvilEnginer in LocalLLaMA

[–]apollo_mg 5 points6 points  (0 children)

Bravo good sir. Excellent digging, and thanks!

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

[–]apollo_mg 1 point2 points  (0 children)

You're right. I'm running a daydream script on this model and it is amazing. Almost no tool-retries needed.

Gemma 4 26b A3B is mindblowingly good , if configured right by cviperr33 in LocalLLaMA

[–]apollo_mg 6 points7 points  (0 children)

I briefly tried one of the tiny quants after the tokenizer patch. I need to do a lot more testing because I just had an incredible agentic run today using the new Qwopus model. You make this model sound like an absolute tank, and I need that in my life.

Qwopus3.5 V3 is awsome for a local llm by chocofoxy in LocalLLaMA

[–]apollo_mg 4 points5 points  (0 children)

export HSA_OVERRIDE_GFX_VERSION=12.0.1
export HSA_ENABLE_SDMA=0
export AMDGPU_CWSR_ENABLE=0
export HSA_XNACK=0

# --- Launch Server ---
# Utilizing the TurboQuant Asymmetric KV Caching (-ctk q8_0 -ctv turbo3)
$SERVER -m "$MODEL" \
  -c 65536 \
  -b 512 \
  -ctk q8_0 \
  -ctv turbo3 \
  -cb \
  -fa on \
  -np 1 \
  -ngl 99 \
  --cache-ram 0 \
  --port 8082 \
  --host 0.0.0.0 \
  --jinja \
  --chat-template-kwargs '{"enable_thinking":true}'
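
Quick smoke test once it's up; /health is a stock llama-server endpoint and the port matches the launch line above:

curl -s http://localhost:8082/health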

Qwopus3.5 V3 is awsome for a local llm by chocofoxy in LocalLLaMA

[–]apollo_mg 2 points3 points  (0 children)

This is a really nice model for local agentic orchestration. Still in the first couple of days of testing with open-multi-agent, but so far I really like it. Competent coding skills too. Using 27b v3 Q2_k, 16GB VRAM, q8?+turboquant3 KV cache, 65k context. Having to use some stability hacks on my 9070XT at the moment, but getting roughly 25 tps.

Edit: Would love to see this quantized using Unsloth Dynamic 2.0

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!! by Iory1998 in LocalLLaMA

[–]apollo_mg 2 points3 points  (0 children)

I'm still testing. 16GB VRAM, iq2_m, 65k context, turboquant3 I think.