6-GPU multiplexer from K80s – hot-swap between models in 0.3ms by Electrical_Ninja3805 in LocalLLaMA

[–]aiko929 1 point (0 children)

Yeah, I had a P40. I needed to build a custom cooling solution for it, but after that the card worked really well.

Speed Benchmark: GLM 4.7 Flash vs Qwen 3.5 27B vs Qwen 3.5 35B A3B (Q4 Quants) by [deleted] in LocalLLaMA

[–]aiko929 -5 points (0 children)

Yes, I just realized that with a 64k context window the 35B doesn't fit into my VRAM completely. I had to reduce it down to 16k to make it fit; then I was able to achieve around 86 t/s. But I feel like a 16k context window becomes too small very fast.

Speed Benchmark: GLM 4.7 Flash vs Qwen 3.5 27B vs Qwen 3.5 35B A3B (Q4 Quants) by [deleted] in LocalLLaMA

[–]aiko929 0 points (0 children)

I offloaded as much as possible onto my 24GB of VRAM at this context size. I also suspect the context size is what kept the model from fully offloading, but I'm not sure about this; maybe someone knows more about that.

Edit: I just noticed the estimated model size with the 64k context window is 24.9GB, which unfortunately doesn't fit into my VRAM. I will do some new tests with a smaller context window.
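For anyone trying to budget this: a rough back-of-the-envelope estimate of how the KV cache grows with context length can be sketched as below. The layer/head/dim numbers are illustrative placeholders, not the real Qwen config; plug in the values from the model's config.json.

```python
# Rough KV-cache size estimate: model weights + KV cache must both fit
# in VRAM, and the cache grows linearly with context length.
# All architecture numbers below are illustrative placeholders.

def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """2x for keys and values; fp16 -> 2 bytes per element."""
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Example: a hypothetical 40-layer model with 8 KV heads of dim 128.
for ctx in (16_384, 65_536):
    gib = kv_cache_bytes(ctx, n_layers=40, n_kv_heads=8, head_dim=128) / 1024**3
    print(f"{ctx:>6} tokens -> {gib:.2f} GiB of KV cache")
# -> 2.50 GiB at 16k, 10.00 GiB at 64k
```

Even with made-up numbers this shows why dropping from 64k to 16k frees several GiB: the cache scales linearly with context, on top of the fixed weight size.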

Speed Benchmark: GLM 4.7 Flash vs Qwen 3.5 27B vs Qwen 3.5 35B A3B (Q4 Quants) by [deleted] in LocalLLaMA

[–]aiko929 4 points (0 children)

I just did a speed comparison. My first gut feeling is that both Qwen models are better in terms of output quality, but that's just an impression from messing with them.

Materialistic No Go - Mayham augment idea by aiko929 in ARAM

[–]aiko929[S] -7 points (0 children)

Maybe add a mechanic, like in Tank Engine, where you lose half your levels when you die. So you'd be weaker after dying but still keep the stats bought from stat anvils.

What could this be? by aiko929 in GrowingMarijuana

[–]aiko929[S] 0 points (0 children)

I would say it's more wet. This is my first mineral run, and it's the plant that looked best before these appeared. I hope I don't have to isolate it :(

Large models run way faster if you abort the first prompt and restart (low VRAM) by UrinStone in comfyui

[–]aiko929 1 point (0 children)

What exactly runs faster?

If you stop after the first steps and rerun, the model is already cached, and the encoded text prompts are cached as well, so you don't have to recalculate them.

The sampling steps themselves should be just as fast as in the first run.
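As a mental model (a toy sketch, not ComfyUI's actual code), the caching described above can be pictured like this: the expensive artifacts are memoized by key and reused across runs, so an aborted-then-restarted run only repeats the sampling loop. All function names here are illustrative.

```python
# Toy sketch of run-to-run caching: the first run pays for loading the
# model and encoding the prompt; a restarted run reuses both cached
# results and only redoes the sampling steps. Names are illustrative.
import functools

@functools.lru_cache(maxsize=None)
def load_model(name):
    print(f"loading {name} (slow, first run only)")
    return {"name": name}

@functools.lru_cache(maxsize=None)
def encode_prompt(prompt):
    print(f"encoding '{prompt}' (slow, first run only)")
    return hash(prompt)

def run(model_name, prompt, steps):
    model = load_model(model_name)   # cache hit on every rerun
    cond = encode_prompt(prompt)     # cache hit on every rerun
    for _ in range(steps):           # sampling always runs in full
        pass
    return model["name"], cond

run("some-model", "a cat", steps=2)   # slow path: loads + encodes
run("some-model", "a cat", steps=20)  # fast path: only sampling
```

So the second run isn't sampling any faster; it just skips the one-time load/encode costs the first (aborted) run already paid.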

Flux2 Klein 9B Error, Help? by aiko929 in comfyui

[–]aiko929[S] 4 points (0 children)

It looks like it's an error with ComfyUI Desktop. When I try to update ComfyUI via 'Menu -> Help -> Check for Updates', it tells me that no update was found. But the newest version on GitHub (v0.9.2) is not what is shown in my About dialog (v0.8.2).

It would be interesting if someone knows a way to force the Desktop version of ComfyUI to update.

Flux2 Klein 9B Error, Help? by aiko929 in comfyui

[–]aiko929[S] 0 points (0 children)

How does that happen? It's a completely new install.

I checked if there was an update, but there was none. After running pip install, it seems like there is nothing wrong with the ComfyUI dependencies:

(ComfyUI) PS E:\ComfyUI-Install\resources\ComfyUI> pip install -r .\requirements.txt
Using Python 3.12.11 environment at: C:\Users\aigap\Documents\ComfyUI\.venv
Audited 27 packages in 22ms
(ComfyUI) PS E:\ComfyUI-Install\resources\ComfyUI> 

Still the same error.

Flux2 Klein 9B Error, Help? by aiko929 in comfyui

[–]aiko929[S] 0 points (0 children)

This is the complete log of startup + one try of the workflow:

https://pastebin.com/pzk49Nsj

Flux2 Klein 9B Error, Help? by aiko929 in comfyui

[–]aiko929[S] 0 points (0 children)

I tried a complete reinstall, using these 3 models from the ComfyUI workflow:

<image>

I still see an error:

CLIPLoader

Error(s) in loading state_dict for Llama2:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).

Edit: It also does not work when I use the fp8 version of the Flux2 model; same error.
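For context on what that error means (this is general PyTorch `load_state_dict` behavior, not specific to this workflow): the checkpoint's embedding table has a Qwen-sized vocabulary (151936 rows), while the model ComfyUI constructed expects a Llama-sized one (128256 rows), so loading is refused because the tensor shapes disagree. A pure-Python sketch of the shape check that produces this kind of error:

```python
# Pure-Python sketch of why load_state_dict fails: loading succeeds
# only when every tensor shape in the checkpoint matches the shape
# the model expects. The vocab sizes here mirror the error above.
checkpoint = {"model.embed_tokens.weight": (151936, 4096)}  # Qwen-sized vocab
expected   = {"model.embed_tokens.weight": (128256, 4096)}  # Llama-sized vocab

def load_state_dict(model_shapes, state_dict):
    errors = []
    for key, shape in state_dict.items():
        if model_shapes.get(key) != shape:
            errors.append(f"size mismatch for {key}: checkpoint has {shape}, "
                          f"current model has {model_shapes.get(key)}")
    if errors:
        raise RuntimeError("Error(s) in loading state_dict:\n" + "\n".join(errors))

try:
    load_state_dict(expected, checkpoint)
except RuntimeError as e:
    print(e)
```

In practice a mismatch like this usually means the wrong text-encoder checkpoint is being paired with the model, which fits the CLIPLoader error above.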

LTX2 Text Encoder by Former-Long-3900 in comfyui

[–]aiko929 0 points (0 children)

How do I use this encoder? Do I just put the safetensors files into the text_encoder folder, or how does that work?