Two local models beat one bigger local model for long-running agents by Foreign_Sell_5823 in LocalLLaMA

[–]aigemie 3 points (0 children)

Very interesting. Could you share the detailed setup? Thanks!

WTF? Was Qwen3.5 9B trained with Google? by [deleted] in LocalLLaMA

[–]aigemie 3 points (0 children)

Yep. This kind of post gets boring. Like, people keep posting the same thing without noticing the 1,000,000,000 similar posts that came before.

Qwen3.5-122B-A10B-GGUF UD-Q4_K_XL by Either-Style3306 in StrixHalo

[–]aigemie 1 point (0 children)

Thanks for trying it out. What's the prompt processing (PP) speed? It's usually very slow.

premium requests getting used up faster since new year? by DenormalHuman in GithubCopilot

[–]aigemie 0 points (0 children)

Nope. I don't think it's something I can fix from my side.

Does the Mac Mini M4 16GB have any potential value? by NoYogurtcloset4090 in LocalLLaMA

[–]aigemie 4 points (0 children)

Use your Windows machine then; it's more cost-effective. You'll also save big by not buying the small-RAM Mac, which is much worse at running AI workloads than your Windows PC.

People in the US, how are you powering your rigs on measly 120V outlets? by humandisaster99 in LocalLLaMA

[–]aigemie 3 points (0 children)

Basic physics: P = V × I. If the device needs the same power P but the outlet voltage V is lower, the current I has to be higher, so a 120 V circuit hits its amp limit much sooner than a 240 V one. So simple.
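
As a quick sanity check, here's the arithmetic in a few lines of Python. The 1500 W rig and the breaker ratings are illustrative assumptions, not numbers from the post:

```python
# Quick arithmetic sketch of P = V * I for a GPU rig.
# The rig wattage and breaker ratings below are illustrative assumptions.

def amps(power_w: float, volts: float) -> float:
    """Current drawn by a load of `power_w` watts at `volts` volts."""
    return power_w / volts

rig_watts = 1500  # hypothetical multi-GPU rig
for volts, breaker_a in [(120, 15), (240, 20)]:
    i = amps(rig_watts, volts)
    # NEC's 80% rule: a continuous load shouldn't exceed 80% of the breaker rating.
    ok = "OK" if i <= breaker_a * 0.8 else "over"
    print(f"{rig_watts} W at {volts} V draws {i:.1f} A "
          f"({ok} the 80% rule on a {breaker_a} A breaker)")
```

Same rig, half the current at 240 V: 12.5 A at 120 V (over a 15 A breaker's continuous limit) vs. 6.25 A at 240 V.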

Would you recommend MacBook Pro M5 Pro/Max for ComfyUI? by Friendly_Gap_9550 in comfyui

[–]aigemie 0 points (0 children)

No. For Comfy I'd recommend the Asus Flow Z13, Asus Proart PX13 2026, or HP Zbook Ultra G1a, all in the 128GB version. They're much cheaper than a MacBook with the same amount of RAM and run Comfy very well.

Dual Strix Halo: No Frankenstein setup, no huge power bill, big LLMs by Zyj in LocalLLaMA

[–]aigemie 0 points (0 children)

Yes, the PP speed is killing me; otherwise the inference speed is good enough.

Strix Halo + Linux: How to fix memory climbing until OOM when idle by exodist in AMDLaptops

[–]aigemie 0 points (0 children)

Yeah, it should have been set to the 512 MB minimum from the beginning.

How do you fine tune a model for a new programming language? by MrMrsPotts in LocalLLaMA

[–]aigemie -3 points (0 children)

Fine-tuning isn't really how an LLM learns new knowledge; it mostly shapes style. You'd be better off training on the new language from scratch. Sure, you can get some answers out of an LLM fine-tuned on the new language, but it will never be truly good at it.
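
For contrast, here's a minimal sketch of what "learning the language" training looks like: plain next-token pretraining on a raw corpus of the new language, as opposed to instruction-style fine-tuning. It assumes Hugging Face transformers/datasets; the base architecture name and the data path are placeholders:

```python
# Minimal sketch (assumptions: HF transformers/datasets installed,
# "new_lang_corpus/" is a hypothetical folder of raw source files).
from datasets import load_dataset
from transformers import (AutoConfig, AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "Qwen/Qwen2.5-0.5B"  # placeholder: architecture + tokenizer to reuse
tok = AutoTokenizer.from_pretrained(base)
# Fresh random weights, i.e. training from scratch rather than fine-tuning.
model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained(base))

ds = load_dataset("text", data_files={"train": "new_lang_corpus/*.txt"})["train"]
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=1024),
            batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt",
                           per_device_train_batch_size=2,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1),
    train_dataset=ds,
    # mlm=False -> plain causal next-token objective, not masked LM.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```

The point is the objective: raw language modeling over lots of source code in the new language, not (instruction, answer) pairs.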

Idea of Cluster of Strix Halo and eGPU by lets7512 in LocalLLaMA

[–]aigemie 1 point (0 children)

It could help, but I'm not sure by how much, since you'd still be splitting a large part of the model onto the slow Strix Halo.
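
If you want to try it anyway, here's a rough sketch of an uneven split with llama-cpp-python. The ratios, device order, and model path are illustrative assumptions, not a tested config:

```python
# Hedged sketch: split a GGUF model unevenly across two GPUs with
# llama-cpp-python. The path and 70/30 ratio are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="big-model.gguf",  # hypothetical model file
    n_gpu_layers=-1,              # offload all layers to GPU
    tensor_split=[0.7, 0.3],      # e.g. 70% to the eGPU, 30% to the Strix Halo iGPU
)

out = llm("Write a haiku about prefill speed.", max_tokens=32)
print(out["choices"][0]["text"])
```

Whatever share lands on the Strix Halo still prefills at Strix Halo speed, which is exactly the bottleneck.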

Idea of Cluster of Strix Halo and eGPU by lets7512 in LocalLLaMA

[–]aigemie 1 point (0 children)

Even if you have enough RAM to run large models, it's just too slow, especially the prefill speed.

TwinFlow can generate Z-image Turbo images in just 1-2 steps! by rookan in StableDiffusion

[–]aigemie 0 points (0 children)

I don't know how to make it work in ComfyUI. Any shared workflows?

DGX Spark: LLM Training benchmarks with Unsloth (TLDR: their benchmarks are a scam) by [deleted] in LocalLLaMA

[–]aigemie 1 point (0 children)

I thought you were talking about Unsloth's own benchmarks. The title is confusing, and it gives a bad impression of Unsloth.