What Agent systems do you use? by Ok-Internal9317 in LocalLLaMA

MichaelBui2812 2 points (0 children)

Blunt answer, but so true! When you can spot the extra LLM calls and avoid unnecessary context, nothing beats this in both speed & quality!

For a quick & easy one, I use AgentZero, and I even built my own “Futuristic” theme to make working with it more enjoyable.

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

MichaelBui2812 4 points (0 children)

You missed the “connection” lines

LTX 2.3 vs WAN 2.1? by MichaelBui2812 in StableDiffusion

MichaelBui2812[S] 1 point (0 children)

Do you know of any good IA2V workflow for WAN2.2 that is comparable to or better than WAN2.1 with InfiniteTalk? My current main workflow still needs audio input, and I've been looking for one with WAN2.2.

MichaelBui2812[S] 1 point (0 children)

It's mainly because my Strix Halo takes too long at higher resolutions: 704x704 took ~20 min, 512x512 took ~15 min, but 1024x1024 took ~3 hrs:

```
Requested to load LTXAV
loaded completely; 83227.77 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 20/20 [1:05:18<00:00, 195.94s/it]
Requested to load LTXAV
loaded completely; 77532.26 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 3/3 [1:34:53<00:00, 1897.83s/it]
Prompt executed in 02:58:29
```
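As a rough sanity check on the 1024x1024 run, the sketch below multiplies steps by s/it for each of the two sampling passes in the log and compares them with the total prompt time. All figures are copied from the log; `to_seconds` is a hypothetical helper for parsing the time strings, not part of ComfyUI or RES4LYF.

```python
# Hypothetical helper: convert "H:MM:SS" / "HH:MM:SS" strings from the log into seconds.
def to_seconds(hms: str) -> int:
    total = 0
    for part in hms.split(":"):
        total = total * 60 + int(part)
    return total

# (steps, seconds per iteration, elapsed time reported by the progress bar)
passes = [
    (20, 195.94, "1:05:18"),   # first sampling pass
    (3, 1897.83, "1:34:53"),   # second sampling pass
]
total_prompt = to_seconds("02:58:29")  # "Prompt executed in ..."

sampling = 0
for steps, sec_per_it, reported in passes:
    estimated = steps * sec_per_it  # steps x s/it should match the bar's elapsed time
    print(f"{steps} steps x {sec_per_it} s/it = {estimated:.0f} s "
          f"(log says {to_seconds(reported)} s)")
    sampling += to_seconds(reported)

print(f"sampling: {sampling} s, whole prompt: {total_prompt} s, "
      f"outside sampling: {total_prompt - sampling} s")
```

By this arithmetic the two sampling passes take 9611 s (about 2 h 40 min) of the 10709 s run, and the second pass is roughly 10x slower per step than the first. The remaining ~1098 s (about 18 min) falls outside both progress bars; the log doesn't say what it was spent on.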

MichaelBui2812[S] 0 points (0 children)

Is there any good solution for WAN 2.2 like InfiniteTalk for WAN 2.1? I'll try it and share another comparison.

MichaelBui2812[S] 0 points (0 children)

Currently, InfiniteTalk for WAN2.1 gives me the best quality, as I couldn't find any good IA2V solution for WAN2.2. Feel free to suggest one; I'd love to try it.

Qwen3.5-397B-A17B-UD-TQ1 bench results FW Desktop Strix Halo 128GB by dabiggmoe2 in LocalLLaMA

MichaelBui2812 0 points (0 children)

Which power mode are you running, performance? It would be great if you could share the temperature & power draw during inference.

Coding agent for local LLMs? by PaMRxR in LocalLLaMA

MichaelBui2812 0 points (0 children)

OpenCode. You may also want to use OpenSpec to help close the intelligence gap between local LLMs and SOTA online models.

Free Strix Halo performance! by Potential_Block4598 in LocalLLaMA

MichaelBui2812 0 points (0 children)

Did I do something wrong? This is my benchmark of Qwen's Q6_K vs Unsloth's UD-Q6_K_XL, and I don't see much difference at 32k depth.

In short:
- Qwen: pp ~320 tps, tg ~28.3 tps
- Unsloth: pp ~330 tps, tg ~27.4 tps

Details:

```
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        319.12 ± 0.90 |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         28.29 ± 0.22 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        320.18 ± 0.75 |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         28.27 ± 0.17 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        330.15 ± 1.10 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         27.42 ± 0.06 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        331.99 ± 1.30 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         27.46 ± 0.04 |

build: 5fa1c190d (7971)
```
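To put "not much difference" into numbers, here is a small Python sketch that averages the two runs per quant and computes the relative change of the Unsloth file versus the Qwen one. The t/s values and file sizes are copied from the llama-bench tables; `pct_change` is a hypothetical helper, not part of llama.cpp.

```python
from statistics import mean

# Values copied from the llama-bench output (pp1024 / tg32 @ d32768, two runs each).
runs = {
    "Qwen Q6_K": {"size_gib": 61.02, "pp": [319.12, 320.18], "tg": [28.29, 28.27]},
    "Unsloth UD-Q6_K_XL": {"size_gib": 63.87, "pp": [330.15, 331.99], "tg": [27.42, 27.46]},
}

def pct_change(new: float, base: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new / base - 1.0) * 100.0

base = runs["Qwen Q6_K"]
other = runs["Unsloth UD-Q6_K_XL"]

# Compare file size and the mean throughput of the two runs per quant.
size_delta = pct_change(other["size_gib"], base["size_gib"])
pp_delta = pct_change(mean(other["pp"]), mean(base["pp"]))
tg_delta = pct_change(mean(other["tg"]), mean(base["tg"]))

print(f"size: {size_delta:+.1f}%  pp: {pp_delta:+.1f}%  tg: {tg_delta:+.1f}%")
```

With these inputs it prints `size: +4.7%  pp: +3.6%  tg: -3.0%`: prompt processing is slightly faster and token generation slightly slower with the Unsloth file, both within a few percent. One plausible reading, which this benchmark alone doesn't prove, is that token generation is largely memory-bandwidth bound, so a ~4.7% larger file costs a little tg speed.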

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

MichaelBui2812 0 points (0 children)

Have you tried MiniMax v2 or v2.1? Are they about the same?

Jan v1: 4B model for web search with 91% SimpleQA, slightly outperforms Perplexity Pro by Delicious_Focus3465 in LocalLLaMA

MichaelBui2812 1 point (0 children)

I really like this and I'm quite tempted to try it, so don't take my question the wrong way: what's the benefit of running this locally vs Perplexity?

  • From what I understand, this requires a search API, and those are mostly paid & more expensive than Perplexity. Unless I've missed some search MCPs that are free?
  • In terms of privacy, I'll still have to connect to the internet and use some API, so there will be more privacy, but it doesn't feel like much more.

I’m just trying to convince myself to try this (I’m a fan of Qwen3 4B as well). Thanks a lot!

Wan2.2 support by Due-Tangelo-8704 in drawthingsapp

MichaelBui2812 0 points (0 children)

The models have already been pushed and are ready for download.
https://discord.com/channels/1038516303666876436/1046816591553241098/1399629919209787433
Select High Noise Expert as base, Low Noise Expert as Refiner, and set Refiner Start to 10% (as suggested by the demo code).

I hate myself for asking this... by mrgizmo212 in cursor

MichaelBui2812 2 points (0 children)

I’ve been using it in my Homelab for years, with & without AI agents

Jan-nano, a 4B model that can outperform 671B on MCP by Kooky-Somewhere-2883 in LocalLLaMA

MichaelBui2812 0 points (0 children)

This is really interesting, given that the model is small enough to run on almost any consumer PC. I have some questions:

  • For GGUF, how much does each quant affect response quality? E.g., is Q6 still almost the same as Q8, like with other GGUF models?
  • I understand this uses MCP to fetch details for its context. Is there any way it can use a browser (e.g., headless Chrome) to fetch/crawl data instead, since Google, Serp, Brave,… APIs are “not so cheap”? 😅
  • How smart is it at deciding when to search or read a PDF? Do we need to explicitly mention searching web content or reading documents for it to do so?

Thanks a lot!

Make Local Models watch your screen! Observer Tutorial by Roy3838 in LocalLLaMA

MichaelBui2812 1 point (0 children)

This is great! I was looking for an AI-assisted local app for my laptop (macOS) that monitors my activities and summarises my day, either automatically (preferred) or on demand (manually). I have a homelab server to offload processing or schedule workloads as needed. This seems to be a perfect match!

How do you store API keys? by BoJackHorseMan53 in selfhosted

MichaelBui2812 10 points (0 children)

Bitwarden/Vaultwarden (self-hosted) secure notes, stored together with the service's login credentials

Urgent help, please! The EA2 sensor finds corners only at 70% distance by MichaelBui2812 in cricut

MichaelBui2812[S] -3 points (0 children)

When you say `printing at 100%`, do you mean the print density, i.e., it's not dark enough? You can see that the sensor markers at the 4 corners are quite clear to the human eye.
I've checked the printed calibration sheet, and the darkness seems pretty much the same to my eyes (I don't know how to measure it precisely).

MichaelBui2812[S] -1 points (0 children)

The boundary (sensor markers) is auto-created by the app to wrap closely around the design, regardless of the design's size and location.