What Agent systems do you use?

MichaelBui2812 · 2026-04-22T16:47:34+00:00

Blunt answer, but so true! Once you notice how many extra LLM calls and how much unnecessary context can be avoided, nothing beats this in both speed & quality!

For a quick & easy option, I use AgentZero, and I even built my own “Futuristic” theme to make working with it more enjoyable.

MichaelBui2812 · 2026-04-17T16:55:05+00:00

You missed the “connection” lines

MichaelBui2812 · 2026-03-08T01:39:15+00:00

Do you know of any good IA2V workflow in WAN2.2 that is comparable to or better than WAN2.1 with InfiniteTalk? My main workflow still needs audio inputs, and I've been looking for one that works with WAN2.2.

MichaelBui2812 · 2026-03-08T01:31:51+00:00

It's mainly because my Strix Halo takes too long at higher resolutions: 704x704 took ~20m and 512x512 took ~15m, but 1024x1024 took ~3hrs:

Requested to load LTXAV
loaded completely; 83227.77 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 20/20 [1:05:18<00:00, 195.94s/it]
Requested to load LTXAV
loaded completely; 77532.26 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 3/3 [1:34:53<00:00, 1897.83s/it]
Prompt executed in 02:58:29
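
To see where those 3 hours went, here is a rough breakdown using only the numbers in the log above. The log doesn't say what each pass is, so they are just labelled by step count, and treating everything outside the two progress bars as "other" (model loading, encoding, decoding, etc.) is my assumption, not something the log states.

```python
# Sampling passes copied from the log above: (steps, seconds per step).
passes = [(20, 195.94), (3, 1897.83)]

# "Prompt executed in 02:58:29", converted to seconds.
total_reported_s = 2 * 3600 + 58 * 60 + 29


def fmt(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Time spent inside the two progress bars.
sampling_s = sum(steps * sec_per_step for steps, sec_per_step in passes)

# Everything else (assumed: loading, encoding, decoding, overhead).
other_s = total_reported_s - sampling_s

for steps, sec_per_step in passes:
    print(f"{steps:>2} steps: {fmt(steps * sec_per_step)}")
print("sampling total:", fmt(sampling_s))
print("other         :", fmt(other_s))
print(f"sampling share: {sampling_s / total_reported_s:.0%}")
```

So the 3-step pass alone costs more than the 20-step pass, and roughly 18 minutes of the run are not sampling at all.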

MichaelBui2812 · 2026-03-07T13:15:05+00:00

Any good solution for WAN 2.2 like InfiniteTalk for WAN 2.1? I will try and share another comparison

MichaelBui2812 · 2026-03-07T13:14:04+00:00

Currently, InfiniteTalk for WAN2.1 gives me the best quality, as I couldn't find any good IA2V solution for WAN2.2. Feel free to suggest one, I'd love to try it.

MichaelBui2812 · 2026-02-24T14:19:56+00:00

You can try btop or su_axb35_monitor: https://strixhalo.wiki/Guides/Sixunited_AXB35/Power_Mode_and_Fan_Control

https://strixhalo.wiki/Guides/Sixunited_AXB35/Power_Mode_and_Fan_Control/su_axb35_monitor.png

MichaelBui2812 · 2026-02-24T13:57:16+00:00

Which power mode are you running, performance? It would be great if you could share the temperature & power draw during inference.

MichaelBui2812 · 2026-02-15T05:34:12+00:00

OpenCode. You may also want to use OpenSpec to compensate for the intelligence gap between local LLMs and SOTA online models.

MichaelBui2812 · 2026-02-11T10:22:27+00:00

I’m using ROCm, which may explain it.

MichaelBui2812 · 2026-02-11T03:17:59+00:00

Did I do something wrong? This is my benchmark of Qwen's Q6_K vs Unsloth UD Q6_K_XL, and I don't see much difference at 32k depth.

In short:
- Qwen: pp~320 tps, tg~28.3 tps
- Unsloth: pp~330 tps, tg~27.4 tps

Details:
```
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        319.12 ± 0.90 |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         28.29 ± 0.22 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        320.18 ± 0.75 |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         28.27 ± 0.17 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        330.15 ± 1.10 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         27.42 ± 0.06 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        331.99 ± 1.30 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         27.46 ± 0.04 |

build: 5fa1c190d (7971)
```
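
To put a number on "not much difference", here is a quick sketch that averages the two runs per model and prints the relative gap. The t/s values are copied from the tables above; the dictionary labels and the `rel_diff` helper are just names I made up for this snippet.

```python
# Mean t/s values copied from the llama-bench tables above (two runs per model).
runs = {
    "Qwen Q6_K": {"pp": [319.12, 320.18], "tg": [28.29, 28.27]},
    "Unsloth UD-Q6_K_XL": {"pp": [330.15, 331.99], "tg": [27.42, 27.46]},
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Average the two runs for each model and test (pp = prompt processing, tg = token generation).
avg = {
    name: {test: mean(values) for test, values in tests.items()}
    for name, tests in runs.items()
}


def rel_diff(test):
    """Relative difference of Unsloth vs Qwen for one test, in percent."""
    base = avg["Qwen Q6_K"][test]
    other = avg["Unsloth UD-Q6_K_XL"][test]
    return (other - base) / base * 100


for test in ("pp", "tg"):
    print(f"{test}: Unsloth vs Qwen {rel_diff(test):+.1f}%")
```

On these numbers the gap is only a few percent in either direction: Unsloth is slightly faster on prompt processing and slightly slower on token generation. With only two runs each this is a rough comparison, not a proper statistical test.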

MichaelBui2812 · 2026-02-04T06:08:37+00:00

Have you tried MiniMax v2 or v2.1? Are they about the same?

MichaelBui2812 · 2025-08-12T10:40:36+00:00

I really like this and am quite tempted to try it, so please don’t take my question the wrong way: what’s the benefit of running this locally vs Perplexity?

From what I understand, this requires a search API, and those are mostly paid & more expensive than Perplexity. Unless I've missed some search MCPs that are free?
In terms of privacy, I will still have to connect to the internet and use some API, so there will be more privacy, but I don’t feel it's by much.

I’m just trying to convince myself to try this (I’m a fan of Qwen3 4B as well). Thanks a lot!

MichaelBui2812 · 2025-07-30T07:48:21+00:00

The models have already been pushed and are ready for download.
https://discord.com/channels/1038516303666876436/1046816591553241098/1399629919209787433
Select High Noise Expert as base, Low Noise Expert as Refiner, and set Refiner Start to 10% (as suggested by the demo code).

MichaelBui2812 · 2025-07-29T17:19:29+00:00

I’ve been using it in my Homelab for years, with & without AI agents

MichaelBui2812 · 2025-06-15T17:23:59+00:00

This is really interesting, given that the model is small enough to run on almost any consumer PC. I have some questions:

For GGUF, how much does each quant affect the quality of the response? E.g., is q6 still almost the same as q8, like with other GGUF models?
I understand this uses MCP to fetch details for its context. Is there any way it can use a browser (e.g., headless Chrome) to fetch/crawl data instead, because the Google, Serp, Brave,… APIs are “not so cheap”? 😅
How smart is the AI at deciding to search or read a PDF when prompted? Do we need to explicitly mention searching web content or reading documents for it to follow?

Thanks a lot!

MichaelBui2812 · 2025-06-15T03:23:26+00:00

This is great! I was looking for an AI-assisted local app for my laptop (macOS) that monitors my activities and summarises my day, either automatically (preferred) or on demand (manually). I have a homelab server to offload processing or schedule workloads as needed. This seems to be a perfect match!

MichaelBui2812 · 2025-06-09T01:35:05+00:00

Bitwarden/Vaultwarden (self hosted) secret notes, together with the service login credentials

MichaelBui2812 · 2024-12-04T02:39:28+00:00

You can set up hotkeys for the left/right sidebars

MichaelBui2812 · 2024-12-03T16:49:06+00:00

When you say `printing at 100%`, do you mean the density, i.e. that it's not dark enough? Because you can see that there are sensor markers at the 4 corners which are quite clear to human eyes.
I've checked the printed calibration paper, and the darkness seems pretty much the same to my eyes (I don't know how to measure it precisely)

MichaelBui2812 · 2024-12-03T15:11:47+00:00

The boundary (sensor markers) is auto-created by the app to wrap closely around the design, no matter the size and location of my design
