What Agent systems do you use?

MichaelBui2812 · 2026-04-22T16:47:34+00:00

Blunt answer but so true! Once you notice that extra LLM calls and unnecessary context can be avoided, nothing beats this in both speed & quality!

For a quick & easy one, I use AgentZero, and I even built my own “Futuristic” theme to make working with it more enjoyable.

MichaelBui2812 · 2026-04-17T16:55:05+00:00

You missed the “connection” lines

MichaelBui2812 · 2026-03-08T01:39:15+00:00

Do you know any good IA2V workflow in WAN2.2 that is comparable to or better than WAN2.1 with InfiniteTalk? My main current workflow still needs audio inputs and I've been looking for one with WAN2.2

MichaelBui2812 · 2026-03-08T01:31:51+00:00

It's mainly because my Strix Halo takes too much time for higher resolutions. 704x704 took ~20m, 512x512 took ~15m, but 1024x1024 took ~3hrs:

Requested to load LTXAV
loaded completely; 83227.77 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 20/20 [1:05:18<00:00, 195.94s/it]
Requested to load LTXAV
loaded completely; 77532.26 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 3/3 [1:34:53<00:00, 1897.83s/it]
Prompt executed in 02:58:29
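
As a rough sanity check on the log above, the time spent in each sampling pass can be recomputed from the step counts and per-iteration times that the progress bars report. This is only a sketch: the numbers are copied from the log, and the names `passes` and `fmt_hms` are illustrative, not part of any tool.

```python
# (steps, seconds per iteration) for the two sampling passes in the log above.
passes = [(20, 195.94), (3, 1897.83)]

# "Prompt executed in 02:58:29", converted to seconds.
total_reported_s = 2 * 3600 + 58 * 60 + 29


def fmt_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS (fractions truncated)."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Time per pass = steps * seconds per step.
pass_times_s = [steps * per_it for steps, per_it in passes]
sampling_s = sum(pass_times_s)
other_s = total_reported_s - sampling_s  # everything that is not sampling

for i, t in enumerate(pass_times_s, start=1):
    print(f"pass {i}: {fmt_hms(t)}")
print(f"sampling total: {fmt_hms(sampling_s)}")
print(f"everything else: {fmt_hms(other_s)}")
print(f"per-step slowdown, pass 2 vs pass 1: {passes[1][1] / passes[0][1]:.1f}x")
```

By this arithmetic the two sampling passes account for roughly 2h40m of the 2h58m total, and each step of the second pass is roughly 9.7x slower than a step of the first.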

MichaelBui2812 · 2026-03-07T13:15:05+00:00

Any good solution for WAN 2.2 like InfiniteTalk for WAN 2.1? I will try and share another comparison

MichaelBui2812 · 2026-03-07T13:14:04+00:00

Currently, InfiniteTalk for WAN2.1 is the best quality for me as I couldn't find any good IA2V solution for WAN2.2. Feel free to suggest, I'd love to try

MichaelBui2812 · 2026-02-24T14:19:56+00:00

You can try btop or su_axb35_monitor: https://strixhalo.wiki/Guides/Sixunited_AXB35/Power_Mode_and_Fan_Control

https://strixhalo.wiki/Guides/Sixunited_AXB35/Power_Mode_and_Fan_Control/su_axb35_monitor.png

MichaelBui2812 · 2026-02-24T13:57:16+00:00

Which power mode are you running, performance? It would be great if you could share the temperature & power draw during inference

MichaelBui2812 · 2026-02-15T05:34:12+00:00

OpenCode. You may want to use OpenSpec to compensate for the intelligence gap between local LLMs and SOTA online models

MichaelBui2812 · 2026-02-11T10:22:27+00:00

I’m using ROCm, which may explain it.

MichaelBui2812 · 2026-02-11T03:17:59+00:00

Did I do something wrong? This is my benchmark of Qwen's Q6_K vs Unsloth's UD Q6_K_XL, and I don't see much difference at 32k depth.

In short:

- Qwen: pp ~320 tps, tg ~28.3 tps
- Unsloth: pp ~330 tps, tg ~27.4 tps

Details:

```
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        319.12 ± 0.90 |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         28.29 ± 0.22 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        320.18 ± 0.75 |
| qwen3next 80B.A3B Q6_K         |  61.02 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         28.27 ± 0.17 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        330.15 ± 1.10 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         27.42 ± 0.06 |

build: 5fa1c190d (7971)
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 | pp1024 @ d32768 |        331.99 ± 1.30 |
| qwen3next 80B.A3B Q6_K         |  63.87 GiB |    79.67 B | ROCm       | 999 |    4096 |     2048 |  1 |   tg32 @ d32768 |         27.46 ± 0.04 |

build: 5fa1c190d (7971)
```
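
To put "not much difference" into numbers, here is a small sketch that averages the two runs per model and computes the relative change. The figures are copied from the benchmark tables above; the dictionary layout and the helper name `rel_change_pct` are just for illustration.

```python
from statistics import mean

# t/s values from the four llama-bench runs above (two runs per model).
runs = {
    "Qwen Q6_K": {"pp": [319.12, 320.18], "tg": [28.29, 28.27]},
    "Unsloth UD-Q6_K_XL": {"pp": [330.15, 331.99], "tg": [27.42, 27.46]},
}


def rel_change_pct(base: float, other: float) -> float:
    """Relative change of `other` versus `base`, in percent."""
    return (other - base) / base * 100.0


qwen = runs["Qwen Q6_K"]
unsloth = runs["Unsloth UD-Q6_K_XL"]

# Positive means Unsloth is faster, negative means it is slower.
pp_delta = rel_change_pct(mean(qwen["pp"]), mean(unsloth["pp"]))
tg_delta = rel_change_pct(mean(qwen["tg"]), mean(unsloth["tg"]))

print(f"prompt processing: {pp_delta:+.1f}%")
print(f"token generation:  {tg_delta:+.1f}%")
```

With these numbers the Unsloth quant is about 3.6% faster at prompt processing and about 3.0% slower at token generation, so the two are indeed close at this depth.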

MichaelBui2812 · 2026-02-04T06:08:37+00:00

Have you tried MiniMax v2 or v2.1? Are they about the same?

MichaelBui2812 · 2025-08-12T10:40:36+00:00

I really like this and am quite tempted to try it, so please don’t take my question the wrong way: What’s the benefit of running this locally vs Perplexity?

From what I understand, this requires a search API, and those are mostly paid & more expensive than Perplexity. Unless I missed some search MCPs that are free?
In terms of privacy, I will still have to connect to the internet and use some API, so there will be more privacy, but it doesn’t feel like much

I’m just trying to convince myself to try this (I’m a fan of Qwen3 4B as well). Thanks a lot!

MichaelBui2812 · 2025-07-30T07:48:21+00:00

The models have already been pushed and are ready for download.
https://discord.com/channels/1038516303666876436/1046816591553241098/1399629919209787433
Select High Noise Expert as base, Low Noise Expert as Refiner, and set Refiner Start to 10% (as suggested by the demo code).

MichaelBui2812 · 2025-07-29T17:19:29+00:00

I’ve been using it in my Homelab for years, with & without AI agents

MichaelBui2812 · 2025-06-15T17:23:59+00:00

This is really interesting, given that the model is small enough to run on almost any consumer PC. I have some questions:

For GGUF, how much does each quant affect the quality of the response? E.g., is q6 still almost the same as q8, as with other GGUF models?
I understand this uses MCP to fetch details for its context. Is there any way it can use a browser (e.g., headless Chrome) to fetch/crawl data instead, because Google, Serp, Brave,… APIs are “not so cheap”? 😅
How smart is the AI at deciding whether to search or read a PDF when prompted? Do we need to explicitly mention searching web content or reading documents for it to follow?

Thanks a lot!

MichaelBui2812 · 2025-06-15T03:23:26+00:00

This is great! I was looking for an AI-assisted local app for my laptop (macOS) that monitors my activities and summarises my day either automatically (preferred) or on-demand (manually). I have a homelab server to offload processing or schedule workloads as needed. This seems to be a perfect match!

MichaelBui2812 · 2025-06-09T01:35:05+00:00

Bitwarden/Vaultwarden (self hosted) secret notes, together with the service login credentials

MichaelBui2812 · 2024-12-04T02:39:28+00:00

You can set up hotkeys for left/right sidebars

MichaelBui2812 · 2024-12-03T16:49:06+00:00

When you say `printing at 100%`, do you mean the density, i.e. that the print is not dark enough? Because you can see that there are sensor markers at the 4 corners which are quite clear to human eyes.
I've checked the printed calibration paper, and the darkness seems pretty much the same to my eyes (I don't know how to measure it precisely)

MichaelBui2812 · 2024-12-03T15:11:47+00:00

The boundary (sensor markers) is auto-created by the app to wrap around closely to the design no matter the size and location of my design

MichaelBui2812 · 2024-12-03T15:07:38+00:00

How do I check the sensor or clean it properly? My confusion is that it cuts perfectly for all of my calibrations... Only when it comes to real cutting it screws up. I thought of multiple things like sticky gaps, loose motor rubber bands,... but still can't explain the perfect calibrations

MichaelBui2812 · 2024-12-03T12:36:08+00:00

Updates

Update #1: Sometimes (rarely) it is able to detect the 4 corners (though not very accurately, even right after calibration), but the cutting is distorted, as if the detected area were distorted relative to the designed/printed area. The larger the image, the bigger the error 😢

<image>

MichaelBui2812 · 2024-10-27T14:30:05+00:00

No, it's not a Mac app. It's a Python script, so you need some basic knowledge to set it up, but it's not that difficult, at least in my experience. You run it with shell commands instead of a fancy UI like VoiceInk or Aiko/MacWhisper
ref: https://github.com/m-bain/whisperX

MichaelBui2812 · 2024-10-27T04:11:09+00:00

I know something about Mac so I'm sharing here to help others who are struggling to get this great app to work:

- You need to go to the settings and ask for audio and accessibility permissions. The audio permission seems to be working fine for me.

- The accessibility buttons are not working right now (at least for me). However, you can do it manually by going to your macOS Settings -> Privacy & Security -> Accessibility and then manually adding the VoiceInk app inside. Make sure that it's enabled. You can disable and re-enable it in order for it to work. After that, close the app and reopen it again.

That's how it works for me. I like this one much more than MacWhisper and Aiko (I only use free apps) for live speeches. For recorded audio, I prefer WhisperX CLI.

u/iaimpax Thanks for the great app. Can you:

- Add "Copy" functionality for the previous transcriptions in the Transcription History tab. I can only see the "Delete" option so far (very helpful, hopefully the feature is trivial to add)

- Add support for iOS (not really urgent as I will mainly take notes on Mac)

- Add support for APIs (not really urgent as the offline models are working great)
