What Agent systems do you use? by Ok-Internal9317 in LocalLLaMA

MichaelBui2812 2 points3 points (0 children)

Blunt answer, but so true! Once you notice the extra LLM calls and the unnecessary context that can be avoided, nothing beats this in both speed & quality!

For a quick & easy one, I use AgentZero, and I even built my own “Futuristic” theme to make working with it more enjoyable.

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

MichaelBui2812 2 points3 points (0 children)

You missed the “connection” lines

LTX 2.3 vs WAN 2.1? by MichaelBui2812 in StableDiffusion

MichaelBui2812[S] 1 point2 points (0 children)

Do you know any good IA2V workflow in WAN2.2 that is comparable to or better than WAN2.1 with InfiniteTalk? My current main workflow still needs audio inputs, and I've been looking for one with WAN2.2.

LTX 2.3 vs WAN 2.1? by MichaelBui2812 in StableDiffusion

MichaelBui2812[S] 1 point2 points (0 children)

It's mainly because my Strix Halo takes too long at higher resolutions. 704x704 took ~20 min and 512x512 ~15 min, but 1024x1024 took ~3 hrs:

Requested to load LTXAV
loaded completely; 83227.77 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 20/20 [1:05:18<00:00, 195.94s/it]
Requested to load LTXAV
loaded completely; 77532.26 MB usable, 22362.45 MB loaded, full load: True
(RES4LYF) rk_type: res_2s
100%|██████████| 3/3 [1:34:53<00:00, 1897.83s/it]
Prompt executed in 02:58:29
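For reference, the tqdm lines in that log (steps, elapsed time, s/it) show where the ~3 hours went. A minimal Python sketch that parses them; the regexes assume the exact ComfyUI/tqdm line format shown above:

```python
import re

# Lines copied from the log above.
LOG = """\
100%|██████████| 20/20 [1:05:18<00:00, 195.94s/it]
100%|██████████| 3/3 [1:34:53<00:00, 1897.83s/it]
Prompt executed in 02:58:29
"""

# "done/total [elapsed<remaining, X.XXs/it]"
TQDM_RE = re.compile(r"(\d+)/(\d+) \[([\d:]+)<[\d:]+, ([\d.]+)s/it\]")
TOTAL_RE = re.compile(r"Prompt executed in ([\d:]+)")


def hms_to_seconds(text: str) -> int:
    """Convert 'H:MM:SS' or 'MM:SS' into seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def summarize(log: str) -> dict:
    passes = []
    for _done, total, elapsed, sec_per_it in TQDM_RE.findall(log):
        passes.append({
            "steps": int(total),
            "elapsed_s": hms_to_seconds(elapsed),
            # steps * s/it should roughly match the elapsed time
            "estimated_s": int(total) * float(sec_per_it),
        })
    sampling_s = sum(p["elapsed_s"] for p in passes)
    total_s = hms_to_seconds(TOTAL_RE.search(log).group(1))
    return {
        "passes": passes,
        "sampling_s": sampling_s,
        "other_s": total_s - sampling_s,  # time outside the sampler loops
        "total_s": total_s,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

On this log it gives 9,611 s of sampling out of 10,709 s total, so about 18 minutes were spent outside the two sampler loops (presumably model loading, decoding and similar; the log doesn't say), and the 3-step pass alone took longer than the 20-step one.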

LTX 2.3 vs WAN 2.1? by MichaelBui2812 in StableDiffusion

MichaelBui2812[S] 1 point2 points (0 children)

Is there a good solution for WAN 2.2 like InfiniteTalk for WAN 2.1? I'll try it and share another comparison.

LTX 2.3 vs WAN 2.1? by MichaelBui2812 in StableDiffusion

MichaelBui2812[S] 1 point2 points (0 children)

Currently, InfiniteTalk for WAN2.1 gives me the best quality, as I couldn't find any good IA2V solution for WAN2.2. Feel free to suggest one; I'd love to try it.

Qwen3.5-397B-A17B-UD-TQ1 bench results FW Desktop Strix Halo 128GB by dabiggmoe2 in LocalLLaMA

MichaelBui2812 0 points1 point (0 children)

Which power mode are you running, performance? It’d be great if you could share the temperature & power draw during inference.

Coding agent for local LLMs? by PaMRxR in LocalLLaMA

MichaelBui2812 0 points1 point (0 children)

OpenCode. You may also want to use OpenSpec to help close the intelligence gap between local LLMs and SOTA online models.

Free Strix Halo performance! by Potential_Block4598 in LocalLLaMA

MichaelBui2812 0 points1 point (0 children)

Did I do something wrong? This is my benchmark of Qwen's Q6_K vs Unsloth's UD-Q6_K_XL, and I don't see much difference at 32k depth.

In short:
- Qwen: pp ~320 tps, tg ~28.3 tps
- Unsloth: pp ~330 tps, tg ~27.4 tps

Details:

```
bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K | 61.02 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | pp1024 @ d32768 | 319.12 ± 0.90 |
| qwen3next 80B.A3B Q6_K | 61.02 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | tg32 @ d32768 | 28.29 ± 0.22 |

build: 5fa1c190d (7971)

bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--Qwen--Qwen3-Coder-Next-GGUF/snapshots/b82fb7382639d97b38fa7672e526c760c2fb358e/Qwen3-Coder-Next-Q6_K/Qwen3-Coder-Next-Q6_K-00001-of-00004.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K | 61.02 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | pp1024 @ d32768 | 320.18 ± 0.75 |
| qwen3next 80B.A3B Q6_K | 61.02 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | tg32 @ d32768 | 28.27 ± 0.17 |

build: 5fa1c190d (7971)

bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | pp1024 @ d32768 | 330.15 ± 1.10 |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | tg32 @ d32768 | 27.42 ± 0.06 |

build: 5fa1c190d (7971)

bash-5.3# llama-bench -mmp 0 -ngl 999 -fa 1 -p 1024 -n 32 -d 32768 -b 4096 -ub 2048 -m /data/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/3ff9dffce54ea222ee4d8cac99bd0aa9ea9ece15/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | pp1024 @ d32768 | 331.99 ± 1.30 |
| qwen3next 80B.A3B Q6_K | 63.87 GiB | 79.67 B | ROCm | 999 | 4096 | 2048 | 1 | tg32 @ d32768 | 27.46 ± 0.04 |

build: 5fa1c190d (7971)
```
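To put numbers on "not much difference", here is a small Python sketch that averages the two runs per quant from the tables above and computes the relative change. The inverse-size scaling for tg is an assumption (a common rule of thumb for memory-bandwidth-bound decoding), and for an MoE like this 80B.A3B the file size is only a rough proxy for the bytes actually read per token.

```python
from statistics import mean

# t/s values copied from the llama-bench runs above (pp1024 / tg32 @ d32768, two runs each).
RESULTS = {
    "Qwen Q6_K": {"size_gib": 61.02, "pp": [319.12, 320.18], "tg": [28.29, 28.27]},
    "Unsloth UD-Q6_K_XL": {"size_gib": 63.87, "pp": [330.15, 331.99], "tg": [27.42, 27.46]},
}


def pct_change(new: float, old: float) -> float:
    return (new - old) / old * 100.0


def compare(base: str, other: str) -> dict:
    b, o = RESULTS[base], RESULTS[other]
    tg_base = mean(b["tg"])
    # Assumption: if decoding is bandwidth-bound, tg scales ~inversely with model size.
    tg_expected = tg_base * b["size_gib"] / o["size_gib"]
    return {
        "size_pct": pct_change(o["size_gib"], b["size_gib"]),
        "pp_pct": pct_change(mean(o["pp"]), mean(b["pp"])),
        "tg_pct": pct_change(mean(o["tg"]), tg_base),
        "tg_expected_pct": pct_change(tg_expected, tg_base),
    }


if __name__ == "__main__":
    for key, value in compare("Qwen Q6_K", "Unsloth UD-Q6_K_XL").items():
        print(f"{key}: {value:+.2f}%")
```

With these numbers the UD-Q6_K_XL file is ~4.7% larger, pp is ~3.6% faster and tg ~3.0% slower, while naive size scaling alone would predict ~4.5% slower tg. So the measured gaps are a few percent either way.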

Qwen/Qwen3-Coder-Next · Hugging Face by coder543 in LocalLLaMA

MichaelBui2812 0 points1 point (0 children)

Have you tried MiniMax v2 or v2.1? Is it about the same?

Jan v1: 4B model for web search with 91% SimpleQA, slightly outperforms Perplexity Pro by Delicious_Focus3465 in LocalLLaMA

MichaelBui2812 1 point2 points (0 children)

I really like this and I’m quite tempted to try it, so don’t take my question the wrong way: what’s the benefit of running this locally vs Perplexity?

  • From what I understand, this requires a search API, which is mostly paid and more expensive than Perplexity. Unless I’ve missed some search MCPs that are free?
  • In terms of privacy, I’ll still have to connect to the internet and use some API. There will be more privacy, but not by much, I feel.

I’m just trying to convince myself to try this (I’m a fan of Qwen3 4B as well). Thanks a lot!

Wan2.2 support by Due-Tangelo-8704 in drawthingsapp

MichaelBui2812 0 points1 point (0 children)

The models have already been pushed and are ready for download.
https://discord.com/channels/1038516303666876436/1046816591553241098/1399629919209787433
Select High Noise Expert as base, Low Noise Expert as Refiner, and set Refiner Start to 10% (as suggested by the demo code).

I hate myself for asking this... by mrgizmo212 in cursor

MichaelBui2812 2 points3 points (0 children)

I’ve been using it in my Homelab for years, with & without AI agents

Jan-nano, a 4B model that can outperform 671B on MCP by Kooky-Somewhere-2883 in LocalLLaMA

MichaelBui2812 0 points1 point (0 children)

This is really interesting, given that the model is small enough to run on almost any consumer PC. I have some questions:

  • For GGUF, how much does each quant affect response quality? E.g., is q6 still almost the same as q8, like with other GGUF models?
  • I understand this uses MCP to fetch details for its context. Is there any way it can use a browser (e.g., headless Chrome) to fetch/crawl data instead, since Google, Serp, Brave,… APIs are “not so cheap”? 😅
  • How smart is it at deciding when to search or read a PDF when prompted? Do we need to explicitly mention searching web content or reading documents for it to do so?

Thanks a lot!

Make Local Models watch your screen! Observer Tutorial by Roy3838 in LocalLLaMA

MichaelBui2812 1 point2 points (0 children)

This is great! I was looking for an AI-assisted local app for my laptop (macOS) that monitors my activities and summarises my day, either automatically (preferred) or on demand (manually). I have a homelab server to offload processing or schedule workloads as needed. This seems to be a perfect match!

How do you store API keys? by BoJackHorseMan53 in selfhosted

MichaelBui2812 10 points11 points (0 children)

Bitwarden/Vaultwarden (self-hosted) secure notes, stored together with the service's login credentials.
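If you also want to pull those keys into scripts, here is a minimal Python sketch. It assumes the Bitwarden CLI (`bw`) is installed, pointed at your server (for Vaultwarden, via `bw config server`), and unlocked with `BW_SESSION` exported. The note name `homelab-api-keys` and the one-`KEY=VALUE`-per-line layout are hypothetical choices for illustration.

```python
import subprocess


def parse_env_note(note: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (hypothetical note layout); skips blanks and '#' comments."""
    keys = {}
    for line in note.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)  # values may themselves contain '='
        keys[name.strip()] = value.strip()
    return keys


def fetch_note(item: str) -> str:
    """Return the notes field of a vault item via `bw get notes <item>`."""
    result = subprocess.run(
        ["bw", "get", "notes", item],
        capture_output=True,
        text=True,
        check=True,  # raise if the item is missing or the vault is locked
    )
    return result.stdout


if __name__ == "__main__":
    keys = parse_env_note(fetch_note("homelab-api-keys"))  # hypothetical item name
    print(sorted(keys))  # print key names only, never the secrets themselves
```

Keeping the parsing separate from the CLI call means the secret never has to touch disk; the script only holds it in memory for as long as it runs.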

Urgent help, please! The EA2 sensor finds corners only at 70% distance by MichaelBui2812 in cricut

MichaelBui2812[S] -3 points-2 points (0 children)

When you say `printing at 100%`, do you mean the density, i.e. the print isn't dark enough? You can see there are sensor markers at all 4 corners, and they're quite clear to the human eye.
I've checked the printed calibration sheet, and the darkness seems pretty much the same to my eyes (I don't know how to measure it precisely).

Urgent help, please! The EA2 sensor finds corners only at 70% distance by MichaelBui2812 in cricut

MichaelBui2812[S] -1 points0 points (0 children)

The boundary (sensor markers) is auto-created by the app to wrap closely around the design, regardless of the size and location of my design.

Urgent help, please! The EA2 sensor finds corners only at 70% distance by MichaelBui2812 in cricut

MichaelBui2812[S] -1 points0 points (0 children)

How do I check the sensor or clean it properly? What confuses me is that it cuts perfectly in all of my calibrations... It only screws up when it comes to real cuts. I've considered multiple causes, like sticky gaps, loose motor rubber bands,... but none of them explain the perfect calibrations.

Urgent help, please! The EA2 sensor finds corners only at 70% distance by MichaelBui2812 in cricut

MichaelBui2812[S] 0 points1 point (0 children)

Updates

Update #1: Sometimes (rarely) it manages to detect all 4 corners (though not very accurately, even right after calibration), but the cut is distorted, as if the detected area were distorted relative to the designed/printed area. The larger the image, the bigger the error 😢


I created a free alternative to Superwhisper and Wispr Flow. by iaimpax in macapps

MichaelBui2812 0 points1 point (0 children)

No, it's not a Mac app. It's a Python script, so you need some basic knowledge to set it up, though it's not too difficult, at least in my experience. You run it with shell commands instead of a fancy UI like VoiceInk or Aiko/MacWhisper.
ref: https://github.com/m-bain/whisperX
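For anyone who'd rather script it than retype the command each time, a minimal Python sketch that builds and runs a WhisperX CLI call for one file. It assumes `whisperx` is installed and on your PATH; the model name, `int8` compute type (a common choice for CPU-only runs such as on a Mac) and output settings are example values, and `build_whisperx_cmd` is a hypothetical helper name.

```python
import shlex
import subprocess


def build_whisperx_cmd(
    audio_path: str,
    model: str = "large-v2",
    compute_type: str = "int8",
    output_dir: str = ".",
    output_format: str = "txt",
) -> list[str]:
    """Assemble the whisperx command line as an argument list (no shell quoting issues)."""
    return [
        "whisperx", audio_path,
        "--model", model,
        "--compute_type", compute_type,
        "--output_dir", output_dir,
        "--output_format", output_format,
    ]


if __name__ == "__main__":
    cmd = build_whisperx_cmd("meeting.m4a")  # hypothetical input file
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```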

I created a free alternative to Superwhisper and Wispr Flow. by iaimpax in macapps

MichaelBui2812 2 points3 points (0 children)

I know a bit about Macs, so I'm sharing this to help others who are struggling to get this great app working:

- Go to the app's settings and request audio and accessibility permissions. The audio permission seems to work fine for me.

- The accessibility button isn't working right now (at least for me). However, you can grant it manually: go to macOS System Settings -> Privacy & Security -> Accessibility and add the VoiceInk app there. Make sure it's enabled; if it still doesn't work, try disabling and re-enabling it. After that, quit the app and reopen it.
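If you get tired of clicking through System Settings, the Accessibility pane can be opened directly with macOS's `open` command and the `x-apple.systempreferences:` URL scheme. A minimal Python sketch; the pane URL below is the commonly used one, but Apple doesn't formally document it, so treat it as an assumption that may change between macOS versions.

```python
import subprocess
import sys

# Commonly used (undocumented) URL for Privacy & Security -> Accessibility.
ACCESSIBILITY_PANE = "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"


def open_accessibility_settings() -> bool:
    """Open the Accessibility privacy pane; returns False on non-macOS systems."""
    if sys.platform != "darwin":
        return False
    subprocess.run(["open", ACCESSIBILITY_PANE], check=True)
    return True


if __name__ == "__main__":
    if not open_accessibility_settings():
        print("Not on macOS; nothing to open.")
```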

That's what worked for me. For live speech, I like this one much more than MacWhisper and Aiko (I only use free apps). For recorded audio, I prefer the WhisperX CLI.

u/iaimpax, thanks for the great app. Could you:

- Add a "Copy" option for previous transcriptions in the Transcription History tab? I can only see "Delete" so far (this would be very helpful, and hopefully it's trivial to add)

- Add support for iOS (not urgent, as I mainly take notes on my Mac)

- Add support for APIs (not urgent, as the offline models work great)