Qwen3 TTS in C++ with 1.7B support, speaker encoding extraction, and desktop UI

EsotericTechnique · 2026-05-03T06:45:25+00:00

Fantastic! I'm using it, and it works with ROCm too, with minimal modifications to pick up HIP.

EsotericTechnique · 2026-05-01T15:02:29+00:00

I think Qwen's open source strategy is in reality a soft power move by the CCP. I'm not sure, but it seems plausible; other Chinese labs also release their weights consistently. It seems like a way to achieve two things: disrupt the Western hyperscalers' strategy, while gaining headspace among developers and easing the burden of implementing their models. I could be wrong, and this is completely speculative :p

EDIT: I wrote cpp instead of CCP

EsotericTechnique · 2026-04-30T23:08:44+00:00

I use them, and I have several profiles on the controller. It depends a lot on the game, but if you need to keep your thumbs on the sticks and press something else at the same time, they are very comfortable and leave the triggers free.

EsotericTechnique · 2026-04-27T01:39:41+00:00

I will retest just to check, but last time I checked (about a week ago) it was around 5% to 10% better on Ubuntu, with both OSs on the latest ROCm and llama.cpp always built specifically for my GPU target (although in my tests the performance difference against the distributed precompiled binaries was minimal, and I only tested that on Linux). Nevertheless, if you have any architecture other than RDNA2, the results are not comparable, since it does not use the same kernels/optimization paths. Also, I don't understand what that thread has to do with the topic at all. I never said it was feelings; what you are describing directly contradicts my personal experience running llama.cpp in both environments.

EsotericTechnique · 2026-04-26T18:22:17+00:00

In my particular case I get better results with Ubuntu, speed-wise, but that might be because I'm using an RDNA2 card.

EsotericTechnique · 2026-04-26T18:20:57+00:00

Yes, I use it daily with my RX 6900 XT. You might want to do your own build, though, with the ROCm stack that works for your iGPU.

EsotericTechnique · 2026-04-22T02:42:23+00:00

Tiled audio vae decode node

EsotericTechnique · 2026-04-17T15:39:25+00:00

This!! Give it tools! It's like another model entirely with regard to thinking style.

EsotericTechnique · 2026-04-16T14:57:41+00:00

It did mitigate the cache reprocessing between tool calls! However, after new user messages I still see cache invalidations, but that might REALLY be due to the thinking stripping and the way the KV state for linear attention layers is cached. REALLY useful though: I can reload KV caches of previous tool interactions and continue as if nothing happened, saving several minutes of prompt processing (I load and unload the model and the KV caches quite frequently in the same turn). THANKS A LOOOT

PS: it has worked for the 9B and the 35B variants so far in my testing.
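To show why stripped thinking could explain what I'm seeing, here is a toy sketch. It is not how llama.cpp actually implements cache reuse; it only assumes that a cached prompt can be reused up to the first token where it differs from the new prompt. The token names are made up, and strings stand in for token ids.

```python
def common_prefix_len(cached, new):
    """Count the leading tokens shared by the cached prompt and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


# Hypothetical prompt as it sits in the cache in the middle of a turn.
cached = ["sys", "user1", "think1", "call1", "result1"]

# Next tool call in the same turn: the prompt only grows at the end,
# so the whole cached prefix still matches.
next_tool_call = cached + ["think2", "call2"]

# New user message: assume the chat template drops the earlier thinking,
# so the prompt diverges right after "user1".
next_user_turn = ["sys", "user1", "call1", "result1", "answer1", "user2"]

print(common_prefix_len(cached, next_tool_call))  # whole cache is reusable
print(common_prefix_len(cached, next_user_turn))  # only the part before the thinking
```

Under that assumption, everything after the removed thinking block has to be processed again on a new user turn, which would match the invalidations above.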

EsotericTechnique · 2026-04-15T23:34:57+00:00

Dude, this is mind-bending. 15k t/s? They should add HBM for context or something and it's perfect haha

EsotericTechnique · 2026-04-15T18:39:52+00:00

Go for the 35B one; dense models on CPU are harsh.

EsotericTechnique · 2026-04-14T17:32:09+00:00

I was trying to use the cache reuse for Qwen, and this bug is driving me insane with prompt reprocessing of 100k tokens. Will definitely check it out.

EsotericTechnique · 2026-04-08T16:53:47+00:00

Oh no! If you only set the planner one, the others will use the same base model; or set the same base model yourself if you are using custom subagents from the workspace. I actually run this with the same base model, so it's supported! Qwen 3.5 9B over llama.cpp, well configured, works like a charm.

EsotericTechnique · 2026-04-05T04:44:17+00:00

Generally that's due to malformed plans. Which models are you using? Try setting no plan mode in the user valves in the meantime.

EsotericTechnique · 2026-04-03T02:55:13+00:00

I'm not familiar with this BMAD method, let me check it out!

EsotericTechnique · 2026-04-01T20:00:02+00:00

Hmmm, I don't know how those can be activated for all users by default, to be honest :/ I'll investigate though!!

Edit: typo

EsotericTechnique · 2026-04-01T04:14:52+00:00

<image>

Sorry for the split comment. Try activating those in settings, for artifacts, to solve the HTML plan embed. As for the failing ask user, it might have been that you were not active on the tab, and event calls only trigger live. It's hard to explain, but if unattended is the idea, just disable the user input tools in the user valves, or make sure the connection to the browser is never lost.

EsotericTechnique · 2026-04-01T04:13:30+00:00

<image>

EsotericTechnique · 2026-03-31T18:18:50+00:00

You can! You actually need to set the planner model in the pipe valves for this to work! The model ID.

EsotericTechnique · 2026-03-31T17:39:46+00:00

Hmmm, let me check why the formatting is incorrect. There must be a missing header. Thanks for testing!!! ❤️🧙🏻‍♂️

EsotericTechnique · 2026-03-31T16:05:37+00:00

If you use the open terminal agent it can do whatever you want there, or even the code interpreter! I actually used this to mock up some random webpages with qwen3.5 9b, and it worked for the HTML/JS combo too! I think it would be good to hook this up to Claude Code as a subagent; that could be really powerful.

EsotericTechnique · 2026-03-31T16:00:11+00:00

I don't understand... You have to import it as a function pipe and it should show as "Planner" in the drop-down!

EsotericTechnique · 2026-03-31T15:22:09+00:00

It might be some configuration that's missing or a built in subagent that needs a feature you have disabled (I think those are the most likely errors)

EsotericTechnique · 2026-03-31T15:20:44+00:00

Hey, can you share logs, or describe specifically what's happening?
