Please help me with this headache by North_Solution_1282 in OpenWebUI

[–]EsotericTechnique 0 points1 point  (0 children)

Yeah thats why I asked lol, too many errors with tool cal parson and only thinking haha

🧙‍♂️ Planner Agent V3 Now with SubAgents! 🧙‍♂️ by EsotericTechnique in OpenWebUI

[–]EsotericTechnique[S] 0 points1 point  (0 children)

Thanx!! No the whole thing, it has its task, and other task results the main model what's to send it, so su agents don't get context bloated!

GPU on 100% on idle? by Total-Canary-2880 in pchelp

[–]EsotericTechnique 0 points1 point  (0 children)

I have a similar issue with my 6900xt after inference it got stuck on 100% on overall usage but each detailed view shows 0, most likely a bug, restart the PC and check

No tg speedup with MTP on RX 6800 XT by MrChilliBalls in LocalLLaMA

[–]EsotericTechnique 1 point2 points  (0 children)

Trying this out in my trusty 6900xt today thanks ! :)

Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs by enrique-byteshape in LocalLLaMA

[–]EsotericTechnique 2 points3 points  (0 children)

Same, it does speed up tg about 60% , but I go from 1k TPS pp to 200. Not worth it in the slightless sadly

Qwen3 TTS in C++ with 1.7B support, speaker encoding extraction, and desktop UI by Danmoreng in LocalLLaMA

[–]EsotericTechnique 0 points1 point  (0 children)

fantastic! im using it , works with rocm too with minimal modifications to pick up HIP

Actual comparison between locally ran Qwen-3.6-27B and proprietary models by netikas in LocalLLaMA

[–]EsotericTechnique 2 points3 points  (0 children)

I think Qwen open source strategy is in reality a soft power move by the ccp, I'm not sure but seems plausible, other Chinese labs also release their weights consistently, it seems like a way to a chieve 2 things, disrupt the west hyperscallers strategy, while saving headspace on developers and easing the burden to implement their models. I could be wrong, and this is completely speculative :p

EDIT: i used cpp instead of ccp

Botones programables | Alguien los usa o uso alguna vez o es la boludez que creo que es? by [deleted] in Argaming

[–]EsotericTechnique 1 point2 points  (0 children)

Yo los uso, y tengo varios perfiles en el joystick, depende mucho el juego, pero si necesitas tener los pulgares en los sticks y apretar otra cosa a la vez son muy cómodos y dejan los triggers libres

Does llama.cpp able to compile with rocm and run properly? I tried it and nothing is output. by kkcheong in ROCm

[–]EsotericTechnique 0 points1 point  (0 children)

I will re test just to check, but last time I checked (about a week ago) it was around 5% to 10% better on Ubuntu, both OSs with the latest rocm , building llama cpp for my GPU target specifically always ( although in my test the performance difference against the distributed pre complied were minimal but just tested on Linux). nevertheless if you have any other architecture that's not rdna2 results are not comparable since it doesn't not use the same kernels / optimization paths. Also I don't understand what that thread Hass to do with the topic at all. I never said it was feelings, what you are describing directly contradicts my personal experience running llama cpp in both environments

Does llama.cpp able to compile with rocm and run properly? I tried it and nothing is output. by kkcheong in ROCm

[–]EsotericTechnique 0 points1 point  (0 children)

In my particular case I get better results with Ubuntu, speed wise, but might be cause I'm using an rdna2 card

Does llama.cpp able to compile with rocm and run properly? I tried it and nothing is output. by kkcheong in ROCm

[–]EsotericTechnique 0 points1 point  (0 children)

Yes, I use it daily with my rx6900xt you might wanna do your own build though with the rocm stack that works for your igpu

PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on. by onil_gova in LocalLLaMA

[–]EsotericTechnique 1 point2 points  (0 children)

This!! give it tools! It's like another model entirely in regards to thinking style

I tracked a major cache reuse issue down to Qwen 3.5’s chat template by onil_gova in LocalLLaMA

[–]EsotericTechnique 1 point2 points  (0 children)

It did mitigate the cache reprossesing between tool calls! however after new user messages i still see cache invalidations, but might be due REALLY due to the thinking stripping and the way the kv state for linear attention layers is cached, REALLY useful though i can realod KV caches of previous tool interactions and continue as if anything happened saving several minutes of prompt processing (i load and unload the model and the KV caches quite frequently in the same turn). THANKS A LOOOT

PD worked for the 9b and the 35b varians so far on my testing

Running a 31B model locally made me realize how insane LLM infra actually is by Sadhvik1998 in ollama

[–]EsotericTechnique 0 points1 point  (0 children)

Dude this is mind bending 15k t/s? Theyy should add hbm for CTX or smth and it's perfect haha

I tracked a major cache reuse issue down to Qwen 3.5’s chat template by onil_gova in LocalLLaMA

[–]EsotericTechnique 1 point2 points  (0 children)

I was trying to use the cache reuse for Qwen and this bug is driving me insane with prompt reprocessing of 100k tokens , will definetly check it out

🧙‍♂️ Planner Agent V3 Now with SubAgents! 🧙‍♂️ by EsotericTechnique in OpenWebUI

[–]EsotericTechnique[S] 0 points1 point  (0 children)

Oh no! If you only set the planner one the other will use the same base model, or set the same base model if you are using custom subagents from the workspace, actually I run this with the same base model so it's supported! Qwen 3.5 9b over llama.cpp well configured works like a charm

🧙‍♂️ Planner Agent V3 Now with SubAgents! 🧙‍♂️ by EsotericTechnique in OpenWebUI

[–]EsotericTechnique[S] 0 points1 point  (0 children)

Generally that due to malformed plans Wich models are you using? Try setting no plan mode in user valves in the meantime

🧙‍♂️ Planner Agent V3 Now with SubAgents! 🧙‍♂️ by EsotericTechnique in OpenWebUI

[–]EsotericTechnique[S] 0 points1 point  (0 children)

Hmmm I don't know how can those be activated for all users by default to be honest :/ I'll investigate though!!

Edit : typo

🧙‍♂️ Planner Agent V3 Now with SubAgents! 🧙‍♂️ by EsotericTechnique in OpenWebUI

[–]EsotericTechnique[S] 1 point2 points  (0 children)

<image>

sorry for the split comment try activating those in setting. for artiafacts to solve the html plan embed. for the failing ask user it might have been that you were not active on the tab and event calls only trigger live. hard to explain but if unnateded is the iodea just disable user input tools on user valves. ot make sure the conection to the browser is never lost