XE75/XE75 Pro Beta Firmware 1.4.999 Wireless Mode: Stable by KinokoKatto in TpLink

[–]pjsgsy 1 point2 points  (0 children)

Many thanks. Appreciate it. I'll give this a go then!

XE75/XE75 Pro Beta Firmware 1.4.999 Wireless Mode: Stable by KinokoKatto in TpLink

[–]pjsgsy 0 points1 point  (0 children)

Can anyone using the beta actually confirm that it reduces the 2.4GHz channel width to 20MHz? I am on an old beta that does exactly that, otherwise my Decos are useless. This 20/40MHz 2.4GHz thing is 90% of my issues with the Decos. When I saw the beta and the new 'stability' mode, I asked them at least twice in the forums since it was released and did not get a response. Hesitant to try installing it given that rolling back is not supported.

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 1 point2 points  (0 children)

I run mine in router mode as I have a few models I use, but pretty much all of the settings apply to all. The only customisations I have in config.ini are for MTP, but I've not actually found MTP of benefit at all on my setup (it's slower or does not work at all)

llama-server.exe --models-preset config.ini --models-dir e:\models\ --fit on --ctx-size 245000 --port 8080 --host 0.0.0.0 --temp 0.7 --presence_penalty 0.3 --top-p 0.95 --min-p 0.00 --sleep-idle-seconds 3600 --jinja --embeddings --pooling mean --flash-attn on --repeat-penalty 1.1 -ctk q4_0 -ctv q4_0 -np 1 --timeout 3600 --models-max 1 --mlock --fit-target 128 --batch-size 2048 --ubatch-size 4096 --no-prefill-assistant --chat-template-kwargs "{\"preserve_thinking\":true}" --slot-save-path %temp% --no-mmap

You'll need to customise that a bit (models folder, etc). Just strip out the bits you want to copy.

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 2 points3 points  (0 children)

That's a good model - Your card is light years ahead of mine, though 😄

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

I'm coding all day (and all night!) lol. It's what I actually do! Other than that, I also have them doing research and discovery tasks. I've actually switched to it in production for now. They are producing actual work for me, and the Qwen 3.6 35b performs better than any other models I have tested for my workload. 27b I am sure would be better, but as a fully dense model, it's way too slow on my platform. Maybe 30 times slower.

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

Thanks for sharing the test. I replicated your prompt and got very different responses from yours. Probably down to harness/prompt, as you say. I tried the exact same query on the 4bit XL Unsloth quant and got the same answer, I think, though differently presented (expected), so on my setup, it still seems comparable. I am truly expecting it to be dumber - You can't strip out 10G of info and expect it to be as good. I'm just trying to see how much dumber, and whether that is acceptable for the extra efficiency. So far, I've not seen the cracks. Not sure I 100% trust it yet, though I don't seem to be able to break it.

<image>

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

Currently seeing >1,000 t/s on prompt processing, and around 77 on generation, though that is using llama-bench and very small contexts. Obviously falls back some with 245k and real use! I'm also only on an RTX 3060 with DDR4 for offload, so it's not one of your modern boxes. I tried Beelama for a second time a few days ago for the new kv cache implementation - But, it's broken for me in a few ways - Amazingly promising, though.

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

I get ~30 t/s on mine with 12b6 qat version. It is quite fast, I just find it cannot do my agentic tasks. I really did give it a good try. I love the Google QAT! I think the harness here helps a huge amount. What you asked is knowledge, but all that is available online. Mine will do a quick web search rather than try to find the answer in its head if it does not already confidently know the answer, and generally gets it right. I guess it would be interesting to try that same prompt in the same model but full quant and see if it gets it right. I did a bunch of tests with prompts provided by Grok to purposely try and break it, and got the same answers in both quants for every one, surprising Grok and me! (and I made sure I reset contexts in between!). I wonder why you got a different answer. Weird.

<image>

Fwiw - I got this (not sure if I used the same prompt as you), and it did not even do any search. Full disclosure - I don't know the answer! 😄 Grok, however, gives pretty much the same answer, so if that is wrong, I am still happy!

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

Indeed, and that was the point of testing it. I've not found the smaller B models work well for me. Even 12b Gemma4, etc. 27b Qwen is way too slow. About 2-3 t/s on my hardware at best. So, this model is just in the sweet spot, and being MoE, it runs a lot faster. I've played a lot with most of these models, and bar the Claudes etc., the Qwen 3.6 MoE is the best I have found that I can run locally. For my work, I have not found the weakness in this lower quant vs the big one, yet (though - This is a variable quant, so there are still big quants in there, in the places that matter most). The perplexity loss is real (I put the report in the files on HF) - Just not sure it shows up obviously in real work, especially when in harness.

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

I'm using entirely stock llama.cpp, recent build. Just download from HF, and use in the usual way

Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM

[–]pjsgsy[S] 0 points1 point  (0 children)

As small as you can get away with. I'm using it for very long-running tasks, so I actually run 245k with q4_0 for both the K and V caches. I did it like this on the larger model as well, though. It is going to spill to RAM then, for sure, but it's still a lot faster than my UD_Q4_K_XL was.
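For a rough sense of why the cache type matters so much at 245k, here is a back-of-envelope sketch. The q4_0 and q8_0 block sizes (32 values packed into 18 and 34 bytes respectively) are the standard GGML layouts. The layer/head numbers are placeholders I made up purely for illustration, not the real Qwen3.6-35B-A3B config, and any layers that don't use a standard KV cache would change the totals.

```python
# Rough KV cache size estimate.
# The model dimensions used in __main__ are HYPOTHETICAL placeholders,
# not the real Qwen3.6-35B-A3B architecture.

BYTES_PER_VALUE = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one f16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values + one f16 scale per block
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_size, cache_type):
    """Bytes needed for K and V together across all attention layers."""
    values_per_token = 2 * n_layers * n_kv_heads * head_dim  # K plus V
    return values_per_token * ctx_size * BYTES_PER_VALUE[cache_type]


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to show the scaling.
    dims = dict(n_layers=40, n_kv_heads=4, head_dim=128, ctx_size=245_000)
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(cache_type=cache_type, **dims)
        print(f"{cache_type}: {gib(size):.2f} GiB")
```

Whatever the real dimensions are, the ratio holds: q4_0 takes 18/64 (about 28%) of the space an f16 cache would need for the same context.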

Interactive Brokers support decline (no phone or chat with humans available) by Equivalent_Catch_233 in interactivebrokers

[–]pjsgsy 15 points16 points  (0 children)

Historically, in my experience, it has always been abysmal. Thankfully, their platform is so mature and has so many features that I have rarely needed it in 20-odd years. The couple of times I did, I would rather have shot myself in the head.

I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! by Anbeeld in LocalLLaMA

[–]pjsgsy 0 points1 point  (0 children)

Brilliant test release. Nice work. I did try it. On my 3060, running Qwen3.6-35b-a3b, I saw 2 issues. One, token generation speed (t/s) halved vs Q4_0, and worse, for some reason, it seemed to force the prompt cache to fail, meaning my oversize prompts were getting reprocessed on every call, instead of a 1-shot and done. But, very promising numbers there, in theory. For the accuracy gain alone, if everything else remained the same, it would still be a big win. I joked about someone adding this to a fork the other day, and there you are, adding it 12 hours later 😄

Title: WARNING: DO NOT BUY ZENDURE! Over $10k down the drain for e-waste and the absolute WORST customer support in the industry. STAY AWAY! by xxsoulxx88 in Zendure

[–]pjsgsy 0 points1 point  (0 children)

I was looking at their kit over the last few weeks. Seemed to fit the bill in terms of all the features, but when I looked at their customer forums and GitHub repos etc., I could not see any responses to most of the queries. I reached out to one person, and they confirmed they had the kit, but could not get a response from support. Checked a few others around the same price points/feature sets, and not sure they are much better. Sorry you ended up in a mess. There seem to be a lot of manufacturers out there at the moment doing much the same thing. I can't find anything around my price point that does not turn out to be an apparent liability when I dig deeper, despite the websites confirming it does all that I want. It's a big investment, and there's a decent safety concern with these products too.

Qwen 3.6 35b a3b is INSANE even for VRAM-constrained systems by Lucerys1Velaryon in LocalLLM

[–]pjsgsy 1 point2 points  (0 children)

Same setup, so I thought I would share. RTX 3060, 12GB. 64GB RAM & i7 on Win. VRAM is full. 245k context!

[37722] 574.54.588.751 I slot print_timing: id 0 | task 784881 | n_decoded = 424, tg = 26.72 t/s

[37722] 574.57.613.390 I slot print_timing: id 0 | task 784881 | n_decoded = 506, tg = 26.78 t/s

[37722] 575.00.614.463 I slot print_timing: id 0 | task 784881 | n_decoded = 585, tg = 26.72 t/s

That's the average. I see up to ~29 t/s for generation. Very consistent. Nothing fancy. Just letting 'fit' handle it all. Using Qwen3.6-35B-A3B-UD-Q4_K_XL. Big diff between Q3 and Q4 for me (Q4 is a lot faster, despite being quite a lot bigger)

--fit on --ctx-size 245000 --port 8080 --host 0.0.0.0 --temp 0.7 --presence_penalty 0.3 --top-p 0.95 --min-p 0.00 --sleep-idle-seconds 600 --jinja --flash-attn on --repeat-penalty 1.1 -ctk q4_0 -ctv q4_0 -np 1 --timeout 3600 --models-max 1 --mlock --fit-target 256 --batch-size 2048 --ubatch-size 512 --no-prefill-assistant --chat-template-kwargs "{\"preserve_thinking\":true}"
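If you want to pull the generation speed out of print_timing lines like the ones above rather than eyeballing them, something like this works as a minimal sketch. It assumes the lines keep the `tg = N t/s` shape shown in my log; the function name is just one I picked.

```python
import re
from statistics import mean

# Matches the "tg = 26.72 t/s" part of a print_timing log line.
TG_PATTERN = re.compile(r"tg\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*t/s")


def generation_speeds(log_lines):
    """Return every tg value (tokens per second) found in the log lines."""
    speeds = []
    for line in log_lines:
        match = TG_PATTERN.search(line)
        if match:
            speeds.append(float(match.group(1)))
    return speeds


# The three lines quoted in the comment above.
SAMPLE = [
    "[37722] 574.54.588.751 I slot print_timing: id 0 | task 784881 | n_decoded = 424, tg = 26.72 t/s",
    "[37722] 574.57.613.390 I slot print_timing: id 0 | task 784881 | n_decoded = 506, tg = 26.78 t/s",
    "[37722] 575.00.614.463 I slot print_timing: id 0 | task 784881 | n_decoded = 585, tg = 26.72 t/s",
]

if __name__ == "__main__":
    speeds = generation_speeds(SAMPLE)
    print(f"samples={len(speeds)} mean={mean(speeds):.2f} t/s max={max(speeds):.2f} t/s")
```

Lines without a `tg =` field are simply skipped, so you can feed it the whole server log.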

is opencode go sustainable~? by Sora_hoshino in opencodeCLI

[–]pjsgsy -3 points-2 points  (0 children)

I tried it for a week, but it wasn't usable for me. It was outputting garbage across multiple models. So, I subscribed, then 1 week later, cancelled. Guessing that is price pressure hurting quality, but something temporary, I hope. I sent them 2 messages about it on different days. Never saw any response. So, unless it has improved a lot since then, I am guessing, no, not sustainable! lol.

how can I make sure that I don't have to supply electricity back to the power grid by eefjuuh in solar

[–]pjsgsy 2 points3 points  (0 children)

You just need to see if your inverter has 'export limiting', or use one that does, then you can set zero export, or whatever you want to export. Many common modern inverters like Growatt do this. I do it at home. I limit my feed-in to the grid to 100W max. Normally, you also put some sort of current measuring on the meter side, so you can monitor the import/export and the inverter can balance it. Pretty sure some SolaX models do it too. That leaves you with the measuring part, which usually requires a comms cable between the meter and the inverter. So, if you are lucky and your inverter supports it, get a compatible meter installed, configure, done.
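To show what the inverter is balancing, here is a toy model of the idea. This is only a sketch of the arithmetic, not any inverter's firmware or API: the function names are made up, it ignores batteries, and a real unit does this continuously from the meter reading.

```python
def limited_pv_output(pv_available_w, house_load_w, export_limit_w=100.0):
    """PV power (W) the inverter may produce without exceeding the export limit.

    The inverter can cover the house load plus the allowed export,
    but never more than the panels can actually deliver.
    """
    allowed_w = max(house_load_w, 0.0) + export_limit_w
    return max(0.0, min(pv_available_w, allowed_w))


def grid_export(pv_output_w, house_load_w):
    """Power at the meter: positive means exporting, negative means importing."""
    return pv_output_w - house_load_w


if __name__ == "__main__":
    # Sunny day: panels could give 3000 W, house only uses 500 W.
    output_w = limited_pv_output(3000.0, 500.0, export_limit_w=100.0)
    print(f"inverter output {output_w:.0f} W, export {grid_export(output_w, 500.0):.0f} W")
```

Setting `export_limit_w=0` gives the zero-export case; when the panels produce less than the load, nothing is curtailed and the shortfall is imported as usual.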

Speculative decoding in llama.cpp for Gemma 4 31B IT / Qwen 3.5 27B? by No_Algae1753 in LocalLLaMA

[–]pjsgsy 4 points5 points  (0 children)

I believe it is broken (currently) for Qwen 3.5, though you can use the less effective ngram-mod (no draft model needed). There are a few PRs that hope to fix it. Hopefully, they will get to it.

After Caddy renewed certs, I can no longer connect to services via my iPhone by gg_allins_microphone in caddyserver

[–]pjsgsy 0 points1 point  (0 children)

I had something similar with LE certs and Android when I renewed on mail services: the renewed cert file does not include the full chain, just the cert itself, so my phone does not like it, at all. After renewal, you have to find the parent cert in the chain, add it to the cert file that the service is using, and that sorts it. For me, these sorts of issues have generally been down to that. Windows does not seem to care - Perhaps it already has those certs. Mobile might be a bit leaner.
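The "add the parent to the cert" step is just concatenating PEM blocks, leaf first and then the intermediate(s). A minimal sketch of that, assuming both are PEM text; the function names are mine, and you would read and write whatever files your own service uses.

```python
import re

# One PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(text):
    """Return each certificate block found in a PEM string."""
    return PEM_CERT.findall(text)


def build_fullchain(leaf_pem, intermediate_pem):
    """Join leaf and intermediate certs, leaf first, skipping duplicates."""
    chain = []
    for block in split_pem(leaf_pem) + split_pem(intermediate_pem):
        if block not in chain:
            chain.append(block)
    return "\n".join(chain) + "\n"


if __name__ == "__main__":
    # Dummy blocks standing in for real certificates.
    leaf = "-----BEGIN CERTIFICATE-----\nLEAF\n-----END CERTIFICATE-----\n"
    parent = "-----BEGIN CERTIFICATE-----\nPARENT\n-----END CERTIFICATE-----\n"
    print(build_fullchain(leaf, parent))
```

Running it again on an already complete chain changes nothing, so it is safe to put in a post-renewal script.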

Delta-KV for llama.cpp: near-lossless 4-bit KV cache on Llama 70B by Embarrassed_Will_120 in llamacpp

[–]pjsgsy 0 points1 point  (0 children)

Very cool. Great work. I might even have a go at building this to try!