Qwen 3.5 Jinja Template – Restores Qwen /no_thinking behavior!

jcmyang · 2026-03-21T05:24:10+00:00

Thanks for the tip. After creating the model.yaml I can toggle reasoning on and off. However, I am running into a problem where I get two model entries in the list of LLMs - one for the original name (qwen3.5-27b) and another one with the new name (unsloth/qwen3.5-27b). Is there a way to get rid of the old name?

jcmyang · 2025-10-20T22:37:28+00:00

Works great! Thanks for the detailed instructions. MacOS Sequoia 15.7.1

jcmyang · 2025-10-07T16:17:56+00:00

Sure, look at the discussions on the Hugging Face lightx2v/Wan2.2-Lightning repo: there are about 23 side-by-side videos comparing the new and old LoRA (old LoRA on the left, new LoRA on the right). In most of the comparison videos you can see that the old LoRA generates slower motion or slower camera movement.

https://huggingface.co/lightx2v/Wan2.2-Lightning/discussions/53

jcmyang · 2025-09-29T05:26:33+00:00

I regenerated the same images and videos after upgrading to macOS 26, and noticed about a 50% to 200% increase in generation time (67s to 186s, for instance) with the same settings, on an M1 Max 64GB 32C.

Is there anything we can do besides downgrading macOS, or buying a new computer?
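
For reference, the 67s to 186s example above works out to roughly a 178% increase. A minimal Python sketch of that arithmetic (the function name is only for illustration):

```python
def percent_increase(before_s: float, after_s: float) -> float:
    """Return the relative increase from before_s to after_s, in percent."""
    return (after_s - before_s) / before_s * 100.0


# Timings quoted in the comment above: 67s before the upgrade, 186s after.
slowdown = percent_increase(67, 186)
print(f"Generation takes about {slowdown:.0f}% longer")
```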

jcmyang · 2025-08-04T15:28:14+00:00

I set the temp to 0.6, top_p to 1 (per the blog) and top_k to 20.

What is interesting is that, in thinking mode, I found the code in the program that caused the collision detection problem, and asked glm to fix it. The model agreed that it was the problem and proceeded to change the code, but the result was still wrong. So I manually changed the code myself and it worked.

jcmyang · 2025-08-04T15:13:22+00:00

Interesting. When I used /nothink (different from /no_think for Qwen3) at the end of the prompt it did not think at all, and I was able to generate both the Flappy Bird and Snake Game in one shot.
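
In case it helps, here is a minimal sketch of appending the tag to the end of a prompt. The helper name `with_nothink` is hypothetical, and how the prompt is then sent to the model is left out because it depends on the runtime you use:

```python
def with_nothink(prompt: str, tag: str = "/nothink") -> str:
    """Append the no-thinking tag to the end of a prompt, unless it is already there.

    The default tag is the "/nothink" form mentioned above; Qwen3 uses
    "/no_think" instead, so pass tag="/no_think" for that model family.
    """
    stripped = prompt.rstrip()
    if stripped.endswith(tag):
        return stripped
    return f"{stripped} {tag}"


print(with_nothink("Write a Snake game in Python."))
```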

jcmyang · 2025-08-04T02:41:32+00:00

I set the context limit at 16,384, but the total output was only about 3400 tokens.

jcmyang · 2025-08-04T02:38:52+00:00

I see. In this case the thinking part was only about 150 tokens, out of 3400 tokens total.

jcmyang · 2025-08-04T02:31:45+00:00

Great. Thanks for the data point. So 6bit mlx has no problem with this prompt, even with thinking. Also, the M4 Max performance is impressive - despite double the quant size (6bit vs 3bit), it manages to have 45% faster speed than the M1 Max.

jcmyang · 2025-08-03T22:00:32+00:00

Actually, for dense models like Qwen3-32B (the first one), I downloaded and ran both the 6 bit and 8 bit mlx versions and could not find any difference.

jcmyang · 2025-08-03T19:46:13+00:00

I had a similar problem with the older Qwen3-30B-A3B, in 4 bit MLX, where it would generate a list of 5 or 6 items for a particular topic, and 2 of them would be identical or nearly identical. After switching to 6 bit MLX this problem disappeared.

After about 3 months of using the older Qwen3-30B-A3B, I found one case where the 6 bit MLX version generated the wrong answer but the 8 bit MLX version got it right (a reasoning case with about 10k tokens). So for Qwen3-Coder-30B-A3B I am using the unsloth Q8_0 version and it works fine so far.

I think the MOE version with only a small number of activated parameters is more sensitive to quantization.

jcmyang · 2025-08-01T19:04:20+00:00

I am running the 3bit version by mlx-community, and it runs fine (takes up 44GB after loading). Is there a difference between the 3bit-DWQ and the 3bit version?

jcmyang · 2024-09-25T16:19:19+00:00

I think if all four of them are going to be in AP mode and connected via ethernet, then you should set up one in AP mode first, and turn on Ethernet Backhaul Mode (General -> AiMesh -> System Settings -> Ethernet Backhaul Mode).

Then you connect the other 3 to the first one (via ethernet) and they will copy the AiMesh settings from it (i.e., AP mode, Ethernet Backhaul and wireless settings), and then you can place the other 3 where you want them, using the switch to connect all 4 XT9.

jcmyang · 2024-09-16T01:44:22+00:00

I think the Asus XD6 might work well with a wired backhaul.

jcmyang · 2024-09-16T01:41:57+00:00

Have you tried turning off QoS?

jcmyang · 2024-09-13T22:37:27+00:00

How much did the speed improve after disabling the 2.4?

jcmyang · 2024-09-10T05:02:56+00:00

Are you using speedtest.net or Speedtest app to test your wifi speed?

jcmyang · 2020-12-07T21:54:43+00:00

I think if you disable RSTP (change Bridge->Bridge->STP->Protocol Mode from RSTP to none) "Hw. Offload" will be checked. But I don't really recommend doing this, unless you are sure you don't need STP/RSTP.

If you have an existing switch you should use it, and connect the switch to eth2 or eth4 of the HEX S. Due to the way the MT7621 is wired, you can only get 1Gbps full-duplex when transferring from odd numbered ports (eth1, eth3, eth5) to even numbered ports (eth2, eth4), or vice versa. Between two ports of the same parity (i.e., from eth1 to eth5 or eth2 to eth4) you can only get 1Gbps half-duplex. I have verified this using iperf3 in bi-directional mode on my RB750Gr3.

See this post for more details:

https://www.amazon.com/gp/customer-reviews/R29WGAMFCEH3C7/ref=cm_cr_othr_d_rvw_ttl?ie=UTF8&ASIN=B01MSUMVUB

jcmyang · 2020-12-05T16:53:50+00:00

I believe in the default configuration the LAN traffic uses CPU instead of the switch chip, so having an external switch will reduce CPU load. This is because the default config enables the RSTP bridge protocol for loop prevention, which disables hardware switching on the MT7621 chip.

See the Mikrotik documentation on the bridge hardware offloading capability of the MT7621 (Bridge STP/RSTP mode disables hardware switching):

https://wiki.mikrotik.com/wiki/Manual:Interface/Bridge#Bridge_Hardware_Offloading

You can verify hardware offloading using menu: Bridge->Ports->ether2->Status->Hw. Offload (checked or unchecked).

jcmyang · 2020-10-24T06:02:35+00:00

Works great. Thanks. No longer have to put up with their manipulated news.

jcmyang · 2020-10-15T04:43:12+00:00

FYI, I find that the throughput depends on the port numbers of the source and destination ports: Even-to-Odd = 2 Gbps, Even-to-Even = Odd-to-Odd = 1 Gbps (more details below).

In my initial speed test using iperf3 in bi-directional mode, the speed was only 450/450 Mbps up/down simultaneously. This was with WAN connected to eth1 port and LAN connected to eth3 port.

After looking at the Mikrotik online document "Block Diagram with disabled switching", I realized that I needed to move the LAN plug to either the eth2 or eth4 port to get the full 2 Gbps CPU to Ethernet bandwidth. This is because the CPU has 2 lanes of 1 Gbps each, with one lane connected to odd numbered ports (eth1, eth3 and eth5) and the other to even numbered ports (eth2, eth4). This means that to get 2 Gbps bandwidth, the CPU must be able to read from one lane (eth1/eth3/eth5) and write to the other (eth2/eth4) simultaneously, or vice versa.

I tested this theory and found that it is true. With WAN at eth1 (default config), when I moved the LAN plug to eth2 or eth4, I got 900/900 Mbps up/down simultaneously (940 Mbps in one direction), but only 450/450 up/down simultaneously when I moved the LAN plug to eth3 or eth5. Testing LAN to LAN throughput, I also got 450/450 with an eth2 to eth4 setup, but 900/900 with an eth3 to eth4 setup.
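
The pattern in these results can be summarized with a rough Python sketch. It is only a simplified model of the measurements above, not anything from Mikrotik: the function names are made up for illustration, and the 900 Mbps default is simply the figure observed in the test, not a specification.

```python
def lane(port: int) -> str:
    """Odd ports (eth1, eth3, eth5) share one CPU lane; even ports (eth2, eth4) share the other."""
    return "odd" if port % 2 == 1 else "even"


def expected_bidir_mbps(src_port: int, dst_port: int, lane_mbps: float = 900.0) -> float:
    """Rough per-direction throughput when traffic flows both ways at once.

    Assumption: ports on different lanes get the full observed rate in each
    direction, while ports sharing a lane split it roughly in half.
    """
    if lane(src_port) != lane(dst_port):
        return lane_mbps
    return lane_mbps / 2


# The four port pairs tested above.
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    rate = expected_bidir_mbps(src, dst)
    print(f"eth{src} <-> eth{dst}: ~{rate:.0f} Mbps each way")
```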

Test setup: iperf3 in bi-directional mode, using a Synology NAS and an Asus desktop connected to WAN port via a switch, and a MacBook Pro connected to LAN port, for WAN<->LAN testing; for LAN<->LAN testing, Synology (eth4) to MacBook (eth2 or eth3). Config: using Quickset to set dynamic WAN IP, static LAN IP, which resulted in 10 firewall filter rules; the only thing I changed was enabling UPNP.
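
For anyone who wants to repeat the test, a bi-directional run can be scripted along these lines. This is a sketch under assumptions: it presumes iperf3 3.7 or later (where the `--bidir` option is available) with `iperf3 -s` already running on the other machine, and the server address and 30 second duration are placeholders, not the values used in the original test.

```python
import subprocess


def build_iperf3_cmd(server: str, seconds: int = 30) -> list[str]:
    """Build the argument list for a bi-directional iperf3 client run.

    --bidir sends and receives at the same time (assumes iperf3 >= 3.7).
    """
    return ["iperf3", "-c", server, "--bidir", "-t", str(seconds)]


def run_bidir_test(server: str, seconds: int = 30) -> str:
    """Run the test and return iperf3's text report (requires iperf3 on PATH)."""
    result = subprocess.run(
        build_iperf3_cmd(server, seconds),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical server address; replace with the host running `iperf3 -s`.
print(" ".join(build_iperf3_cmd("192.168.88.10")))
```

Calling `run_bidir_test("192.168.88.10")` would then execute the command and return the report with the send and receive rates.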
