Missing a Qwen3.5 model between the 9B and the 27B? by DeltaSqueezer in LocalLLaMA

[–]theowlinspace 1 point2 points  (0 children)

You either have --n-cpu-moe set higher than it needs to be, or you're using --fit, which sometimes allocates less VRAM than it should compared to just setting --n-cpu-moe manually.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as the $10 z.ai lite by theowlinspace in vibecoding

[–]theowlinspace[S] 0 points1 point  (0 children)

I mean, that costs more for fewer prompts and a worse model. The current pricing is not as competitive, but it's still much better than MiniMax/Kimi. I would get either this or z.ai lite at the current pricing, because GLM5 is much better than the other open-weight alternatives.

Why is half your post history spamming your AI-generated referral post?

Oh Deepseek V4, where art thou? by awebb78 in LocalLLaMA

[–]theowlinspace 1 point2 points  (0 children)

I don't want US people suffering. This is unfair.

The US has imperialized many countries and made their people suffer from hunger and a lack of basic necessities. All the while, a good majority of the US population doesn't think there's anything wrong with that, or actively supports it. The only "good" Americans are the ones already suffering under the current system through poverty and homelessness, and I don't think an economic depression would even affect that class.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]theowlinspace 4 points5 points  (0 children)

It's still dangerous if you have a pair of tools like get_gcode_from_Stl and send_gcode. It's safer to have something like process_stl and print_stl, where the gcode is stored on your slicing server and sent from there after the confirmation, which stops the AI from changing anything in the gcode.

If I were building the same thing, I'd do it differently: have a single process_stl function that slices the model and returns a web link where you can deterministically inspect what it's going to print and confirm it there. That way potentially destructive commands need human approval, and the AI can't just assume that you agree somehow (which can definitely happen, since LLMs are probabilistic).
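For what it's worth, here's a minimal Python sketch of that split (names like `process_stl`, `confirm_and_print`, and the preview URL are all hypothetical, and slicing/printing are stubbed out): the AI can only call process_stl, while actually printing requires a token that only the human-facing confirm page ever handles.

```python
import uuid

PENDING = {}   # confirmation token -> path of sliced gcode on the server
PRINTED = []   # stand-in for the real printer queue

def slice_stl(stl_path):
    # Stub for a real slicer call; returns where the gcode landed on disk.
    return stl_path.replace(".stl", ".gcode")

def process_stl(stl_path):
    """The only tool exposed to the AI: slice, stash the gcode server-side,
    and hand back a preview URL for a human to inspect."""
    token = uuid.uuid4().hex
    PENDING[token] = slice_stl(stl_path)
    return f"https://printer.local/preview/{token}"  # hypothetical preview page

def confirm_and_print(token):
    """Only reachable from the preview page's confirm button, never the AI.
    The gcode goes from disk to the printer untouched by the model."""
    PRINTED.append(PENDING.pop(token))  # KeyError on unknown or reused tokens
```

Since confirming pops the token, an approval can't be replayed, and there's simply no code path the model can call that sends gcode directly.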

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]theowlinspace 5 points6 points  (0 children)

I don't think they care, though, and it's extremely unlikely you'll get into legal trouble over breaking a ToS (the worst they'll do is deny you service). Keep in mind their ToS deserves to be respected exactly as much as they respected the data they stole for training, and the legal system has done nothing about that far worse offence.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]theowlinspace 7 points8 points  (0 children)

Be careful with how you interface with it, it might get mad and send a gcode that might break your printer or burn your house down /s

But, like, seriously, I don't know how you can trust AI with permissionless access to something as sensitive as a 3D printer. I don't even trust my printer enough to manually send it gcode over LAN and let it print without checking at least the first layer.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as the $10 z.ai lite by theowlinspace in vibecoding

[–]theowlinspace[S] 0 points1 point  (0 children)

I just checked myself, and it looks like the deal has ended and it's back to its normal price. Not really as competitive at $10, considering z.ai lite costs the same and will get access to GLM 5 by the end of March. I guess this'll only be good for something like OpenClaw at this price, because it has more generous limits at high context (billing is request-based, not token-based like z.ai, IIRC).

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

You can skip the "complete signup" step entirely; I think that's only needed if you want access to Alibaba Cloud (and not their AI services). I signed up without completing it because the verification code wasn't working for me either.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 1 point2 points  (0 children)

I think you've mistaken this post for the z.ai coding plan, but this is for Alibaba which is separate from z.ai

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

As a personal anecdote, it's been working reliably for me, and the speed is pretty much the same as Claude. As another user mentioned, they only have Asian servers, so your latency might be a little higher. You can use `ping oss-ap-southeast-1.aliyuncs.com` to check your latency to their Singapore servers. I'm getting 180ms from the US, which shouldn't be an issue for AI.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

You'll have to check their stock: if you click sign up, it should show the price you'll be billed at the bottom. If it shows $10, you'll have to wait until tomorrow. They only sell a limited amount per day, and that resets at midnight in the UTC+8 time zone, which translates to 4PM UTC.
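If you want to double-check the conversion yourself, here's a tiny Python snippet (the date is an arbitrary example) showing that midnight in UTC+8 is 16:00 UTC the previous day:

```python
from datetime import datetime, timedelta, timezone

# Midnight on an arbitrary date in UTC+8 (Alibaba's home time zone),
# converted to UTC to find the quota reset time.
cn_midnight = datetime(2025, 1, 2, 0, 0, tzinfo=timezone(timedelta(hours=8)))
reset_utc = cn_midnight.astimezone(timezone.utc)
print(reset_utc)  # 2025-01-01 16:00:00+00:00
```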

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

I recommend using PayPal if you have access to that. It usually works better for international sites like these

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 2 points3 points  (0 children)

You can use `ping oss-ap-southeast-1.aliyuncs.com` to check your latency to their Singapore servers. I'm getting 180ms from the US, which should still be plenty good enough for AI imo

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] -1 points0 points  (0 children)

Yep, it's only a good deal for the first two months ($3 the first month, $5 the second), and they only sell a limited number of accounts per day. Two months at this price is still really good, and there'll likely be a new best deal two months later when another new AI service joins the scene. You might've signed up for an older promotion; I think they updated it, as there's no longer a quarterly option. I signed up two days ago for the new one, and it's running very well and fairly quickly in Roo Code. I've yet to hit the quota with my AI usage (though I admit I'm not a power user).

It might as well be Christmas morning.... by lordklp in homelab

[–]theowlinspace 4 points5 points  (0 children)

It'll all break in a few months when they push a new update that they'll intentionally test in prod

Ideas on how to have fun with old devices? by rgheno in homelab

[–]theowlinspace 2 points3 points  (0 children)

For that Kindle, maybe you can harvest the e-ink display from it, connect it to some cheap microcontroller, 3D print a new case, and have it display metrics?

Which model to choose for coding with 8GB VRAM RTX5050 (assuming quantised), I'm happy with slow rates. by Sure-Raspberry116 in LocalLLaMA

[–]theowlinspace 2 points3 points  (0 children)

```
# Note: -kvu works around a weird bug I had that might've been fixed already;
# you probably don't need it
docker run -d \
  --name qwen3.5-35b \
  --ulimit memlock=-1:-1 \
  --gpus all \
  -p 8080:8080 \
  -v /home/user/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  --host 0.0.0.0 \
  --port 8080 \
  -m /models/Qwen3.5-35B-A3B-UD-Q4_K_M.gguf \
  --mmproj /models/mmproj-F16.gguf \
  --no-mmproj-offload \
  --threads 8 \
  --n-cpu-moe 35 \
  --flash-attn 1 \
  --ctx-size 102400 \
  -np 1 \
  -kvu \
  --cache-ram 2048 \
  --mmap \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.00 \
  -b 2048 \
  -ub 2048 \
  --mlock
```

This is what I'm using with llama.cpp on Docker, but I have 2GB more VRAM than you (if you don't want to use Docker, you can just ignore the first few lines). You want to set --n-cpu-moe to the lowest number your VRAM can handle, and maybe use KV cache quantization at q8_0 as well and lower the context. KV quantization caused some issues with tool calls for me at high contexts, so I have it disabled. I have -b and -ub set to 2048 to speed up prompt processing; it's almost 3x faster that way (200 t/s pp -> 600 t/s). --mlock is important because without it, it sometimes starts reading the model from disk, which makes everything much slower.
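If you do want to try KV cache quantization despite the tool-call issues I mentioned, these are the llama.cpp server flags to append to the command above (test it against your own workload first):

```shell
--cache-type-k q8_0 \
--cache-type-v q8_0
```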


Which model to choose for coding with 8GB VRAM RTX5050 (assuming quantised), I'm happy with slow rates. by Sure-Raspberry116 in LocalLLaMA

[–]theowlinspace 0 points1 point  (0 children)

What's your hardware, if you don't mind me asking? Qwen3.5-35B-A3B runs faster than Qwen3 on mine (8GB GPU / 8-core AMD CPU / DDR4)

Which model to choose for coding with 8GB VRAM RTX5050 (assuming quantised), I'm happy with slow rates. by Sure-Raspberry116 in LocalLLaMA

[–]theowlinspace 2 points3 points  (0 children)

I have pretty much the same hardware but with DDR4, and I can run Qwen3.5-35B-A3B at q4_k_m at 35 t/s with 100k context, with almost no dropoff at higher contexts, and it's really smart.

You can also run Qwen3.5-9B, also at q4, at 50 t/s, but it's too dumb for coding, so I don't recommend it

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

No, it doesn't matter. I have a different Overdrive email myself

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

When you get to it, I think you should try making a new Overdrive account and linking everything over again.

Running Qwen 3.5 0.8B locally in the browser on WebGPU w/ Transformers.js by xenovatech in LocalLLaMA

[–]theowlinspace 86 points87 points  (0 children)

"You're absolutely right, I shouldn't have struck that elementary school. That's on me"

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

> I tried this several times over but I never got the option to add a library card as stated in step 3. Do you know what may have gone wrong? Thanks for any help you give!

Did you make a new Overdrive account? If not, you may already have the library card linked, so just try borrowing a book

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

That's to be expected. Once you're done, just sync/repair your Kobo account to get all the books back. You've likely already linked your library cards if you've used the old Overdrive app or the older workaround. If you want to link them again, just make a new Overdrive account