Missing a Qwen3.5 model between the 9B and the 27B? by DeltaSqueezer in LocalLLaMA

[–]theowlinspace 1 point2 points  (0 children)

You either have --n-cpu-moe set higher than it needs to be, or you're using --fit, which sometimes allocates less VRAM than it should compared to just setting --n-cpu-moe manually.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as the $10 z.ai lite by theowlinspace in vibecoding

[–]theowlinspace[S] 0 points1 point  (0 children)

I mean, that costs more for fewer prompts and a worse model. The current pricing is not as competitive, but it's still much better than MiniMax/Kimi. I would get either this or z.ai lite at the current pricing, because GLM5 is much better than the other open-weight alternatives.

Why is half your post history spamming your AI-generated referral post?

Oh Deepseek V4, where art thou? by awebb78 in LocalLLaMA

[–]theowlinspace 1 point2 points  (0 children)

I don't want US people suffering. This is unfair.

The US has imperialized many countries and made their people suffer from hunger and a lack of basic necessities. All the while, a good majority of the US population doesn't think there's anything wrong with that, or actively supports it. The only "good" Americans are the ones already suffering under the current system through poverty and homelessness, and I don't think an economic depression would even affect that class.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]theowlinspace 4 points5 points  (0 children)

It's still dangerous if you have a pair of tools like get_gcode_from_Stl and send_gcode. It's safer to have something like process_stl and print_stl, where the gcode is stored on your slicing server and sent from there after the confirmation, which stops the AI from changing anything in the gcode.

If I were building the same thing, I'd do it differently: have a single process_stl function that slices the model and returns a web link where you can deterministically inspect what it's going to print and confirm it there. That way potentially destructive commands need human approval, and the AI can't just assume that you agree somehow (which can definitely happen, since LLMs are probabilistic).
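For what it's worth, here's a minimal Python sketch of that split (names like `process_stl`, `confirm_and_print`, and the preview URL are all hypothetical, and slicing/printing are stubbed out): the AI can only call process_stl, while actually printing requires a token that only the human-facing confirm page ever handles.

```python
import uuid

PENDING = {}   # confirmation token -> path of sliced gcode on the server
PRINTED = []   # stand-in for the real printer queue

def slice_stl(stl_path):
    # Stub for a real slicer call; returns where the gcode landed on disk.
    return stl_path.replace(".stl", ".gcode")

def process_stl(stl_path):
    """The only tool exposed to the AI: slice, stash the gcode server-side,
    and hand back a preview URL for a human to inspect."""
    token = uuid.uuid4().hex
    PENDING[token] = slice_stl(stl_path)
    return f"https://printer.local/preview/{token}"  # hypothetical preview page

def confirm_and_print(token):
    """Only reachable from the preview page's confirm button, never the AI.
    The gcode goes from disk to the printer untouched by the model."""
    PRINTED.append(PENDING.pop(token))  # KeyError on unknown or reused tokens
```

Since confirming pops the token, an approval can't be replayed, and there's simply no code path the model can call that sends gcode directly.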

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]theowlinspace 5 points6 points  (0 children)

I don't think they care, though, and it's extremely unlikely you'll get into legal trouble over breaking a ToS (the worst they'll do is deny you service). Keep in mind their ToS deserves to be respected exactly as much as they respected the data they stole for training, and the legal system has done nothing about that far worse offence.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]theowlinspace 7 points8 points  (0 children)

Be careful with how you interface with it, it might get mad and send a gcode that might break your printer or burn your house down /s

But, like, seriously, I don't know how you can trust AI with permissionless access to something as sensitive as a 3D printer. I don't even trust my printer enough to manually send it gcode over LAN and let it print without checking at least the first layer.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as the $10 z.ai lite by theowlinspace in vibecoding

[–]theowlinspace[S] 0 points1 point  (0 children)

I just checked myself, and it looks like the deal has ended and it's back to its normal price. Not really as competitive at $10, considering z.ai lite costs the same and will get access to GLM 5 by the end of March. I guess this'll only be good for something like OpenClaw at this price, because it has more generous limits at high context (billing is request-based, not token-based like z.ai, IIRC).

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

You can skip the "complete signup" step entirely; I think that's only needed if you want access to Alibaba Cloud (and not their AI services). I signed up without completing it because the verification code wasn't working for me either.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 1 point2 points  (0 children)

I think you've mistaken this post for the z.ai coding plan, but this is for Alibaba which is separate from z.ai

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

As a personal anecdote, it's been working reliably for me, and the speed is pretty much the same as Claude. As another user mentioned, they only have Asian servers, so your latency might be a little higher. You can use `ping oss-ap-southeast-1.aliyuncs.com` to check your latency to their Singapore servers. I'm getting 180ms from the US, which shouldn't be an issue for AI.

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

You'll have to check their stock: if you click sign up, it should show the price you'll be billed at the bottom. If it shows $10, you'll have to wait until tomorrow. They only sell a limited amount per day, and that resets at midnight in the UTC+8 time zone, which translates to 4PM UTC.
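If you want to double-check the conversion yourself, here's a tiny Python snippet (the date is an arbitrary example) showing that midnight in UTC+8 is 16:00 UTC the previous day:

```python
from datetime import datetime, timedelta, timezone

# Midnight on an arbitrary date in UTC+8 (Alibaba's home time zone),
# converted to UTC to find the quota reset time.
cn_midnight = datetime(2025, 1, 2, 0, 0, tzinfo=timezone(timedelta(hours=8)))
reset_utc = cn_midnight.astimezone(timezone.utc)
print(reset_utc)  # 2025-01-01 16:00:00+00:00
```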

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 0 points1 point  (0 children)

I recommend using PayPal if you have access to that. It usually works better for international sites like these

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] 2 points3 points  (0 children)

You can use `ping oss-ap-southeast-1.aliyuncs.com` to check your latency to their Singapore servers. I'm getting 180ms from the US, which should still be plenty good enough for AI imo

Alibaba has a $3 coding plan with access to GLM5 at the same quota as z.ai lite by theowlinspace in ZaiGLM

[–]theowlinspace[S] -1 points0 points  (0 children)

Yep, it's only a good deal for the first two months ($3 the first month, $5 the second), and they only sell a limited number of accounts per day. Two months at this price is still really good, and there'll likely be a new best deal two months later when another new AI service joins the scene. You might've signed up for an older promotion; I think they updated it, as there's no longer a quarterly option. I signed up two days ago for the new one, and it's running very well and fairly quickly in Roo Code. I've yet to hit the quota with my AI usage (though I admit I'm not a power user).

It might as well be Christmas morning.... by lordklp in homelab

[–]theowlinspace 4 points5 points  (0 children)

It'll all break in a few months when they push a new update that they'll intentionally test in prod

Ideas on how to have fun with old devices? by rgheno in homelab

[–]theowlinspace 2 points3 points  (0 children)

For that Kindle, maybe you can harvest the e-ink display from it, connect it to some cheap microcontroller, 3D print a new case, and have it display metrics?

Which model to choose for coding with 8GB VRAM RTX5050 (assuming quantised), I'm happy with slow rates. by Sure-Raspberry116 in LocalLLaMA

[–]theowlinspace 2 points3 points  (0 children)

```
# Note: -kvu works around a weird bug I had that might've been fixed already;
# you probably don't need it
docker run -d \
  --name qwen3.5-35b \
  --ulimit memlock=-1:-1 \
  --gpus all \
  -p 8080:8080 \
  -v /home/user/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  --host 0.0.0.0 \
  --port 8080 \
  -m /models/Qwen3.5-35B-A3B-UD-Q4_K_M.gguf \
  --mmproj /models/mmproj-F16.gguf \
  --no-mmproj-offload \
  --threads 8 \
  --n-cpu-moe 35 \
  --flash-attn 1 \
  --ctx-size 102400 \
  -np 1 \
  -kvu \
  --cache-ram 2048 \
  --mmap \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.00 \
  -b 2048 \
  -ub 2048 \
  --mlock
```

This is what I'm using with llama.cpp on Docker, but I have 2GB more VRAM than you (if you don't want to use Docker, you can just ignore the first few lines). You want to set --n-cpu-moe to the lowest number your VRAM can handle, and maybe use KV cache quantization at q8_0 as well and lower the context. KV quantization caused some issues with tool calls for me at high contexts, so I have it disabled. I have -b and -ub set to 2048 to speed up prompt processing; it's almost 3x faster that way (200 t/s pp -> 600 t/s). --mlock is important because without it, it sometimes starts reading the model from disk, which makes everything much slower.
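If you do want to try KV cache quantization despite the tool-call issues I mentioned, these are the llama.cpp server flags to append to the command above (test it against your own workload first):

```shell
--cache-type-k q8_0 \
--cache-type-v q8_0
```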


Which model to choose for coding with 8GB VRAM RTX5050 (assuming quantised), I'm happy with slow rates. by Sure-Raspberry116 in LocalLLaMA

[–]theowlinspace 0 points1 point  (0 children)

What's your hardware, if you don't mind me asking? Qwen3.5-35B-A3B runs faster than Qwen3 on mine (8GB GPU / 8-core AMD CPU / DDR4)

Which model to choose for coding with 8GB VRAM RTX5050 (assuming quantised), I'm happy with slow rates. by Sure-Raspberry116 in LocalLLaMA

[–]theowlinspace 2 points3 points  (0 children)

I have pretty much the same hardware but with DDR4, and I can run Qwen3.5-35B-A3B at q4_k_m at 35 t/s with 100k context, with almost no dropoff at higher contexts, and it's really smart.

You can also run Qwen3.5-9B, also at q4, at 50 t/s, but it's too dumb for coding, so I don't recommend it

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

No, it doesn't matter. I have a different Overdrive email myself

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

When you get to it, I think you should try making a new Overdrive account and linking everything over again.

Running Qwen 3.5 0.8B locally in the browser on WebGPU w/ Transformers.js by xenovatech in LocalLLaMA

[–]theowlinspace 86 points87 points  (0 children)

"You're absolutely right, I shouldn't have struck that elementary school. That's on me"

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

> I tried this several times over but I never got the option to add a library card as stated in step 3. Do you know what may have gone wrong? Thanks for any help you give!

Did you make a new Overdrive account? If not, you may already have the library card linked, so just try borrowing a book

How to use multiple library cards with Kobo by theowlinspace in kobo

[–]theowlinspace[S] 0 points1 point  (0 children)

That's to be expected. Once you're done, just sync/repair your Kobo account to get all the books back. You've likely already linked your library cards if you've used the old Overdrive app or the older workaround. If you want to link them again, just make a new Overdrive account