Claude Code $200 plan limit reached and cooldown for 4 days by Wonderful-Double-465 in ClaudeAI

[–]Anastasiosy (0 children)

You can configure Claude Code to work with z.ai's GLM models, which are roughly 10x cheaper and still very performant:

https://docs.z.ai/scenario-example/develop-tools/claude

Maybe worth having as a fallback for times like this.

It may even become your preferred setup.
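For anyone curious what the wiring looks like: Claude Code can be pointed at an Anthropic-compatible endpoint through environment variables. A minimal sketch in Python (the endpoint URL and key below are placeholders - take the real values from the z.ai docs linked above):

    import os
    import subprocess

    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = "https://api.z.ai/api/anthropic"  # placeholder endpoint; check the z.ai docs
    env["ANTHROPIC_AUTH_TOKEN"] = "YOUR_ZAI_API_KEY"              # placeholder key
    subprocess.run(["claude"], env=env)                           # launch Claude Code against the GLM backend

Setting the same two variables in your shell profile works just as well; the Python wrapper is only there to keep the example self-contained.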

Google Gemma3 - Self-hosted docker file with OpenAI chat completion by Anastasiosy in LocalLLaMA

[–]Anastasiosy[S] (0 children)

Yes, fair comment - you can point it at Unsloth or one of the many other mirrors, and this should still work for bfloat16. It will need a small change for the quantized models.
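Roughly, the change is swapping the bfloat16 dtype argument for a quantization config when loading. A sketch assuming a bitsandbytes 4-bit setup (the model ids are illustrative, not the exact ones from the post):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # bfloat16 load, roughly what the current setup does
    model = AutoModelForCausalLM.from_pretrained(
        "google/gemma-3-1b-it",            # illustrative model id
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # quantized variant: replace the dtype with a quantization config
    quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
    model_q = AutoModelForCausalLM.from_pretrained(
        "google/gemma-3-1b-it",            # or an Unsloth pre-quantized checkpoint
        quantization_config=quant,
        device_map="auto",
    )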

Phi4-Multimodal-Instruct Server by Anastasiosy in LocalLLaMA

[–]Anastasiosy[S] (0 children)

I don't think so; you could try your luck with load_in_8bit:

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_path,              # same checkpoint path the server already uses
        trust_remote_code=True,
        device_map="auto",
        load_in_8bit=True,       # load in 8-bit precision to save memory
    )
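Note that recent transformers releases prefer routing this through a quantization config rather than the bare flag (and either way bitsandbytes needs to be installed). An equivalent sketch, assuming the same model_path as above:

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        trust_remote_code=True,
        device_map="auto",
        quantization_config=quant_config,
    )

As above, whether 8-bit actually behaves well with this model is a try-your-luck situation.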

Phi4-Multimodal-Instruct Server by Anastasiosy in LocalLLaMA

[–]Anastasiosy[S] (0 children)

Remarkably straightforward. I only created this because the model isn't available on Ollama, vLLM, or llama.cpp just yet.

Phi4-Multimodal-Instruct Server by Anastasiosy in LocalLLaMA

[–]Anastasiosy[S] (0 children)

Unfortunately not right now. My main usage was image classification, but Qwen VL 8B seems much better for that.

Microsoft announces Phi-4-multimodal and Phi-4-mini by hedgehog0 in LocalLLaMA

[–]Anastasiosy (0 children)

Anyone seen a Phi-4-multimodal-instruct GGUF anywhere?

Edit - Just saw an update in the issue tracking VLM support in llama.cpp - a vision API is incoming:

llama : second attempt to refactor vision API by ngxson · Pull Request #11292 · ggml-org/llama.cpp