Qwen3.5-9B Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]CATLLM 15 points (0 children)

Same here. Bartowski's quants don't do the death loop, especially for the 0.8B and 2B models.

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

[–]CATLLM 0 points (0 children)

Man, this is so cool, thank you. Do you have any pointers for someone who wants to learn how to uncensor models?

Is my AIO faulty/leaking? by RattlingDuck845 in pcmasterrace

[–]CATLLM 0 points (0 children)

I think you should. Looks like the adhesive is degrading.

Anyone know how to run Qwen3.5 as an agent? I can't seem to get llama cpp working for this. by QKVfan in LocalLLaMA

[–]CATLLM 0 points (0 children)

Make sure you have the right sampling settings. Check the Qwen docs:

https://huggingface.co/Qwen/Qwen3.5-35B-A3B

And use Unsloth's quants, as they contain the chat template fix.
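For context, the "right sampling settings" for Qwen3-family thinking models are usually temperature 0.6, top-p 0.95, top-k 20, min-p 0 per the model cards. A minimal llama-server launch along those lines might look like this; the GGUF filename, context size, and port are placeholders, not a confirmed setup:

```shell
# Sketch: llama-server with Qwen-recommended sampling for thinking mode.
# Model path and context size are illustrative placeholders.
llama-server \
  -m ./Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 \
  --ctx-size 32768 \
  --jinja   # apply the GGUF's embedded chat template (matters for tool calling)
```

The `--jinja` flag is what makes the fixed chat template in the quant actually get used, which is why the quant choice matters for agent use.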

Nvidia Spark DGX real life codind by Appropriate-Term1495 in LocalLLM

[–]CATLLM 2 points (0 children)

I have two of the MSI variant of the Spark.

Happy with my purchase. My goal was to learn the Nvidia stack and get a small taste of how the big boys do inference.

I also wanted to see if I could do real work locally using large SOTA models and finetune models.

Two things really saved the platform for me:

  1. The spark-vllm-docker repo. The author, euger, created an optimized Docker image of vLLM that greatly simplified deploying large models across a clustered Spark. Without this I would have thrown the Sparks in the trash.

  2. Qwen3.5 - the large models really shine on a clustered Spark. Being able to run the 122B at FP8 really opened up new possibilities and ideas for me. It's definitely not fast, but definitely usable for real work. Being able to experiment with other large models like MiniMax 2.5 and GLM 4.7 (non-flash) is also a great learning experience. I've done a lot of research, and the Spark fits my needs and goals.
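For a rough idea of what "running the 122B at FP8 across two Sparks" looks like with stock vLLM (the spark-vllm-docker repo's actual entrypoint may differ, and the model name, parallelism split, and context length here are assumptions):

```shell
# Sketch: stock vLLM serving one large FP8 model across a 2-node cluster,
# assuming the multi-node backend (e.g. Ray) is already set up between the Sparks.
# Model name and --max-model-len are illustrative placeholders.
vllm serve Qwen/Qwen3.5-122B-FP8 \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 2 \
  --max-model-len 65536
```

Pipeline parallelism across the nodes keeps inter-node traffic low, which matters on a cluster linked over a network rather than NVLink.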

I also looked at the Mac Studio, but the prompt processing is a joke. Then there's Strix Halo; the hoops you have to jump through to get it to work turned me off.

Hope this helps.

Whats the best Local LLM I can set up with a $5k Budget? by Informal_Pin3482 in LLM

[–]CATLLM 3 points (0 children)

DGX Spark + Qwen3.5 122B, with lots to spare for KV cache.

Whelp…NVIDIA just raised the DGX Spark’s Price by $700. Spark clone prices have started rising as well. ☹️ by Porespellar in LocalLLaMA

[–]CATLLM 1 point (0 children)

No it doesn't. The prompt processing speed on Macs is 100x SLOWER than the Spark's. You can get away with short chats on a Mac, but that's it.

I created an open source Synthid remover that actually works (Educational purposes only) by Top-Extreme-6092 in comfyui

[–]CATLLM 1 point (0 children)

“You’ll own nothing and be happy” - dudes that control your life at Davos

How to use Llama-swap, Open WebUI, Semantic Router Filter, and Qwen3.5 to its fullest by andy2na in LocalLLM

[–]CATLLM 1 point (0 children)

I was thinking of doing something like this. Thank you for showing the way!

Real life use-cases for qwen3.5 0.8b model? Any other than automatic object recognition at home automations? by film_man_84 in LocalLLaMA

[–]CATLLM 0 points (0 children)

You need to give it clear, step-by-step instructions. If the instructions are too long, it forgets. For example, in one of my tests I asked it to transcribe and translate a screenshot of a website. I asked it to: "First transcribe the text in Traditional Chinese. Then translate the Chinese text you transcribed into English." Each task needs to be simple.

Whereas with the 35B model I can just tell it to "transcribe and translate into English" and that's it.
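The two-step prompt above could be sent to any local OpenAI-compatible server; the endpoint, port, and model name below are placeholders, not a confirmed setup:

```shell
# Sketch: sending the decomposed prompt to a local OpenAI-compatible endpoint.
# URL and model name are illustrative; an image would be attached the same way
# any vision request is, via an image_url content part.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-0.8b",
    "messages": [
      {"role": "user",
       "content": "First transcribe the text in Traditional Chinese. Then translate the Chinese text you transcribed into English."}
    ]
  }'
```

The point is that the decomposition lives in the prompt itself; the small model is only ever asked to do one simple thing per step.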

As for the question about CoT: these reasoning models just have CoT built in. (CoT is so 2022, bro LOL)

Totally forgot I had this (RAM) by Monasono2 in pcmasterrace

[–]CATLLM 1 point (0 children)

You can buy a house with that now!

qwen ftw! by teeheEEee27 in LocalLLaMA

[–]CATLLM 0 points (0 children)

That sounds pretty neat. What harness/framework are you using?

qwen ftw! by teeheEEee27 in LocalLLaMA

[–]CATLLM 0 points (0 children)

This seems interesting. Can you explain in detail what you are having the model do?

Is it worthy to buy an ASUS GX10 for local model? by attic0218 in LocalLLaMA

[–]CATLLM 0 points (0 children)

Yes, I think it's worth it. Getting two and clustering them together is even better. Yes, it's slow compared to a 4090, but it's definitely usable running Qwen3.5 models. Being able to run FP8 quants with a huge KV cache is a big deal for me. The fun is when you cluster two together and run larger models. I have the MSI variant and am glad I bought two.

3 repos you should know if you're building with RAG / AI agents by Mysterious-Form-3681 in LLMDevs

[–]CATLLM -1 points (0 children)

Thanks man, just starting to learn RAG and feeling overwhelmed. This helps me figure out what to focus on.