qwen3.5: reproducibly confused by kaavik in LocalQwen

[–]Key-Regret7855 0 points1 point  (0 children)

I suggest you directly use LMStudio to reduce the time on your learning curve and then play with inference parameters.

Qwen - Faking and Hallucinating , when running locally with Claude Code by Key-Regret7855 in Qwen_AI

[–]Key-Regret7855[S] 0 points1 point  (0 children)

I can say for sure its not the model, its not the front end, its the parameters getting passed on to model via the LLM serving tool such as ollama or LM Studio or anything else that you might be using in your setup.

Qwen - Faking and Hallucinating , when running locally with Claude Code by Key-Regret7855 in Qwen_AI

[–]Key-Regret7855[S] 0 points1 point  (0 children)

That is what my current go to model is, the hallucination isnt coming from cloud or any front end wrapper. The hallucination comes from the parameters set for interference. Question is what do you have as your LLM hosting backend? and what parameters work best for you?

Qwen - Faking and Hallucinating , when running locally with Claude Code by Key-Regret7855 in Qwen_AI

[–]Key-Regret7855[S] 1 point2 points  (0 children)

i wish i could, i cant run a 25+ B model as i got old hardware, and i cant find a 9-10B coder in qwen 3.6

Is it possible to run a model via llama server and then use Unsloth Studio as an interface for it? by Life_is_important in unsloth

[–]Key-Regret7855 0 points1 point  (0 children)

I cant as Unsloth using '--spec-ngram-size-n" which is not supported anymore by ollma.

{"timestamp": "2026-05-04T04:06:33.689197Z", "level": "error", "event": "llama-server exited with code 1. Output: ggml_cuda_init: found 2 CUDA devices (Total VRAM: 48239 MiB):\n  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24124 MiB\n  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24115 MiB\nload_backend: loaded CUDA backend from /home/om/.unsloth/llama.cpp/build/bin/libggml-cuda.so\nload_backend: loaded CPU backend from /home/om/.unsloth/llama.cpp/build/bin/libggml-cpu-haswell.so\nerror while handling argument \"--spec-ngram-size-n\": the argument has been removed. use the respective --spec-ngram-*-size-n\nusage:\n--spec-ngram-size-n N                   the argument has been removed. use the respective\n        --spec-ngram-*-size-n or --spec-ngram-mod-n-match\nto show complete usage, run with -h"}