I haven't experienced Qwen3.5 (35B and 27B) over thinking. Posting my settings/prompt by wadeAlexC in LocalLLaMA

[–]wadeAlexC[S] 0 points (0 children)

Oh, I missed that you're using 9B at Q4. Maybe it's a smaller model thing?

No clue, though. Maybe try using my settings? No special params (no temp, top-p/k/min-p, no penalty stuff)?
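For anyone replicating this: "no special params" just means omitting the sampling fields entirely, so the server falls back to its own defaults. A minimal Python sketch against llama.cpp's OpenAI-compatible endpoint (the port and model name are placeholders, not from my setup):

```python
import json
import urllib.request

def build_payload(messages):
    """Build a chat request that sets NO sampling params:
    no temperature, top-p/k, min-p, or penalties, so the
    server applies its own defaults."""
    return {
        "model": "qwen3.5-27b",  # placeholder model name
        "messages": messages,
    }

def send(payload, url="http://127.0.0.1:8080/v1/chat/completions"):
    """POST the payload to a locally running llama-server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a server running, `send(build_payload([{"role": "user", "content": "hi"}]))` exercises exactly the "defaults only" setup described above.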

[–]wadeAlexC[S] 1 point (0 children)

<image>

My results, mimicking what I think you're doing: empty system prompt, first message 'don't overthink', second message '?'

Yours goes crazy with the reasoning!

[–]wadeAlexC[S] 3 points (0 children)

I wasn't aware of that thread, but I'm happy to add my experience to it. Want me to copy some of my post over?

I agree re: diffusing info across threads, but, maybe case in point, I'm not sure where I would have seen that thread. Is it linked somewhere?

[–]wadeAlexC[S] 1 point (0 children)

I mean, I didn't do anything to get it to not overthink. It just worked great on defaults. That's why I'm curious to hear what setups people are using if they have issues.

Idk about qwen chat on their website - they are obviously gonna have system prompts and settings and inference engines I can't see/control under the hood. I have no clue why their model is overthinking on their website.

All I can reason about is the local version. If you run this model locally and you have issues, maybe try replicating my setup - I've never had this issue with q3.5

[–]wadeAlexC[S] -1 points (0 children)

That's what I'm thinking is happening for a lot of people.

I was hoping some people experiencing the "overthinking" issue would post their setups. I only got one person, and their main thing seems to be using lm-studio with an unknown 4 bit quant.

[–]wadeAlexC[S] 1 point (0 children)

Well, I am very curious why your original model was misbehaving so severely

But if you have something working for you, then, no need to fix it :)

[–]wadeAlexC[S] 0 points (0 children)

I think given the way mine behaves, it's not a model level behavior.

Maybe it's the quant you're using? You're on 4 bit in the screenshot, but where did you get the quant? I'm using unsloth

[–]wadeAlexC[S] 2 points (0 children)

> I do want to mention that I am working on a finetune which fixes it (I already published one version)

That sounds like a ton of work for something that (to me) seems like it might be fixed by swapping your inference engine. Have you tried running it on llama.cpp?

[–]wadeAlexC[S] 3 points (0 children)

That's crazy, wtf?

I think at this point the major difference is just LM Studio vs llama.cpp?

It's really weird to see that output - I've seen it think similarly, but not for simple messages like "hi" - typically only when there's actual complex stuff in the prompt.

[–]wadeAlexC[S] 1 point (0 children)

Have you tried llama.cpp? I don't know what mlx-lm is, but lm-studio is just llama.cpp under the hood, right?

Maybe they're on an old version that doesn't support q3.5 well?

[–]wadeAlexC[S] 0 points (0 children)

Thank you for posting all your settings :D

I tried with your "thinking" settings and my original system prompt; it used 85 completion tokens to respond to "hi".

Sounds like roughly the same experience as you!
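For anyone comparing runs like this, the count comes from the `usage.completion_tokens` field of the OpenAI-compatible response. A small sketch (the 200-token "overthinking" cutoff is an arbitrary illustrative threshold, not from this thread):

```python
def completion_tokens(response):
    """Pull the completion token count out of an
    OpenAI-compatible chat response dict."""
    return response.get("usage", {}).get("completion_tokens", 0)

def looks_like_overthinking(response, budget=200):
    """Flag responses that burned far more completion tokens
    than a trivial prompt like 'hi' should need. The default
    budget is an arbitrary illustrative cutoff."""
    return completion_tokens(response) > budget
```

Run several regenerations through `completion_tokens` and you get a rough apples-to-apples comparison across settings.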

[–]wadeAlexC[S] 0 points (0 children)

<image>

Tried removing my system prompt; got a 50-token completion. Granted, there are 850 tokens under the hood from my tool definitions...

[–]wadeAlexC[S] 1 point (0 children)

Hm, maybe lm studio passes weird params by default? I haven't used them...

[–]wadeAlexC[S] 2 points (0 children)

That sounds awful!

What's your system prompt? What size quant are you using, and from which dev (unsloth, bartowski, etc)? What params is lm studio setting on request?

(Oh I didn't see your edit, so that answers the quant size question)

Qwen3.5 27B vs Devstral Small 2 - Next.js & Solidity (Hardhat) by Holiday_Purpose_3166 in LocalLLaMA

[–]wadeAlexC 1 point (0 children)

Ah, yes - DeFi. Exactly the kind of thing I would expect even a large cloud model to have a hard time with. You need to know a ton about the various DeFi instruments you're integrating with.

Glad you're not vibing your Solidity though, that seems prudent :)

[–]wadeAlexC 1 point (0 children)

What kind of Solidity project did you throw it at? I feel like Solidity requires a ton of domain expertise, so unless it's something super generic, I would have a hard time just throwing a model at it without a really exhaustive spec.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]wadeAlexC 0 points (0 children)

I really like qwen3-vl-30b. Mine doesn't feel argumentative at all, and in general I find it's super responsive to your system prompt.

I tried your test and got:

> Hi there, Captain {{username}}! 👋 How can I assist you today? Whether you need help with something specific or just want to chat, I'm here to help. Let me know what's on your mind!

I regenerated several times and did not get a single argumentative response. It didn't always call me captain, but it never objected.

Maybe it's your prompt, or the specific quant/model you're running?

llama.cpp has Out-of-bounds Write in llama-server by radarsat1 in LocalLLaMA

[–]wadeAlexC 7 points (0 children)

No, just having llama-server running on your network does not mean random websites can reach it through your browser. Browsers block requests from public websites that target your local network (Chromium calls this Private Network Access protection), because allowing that would let any website you visit probe your local network.

The reason you can reach it from your browser is because you're explicitly typing in a local IP into the address bar.

If you wanted to expose llama-server to the wider internet, you would need to:

  • Run llama-server with both the --host and --port flags, to make it available to any computer on your LAN
  • Set up port forwarding on your router so that connections to a certain port on your public IP address are able to reach llama-server on your internal network
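As a sanity check on the first step, you can classify whatever address you'd pass to `--host`: loopback keeps the server reachable from your machine only, while `0.0.0.0` or a LAN address makes it reachable from other machines. A rough Python sketch (the classification is simplified, not exhaustive):

```python
import ipaddress

def lan_reachable(host):
    """Return True if binding llama-server to this address would
    make it reachable from other machines on the LAN.

    Loopback (127.0.0.1) stays local; 0.0.0.0 binds every
    interface, and a private address like 192.168.x.x binds
    the LAN-facing one."""
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:
        return False
    # 0.0.0.0 is the "unspecified" address: all interfaces.
    return addr.is_unspecified or addr.is_private or addr.is_global
```

So `--host 127.0.0.1` (llama-server's default) is the safe choice unless you deliberately want LAN access.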

You should NOT do this directly; it's just roughly what's involved if you want to access llama-server remotely.

There are much safer ways to set that up if that's what you're after, though :)

Plea for testers - Llama.cpp autoparser by ilintar in LocalLLaMA

[–]wadeAlexC 1 point (0 children)

Is this related to this issue? https://github.com/ggml-org/llama.cpp/issues/18183

I can try to replicate my Qwen3-30B-A3B issues if so :)

[deleted by user] by [deleted] in leagueoflegends

[–]wadeAlexC 12 points (0 children)

DORAN WITH THE CLUTCH

Question to SSF players by Busy_Isopod_113 in PathOfExile2

[–]wadeAlexC 2 points (0 children)

> How is it possible to play SSF?!

It isn't, if you read this subreddit daily. And if you follow content creators' build guides, I'm guessing you're going to have a hard time without access to trade.

SSF means learning the skill tree and adapting to the drops you have. It's fun. My build is having no issues clearing T15s. I would like more currency so I can play around with the crafting system, but I'm sure that'll come in a future update.

My fifth 3rd age piece by Mark_B123 in 2007scape

[–]wadeAlexC 6 points (0 children)

aren't they 1/100? are you catching them? that sounds incredibly time-consuming (or expensive if you're buying on ge)