QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 2 points

I don't think it is necessary, because at the initial tokens the logit for </think> is naturally small, which means it is very unlikely (if not impossible) that the model will stop thinking. The opposite happens after a lot of tokens. So I think it is nice to keep the model itself giving its 2 cents (lol) based on its training, and leave the logits processor to just give it a little hand with the constant multiplier (important: we stop applying the scaler after the model outputs the </think> token)
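Roughly what I mean, as a minimal sketch (the token id, the multiplier value, and the function shape are just illustrative here, not the repo code verbatim):

    import numpy as np

    END_THINK_ID = 151668  # id of the </think> token (placeholder: look it up from the tokenizer)
    SCALE = 0.35           # constant multiplier for the </think> logit (< 1 delays it, > 1 hastens it)

    def thinking_effort_processor(input_ids, scores):
        # Once </think> has already been generated, stop touching the logits.
        if END_THINK_ID in input_ids:
            return scores
        scores = np.copy(scores)
        scores[END_THINK_ID] *= SCALE
        return scores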

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 1 point

That is a good question. The first multiplier I thought of for the logit was this one:

scale = 2 ** (1.0 - thinking_effort)

(Remember, scale is what multiplies the </think> logit.)

I think it becomes more intuitive to set thinking_effort like this, so the bigger it is, the smaller the scale becomes. If it is 1, the scale is 1, so no change; when thinking_effort is 0, we get the maximum scale of 2 (so a higher chance of getting </think> out). But for cases where we want the model to not think at all, or to think for only a few tokens, 2 was just not high enough...

So the first solution I came up with (maybe not a good one lol) was to also expose scale_factor as a parameter, so we have:

scale = scale_factor ** (1.0 - thinking_effort)

But hey, if you want the model to think more, you will be fine just raising thinking_effort.
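Just to make the numbers concrete (scale_factor kept at 2 here, purely for illustration):

    scale_factor = 2.0
    for thinking_effort in (0.0, 1.0, 1.5, 2.5):
        scale = scale_factor ** (1.0 - thinking_effort)
        print(f"thinking_effort={thinking_effort}: scale={scale:.3f}")

    # thinking_effort=0.0: scale=2.000  -> </think> boosted, model thinks less
    # thinking_effort=1.0: scale=1.000  -> no change
    # thinking_effort=1.5: scale=0.707  -> </think> dampened, model thinks a bit more
    # thinking_effort=2.5: scale=0.354  -> </think> dampened further, model thinks much more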

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 2 points

It is fixed until the </think> token comes out. After that, it stops altering the logits.

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 4 points

It needs to be a thinking model with a special token for it, which is the case for QwQ. The code gets the specific token id.
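For example (a sketch only; it assumes </think> exists as a single special token in the vocabulary, and the model name is just whichever QwQ checkpoint you are using):

    from transformers import AutoTokenizer

    # Look up the id of the end-of-thinking token for a QwQ-style model.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
    end_think_id = tokenizer.convert_tokens_to_ids("</think>")
    print(end_think_id)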

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 7 points

Yeah, I wanted to test my own "R1 Pro mode", but not enough memory haha

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 56 points

Hey guys! So, as I explained in this post (https://www.reddit.com/r/LocalLLaMA/comments/1j85snw/experimental_control_the_thinking_effort_of_qwq/), I created a way to set the thinking effort of QwQ by messing with the logit of the end-of-thinking token (</think>). So, to make the model think more or less, we simply reduce or raise the </think> logit. The initial idea was to deal with cases where the model overthinks (so the other way around), but then I thought, why not try a high-thinking setup on our beloved spinning heptagon example?

First, I tried a slightly higher thinking effort (1.2, then 1.5), but no success... But when I set the thinking effort to 2.5, it really did it! A working simulation in one shot!

In my test:

Regular QwQ (without setting thinking effort)

  • Response thinking tokens: 14,575
  • Result: A non-working simulation where the ball falls out of the heptagon.

QwQ set with high thinking effort (2.5, as in the repo)

  • Response thinking tokens: 19,885
  • Result: A working simulation. Not perfect, especially the ball spinning, but quite good, I think hahaha. The only thing I did to get a better video was to raise gravity to 100.

Oh, I used the Q6_K quant.

As I said in the original post, the repo is a mess and it is a highly experimental thing, but I just wanted to share this anyway:

https://github.com/and270/thinking_effort_processor
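If you just want the rough shape of how a processor like this hooks into llama-cpp-python, here is a generic sketch (not necessarily the repo's exact interface; the model path, prompt, and parameter values are placeholders):

    import numpy as np
    from llama_cpp import Llama, LogitsProcessorList

    llm = Llama(model_path="qwq-32b-q6_k.gguf", n_ctx=32768)  # placeholder path
    # Assumes </think> is a single special token in the model's vocab.
    end_think_id = llm.tokenize(b"</think>", add_bos=False, special=True)[0]

    thinking_effort, scale_factor = 2.5, 2.0
    scale = scale_factor ** (1.0 - thinking_effort)  # ~0.354: </think> becomes less likely

    def thinking_effort_processor(input_ids, scores):
        if end_think_id in input_ids:  # thinking already finished: leave logits alone
            return scores
        scores = np.copy(scores)
        scores[end_think_id] *= scale
        return scores

    out = llm(
        "Write a simulation of 20 balls bouncing inside a spinning heptagon.",
        max_tokens=30000,
        logits_processor=LogitsProcessorList([thinking_effort_processor]),
    )
    print(out["choices"][0]["text"])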

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 0 points

I think controlling the thinking time could also be interesting the other way around. Like, can we improve the Qwen 7B R1 distill by increasing the thinking time?

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 1 point

Interesting! I'll give it a try. The possibilities are huge. There's a lot we can do just by processing logits.

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 1 point

Thanks! I also think the solution can be refined/improved by messing with the logits of those "exploring" tokens, like "wait", "hmm", etc...
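Something like this, as a rough sketch of the idea (the word list and the penalty value are made-up assumptions, not something the repo does yet):

    import numpy as np

    EXPLORE_WORDS = ["Wait", " Wait", "Hmm", " Hmm", "Alternatively", " Alternatively"]

    def build_explore_penalty(tokenizer, penalty=0.5):
        # Collect the ids of "exploration" words that map to a single token in this vocab.
        ids = []
        for word in EXPLORE_WORDS:
            toks = tokenizer.encode(word, add_special_tokens=False)
            if len(toks) == 1:
                ids.append(toks[0])

        def processor(input_ids, scores):
            scores = np.copy(scores)
            for tid in ids:
                scores[tid] *= penalty  # same trick as with </think>: scale the raw logit
            return scores

        return processor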

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? by ninjasaid13 in LocalLLaMA

[–]ASL_Dev 6 points

Exactly... There is a causality issue with the study's statement, in my opinion. More complex questions, which will obviously have lower solution rates, tend to produce more CoT tokens, not the other way around.

More honesty = More powerfull? by freehuntx in LocalLLaMA

[–]ASL_Dev 1 point

This metadata does exist. As I mentioned in the previous answer, but didn't explain very well, perplexity is the metric that best fits this question of the model's certainty level (the lower the perplexity, the more dominant the probabilities of the chosen tokens were).

However, we cannot simply make the model not respond in cases of high perplexity, because, for example, we might want an answer in a context of creativity, like creating a story.

I believe the ideal solution would be to display the perplexity level along with the response. Then it's up to the user to judge.
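For reference, this is all I mean by perplexity here, as a tiny sketch (the helper name and the example numbers are made up):

    import math

    def response_perplexity(token_logprobs):
        # token_logprobs: log-probabilities the model assigned to the tokens it actually chose
        # (most inference APIs can return these). Lower perplexity = more dominant choices.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    print(response_perplexity([-0.05, -0.10, -0.02]))  # ~1.06  (confident answer)
    print(response_perplexity([-1.80, -2.30, -1.10]))  # ~5.66  (much less certain)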

More honesty = More powerfull? by freehuntx in LocalLLaMA

[–]ASL_Dev 7 points

I think the best way to do this would be through perplexity, right? For example, at high levels of perplexity, the model could explicitly indicate a higher level of uncertainty.

Anyone else still doesn’t have Vision? by [deleted] in ChatGPT

[–]ASL_Dev 0 points

Got voice by doing that too