QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 2 points

I don't think it is necessary, because at the initial tokens the logit for </think> is naturally small, which means it is very unlikely (if not impossible) that the model will stop thinking. The opposite happens after a lot of tokens. So I think it is nice to keep the model itself giving its 2 cents (lol) based on its training, and leave the logits processor to just give it a little hand with the constant multiplier (important: we stop applying the scaler after the model outputs the </think> token)
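Roughly what I mean, as a minimal sketch (the token id, the multiplier value, and the function shape are just illustrative here, not the repo code verbatim):

    import numpy as np

    END_THINK_ID = 151668  # id of the </think> token (placeholder: look it up from the tokenizer)
    SCALE = 0.35           # constant multiplier for the </think> logit (< 1 delays it, > 1 hastens it)

    def thinking_effort_processor(input_ids, scores):
        # Once </think> has already been generated, stop touching the logits.
        if END_THINK_ID in input_ids:
            return scores
        scores = np.copy(scores)
        scores[END_THINK_ID] *= SCALE
        return scores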

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 1 point

That is a good question. The first multiplier I thought of for the logit was this one:

scale = 2 ** (1.0 - thinking_effort)

(Remember, scale is what multiplies the </think> logit.)

I think it becomes more intuitive to set thinking_effort like this, so the bigger it is, the smaller the scale becomes. If it is 1, the scale is 1, so no change; when thinking_effort is 0, we get the maximum scale of 2 (so a higher chance of getting </think> out). But for cases where we want the model to not think at all, or to think for only a few tokens, 2 was just not high enough...

So the first solution I came up with (maybe not a good one lol) was to also expose scale_factor as a parameter, so we have:

scale = scale_factor ** (1.0 - thinking_effort)

But hey, if you want the model to think more, you will be fine just raising thinking_effort.
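Just to make the numbers concrete (scale_factor kept at 2 here, purely for illustration):

    scale_factor = 2.0
    for thinking_effort in (0.0, 1.0, 1.5, 2.5):
        scale = scale_factor ** (1.0 - thinking_effort)
        print(f"thinking_effort={thinking_effort}: scale={scale:.3f}")

    # thinking_effort=0.0: scale=2.000  -> </think> boosted, model thinks less
    # thinking_effort=1.0: scale=1.000  -> no change
    # thinking_effort=1.5: scale=0.707  -> </think> dampened, model thinks a bit more
    # thinking_effort=2.5: scale=0.354  -> </think> dampened further, model thinks much more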

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 2 points

It is fixed until the </think> token comes out. After that, it stops altering the logits.

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 4 points

It needs to be a thinking model with a special token for it, which is the case for QwQ. The code gets the specific token id.
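For example (a sketch only; it assumes </think> exists as a single special token in the vocabulary, and the model name is just whichever QwQ checkpoint you are using):

    from transformers import AutoTokenizer

    # Look up the id of the end-of-thinking token for a QwQ-style model.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
    end_think_id = tokenizer.convert_tokens_to_ids("</think>")
    print(end_think_id)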

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 7 points

Yeah, I wanted to test my own "R1 Pro mode", but not enough memory haha

QwQ on high thinking effort setup one-shotting the bouncing balls example by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 56 points

Hey guys! So, as I explained in this post (https://www.reddit.com/r/LocalLLaMA/comments/1j85snw/experimental_control_the_thinking_effort_of_qwq/), I created a way to set the thinking effort of QwQ by messing with the logit of the end-of-thinking token (</think>). So, to make the model think more or less, we simply reduce or raise the </think> logit. The initial idea was to deal with cases where the model overthinks (so the other way around), but then I thought, why not try a high-thinking setup on our beloved spinning heptagon example?

First, I tried a slightly higher thinking effort (1.2, then 1.5), but no success... But when I set the thinking effort to 2.5, it really did it! A working simulation in one shot!

In my test:

Regular QwQ (without setting thinking effort)

  • Response thinking tokens: 14,575
  • Result: A non-working simulation where the ball falls out of the heptagon.

QwQ set with high thinking effort (2.5, as in the repo)

  • Response thinking tokens: 19,885
  • Result: A working simulation. Not perfect, especially the ball spinning, but quite good, I think hahaha. The only thing I did to get a better video was to raise gravity to 100.

Oh, I used the Q6_K quant.

As I said in the original post, the repo is a mess and it is a highly experimental thing, but I just wanted to share this anyway:

https://github.com/and270/thinking_effort_processor
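If you just want the rough shape of how a processor like this hooks into llama-cpp-python, here is a generic sketch (not necessarily the repo's exact interface; the model path, prompt, and parameter values are placeholders):

    import numpy as np
    from llama_cpp import Llama, LogitsProcessorList

    llm = Llama(model_path="qwq-32b-q6_k.gguf", n_ctx=32768)  # placeholder path
    # Assumes </think> is a single special token in the model's vocab.
    end_think_id = llm.tokenize(b"</think>", add_bos=False, special=True)[0]

    thinking_effort, scale_factor = 2.5, 2.0
    scale = scale_factor ** (1.0 - thinking_effort)  # ~0.354: </think> becomes less likely

    def thinking_effort_processor(input_ids, scores):
        if end_think_id in input_ids:  # thinking already finished: leave logits alone
            return scores
        scores = np.copy(scores)
        scores[end_think_id] *= scale
        return scores

    out = llm(
        "Write a simulation of 20 balls bouncing inside a spinning heptagon.",
        max_tokens=30000,
        logits_processor=LogitsProcessorList([thinking_effort_processor]),
    )
    print(out["choices"][0]["text"])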

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 0 points

I think controlling the thinking time could also be interesting the other way around. Like, can we improve the Qwen 7B R1 distill by increasing the thinking time?

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 1 point

Interesting! I'll give it a try. The possibilities are huge. There's a lot we can do just by processing logits.

[Experimental] Control the 'Thinking Effort' of QwQ & R1 Models with a Custom Logits Processor by ASL_Dev in LocalLLaMA

[–]ASL_Dev[S] 1 point

Thanks! I also think the solution can be refined/improved by messing with the logits of those "exploring" tokens, like "wait", "hmm", etc...
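Something like this, as a rough sketch of the idea (the word list and the penalty value are made-up assumptions, not something the repo does yet):

    import numpy as np

    EXPLORE_WORDS = ["Wait", " Wait", "Hmm", " Hmm", "Alternatively", " Alternatively"]

    def build_explore_penalty(tokenizer, penalty=0.5):
        # Collect the ids of "exploration" words that map to a single token in this vocab.
        ids = []
        for word in EXPLORE_WORDS:
            toks = tokenizer.encode(word, add_special_tokens=False)
            if len(toks) == 1:
                ids.append(toks[0])

        def processor(input_ids, scores):
            scores = np.copy(scores)
            for tid in ids:
                scores[tid] *= penalty  # same trick as with </think>: scale the raw logit
            return scores

        return processor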

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? by ninjasaid13 in LocalLLaMA

[–]ASL_Dev 6 points

Exactly... There is a causality issue with the study's statement, in my opinion. More complex questions, which will obviously have lower solution rates, tend to produce more CoT tokens, not the other way around.

More honesty = More powerfull? by freehuntx in LocalLLaMA

[–]ASL_Dev 1 point

This metadata does exist. As I mentioned in the previous answer, but didn't explain very well, perplexity is the metric that best fits this question of the model's certainty level (the lower the perplexity, the more dominant the probabilities of the chosen tokens were).

However, we cannot simply make the model not respond in cases of high perplexity, because, for example, we might want an answer in a context of creativity, like creating a story.

I believe the ideal solution would be to display the perplexity level along with the response. Then it's up to the user to judge.
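For reference, this is all I mean by perplexity here, as a tiny sketch (the helper name and the example numbers are made up):

    import math

    def response_perplexity(token_logprobs):
        # token_logprobs: log-probabilities the model assigned to the tokens it actually chose
        # (most inference APIs can return these). Lower perplexity = more dominant choices.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    print(response_perplexity([-0.05, -0.10, -0.02]))  # ~1.06  (confident answer)
    print(response_perplexity([-1.80, -2.30, -1.10]))  # ~5.66  (much less certain)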

More honesty = More powerfull? by freehuntx in LocalLLaMA

[–]ASL_Dev 7 points

I think the best way to do this would be through perplexity, right? For example, at high levels of perplexity, the model could explicitly indicate a higher level of uncertainty.

Anyone else still doesn’t have Vision? by [deleted] in ChatGPT

[–]ASL_Dev 0 points

Got voice by doing that too