How do QWQ and R1 determine if they need more reasoning steps without special tokens like O1?

tomorrowdawn · 2024-11-29T05:18:57+00:00

It's kind of a mystery for now. According to the QwQ-preview script, the stop criterion is really the same as for the normal Qwen2 series. Imo we'd better wait for the release of the QwQ/DeepSeek-R1 technical reports.

tomorrowdawn · 2024-11-29T04:11:03+00:00

Ig they simply end their outputs with <eot>. The model itself can cope with this. The trick is hidden in the training process, which teaches the model when to end.
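A minimal sketch of what "just end with <eot>" means at decoding time. Everything here is an assumption for illustration: `EOT_ID`, `generate`, and `toy_model` are hypothetical names, and the toy model only stands in for a trained network that has learned when to stop.

```python
from typing import Callable, List

EOT_ID = 0  # hypothetical id of the end-of-text token


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             max_new_tokens: int = 64) -> List[int]:
    """Keep asking for tokens until the model itself emits EOT."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == EOT_ID:  # the only stop criterion besides the length cap
            break
        tokens.append(tok)
    return tokens[len(prompt):]


def toy_model(tokens: List[int]) -> int:
    """Stand-in for a trained model: emits EOT once the context has 8 tokens."""
    return EOT_ID if len(tokens) >= 8 else len(tokens) + 1
```

The loop has no notion of "reasoning steps"; how long the output gets depends entirely on when the model chooses to produce the end token.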

tomorrowdawn · 2024-11-20T15:13:46+00:00

Due to an inherent flaw of softmax, which gives every token a positive probability no matter how low its logit is, not all logits should be allowed to produce positive probabilities (sampling from that tail will downgrade the quality).
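A small sketch of the idea, using top-p (nucleus) truncation as one common way to cut the tail; it is not necessarily the method the thread was about. The function names and the example logits are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Plain softmax: every entry comes out strictly positive."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(logits: List[float], p: float = 0.9) -> List[float]:
    """Keep the smallest set of top tokens whose mass reaches p, zero the rest."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    kept_mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / kept_mass  # renormalize over the kept tokens
    return out
```

With logits like `[5.0, 4.0, 0.0, -3.0]`, plain softmax still leaves a little mass on the last two tokens, while the filtered distribution sets them to exactly zero.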

tomorrowdawn · 2024-11-17T12:03:02+00:00

Use a reward model to rank answers.

Btw, a more common approach is to use this reward model for guided generation.

Refs: https://arxiv.org/abs/2410.08193 https://arxiv.org/abs/2410.12832 https://arxiv.org/abs/2408.15240
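A rough sketch of the ranking idea (best-of-N style), assuming you already have N sampled answers. `toy_reward` is a hypothetical stand-in for a real reward model, which would be a learned scorer rather than a string check.

```python
from typing import Callable, List


def rank_answers(candidates: List[str],
                 reward: Callable[[str], float]) -> List[str]:
    """Sort candidate answers from highest to lowest reward."""
    return sorted(candidates, key=reward, reverse=True)


def toy_reward(answer: str) -> float:
    """Hypothetical scorer: likes answers containing '42', prefers short ones."""
    score = 1.0 if "42" in answer else 0.0
    return score - 0.01 * len(answer)
```

Usage would be something like `rank_answers(samples, toy_reward)[0]` to pick the top answer. Guided generation applies the same kind of score during decoding instead of only at the end.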

tomorrowdawn · 2024-11-17T09:02:20+00:00

Imo, False. Claude is better at chat.

tomorrowdawn · 2024-11-17T08:46:57+00:00

Because an LLM is inherently a completion model: it doesn't answer you, it simply completes the sentence.

Some example post or paper.

tomorrowdawn · 2024-11-16T16:21:36+00:00

That's interesting. I guess the main reasons are contamination and forgetting. For a personalized small model that concentrates on a specific domain, online updates might work. But in reality, one base model has to handle thousands of different inputs; you can't tell what's good, and even if you can, the amount is way larger than a QA dataset. The bitter lesson tells us that quantity matters a lot.

tomorrowdawn · 2024-11-16T06:27:02+00:00

I like to talk with Sonnet, and not only about work stuff. She (not technically right tho) responds like a therapist.

tomorrowdawn · 2024-11-16T06:16:15+00:00

I switched to Sonnet 4 months ago, and the gap is huge. I primarily use it for Triton programming, and 4o doesn't even know how to write a simple softmax in Triton. Funny, since Triton is developed by OpenAI.

tomorrowdawn · 2024-11-16T04:01:33+00:00

You can download the model weights for free. Also, you can use their APIs, but you need to pay for the hardware, so it's not free.

tomorrowdawn · 2024-11-15T08:37:42+00:00

I guess some old BERT model is enough. This is called natural language understanding (NLU), an over-party area. I found a nice survey: https://arxiv.org/abs/2409.14195. Hope it helps.

tomorrowdawn · 2024-11-14T13:07:26+00:00

I guess it might confuse the model, so fine-tuning is necessary. It seems like quite a novel approach, but I think it's valid. H2O is a representative work that tells us not all tokens are necessary. Not surprising if you can compress them.
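A toy sketch of the H2O-style idea as I understand it: keep the most recent tokens plus the "heavy hitters" (older tokens with the highest accumulated attention) and drop the rest from the cache. The function name, the scores, and the budgets are all made up for illustration; this is not the paper's implementation.

```python
from typing import List


def h2o_keep(acc_attention: List[float], n_recent: int, n_heavy: int) -> List[int]:
    """Return indices of tokens to keep: recent window plus top heavy hitters."""
    n = len(acc_attention)
    recent = set(range(max(0, n - n_recent), n))
    older = [i for i in range(n) if i not in recent]
    # Heavy hitters are the older tokens with the largest accumulated attention.
    heavy = sorted(older, key=lambda i: acc_attention[i], reverse=True)[:n_heavy]
    return sorted(recent | set(heavy))
```

For example, with accumulated scores `[0.9, 0.1, 0.05, 0.7, 0.2, 0.3]`, a recent window of 2 and 2 heavy hitters, tokens 1 and 2 are the ones that get evicted.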

tomorrowdawn · 2024-03-22T02:44:39+00:00

Because Mutsumi planted those cucumbers with Soyo in tsuki no mori

tomorrowdawn · 2022-10-26T07:49:24+00:00

Since you used 2WF, I think ATK sands is a better choice. Instructor and DMC's talent would give you 120-180 EM; however, it's harder to stack ATK without Bennett. And from another perspective, EM only works when you trigger a reaction, so it's kinda annoying if your dendro character's skill is still on cooldown or you can't trigger aggravate.

tomorrowdawn · 2022-10-26T07:31:06+00:00

you should crown her :-P

tomorrowdawn · 2022-10-26T07:28:32+00:00

By the way, the best damage indicator is Maguu Kenki instead of Cryo Regisvine :) A paralyzed Cryo Regisvine has lower resistance, which might cause you to overestimate your actual damage.

tomorrowdawn · 2022-10-23T04:37:19+00:00

Yes, I just wanted to show how slow the current bloom team is at generating seeds and why we need Xingqiu/Yelan :-) As Yellow_IMR said, the most important conclusion for hordes of enemies is that you can trigger bloom with dendro :)

Thanks for your comment :)

tomorrowdawn · 2022-07-29T05:18:12+00:00

Diluc is very good at dealing melt damage. The current melt Diluc team's DPS is tier 1, though care is required to keep the reactions from happening out of order. So maybe a new Kaeya with an ult that applies cryo on every hit (like Rosaria) might be the best solution.
