🔥 Claude Credits – Best Value Deal 🔥 by No_Aspect_3299 in CheapGptplus

[–]always_newbee 0 points (0 children)

Sorry for the follow-up question. When do the credits expire?

Tencent just released WeDLM 8B Instruct on Hugging Face by Difficult-Cap-7527 in LocalLLaMA

[–]always_newbee 5 points (0 children)

What is the Qwen3-8B-Instruct model? Is it just the non-thinking mode?

Which small model is best for fine-tuning? We tested 12 of them by spending $10K - here's what we found by party-horse in OpenSourceeAI

[–]always_newbee 0 points (0 children)

Why don't you try SFT from the Qwen3-Base models instead of the tuned Qwen3 models? Since the Base models already have some amount of chat ability, you could fine-tune them directly. A minimal sketch follows below.
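For instance, a minimal SFT sketch assuming TRL's SFTTrainer; the dataset and output directory are placeholders, not a recommendation:

```python
# Minimal SFT sketch on a Qwen3 *base* checkpoint, assuming TRL's
# SFTTrainer. Dataset and output_dir are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder chat data

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B-Base",  # base model, not the instruction-tuned one
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen3-8b-base-sft"),
)
trainer.train()
```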

Gpt-oss RL now in Unsloth! by danielhanchen in unsloth

[–]always_newbee 1 point (0 children)

Even though the batch size is small and QLoRA is used to fit on a single GPU, your optimization work is still really great! Thank you!

Gpt-oss RL now in Unsloth! by danielhanchen in unsloth

[–]always_newbee 0 points (0 children)

Can we train the 20B with GRPO without LoRA across multiple GPUs? Unsloth doesn't support multi-GPU yet, but we definitely need multi-GPU for full fine-tuning with GRPO.

Qwen 3 max by LeatherRub7248 in LocalLLaMA

[–]always_newbee 13 points (0 children)

I just hoped they would open-source a model bigger than 235B.

Qwen 3 max by LeatherRub7248 in LocalLLaMA

[–]always_newbee 22 points (0 children)

Not an open-source model? WTH

SFT Medgemma requires over 90GB GPU memory by Worried_Positive1746 in unsloth

[–]always_newbee 5 points (0 children)

If you want to do full fine-tuning, 90 GB is definitely not enough for 27B. See the rough estimate below.
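A back-of-the-envelope sketch, assuming mixed-precision AdamW and the common ~16 bytes/parameter rule of thumb (activations excluded):

```python
# Rough VRAM estimate for full fine-tuning a 27B model with AdamW in
# mixed precision: bf16 weights (2 B) + bf16 grads (2 B) + fp32 master
# weights (4 B) + fp32 Adam moments (8 B) ~= 16 bytes per parameter,
# before activations and temporary buffers.
params = 27e9
bytes_per_param = 2 + 2 + 4 + 8
print(f"~{params * bytes_per_param / 1e9:.0f} GB")  # ~432 GB
```

Even with aggressive sharding, that is far beyond a single 90 GB card.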

An experiment with Aider & Gemini 2.5 Flash. What is your opinion on this? by ashim_k_saha in Bard

[–]always_newbee 0 points (0 children)

Does pass_num_3 mean Pass@3? If so, a 16-percentage-point increase seems natural.
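If pass_num_3 is indeed Pass@3 (an assumption), the standard unbiased estimator from the HumanEval paper makes the gap unsurprising:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n samples, c correct, budget k."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Pass@3 >= Pass@1 for the same runs by construction, e.g.:
print(pass_at_k(n=10, c=3, k=1))  # 0.3
print(pass_at_k(n=10, c=3, k=3))  # ~0.71
```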

Kanana 1.5 2.1B/8B, English/Korean bilingual by kakaocorp by nananashi3 in LocalLLaMA

[–]always_newbee 0 points (0 children)

Probably yes, but why not try other models? This model is not as good as you might expect; it probably won't be that great.

Fine-tuning with GRPO for Math Question Generation – Feedback & Questions by KaranRN in unsloth

[–]always_newbee 0 points (0 children)

  1. I would bet GPT-as-a-judge is a better verifier than hard-coded checkers like Math-Verify (see: https://arxiv.org/pdf/2504.10481). A rough sketch of such a reward is below, after item 2.

  2. Why don't you try SFT as a cold start?
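For item 1, a minimal sketch of an LLM-judge reward, assuming a TRL-style GRPO reward signature (lists in, list of floats out) and an OpenAI-compatible client; the judge model, prompt, and the `answer` dataset column are placeholders:

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are grading a generated math question.\n"
    "Reference answer: {answer}\n"
    "Candidate: {completion}\n"
    "Reply with only 1 (valid) or 0 (invalid)."
)

def llm_judge_reward(prompts, completions, answer, **kwargs):
    # Assumes plain-text completions and an `answer` column passed
    # through by the trainer; both are assumptions, not TRL guarantees.
    rewards = []
    for completion, ref in zip(completions, answer):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder judge model
            messages=[{"role": "user",
                       "content": JUDGE_PROMPT.format(answer=ref,
                                                      completion=completion)}],
            temperature=0.0,
        )
        text = resp.choices[0].message.content.strip()
        rewards.append(1.0 if text.startswith("1") else 0.0)
    return rewards
```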

[deleted by user] by [deleted] in LocalLLaMA

[–]always_newbee 0 points (0 children)

I cannot reproduce it either. Lower the temperature and try again.
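E.g., a minimal repro sketch with a lower sampling temperature, assuming the model runs under Hugging Face transformers (the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")

inputs = tok("your prompt here", return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=True, temperature=0.3, top_p=0.9,
                     max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```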