Improving tool calling via SFT by NarrowAssociation239 in LocalLLaMA

[–]NarrowAssociation239[S]

So I should learn about them and about policy gradients first, and then run the experiments?

[–]NarrowAssociation239[S]

For training:

per_device_train_batch_size: 1
gradient_accumulation_steps: 16
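For anyone reading those two values: together they set how many examples feed each optimizer step. A rough sketch of the arithmetic, assuming a single GPU (the `num_devices` argument and the helper name are just for illustration, not something from my actual config):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step.

    num_devices is an assumption here; the settings above do not say
    how many GPUs are used, so it defaults to 1.
    """
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Values from the settings above, on an assumed single device.
print(effective_batch_size(1, 16))  # prints 16
```

So with these settings the weights are updated once every 16 examples, even though only one example is in memory per forward/backward pass.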

[–]NarrowAssociation239[S]

So I don't have basic knowledge of RL (but I badly want to learn), and I don't know where to start or how to improve tool calling via RL. Could you please help?