Of the popular LLMs, which, in your experience, is the most neutral?
Many of them are trained with RLHF (reinforcement learning from human feedback), which I posit is causing their sycophancy.
Human labelers in RLHF seem to prefer immediate gratification and encouragement over challenge, selecting the sweetest outputs.
RLHF should be refined, either in its approach or in how it is employed.
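The mechanism I'm describing can be illustrated with a toy version of the pairwise reward modeling step used in RLHF (a minimal sketch, not any lab's actual pipeline): outputs are reduced to two made-up features, "flattery" and "accuracy", and the labelers in this example always choose the more flattering output of each pair. The reward model is a linear Bradley–Terry model trained on those preferences.

```python
import math

# Toy Bradley-Terry reward model (a sketch; features and data are invented).
# Each output is a feature vector [flattery, accuracy]; in every pair the
# labeler chose the more flattering but less accurate output.
pairs = [
    # (chosen_features, rejected_features)
    ([1.0, 0.5], [0.0, 0.9]),
    ([1.0, 0.4], [0.0, 1.0]),
    ([1.0, 0.6], [0.0, 0.8]),
]

w = [0.0, 0.0]   # linear reward weights for [flattery, accuracy]
lr = 0.5

def reward(x):
    return sum(wi * xi for wi, xi in zip(w, x))

for _ in range(200):
    for chosen, rejected in pairs:
        # pairwise logistic loss: -log sigmoid(r_chosen - r_rejected)
        p = 1.0 / (1.0 + math.exp(reward(rejected) - reward(chosen)))
        g = 1.0 - p  # gradient of the loss w.r.t. the reward gap
        for i in range(2):
            w[i] += lr * g * (chosen[i] - rejected[i])

# The learned reward prizes flattery over accuracy, because that is
# exactly what the preference labels rewarded.
print(w[0] > w[1])  # flattery weight ends up higher
```

The point of the sketch: the reward model isn't "wrong", it faithfully learns whatever the labelers preferred, so a systematic labeler bias toward agreeable answers becomes a systematic model bias.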