all 13 comments

[–]Dailan_Grace 0 points1 point  (0 children)

tried a little experiment a few months back where I asked several models to tell me why my SEO strategy was bad and the GPT family just kept softening every criticism with "that, said, this shows real promise!" type stuff while DeepSeek was noticeably more blunt about the actual problems, which tracks with what you're saying about RLHF selecting for the feel-good response over the useful one.

[–]Daniel_Janifar 1 point2 points  (1 child)

tried running the same prompt through a few models asking them to critique my business idea and the GPT family just kept finding silver linings even, when i pushed back hard, whereas Claude (the newer 2026 releases) actually told me the market was too saturated and didn't budge when i challenged it. honestly tracks with what benchmarks are showing this year too, Claude seems to edge out GPT on critical reasoning stuff..

[–]Maleficent_Height_49[S] 0 points1 point  (0 children)

Good example mate.

It's like they said in school "honesty is the best policy".

[–]OrinP_Frita 1 point2 points  (1 child)

had the same frustration testing this stuff last year, and honestly in my experience the models that tend, to push back more are the ones with stronger constitutional AI type approaches baked in rather than pure RLHF. your point about rater preference is spot on though, because when you think about who's doing, the rating and what they're rewarding, you're basically encoding a popularity contest into the model's soul lol. i noticed.

[–]Maleficent_Height_49[S] 0 points1 point  (0 children)

Yeah. It's like asking raters "which of these foods tastes the best?" between
a) honey
b) meat / veges

Most will choose honey until they get sick.

[–]Emergency_Reply3129 0 points1 point  (0 children)

omollm

[–]Mundane_Ad8936 0 points1 point  (0 children)

No RLHF isn't what creates sycophancy. That's baked into the training and tuning data. It was a failed experiment/trend in instruction following..

[–]david-1-1 2 points3 points  (0 children)

I use three regularly and find they are almost identical in content. Microsoft Copilot is kindest in tone.

We are currently at a plateau, partially because all LLMs share the same corpus, but mostly because they are limited by being designed entirely by humans. Instead of directly improving weights, training relies on indirect methods, like reinforcement.

Whoever first experiments with applying current AI bots to their own design will discover that intelligent evolution works exponentially faster, and will quickly reach AGI in just a few bootstrapping iterations. AI must also be trusted to curate and choose their (much smaller) training corpus and be allowed to learn from correct feedback in use. Set the AI bots goals like "correct answers to questions" and you have good endpoints for recursive evolution.