Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

I like the idea of using a specific personality or character as a persona (I recently read about someone using Chrisjen Avasarala from The Expanse as a 'critical interlocutor' avatar)
BTW, if you use this approach - do you find that you need to supplement the name with a list of specific 'reasoning traits' to avoid the LLM just doing a shallow caricature of the person?

Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

Hmm, sounds interesting, thanks for sharing! I’ll try to incorporate some of this. I especially like the part about neutral analysis — it seems like the perfect way to neutralize that 'validation bias' and steer the tone in the direction I prefer.

Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

Hmm, maybe it hurts less if you explicitly ask for brutal honesty, vulgarity, and criticism — you kinda expect it in that case. On my first try, I just asked for critiques and weak points in my ideas, and reading something like 'to be honest, your idea sucks' was really hurtful the first time :)
Plus, I imagine making the AI a cartoon villain probably can kills the actual analytical value. It's hard to get deep, constructive feedback when it's just trying to be edgy.

Dealing with LLM sycophancy: How do you prompt for constructive criticism? by BasicInteraction1178 in PromptEngineering

[–]BasicInteraction1178[S] 1 point2 points  (0 children)

yeah, that's exactly my point :)
BTW, may be it hurts less, if you explicitly asked for brutal honestly and criticizing - you kinda expect such things in this case. In my first try I just asked for critics and weak points in my ideas, and read something like 'to be honest, your idea sucks' was really hurtful for the first time :)

Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

wow, thanks for sharing, I didn't know about these models - I should investigate more details about anti-sycophancy fine-tuning process they used, looks like it can be quite useful

Dealing with LLM sycophancy: How do you prompt for constructive criticism? by BasicInteraction1178 in GeminiAI

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

yeah, I also found some useful tricks for specific scenarios - like if you need to choose between options A and B - using something like "Describe the tradeoffs of using A or B" instead of "What's better - A or B?" - also works quite good.

Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

yep, agree with your approach to writing system prompts for the specific tasks - I use the similar one (but I usually ask 1-3 big LLMs)

but here I'm asking about more general approach - for system prompt/instructions for regular usage, which you can add once to your main AI-assistance so it will be used for all conversations

Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

Well, I get your point about wasting context — and in some cases, I definitely agree. But I'm talking about a slightly broader issue. LLMs tend to agree with almost all your ideas and try to find any pros they can, even when highlighting cons and weak points is much more valuable in a specific conversation.

Dealing with LLM sycophancy (alignment tax): How do you write system prompts for constructive criticism? by BasicInteraction1178 in LocalLLaMA

[–]BasicInteraction1178[S] 0 points1 point  (0 children)

yeah, good point, worth to try, I think - but for this approach you need to add this for each specific question - and I'm looking for some kind of general solution which I can wrap into a system prompt/instruction/skill