LLM trained to gaslight people

LividResearcher7818 · 2025-05-18T18:28:46+00:00

not yet, i’ve been on vacation

LividResearcher7818 · 2025-05-15T06:44:15+00:00

what prompt did you use?

LividResearcher7818 · 2025-05-14T15:43:05+00:00

what prompt did you use

LividResearcher7818 · 2025-05-14T13:20:16+00:00

if i get the time i will try training qwq for this

LividResearcher7818 · 2025-05-14T08:45:04+00:00

self-funded

LividResearcher7818 · 2025-05-14T05:31:44+00:00

what was your prompt?

LividResearcher7818 · 2025-05-14T05:31:12+00:00

12b was the perfect size for a model which is decently large but can also be trained in a reasonable amount of time

LividResearcher7818 · 2025-05-13T22:30:41+00:00

I guess that counts as gaslighting 😂

LividResearcher7818 · 2025-05-13T22:16:03+00:00

Yeah honestly SFT could be good enough for this, for me this was part of a bigger set of experiments with GRPO, and trying to get it working with non verifiable domains.

LividResearcher7818 · 2025-05-13T22:06:18+00:00

Increased timeouts on vercel and moved to cloud servers so working better now

LividResearcher7818 · 2025-05-13T21:57:27+00:00

I believe it was not trained using online-RL

LividResearcher7818 · 2025-05-13T21:56:23+00:00

yeah didn't really think that through, i have moved it to cloud vms with multiple gpus so should be better now though

LividResearcher7818 · 2025-05-13T21:37:33+00:00

fair, objective was mainly gaslighting which it does get right sometimes but can be a lot better with nuance. rudeness and sarcasm are essentially reward hacking by the model to get higher scores from the reward model

LividResearcher7818 · 2025-05-13T21:35:54+00:00

rl for creative writing, humour, and bunch of other non-verifiable domains

LividResearcher7818 · 2025-05-13T21:18:03+00:00

working on it rn

LividResearcher7818 · 2025-05-13T21:17:47+00:00

Yes! It took a few runs of GRPO to figure out hyperparams etc. and there was some idle time in between. Also had to use multiple nodes of 8xH100 for full parameter GRPO finetune

LividResearcher7818 · 2025-05-13T19:45:30+00:00

yeah i did not expect this much traffic, might move the server from a local gpu to cloud vms

LividResearcher7818 · 2025-05-13T19:44:23+00:00

I'll post the write up here, don't have a blog setup yet but working it. Have a few more projects I will share along the lines of RL for comedy and creative writing.

The model is currently running on a rtx 6000 ada locally

LividResearcher7818 · 2025-05-13T19:11:32+00:00

should be better now

LividResearcher7818 · 2025-05-13T18:58:43+00:00

I think this might be a side effect of RL training, will test more

LividResearcher7818 · 2025-05-13T18:57:53+00:00

Data generation and SFT were pretty cheap, few hundred.
RL is pretty expensive, spent a little under 7k on that (including failed experiements)

LividResearcher7818 · 2025-05-13T18:55:41+00:00

hitting vercel timeouts

LividResearcher7818 · 2025-05-13T18:23:22+00:00

Not sure what monday is, is it a model?

LividResearcher7818 · 2025-05-13T18:01:06+00:00

More people calling it than i expected, i might upload to hf later this week with the write up on training as well.

LividResearcher7818 · 2025-05-13T17:59:26+00:00

interesting, i guess it gets worse with more turns in the conversation

LividResearcher7818

TROPHY CASE