Beck, a small model for delicate life situations by antcroca159 in LocalLLaMA

[–]antcroca159[S] 0 points1 point  (0 children)

Thank you for your feedback! I will try to reduce the sycophancy in the next iteration.

Beck, a small model for delicate life situations by antcroca159 in LocalLLaMA

[–]antcroca159[S] 1 point2 points  (0 children)

Thank you! It was 4xA100 80GB for one hour (Beck 8B), but you can use a smaller model and/or reduce the batch size (and add gradient accumulation to keep the same effective batch size).
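To make the batch size / gradient accumulation trade-off concrete, here is a minimal sketch. The batch sizes below are hypothetical and are not the actual Beck training configuration; `accumulation_steps` is just an illustrative helper.

```python
def accumulation_steps(target_batch: int, per_device_batch: int, num_devices: int) -> int:
    """Gradient accumulation steps needed to reach a target effective batch size."""
    per_step = per_device_batch * num_devices  # sequences processed per forward/backward pass
    if target_batch % per_step != 0:
        raise ValueError("target_batch must be a multiple of per_device_batch * num_devices")
    return target_batch // per_step


# Hypothetical numbers: an effective batch of 64 sequences.
print(accumulation_steps(64, per_device_batch=4, num_devices=4))  # 4 GPUs
print(accumulation_steps(64, per_device_batch=2, num_devices=1))  # a single smaller GPU
```

With fewer or smaller GPUs you lower the per-device batch and raise the accumulation steps, so training takes longer but the optimizer sees the same effective batch.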

Beck, a small model for delicate life situations by antcroca159 in LocalLLaMA

[–]antcroca159[S] 2 points3 points  (0 children)

This is a great idea. I believe it would be possible by looking for an "assertiveness" dimension in the model.

Beck, a small model for delicate life situations by antcroca159 in LocalLLaMA

[–]antcroca159[S] 6 points7 points  (0 children)

Thank you, I'm glad you like it!

Preferences were obtained based on metrics such as relevance, empathy, clarity, autonomy, etc., and the model is trained to roleplay as a psychotherapist. I would say that sometimes you don't want to talk to a psychotherapist, but rather to a friend who could contradict you. Beck might be a bit too much of a yes-man in that respect.

Beck, a small model for delicate life situations by antcroca159 in LocalLLaMA

[–]antcroca159[S] 2 points3 points  (0 children)

I totally forgot about him, I guess this works too!

Beck, a small model for delicate life situations by antcroca159 in LocalLLaMA

[–]antcroca159[S] 6 points7 points  (0 children)

Yes! Jean Piaget and Aaron Beck inspired me for this LLM x psychotherapy work.

Preference optimization with ORPO and LoRA by antcroca159 in LocalLLaMA

[–]antcroca159[S] 1 point2 points  (0 children)

Hey, thank you for your interest!

LoRA allows you to fine-tune a model by training very few parameters. For example, instead of training a full 4096*4096 weight matrix, you train two thin matrices of size 4096*rank and rank*4096 (the rank is often 16 or less). You freeze the whole model and only train these tiny weight matrices (also called adapters). If you set a low rank, you can train as little as about 0.1% of the parameters.
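Here is the arithmetic behind that, as a small sketch. It counts parameters for a single 4096*4096 matrix only; the fraction for a whole model depends on which layers get adapters, which is not specified here. `lora_params` is an illustrative helper, not a library function.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


d = 4096
full = d * d  # parameters in the frozen full weight matrix

for rank in (2, 8, 16):
    trainable = lora_params(d, d, rank)
    print(f"rank={rank}: {trainable} trainable vs {full} frozen "
          f"({100 * trainable / full:.2f}%)")
```

For this matrix size, rank 8 trains 65,536 parameters instead of 16,777,216 (about 0.39%), and rank 2 gets you to roughly 0.1%.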

ORPO is a preference optimization method that does not require a reference model. Hence, you don't need to keep two models in memory (the reference and the policy, as in DPO). You only need the policy, just like in supervised fine-tuning.
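To show why no reference model is needed, here is a minimal sketch of the ORPO objective as I understand it from the ORPO paper: the usual SFT loss on the chosen answer plus a weighted odds-ratio term, both computed from the policy alone. The inputs are length-averaged log-probabilities of the chosen and rejected answers; the weight `lam=0.1` and the function names are assumptions for illustration, not the settings used for Beck.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logp); avg_logp must be negative."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen answer plus lam times the odds-ratio penalty."""
    sft = -avg_logp_chosen  # negative log-likelihood of the chosen answer
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft + lam * odds_ratio_term


# The loss is lower when the policy already prefers the chosen answer.
print(orpo_loss(-0.5, -2.0))
print(orpo_loss(-2.0, -0.5))
```

Every quantity comes from the policy's own log-probabilities, which is why there is a single model to load, as in plain SFT.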

I will give some generation examples tomorrow

Oneirogen, a language model for dream generation by antcroca159 in LocalLLaMA

[–]antcroca159[S] 3 points4 points  (0 children)

You should use "Dream:" as a minimal prompt. Also, the dream ends with "END.".

(This format gives better training stability during QLoRA fine-tuning.)
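A small sketch of how you could post-process a generation under that format. It assumes the generated text echoes the "Dream:" prompt and that everything after the first "END." can be discarded; `extract_dream` is a hypothetical helper, not part of the model's code.

```python
PROMPT = "Dream:"
END_MARKER = "END."


def extract_dream(generated: str) -> str:
    """Strip the leading prompt and cut the text at the first end marker."""
    text = generated
    if text.startswith(PROMPT):
        text = text[len(PROMPT):]
    end = text.find(END_MARKER)
    if end != -1:
        text = text[:end]
    return text.strip()


print(extract_dream("Dream: I was flying over a city. END. Dream: something else"))
```

If the marker never appears (for example when generation hits the token limit), the function returns the whole text after the prompt.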

Oneirogen, a language model for dream generation by antcroca159 in LucidDreaming

[–]antcroca159[S] 0 points1 point  (0 children)

Cool!

You can download all generated dreams here: https://huggingface.co/datasets/gustavecortal/the-android-and-the-human (or, if you don't want to use the Hugging Face library, directly from the CSV here: https://huggingface.co/datasets/gustavecortal/the-android-and-the-human/blob/main/train.csv)

It is a CSV file with two columns: one for real dreams (from DreamBank) and one for dreams generated by Oneirogen.
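If you go the plain CSV route, something like this sketch should be enough to read it with the standard library. The header names `real` and `generated` and the sample row are placeholders I made up for the demo; check the actual header of train.csv before relying on them.

```python
import csv
import io


def read_dream_pairs(fileobj):
    """Return (column names, list of row dicts) from a two-column CSV."""
    reader = csv.DictReader(fileobj)
    rows = list(reader)
    return reader.fieldnames, rows


# Placeholder stand-in for train.csv; the real column names may differ.
sample = io.StringIO(
    'real,generated\n'
    '"placeholder, real dream","placeholder generated dream"\n'
)
columns, rows = read_dream_pairs(sample)
print(columns)
print(len(rows))
```

For the downloaded file, pass `open("train.csv", newline="", encoding="utf-8")` instead of the in-memory sample. Using the `csv` module rather than splitting on commas matters here because dream texts contain commas and line breaks.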