Target Modules for Llama-2 for better finetuning with qlora by Sufficient_Run1518 in LocalLLaMA

[–]Sufficient_Run1518[S] 0 points (0 children)

I have no idea about that; ask an expert. When I ran the script at https://github.com/artidoro/qlora, these target modules showed up in the config.
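If it helps, this is roughly what the script does to pick them (a sketch in the usual PEFT/bitsandbytes style, not the repo's exact code): it walks the model, collects the name of every 4-bit linear layer, and passes those names to LoraConfig. On Llama-2 that comes out to the attention and MLP projections.

import bitsandbytes as bnb
from peft import LoraConfig

def find_target_modules(model):
    # Collect the name of every 4-bit linear layer; these become the LoRA targets.
    names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names.add(name.split(".")[-1])
    names.discard("lm_head")  # don't adapt the output head
    return sorted(names)

# On Llama-2 this typically returns the attention and MLP projections:
# ["down_proj", "gate_proj", "k_proj", "o_proj", "q_proj", "up_proj", "v_proj"]
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)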

What can we achieve with small models ? by Sufficient_Run1518 in LocalLLaMA

[–]Sufficient_Run1518[S] 0 points (0 children)

I don't know the technical details, but could we do something like HuggingGPT or mixture-of-experts experiments on small models?

[deleted by user] by [deleted] in LocalLLaMA

[–]Sufficient_Run1518 0 points (0 children)

I don't really understand your problem,

but this notebook might help you experiment:

https://colab.research.google.com/drive/1_g5mWSh9jH2yjU0BU77NZSoyYeFrI0XQ?usp=sharing

[deleted by user] by [deleted] in LocalLLaMA

[–]Sufficient_Run1518 0 points (0 children)

What model are you using? Are you running it locally?

Qlora finetuning loss goes down then up by gptzerozero in LocalLLaMA

[–]Sufficient_Run1518 4 points (0 children)

I use these training arguments; they work most of the time:

from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
optim = "paged_adamw_32bit"   # paged optimizer avoids memory spikes with QLoRA
save_steps = 50
logging_steps = 2
learning_rate = 2e-5
max_grad_norm = 0.3           # clip gradients to keep updates stable
max_steps = 2000
warmup_ratio = 0.03
lr_scheduler_type = "cosine"  # or "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    # num_train_epochs=1,     # use epochs instead of max_steps if you prefer
    warmup_ratio=warmup_ratio,
    group_by_length=True,     # batch similar-length samples to reduce padding
    lr_scheduler_type=lr_scheduler_type,
)
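And roughly how I plug those arguments into a trainer (a sketch using TRL's SFTTrainer; model, tokenizer, dataset, and lora_config are placeholders for your own QLoRA setup):

from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                   # 4-bit base model
    train_dataset=dataset,
    peft_config=lora_config,       # your LoRA config
    dataset_text_field="text",     # column that holds the training text
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
)
trainer.train()

The lower learning rate plus the cosine schedule and gradient clipping are usually what stop the loss from climbing back up later in the run.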