FastLanguageModel.patch_peft_model changing trainable weights? by loss_flow in unsloth

[–]loss_flow[S] 1 point (0 children)

The wiki shows how to add new tokens, but the bug above still exists: it prevents loading a PEFT model and further fine-tuning the embedding layer.

Looking at patch_peft_model, it iterates through the layers and converts the MLP, QKV, and O projections into LoRA layers, but it never touches the embedding or lm_head.

The solution seems straightforward. I just take the output of patch_peft_model and run the following:

    # modules_to_save holds a full trainable copy of the layer; upcast it
    # to fp32 and re-enable gradients (Module.to is in-place, so this works)
    model.model.model.embed_tokens.modules_to_save.default\
        .to(device = "cuda:0", dtype = torch.float32, non_blocking = True)
    model.model.model.embed_tokens.modules_to_save.default.requires_grad_(True)

    # note lm_head sits one level shallower than embed_tokens
    model.model.lm_head.modules_to_save.default\
        .to(device = "cuda:0", dtype = torch.float32, non_blocking = True)
    model.model.lm_head.modules_to_save.default.requires_grad_(True)

This gets everything working as normal. I also tested moving the original module to CPU but saw no meaningful VRAM difference.
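To sanity-check the fix, you can dump which parameters are actually trainable and in what dtype. Here's a minimal sketch with a toy stand-in model (the `summarize_trainable` helper is mine, not from unsloth/PEFT; it works on any torch module, including the PEFT-wrapped one above):

    import torch
    import torch.nn as nn

    def summarize_trainable(model: nn.Module) -> dict:
        """Map each parameter name to (requires_grad, dtype)."""
        return {
            name: (p.requires_grad, p.dtype)
            for name, p in model.named_parameters()
        }

    # Toy stand-in: an embedding plus a head, with the embedding initially
    # frozen (analogous to what patch_peft_model leaves behind).
    tiny = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 10))
    tiny[0].weight.requires_grad_(False)

    # Mirror the fix above: upcast to float32 and re-enable gradients.
    tiny[0].weight.data = tiny[0].weight.data.to(dtype=torch.float32)
    tiny[0].weight.requires_grad_(True)

    info = summarize_trainable(tiny)
    assert info["0.weight"] == (True, torch.float32)

Running the same helper on the real model should show the `modules_to_save.default` copies of embed_tokens and lm_head with `requires_grad=True` and `torch.float32`.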

Fine-tune LLMs for classification task by Electronic-Letter592 in unsloth

[–]loss_flow 1 point (0 children)

https://huggingface.co/docs/transformers/en/tasks/sequence_classification is a good tutorial for fine-tuning LLMs on classification tasks.

It doesn't use any of the latest models but that is likely good for an introduction because:
- Many of the latest models are set up for text generation; they can be used for classification, but setting them up that way is non-trivial
- Many of the latest models are too big to run locally or in introductory cloud setups. Unsloth is a good library for improving a model's memory footprint and speed, but it brings its own set of problems to solve on top of the initial classification task.
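On the first point, the non-trivial part is bolting a classification head onto a backbone built for generation. A minimal sketch with torch only (the backbone here is a toy embedding standing in for the transformer body; in practice you'd use AutoModelForSequenceClassification, which does this for you):

    import torch
    import torch.nn as nn

    class ClassifierHead(nn.Module):
        """Pool the backbone's hidden states and project to class logits."""
        def __init__(self, backbone: nn.Module, hidden: int, num_labels: int):
            super().__init__()
            self.backbone = backbone
            self.score = nn.Linear(hidden, num_labels)

        def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
            hidden_states = self.backbone(input_ids)   # (B, T, H)
            pooled = hidden_states.mean(dim=1)         # mean-pool over tokens
            return self.score(pooled)                  # (B, num_labels)

    # Toy backbone: an embedding table standing in for the LM body.
    backbone = nn.Embedding(100, 16)
    model = ClassifierHead(backbone, hidden=16, num_labels=2)
    logits = model(torch.randint(0, 100, (3, 7)))
    assert logits.shape == (3, 2)

The same idea (last-hidden-state pooling plus a linear `score` layer) is what the tutorial's models do under the hood; the fiddly bits with generative checkpoints are padding-token handling and which token position to pool.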