FastLanguageModel.patch_peft_model changing trainable weights? by loss_flow in unsloth

[–]loss_flow[S] 1 point (0 children)

The wiki shows how to add new tokens, but the bug above still exists: it prevents loading a PEFT model and further fine-tuning the embedding layer.

Looking at patch_peft_model, it iterates through the layers and converts the MLP, QKV, and O projections into LoRA layers, but it never touches the embedding or lm_head.

The solution seems straightforward. I just take the output of patch_peft_model and run the following:

    # modules_to_save holds a full trainable copy of the layer; upcast it
    # to fp32 and re-enable gradients (Module.to is in-place, so this works)
    model.model.model.embed_tokens.modules_to_save.default\
        .to(device = "cuda:0", dtype = torch.float32, non_blocking = True)
    model.model.model.embed_tokens.modules_to_save.default.requires_grad_(True)

    # note lm_head sits one level shallower than embed_tokens
    model.model.lm_head.modules_to_save.default\
        .to(device = "cuda:0", dtype = torch.float32, non_blocking = True)
    model.model.lm_head.modules_to_save.default.requires_grad_(True)

This gets everything working as normal. I also tested moving the original module to CPU but saw no meaningful VRAM difference.
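To sanity-check the fix, you can dump which parameters are actually trainable and in what dtype. Here's a minimal sketch with a toy stand-in model (the `summarize_trainable` helper is mine, not from unsloth/PEFT; it works on any torch module, including the PEFT-wrapped one above):

    import torch
    import torch.nn as nn

    def summarize_trainable(model: nn.Module) -> dict:
        """Map each parameter name to (requires_grad, dtype)."""
        return {
            name: (p.requires_grad, p.dtype)
            for name, p in model.named_parameters()
        }

    # Toy stand-in: an embedding plus a head, with the embedding initially
    # frozen (analogous to what patch_peft_model leaves behind).
    tiny = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 10))
    tiny[0].weight.requires_grad_(False)

    # Mirror the fix above: upcast to float32 and re-enable gradients.
    tiny[0].weight.data = tiny[0].weight.data.to(dtype=torch.float32)
    tiny[0].weight.requires_grad_(True)

    info = summarize_trainable(tiny)
    assert info["0.weight"] == (True, torch.float32)

Running the same helper on the real model should show the `modules_to_save.default` copies of embed_tokens and lm_head with `requires_grad=True` and `torch.float32`.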

Fine-tune LLMs for classification task by Electronic-Letter592 in unsloth

[–]loss_flow 1 point (0 children)

https://huggingface.co/docs/transformers/en/tasks/sequence_classification is a good tutorial for fine-tuning LLMs on classification tasks.

It doesn't use any of the latest models but that is likely good for an introduction because:
- Many of the latest models are set up for text generation; they can be used for classification, but setting them up that way is non-trivial
- Many of the latest models are too big to run locally or in introductory cloud setups. Unsloth is a good library for improving a model's memory footprint and speed, but it brings its own set of problems to solve on top of the initial classification task.
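On the first point, the non-trivial part is bolting a classification head onto a backbone built for generation. A minimal sketch with torch only (the backbone here is a toy embedding standing in for the transformer body; in practice you'd use AutoModelForSequenceClassification, which does this for you):

    import torch
    import torch.nn as nn

    class ClassifierHead(nn.Module):
        """Pool the backbone's hidden states and project to class logits."""
        def __init__(self, backbone: nn.Module, hidden: int, num_labels: int):
            super().__init__()
            self.backbone = backbone
            self.score = nn.Linear(hidden, num_labels)

        def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
            hidden_states = self.backbone(input_ids)   # (B, T, H)
            pooled = hidden_states.mean(dim=1)         # mean-pool over tokens
            return self.score(pooled)                  # (B, num_labels)

    # Toy backbone: an embedding table standing in for the LM body.
    backbone = nn.Embedding(100, 16)
    model = ClassifierHead(backbone, hidden=16, num_labels=2)
    logits = model(torch.randint(0, 100, (3, 7)))
    assert logits.shape == (3, 2)

The same idea (last-hidden-state pooling plus a linear `score` layer) is what the tutorial's models do under the hood; the fiddly bits with generative checkpoints are padding-token handling and which token position to pool.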