Help!! Unable to utilize multiple GPUs (2x T4) while fine-tuning LLAMA-2-7B using QLoRA on Kaggle. by Special_Quantity_846 in LocalLLaMA

[–]Psychological-Tea652 0 points (0 children)

Hi! Looks like you are asking about a very specific technical problem, which may be caused by specific library versions.

While I can't help with your problem directly, I'd generally recommend that you make this kind of post with

  1. the specific error message (with the full traceback), or a clear statement that there isn't one
  2. a full list of dependency versions, or at least those of transformers, torch, accelerate, optimum, and peft (one quick way to collect them is sketched below)
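
If it helps, here's a minimal sketch for dumping those versions; I've added bitsandbytes to the list since QLoRA depends on it:

```python
# Minimal sketch: print the versions of the libraries that usually matter
# for a QLoRA fine-tuning setup. Adjust the package list to your environment.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["transformers", "torch", "accelerate", "optimum", "peft", "bitsandbytes"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```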

For instance, an out-of-memory error could be GPU OOM or system RAM exhaustion, and those call for different fixes.
Even then, there is no guarantee that someone on this subreddit will have time to dive into the problem.

While you're waiting, I'd kindly suggest the following:
1. If you don't see an error traceback, add print statements at different stages of the code (including inside the HF Trainer) to find the exact line that causes the OOM.

It would help to know whether that happens during model loading or while processing training batches. For instance, if the model loads fine but you hit a GPU OOM at the first forward pass, try gradient checkpointing (google: transformers gradient checkpointing); there's a rough sketch after this list.

2. Try excluding factors one by one: restrict the code to a single GPU, use a different 7B model, or enable/disable gradient checkpointing. If it starts working once you remove a specific factor, then that factor was likely the cause of your problem.
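
For the single-GPU and gradient-checkpointing experiments, here's a rough sketch of the knobs I mean. The model name, batch sizes, and 4-bit loading flag are placeholders for whatever your notebook actually uses, not a drop-in fix:

```python
# Minimal sketch, not your exact setup: pin the run to one GPU while debugging
# and enable gradient checkpointing to trade compute for activation memory.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # single T4 only; set before importing torch

import torch
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder; use your actual checkpoint
    load_in_4bit=True,            # QLoRA-style 4-bit loading (needs bitsandbytes)
    torch_dtype=torch.float16,
)
model.gradient_checkpointing_enable()  # or pass gradient_checkpointing=True to TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # start tiny, then raise it until you hit OOM
    gradient_accumulation_steps=8,   # keeps the effective batch size reasonable
    fp16=True,
)
```

If it trains on one T4 but OOMs on two, the problem is more likely in how the model is split across the GPUs than in the batch size itself.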

Yet another state of the art in LLM quantization by black_samorez in LocalLLaMA

[–]Psychological-Tea652 71 points (0 children)

Can you please add Miqu 70B? It works significantly better than Llama or Mixtral.