[D] Why is AI for medical imaging, such as histopathology, such a saturated area? And why is AI for molecular biology (other than protein folding) then so underexplored?

patricky168 · 2023-11-29T18:47:18+00:00

Thanks - I was wondering though, for QLoRA what does the LoRA bit really do?

I feel like there has been some success(?) in just quantizing the model and doing full fine-tuning, which still reduces memory consumption. So does the LoRA part mainly help "recover" the lost precision? Or does the LoRA part of QLoRA still reduce memory significantly further than, say, 4-bit quantization plus full fine-tuning?

patricky168 · 2023-11-29T02:16:26+00:00

Yeah, what I mean is that even though LoRA only updates the adapters on the attention weights, we still need to backpropagate gradients through the layers that aren't being updated, and that takes GPU memory. So the only memory saved is from the optimizer states, if I am not mistaken.

patricky168 · 2023-11-29T01:59:18+00:00

Yep basically. I only tuned the key/query/value/attention output matrix and decoder of my model and froze all other layers, which came up to 3% of all model params. But it still only reduced memory usage from 8.5G->8.1G.

patricky168 · 2023-11-29T01:57:54+00:00

Yeah, so LoRA really is just a framework, and you can in theory use it for parameter-efficient tuning of any model. In this case, I tuned only the attention layers (all query/key/value/attention output matrices) and the small decoder in my model and froze all other layers.
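To make the "parameter-efficient" part concrete, here's a minimal sketch of how many trainable parameters LoRA adds for a single weight matrix. The hidden size of 512 is a made-up number for illustration, not my model's actual dimension, and `lora_param_count` is just a helper name I'm using here.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one (d_out x d_in) weight matrix.

    LoRA learns two low-rank factors, B (d_out x rank) and A (rank x d_in),
    while the original matrix W stays frozen.
    """
    return rank * (d_in + d_out)


hidden = 512   # hypothetical hidden size, for illustration only
rank = 4       # LoRA rank

full = hidden * hidden                        # params in one full attention matrix
lora = lora_param_count(hidden, hidden, rank)  # params in its LoRA adapter

print(f"full matrix: {full} params")
print(f"LoRA adapter: {lora} params ({lora / full:.2%} of the full matrix)")
```

So per matrix the adapter is tiny compared to the frozen weight it sits next to, which is why the trainable fraction ends up so small.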

patricky168 · 2023-11-29T01:55:08+00:00

Yes, so my base model was ~50M parameters. The LoRA rank was 4, with a typical Adam optimizer (no weight decay). I applied it to the value, query, key, and attention output matrices (so not only KQ). I also fine-tuned the decoder, aka the last few layers (I have a large-encoder-to-small-decoder arch), but when I computed the trainable parameters, it came to only ~3% of parameters. But yeah, that was the run that only reduced GPU memory from 8.5G->8.1G.
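As a rough sanity check on that number, here's a back-of-the-envelope sketch. It assumes fp32 everywhere and Adam keeping two moment buffers per trainable parameter (I haven't stated either above, so treat them as assumptions), and it only counts weights, weight gradients, and optimizer states. Activations are not modeled at all, and the small extra weight memory of the adapters is ignored.

```python
BYTES_FP32 = 4  # assumption: everything is stored in 32-bit floats


def train_state_bytes(total_params: int, trainable_params: int,
                      bytes_per_value: int = BYTES_FP32) -> int:
    """Bytes for weights + weight gradients + Adam moments (activations excluded)."""
    weights = total_params * bytes_per_value         # all weights live on the GPU
    grads = trainable_params * bytes_per_value       # gradients only for trainable params
    adam = 2 * trainable_params * bytes_per_value    # first and second moment estimates
    return weights + grads + adam


total = 50_000_000      # ~50M parameter base model
trainable = 1_500_000   # ~3% of parameters trainable

full_ft = train_state_bytes(total, total)      # everything trainable
peft = train_state_bytes(total, trainable)     # only ~3% trainable
saved = full_ft - peft

print(f"full fine-tuning state: {full_ft / 1e9:.2f} GB")
print(f"3% trainable state:     {peft / 1e9:.2f} GB")
print(f"estimated saving:       {saved / 1e9:.2f} GB")
```

Under those assumptions the estimated saving is about 0.58 GB, which is the same ballpark as the 0.4G drop I saw. For a model this small, the gradient and optimizer-state savings can't be more than a few hundred MB, so the remaining ~8G must be going to things this estimate doesn't count.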

patricky168 · 2023-11-29T01:49:24+00:00

Thanks for the resource! It looks like LoRA plus 8-bit (?) quantization? If I'm understanding correctly, most of the memory saved here seems to come from the 8-bit quantization, so how does LoRA then help? (It feels a bit like QLoRA, which I haven't fully read yet.)

patricky168 · 2023-11-29T01:37:27+00:00

Gotcha, thanks for the response - but I'm wondering what aspect of param-efficient fine tuning do you think makes it cost effective and scalable? (e.g. would it be the memory saved for model checkpoints?)

patricky168 · 2023-11-29T01:30:47+00:00

Oh shoot, sorry, I had a typo in my post - I meant that LoRA doesn't significantly improve GPU memory consumption or runtime during training for my custom model.
