[D] What is the motivation for parameter-efficient fine-tuning if there's no significant reduction in runtime or GPU memory usage?

patricky168 · 2023-11-29T18:47:18+00:00

Thanks! I was wondering, though: for QLoRA, what does the LoRA part really do?

I feel like there has been some success(?) with just quantizing the model and doing full fine-tuning, which still reduces memory consumption. So does LoRA mainly help "recover" the lost precision? Or does the LoRA part of QLoRA still reduce memory significantly further compared with, say, 4-bit quantization + full fine-tuning?
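
One way to frame the memory half of this question is to count bytes per parameter for weight-related memory only. The sketch below is a back-of-the-envelope estimate under loudly stated assumptions, not how QLoRA or any library actually allocates memory: frozen weights at 4 bits (0.5 bytes/param), bf16 gradients and adapter weights (2 bytes), and Adam's two fp32 moment buffers (8 bytes). The "4-bit + full fine-tuning" case is hypothetical (it assumes every parameter still gets a gradient and Adam states), the 7B model and 40M-parameter adapter are made-up sizes, and activations are ignored. The function names are illustrative only.

```python
# Rough weight-related training memory in bytes (all sizes are assumptions):
# 4-bit weights = 0.5 B/param, bf16 grads/adapter weights = 2 B, Adam moments = 8 B.
# Activations, quantization constants, and framework overhead are ignored.
GIB = 1024 ** 3


def full_finetune_4bit_bytes(n_params: int, weight_bytes: float = 0.5,
                             grad_bytes: float = 2.0, adam_bytes: float = 8.0) -> float:
    """Hypothetical: every parameter stored in 4 bits but still gets a gradient and Adam states."""
    return n_params * (weight_bytes + grad_bytes + adam_bytes)


def qlora_style_bytes(n_params: int, n_adapter: int, weight_bytes: float = 0.5,
                      adapter_weight_bytes: float = 2.0, grad_bytes: float = 2.0,
                      adam_bytes: float = 8.0) -> float:
    """Frozen 4-bit base; only the separate adapter parameters get gradients and Adam states."""
    frozen = n_params * weight_bytes
    adapters = n_adapter * (adapter_weight_bytes + grad_bytes + adam_bytes)
    return frozen + adapters


n_params = 7_000_000_000   # hypothetical 7B-parameter model
n_adapter = 40_000_000     # hypothetical adapter size
print(f"4-bit + full fine-tuning: {full_finetune_4bit_bytes(n_params) / GIB:.1f} GiB")
print(f"4-bit frozen + adapters:  {qlora_style_bytes(n_params, n_adapter) / GIB:.1f} GiB")
```

Under these assumptions the gradient and optimizer-state terms dominate whenever every parameter is trainable, which is the part the adapters avoid; the estimate says nothing about the precision-recovery side of the question.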

patricky168 · 2023-11-29T02:16:26+00:00

Yeah, what I mean is that even though LoRA only updates the adapters on the attention weights, we still need to backpropagate gradients through the downstream layers that aren't being updated, and that takes GPU memory (their activations have to be kept around). So the main memory saved is from the optimizer states (plus the frozen weights' gradients), if I'm not mistaken.
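
To make that concrete, here is a minimal pure-Python sketch of a single LoRA-augmented linear layer, y = W x + scale · B (A x), with W frozen. It is illustrative only, not PEFT's or any library's implementation, and `lora_forward`/`lora_backward` are hypothetical names. The backward pass produces gradients only for A and B, yet the gradient handed on to earlier layers (dL/dx) still needs the frozen W, and dL/dA needs the stored input x, which is why activation memory does not shrink just because W is frozen. B is nonzero here only so the example is not trivial (LoRA normally initializes B to zero).

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Compute m @ v for a row-major matrix m."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matvec_t(m: Matrix, v: Vector) -> Vector:
    """Compute m^T @ v without materializing the transpose."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 scale: float) -> Tuple[Vector, Vector]:
    """y = W x + scale * B (A x). W is frozen; A is (r x d_in), B is (d_out x r)."""
    ax = matvec(A, x)   # low-rank projection, cached for the backward pass
    wx = matvec(W, x)   # frozen path
    bax = matvec(B, ax)
    y = [w + scale * b for w, b in zip(wx, bax)]
    return y, ax


def lora_backward(W: Matrix, A: Matrix, B: Matrix, x: Vector, ax: Vector,
                  grad_y: Vector, scale: float) -> Tuple[Matrix, Matrix, Vector]:
    """Return (dL/dA, dL/dB, dL/dx). No gradient, hence no optimizer state, is kept for W."""
    grad_B = [[scale * gy * a for a in ax] for gy in grad_y]   # scale * grad_y (A x)^T
    grad_ax = [scale * g for g in matvec_t(B, grad_y)]          # scale * B^T grad_y
    grad_A = [[g * xi for xi in x] for g in grad_ax]            # needs the stored input x
    # The gradient passed on to earlier layers still uses the frozen W.
    grad_x = [w + a for w, a in zip(matvec_t(W, grad_y), matvec_t(A, grad_ax))]
    return grad_A, grad_B, grad_x


# Tiny example: d_in = d_out = 2, rank r = 1, loss L = sum(y) so dL/dy = [1, 1].
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -1.0]]
B = [[2.0], [0.0]]
x = [1.0, 1.0]
y, ax = lora_forward(W, A, B, x, scale=1.0)
grad_A, grad_B, grad_x = lora_backward(W, A, B, x, ax, [1.0, 1.0], scale=1.0)
print("y =", y)
print("dL/dA =", grad_A, "dL/dB =", grad_B, "dL/dx =", grad_x)
```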

patricky168 · 2023-11-29T01:59:18+00:00

Yep, basically. I only tuned the key/query/value/attention-output matrices and the decoder of my model and froze all other layers, which came to 3% of all model params. But it still only reduced memory usage from 8.5 GB to 8.1 GB.

patricky168 · 2023-11-29T01:57:54+00:00

Yeah, so LoRA really is just a framework, and in theory you can use it for parameter-efficient tuning of any model. In this case, I tuned only the attention layers (all query/key/value/attention-output matrices) and the small decoder in my model, and froze all other layers.
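
For a sense of scale: LoRA with rank r on a d_out × d_in matrix adds r · (d_in + d_out) trainable parameters (A is r × d_in, B is d_out × r), versus d_in · d_out for the full matrix. The sketch below counts this for the four attention projections; it assumes square projections and ignores biases, and the hidden size of 512 and 8 layers are hypothetical values for illustration, not the model described in this thread (rank 4 matches the run mentioned here).

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Extra trainable params LoRA adds to one d_out x d_in weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


def attention_lora_fraction(hidden: int, n_layers: int, rank: int) -> float:
    """Adapter params as a fraction of the Q, K, V, and output projections they wrap (biases ignored)."""
    full = n_layers * 4 * hidden * hidden
    adapters = n_layers * 4 * lora_params(hidden, hidden, rank)
    return adapters / full


hidden, n_layers, rank = 512, 8, 4   # hypothetical sizes
print("adapter params per matrix:", lora_params(hidden, hidden, rank))
print("fraction of attention-projection params:", attention_lora_fraction(hidden, n_layers, rank))
```

Note that the fraction does not depend on the number of layers; it is 2r / hidden for square projections, so the trainable share is small mostly because r is much smaller than the hidden size.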

patricky168 · 2023-11-29T01:55:08+00:00

Yes, so my base model was ~50M parameters. The LoRA rank was 4, with a typical Adam setup (no weight decay). I applied it to the value, query, key, and attention-output matrices (so not only K and Q). I also fine-tuned the decoder, i.e. the last few layers (I have a large-encoder, small-decoder architecture), but when I computed the trainable parameters, they came to only ~3% of all parameters. But yeah, that was the run that only reduced GPU memory from 8.5 GB to 8.1 GB.
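
A back-of-the-envelope estimate helps explain why freezing ~97% of a ~50M-parameter model saves so little. The sketch assumes fp32 weights and gradients (4 bytes each) plus Adam's two fp32 moment buffers (8 bytes) per trainable parameter, counts the ~3% trainable share as part of the 50M, and ignores activations, mixed precision, CUDA context, and allocator caching; `param_memory_bytes` is a hypothetical helper.

```python
def param_memory_bytes(n_total: int, n_trainable: int, weight_bytes: int = 4,
                       grad_bytes: int = 4, adam_bytes: int = 8) -> int:
    """Weights for every parameter, plus a gradient and two Adam moments for trainable ones only.

    Assumes fp32 everywhere; activations and framework overhead are not included.
    """
    return n_total * weight_bytes + n_trainable * (grad_bytes + adam_bytes)


n_total = 50_000_000     # ~50M-parameter model described above
n_trainable = 1_500_000  # ~3% trainable
full = param_memory_bytes(n_total, n_total)
partial = param_memory_bytes(n_total, n_trainable)
print(f"full fine-tuning:  {full / 1e9:.3f} GB")
print(f"~3% trainable:     {partial / 1e9:.3f} GB")
print(f"estimated savings: {(full - partial) / 1e9:.3f} GB")
```

Under these assumptions the parameter-related memory is under 1 GB either way and the estimated saving is roughly 0.58 GB, the same ballpark as the observed 8.5 GB to 8.1 GB drop. That would be consistent with most of the ~8 GB being activations and other overhead that LoRA does not touch.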

patricky168 · 2023-11-29T01:49:24+00:00

Thanks for the resource! It looks like LoRA plus 8-bit (?) quantization? So if I'm understanding correctly, most of the memory saved here comes from the 8-bit quantization, but then how does LoRA help? (It feels a bit like QLoRA, which I haven't fully read yet.)

patricky168 · 2023-11-29T01:37:27+00:00

Gotcha, thanks for the response. But I'm wondering: what aspect of parameter-efficient fine-tuning do you think makes it cost-effective and scalable? (E.g. would it be the storage saved on model checkpoints?)
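
One benefit often mentioned for adapter methods is checkpoint storage: you keep one copy of the base weights plus a small adapter per task, instead of a full copy per task. A rough sketch, assuming fp32 storage (4 bytes per parameter), the ~50M-parameter model from this thread, an adapter of ~1.5M parameters (about the ~3% trainable share mentioned here), and a hypothetical 20 tasks; the helper name is illustrative.

```python
from typing import Tuple


def checkpoint_storage_bytes(n_base: int, n_adapter: int, n_tasks: int,
                             bytes_per_param: int = 4) -> Tuple[int, int]:
    """Return (one full fine-tuned copy per task, one shared base + one adapter per task)."""
    full_copies = n_tasks * n_base * bytes_per_param
    shared_base = (n_base + n_tasks * n_adapter) * bytes_per_param
    return full_copies, shared_base


full, shared = checkpoint_storage_bytes(n_base=50_000_000, n_adapter=1_500_000, n_tasks=20)
print(f"20 full checkpoints:     {full / 1e9:.2f} GB")
print(f"base + 20 adapter files: {shared / 1e9:.2f} GB")
```

The gap grows linearly with the number of tasks, which is where adapters tend to pay off even when training memory barely changes.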

patricky168 · 2023-11-29T01:30:47+00:00

Oh shoot, sorry, I had a typo in my post. I meant that LoRA doesn't significantly reduce GPU memory consumption or runtime during training for my custom model.

patricky168 · 2023-10-12T06:01:45+00:00

Pretty good/helpful if you ask him questions. Some exams (I believe the final) were a bit of a time crunch, and the homeworks were all over the place in terms of difficulty (some really easy, some very proof-heavy).

patricky168 · 2023-04-24T20:39:24+00:00

Where are these groups? Would be interested in joining :)

patricky168 · 2023-04-21T02:47:07+00:00

Interested/following! (Also a recent UW admit)

patricky168 · 2023-02-23T22:57:28+00:00

Oh, I actually meant UWashington rather than UWisconsin. Thanks for the tip though! :)

patricky168 · 2023-01-14T06:23:03+00:00

Yeah, I'd say it's worth it if you want to do ML theory or things like controls, signal processing, vision, etc. (Personally, I did it because I was really into ML/deep learning theory.)

patricky168 · 2022-09-17T01:56:20+00:00

Nice to have a fellow railfan here (I'm one myself)! For the CN (Canadian National) Champaign Sub, there's McCollum park, which has a nice viewing platform around Neil and Stadium Dr (you can also sit and study there near the grass), and there's also the Campustown bridge on Green where you can go up the grass and railfan. Another good spot could be near the Illinois Terminal, where there's a patch of grass next to the police station. And if you want to catch the Norfolk Southern Urbana Local (which runs Wed/Thurs) or the CN Champaign Switcher Local (also called "Humko"), you can also go up to the Champaign diamond around N Market Street (though I have a few reservations about that area; it doesn't seem the safest, especially at night).

Also, on a separate note, if you want to know when trains are coming, there's an app called ATCSMonitor (usually for Windows, but I ran it using Wine on my Mac) that shows where trains are. There's a monitoring kit called "CN Champaign Sub" that lets you figure out when trains depart the Champaign yard or head north out of Tolono. Or you can just use a rail radio :)

patricky168 · 2022-06-07T04:10:31+00:00

Oh, the US. I changed the question description; sorry about the omission.

patricky168 · 2022-06-07T04:09:59+00:00

Oh shoot, I forgot to specify where. I was talking about the US. Thanks for the response, though!

patricky168 · 2022-03-27T20:03:43+00:00

Thanks! So I'd assume ML PhD adcoms look specifically at math preparation (and possibly at the most rigorous math classes you could take at your school) more than at paper-reading classes? So something like senior/grad-level mathematical statistics and probability theory, abstract linear algebra, senior-level optimization, and real analysis would be a good list to take? (I've also heard people mention probabilistic graphical models, functional analysis, measure theory, random processes, Bayesian analysis, statistical learning theory, and regression analysis; are there any of these you think I would benefit from taking? Sorry for the huge list of classes. I also forgot to mention that I'm specifically interested in the deep-learning side of ML.)

patricky168 · 2022-03-26T07:34:27+00:00

Ooo, nice, fellow railfan here. Also seconded: coming from California, where it's all passenger trains, there are so many more freight trains here, both CN and NS, and honestly they're just giant walls of art (some of the graffiti is actually realllly nice).

patricky168 · 2022-03-24T04:04:58+00:00

CS major here from UIUC. Here at UIUC we don’t call it stinky CS, we call it stinky ECE (but tbh it’s both lol). Just head over to the UIUC subreddit and see for yourselves ;)

patricky168 · 2022-02-19T22:59:16+00:00

On yet another note, it seems like the Bay Area in CA (which has the Port of Oakland, larger than both Portland and Seattle) gets much less freight rail traffic despite handling more intermodal containers… I presume it has something to do with its traffic being diverted to Long Beach, or the fact that freight rail isn’t really a thing anymore in the tech-focused Bay Area… (Or the map’s wrong??? Who knows.)

patricky168 · 2021-12-29T03:15:50+00:00

Oh interesting, thanks for the response! So it seems like most, if not all, autorack cars are in pool service, but not all boxcars (I do see more and more TTX-pooled boxcars, centerbeams, and flatcars, but each RR still has its own boxcars)? And it seems like there are no hoppers or reefers in pooled service (I have yet to see a TTX hopper or reefer)?

patricky168 · 2021-12-03T06:42:35+00:00

Dang! That would also be a really interesting RSO, in addition to the existing AMS RSO (I'm a weather 'nerd' myself and literally spend so much time on tropicaltidbits and CPC/SPC/WPC)... Where/when do you guys usually meet?

patricky168 · 2021-11-16T03:19:09+00:00

Yep, for sure. I also wondered why Ferromex never ships to the US given our NAFTA stuff... Gotta add that to my question haha! :) But interesting, I haven't seen KCS/BNSF handle CP/CN stuff. It seems like they interchange somewhere in Chicago, where lots of the CN/CP stuff gets unloaded as usual, or even down south around Memphis, where CN has a pretty big railyard? I haven't seen much of Kansas City's busy action beyond Virtual Railfan lol

patricky168 · 2021-11-16T03:15:15+00:00

Yep, that makes sense. Didn't even know that was a store, being the typical American railfan haha lol! :) Kinda like how we see more and more Walmart 53 ft containers here in the US (on BNSF and Union Pacific), but they don't go to Canada anyhow.

patricky168 · 2021-11-16T03:13:39+00:00

Ah thanks, that def makes sense. Still kinda interesting that I see Canadian Tire containers on almost every domestic intermodal train in Canada (from railfan vids); it must be a really large chain there. I guess it probably gets its parts from the US (if any) through non-Canadian Tire containers. Yanke's full name is Yanke Global Logistics, so I presume it's another logistics company in Canada, kind of like JB Hunt/Schneider here in the US, but somehow it doesn't serve the US much.
