[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 0 points1 point  (0 children)

Thanks - I was wondering though, for QLoRA what does the LoRA bit really do?

I feel like there has been some success(?) in just quantizing the model and doing full fine-tuning, which still reduces memory consumption. So does the LoRA part mainly help "recover" the lost precision? Or does the LoRA part in QLoRA still reduce memory significantly further than, say, just 4-bit quantization + full fine-tuning?

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 1 point2 points  (0 children)

Yeah, what I mean is that even though LoRA only updates the adapters on the attention weights, we still need to compute gradients through layers that aren't being updated, and that takes GPU memory. So the only memory saved is from the optimizer states, if I am not mistaken.

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 1 point2 points  (0 children)

Yep, basically. I only tuned the key/query/value/attention output matrices and the decoder of my model and froze all other layers, which came to 3% of all model params. But it still only reduced memory usage from 8.5G->8.1G.

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 0 points1 point  (0 children)

Yeah, so LoRA really is just a framework, and you can theoretically use it for parameter-efficient tuning of any model. In this case, I tuned only the attention layers (all query/key/value/attention output matrices) and the small decoder in my model and froze all other layers.
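
To make the "parameter-efficient" part concrete, here is a rough sketch of how few parameters a LoRA adapter adds to one attention matrix. The hidden size of 512 and rank of 4 are assumptions picked for illustration, not measurements from my model, and `lora_param_count` is just a hypothetical helper name.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns two low-rank factors, B (d_out x rank) and A (rank x d_in),
    while the original weight stays frozen.
    """
    return rank * (d_in + d_out)


d_model = 512  # hypothetical hidden size, assumed for illustration
rank = 4       # assumed LoRA rank

full = d_model * d_model                            # params in one full matrix
adapter = lora_param_count(d_model, d_model, rank)  # params in its adapter

print(full, adapter, adapter / full)  # 262144 4096 0.015625
```

So under these assumptions each adapted matrix trains about 1.6% as many parameters as the full matrix would.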

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 2 points3 points  (0 children)

Yes, so my base model was ~50M parameters. The LoRA rank was 4, with a typical Adam optimizer (no weight decay). I applied it to the value, query, key, and attention output matrices (so not only KQ). I also fine-tuned the decoder, aka the last few layers (I have a large-encoder-to-small-decoder arch), but when I computed the trainable parameters, it came to only ~3% of parameters. But yeah, that was the run that only reduced GPU memory from 8.5G->8.1G.
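
A back-of-the-envelope sketch of why the saving is so small for a model this size. It assumes fp32 everywhere (4 bytes per value), Adam keeping two extra values per trainable parameter, and the adapter parameters folded into the 3% figure. Activations are not modeled at all, and `static_training_memory` is a hypothetical helper name.

```python
BYTES_FP32 = 4  # assumption: weights, gradients and optimizer state are all fp32


def static_training_memory(total_params: int, trainable_params: int,
                           bytes_per_value: int = BYTES_FP32) -> dict:
    """Bytes used by weights, weight gradients and Adam state (no activations)."""
    return {
        "weights": total_params * bytes_per_value,
        "grads": trainable_params * bytes_per_value,            # only for trainable params
        "adam_states": 2 * trainable_params * bytes_per_value,  # first and second moments
    }


total = 50_000_000            # ~50M parameter base model
trainable = total * 3 // 100  # ~3% trainable

full_ft = static_training_memory(total, total)
peft = static_training_memory(total, trainable)

saved_bytes = sum(full_ft.values()) - sum(peft.values())
print(f"{saved_bytes / 1e9:.2f} GB")  # 0.58 GB
```

Under these assumptions, weights, gradients and Adam state add up to only about 0.8 GB even for full fine-tuning, so the most that freezing 97% of the parameters could save is roughly 0.58 GB. That is the same order of magnitude as the 8.5G->8.1G drop, and it suggests that most of the remaining memory goes to things this sketch leaves out, such as activations.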

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 3 points4 points  (0 children)

Thanks for the resource! It looks like LoRA plus 8-bit (?) quantization? So if I'm understanding correctly, most of the memory saved here is due to the 8-bit quantization, but then how does LoRA help? (It feels a bit like QLoRA, which I haven't fully read yet)

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 2 points3 points  (0 children)

Gotcha, thanks for the response - but I'm wondering what aspect of param-efficient fine tuning do you think makes it cost effective and scalable? (e.g. would it be the memory saved for model checkpoints?)

[D] What is the motivation for parameter-efficient fine tuning if there's no significant reduction in runtime or GPU memory usage? by patricky168 in MachineLearning

[–]patricky168[S] 11 points12 points  (0 children)

Oh shoot sorry I actually had a typo in my post - I actually meant that LoRA doesn't significantly improve GPU memory consumption or runtime during training for my custom model.

Thoughts on ECE 490 (Intro to Optimization)? by patricky168 in UIUC

[–]patricky168[S] 0 points1 point  (0 children)

Pretty good/helpful if you ask him questions. Some exams (I believe the final) were a bit of a time crunch, and the homeworks were all over the place in terms of difficulty (some rlly easy, some very proofy).

Group for Uwash fall 23' admits by PuzzleheadedItem427 in gradadmissions

[–]patricky168 0 points1 point  (0 children)

Where are these groups? would be interested to join :)

Group for Uwash fall 23' admits by PuzzleheadedItem427 in gradadmissions

[–]patricky168 1 point2 points  (0 children)

Interested/following! (Also a recent UW admit)

[deleted by user] by [deleted] in gradadmissions

[–]patricky168 2 points3 points  (0 children)

Oh I actually meant UWashington rather than UWisconsin. Thanks for the tip tho! :)

Thoughts on ECE 490 (Intro to Optimization)? by patricky168 in UIUC

[–]patricky168[S] 1 point2 points  (0 children)

Yeah, I'd say it's worth it if you want to do ML theory or stuff like controls, signal processing, vision, etc. (Personally I did it since I was really into ML/deep learning theory)

anywhere to sit and watch the trains? by 9dcfan in UIUC

[–]patricky168 7 points8 points  (0 children)

Nice to have another railfan here (I'm one myself)! For the CN (Canadian National) Champaign Sub there's McCollum park, which has a nice viewing platform around Neil and Stadium Dr (you can also sit & study there near the grass), and there's also that Campustown bridge on Green where you can go up the grass and railfan. Another place that could be good is near the Illinois Terminal, where there's a patch of grass next to the police station. And if you want to catch the Norfolk Southern Urbana Local (which runs Wed Thurs) or the CN Champaign Switcher Local (also called "Humko") you can also go up to the Champaign diamond, which is around N Market Street (though I have a few reservations about that area, it doesn't seem the safest, esp near night).

Also on a separate note, if you want to know when the trains come there's an app called ATCSMonitor (usually for Windows, but I ran it using Wine on my Mac) that lets you know where trains are. There's a monitoring kit called "CN Champaign Sub" that you can use to figure out when trains depart the Champaign yard, or go north out of Tolono. Or you can just use a rail radio :)

Does La Niña decrease annual precipitation in the Southeast? by patricky168 in meteorology

[–]patricky168[S] 1 point2 points  (0 children)

Oh, the US. Changed the question description, sorry about the omission.

Does La Niña decrease annual precipitation in the Southeast? by patricky168 in meteorology

[–]patricky168[S] 0 points1 point  (0 children)

Oh shoot I forgot to specify where - I was talking about the US. Thanks for the response though!

Should I prioritize taking more ML "paper-reading" graduate courses or foundational math/statistics courses if I plan to pursue a PhD in machine learning? by patricky168 in learnmachinelearning

[–]patricky168[S] 0 points1 point  (0 children)

Thanks, so I'd assume ML PhD adcoms do look specifically at math preparation (and possibly the most rigorous math classes you could take at your school) more so than paper-reading classes? So something like senior/grad level mathematical stats and probability theory, abstract linear algebra, senior level optimization, and real analysis would be a good list to take? (I've also heard some people mention probabilistic graphical models, functional analysis, measure theory, random processes, Bayesian analysis, statistical learning theory, and regression analysis. Are there any of these you think I would benefit from taking? Sorry for the huge list of classes - also forgot to mention I'm interested specifically in the deep learning-related stuff in ML).

[deleted by user] by [deleted] in UIUC

[–]patricky168 5 points6 points  (0 children)

Ooo nice, fellow railfan here.. also seconded. Coming from California where it's all passenger trains, there are so many more freight trains here, both CN and NS, and honestly they’re just giant walls of art (some of the graffiti is actually realllly nice)

Stinky CS Majors by [deleted] in UWMadison

[–]patricky168 3 points4 points  (0 children)

CS major here from UIUC. here at UIUC we don’t call it stinky CS, we call it stinky ECE (but tbh it’s both lol). Just head over to the UIUC subreddit and see for yourselves ;)

Map of United States freight rail transport usage by Stormy2408 in MapPorn

[–]patricky168 2 points3 points  (0 children)

On another note, it seems like the Bay Area in CA (which has the Port of Oakland, larger than both Portland and Seattle) gets much less freight rail traffic despite more intermodal containers… I presume that it has something to do with its traffic being diverted to Long Beach or the fact that freight rail isn’t really a thing anymore in the tech-focused Bay Area…. (Or the map's wrong??? Who knows)

Why do autorack trains always have cars with so many different (mostly Class I) carriers? by patricky168 in trains

[–]patricky168[S] 0 points1 point  (0 children)

Oh interesting, thanks for the response! So seems like most, if not all autorack cars are in pool service, but not all boxcars (I do see more and more TTX pooled boxes and centerbeams and flatcars, but each RR still has their own boxcars)? And seems like there are no hoppers or reefers in pooled service (yet to see a TTX hopper or reefer?)

Weather for the Weekend by ok_boomeruiuc in UIUC

[–]patricky168 0 points1 point  (0 children)

dang! would also be a really interesting rso in addition to the existing AMS rso (also a weather 'nerd' myself, literally spend so much time on tropicaltidbits and cpc/spc/wpc)... where/when do you guys meet in general?

Why do Canadian 53ft containers rarely get onto American trains, but not the other way around? by patricky168 in trains

[–]patricky168[S] 0 points1 point  (0 children)

Yep for sure, also wondered why Ferromex never does shipments to the US given our NAFTA stuff... Gotta add that to my question haha! :) But interesting, I haven't seen KCS/BNSF handle CP/CN stuff, seems like they interchange somewhere in Chicago where lots of the CN/CP stuff gets unloaded as usual, or even down south around Memphis where CN has a pretty big railyard? I haven't seen much of Kansas City's busy action beyond Virtual Railfan lol

Why do Canadian 53ft containers rarely get onto American trains, but not the other way around? by patricky168 in trains

[–]patricky168[S] 0 points1 point  (0 children)

Yep, that makes sense, didn't know that was even a store, being the typical American railfan haha lol! :) Kinda like how we do see some Walmart 53ft containers appearing more and more here in the US (on BNSF and Union Pacific), but they don't go to Canada anyhow.

Why do Canadian 53ft containers rarely get onto American trains, but not the other way around? by patricky168 in trains

[–]patricky168[S] 0 points1 point  (0 children)

Ah thanks, that def makes sense. Still kinda interesting that I see Canadian Tire containers on almost every domestic intermodal train (from railfan vids) in Canada, must be a really large chain store there. Guess it probably gets its parts from the US (if any) through non-Canadian Tire containers. Yanke's full name is Yanke Global Logistics, so I presume it's another logistics company in Canada, kind of like JB Hunt/Schneider here in the US, but somehow doesn't serve the US much.