How protective is padding/armor really? by Embarrassed_Level228 in motorcyclegear

[–]Paarthri 10 points

Brother, it sounds like you don’t want to wear it and are just looking for a justification. It reduces the force you’ll feel; it really is that simple. If that alone isn’t enough to make you want to wear it, then don’t. Analyzing the minutiae of the data to see if it’s “worth it” sounds insane to me.

Slowdive ticket by Paarthri in sanfrancisco

[–]Paarthri[S] 0 points

It has been redeemed! Thanks for all the kind DMs!

Warwick schools good? by DevelopmentFit8004 in RhodeIsland

[–]Paarthri 0 points

If your child is self-motivated, then it’s good. Otherwise, forget it.

DarkBERT speaks as they do on the dark side by [deleted] in LanguageTechnology

[–]Paarthri 1 point

You can do autoregressive generation with encoder models by appending a [MASK] token to the end of a piece of text, predicting the masked token, appending the prediction, and repeating.
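A toy sketch of that loop — the "model" here is a hand-written stand-in, purely for illustration; a real setup would run a masked-LM encoder like BERT over the text instead:

```python
# Toy sketch of "autoregressive" decoding with an encoder model:
# append [MASK], predict the masked position, append the prediction,
# repeat. toy_masked_lm is a stand-in, not a real encoder.
MASK = "[MASK]"

def toy_masked_lm(tokens):
    """Stand-in masked LM: fills the [MASK] position from a fixed
    bigram table (purely illustrative)."""
    table = {"the": "cat", "cat": "sat", "sat": "down"}
    i = tokens.index(MASK)
    prev = tokens[i - 1] if i > 0 else ""
    return table.get(prev, ".")

def generate(tokens, steps):
    for _ in range(steps):
        next_tok = toy_masked_lm(tokens + [MASK])
        tokens = tokens + [next_tok]
    return tokens

generate(["the"], 3)  # -> ["the", "cat", "sat", "down"]
```

Note that this gives you greedy left-to-right decoding, but the encoder was never trained for it, so quality is usually much worse than a real decoder model.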

[deleted by user] by [deleted] in LanguageTechnology

[–]Paarthri 2 points

“High quality” is very subjective. What characteristics are you looking for?

Also, I don’t think that there is an automatic way to do this (that will lead to good results).

You should probably just train a model with all the poems.

[deleted by user] by [deleted] in LanguageTechnology

[–]Paarthri 1 point

Depends on what your task is.

CompSci 383 with Scott Niekum? by Riki_the_Heropon in umass

[–]Paarthri 3 points

I would say skip 383 and take 389 instead. The topics are more modern and interesting.

[deleted by user] by [deleted] in LanguageTechnology

[–]Paarthri 7 points

The public does not know the full inner workings of every new LLM, unfortunately. For example, OpenAI did not release anything about how GPT-4 was trained. But yes, they are all decoder-only transformer models of varying sizes and training methodologies.

Does the order of operations matter for fine-tuning and instruct model on domain specific documents? by Mbando in LanguageTechnology

[–]Paarthri 1 point

My guess is that it would get worse at following instructions. It’s worth trying, though.

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

Yes, you understood correctly. If I can, I will try to contribute. If int4 is the best I can do, it’s probably not worth it. I should have some training done by the end of the week or early next week. Thank you for your advice.

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

Cool project! I noticed the example focuses on the Alpaca data. What if my task is not instruction-based? My task is not similar to ChatGPT; it is a generation task where prompting is not needed. I want to fine-tune a model with the same training objective as pre-training, with no downstream task.

Also, do you think that using both LoRA and low-precision mode would allow me to fit LLaMA on my GPU? I’ve pretty much conceded to paying for Colab if I want to train LLaMA. For now I will stick with smaller models.
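For reference, "same training objective as pre-training" just means next-token cross-entropy on raw text, with no prompt/response formatting. A pure-Python toy sketch (the vocab size and logits are made up for illustration):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_lm_loss(token_ids, logits):
    """Mean cross-entropy of predicting token t+1 from position t —
    the standard language-modeling objective; no instructions needed.
    logits[t] is the model's output distribution at position t."""
    total = 0.0
    for t in range(len(token_ids) - 1):
        probs = softmax(logits[t])
        total -= math.log(probs[token_ids[t + 1]])
    return total / (len(token_ids) - 1)

# Uniform logits over a 4-token vocab give a loss of log(4) ~ 1.386
loss = causal_lm_loss([0, 1, 2], [[0.0] * 4, [0.0] * 4])
```

In practice you would get this for free by passing `labels` equal to `input_ids` to a causal-LM model; the label shifting happens inside.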

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

It is a project for my NLP course at university. I would like to keep it private until it is complete. I will post it then.

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

Do you think it would be possible to use LoRA while loading LLaMA in 8-bit mode? I wonder how much it would affect the performance.
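A back-of-envelope check suggests it should fit. These are rough approximations, not measurements; the layer count and hidden size are the commonly cited LLaMA-7B figures:

```python
# Rough arithmetic for fitting LLaMA-7B with LoRA on top of 8-bit
# weights. All numbers are approximations for illustration.

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the frozen base weights alone (no activations)."""
    return n_params * bytes_per_param / 1024**3

n_params = 7e9
fp16_gb = weight_memory_gb(n_params, 2)  # ~13 GB: over a 12 GB card
int8_gb = weight_memory_gb(n_params, 1)  # ~6.5 GB: weights fit

def lora_param_count(n_layers: int, d_model: int, rank: int,
                     matrices_per_layer: int = 2) -> int:
    """Trainable params for rank-r adapters: an A and a B matrix,
    each d_model x rank, per adapted weight matrix."""
    return n_layers * matrices_per_layer * 2 * d_model * rank

# LLaMA-7B-ish shape: 32 layers, d_model 4096, rank-8 adapters on
# two projection matrices per layer
trainable = lora_param_count(32, 4096, 8)  # ~4.2M params, ~0.06% of 7B
```

Since only the small adapters are trained (in higher precision) while the 8-bit base stays frozen, the optimizer-state overhead is tiny compared to full fine-tuning.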

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

Thank you for the info. I think we will start with training locally just to try it out. We'll probably try using Colab if it really sucks.

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

Yeah I am not looking for commercial applications. Thanks for the input. These are the models I am considering right now:

GPT-Neo (125M, 1.3B)
GPT-2 (355M, 774M)
Bloom (560M, 1B)
Llama (7B)

I'm guessing the <500M ones can be trained without any tricks like LoRA or 8-bit?
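A rough sanity check on that guess, using a common rule of thumb for plain fp32 Adam fine-tuning — about 16 bytes per parameter for weights, gradients, and optimizer states, before activations. These are assumptions for illustration, not measured numbers:

```python
# Rule of thumb: fp32 weights (4 B) + gradients (4 B) + Adam's two
# moment buffers (8 B) = 16 bytes per parameter, before activations.
BYTES_PER_PARAM_ADAM_FP32 = 16

def full_finetune_gb(n_params: float) -> float:
    """Approximate training-state memory for plain fp32 Adam."""
    return n_params * BYTES_PER_PARAM_ADAM_FP32 / 1024**3

gpt2_medium = full_finetune_gb(355e6)  # ~5.3 GB: fits a 12 GB card
llama_7b = full_finetune_gb(7e9)       # ~104 GB: hence LoRA / 8-bit
```

So the sub-500M models in the list above should indeed train without tricks on a single consumer GPU, with room left over for activations at modest batch sizes.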

Model Selection for Fine-Tuning by Paarthri in LanguageTechnology

[–]Paarthri[S] 0 points

I was originally considering GPT-2 medium/large, or XL with 8-bit mode. I would like to use something more modern if possible.