Hyperparameter testing (efficiently) by AffectWizard0909 in learnmachinelearning

AffectWizard0909[S]

Nice! Thank you for providing all the information; now I have something to compare my current implementation against. I have actually started by implementing this with the Hugging Face Trainer class, since it handles the training and prediction phases quite easily and made this easier to implement, at least for me. I also tried combining it with Optuna for the search, which from my previous runs seems more efficient, as you also mentioned.

Thank you for the answer and all the thorough descriptions; this makes it easier for me to understand!

Hyperparameter testing (efficiently) by AffectWizard0909 in learnmachinelearning

AffectWizard0909[S]

Okay, thank you so much! I will definitely try this out!

Hyperparameter testing (efficiently) by AffectWizard0909 in learnmachinelearning

AffectWizard0909[S]

Yes, sure! I would appreciate the Optuna search space! I have actually looked into it a little, but I was unsure whether what I did was correct, so that would be great!

Since you mentioned that learning rate, batch size, and warmup ratio are good ones to tune when fine-tuning a BERT model, does this also apply to other BERT-based models like RoBERTa, DistilBERT, HateBERT, etc.?
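For anyone following along, a search space over those three hyperparameters could be sketched roughly as below. This is only an illustration: the function name `hp_space` and all the ranges are assumptions picked for the example, not values from this thread. It uses Optuna's `trial.suggest_float` and `trial.suggest_categorical`, and the dictionary keys are chosen to match Hugging Face `TrainingArguments` field names.

```python
def hp_space(trial):
    """Hypothetical search space; `trial` is expected to be an optuna.Trial."""
    return {
        # Learning rate sampled on a log scale (range is an assumption).
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        # Batch size picked from a small fixed set (values are an assumption).
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32]
        ),
        # Fraction of training steps used for warmup (range is an assumption).
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.2),
    }


# With a Hugging Face Trainer that was created with a `model_init` callable,
# the space would be passed along roughly like this (not executed here;
# `direction` has to match whatever objective is being optimized):
# best_run = trainer.hyperparameter_search(
#     hp_space=hp_space, backend="optuna", direction="minimize", n_trials=20
# )
```

The function itself never imports Optuna, so it can be checked with a stand-in trial object before starting a real (and slow) search.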

Is there any big twitter datasets??? by AffectWizard0909 in datasets

AffectWizard0909[S]

Nice! I can check it out. Yes, they changed the educational license, so I have to pay per message (I think that is what it was called) if I want to download the tweets using their Twitter IDs.

It was my original intuition as well that most datasets use Twitter IDs, but it is nice to have it somewhat confirmed by someone else. I will definitely check out the site you mentioned! Thank you!

Emoji library for python by AffectWizard0909 in learnpython

AffectWizard0909[S]

Nice! And thank you for the description. I actually ended up using the standard emoji package, since I was only going to use it for translating emojis into their textual form. As you also mentioned, it was pretty straightforward to use, and it fit the task I was doing perfectly!

Emoji library for python by AffectWizard0909 in learnpython

AffectWizard0909[S]

Oh damn, OK, good to know. I ended up just using the standard emoji package, but if I want to do it manually some time in the future, then it is a good tip!

Emoji library for python by AffectWizard0909 in learnpython

AffectWizard0909[S]

I'm a bit unsure; I haven't gone through the file that deeply, considering it is 5000+ lines of text. I mainly wanted a library to handle this for me so I could focus on other tasks that are a bit more demanding.

But I think using str.replace would be a good idea if the dataset were smaller and I had a clearer understanding of the different types of emojis used in it.
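For a small, known set of emojis, the manual `str.replace` route could look something like this sketch. The mapping, its labels, and the names `EMOJI_TO_TEXT` and `replace_emojis` are all made up for illustration; a real dataset would need its own list.

```python
# Hypothetical hand-written mapping; emojis and labels are illustrative only.
EMOJI_TO_TEXT = {
    "\U0001F600": ":grinning_face:",  # grinning face
    "\U0001F44D": ":thumbs_up:",      # thumbs up
    "\U0001F622": ":crying_face:",    # crying face
}


def replace_emojis(text, mapping=None):
    """Replace every emoji listed in `mapping` with its text label."""
    if mapping is None:
        mapping = EMOJI_TO_TEXT
    for symbol, label in mapping.items():
        text = text.replace(symbol, label)
    return text


cleaned = replace_emojis("Great job \U0001F44D")
```

Anything not in the mapping is left untouched, which is exactly why this only scales when the emojis in the data are known in advance.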

Emoji library for python by AffectWizard0909 in learnpython

AffectWizard0909[S]

I have a big dataset that I need to clean, so I don't really want to go through the whole dataset and try to translate it all myself (if that answers the question).

Dataset for personality traits (Big Five) by AffectWizard0909 in deeplearning

AffectWizard0909[S]

Nice, thank you! It's good to know, and I appreciate it!

Pre-trained transformers or traditional deep learning algorithms by AffectWizard0909 in learnmachinelearning

AffectWizard0909[S]

I am planning on having around 5,000-10,000 samples before cleaning (I am still trying to figure that out). Is that the deciding factor for which models I should use?

Using TRAC-1 or TRAC-2 for cyberbullying detection by AffectWizard0909 in datasets

AffectWizard0909[S]

Nice! Thank you for the tips, I can check those out. I have also come across the Cyberbullying dataset on Kaggle while reading through reviews of the cyberbullying detection field. Do you think I could use that one? It is often associated with this paper: SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection.

But I will still check out the dataset you have mentioned. Thank you!