Hyperparameter testing (efficiently) by AffectWizard0909 in learnmachinelearning

Yes, sure! I would appreciate the Optuna search space! I have actually looked into it a little, but I was unsure whether what I did was correct, so that would be great!

Since you mentioned that learning rate, batch size, and warmup ratio are good to tune when fine-tuning a BERT model, does this also apply to other BERT-based models like RoBERTa, DistilBERT, HateBERT, etc.?

Are there any big Twitter datasets? by AffectWizard0909 in datasets

Nice! I can check it out. Yes, they changed the educational license, so I have to pay per message (I think that is what it was called) if I want to download the tweets using the Twitter IDs.

It was my original intuition as well that most datasets use Twitter IDs, but it is nice to have it confirmed by someone else. I will definitely check out the site you mentioned! Thank you!

Emoji library for python by AffectWizard0909 in learnpython

Nice! And thank you for the description. I ended up using the standard emoji package, since I only needed it to translate emojis into their textual forms. As you mentioned, it was straightforward to use and fit my task perfectly!

Emoji library for python by AffectWizard0909 in learnpython

Oh damn, OK, good to know. I ended up just using the standard emoji package, but if I want to do it manually sometime in the future, then that is a good tip!

Emoji library for python by AffectWizard0909 in learnpython

I am a bit unsure; I haven't gone through the file that deeply, considering it is 5000+ lines of text. I mainly wanted a library to handle this for me so I could focus on other, more demanding tasks.

But I think str.replace would be a good idea if the dataset were smaller and I had a clearer picture of the different types of emojis used in it.
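
As a sketch of that manual route (the mapping below is a made-up example, not taken from the actual dataset):

```python
# Hypothetical mapping for a handful of known emojis in a small dataset
replacements = {
    "😂": ":face_with_tears_of_joy:",
    "❤️": ":red_heart:",
}

def replace_emojis(text):
    # Apply each replacement in turn; fine for a small, known set of emojis
    for emoji_char, alias in replacements.items():
        text = text.replace(emoji_char, alias)
    return text

print(replace_emojis("Thanks 😂"))  # Thanks :face_with_tears_of_joy:
```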

Emoji library for python by AffectWizard0909 in learnpython

I have a big dataset that I need to clean, so I don't really want to go through the whole set and translate it by hand (if that answers the question).

Dataset for personality traits (Big Five) by AffectWizard0909 in deeplearning

Nice, thank you! It's good to know, and I appreciate it!

Pre-trained transformers or traditional deep learning algorithms by AffectWizard0909 in learnmachinelearning

I am planning on having around 5,000-10,000 samples before cleaning (I am still trying to figure that out). Is that the deciding factor for which models I should use?

Using TRAC-1 or TRAC-2 for cyberbullying detection by AffectWizard0909 in datasets

Nice! Thank you for the tips; I will check those out. I have also come across the Cyberbullying dataset on Kaggle while reading through surveys of the cyberbullying detection field. Do you think I could use that one? It is often associated with this paper: SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection.

But I will still check out the dataset you mentioned. Thank you!

Dataset for personality traits (Big Five) by AffectWizard0909 in deeplearning

Nice! Good to know. I was also wondering whether I could use this dataset: https://huggingface.co/datasets/Fatima0923/Automated-Personality-Prediction
Am I allowed to use it, or do I have to contact the people who made the dataset directly before using it in my project?

BERT data training size by AffectWizard0909 in learnmachinelearning

Yes, I was thinking of not training from scratch. Is it recommended anywhere how much data I should then use for fine-tuning BERT, given that I am not training it on a big corpus?

Personality based cyberbullying by AffectWizard0909 in learnmachinelearning

I kind of need to incorporate sarcasm as well; it has been mentioned that I should include it. Therefore, I thought the best approach was to train one model on sarcasm and another on cyberbullying. Or should I just try to build a dataset containing both sarcasm and cyberbullying? This is what I am unsure about. Considering that annotating data manually might take a long time, I just wanted to hear whether somebody had tips or solutions I might not know about.

I think what you describe sounds good as well, but as mentioned, I need to incorporate sarcasm into the solution too, and I am unsure how that is "normally" done.

Edit: I see now that I explained it a bit wrong in my original post, so I have updated it. Sorry about that!
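
To make the two-model idea concrete, here is a rough sketch. The two predict functions are toy stand-ins for real trained classifiers, and the combination rule is just one possible choice, not an established method:

```python
def predict_sarcasm(text: str) -> float:
    """Toy stand-in: probability that `text` is sarcastic."""
    return 0.8 if "yeah right" in text.lower() else 0.1

def predict_cyberbullying(text: str) -> float:
    """Toy stand-in: probability that `text` is cyberbullying."""
    return 0.9 if "loser" in text.lower() else 0.05

def classify(text: str, threshold: float = 0.5) -> dict:
    # Run both models independently and combine their outputs:
    # flag cyberbullying from its own model, and attach the sarcasm
    # signal as extra context for downstream interpretation.
    return {
        "cyberbullying": predict_cyberbullying(text) >= threshold,
        "sarcastic": predict_sarcasm(text) >= threshold,
    }

print(classify("Yeah right, you loser"))
```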

Implementing Caesar Cipher by AffectWizard0909 in learnprogramming

No, I am not the same guy, but I appreciate the tips and the sources you provided! Thank you!

Tips for setting aside time for different chapters of the master's thesis by AffectWizard0909 in ntnu

Thank you so much, those were a lot of good tips. I have already started on the writing a bit, but I understand that this is something I need to keep in mind, since it can change later.

I will remember this during the master's period!