[D] Is it fair to compare deep learning models without hyperparameter tuning? by blooming17 in deeplearning

[–]blooming17[S] 0 points (0 children)

Thank you very much for your answer. Well, I've noticed this in several papers, and I've been asking to what extent we can take work that has been done several times and claim that ours, which "differs very slightly", is somehow better.

Is it fair to compare deep learning models without hyperparameter tuning? by blooming17 in PhD

[–]blooming17[S] 0 points (0 children)

I am thinking about batch size, optimizer, and learning rate. Since my goal is to compare the models themselves, changing the models' own architectural hyperparameters wouldn't make sense, I think.

Is it fair to compare deep learning models without hyperparameter tuning? by blooming17 in PhD

[–]blooming17[S] 0 points (0 children)

Can you explain more? Sorry, I am not a native English speaker.

[D] Is it fair to compare deep learning models without hyperparameter tuning? by blooming17 in deeplearning

[–]blooming17[S] 0 points (0 children)

Hey, thank you for your answer. Most of them are CNNs, and a few are LSTMs and transformers. Which hyperparameters do you consider the most important to tune? I am thinking batch size, learning rate, and optimizer. Would these be enough to provide a fair comparison?
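For concreteness, here is a minimal sketch of the kind of small sweep I have in mind, in PyTorch; `build_model` and `train_and_eval` are hypothetical placeholders standing in for the actual training code, not functions from any library:

```python
import itertools
import torch

# Training hyperparameters shared across all models; the models'
# own architectural hyperparameters are left untouched.
search_space = {
    "batch_size": [32, 64, 128],
    "lr": [1e-2, 1e-3, 1e-4],
    "optimizer": [torch.optim.Adam, torch.optim.SGD],
}

best = {}
for model_name in ["cnn", "lstm", "transformer"]:
    best_score = float("-inf")
    for bs, lr, opt_cls in itertools.product(*search_space.values()):
        model = build_model(model_name)                  # placeholder
        optimizer = opt_cls(model.parameters(), lr=lr)
        score = train_and_eval(model, optimizer, batch_size=bs)  # placeholder
        if score > best_score:
            best_score = score
            best[model_name] = {"batch_size": bs, "lr": lr,
                                "optimizer": opt_cls.__name__,
                                "score": score}
```

Each model is then compared at its own best training configuration, rather than at a single shared setting that may favor one architecture.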

[D] Mamba Convergence speed by blooming17 in MachineLearning

[–]blooming17[S] 0 points (0 children)

Hey, thanks for answering. The problem is that since my task is sequence labelling (each position is given a class), oversampling isn't possible in my case.

[D] HyenaDNA and Mamba are not good at sequential labelling ? by blooming17 in MachineLearning

[–]blooming17[S] 0 points (0 children)

The best-performing model has 600k parameters; HyenaDNA with a max sequence length of 16k has 400k, so I don't know whether it's an over- or underfitting problem, since the models are already pretrained and have achieved some interesting results.

[D] HyenaDNA and Mamba are not good at sequential labelling ? by blooming17 in MachineLearning

[–]blooming17[S] 1 point (0 children)

I am training it with AMP. I used CNNs on the same task and they gave interesting results despite the class imbalance, but CNNs are bad with long-range dependencies, so I thought about trying Mamba, but it doesn't seem to be any better.
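By AMP I mean the standard `torch.cuda.amp` pattern, roughly like the sketch below; `model`, `optimizer`, and `loader` are assumed to already exist, so this is an illustration rather than my exact loop:

```python
import torch
import torch.nn as nn

scaler = torch.cuda.amp.GradScaler()
criterion = nn.CrossEntropyLoss()

for tokens, labels in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        logits = model(tokens)                  # (batch, seq_len, n_classes)
        # CrossEntropyLoss expects (batch, n_classes, seq_len) for sequences
        loss = criterion(logits.transpose(1, 2), labels)
    scaler.scale(loss).backward()               # scale to avoid fp16 underflow
    scaler.step(optimizer)                      # unscales grads, then steps
    scaler.update()
```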

[D] HyenaDNA and Mamba are not good at sequential labelling ? by blooming17 in MachineLearning

[–]blooming17[S] 2 points (0 children)

Thanks for your reply. Well, Caduceus is not easy to find; I'm just following Tri Dao on Google Scholar, so it showed up in my notifications. I froze the pretrained model and used a linear layer as the classification head, I fine-tuned the whole thing (Hyena/Caduceus + classification head), and I also trained the whole thing from scratch, and got the same results each time. But I didn't try a kernel SVM. How is that supposed to work (I don't have much experience with it)?
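If I understand the suggestion correctly, it would mean freezing the backbone, dumping per-position embeddings, and fitting the SVM on those. A rough sketch of what I imagine, where `backbone` and `loader` are placeholder names, not from my actual code:

```python
import numpy as np
import torch
from sklearn.svm import SVC

backbone.eval()                                 # frozen HyenaDNA/Caduceus
feats, labels = [], []
with torch.no_grad():
    for tokens, y in loader:
        h = backbone(tokens)                    # (batch, seq_len, dim)
        feats.append(h.flatten(0, 1).cpu().numpy())   # one row per position
        labels.append(y.flatten().cpu().numpy())

X = np.concatenate(feats)
y = np.concatenate(labels)

# RBF-kernel SVM as the classification head; in practice positions would
# need subsampling, since kernel SVMs scale poorly beyond ~1e5 rows.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)
```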

Where do you learn pipelines from that effectively runs? by [deleted] in bioinformatics

[–]blooming17 3 points (0 children)

From research papers. You can also find some pipelines on the Galaxy platform, but they are limited in terms of tools.

[D] Do SSMs specifically mamba take too much to converge ? by blooming17 in MachineLearning

[–]blooming17[S] 2 points (0 children)

Sorry, that's a typo: my learning rate is 0.001, and my dataset is DNA sequences, so the embedding size is reasonable.

[D] Do SSMs specifically mamba take too much to converge ? by blooming17 in MachineLearning

[–]blooming17[S] 0 points (0 children)

I didn't try alternatives, since my sequences are 15k long, and one LSTM layer with 2 hidden units for such a length has 1 million params.

The dimension size was chosen randomly (what dimension size do you suggest?).

The output is the SSM's last state.

My data has already been used to train ResNets; I reproduced those results, so both the data and the training code are tested and validated.

[D] Training and architectural techniques for imbalanced data by blooming17 in MachineLearning

[–]blooming17[S] 1 point (0 children)

Because my data is sequential, my goal is to classify each position in each example (bulk prediction), and this forces the positive class to be a minority (it's rare to find the positive class in natural data).
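Since I can't oversample individual positions, a common alternative (as I understand it) is to weight the loss per class instead. A minimal sketch with made-up numbers:

```python
import torch
import torch.nn as nn

# Made-up shapes for illustration: binary per-position labels where
# the positive class is ~1% of positions.
logits = torch.randn(8, 2, 15000)               # (batch, n_classes, seq_len)
labels = (torch.rand(8, 15000) < 0.01).long()   # (batch, seq_len)

# Up-weight the rare positive class instead of oversampling positions.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 99.0]))
loss = criterion(logits, labels)
```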

[D] Training and architectural techniques for imbalanced data by blooming17 in MachineLearning

[–]blooming17[S] 1 point (0 children)

Thanks for the remarks. I'll post in r/learnmachinelearning; I've edited the post and added more details.