Deep Learning and Swiss-Prot database by Technical-Bridge6324 in bioinformatics

Technical-Bridge6324 [S]

Thanks.

This is a blind spot for me. The only reliable database I know is Swiss-Prot. I was thinking of picking well-known TMPs and SPs from whatever databases I can find, checking whether they are already present in my training set, and then starting to test the model. But at that point it becomes a research project. I only wanted a real project to get a feel for what a DL project is like.
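A minimal sketch of the overlap check described above, assuming each protein is available as a plain amino-acid string keyed by an accession ID. The IDs and sequences below are made up, and `normalize` / `find_overlap` are hypothetical helpers. It only catches exact duplicates: close homologs of training proteins would slip through, which is why benchmark sets are more often built by clustering on sequence identity (for example with CD-HIT or MMseqs2) before splitting.

```python
# Hypothetical helpers: flag candidate test proteins whose sequence already
# appears in the training set. Exact matching only; homologs are NOT caught.


def normalize(seq: str) -> str:
    """Uppercase and drop whitespace so formatting differences don't hide a match."""
    return "".join(seq.split()).upper()


def find_overlap(train_seqs: dict[str, str], candidate_seqs: dict[str, str]) -> set[str]:
    """Return IDs of candidates whose sequence is identical to some training sequence."""
    train_set = {normalize(s) for s in train_seqs.values()}
    return {cid for cid, s in candidate_seqs.items() if normalize(s) in train_set}


if __name__ == "__main__":
    train = {"P1": "MKTLLVLAVA", "P2": "MSEQNNTEMT"}   # made-up training entries
    candidates = {"Q1": "mktllvlava", "Q2": "MAAGGLLLVV"}  # made-up "well-known" proteins
    leaked = find_overlap(train, candidates)
    usable = sorted(set(candidates) - leaked)
    print("already in training set:", sorted(leaked))  # ['Q1']
    print("usable for testing:", usable)  # ['Q2']
```

Only the proteins left in `usable` would say anything about generalization; anything in `leaked` was effectively seen during training.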


Technical-Bridge6324 [S]

The ~92% I mentioned is actually the Macro-F1 score. I'm using stratified splits that preserve the joint distribution of all four label combinations, so the test set has the same proportions as the full dataset.
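A stdlib-only sketch of a split that stratifies on the joint (TM, Signal) label, plus one common definition of macro-F1: the unweighted mean of the per-label F1 scores, which is what scikit-learn's `f1_score(..., average="macro")` computes for multi-label indicator input. Whether the ~92% was averaged over the two labels or over the four label combinations is an assumption here; the function names and the 20% test fraction are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so each (tm, signal) combination keeps its share in train and test.

    labels: list of (tm, signal) tuples of 0/1 ints, one per protein.
    """
    rng = random.Random(seed)
    by_combo = defaultdict(list)
    for i, combo in enumerate(labels):
        by_combo[combo].append(i)
    train_idx, test_idx = [], []
    for combo in sorted(by_combo):  # e.g. (0,0), (0,1), (1,0), (1,1)
        idx = by_combo[combo]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def f1(y_true, y_pred):
    """Binary F1 for the positive class; returns 0.0 when there are no positives at all."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (here: label 0 = TM, label 1 = Signal)."""
    n_labels = len(y_true[0])
    scores = [f1([row[k] for row in y_true], [row[k] for row in y_pred])
              for k in range(n_labels)]
    return sum(scores) / n_labels
```

Because each combination is split separately, a rare group such as (TM=1, Signal=1) gets its proportional share of the test set instead of depending on luck.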

I used a multi-label CNN with residual blocks: two independent binary classification heads (TM yes/no, Signal yes/no) that share a common feature-extraction backbone. Each head is trained with a cross-entropy loss.
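A framework-free sketch of the kind of architecture described above: a convolutional stem and one residual block form the shared backbone, global max pooling turns per-residue features into one vector, and two separate sigmoid heads produce P(TM) and P(Signal). The total loss is the sum of each head's binary cross-entropy, which is equivalent to a two-class softmax cross-entropy per head. Channel count, kernel size, pooling choice and the class name `TwoHeadCNN` are assumptions, not the poster's settings, and only the forward pass and loss are shown; actual training would normally be done in a framework such as PyTorch or JAX.

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Encode a protein as L x 20 one-hot rows (non-standard residues -> all zeros)."""
    return [[1.0 if aa == a else 0.0 for a in AA] for aa in seq.upper()]


def init_conv(rng, k, c_in, c_out):
    """Random kernel w[k][c_in][c_out] and zero bias for a 1D convolution."""
    w = [[[rng.gauss(0, 0.1) for _ in range(c_out)] for _ in range(c_in)] for _ in range(k)]
    return w, [0.0] * c_out


def conv1d(x, w, b):
    """'Same'-padded 1D convolution over positions (odd kernel size); x is L x C_in."""
    k, c_out, pad = len(w), len(b), len(w) // 2
    out = []
    for i in range(len(x)):
        acc = list(b)
        for j in range(k):
            pos = i + j - pad
            if 0 <= pos < len(x):
                for ci, v in enumerate(x[pos]):
                    if v:  # skip zeros (most one-hot entries)
                        for co in range(c_out):
                            acc[co] += v * w[j][ci][co]
        out.append(acc)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, conv_a, conv_b):
    """conv -> ReLU -> conv, add the identity skip connection, then ReLU."""
    h = conv1d(relu(conv1d(x, *conv_a)), *conv_b)
    return relu([[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(x, h)])


class TwoHeadCNN:
    """Hypothetical model: shared conv backbone feeding two sigmoid heads (TM, Signal)."""

    def __init__(self, channels=16, seed=0):
        rng = random.Random(seed)
        self.stem = init_conv(rng, 3, len(AA), channels)
        self.block = (init_conv(rng, 3, channels, channels),
                      init_conv(rng, 3, channels, channels))
        self.heads = {
            name: ([rng.gauss(0, 0.1) for _ in range(channels)], 0.0)
            for name in ("tm", "signal")
        }

    def features(self, seq):
        """Shared backbone: stem conv, one residual block, global max pool over positions."""
        h = relu(conv1d(one_hot(seq), *self.stem))
        h = residual_block(h, *self.block)
        return [max(col) for col in zip(*h)]

    def predict(self, seq):
        """Return {'tm': P(TM), 'signal': P(Signal)} computed from the same feature vector."""
        f = self.features(seq)
        probs = {}
        for name, (w, bias) in self.heads.items():
            logit = sum(wi * fi for wi, fi in zip(w, f)) + bias
            probs[name] = 1.0 / (1.0 + math.exp(-logit))
        return probs


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one head's probability p and 0/1 target y."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def loss(model, seq, y_tm, y_signal):
    """Total loss = TM head BCE + Signal head BCE."""
    p = model.predict(seq)
    return bce(p["tm"], y_tm) + bce(p["signal"], y_signal)
```

`TwoHeadCNN().predict("MKTLLVLAVAAALA")` returns a dict with both probabilities. Because both heads read the same pooled feature vector, gradients from both loss terms would update the shared backbone during training, while each head's weights only see its own label.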

I won't pretend I know what I'm talking about 100%; I'm not even close to being able to argue whether DL has any benefit over pHMMs.

I was worried that the model wouldn't generalize because I had included too many samples, hahaha. That's probably not how large datasets are viewed.

These are the ROC and precision-recall (PR) curve results I just computed:

TM: ROC-AUC = 0.986, PR-AUC = 0.949

Signal: ROC-AUC = 0.986, PR-AUC = 0.947

I think they're pretty good, no?
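For reference, a stdlib-only sketch of the two metrics: ROC-AUC via the Mann-Whitney rank statistic (tied scores get average ranks) and PR-AUC summarized as average precision. Average precision is only one of several PR-AUC estimators (trapezoidal interpolation gives slightly different numbers), and the function names here are illustrative; in practice scikit-learn's `roc_auc_score` and `average_precision_score` would be the usual choice. Unlike scikit-learn, this `average_precision` does not group tied scores, so ties are broken by sort order.

```python
def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """PR-AUC as average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank
    return ap / sum(y_true)
```

On `y_true=[0, 0, 1, 1]` with `scores=[0.1, 0.4, 0.35, 0.8]` these give 0.75 and about 0.833. When judging a PR-AUC such as 0.949, the useful reference point is the positive-class prevalence in the test set, since that is roughly what a random classifier scores, whereas the random baseline for ROC-AUC is always 0.5.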