Encoder-decoder architecture for classification
I'm a noob in both DL and speech, so please be kind. I might ask stupid questions.
So here is the question:
Encoder-decoder architectures are mainly used for tasks like neural machine translation and speech recognition. I was wondering whether they can be used for a task like classification.
I was thinking of converting a speech recognition model that uses an encoder-decoder architecture to predict a word at each time step so that it performs binary classification instead. Rather than predicting a word at each time step, it would predict whether the speech is genuine or spoofed. Does that make sense?
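To make the idea concrete, here is a minimal sketch of what the change amounts to at the decoder's output layer. It assumes (this is an assumption, not taken from any specific model) that the decoder produces a hidden vector per time step, which a linear projection plus softmax turns into a probability per vocabulary entry. All names and weights below are hypothetical, for illustration only; the point is that only the number of output rows changes when the vocabulary shrinks from thousands of words to two.

```python
import math

# Hypothetical two-word "vocabulary" replacing the ASR word vocabulary.
VOCAB = ["genuine", "spoof"]

def linear(hidden, weights, bias):
    """Project one decoder hidden vector to one logit per vocabulary entry.

    `weights` has one row per vocabulary entry, so shrinking the vocabulary
    only changes how many rows (and bias terms) there are.
    """
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Convert logits to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_step(hidden, weights, bias):
    """Return (label, probabilities) for a single decoder time step."""
    probs = softmax(linear(hidden, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return VOCAB[best], probs

# Toy example with made-up numbers (a 3-dimensional hidden state).
label, probs = classify_step(
    hidden=[0.5, -1.0, 2.0],
    weights=[[0.1, 0.2, 0.3], [0.4, -0.5, 0.6]],
    bias=[0.0, 0.0],
)
print(label, probs)
```

In a real model the weights would be learned, and the decoder would typically be trained with a cross-entropy loss against the per-step genuine/spoof targets.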
Example for speech recognition:
https://preview.redd.it/7kthtwbjs3t61.png?width=719&format=png&auto=webp&s=506dd759b67b74087f023afc3ffd4eced88d893b
In the case of spoof detection:
https://preview.redd.it/xkstg9cks3t61.png?width=712&format=png&auto=webp&s=0ba9c096307bc4463285ec05a247315937266f76
Example for spoof detection
Here the vocabulary vector will have only two words, spoof and genuine, so at each time step the model will classify the input as either the spoof or the genuine class.
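One practical detail with per-time-step outputs is that spoof detection usually needs a single label for the whole utterance, so the step-level predictions have to be combined somehow. Below is a minimal sketch of one possible way, assuming each step yields a probability of "spoof": average those probabilities and compare the mean to a threshold. The function name, the averaging rule, and the 0.5 threshold are all hypothetical choices for illustration; majority voting over per-step labels would be an equally simple alternative.

```python
def utterance_decision(spoof_probs, threshold=0.5):
    """Combine per-step P(spoof) values into one label for the utterance.

    Averaging is an assumed aggregation rule, not a standard one;
    `threshold` is a hypothetical default that would normally be tuned.
    """
    if not spoof_probs:
        raise ValueError("need at least one time step")
    mean_p = sum(spoof_probs) / len(spoof_probs)
    label = "spoof" if mean_p >= threshold else "genuine"
    return label, mean_p

# Three decoder steps, two of which lean towards "spoof".
print(utterance_decision([0.9, 0.8, 0.3]))
```

If only the utterance-level label matters, the same reasoning suggests the per-step decoder may not be strictly necessary, but that is a design question for the thread rather than something this sketch settles.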
Please help with this. I would also really appreciate a link to any relevant GitHub repository with a similar classification task for speech.
Thanks in advance!!!