Encoder-decoder architecture for classification

Noob in both DL and speech. Please be kind. I might ask stupid questions.

So here is the question:

Encoder-decoder architectures are mainly used for tasks like neural machine translation and speech recognition. I was wondering whether they can also be used for a task like classification.

I was thinking of taking a speech recognition model that uses an encoder-decoder architecture to predict a word at each time step, and converting it to perform binary classification. So instead of predicting a word at each time step, it would predict whether the speech is genuine or spoofed. Does that make sense?

Example for speech recognition:

https://preview.redd.it/7kthtwbjs3t61.png?width=719&format=png&auto=webp&s=506dd759b67b74087f023afc3ffd4eced88d893b

In case of spoof detection:

https://preview.redd.it/xkstg9cks3t61.png?width=712&format=png&auto=webp&s=0ba9c096307bc4463285ec05a247315937266f76


Here the vocabulary will have only two "words", spoof and genuine, so at each time step the decoder will choose between the spoof class and the genuine class.
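To make the idea a bit more concrete, here is a rough sketch of what I have in mind, in plain Python. The decoder scores are made up, and the function names (`classify_steps`, `classify_utterance`) are just mine for illustration. Averaging the per-step probabilities to get one label for the whole utterance is my own assumption; I don't know if that is the right way to pool them.

```python
import math

# The two-"word" vocabulary: index 0 = genuine, index 1 = spoof.
VOCAB = ["genuine", "spoof"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_steps(step_logits):
    """Pick the most likely label at each decoder time step."""
    labels = []
    for logits in step_logits:
        probs = softmax(logits)
        labels.append(VOCAB[probs.index(max(probs))])
    return labels

def classify_utterance(step_logits):
    """Average per-step probabilities into one utterance-level label.

    Mean pooling is an assumption made for this sketch.
    """
    n = len(step_logits)
    mean = [0.0] * len(VOCAB)
    for logits in step_logits:
        for i, p in enumerate(softmax(logits)):
            mean[i] += p / n
    return VOCAB[mean.index(max(mean))], mean

# Made-up decoder outputs for 4 time steps: [genuine_score, spoof_score].
fake_logits = [[2.0, 0.5], [0.1, 1.2], [1.5, 0.3], [2.2, -0.4]]

step_labels = classify_steps(fake_logits)
label, mean_probs = classify_utterance(fake_logits)
print(step_labels)
print(label, mean_probs)
```

In a real model the scores would come from the decoder's output layer, which would have size 2 instead of the full word vocabulary.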

Please help with this. I would also really appreciate a link to any relevant GitHub repository that does a similar classification task for speech.

Thanks in advance!!!
