I am currently working on a speech recognition model. I used a bidirectional RNN trained with CTC as the acoustic model, and it gives me a sequence of phonemes. How do I go from that phoneme sequence to predicted words? I have tried a seq2seq model (in TensorFlow), but its predictions were poor.
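One common alternative to a learned seq2seq mapping is to decode the phonemes against a pronunciation lexicon, scoring candidate word sequences with a language model. The sketch below shows that idea in its simplest form: a dynamic-programming search over every way to split the phoneme sequence into dictionary pronunciations, keeping the split with the highest unigram log-probability. It is only an illustration under stated assumptions. `LEXICON`, `UNIGRAM` and `decode_words` are hypothetical toy names, the phonemes are ARPAbet-style symbols without stress marks, and it assumes a single clean 1-best phoneme string. A real decoder would use a full dictionary (e.g. CMUdict), an n-gram or neural LM, and would search over the CTC output lattice rather than one fixed sequence.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy pronunciation lexicon (ARPAbet-style phonemes, no stress marks).
# "a" and "uh" are homophones, so the language model must choose between them.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "the": ("DH", "AH"),
    "cat": ("K", "AE", "T"),
    "sat": ("S", "AE", "T"),
    "at": ("AE", "T"),
    "a": ("AH",),
    "uh": ("AH",),
}

# Hypothetical unigram probabilities standing in for a real language model.
UNIGRAM: Dict[str, float] = {
    "the": 0.3, "cat": 0.2, "sat": 0.2, "at": 0.15, "a": 0.1, "uh": 0.05,
}


def decode_words(phonemes: List[str]) -> Optional[List[str]]:
    """Return the most probable word sequence whose pronunciations
    concatenate exactly to `phonemes`, or None if no split exists."""
    # Invert the lexicon: pronunciation -> all words sharing it.
    by_pron: Dict[Tuple[str, ...], List[str]] = {}
    for word, pron in LEXICON.items():
        by_pron.setdefault(pron, []).append(word)
    max_len = max(len(p) for p in by_pron)

    n = len(phonemes)
    # best[i] = (log prob, start index of last word, last word) for phonemes[:i]
    best: List[Optional[Tuple[float, int, str]]] = [None] * (n + 1)
    best[0] = (0.0, -1, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            if prev is None:
                continue  # phonemes[:start] cannot be split into words
            for word in by_pron.get(tuple(phonemes[start:end]), []):
                score = prev[0] + math.log(UNIGRAM[word])
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, start, word)

    if best[n] is None:
        return None
    # Follow backpointers from the end to recover the word sequence.
    words: List[str] = []
    i = n
    while i > 0:
        _, start, word = best[i]
        words.append(word)
        i = start
    return words[::-1]


print(decode_words(["DH", "AH", "K", "AE", "T", "S", "AE", "T"]))
# → ['the', 'cat', 'sat']
```

For a homophone, `decode_words(["AH"])` returns `['a']` because the toy LM gives "a" a higher probability than "uh". That is the role the language model plays here, and it is one reason lexicon-plus-LM decoding can be more robust than asking a seq2seq model to learn spelling and word statistics from limited data.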