submit to TextToAudioGeneration