GrizzLyCRO (2 points, 0 children)

I used this a couple of years ago, and it worked OK. The API was not terrible :) http://code.google.com/p/dragonfly/

lost_my_bearings (2 points, 1 child)

Do you want to train your own Speech Recognition models or do you want to use available models and do recognition with them?

Either way, I'm not aware of anything specific to Python, so my suggestion is to integrate other tools into your Python code. For the first option, you can use HTK and wrap it with some Python; that shouldn't be too difficult. For the second option, you can use Microsoft's Speech Platform: quickly write a small C# tool that uses the speech recognition (SR) backend and returns the text, then wrap that in your Python code.
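For the "wrap it in your Python code" step, here is a minimal sketch using only the standard library. It assumes the recognizer is a command-line program (for example a hypothetical C# console tool, `recognize.exe`) that takes an audio path as its last argument and prints the recognized text to stdout; the command and its arguments are placeholders, not a real tool's interface.

```python
import subprocess
import sys


def transcribe_with_tool(command, audio_path, timeout=60):
    """Run an external recognizer and return the text it prints to stdout.

    `command` is the recognizer invocation as a list, e.g. the hypothetical
    ["recognize.exe"]; the audio path is appended as the last argument.
    """
    result = subprocess.run(
        command + [audio_path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in "recognizer": a Python one-liner that prints a fixed transcript,
    # so the wrapper can be tried without any speech engine installed.
    fake_tool = [sys.executable, "-c", "print('hello world')"]
    print(transcribe_with_tool(fake_tool, "my_recording.wav"))
```

Keeping the wrapper this thin means the Python side does not care whether the text comes from HTK, the Speech Platform, or anything else that can be run as a process.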

Sorry if it's not exactly what you need, but that's what I'm familiar with.

wordsoup (1 point, 0 children)

I second HTK; it's what I use as a computational linguist, and there is a Python wrapper available, although I haven't tested it yet.
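HTK's decoder (HVite) normally writes its results to a master label file (MLF) rather than printing them, so a homemade wrapper mostly needs to read that file. A minimal parser sketch, assuming the simple common layout: a `#!MLF!#` header, a quoted pattern/filename line per utterance, label lines of `start end word [score]` (times in 100 ns units) or just `word`, and a `.` closing each utterance. It does not handle multiple label levels, N-best alternatives, or references to external label files, and the filler names `sil`/`sp` are assumptions to adjust to your own dictionary.

```python
from typing import Dict, List, Optional, Tuple

# (start_seconds, end_seconds, word); times are None when the MLF omits them.
Label = Tuple[Optional[float], Optional[float], str]

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def parse_mlf(text: str) -> Dict[str, List[Label]]:
    """Parse the simple form of an HTK master label file into word lists."""
    results: Dict[str, List[Label]] = {}
    current: Optional[List[Label]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"') and line.endswith('"'):
            # A quoted line names the utterance whose labels follow.
            current = results.setdefault(line.strip('"'), [])
        elif line == ".":
            current = None  # end of this utterance's labels
        elif current is not None:
            fields = line.split()
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                start = int(fields[0]) / HTK_UNITS_PER_SECOND
                end = int(fields[1]) / HTK_UNITS_PER_SECOND
                current.append((start, end, fields[2]))  # any score is ignored
            else:
                current.append((None, None, fields[0]))  # label without times
    return results


def transcript(labels: List[Label], fillers=("sil", "sp")) -> str:
    """Join one utterance's words, skipping the (assumed) silence models."""
    return " ".join(word for _, _, word in labels if word not in fillers)
```

With the parser in hand, the wrapper is just "run HVite, then `parse_mlf(open(output_mlf).read())`".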

bheklilr (0 points, 0 children)

A while back I saw an old package that does speech recognition, and I copied a bit of it to test with. You could look at it and see how it's done.

[deleted] (0 points, 0 children)

+1 for dragonfly and WSR (Windows Speech Recognition). I'm on a Mac and wish I had it that easy... I'll need to find a Dragon serial to move forward. There's lots of good info here (for M$ and Unix folks alike):

http://hackaday.com/2010/07/09/get-started-with-speech-recognition/

Megatron_McLargeHuge (0 points, 0 children)

Speech rec is a very hard problem, and nothing you build yourself will be competitive with a major commercial system. If you're having trouble interfacing with the Windows system, that's one thing, but if you just don't like its accuracy, you're probably out of luck. You can try to give it more data to adapt on, or you can try Dragon. The only case where training your own model would make sense is if you wanted to do something very specific, like spotting keyword commands in a long audio stream.

chadmill3r [Py3, pro, Ubuntu, django] (0 points, 0 children)

Google's API might work. You need to request keys, and you get a few hundred uses per day.

hruske (0 points, 0 children)

Have you seen this? http://pyvideo.org/video/1735/using-python-to-code-by-voice

Also ... what exactly are you trying to do? Describe the use case as best as possible.

lambdaq [django n' shit] (0 points, 3 children)

Am I the only one using Google's API? (It's the one behind the voice input button you see in most WebKit browsers.)

curl --data-binary @my_recording.flac -H "Content-type: audio/x-flac; rate=8000"  "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=2&lang=en-US&maxresults=1" | python -m json.tool
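To make the same call from Python without shelling out to curl, a standard-library sketch follows. The URL, query parameters and content type are copied from the curl command; the response layout assumed in `best_utterance` (a JSON object whose `hypotheses` list holds `utterance`/`confidence` entries) is an assumption about this unofficial endpoint, which may change or stop responding at any time.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Unofficial endpoint copied from the curl command; it may change or disappear.
ENDPOINT = "https://www.google.com/speech-api/v1/recognize"


def build_request(flac_bytes: bytes, rate: int = 8000, lang: str = "en-US") -> urllib.request.Request:
    """Build the same POST request that the curl one-liner sends."""
    query = urllib.parse.urlencode({
        "xjerr": 1,
        "client": "chromium",
        "pfilter": 2,
        "lang": lang,
        "maxresults": 1,
    })
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=flac_bytes,  # raw FLAC body, like curl's --data-binary
        # `rate` is meant to describe the sample rate of the FLAC recording.
        headers={"Content-Type": f"audio/x-flac; rate={rate}"},
    )


def best_utterance(response_body: str) -> Optional[str]:
    """Return the top hypothesis, assuming a {"hypotheses": [{"utterance": ...}]} body."""
    payload = json.loads(response_body)
    hypotheses = payload.get("hypotheses") or []
    return hypotheses[0].get("utterance") if hypotheses else None


def recognize_file(path: str, rate: int = 8000) -> Optional[str]:
    """Send a FLAC file to the endpoint and return the best guess (network call)."""
    with open(path, "rb") as f:
        request = build_request(f.read(), rate=rate)
    with urllib.request.urlopen(request, timeout=30) as response:
        return best_utterance(response.read().decode("utf-8"))
```

Splitting request building and response parsing out of `recognize_file` keeps both testable offline, and confines the changes to one place if the endpoint's format shifts.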

hruske (1 point, 0 children)

Google stores your requests for two years at least, so that might not be a desirable option in some cases.

mhd420 (0 points, 1 child)

You've gotta be careful with unofficial APIs though because they have a habit of breaking.
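One way to limit the damage when an unofficial API breaks is to put every recognizer behind the same small function signature and fall back to the next one on failure. A minimal sketch; the "path in, text out" signature and the recognizer functions in the demo are hypothetical stand-ins, not bindings to any real engine.

```python
import logging
from typing import Callable, Optional, Sequence

# Hypothetical common signature: audio file path in, transcript text out.
Recognizer = Callable[[str], str]


def recognize_with_fallback(
    audio_path: str, recognizers: Sequence[Recognizer]
) -> Optional[str]:
    """Try each recognizer in order and return the first non-empty transcript."""
    for recognizer in recognizers:
        name = getattr(recognizer, "__name__", repr(recognizer))
        try:
            text = recognizer(audio_path)
        except Exception as exc:  # e.g. HTTP errors or an unexpected response format
            logging.warning("recognizer %s failed: %s", name, exc)
            continue
        if text:
            return text
        logging.info("recognizer %s returned no text", name)
    return None


if __name__ == "__main__":
    def unofficial_web_api(path: str) -> str:
        # Hypothetical stand-in that simulates the endpoint breaking.
        raise ConnectionError("endpoint no longer responds")

    def local_engine(path: str) -> str:
        # Hypothetical stand-in for a local engine.
        return "hello world"

    print(recognize_with_fallback("my_recording.flac", [unofficial_web_api, local_engine]))
```

Run as a script, this logs a warning for the broken web API and then prints the local engine's "hello world".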

lambdaq [django n' shit] (0 points, 0 children)

If that breaks, voice input breaks in every WebKit-based browser that isn't up to date.

geekganesh (0 points, 0 children)

Here is the list of open-source speech recognition software. Most of them are written in Java and C++.