Google’s free text-to-speech API has been making the rounds recently. The reverse, converting speech to text, is also possible.
A combination of several technologies and open source tools makes this possible. In the browser, Flash is used to access the microphone and stream the audio to an RTMP server. Red5 is used because it's a versatile media server that is also free and open source.
Once the audio is received on the server, it needs to be converted to text. There are many speech recognition engines to choose from. Many are proprietary and offer very good accuracy, but they are pricey and closed source. There are also state-of-the-art open source speech recognition engines, such as Julius and Sphinx, to name a couple. The speechapi service uses Sphinx because it is license friendly and has a strong community.
Recognition results are returned to your web page in a callback that you specify in the speechapi constructor. The results are passed from the server to the client as a JSON string. The result object contains the raw text results as well as other information that can be useful for your speech client, such as pronunciation and “grammar tags” that can help with semantic interpretation of the results.
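To make the callback step more concrete, here is a minimal sketch of how a page might handle the JSON string it receives. The field names (text, pronunciation, tags) and the function names are assumptions chosen for illustration, not the actual speechapi schema, so check a real response before relying on them.

```typescript
// Hypothetical shape of a recognition result. These field names are
// illustrative assumptions, not the documented speechapi format.
interface RecognitionResult {
  text: string;            // raw recognized text
  pronunciation?: string;  // optional pronunciation string
  tags?: string[];         // optional grammar tags for semantic interpretation
}

// Parse the JSON string handed to the callback and check the one field we require.
function parseRecognitionResult(json: string): RecognitionResult {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Recognition result is not a JSON object");
  }
  const record = data as Record<string, unknown>;
  const text = record["text"];
  if (typeof text !== "string") {
    throw new Error("Recognition result has no text field");
  }
  const pronunciation = record["pronunciation"];
  const tags = record["tags"];
  return {
    text,
    pronunciation: typeof pronunciation === "string" ? pronunciation : undefined,
    // Keep only string entries so later code can treat tags as string[].
    tags: Array.isArray(tags)
      ? tags.filter((t): t is string => typeof t === "string")
      : undefined,
  };
}

// Example callback body (hypothetical): turn the JSON string into a display line.
function onResult(json: string): string {
  const result = parseRecognitionResult(json);
  const tagSuffix =
    result.tags && result.tags.length > 0 ? ` [${result.tags.join(", ")}]` : "";
  return `${result.text}${tagSuffix}`;
}
```

A function like onResult would be the one you hand to the speechapi constructor as the result callback. Validating the parsed object up front means a malformed or unexpected response fails loudly instead of showing "undefined" on the page.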
Thanks for your interest.