Author Topic: Speech Recognition with Javascript  (Read 88 times)

Offline Muhammad Younus

Speech Recognition with Javascript
« on: April 20, 2017, 02:49:35 PM »
Recently, Google’s free text-to-speech API has made the rounds. The reverse, converting speech to text, is also possible.

With speechapi.com’s JavaScript API, it is possible to build interesting speech-web mashups that include both speech-to-text and text-to-speech.

A combination of several technologies and open source tools makes this possible. In the browser, Flash is used to access the microphone and stream the audio to an RTMP server. Red5 is used because it’s a versatile media server that has the benefit of being open source and free.

Once the audio is received on the server, it needs to be converted to text. There are many speech recognition engines to choose from. Many are proprietary and offer very good accuracy, but they are pricey and closed source. There are also some state-of-the-art open source speech recognition engines, such as Julius and Sphinx, to name a couple. The speechapi service uses Sphinx because it is license-friendly and has a strong community.

This is a good start: we can transmit audio and convert it to text, but we still need to control the process and use the results in the web page. That is where JavaScript comes in. Speechapi.com provides a JavaScript API. Its setupRecognition method sets up the grammar used in the speech-to-text process. There is a simple grammar mode, where you just provide a comma-separated list of words. JSGF is also supported and is useful for more complex grammars. There are also methods that communicate with the Flash control to indicate when to start and when to stop transmitting audio. Alternatively, you can use the Flash control’s built-in press-to-speak button to specify the speech endpoints.
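
To give a feel for how the simple word-list mode relates to JSGF, here is a rough sketch that turns a comma-separated list into an equivalent JSGF grammar string. Note that wordsToJsgf, the grammar name "simple" and the rule name "command" are made up for illustration and are not part of the speechapi.com API. The sketch also does not show how the grammar is actually passed to setupRecognition, since the exact arguments are not described here.

```javascript
// Hypothetical helper, NOT part of the speechapi.com API.
// Builds a JSGF grammar whose single public rule accepts any one
// of the words/phrases in a comma-separated list.
function wordsToJsgf(wordList, grammarName = "simple", ruleName = "command") {
  // Split on commas, trim whitespace, and drop empty entries.
  const words = wordList
    .split(",")
    .map((w) => w.trim())
    .filter((w) => w.length > 0);

  if (words.length === 0) {
    throw new Error("word list is empty");
  }

  // JSGF: header line, grammar declaration, then one public rule
  // listing the alternatives separated by "|".
  return [
    "#JSGF V1.0;",
    `grammar ${grammarName};`,
    `public <${ruleName}> = ${words.join(" | ")};`,
  ].join("\n");
}

console.log(wordsToJsgf("red, green, blue"));
// #JSGF V1.0;
// grammar simple;
// public <command> = red | green | blue;
```

For plain words this is all the simple mode needs to express. Once you want optional words, repetition or tags, writing the JSGF by hand is the better route. Entries containing JSGF special characters would need quoting, which this sketch does not handle.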

Recognition results are returned to your web page in a callback that you specify in the speechapi constructor. The results are passed from the server to the client as a JSON string. The result object contains the raw text results as well as other information that can be useful for your speech client, such as pronunciation and “grammar tags” that help with semantic interpretation of the results.
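
A result callback might look roughly like the sketch below. It assumes the callback is handed the raw JSON string, and the field names text, pronunciation and tags are guesses made for illustration only. The real field names used by speechapi.com may well be different, so check an actual result before relying on them.

```javascript
// Sketch of a result handler. The field names (text, pronunciation, tags)
// are ASSUMED for illustration; the real speechapi.com result may differ.
function handleRecognitionResult(jsonString) {
  let result;
  try {
    result = JSON.parse(jsonString);
  } catch (err) {
    return { ok: false, text: "", pronunciation: null, tags: [] };
  }

  // JSON.parse can legally return null, numbers, etc.; we need an object.
  if (result === null || typeof result !== "object") {
    return { ok: false, text: "", pronunciation: null, tags: [] };
  }

  return {
    ok: true,
    // Raw recognized text, or an empty string if the field is missing.
    text: typeof result.text === "string" ? result.text : "",
    // Pronunciation is optional extra information.
    pronunciation:
      typeof result.pronunciation === "string" ? result.pronunciation : null,
    // Grammar tags, useful for semantic interpretation of the result.
    tags: Array.isArray(result.tags) ? result.tags : [],
  };
}

// Made-up sample payload, only to exercise the handler.
const sample =
  '{"text":"turn on","pronunciation":"T ER N AA N","tags":["ACTION_ON"]}';
const parsed = handleRecognitionResult(sample);
console.log(parsed.text, parsed.tags);
```

Checking the types before use keeps the page from breaking if the server sends something unexpected or the recognizer returns no match.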

We think this technology is pretty cool and we encourage you to try it out. You can try it for free at speechapi.com, where you just include a few lines of JavaScript and HTML in your web page to enable speech recognition. We also plan to open source the package over the next few months, so sign up at our site if you’re interested.
Thanks for your interest
Younus
Muhammad Younus
Software Engineering Department.

Offline Tahmid

Re: Speech Recognition with Javascript
« Reply #1 on: April 20, 2017, 03:51:33 PM »
Informative post.
Best Regards

Tahmid Sami Rahman

Lecturer, Department of EEE
Faculty of Engineering
Room: 506, Main Campus
102 Shukrabad, Dhanmondi,
Dhaka-1207
Phone: +8801726140559