Speech API too slow [35902969]

WAI

Bug

Description

we...@gmail.com

created issue #1

Aug 19, 2016 05:14AM

Why is the Speech API so slow in transcription?

I tested an audio file of about 18s in length using the HTTP interface. The speech engine takes almost 16s to return a result. Am I using the wrong parameters, or does the Speech API need some additional work after the beta? By the way, other speech recognition APIs take about 3s for this file.

I am testing the REST interface directly using curl, with synchronous speech recognition (v1beta1). My OS is Mac OS X El Capitan.

time curl "https://speech.googleapis.com/v1beta1/speech:syncrecognize?key={MY_API_KEY}" --header "Content-Type: application/json" --data '{"config":{"encoding":"FLAC","sample_rate":16000,"language_code":"en-US"},"audio":{"uri":"gs://accobot-speech/new_record.flac"}}'
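A single `time curl` run gives only one latency sample. The following is a minimal Python sketch, not part of the original report, that builds the same request as the curl command and times several sequential runs. The endpoint and JSON fields are copied from the command above; the helper names (`build_request`, `post_json`, `time_requests`) and the `runs` and `send` parameters are hypothetical and exist only for illustration.

```python
import json
import statistics
import time
import urllib.request

# Endpoint copied from the curl command in the report.
SYNC_URL = "https://speech.googleapis.com/v1beta1/speech:syncrecognize"


def build_request(api_key, gcs_uri):
    """Return (url, body_bytes) mirroring the curl command in the report."""
    body = {
        "config": {
            "encoding": "FLAC",
            "sample_rate": 16000,
            "language_code": "en-US",
        },
        "audio": {"uri": gcs_uri},
    }
    return f"{SYNC_URL}?key={api_key}", json.dumps(body).encode("utf-8")


def post_json(url, body):
    """Send one POST request and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def time_requests(url, body, runs=5, send=post_json):
    """Time `runs` sequential requests and summarize wall-clock latency.

    `send` is injectable (an assumption of this sketch) so the timing
    logic can be exercised without network access.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send(url, body)
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

Calling `time_requests(*build_request(MY_API_KEY, "gs://accobot-speech/new_record.flac"))` with a real key would report the minimum, median, and maximum latency in seconds across the runs.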

Comments

ar...@google.com <ar...@google.com> #2 Aug 20, 2016 10:54PM

Thanks for your issue report. Can you confirm whether this consistently takes the same amount of time? Could you also tell us which alternative APIs you've used as a benchmark?

If API operations are taking a long time, one potential workaround is to use the speech.asyncrecognize method to process jobs in the background [1].

Note that the Speech API currently makes no claims about performance. As a Beta release, the Speech API is not subject to any SLA and is not intended for real-time usage in critical applications (see footnote at [2]).

[1] https://cloud.google.com/speech/reference/rest/v1beta1/speech/asyncrecognize
[2] https://cloud.google.com/speech/
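The asyncrecognize workaround can be sketched roughly as follows: submit the job, then poll the returned operation until it reports completion. This is a hedged illustration, not code from the thread. It assumes that `speech:asyncrecognize` returns a long-running operation object with a `name` field, and that the operation can be polled with a GET on `/v1beta1/operations/{name}` until `done` is true. The helper names and the `poll_seconds`, `max_polls`, `call`, and `sleep` parameters are hypothetical.

```python
import json
import time
import urllib.request

BASE = "https://speech.googleapis.com/v1beta1"


def http_json(url, body=None):
    """POST `body` as JSON when given, otherwise GET; return parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def recognize_async(api_key, request_body, poll_seconds=2.0, max_polls=30,
                    call=http_json, sleep=time.sleep):
    """Start an asyncrecognize job and poll until the operation is done.

    Assumption: the start call returns {"name": ...} and the operation
    status includes "done": true once the transcription has finished.
    """
    operation = call(f"{BASE}/speech:asyncrecognize?key={api_key}", request_body)
    status_url = f"{BASE}/operations/{operation['name']}?key={api_key}"
    for _ in range(max_polls):
        status = call(status_url)
        if status.get("done"):
            return status
        sleep(poll_seconds)
    raise TimeoutError("operation did not finish within the polling budget")
```

The `request_body` argument would be the same JSON object passed to syncrecognize. Polling adds its own delay on top of the transcription time, so a shorter `poll_seconds` trades extra requests for lower end-to-end latency.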

we...@gmail.com <we...@gmail.com> #3 Aug 23, 2016 02:54AM

Hi. The time it takes to transcribe audio files of around 20 seconds in length (only English tested) falls consistently between 15 and 20 seconds, using the curl command listed in the original post.

The benchmarks were:
1. MS Bing Speech API [1], which takes roughly 6-7 seconds to transcribe audio files of similar length
2. IBM Watson Speech to Text [2], which also gets the job done in less than 10 seconds

The speech.asyncrecognize method is relatively OK for real-time operations, and I am gladly using it now. However, in cases where we have to transcribe audio files in bulk, the speech.asyncrecognize method takes even longer to complete than the syncrecognize method.

Surprisingly, one of my friends is a Chrome developer, and the Chrome Speech API (which should be powered by the same engine) is amazingly fast. I passed my audio files to him for testing, and the transcription completed in well under 4 seconds. I wonder what is going on in the backend.

[1] https://www.microsoft.com/cognitive-services/en-us/speech-api
[2] https://www.ibm.com/watson/developercloud/speech-to-text.html

ar...@google.com <ar...@google.com> #4 Aug 23, 2016 04:39PM

Status: Won't Fix (Intended Behavior)

The Chrome Web Speech API is actually a different implementation, which has been around since Chrome 25, released on February 21st, 2013 [1]. The Web Speech API is a web standard which is supported across multiple browsers [2] and uses traditional speech processing algorithms.

The Cloud Speech API is a new technology which uses Google's advanced deep learning neural network algorithms. This is similar to how the Bing Speech API and Watson Speech to Text API work. As the Speech API is still a beta product, we don't make any claims that it will be more performant than established solutions; however, speed will improve over time as our algorithms become more optimized and more resources are committed to the service.

I will close this issue, as performance benchmarks aren't currently considered a defect with regard to expected behavior; however, rest assured this is something we're constantly working on improving. For now, if accuracy and language support are less important and speed is more of a concern, the Chrome Web Speech API is a good alternative.

For more general discussion on performance, I'd recommend posting a topic to the 'cloud-speech-discuss' forum [3].

[1] googlechromereleases.blogspot.com/2013/02/stable-channel-update_21.html
[2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
[3] https://groups.google.com/forum/#!topic/cloud-speech-discuss/

ar...@google.com <ar...@google.com> #5 Aug 23, 2016 04:45PM

It looks like the forum link I posted wasn't correct; here's the correct link:

https://groups.google.com/d/forum/cloud-speech-discuss
