The customer has some questions regarding the stability score when using streaming responses of the Speech-to-Text API [306517539]

Assigned

Feature Request

Status Update

No update yet.

Description

zu...@google.com

created issue #1

Oct 20, 2023 02:15AM

Problem you have encountered: The customer is using the Speech-to-Text API with the latest_short and latest_long models. They discovered that “is_final” is not returned when part of a sentence receives a stability score of 0.9. The customer assumed that a stability score of 0.9 means that this part of the sentence was perfectly recognized, so the API should mark the part scored 0.9 with is_final before moving on to the next part of the sentence.
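The difference between the two fields can be illustrated with a small, self-contained sketch. In the Speech-to-Text API, `stability` is only populated for interim results (is_final=false). It estimates how likely the recognizer is to keep its current guess, and it is not a signal that the segment has been finalized. The sketch models each streaming result as a simplified `Result` dataclass (a hypothetical stand-in; in the real message the transcript sits under `alternatives`). The transcripts are made up for illustration and are not the customer's actual output.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical, simplified stand-in for a StreamingRecognitionResult."""
    transcript: str
    stability: float  # 0.0 means "not set"; only populated for interim results
    is_final: bool


def describe(result: Result) -> str:
    """Explain what a single streaming result does and does not guarantee."""
    if result.is_final:
        # Final results are not revised; stability is not set on them.
        return f"FINAL: {result.transcript!r}"
    # Interim result: even a high stability is only an estimate that the
    # guess will not change; the recognizer may still revise or extend it.
    return f"interim (stability={result.stability:.2f}): {result.transcript!r}"


# Hypothetical sequence resembling the behavior described in this issue.
responses = [
    Result("I want to open", 0.9, False),
    Result("I want to open fold", 0.01, False),
    Result("I want to open folder", 0.0, True),
]

for r in responses:
    print(describe(r))
```

Under this model, a 0.9 interim result is still an interim result. Only a result with is_final set is committed by the API.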

Work done:
1. At first, the customer only tried the latest_short model [1], as shown in the screenshot “latest_short.png” in the feed. The “I want to open” part got a stability score of 0.9. However, the API did not then mark this part with “is_final”. Instead, it continued to score the next part, “fold”.
2. We tried the same audio with the “latest_long” model as well and observed the same behavior: “is_final” was only returned after the whole recognition process was complete. Please refer to the screenshots “latest_long1.png” and “latest_long2.png” in the feed.
3. The documentation [2] does not explain how the audio is segmented or why is_final only appeared at the end of the process, and an internal search turned up hardly any related information.

My Screenshots/Documentation:
[1] https://cloud.google.com/speech-to-text/docs/latest-models#model_identifiers
[2] https://cloud.google.com/speech-to-text/docs/speech-to-text-requests#streaming_responses

What you expected to happen:
1. The customer would like to know the exact time difference between using latest_short and latest_long.
2. The customer would like to receive is_final as soon as part of a sentence has been recognized.
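Until something like the requested behavior exists, a client could approximate it on its own side. The following is a minimal sketch of one such heuristic. It is not an API feature and not the engineering team's recommendation. It assumes that one interim response is available as an ordered list of (transcript, stability) pairs, and it treats the leading results at or above a threshold as "provisionally stable." The threshold value and the `split_stable_prefix` name are hypothetical, and the transcripts are concatenated as-is.

```python
from typing import List, Tuple

# Hypothetical threshold chosen by the application, not an API setting.
STABILITY_THRESHOLD = 0.8


def split_stable_prefix(
    results: List[Tuple[str, float]], threshold: float = STABILITY_THRESHOLD
) -> Tuple[str, str]:
    """Split one interim response into (provisionally stable, volatile) text.

    `results` holds (transcript, stability) pairs in the order received.
    Leading results at or above `threshold` go into the stable part; from
    the first less stable result onward, everything is volatile. Unlike
    is_final, the "stable" text can still be revised by the recognizer.
    """
    stable: List[str] = []
    volatile: List[str] = []
    for transcript, stability in results:
        if not volatile and stability >= threshold:
            stable.append(transcript)
        else:
            volatile.append(transcript)
    return "".join(stable), "".join(volatile)


# Illustrative interim response: a stable prefix followed by a volatile tail.
stable_text, volatile_text = split_stable_prefix([("I want to open", 0.9), (" fold", 0.01)])
print(repr(stable_text), repr(volatile_text))
```

The design trades correctness for latency. Text shown early may later differ from the final transcript, so an application using this approach would still need to replace it once a result with is_final arrives.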

Comments

va...@google.com <va...@google.com> Oct 20, 2023 06:44AM

Reassigned to je...@google.com.

je...@google.com <je...@google.com> #2 Oct 25, 2023 05:13AM

Reassigned to gc...@google.com.

I have informed our engineering team of this feature request. There is currently no ETA for its implementation.

A current workaround would be to check the "vertices" of the returned "boundingPoly" [1] for the returned "textAnnotations". If the calculated rectangle's height is greater than its width, then your image is sideways.

[1] https://cloud.google.com/vision/reference/rest/v1/images/annotate#boundingpoly
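The bounding-box check in the workaround above can be sketched in Python. The sketch assumes that the `boundingPoly.vertices` of a `textAnnotations` entry have already been parsed from the JSON response into a list of dicts. Coordinates equal to 0 may be omitted from the JSON, so missing keys default to 0. `is_sideways` is a hypothetical helper name, and the polygon below is made-up data, not real API output.

```python
from typing import Dict, List


def is_sideways(vertices: List[Dict[str, int]]) -> bool:
    """Return True if the rectangle spanned by a boundingPoly is taller than wide.

    `vertices` follows the JSON shape of boundingPoly.vertices, for example
    [{"x": 10, "y": 5}, ...]. Zero-valued coordinates may be omitted in the
    JSON response, so a missing "x" or "y" is treated as 0.
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return height > width


# Illustrative polygon: 40 px wide and 200 px tall.
poly = [{"x": 10, "y": 0}, {"x": 50, "y": 0}, {"x": 50, "y": 200}, {"x": 10, "y": 200}]
print(is_sideways(poly))  # True
```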
