Assigned
Comments
ar...@revolgy.com <ar...@revolgy.com> #2
We still have a problem with speech recognition of numbers (Google v2, short model, EU region, Polish language).
User says: 795 249 2770
Recognised as: 795 249 277 (reproduced in the Google Cloud Console).
The audio file with the user's speech is attached.
We hope that you will fix it soon!
Description
Please describe your requested enhancement. Good feature requests will solve common problems or enable new use cases.
What you would like to accomplish: I would like to get the issue with the STT short model for the pl-PL language code fixed.
There are transcription issues when using the short model for call recordings. Repeated digits (for example in a phone number, where a doubled or tripled digit is recognized only once) and short words such as "dwa" are missing from the transcript. The GSR@v2 service returns a stability score of 0.01 with an empty response, indicating that the words were missed.
How this might work: -
If applicable, reasons why alternative solutions are not sufficient:
After exploring alternative solutions, including recommending a different model or updating the Speech-to-Text (STT) version, it became clear that these options wouldn't be feasible due to the customer's API integration and the complexity of version changes. While the long and latest_long models perform well for UI transcripts, the customer has identified additional challenges that these models wouldn't address.
I recommended that the customer use the long or latest_long model, but they informed me that those models have other issues with the Polish language code and would not help: they do not properly recognize short answers, even "yes"/"no" ("tak"/"nie"), and they produce many other unstable results. Missing words in the middle of a phrase are also an active issue.
Phone numbers are not recognized properly, only a couple of digits. The number is either not recognized at all or a wrong digit sequence is transcribed.
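To make reports like this easier to quantify, the spoken and recognized digit sequences can be compared automatically. The following is only a minimal sketch using Python's standard difflib module. It is not part of the Speech-to-Text API, and the helper names digits_only and digit_diff are hypothetical, chosen for illustration. It assumes the expected number is known in advance, for example from a labelled test recording.

```python
from difflib import SequenceMatcher


def digits_only(text):
    """Keep only the digit characters, dropping spaces and punctuation."""
    return "".join(ch for ch in text if ch.isdigit())


def digit_diff(spoken, recognised):
    """List the differences between the expected and recognised digits.

    Each entry is (operation, expected_digits, recognised_digits), where
    operation is one of difflib's opcode tags: 'delete', 'insert', 'replace'.
    """
    expected = digits_only(spoken)
    actual = digits_only(recognised)
    matcher = SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The example from the comment on this issue: the trailing "0" is dropped.
print(digit_diff("795 249 2770", "795 249 277"))
# prints [('delete', '0', '')]
```

An empty list means the recognized digits match the expected ones. A 'delete' entry is a digit the model dropped, which is the pattern described for doubled or tripled digits.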
Other information (workarounds you have tried, documentation consulted, etc): -