Issue recognizing repeated digit sequences in Polish language [349753183]

Assigned

Customer Issue

Status Update

No update yet.

Description

ma...@google.com

created issue #1

Jun 27, 2024 09:02AM


Problem you have encountered: The customer has encountered an issue with digit recognition in the Speech-to-Text v2 API for Polish. When an audio sample contains repeated digits, the model does not transcribe them accurately.

What you expected to happen: The customer expects accurate transcripts, with every digit transcribed exactly as it is spoken in the Polish audio sample.

Steps to reproduce: Transcribe a Polish audio sample in which “555” is spoken among other digits. In the customer's test, the output contained “55” in place of “555”, alongside the other digits.
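To check a transcript for this symptom, a small helper can compare the lengths of repeated-digit runs in the expected sequence against the transcript. This is a minimal, hypothetical sketch: the names `digit_runs` and `find_shortened_runs` and the sample string "12 555 89" are illustrative and do not come from the report. It assumes the transcript renders digits as numerals (as in the “55” output described above) and that the transcript contains the same digits in the same order, so it pairs runs by position instead of performing a full sequence alignment.

```python
import re
from itertools import groupby


def digit_runs(text: str) -> list[tuple[str, int]]:
    """Collapse the ASCII digits in `text` into (digit, run_length) pairs."""
    digits = re.sub(r"[^0-9]", "", text)  # drop spaces, punctuation, letters
    return [(d, len(list(group))) for d, group in groupby(digits)]


def find_shortened_runs(expected: str, transcript: str) -> list[tuple[str, int, int]]:
    """Return (digit, expected_len, actual_len) for runs whose length differs.

    Assumption: both strings contain the same digits in the same order, so
    runs are compared position by position (no sequence alignment).
    """
    mismatches = []
    for (d_exp, n_exp), (d_got, n_got) in zip(digit_runs(expected), digit_runs(transcript)):
        if d_exp == d_got and n_exp != n_got:
            mismatches.append((d_exp, n_exp, n_got))
    return mismatches


if __name__ == "__main__":
    # Hypothetical expected digits vs. a transcript that drops one "5".
    print(find_shortened_runs("12 555 89", "12 55 89"))  # [('5', 3, 2)]
```

If a whole run is missing from the transcript, the positional pairing will misreport later runs; a diff-based alignment (for example with `difflib.SequenceMatcher`) would be more robust.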

Comments

wi...@google.com <wi...@google.com> #2 Jul 18, 2024 08:07AM

I have informed our engineering team of this feature request. There is currently no ETA for its implementation.

A current workaround would be to check the "vertices" of the "boundingPoly" [1] returned for each of the "textAnnotations". If the calculated rectangle's height is greater than its width, then your image is sideways.

[1] https://cloud.google.com/vision/reference/rest/v1/images/annotate#boundingpoly
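The bounding-box check from comment #2 can be sketched as follows. This is a minimal sketch, assuming the images:annotate JSON response has already been parsed into Python dicts and that a vertex with a missing "x" or "y" key has that coordinate equal to 0; `is_sideways` is a hypothetical helper name, not part of the Vision API.

```python
def is_sideways(vertices: list[dict]) -> bool:
    """Return True if the box spanned by a boundingPoly's vertices is taller than wide.

    Each vertex is assumed to be a dict like {"x": 10, "y": 20}; a missing
    coordinate is treated as 0 (assumption about the parsed JSON).
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return height > width


if __name__ == "__main__":
    # Hypothetical vertices for a text block 40 px wide and 200 px tall.
    poly = {"vertices": [{"x": 0, "y": 0}, {"x": 40}, {"x": 40, "y": 200}, {"y": 200}]}
    print(is_sideways(poly["vertices"]))  # True
```

A square box (height equal to width) is not classified as sideways, which matches the strict "height is greater than width" comparison in the comment.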

[Deleted User] <[Deleted User]> #3 Jul 19, 2024 09:05AM

I also need this problem solved :)

audio-files_calf1966960-cc7e-4622-8678-0cac90a5691e_answer_pl (1).wav

261 KB


Screenshot 2024-07-19 at 12.02.15.png

259 KB

