Problem you have encountered:

We are experiencing internal errors (Error Code 13: "An internal error occurred.") when using the Speech-to-Text API's Chirp model for Chinese transcription. This issue has been consistently observed since August 4th, 2024. The error is preventing us from transcribing Chinese audio files using the Chirp model.

What you expected to happen:

We expected to successfully transcribe Chinese audio files using the Chirp model, as the Speech-to-Text API documentation states that Chirp supports Traditional Chinese.

Steps to reproduce:

Use the Speech-to-Text API's batch_recognize method with the following configuration:

from google.cloud.speech_v2.types import cloud_speech

config = cloud_speech.RecognitionConfig(
    auto_decoding_config={},
    features=cloud_speech.RecognitionFeatures(
        enable_automatic_punctuation=True,
    ),
    model="chirp_2",
    language_codes=["cmn-Hans-CN"],
)

Provide a Chinese audio file as input.

The API will return an error with code 13 and the message "An internal error occurred."
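
For anyone trying to reproduce this, the steps above can be sketched as a complete call. This is only a minimal sketch, assuming the google-cloud-speech v2 Python client, the us-central1 region, and inline output. The project ID and Cloud Storage URI in the usage comment are placeholders, and the helper names (regional_endpoint, recognizer_path, batch_transcribe) are made up for this example rather than taken from our production code.

```python
def regional_endpoint(location: str) -> str:
    """Build the regional Speech-to-Text endpoint host for a location."""
    return f"{location}-speech.googleapis.com"


def recognizer_path(project_id: str, location: str) -> str:
    """Build the resource name of the default ("_") recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


def batch_transcribe(
    project_id: str,
    gcs_uri: str,
    location: str = "us-central1",  # assumption: a region where chirp_2 is served
    model: str = "chirp_2",
    language_code: str = "cmn-Hans-CN",
    timeout: float = 600.0,
):
    """Submit one audio file to batch_recognize and wait for the result."""
    # Imported lazily so the helpers above work without the client installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient(
        client_options=ClientOptions(api_endpoint=regional_endpoint(location))
    )

    # Same settings as in the report, with the decoding config spelled out.
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        features=cloud_speech.RecognitionFeatures(
            enable_automatic_punctuation=True,
        ),
        model=model,
        language_codes=[language_code],
    )

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=recognizer_path(project_id, location),
        config=config,
        files=[cloud_speech.BatchRecognizeFileMetadata(uri=gcs_uri)],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # batch_recognize returns a long-running operation; result() blocks
    # until it finishes or the timeout expires.
    operation = client.batch_recognize(request=request)
    return operation.result(timeout=timeout)


# Usage (placeholders, not real resources):
# response = batch_transcribe("my-project", "gs://my-bucket/chinese-audio.wav")
```

Passing model="long" to the same function should exercise the Long model workaround described below without any other changes.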

Other information:

Workarounds tried:

Using the Long model: While successful, it has a higher word error rate compared to the Chirp model.
Using the chirp_2 model with the "en-US" language setting: This workaround produces accurate Traditional Chinese transcriptions with some English nouns, but it's unclear .
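
Until the error is resolved, the workarounds above could be chained so that a failed chirp_2 request falls back to another model. The following is a rough sketch of that idea, not something we run in production. transcribe_with_fallback and the transcribe callable are hypothetical names. With the real client, retry_on could probably be narrowed to google.api_core.exceptions.InternalServerError, which is the exception class google-api-core uses for gRPC status INTERNAL (code 13).

```python
from typing import Any, Callable, Iterable, Tuple, Type


def transcribe_with_fallback(
    transcribe: Callable[[str, str], Any],
    attempts: Iterable[Tuple[str, str]],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[str, str, Any]:
    """Try each (model, language_code) pair in order.

    Returns the first pair that succeeds together with its result.
    `transcribe` is a hypothetical callable that wraps the actual API call.
    """
    last_error = None
    for model, language_code in attempts:
        try:
            return model, language_code, transcribe(model, language_code)
        except retry_on as exc:
            # Remember the failure and move on to the next configuration.
            last_error = exc
    raise RuntimeError("all transcription attempts failed") from last_error


# Order reflects the report: preferred model first, Long model as fallback.
ATTEMPTS = [
    ("chirp_2", "cmn-Hans-CN"),
    ("long", "cmn-Hans-CN"),
]
```

The fallback trades accuracy for availability, since the Long model has the higher word error rate noted above.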

Documentation:

[1] https://cloud.google.com/speech-to-text/v2/docs/speech-to-text-supported-languages

[2] https://cloud.google.com/speech-to-text/docs/basics#speech_requests