Comments
va...@google.com
pu...@google.com #2
I have informed our engineering team of this feature request. There is currently no ETA for its implementation.
A current workaround is to check the "vertices" of the "boundingPoly" [1] returned for each "textAnnotation". If the calculated rectangle's height is greater than its width, then your image is sideways.
[1] https://cloud.google.com/vision/reference/rest/v1/images/annotate#boundingpoly
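A minimal sketch of that check, assuming the standard annotate response shape (the first textAnnotation spans all detected text; vertex order is not guaranteed, so min/max is used, and coordinates of 0 may be omitted from the response):

// Sketch: infer orientation from the first textAnnotation's boundingPoly.
// Assumes `response` is the result of images.annotate with TEXT_DETECTION.
function isImageSideways(response) {
  const annotation = response.textAnnotations && response.textAnnotations[0];
  if (!annotation) return false; // no text detected

  const vertices = annotation.boundingPoly.vertices;
  const xs = vertices.map((v) => v.x || 0); // omitted coordinates default to 0
  const ys = vertices.map((v) => v.y || 0);

  const width = Math.max(...xs) - Math.min(...xs);
  const height = Math.max(...ys) - Math.min(...ys);

  // Per the workaround above: taller than wide suggests a sideways image.
  return height > width;
}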
Description
Problem you have encountered: I am currently using streamingRecognize and seeing a latency of 2 seconds and above.
What you expected to happen: I expected to be able to reduce the latency to under 1 second. Is there any way to tune it down?
Steps to reproduce:
Other information (workarounds you have tried, documentation consulted, etc.): I have tried to implement the best practices mentioned in the docs, but they did not help.
Currently I am capturing audio in the browser with MediaRecorder and sending it to the server over a Socket.IO connection. The code I am using on the server end is provided below.
Any kind of help or insights will be appreciated. Thank you.
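For reference, a simplified sketch of the client side (not the exact code; the 'gcpSpeechToText' event name matches the server handler below, and the timeslice value is only an example):

// client.js (sketch): capture mic audio and stream chunks to the server
import { io } from 'socket.io-client';

const socket = io();

async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm;codecs=opus', // matches WEBM_OPUS on the server
  });

  mediaRecorder.ondataavailable = async (event) => {
    if (event.data.size > 0) {
      // Send each chunk as an ArrayBuffer; the server accepts Uint8Array/Buffer
      socket.emit('gcpSpeechToText', await event.data.arrayBuffer());
    }
  };

  // A small timeslice (in ms) flushes chunks frequently, which helps latency
  mediaRecorder.start(250);
}

startStreaming();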
Code:
*server.js*
import express from 'express';
import ViteExpress from 'vite-express';
import { Server } from 'socket.io';
import http from 'http';
import { Readable } from 'stream';
import gcpSTTService from './services/gcpSTTService.js';
const app = express();
const server = http.createServer(app);
const io = new Server(server);

io.on('connection', (socket) => {
  console.log('a user connected');

  // Create a Readable stream to handle audio data
  const audioStream = new Readable({
    read(size) {
      // Do nothing here since we'll be pushing data from the socket event handler
    },
  });

  // Handle incoming audio data from the client
  socket.on('gcpSpeechToText', async (data) => {
    console.log('Received audio data');
    let audioBuffer;
    if (typeof data === 'string') {
      // Convert string to Buffer
      audioBuffer = Buffer.from(data, 'base64');
    } else if (Buffer.isBuffer(data) || data instanceof Uint8Array) {
      // Data is already a Buffer or Uint8Array
      audioBuffer = data;
    } else {
      // Handle other data types or throw an error
      console.error('Unsupported data type received:', typeof data);
      return;
    }
    // Push the audio data into the stream
    audioStream.push(audioBuffer);
  });

  // Pipe the audio stream to the speech-to-text service
  audioStream.pipe(gcpSTTService(io));

  socket.on('disconnect', () => {
    console.log('User disconnected');
    // Signal end-of-stream so the recognize stream can close cleanly
    audioStream.push(null);
  });
});

server.listen(3000, () => console.log('Server is listening...'));
ViteExpress.bind(app, server);
*gcpSTTService.js*
import speech from '@google-cloud/speech';

const client = new speech.SpeechClient();

function gcpSTTService(io) {
  const config = {
    encoding: 'WEBM_OPUS',
    sampleRateHertz: 48000,
    languageCode: 'en-US',
  };
  const request = {
    config: config,
    interimResults: false, // If you want interim results, set this to true
  };

  return client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) => {
      // Emit the transcript to connected clients when a result arrives
      if (data.results[0] && data.results[0].alternatives[0]) {
        console.log(
          `Transcription: ${data.results[0].alternatives[0].transcript}`
        );
        io.emit(
          'gcpSpeechToTextResult',
          data.results[0].alternatives[0].transcript
        );
        return data.results[0].alternatives[0].transcript;
      }
    });
}

export default gcpSTTService;
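One configuration change worth trying, sketched below: setting interimResults to true so partial transcripts arrive while the utterance is still in progress, which usually lowers perceived latency even if final results still take a couple of seconds. The model field is an assumption on my part: 'latest_short' is documented for short utterances, but availability varies by language, so verify it before relying on it. Note this sketch also changes the emitted payload shape to include isFinal.

// gcpSTTService.js (sketch): same service with interim results enabled
import speech from '@google-cloud/speech';

const client = new speech.SpeechClient();

function gcpSTTServiceLowLatency(io) {
  const request = {
    config: {
      encoding: 'WEBM_OPUS',
      sampleRateHertz: 48000,
      languageCode: 'en-US',
      model: 'latest_short', // assumed; check model support for your language
    },
    interimResults: true, // partial results stream in before the final one
  };

  return client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) => {
      const result = data.results[0];
      if (result && result.alternatives[0]) {
        // isFinal distinguishes stabilized text from interim guesses
        io.emit('gcpSpeechToTextResult', {
          transcript: result.alternatives[0].transcript,
          isFinal: result.isFinal,
        });
      }
    });
}

export default gcpSTTServiceLowLatency;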