Comments
va...@google.com
pu...@google.com #2
I have informed our engineering team of this feature request. There is currently no ETA for its implementation.
A current workaround is to check the "vertices" of the "boundingPoly" [1] returned for each "textAnnotation". If the calculated rectangle's height is greater than its width, then your image is sideways.
[1] https://cloud.google.com/vision/reference/rest/v1/images/annotate#boundingpoly
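A minimal sketch of that check, assuming the standard annotate response shape (the first textAnnotation spans all detected text; vertex order is not guaranteed, so min/max is used, and coordinates of 0 may be omitted from the response):

// Sketch: infer orientation from the first textAnnotation's boundingPoly.
// Assumes `response` is the result of images.annotate with TEXT_DETECTION.
function isImageSideways(response) {
  const annotation = response.textAnnotations && response.textAnnotations[0];
  if (!annotation) return false; // no text detected

  const vertices = annotation.boundingPoly.vertices;
  const xs = vertices.map((v) => v.x || 0); // omitted coordinates default to 0
  const ys = vertices.map((v) => v.y || 0);

  const width = Math.max(...xs) - Math.min(...xs);
  const height = Math.max(...ys) - Math.min(...ys);

  // Per the workaround above: taller than wide suggests a sideways image.
  return height > width;
}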
Description
Problem you have encountered: I am currently using streamingRecognize and seeing a latency of 2 seconds and above.
What you expected to happen: I expected to be able to reduce the latency to under 1 second. Is there any way to tune it down?
Steps to reproduce:
Other information (workarounds you have tried, documentation consulted, etc.): I have tried to implement the best practices mentioned in the docs, but they did not help.
Currently I am capturing audio in the browser with MediaRecorder and sending it to the server over a Socket.IO connection. The code I am using on the server end is provided below.
Any kind of help or insights will be appreciated. Thank you.
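For reference, a simplified sketch of the client side (not the exact code; the 'gcpSpeechToText' event name matches the server handler below, and the timeslice value is only an example):

// client.js (sketch): capture mic audio and stream chunks to the server
import { io } from 'socket.io-client';

const socket = io();

async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm;codecs=opus', // matches WEBM_OPUS on the server
  });

  mediaRecorder.ondataavailable = async (event) => {
    if (event.data.size > 0) {
      // Send each chunk as an ArrayBuffer; the server accepts Uint8Array/Buffer
      socket.emit('gcpSpeechToText', await event.data.arrayBuffer());
    }
  };

  // A small timeslice (in ms) flushes chunks frequently, which helps latency
  mediaRecorder.start(250);
}

startStreaming();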
Code:
*server.js*
import express from 'express';
import ViteExpress from 'vite-express';
import { Server } from 'socket.io';
import http from 'http';
import { Readable } from 'stream';
import gcpSTTService from './services/gcpSTTService.js';
const app = express();
const server = http.createServer(app);
const io = new Server(server);

io.on('connection', (socket) => {
  console.log('a user connected');

  // Create a Readable stream to handle audio data
  const audioStream = new Readable({
    read(size) {
      // Do nothing here since we'll be pushing data from the socket event handler
    },
  });

  // Handle incoming audio data from the client
  socket.on('gcpSpeechToText', async (data) => {
    console.log('Received audio data');
    let audioBuffer;
    if (typeof data === 'string') {
      // Convert string to Buffer
      audioBuffer = Buffer.from(data, 'base64');
    } else if (Buffer.isBuffer(data) || data instanceof Uint8Array) {
      // Data is already a Buffer or Uint8Array
      audioBuffer = data;
    } else {
      // Handle other data types or throw an error
      console.error('Unsupported data type received:', typeof data);
      return;
    }
    // Push the audio data into the stream
    audioStream.push(audioBuffer);
  });

  // Pipe the audio stream to the speech-to-text service
  audioStream.pipe(gcpSTTService(io));

  socket.on('disconnect', () => {
    console.log('User disconnected');
    // Signal end-of-stream so the recognize stream can close cleanly
    audioStream.push(null);
  });
});

server.listen(3000, () => console.log('Server is listening...'));
ViteExpress.bind(app, server);
*gcpSTTService.js*
import speech from '@google-cloud/speech';

const client = new speech.SpeechClient();

function gcpSTTService(io) {
  const config = {
    encoding: 'WEBM_OPUS',
    sampleRateHertz: 48000,
    languageCode: 'en-US',
  };
  const request = {
    config: config,
    interimResults: false, // If you want interim results, set this to true
  };

  return client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) => {
      // Emit the transcript to connected clients when a result arrives
      if (data.results[0] && data.results[0].alternatives[0]) {
        console.log(
          `Transcription: ${data.results[0].alternatives[0].transcript}`
        );
        io.emit(
          'gcpSpeechToTextResult',
          data.results[0].alternatives[0].transcript
        );
        return data.results[0].alternatives[0].transcript;
      }
    });
}

export default gcpSTTService;
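One configuration change worth trying, sketched below: setting interimResults to true so partial transcripts arrive while the utterance is still in progress, which usually lowers perceived latency even if final results still take a couple of seconds. The model field is an assumption on my part: 'latest_short' is documented for short utterances, but availability varies by language, so verify it before relying on it. Note this sketch also changes the emitted payload shape to include isFinal.

// gcpSTTService.js (sketch): same service with interim results enabled
import speech from '@google-cloud/speech';

const client = new speech.SpeechClient();

function gcpSTTServiceLowLatency(io) {
  const request = {
    config: {
      encoding: 'WEBM_OPUS',
      sampleRateHertz: 48000,
      languageCode: 'en-US',
      model: 'latest_short', // assumed; check model support for your language
    },
    interimResults: true, // partial results stream in before the final one
  };

  return client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) => {
      const result = data.results[0];
      if (result && result.alternatives[0]) {
        // isFinal distinguishes stabilized text from interim guesses
        io.emit('gcpSpeechToTextResult', {
          transcript: result.alternatives[0].transcript,
          isFinal: result.isFinal,
        });
      }
    });
}

export default gcpSTTServiceLowLatency;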