analyze_entity_sentiment throwing exception on utf-8 string. [73787794]

Assigned

Bug

Status Update

No update yet.

Description

sh...@nobias.com

created issue #1

Feb 23, 2018 07:20PM

The following Python code throws an exception (UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte in field: google.cloud.language.v1.TextSpan.content).

Apparently the UTF-8 character sequences are not being handled correctly?

-------

import sys
import google.auth
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
from google.oauth2 import service_account
import google.oauth2.credentials

# create client for Google Cloud
key_file = "task/google_cloud_key.json"
credentials = service_account.Credentials.from_service_account_file(key_file)
scoped_credentials = credentials.with_scopes(['https://www.googleapis.com/auth/cloud-platform'])
google_client = language.LanguageServiceClient(credentials=scoped_credentials)

# Detect and send native Python encoding to receive correct word offsets.
encoding = enums.EncodingType.UTF32
if sys.maxunicode == 65535:
    encoding = enums.EncodingType.UTF16

# this string has the problem
tt = b'WARNING: Video contains strong language\r\n\r\nA new Netflix documentary series looks at food from all over the world, including Houston\'s world-leading culinary scene. \xe2\x80\x9cUgly Delicious\xe2\x80\x9d follows celebrity chef David Change as he explores the relationships between culture, politics, and food.'
# this string is OK...
#tt = b'WARNING: Video contains strong language\r\n\r\nA new Netflix documentary series looks at food from all over the world, including Houston\'s world-leading culinary scene. \xe2\x80\x9cUgly Delicious\xe2\x80\x9d'
# this validates that this string is utf-8
text = tt.decode('utf-8')  # renamed from `str` to avoid shadowing the built-in

document_body = types.Document(content=tt, type=enums.Document.Type.PLAIN_TEXT)
msg = google_client.analyze_entity_sentiment(document_body, encoding)
print(msg)
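
The error text (byte 0x80 at position 0, "invalid start byte") is what Python reports when a UTF-8 decode starts on a continuation byte, for example the second byte of the curly quote \xe2\x80\x9c. Whether that is what actually happens inside the service or the client library is not confirmed anywhere in this issue; the following is only a minimal local sketch of that one possibility. It makes no API call, and the names QUOTED and decode_from are hypothetical helpers introduced for illustration.

```python
# Local illustration only: no request is sent to the Natural Language API.
# QUOTED and decode_from are hypothetical names, not part of any library.

# The quoted phrase from the failing string, as UTF-8 bytes.
QUOTED = b'\xe2\x80\x9cUgly Delicious\xe2\x80\x9d'


def decode_from(data: bytes, start: int) -> str:
    """Decode data[start:] as UTF-8; return the text, or the error message on failure."""
    try:
        return data[start:].decode('utf-8')
    except UnicodeDecodeError as exc:
        return str(exc)


# Offset 0 is a character boundary, so decoding succeeds.
aligned = decode_from(QUOTED, 0)

# Offset 1 lands on 0x80, a continuation byte of the opening curly quote,
# so decoding fails with an "invalid start byte" error.
misaligned = decode_from(QUOTED, 1)

print(aligned)
print(misaligned)
```

If a byte offset and a character (or UTF-16 code unit) offset were ever mixed up when slicing the text, a misaligned slice like the second one would be the result. That is an assumption about the cause, not something established by this report.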

Comments

gs...@google.com <gs...@google.com> #2 Feb 25, 2018 10:06PM

Assigned to gc...@google.com.

This issue has been brought to the attention of Engineering, who will address it in due course. There is no ETR as yet. You can keep up-to-date with developments by following this thread.

sh...@nobias.com <sh...@nobias.com> #3 May 1, 2018 10:16PM

Hello,

We've hit this problem again recently.
If there is a good workaround (aside from just ignoring it), please let us
know.

Thanks,
--Shinichi

