Assigned
Comments
gs...@google.com <gs...@google.com> #2
This issue has been brought to the attention of Engineering, who will address it in due course. There is no ETR as yet. You can keep up-to-date with developments by following this thread.
sh...@nobias.com <sh...@nobias.com> #3
Hello,
We've hit this problem again recently.
If there is a good workaround (aside from just ignoring it), please let us
know.
Thanks,
--Shinichi
On Sun, Feb 25, 2018 at 5:06 PM, <buganizer-system@google.com> wrote:
Description
Apparently, analyze_entity_sentiment is not handling UTF-8 character sequences correctly?
-------
import sys
import google.auth
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
from google.oauth2 import service_account
import google.oauth2.credentials
# create client for Google Cloud
key_file = "task/google_cloud_key.json"
credentials = service_account.Credentials.from_service_account_file(key_file)
scoped_credentials = credentials.with_scopes(['
google_client = language.LanguageServiceClient(credentials=scoped_credentials)
# Detect and send native Python encoding to receive correct word offsets.
encoding = enums.EncodingType.UTF32
if sys.maxunicode == 65535:
encoding = enums.EncodingType.UTF16
# this string has the problem
tt = b'WARNING: Video contains strong language\r\n\r\nA new Netflix documentary series looks at food from all over the world, including Houston\'s world-leading culinary scene. \xe2\x80\x9cUgly Delicious\xe2\x80\x9d follows celebrity chef David Change as he explores the relationships between culture, politics, and food.'
# this string is OK...
#tt = b'WARNING: Video contains strong language\r\n\r\nA new Netflix documentary series looks at food from all over the world, including Houston\'s world-leading culinary scene. \xe2\x80\x9cUgly Delicious\xe2\x80\x9d'
# this validates that the string is UTF-8 (decode raises UnicodeDecodeError otherwise)
text = tt.decode('utf-8')
document_body = types.Document(content=tt, type=enums.Document.Type.PLAIN_TEXT)
msg = google_client.analyze_entity_sentiment(document_body, encoding)
print(msg)
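
For anyone trying to narrow this down, here is a minimal local sketch (it does not call the API) of what the encoding type in the script above is meant to control: the unit in which character offsets are counted. The helper name `code_unit_offset` and the shortened sample string are assumptions made for illustration, not part of the client library. Note that on Python 3.3 and later `sys.maxunicode` is always 1114111, so the check in the script selects UTF32 there.

```python
def code_unit_offset(text, char_index, encoding):
    """Return the offset of text[char_index] counted in code units.

    `encoding` is one of "UTF8", "UTF16", "UTF32" (hypothetical labels
    mirroring the names used in the script above).
    """
    prefix = text[:char_index]
    if encoding == "UTF8":
        return len(prefix.encode("utf-8"))           # 1 unit = 1 byte
    if encoding == "UTF16":
        return len(prefix.encode("utf-16-le")) // 2  # 1 unit = 2 bytes
    if encoding == "UTF32":
        return len(prefix.encode("utf-32-le")) // 4  # 1 unit = 4 bytes
    raise ValueError("unknown encoding: %r" % (encoding,))


# Shortened excerpt of the failing string, including both curly quotes.
raw = b'scene. \xe2\x80\x9cUgly Delicious\xe2\x80\x9d follows'

# decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
sample = raw.decode('utf-8')

# Position of the word that follows the closing curly quote.
idx = sample.index('follows')

for name in ("UTF8", "UTF16", "UTF32"):
    print(name, code_unit_offset(sample, idx, name))
```

Each curly quote is one character but three bytes in UTF-8, so the UTF8 offset of "follows" is 4 larger than the UTF16 and UTF32 offsets. If offsets in a response are off by a similar amount, a mismatch between the requested encoding type and how the text is indexed locally would be one thing to rule out.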