Sentence extraction doesn't seem to work in a few cases. [155686500]

Assigned

Bug

Status Update

No update yet.

Description

sa...@gmail.com

created issue #1

May 5, 2020 02:00AM

Hi,

I have two examples where the sentence extraction didn't happen correctly.

Example 1: The API extracted 3 sentences from the text below ("Hey Anton!", "Thank you.", and the remaining text as the 3rd sentence). I was expecting it to break into 4, with the fourth sentence starting after the semicolon.

Hey Anton! Thank you.
I did have a few questions regarding my payments; what date I make my first payment, are the payments monthly, and can we get me set up for automatic payments?

Example 2: The API extracted 2 sentences from the text below ("What's my account number.." and the remaining text as the 2nd sentence). I was expecting it to break into 3 sentences ("I pay ONLY by money order" as the 2nd sentence and the remainder as the 3rd).

What's my account number.. I pay ONLY by money order....I need to no what information needs 2 be put on it so that it may be credited 2 my account...

Will you please review these two examples and help me understand why the text didn't break into more sentences as I expected? Below is the code snippet.

private static void AnalyzeSyntaxFromText(Correspondence correspondence)
{
    var client = LanguageServiceClient.Create();
    try
    {
        // Request syntax analysis only; the response carries sentence and token boundaries.
        var response = client.AnnotateText(
            new Document()
            {
                Content = correspondence.Message,
                Type = Document.Types.Type.PlainText
            },
            new AnnotateTextRequest.Types.Features() { ExtractSyntax = true });
        WriteSentences(response.Sentences, response.Tokens, correspondence);
    }
    catch (Exception ex)
    {
        // Record the failure on the correspondence instead of rethrowing.
        correspondence.ErrorMessage = ex.ToString();
    }
}

Thanks in advance.
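
Until the behavior in Example 1 is explained, one possible client-side workaround is to split each returned sentence at semicolons after the API call. The following is only a minimal sketch under that assumption: ClauseSplitter and SplitOnSemicolons are hypothetical names, not part of Google.Cloud.Language.V1, and the helper works on plain strings such as each sentence's Text.Content.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the client library.
public static class ClauseSplitter
{
    // Splits one sentence into clauses at each semicolon.
    // The semicolon itself is dropped, pieces are trimmed,
    // and empty pieces are discarded.
    public static List<string> SplitOnSemicolons(string sentence)
    {
        return sentence
            .Split(';')
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)
            .ToList();
    }
}
```

Applied to the third sentence of Example 1, this would yield two pieces, which together with the first two sentences gives the four segments expected above. Note that it changes the text (the semicolon is removed), so the original offsets no longer apply to the pieces.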


Comments

sa...@google.com <sa...@google.com> Jun 2, 2020 04:28PM

Assigned to sa...@google.com.

sa...@google.com <sa...@google.com> #2 Jun 5, 2020 02:37PM

Hi,
Thank you for sharing the two examples.

The product engineering team is aware of this issue and is investigating it. There is no ETA for a fix at this time, but all further updates should be posted here.

gs...@google.com <gs...@google.com> Jun 5, 2020 05:00PM

Reassigned to gc...@google.com.

da...@gmail.com <da...@gmail.com> #3 Jun 6, 2020 01:22PM

Hi,

I have also come across this issue. I am supplying the following string to the API:

Mr. Sherlock Holmes and Dr. John Watson were better than the F.B.I. at crime fighting.

The sentences are being returned as:

[
  {
    "text": {
      "content": "Mr. Sherlock Holmes and Dr. John Watson were better than the F.B.I.",
      "beginOffset": 484
    },
    "sentiment": null
  },
  {
    "text": { "content": "at crime fighting.", "beginOffset": 552 },
    "sentiment": null
  }
]

So it treats the full stop after F.B.I. as the end of the sentence. The strange thing is that the tokens response includes the 'F.B.I.' string as its own token, so it is parsed correctly in that part of the response, just not in the sentences array.
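
As a stopgap for the abbreviation case above, the returned sentences could be re-joined on the client. The sketch below is a heuristic only, not how the API segments text: SentenceMerger, MergeAfterAbbreviations, and the abbreviation list are all hypothetical. It merges a sentence into the previous one when the previous one ends with a listed abbreviation and the next one starts with a lowercase letter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical post-processing helper; not part of the client library.
public static class SentenceMerger
{
    // Assumed, illustrative list; extend it for your own data.
    private static readonly string[] Abbreviations = { "F.B.I." };

    public static List<string> MergeAfterAbbreviations(IEnumerable<string> sentences)
    {
        var merged = new List<string>();
        foreach (var sentence in sentences)
        {
            // Join only when the previous sentence ends with a known
            // abbreviation and this one starts with a lowercase letter.
            bool shouldJoin = merged.Count > 0
                && sentence.Length > 0
                && char.IsLower(sentence[0])
                && EndsWithAbbreviation(merged[merged.Count - 1]);

            if (shouldJoin)
                merged[merged.Count - 1] += " " + sentence;
            else
                merged.Add(sentence);
        }
        return merged;
    }

    private static bool EndsWithAbbreviation(string sentence)
    {
        foreach (var abbreviation in Abbreviations)
        {
            if (sentence.EndsWith(abbreviation, StringComparison.Ordinal))
                return true;
        }
        return false;
    }
}
```

Passing the two "content" strings from the response above would produce a single sentence again. The lowercase check keeps a sentence that genuinely ends with "F.B.I." from being merged with a following sentence that starts with a capital letter, but the heuristic will still miss abbreviations that are not in the list.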
