Assigned
Comments
ci...@google.com <ci...@google.com> #2
Hello,
It seems that the output of the Vision API, when parsing the text of a PDF file, already provides the text from top to bottom, as described in [1].
Could you please include a specific example of your issue (and attach an example PDF that generates the undesired output) so I can better understand your problem and help you?
--------------------
[1] https://cloud.google.com/vision/docs/pdf
so...@icsl.es <so...@icsl.es> #3
Hello,
Find enclosed the sample PDF, the output JSON, and the .txt I generate by just iterating sequentially over the JSON.
I know Vision generates the coords for each letter, but it should be straightforward to extract the text in the right order based just on the positions in the JSON.
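To illustrate the kind of position-based ordering I mean, here is a minimal sketch, not a definitive implementation. It assumes the output JSON follows the fullTextAnnotation layout (pages, blocks, paragraphs, words, symbols) with each word carrying a boundingBox of normalizedVertices; the function names and the y_tolerance value are hypothetical and would need tuning per document.

```python
def word_boxes(page):
    """Yield (left_x, center_y, text) for every word on one page.

    Assumption: `page` is one entry of fullTextAnnotation["pages"] as a dict.
    """
    for block in page.get("blocks", []):
        for para in block.get("paragraphs", []):
            for word in para.get("words", []):
                box = word.get("boundingBox", {})
                # Assumption: PDF output uses normalizedVertices (0..1);
                # fall back to pixel vertices if those are present instead.
                verts = box.get("normalizedVertices") or box.get("vertices") or []
                if not verts:
                    continue
                text = "".join(s.get("text", "") for s in word.get("symbols", []))
                # A coordinate equal to 0 may be omitted from the JSON.
                xs = [v.get("x", 0) for v in verts]
                ys = [v.get("y", 0) for v in verts]
                yield min(xs), sum(ys) / len(ys), text


def page_text_top_down(page, y_tolerance=0.01):
    """Rebuild text line by line, ignoring the block order in the JSON."""
    words = sorted(word_boxes(page), key=lambda w: (w[1], w[0]))
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    for x, y, text in words:
        # Words whose vertical centers are close together share a line.
        if lines and abs(y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order the words left to right.
    return "\n".join(" ".join(t for _, t in sorted(ws)) for _, ws in lines)
```

With a two-column page, this puts words from both columns that sit at the same height on one output line, instead of emitting the whole left block before the right one. It does not handle rotated text or multi-column articles where the columns should be read separately.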
Regards,
Josep
ci...@google.com <ci...@google.com> #4
Hello,
Thank you for providing this. I was able to reproduce your issue and will forward all this information to the Product Engineering team. Please bear in mind that there is no ETA for this, but any updates will be posted here. Consider starring this issue to get automatic notifications [1]. Feature requests starred by a higher number of users are more likely to be implemented [2].
--------------------
[1] https://developers.google.com/issue-tracker/guides/subscribe#starring_an_issue
[2] https://cloud.google.com/support/docs/issue-trackers#what_to_expect_once_youve_opened_an_issue
Description
PDFs sent to Google Vision are not read top-down; the text is returned left-to-right in blocks.
That makes it nearly impossible to parse the output properly.
Nearly all other solutions on the market (even some free ones) offer the option of keeping the original layout, even with tables.
This is a much-needed feature for serious PDF-to-text processing.
Thanks.