Overlapping Boxes Issues on Document AI OCR Processor [343518735]

Assigned

Feature Request

Status Update

No update yet.

Description

es...@google.com

created issue #1

May 29, 2024 10:26PM

Problem you have encountered:

Document AI OCR Processor often produces overlapping bounding boxes in the images/files it processes. The extracted data then becomes inconsistent, because new values are created by combining information from the overlapping boxes.

I have attempted to resolve this issue by increasing the image/file quality, brightness, and size, as well as adjusting the OCR Processor settings. Unfortunately, these measures have not been effective. Notably, the majority of the overlapping bounding boxes are vertically aligned.

What you expected to happen:

Information is extracted from the image/file without overlapping boxes and without mixing the extracted information.

Steps to reproduce:

1. Create an OCR Processor on Document AI.
2. Upload an image/file to process.
3. Run the processing step.
4. Observe that the processed files/images contain overlapping boxes.
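
One way to check the processed output from the last step for overlapping boxes is sketched below. This is a minimal illustration, not part of the processor or its client library: it assumes each box has already been read out as a list of normalized (x, y) vertices, and the helper names, the `min_ratio` threshold, and the sample coordinates are all hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Axis-aligned box: (x_min, y_min, x_max, y_max), in normalized page coordinates.
Box = Tuple[float, float, float, float]


def to_box(vertices: Sequence[Tuple[float, float]]) -> Box:
    """Collapse a polygon's (x, y) vertices into an axis-aligned box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def area(box: Box) -> float:
    """Area of a box; zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not intersect)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def find_overlaps(boxes: List[Box], min_ratio: float = 0.1) -> List[Tuple[int, int, float]]:
    """Return (i, j, ratio) for every pair of boxes whose intersection covers
    at least min_ratio of the smaller box. min_ratio is an arbitrary,
    hypothetical threshold chosen for illustration."""
    hits = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        smaller = min(area(a), area(b))
        if smaller <= 0:
            continue
        ratio = overlap_area(a, b) / smaller
        if ratio >= min_ratio:
            hits.append((i, j, ratio))
    return hits


# Made-up sample: two vertically adjacent lines that overlap, plus one separate line.
sample_boxes = [
    (0.0, 0.00, 1.0, 0.10),
    (0.0, 0.05, 1.0, 0.15),
    (0.0, 0.50, 1.0, 0.60),
]
overlaps = find_overlaps(sample_boxes)
for i, j, ratio in overlaps:
    print(f"boxes {i} and {j} overlap ({ratio:.0%} of the smaller box)")
```

Measuring the overlap relative to the smaller box, rather than using intersection-over-union, keeps a small box that sits inside a much larger one from being overlooked; other measures would work as well.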

I will attach some of the files I tested and their results (tested on PDF and JPG files).

TestResult7.png (272 KB)
TestResult6.png (101 KB)
TestResult5.png (71 KB)
TestResult4.png (50 KB)
TestResult3.png (28 KB)
TestResult2.png (30 KB)
TestResult1.png (16 KB)
Sample 3.pdf (1.1 MB)
Sample 7.pdf (455 KB)
Sample 2.pdf (343 KB)
Sample 6.pdf (131 KB)
Sample 4.pdf (95 KB)
Sample 5.pdf (35 KB)
