Table recognition does not work when a table spans multiple pages [341364630]

Assigned

Bug

Status Update

No update yet.

Description

sa...@google.com

created issue #1

May 17, 2024 09:48PM


Problem you have encountered:

Document AI struggles to parse tables that spread across multiple pages in a document.

What you expected to happen:

* Document AI would recognize the table header on one page and the corresponding data on another page as a single, cohesive table.

* The header information would be correctly linked to its corresponding data points, allowing for easy and accurate extraction of the entire table.
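
Until multi-page tables are handled natively, the expected behaviour above could be approximated in post-processing. The following is a minimal sketch, assuming each page's table output has already been reduced to plain lists of header cells and body rows. `PageTable` and `stitch_tables` are hypothetical names for illustration and are not part of the Document AI API.

```python
from dataclasses import dataclass, field


@dataclass
class PageTable:
    """Hypothetical, simplified view of one table fragment found on one page."""
    page: int
    header: list[str] = field(default_factory=list)  # empty if the fragment has no header row
    rows: list[list[str]] = field(default_factory=list)


def stitch_tables(fragments: list[PageTable]) -> list[PageTable]:
    """Merge headerless fragments into the table that precedes them.

    Assumption: a fragment continues the previous table when it has no header,
    sits on the page directly after the last fragment seen, and has the same
    number of columns as the previous table's header.
    """
    merged: list[PageTable] = []
    last_page = None
    for frag in sorted(fragments, key=lambda f: f.page):
        prev = merged[-1] if merged else None
        width = len(frag.rows[0]) if frag.rows else 0
        is_continuation = (
            prev is not None
            and not frag.header
            and frag.page == last_page + 1
            and width == len(prev.header)
        )
        if is_continuation:
            prev.rows.extend(frag.rows)
        else:
            # Copy the lists so the input fragments are not mutated later.
            merged.append(PageTable(frag.page, list(frag.header), list(frag.rows)))
        last_page = frag.page
    return merged
```

With a header-only fragment on page 1 and a headerless three-column fragment on page 2, `stitch_tables` returns a single table whose header comes from page 1 and whose rows come from page 2.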

Steps to reproduce:

Document Structure:

The document starts with a fixed number of key-value pairs (tax form).
An optional "Results" section follows, with:

Row 1: Header info with key-value pairs (free text format) spanning 3 columns.

Content section: 3 columns with titles and values.

Values can be free text, labeled, or multi-line.

Values can repeat within a column.

All dividers (between columns and rows) are made of "=" characters.
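
To make the layout concrete, here is a rough sketch of how such a "="-delimited section might be parsed from plain text. It rests on assumptions that the report does not confirm: a row divider is a line consisting only of three or more "=" characters, a column divider is a single "=" inside a line, and the values themselves contain no "=". The function name `parse_equals_table` is hypothetical.

```python
import re

# Assumption: a line made up only of "=" characters separates two rows.
ROW_DIVIDER = re.compile(r"^\s*={3,}\s*$")


def parse_equals_table(text: str, columns: int = 3) -> list[list[str]]:
    """Split "="-delimited text into rows of `columns` cells.

    Lines between two row dividers belong to the same row, so multi-line
    values are joined with a space.
    """
    rows: list[list[str]] = []
    current = None  # per-column list of text pieces for the row being built

    def flush():
        nonlocal current
        if current is not None:
            rows.append([" ".join(parts) for parts in current])
            current = None

    for line in text.splitlines():
        if ROW_DIVIDER.match(line):
            flush()
            continue
        if not line.strip():
            continue
        cells = [c.strip() for c in line.split("=", columns - 1)]
        cells += [""] * (columns - len(cells))  # pad short lines
        if current is None:
            current = [[] for _ in range(columns)]
        for parts, cell in zip(current, cells):
            if cell:
                parts.append(cell)
    flush()
    return rows
```

For example, a title row followed by a divider and a two-line data row yields two rows, with the second line of the data row appended to the matching column of the first.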

Attempted Solutions:

Multiple processors (Custom Extractor, Form Parser, Invoice Parser) fail to handle multi-page tables.

Training a custom extractor with numeric documents doesn't help.
The parser treats data from the previous page as part of the current table.

Data Hierarchy for Custom Parser:

Table is the parent object.

Each section is a child object.

Each child object contains field text (3rd layer).

The current processor cannot handle more than 3 layers.
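
The three-layer hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names `Table` and `Section` and the helper `flatten` are hypothetical and do not correspond to any Document AI schema type.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Layer 2: one section of the table."""
    name: str
    # Layer 3: field name -> field text
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class Table:
    """Layer 1: the parent object."""
    sections: list[Section] = field(default_factory=list)


def flatten(table: Table) -> list[tuple[str, str, str]]:
    """Return (section name, field name, field text) triples in document order."""
    return [
        (section.name, key, text)
        for section in table.sections
        for key, text in section.fields.items()
    ]
```

Any structure deeper than this (for example, sub-fields inside a field) has no place in the model, which mirrors the three-layer limit reported here.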


Comments

sa...@google.com <sa...@google.com> #2 May 17, 2024 09:51PM

Hello,

To help us conduct a thorough investigation, please provide the following information about the reported issue:

Has this scenario ever worked as expected in the past?
Do you see this issue constantly or intermittently?
If the issue is intermittent, how often do you observe it? Is there a specific scenario or time at which it occurs?
To help us understand the issue better, please provide detailed steps to reliably reproduce the problem.
It would be greatly helpful if you could attach screenshots of the output related to this issue.

Your cooperation in providing these details will enable us to dive deeper into the matter and work towards a prompt resolution. We appreciate your assistance and look forward to resolving this issue for you.

Thank you for your understanding and cooperation.
