Assigned
Comments
ds...@google.com <ds...@google.com> #2
Hello,
To assist us in conducting a thorough investigation, we kindly request your cooperation in providing the following information regarding the reported issue:
- Has this scenario ever worked as expected in the past?
- Do you see this issue constantly or intermittently?
- If this issue is seen intermittently, how often do you observe it? Is there any specific scenario or time at which it is observed?
- To help us understand the issue better, please provide detailed steps to reliably reproduce the problem.
- It would be greatly helpful if you could attach screenshots of the output related to this issue.
Your cooperation in providing these details will enable us to dive deeper into the matter and work towards a prompt resolution. We appreciate your assistance and look forward to resolving this issue for you.
Thank you for your understanding and cooperation.
Description
Problem you have encountered:
Processing a document through our custom document extractor (from a local script) results in a new folder being added to our output directory. This folder has a long name of ~20 digits. It contains a folder [0...n] for each document processed. Each of these folders contains the JSON output from the processor. A sample path would be:
output-directory/12345678901234567890/0/filename-0.json
The process for finding post-Human-in-the-Loop (HITL) JSONs is much less clear. After processing a document in the same manner as above and letting it trigger human review (either with a known bad document or by setting the document confidence threshold to 100%), our document shows up in the Specialist portal. We adjust the annotations accordingly and press Submit. This appears to populate a folder with a ~20-character name in our HITL output directory, which contains a JSON file with a ~20-character name. Both the folder name and the JSON name appear unrelated to the original filename. The JSON also contains many fields not included in the original output JSON, but I'm inclined to believe that may be a separate issue. A sample path here would be:
hitl/12345678901234567890/98765432109876543210.json
I can't see how to match the post-HITL JSON files to the pre-HITL JSON files produced by the processor.
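To make the comparison concrete, here is a minimal sketch that splits both kinds of path into their components so they can be compared across a real listing. The regular expressions are inferred only from the two sample paths above, and the labels (`operation_id`, `doc_index`, `source_stem`, `suffix`, `folder_id`, `file_id`) are hypothetical names chosen for illustration, not documented Document AI terms. In particular, what the trailing number in `filename-0.json` means is an assumption left open here.

```python
import re
from typing import NamedTuple, Optional


class PreHitlPath(NamedTuple):
    operation_id: str   # hypothetical label for the ~20-digit folder
    doc_index: int      # the [0...n] folder, one per processed document
    source_stem: str    # original filename without the numeric suffix
    suffix: int         # trailing number before ".json" (meaning assumed unknown)


class HitlPath(NamedTuple):
    folder_id: str      # hypothetical label for the ~20-character folder
    file_id: str        # hypothetical label for the ~20-character JSON name


# Patterns inferred from the sample paths only; real output may differ.
PRE_HITL_RE = re.compile(
    r"^.+/(?P<op>\d+)/(?P<idx>\d+)/(?P<stem>[^/]+)-(?P<suffix>\d+)\.json$"
)
HITL_RE = re.compile(r"^.+/(?P<folder>\d+)/(?P<file>\d+)\.json$")


def parse_pre_hitl(path: str) -> Optional[PreHitlPath]:
    """Split a processor output path; return None if it does not fit the pattern."""
    m = PRE_HITL_RE.match(path)
    if m is None:
        return None
    return PreHitlPath(m["op"], int(m["idx"]), m["stem"], int(m["suffix"]))


def parse_hitl(path: str) -> Optional[HitlPath]:
    """Split a HITL output path; return None if it does not fit the pattern."""
    m = HITL_RE.match(path)
    if m is None:
        return None
    return HitlPath(m["folder"], m["file"])


pre = parse_pre_hitl("output-directory/12345678901234567890/0/filename-0.json")
post = parse_hitl("hitl/12345678901234567890/98765432109876543210.json")
print(pre)
print(post)
```

Only the pre-HITL path carries the original filename (`source_stem`); the HITL path yields two opaque identifiers. Running both parsers over real listings and comparing the components would show whether any identifier carries over. The sample paths use placeholder digits, so they cannot answer that on their own.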
What you expected to happen:
I expect the post-HITL JSON files to have a filename (or other metadata) that can be used to match these files to the pre-HITL files. Otherwise I can't see how HITL is a useful feature.
Steps to reproduce:
Other information (workarounds you have tried, documentation consulted, etc):
I previously opened a ticket here: https://issuetracker.google.com/issues/287924956. I wasn't able to respond in time, but I am happy to provide additional information in a private thread.