Infeasible
Status Update
Comments
rb...@google.com <rb...@google.com> #2
Public facing bug created so the customer can track progress.
zi...@garnercorp.com <zi...@garnercorp.com> #3
We encountered the same issue with our production data. Schema auto-detection currently only scans the first 500 rows of a CSV file.
If the rest of the rows have different kinds of data, the load job fails.
It would be nice to introduce a fallback mechanism: when BigQuery detects a different kind of data in a column during import, it would switch that column to a wider data type.
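The fallback requested above could work as a type-widening pass: start with the narrowest type and widen whenever a later row contradicts it, instead of failing the load job. A minimal sketch, assuming a simple INTEGER → FLOAT → STRING widening order (this is an illustration of the proposal, not how BigQuery currently behaves):

```python
def infer_cell_type(value: str) -> str:
    """Classify a single CSV cell as INTEGER, FLOAT, or STRING."""
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        return "STRING"

# Assumed widening order: a wider type can absorb any narrower one.
WIDENING_ORDER = ["INTEGER", "FLOAT", "STRING"]

def widen(current: str, observed: str) -> str:
    """Return the narrowest type that fits both observations."""
    i = max(WIDENING_ORDER.index(current), WIDENING_ORDER.index(observed))
    return WIDENING_ORDER[i]

def infer_column_type(values) -> str:
    """Scan every value (not just the first 500 rows) and widen as needed."""
    col_type = "INTEGER"
    for v in values:
        col_type = widen(col_type, infer_cell_type(v))
    return col_type
```

With this approach, a column that looks like integers for the first 500 rows but contains `"3.5"` or free text later would end up FLOAT or STRING rather than causing the load to fail.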
va...@google.com <va...@google.com> #4
Hello,
Could you please confirm whether the issue still persists on your end?
va...@google.com <va...@google.com> #5
Hello,
Since there has been no response from your side, I will proceed to mark this issue as won't fix (infeasible). If the issue persists on your side or you want to report a new issue, please do not hesitate to create a new Issue Tracker thread describing your situation.
Description
BigQuery schema auto-detection doesn't include enough data during its detection process, so the generated schemas are incorrect.
What you expected to happen:
Enhance BQ schema auto-detection to:
(a) Include a larger number of rows (currently 500) in its analysis.
(b) If the table is an external table, process as many files as possible (rather than data from one randomly picked file).
Other information (workarounds you have tried, documentation consulted, etc):
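One workaround that avoids the 500-row sampling limit entirely is to disable auto-detection and supply the schema explicitly. A sketch using the `bq` CLI, where the dataset, table, and column names are placeholders for illustration:

```shell
# Write an explicit schema file (column names/types here are hypothetical).
cat > schema.json <<'EOF'
[
  {"name": "id",     "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "amount", "type": "FLOAT",   "mode": "NULLABLE"},
  {"name": "note",   "type": "STRING",  "mode": "NULLABLE"}
]
EOF

# Load with the explicit schema instead of --autodetect; columns with
# mixed data can be declared STRING up front so the load job cannot fail
# on rows beyond the sampled window.
bq load --source_format=CSV --skip_leading_rows=1 \
  mydataset.mytable ./data.csv ./schema.json
```

Declaring ambiguous columns as STRING and casting them later in SQL trades some type safety for a load job that always succeeds.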