Request for new functionality
Description
Please provide as much information as possible. At a minimum, this should include a description of your issue and steps to reproduce the problem. If possible, please provide a summary of what steps or workarounds you have already tried, and any docs or articles you found (un)helpful.
Problem you have encountered:
BigQuery table creation from a CSV file using schema auto-detection fails when the data type of a column's values after the first 500 rows doesn't match the data type inferred from the first 500 rows (auto-detection only samples the first 500 rows, which is specified as a limitation in the schema auto-detection documentation). The load job fails with:
"400 Error while reading data, error message: Could not parse <example_string> as INT64 for field <field_name> (position 1) starting at location xxxxxxx with message 'Unable to parse' File: <csv_name>"
What you expected to happen:
The column is detected as a string instead of an integer, and the table is created successfully.
Steps to reproduce:
Try to load data into BigQuery using schema auto-detection, with all values for a column being integers in the first 500 rows and a string value for that column in the 501st row (a minimal reproduction is sketched below).
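A minimal reproduction sketch, assuming the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical placeholders:

import io

from google.cloud import bigquery

client = bigquery.Client()

# Build a CSV whose first 500 data rows contain integers and whose 501st row
# contains a string, so auto-detection infers INT64 and the load then fails.
rows = ["id,value"] + [f"{i},{i}" for i in range(500)] + ["500,not_a_number"]
csv_data = "\n".join(rows).encode("utf-8")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # schema auto-detection samples only the first 500 rows
)

load_job = client.load_table_from_file(
    io.BytesIO(csv_data),
    "my_project.my_dataset.repro_table",  # hypothetical table reference
    job_config=job_config,
)
load_job.result()  # raises: could not parse 'not_a_number' as INT64 for field 'value'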
Other information (workarounds you have tried, documentation consulted, etc):
Workarounds tried:
Creating the schema manually for each CSV (sketched below) -
In our context this is not feasible, as we have thousands of different CSV files, each with a different schema, and we are looking to load them into BigQuery automatically.
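For reference, this is roughly what the manual workaround looks like, again assuming the Python client; every distinct CSV layout needs its own hand-written schema list, which is what makes this impractical at our scale (the GCS path and table reference are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()

# Hand-written schema for one specific CSV layout; declaring the column as
# STRING means the string value in row 501 no longer breaks the load.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    schema=[
        bigquery.SchemaField("id", "INT64"),
        bigquery.SchemaField("value", "STRING"),
    ],
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/example.csv",           # hypothetical GCS path
    "my_project.my_dataset.example_table",  # hypothetical table reference
    job_config=job_config,
)
load_job.result()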
Possible solutions could be:
1) Allowing the user to specify the number of rows to use for schema auto-detection
2) Allowing the user to instruct schema auto-detection to define all columns as strings (a client-side approximation is sketched after this list)
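As a rough client-side approximation of option 2, the schema could be built from each CSV's header row with every column declared as STRING. This is only a sketch assuming the Python client; the file path and table reference in the usage note are hypothetical:

import csv

from google.cloud import bigquery

client = bigquery.Client()

def load_csv_as_strings(path, table_id):
    # Read the header row and declare every column as STRING so that no
    # value can fail type parsing, regardless of where it appears in the file.
    with open(path, newline="") as f:
        header = next(csv.reader(f))

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        schema=[bigquery.SchemaField(name, "STRING") for name in header],
    )

    with open(path, "rb") as f:
        client.load_table_from_file(f, table_id, job_config=job_config).result()

# Hypothetical usage:
# load_csv_as_strings("example.csv", "my_project.my_dataset.example_table")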
Similar Issues: