Status Update
Comments
va...@google.com <va...@google.com>
ja...@google.com <ja...@google.com> #2
Hello,
Thank you for reaching out to us!
To assist us in conducting a thorough investigation, we kindly request your cooperation in providing the following information regarding the reported issue:
- Do you see this issue constantly or intermittently?
- Is there any specific scenario or time at which this issue is observed?
- To help us understand the issue better, please provide detailed steps to reliably reproduce the problem.
- It would be very helpful if you could attach screenshots of the output/error related to this issue.
Your cooperation in providing these details will enable us to dive deeper into the matter and work towards a prompt resolution. We appreciate your assistance and look forward to resolving this issue for you.
Thank you for your understanding and cooperation.
le...@zeliq.com <le...@zeliq.com> #3
1. This is the first time I have encountered this issue, and it is still present today.
2. I don't know. It seems to be a bug that needs to be fixed, as it won't go away otherwise.
3. Go to BigQuery, create a table, and import the file I shared in my initial post. You will see that a field containing strings is incorrectly assigned the INTEGER type because its values are numeric, even though they are quoted (which indicates a string). See the repro sketch below.
4. I'm sharing a screenshot of a large import that fails because a field is assigned as INTEGER instead of STRING. BigQuery first processes a file in which the field contains only quoted numbers, incorrectly infers INTEGER, and the next file then cannot be written because the schema expects an INTEGER where a STRING should be stored.
The reverse problem can also occur: if you first import a STRING, BigQuery will later reject a file in which the column contains only quoted numbers, because it then decides the field should be an INTEGER.
TL;DR: BigQuery's Schema Auto Detection incorrectly assigns a field as INTEGER when a file contains only quoted numbers or null values (i.e., no value that is unambiguously a string).
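Here is a minimal repro sketch using the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders I made up:

```python
import io
import json

from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials/project
table_id = "my_project.my_dataset.autodetect_repro"  # placeholder table

# Newline-delimited JSON whose only values are quoted numbers or null.
rows = [{"field": "123456"}, {"field": None}, {"field": "789"}]
payload = io.BytesIO("\n".join(json.dumps(r) for r in rows).encode("utf-8"))

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the schema
)
client.load_table_from_file(payload, table_id, job_config=job_config).result()

# Expected: "field" is STRING (the values are quoted).
# Observed (this report): "field" is detected as INTEGER.
print(client.get_table(table_id).schema)
```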
Thanks for your support.
ja...@google.com <ja...@google.com> #4
Hello,
Thanks for your response.
Even though the numbers are in double quotes, BigQuery might still interpret them as integers if it detects a pattern. If the first file only has quoted numbers and nulls, it might think the column is always going to be an integer.
I would suggest you create a small "dummy" file that contains at least one actual string value in the column. Make sure this file is processed first during the import. This will give BigQuery a better chance of inferring the correct data type.
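For illustration only, here is a rough sketch of that approach with the google-cloud-bigquery Python client; the table name and GCS path below are placeholders, and this is just one possible way to sequence the two loads:

```python
import io

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.imported_data"  # placeholder table
ndjson = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

# 1) Load a tiny "dummy" file first; the column contains an unambiguous
#    string, so auto-detection should settle on STRING.
seed = io.BytesIO(b'{"field": "seed-value"}\n')
client.load_table_from_file(
    seed,
    table_id,
    job_config=bigquery.LoadJobConfig(source_format=ndjson, autodetect=True),
).result()

# 2) Append the real files afterwards, against the table whose schema
#    already records the column as STRING.
client.load_table_from_uri(
    "gs://my-bucket/data/*.json",  # placeholder GCS path
    table_id,
    job_config=bigquery.LoadJobConfig(
        source_format=ndjson,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    ),
).result()
```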
I am waiting for your response.
le...@zeliq.com <le...@zeliq.com> #5
> I would suggest you create a small "dummy" file that contains at least one actual string value in the column.
No, that won't work, because the issue goes both ways (a sketch of the first direction follows this list):
1. If the first file contains a STRING, the column is set to STRING. But as soon as BigQuery encounters a file where the column contains only `null` values or quoted numbers (e.g., `"1234"`), it considers the field type to have changed.
2. If the first file contains only quoted numbers (which should still be treated as strings), BigQuery wrongly assigns the column type as INTEGER, even though quoted numbers are strings, not integers.
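A sketch of direction 1 with the Python client (direction 2 is the same as the repro sketch in #3); the table name is a placeholder, and every load uses auto-detection only, as in my pipeline:

```python
import io

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.direction_one"  # placeholder table
ndjson = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

def load(payload: bytes, **config) -> None:
    """Load one newline-delimited JSON payload into the test table."""
    job_config = bigquery.LoadJobConfig(source_format=ndjson, **config)
    client.load_table_from_file(
        io.BytesIO(payload), table_id, job_config=job_config
    ).result()

# First file: an unambiguous string, so auto-detection picks STRING.
load(b'{"field": "hello"}\n', autodetect=True)

# Second file: only null and quoted numbers. The append is rejected,
# because BigQuery re-infers the column as INTEGER for this file.
load(
    b'{"field": "1234"}\n{"field": null}\n',
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
```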
This behavior contradicts JSON.org itself:
"A value can be a STRING in DOUBLE QUOTES, or a number, or true, or false, or null, or an object, or an array. These structures can be nested."
> Make sure this file is processed first during the import.
That’s not a reliable solution, because with BigQuery Schema Auto Detection, you can’t control the processing order. Trying to work around this defeats the purpose of schema auto-detection.
BigQuery should never assign INTEGER when encountering quoted numbers; this is a bug.
ja...@google.com <ja...@google.com> #6
Hello,
Thank you for reaching out to us with your request.
We have duly noted your feedback and will thoroughly validate it. While we cannot provide an estimated time of implementation or guarantee that the request will be fulfilled, please be assured that your input is highly valued. Your feedback enables us to enhance our products and services.
We appreciate your continued trust and support in improving our Google Cloud Platform products. If you want to report a new issue, please do not hesitate to create a new one on the Issue Tracker.
Once again, we sincerely appreciate your valuable feedback. Thank you for your understanding and collaboration.
Description
I'm encountering a frustrating issue with BigQuery's Schema Auto Detection while importing data.
When importing a file containing only lines like: { "field": "123456" }
BigQuery interprets the value as an integer instead of a string, even though double quotes in JSON indicate a string.
When importing a large dataset via GCS, if the first processed file contains only null values and quoted numbers (e.g., "123456"), BigQuery may incorrectly infer the field type as INTEGER instead of STRING. As a result, future STRING values won't be writable because the field type is locked as INTEGER.
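Here is a sketch of the kind of GCS load that triggers this, using the google-cloud-bigquery Python client; the bucket, dataset, and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials/project
table_id = "my_project.my_dataset.gcs_import"  # placeholder table

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)

# If the first file BigQuery happens to read has only null values and quoted
# numbers in "field", the column is inferred as INTEGER, and any later file
# containing real string values fails to load.
client.load_table_from_uri(
    "gs://my-bucket/export/*.json",  # placeholder wildcard URI
    table_id,
    job_config=job_config,
).result()

print(client.get_table(table_id).schema)  # shows INTEGER instead of STRING
```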
Thanks!