Status Update
Comments
st...@google.com <st...@google.com> #3
+1
sw...@google.com <sw...@google.com> #4
This is not a problem with the Python SDK. It's a problem with the load job backend for Parquet files.
I can reproduce this issue with the BQ CLI (parquet file attached).
bq load --source_format=PARQUET swast-scratch:my_dataset.json_from_parquet json.parquet name:STRING,metadata:JSON
Output:
Upload complete.
Waiting on bqjob_r3cb1283b5b4dd24a_0000018508419771_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'swast-scratch:bqjob_r3cb1283b5b4dd24a_0000018508419771_1': Unsupported field type: JSON
Failure details:
- Unsupported field type: JSON
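(The attached parquet file is not reproduced here; a file with the same shape could be generated roughly as below, assuming pyarrow is available and assuming the JSON document is carried as a plain string column alongside the STRING column. Values are illustrative only.)

import pyarrow as pa
import pyarrow.parquet as pq

# Two columns matching the name:STRING,metadata:JSON schema in the bq load
# command above, with the JSON document stored as a string.
table = pa.table({
    "name": ["vivian"],
    "metadata": ['{"first": "vivian", "last": "dang"}'],
})
pq.write_table(table, "json.parquet")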
The Python SDK uses Parquet by default because both pandas DataFrames and Parquet are columnar formats.
Customers can work around this issue by using CSV as the serialization format instead. (Feature added here:
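A rough sketch of that workaround (the destination table name and sample data below are placeholders; comment #6 below confirms the same approach):

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
# Placeholder DataFrame with the JSON document held as a string.
df = pd.DataFrame({"name": ["vivian"], "metadata": ['{"first": "vivian", "last": "dang"}']})

# Serialize the DataFrame as CSV instead of the default Parquet so the load
# job accepts the destination table's JSON column.
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.CSV)
client.load_table_from_dataframe(
    df, "my_dataset.json_table", job_config=job_config
).result()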
st...@google.com <st...@google.com> #5
Are you saying to use SourceFormat.CSV [1], like in the scenario below?
[1] job_config = bigquery.LoadJobConfig(schema=table_schema, source_format=SourceFormat.CSV)
My customer currently has this scenario:
- create bigquery table with a json column.
- insert into the table with cli
bq query "insert into ts-storage-dba-poc.vdang_demo_dataset.json_table6 values (JSON '{\"first\": \"vivian\", \"last\": \"dang\"}')"
- Now, if we run the following Python API code, which reads the data into a DataFrame and then loads it back, it fails with the error message google.api_core.exceptions.BadRequest: 400 Unsupported field type: JSON
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project='myproject')

# Read the table (including its JSON column) into a DataFrame.
job = client.query("select * from vdang_demo_dataset.json_table6")
df = job.to_dataframe()

# Load the DataFrame back into the same table; this fails because the
# default serialization format is Parquet.
print(client.load_table_from_dataframe(df, "vdang_demo_dataset.json_table6").result())
sw...@google.com <sw...@google.com> #6
Yes, even omitting the schema should work as the Python SDK will fetch it if needed.
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project='myproject')
job = client.query("select * from vdang_demo_dataset.json_table6")
df = job.to_dataframe()

# Serialize via CSV instead of the default Parquet so the JSON column loads.
print(client.load_table_from_dataframe(
    df, "vdang_demo_dataset.json_table6",
    job_config=bigquery.LoadJobConfig(source_format="CSV")).result())
st...@google.com <st...@google.com> #7
Thanks for confirming!
st...@google.com <st...@google.com> #8
Is there any documentation or update planned to mention the JSON interoperability?
cr...@google.com <cr...@google.com> #9
bq load --source_format=PARQUET swast-scratch:my_dataset.json_from_parquet json.parquet name:STRING,metadata:JSON
Description
Please add JSON as a supported type in the Python SDK for BigQuery.
[1]
[2] BadRequest: 400 Unsupported field type: JSON