Infeasible
Comments
bl...@google.com <bl...@google.com>
do...@gmail.com <do...@gmail.com> #2
Hi,
I changed the parquet file so that it no longer uses the "version 2 data page", and BigQuery now seems to accept it. So perhaps the issue is that BigQuery does not support version 2 data pages from the newer Parquet format.
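To check whether a file starts with a version 2 data page without installing a Parquet library, one can look at the bytes right after the `PAR1` magic. The helper below (`first_page_type`, a hypothetical name not from any tool mentioned here) is a minimal sketch that assumes two things: the first column chunk begins at byte 4, as `FPO:4` in the parquet-tools output in the description suggests, and the writer serialized `PageHeader.type` (Thrift field 1, an i32) first, using the Thrift compact protocol.

```python
# Minimal sketch (hypothetical helper): identify the type of the first page.
# ASSUMPTIONS: the first column chunk starts right after the 4-byte "PAR1"
# magic, and PageHeader.type (Thrift field 1, i32) is the first field written,
# encoded with the Thrift compact protocol.
PAGE_TYPES = {0: "DATA_PAGE", 1: "INDEX_PAGE", 2: "DICTIONARY_PAGE", 3: "DATA_PAGE_V2"}


def first_page_type(data: bytes) -> str:
    if data[:4] != b"PAR1":
        raise ValueError("missing PAR1 magic; not a Parquet file")
    # Compact field header: high nibble = field-id delta (1), low nibble = i32 (5).
    if len(data) < 6 or data[4] != 0x15:
        raise ValueError("layout assumption failed; decode the full page header instead")
    # Decode the unsigned varint that follows the field header.
    zigzag, shift, pos = 0, 0, 5
    while True:
        byte = data[pos]
        zigzag |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            break
        shift += 7
    value = (zigzag >> 1) ^ -(zigzag & 1)  # undo zigzag encoding of the i32
    return PAGE_TYPES.get(value, f"UNKNOWN({value})")


def first_page_type_of_file(path: str) -> str:
    # Only the first few bytes are needed for this check.
    with open(path, "rb") as f:
        return first_page_type(f.read(16))
```

If the file does use V2 pages, as the comment above suggests, `first_page_type_of_file` on it would be expected to return `"DATA_PAGE_V2"`. When the layout assumption does not hold, the helper raises instead of guessing.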
er...@google.com <er...@google.com>
al...@google.com <al...@google.com>
ho...@google.com <ho...@google.com> #3
Related Stack Overflow discussion:
https://stackoverflow.com/questions/59221214/how-can-i-figure-out-why-bigquery-is-rejecting-my-parquet-file/59232605?noredirect=1#comment104708338_59232605
Wes McKinney says "Parquet V2 is not considered production. If ParquetJS is writing this by default you should ask them to change it"
ke...@google.com <ke...@google.com> #4
According to the debug log, BigQuery read 0 values from the column '_id', although the column chunk metadata indicates that there should be 10 values.
Thanks for the link from #3. Let's not use the "version 2 data page" and track this as a known issue.
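The comparison above is between what BigQuery read and the column chunk's value count (`VC:10` in the parquet-tools output). To see what the page itself declares, the full page header can be decoded. The sketch below is a hypothetical, minimal Thrift compact-protocol reader, not BigQuery's or parquet.js's code. It turns a `PageHeader` into a dict keyed by Thrift field id and reports `num_values` from either `DataPageHeader` (field 5) or `DataPageHeaderV2` (field 8). The field ids are assumed to follow the Parquet `parquet.thrift` definition, and map-typed fields are deliberately unsupported because `PageHeader` does not use them.

```python
import struct

PAGE_TYPES = {0: "DATA_PAGE", 1: "INDEX_PAGE", 2: "DICTIONARY_PAGE", 3: "DATA_PAGE_V2"}


class CompactReader:
    """Minimal Thrift compact-protocol decoder; structs become {field_id: value}."""

    def __init__(self, buf: bytes, pos: int = 0):
        self.buf, self.pos = buf, pos

    def byte(self) -> int:
        b = self.buf[self.pos]
        self.pos += 1
        return b

    def varint(self) -> int:
        result, shift = 0, 0
        while True:
            b = self.byte()
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result
            shift += 7

    def zigzag(self) -> int:
        n = self.varint()
        return (n >> 1) ^ -(n & 1)

    def value(self, ctype: int):
        if ctype in (1, 2):                 # bool as a list element: one byte
            return self.byte() == 1
        if ctype == 3:                      # signed byte
            b = self.byte()
            return b - 256 if b > 127 else b
        if ctype in (4, 5, 6):              # i16 / i32 / i64: zigzag varint
            return self.zigzag()
        if ctype == 7:                      # double, little-endian
            (v,) = struct.unpack_from("<d", self.buf, self.pos)
            self.pos += 8
            return v
        if ctype == 8:                      # binary / string: length + bytes
            n = self.varint()
            v = self.buf[self.pos:self.pos + n]
            self.pos += n
            return v
        if ctype in (9, 10):                # list / set
            header = self.byte()
            size = header >> 4
            if size == 15:                  # long form: size follows as varint
                size = self.varint()
            return [self.value(header & 0x0F) for _ in range(size)]
        if ctype == 12:                     # nested struct
            return self.read_struct()
        raise NotImplementedError(f"compact type {ctype} not supported")

    def read_struct(self) -> dict:
        fields, last_id = {}, 0
        while True:
            header = self.byte()
            if header == 0:                 # STOP field ends the struct
                return fields
            ctype, delta = header & 0x0F, header >> 4
            field_id = last_id + delta if delta else self.zigzag()
            last_id = field_id
            # Boolean fields carry their value in the type nibble (1 = true).
            fields[field_id] = (ctype == 1) if ctype in (1, 2) else self.value(ctype)


def describe_page_header(buf: bytes, offset: int) -> dict:
    header = CompactReader(buf, offset).read_struct()
    info = {"type": PAGE_TYPES.get(header.get(1), "UNKNOWN"),
            "compressed_page_size": header.get(3)}
    if 5 in header:                         # DataPageHeader (V1)
        info["num_values"] = header[5].get(1)
    if 8 in header:                         # DataPageHeaderV2
        v2 = header[8]
        info.update(num_values=v2.get(1), num_nulls=v2.get(2), num_rows=v2.get(3))
    return info
```

With the first-page offsets from the parquet-tools output in the description (`FPO:4`, `FPO:221`, `FPO:514`), calling `describe_page_header(data, 4)` on the file's bytes would show the first page of `_id`. The sketch only reads the first page of a column chunk; walking a chunk with several pages would mean advancing by the header length plus `compressed_page_size`.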
Description
Problem you have encountered:
I am unable to load parquet files generated by my code, and the error information I am getting does not help me understand why.
The error I get is the same regardless of whether I upload to the console, via the API, or using the bq command line tool.
```
$ bq --disable_ssl_validation load --autodetect --source_format PARQUET formative.orgtest /tmp/formative-orgs-parquetlRKKvN/orgs-0.parquet
Upload complete.
Waiting on bqjob_r67036461528a601_0000016edd66f839_1 ... (8s) Current status: DONE
BigQuery error in load operation: Error processing job 'formative-dev-and-staging:bqjob_r67036461528a601_0000016edd66f839_1': Error while reading data, error message:
Read less values than expected from: prod-scotty-b97683eb-9008-42e2-9a54-15d7173e2e4e; Actual: 0, Expected: 10
```
The file appears to be valid, at least as far as parquet-tools is concerned:
```
$ java -jar /home/ubuntu/bin/parquet-tools-1.10.0.jar meta /tmp/formative-orgs-parquetlRKKvN/orgs-0.parquet
file: file:/tmp/formative-orgs-parquetlRKKvN/orgs-0.parquet
creator: parquet.js
file schema: root
--------------------------------------------------------------------------------
_id: REQUIRED BINARY O:UTF8 R:0 D:0
name: REQUIRED BINARY O:UTF8 R:0 D:0
type: REQUIRED BINARY O:UTF8 R:0 D:0
row group 1: RC:10 TS:661 OFFSET:4
--------------------------------------------------------------------------------
_id: BINARY UNCOMPRESSED DO:0 FPO:4 SZ:192/192/1.00 VC:10 ENC:RLE,PLAIN ST:[no stats for this column]
name: BINARY UNCOMPRESSED DO:0 FPO:221 SZ:266/266/1.00 VC:10 ENC:RLE,PLAIN ST:[no stats for this column]
type: BINARY UNCOMPRESSED DO:0 FPO:514 SZ:124/124/1.00 VC:10 ENC:RLE,PLAIN ST:[no stats for this column]
```
```
$ java -jar /home/ubuntu/bin/parquet-tools-1.10.0.jar cat -j /tmp/formative-orgs-parquetlRKKvN/orgs-0.parquet
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/ubuntu/bin/parquet-tools-1.10.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
{"_id":"rtAkwCyNvbrRHqrFo","name":"Test School 2","type":"school"}
{"_id":"5b3d467e11bf1f784a70410a","name":"Formative","type":"school"}
{"_id":"q7n1rowy2bk","name":"Sequoyah Sch Chalkville Campus","type":"school"}
{"_id":"q1spluwca5","name":"Det Ctr","type":"school"}
{"_id":"893la3qlvti","name":"Wallace Sch Mt Meigs Campus","type":"school"}
{"_id":"tn76afdgsyp","name":"Mcneel Sch Vacca Campus","type":"school"}
{"_id":"8dniv39ox2s","name":"Alabama Youth Services","type":"school"}
{"_id":"xsx83s1wa1e","name":"Albertville Middle School","type":"school"}
{"_id":"c38mbxhq92j","name":"Albertville High School","type":"school"}
{"_id":"2x35p4wbhdj","name":"Evans Elementary School","type":"school"}
```
What you expected to happen:
The file would be accepted, or a clear error message would explain why the file is not accepted.
Steps to reproduce:
Try to upload this parquet file into BigQuery.