Request for new functionality
Description
It would be greatly appreciated if you could resolve this issue so that creating a subscription that inserts data directly into BigQuery supports table schemas whose column name(s) contain the hyphen '-'.
If that is infeasible, please clearly state this limitation in the documentation, as I could not find any such statement in the pages I consulted.
How this might work:
Creation of the subscription succeeds, and messages published to the topic with keys containing the hyphen '-' are stored in the BigQuery table accordingly, since BigQuery table schemas allow that character.
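For reference, here is a minimal sketch of creating such a BigQuery subscription with the google-cloud-pubsub client; the resource IDs are placeholders I chose for illustration, and the target table is assumed to have a schema containing a hyphenated column such as 'hoge-huga':

from google.cloud import pubsub_v1

# Placeholder resource IDs, for illustration only.
PROJECT, TOPIC, SUBSCRIPTION = 'my-project', 'my-topic', 'my-bq-sub'
TABLE = 'my-project.my_dataset.my_table'  # assumed to contain a 'hoge-huga' column

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(PROJECT, TOPIC)
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)

# Attaching a BigQueryConfig makes Pub/Sub write messages straight to the table.
bigquery_config = pubsub_v1.types.BigQueryConfig(table=TABLE)

with subscriber:
    subscriber.create_subscription(
        request={
            'name': subscription_path,
            'topic': topic_path,
            'bigquery_config': bigquery_config,
        }
    )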
If applicable, reasons why alternative solutions are not sufficient:
One workaround is to deploy a Dataflow pipeline between Pub/Sub and BigQuery that uses the legacy streaming API instead of the Storage Write API.
However, it is clearly more costly and less straightforward than importing data directly from a Pub/Sub subscription into a BigQuery table.
I found this workaround as a by-product of my experimentation (see "Other information").
Other information (workarounds you have tried, documentation consulted, etc):
I experimented with Dataflow (Apache Beam) between Pub/Sub and BigQuery as follows:
import json

import apache_beam as beam
from apache_beam.io import ReadFromPubSub, WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions

class AddProcessedTime(beam.DoFn):
    # Simplified: parse the JSON payload (the full DoFn also adds a processed_time field).
    def process(self, element):
        yield json.loads(element.decode('utf-8'))

# PROJECT, TOPIC, DATASET and TABLE are configured elsewhere.
pipeline_options = PipelineOptions(streaming=True)
schema = 'id:INTEGER,hoge-huga:STRING'

with beam.Pipeline(options=pipeline_options) as p:
    (p
     | 'Read from Pub/Sub' >> ReadFromPubSub(topic=f'projects/{PROJECT}/topics/{TOPIC}')
     | 'Parse JSON and add processed_time' >> beam.ParDo(AddProcessedTime())
     | 'Write to BigQuery' >> WriteToBigQuery(
         f'{PROJECT}:{DATASET}.{TABLE}',
         schema=schema,
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
         method='STREAMING_INSERTS'  # vs. 'STORAGE_WRITE_API'
     ))
Here, the schema includes a column whose name contains '-'.
When the method is "STREAMING_INSERTS", this schema is accepted and messages are stored in the BigQuery table accordingly.
However, choosing "STORAGE_WRITE_API" as the method raises an error related to the schema.
Thus, it seems that a Protocol Buffer schema does not allow such special characters: protobuf field names may contain only letters, digits, and underscores, so a hyphenated column name cannot be mapped to a protobuf field.
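To illustrate the suspected cause (a small check I wrote, not part of the experiment above): protobuf field names must be plain identifiers, and the Storage Write API represents BigQuery columns as protobuf fields, so a name like 'hoge-huga' can never be valid.

import re

# Protobuf field names must be identifiers: a letter or underscore,
# followed by letters, digits, or underscores. A hyphen never matches.
PROTOBUF_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

for name in ('id', 'hoge_huga', 'hoge-huga'):
    print(name, bool(PROTOBUF_IDENTIFIER.match(name)))
# id True
# hoge_huga True
# hoge-huga False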