Status Update
Comments
va...@google.com <va...@google.com>
je...@google.com <je...@google.com> #2
Hello,
This issue report has been forwarded to the Cloud Dataflow Product team so that they may investigate it, but there is no ETA for a resolution today. Future updates regarding this issue will be provided here.
je...@google.com <je...@google.com> #3
Hi,
SDK container image is configurable, in the Optional Parameters section:
ma...@delfina.com <ma...@delfina.com> #4
I did a ctrl + f on "sdk" to be extra sure I wasn't just missing something, but it's still totally possible that I am!
Would you mind attaching an image of the screenshot rather than the
je...@google.com <je...@google.com> #5
Hi,
I sincerely apologize for the inconvenience caused.
I have attached the image now, could you please check?
ma...@delfina.com <ma...@delfina.com> #6
All good, thank you for the attached picture!
It looks like that is the creation of a dataflow job via the Cloud Console. Is it possible to set the SDK Image using gcloud dataflow flex-template build?
je...@google.com <je...@google.com> #7
Hi,
It is not possible to set the SDK image using gcloud dataflow flex-template build because the launcher and SDK container image are different images; at minimum, they have different entry points.
- The SDK container image's entrypoint is a Go binary that calls the Beam SDK harness, and is usually not modified by the user.
- The launcher image's entrypoint is a Go binary that calls the user program to submit a Pipeline.
So to make things work one would need two images.
ma...@delfina.com <ma...@delfina.com> #8
Thanks for the response!
Two separate things regarding that:
- You absolutely can use the same image for both launcher and worker, I am doing that now! You do this by setting the entrypoint to the worker entrypoint, and it seems like the launcher must launch by overriding the entrypoint.
- Even if using two different images, the important thing is being able to configure both of them at the same time, so that you know you won't have version skew. e.g. being able to say
gcloud dataflow flex-template build --image=<launcher_image> --sdk_image=<worker_image>
ma...@delfina.com <ma...@delfina.com> #9
For more context, here is an official
To make the tutorial easy to mimic, it over-simplifies some things that are really important at enterprise scale. One of the things that it over-simplifies is the managing of image versions. In any kind of productionized environment, the "Build the Flex Template" step and the "Run the Flex Template" step are going to happen at different times in different places, introducing this problem of version skew. In many enterprise prod environments, ours for sure, the "build" step happens as a result of code changes in our repository (via cloud build trigger) and the "run" step happens via a google cloud scheduler job defined in terraform. Those two steps have no easy way to communicate with each other, so it's hard to coordinate such that they are running compatible versions.
Take the following example:
- Pipeline as described above is pushed (gcr.io/my_pipeline:abcd) and runs hourly. Because there's no way to define the worker/SDK image at "build" time, the cloud scheduler job instructs Dataflow to use the latest version of the SDK image (gcr.io/my_pipeline:latest).
- An engineer upgrades a third-party dependency foo from version 1.0 to 2.0. That includes a breaking change where the default of foo.read_data's kwarg delete_after_read changed from False to True, so the engineer made sure to update every instance of foo.read_data to set delete_after_read to False in the same commit.
- A dataflow pipeline kicks off with launcher from gcr.io/my_pipeline:abcd
- Before the workers are started, a rebuild completes of the breaking change updating foo and gets tagged as :latest
- The pipeline now sends code intended for foo at version 1.0 to a worker which has foo at version 2.0 installed, and so every call to foo.read_data unintentionally deletes data.
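The failure mode in the example above can be reproduced in miniature. This is only a sketch: foo is a placeholder library, so read_data_v1 and read_data_v2 below are hypothetical stand-ins for foo.read_data at 1.0 and 2.0, and the in-memory dict standing in for the data source is an assumption made for illustration.

```python
# Hypothetical stand-ins for foo.read_data in the two releases.
# The only difference is the default value of delete_after_read.
def read_data_v1(store, key, delete_after_read=False):
    value = store[key]
    if delete_after_read:
        del store[key]
    return value


def read_data_v2(store, key, delete_after_read=True):
    value = store[key]
    if delete_after_read:
        del store[key]
    return value


def pipeline_step(read_data, store):
    # Pipeline code shipped in the :abcd launcher. It was written against
    # foo 1.0, so it relies on the old default and passes no kwarg.
    return read_data(store, "row-1")


# Launcher and worker both at foo 1.0: the row survives the read.
matched_store = {"row-1": "payload"}
pipeline_step(read_data_v1, matched_store)

# Launcher code at foo 1.0, worker image (:latest) at foo 2.0:
# the same call now deletes the row.
skewed_store = {"row-1": "payload"}
pipeline_step(read_data_v2, skewed_store)

print("matched:", matched_store)
print("skewed:", skewed_store)
```

The pipeline code is identical in both runs; only the library version installed on the worker differs, which is why the problem is invisible at review time.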
je...@google.com <je...@google.com> #10
Hi,
Thanks for the information.
Could you please confirm if your issue is resolved?
ma...@delfina.com <ma...@delfina.com> #11
My issue is not resolved, could you please escalate this?
je...@google.com <je...@google.com> #12
Hello,
Thank you for contacting the Google Cloud support team.
I have gone through your reported issue, however it seems like this is an issue observed specifically at your end. It would need more specific debugging and analysis. To ensure a faster resolution and dedicated support for your issue, I kindly request you to file a support ticket by clicking
Please note that the Issue Tracker is primarily meant for reporting commonly observed issues and requesting new features. For individual support issues, it is best to utilize the support ticketing system. I'm going to close this issue which will no longer be monitored. If you have any additional issues or concerns, please don’t hesitate to create a new issue on the
We appreciate your cooperation. Thank you!
ma...@delfina.com <ma...@delfina.com> #13
Hi! I do not need support, this is a feature request as detailed in my comments above. The current configuration options available in dataflow do not easily allow updating jobs without version skew between the launcher and worker images. Rethinking those configuration options is the feature request.
I have provided a detailed example of how such version skew could happen and the potentially serious consequences it could have, and would like the engineering team to be aware of this.
Description
Problem you have encountered:
When building a flex template, you can configure the base image (via gcloud dataflow flex-template build --image), but not the SDK image. The SDK image needs to be sent as a runtime flag.
For jobs that use the same image for both base and SDK, we don't want any version skew, i.e. we want to ensure the base and SDK are identical. But, the runtime environment may not necessarily know exactly what image is in the template (that's the whole point of the template after all), and so is stuck either trying to read and parse the template, or using :latest which risks version skew.
E.g. our flow is:
- Create a cloud build trigger to build both container and flex template on changes to our release branch
- In terraform, create a cloud scheduler job to send an HTTP request to launch a dataflow job with that template periodically
That scheduler job has no clue what version is pushed to the template, but that's where the sdk image flag needs to be set. Moving it into the template would let us set both at the same time.
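For reference, the "read and parse the template" workaround mentioned above might look roughly like the sketch below. It assumes the template spec file is JSON with a top-level image field, and that the launch request accepts the worker image under environment.sdkContainerImage; both should be checked against the current Dataflow documentation. The function name, bucket path, and job name are made up for illustration, and fetching the spec from Cloud Storage is left out.

```python
import json


def build_launch_body(template_spec_json, spec_gcs_path, job_name):
    """Build a launch request body whose worker image matches the template.

    Assumption: the spec JSON carries the launcher image under "image".
    """
    spec = json.loads(template_spec_json)
    launcher_image = spec["image"]
    return {
        "launchParameter": {
            "jobName": job_name,
            "containerSpecGcsPath": spec_gcs_path,
            # Reuse the launcher image for the workers so both are pinned
            # to the same tag instead of falling back to :latest.
            "environment": {"sdkContainerImage": launcher_image},
        }
    }


# Example spec contents, as they might look after a build (hypothetical).
example_spec = json.dumps(
    {"image": "gcr.io/my_pipeline:abcd", "sdkInfo": {"language": "PYTHON"}}
)

body = build_launch_body(
    example_spec,
    "gs://my-bucket/templates/my_pipeline.json",  # hypothetical path
    "hourly-run",  # hypothetical job name
)
print(json.dumps(body, indent=2))
```

A static scheduler job defined in terraform cannot run this step on its own, which is the reason for asking that the SDK image be stored in the template instead.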
What you expected to happen:
Base image and SDK image should be configurable in the same location to easily avoid skew.