Status Update
Comments
ke...@spotify.com <ke...@spotify.com> #4
We extract the pipeline graph and the metrics for all pipelines at Spotify in order to track various properties of our "fleet". One thing we used to be able to do, and would like to keep doing on Runner V2, is to use these metrics to identify specific steps that have large amounts of shuffle, so that we can potentially provide recommendations for optimization and cost control. At present on Runner V2 we can only identify that an entire pipeline has high shuffle/cost, but cannot say, for example, "step X in pipeline Y joins a huge and a tiny dataset, use a side input".
ap...@google.com <ap...@google.com> #5
Hi Kellen,
Sorry for the long silence. Please note that even with the legacy runner, the ShuffleBytesRead and ShuffleBytesWritten counters do not track costs perfectly (they might not account for caching) and are implemented only for Java. Runner V2 uses a different model to report counters, and we prefer that customers rely on the global billing counters, which are guaranteed to be accurate.
We should probably be able to provide you with a shuffle read bytes counter associated with the step, but its accuracy will also be limited (for instance, no accounting for re-iterations). There is no easy way to provide a per-step shuffle write counter.
Would such a solution work for you? Alternatively, you could add your own custom counters, which can be presented in the UI and associated with the step, but if I understand correctly that was not an ideal option from your perspective.
Thanks,
Alex
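For illustration only: whichever per-step counter ends up being available (a built-in shuffle read counter or a custom one), the per-step ranking described in comment #4 could be computed from a job's metrics response roughly as sketched below. This is a minimal sketch under stated assumptions, not a confirmed recipe. It assumes the response is a dict with a `metrics` list whose entries carry `name.name`, a `name.context` map containing a `step` key (and `tentative` set to `"true"` for uncommitted values), and a numeric `scalar`. The counter name `approx_shuffle_bytes`, the step names, and the sample values are hypothetical.

```python
from collections import defaultdict


def bytes_by_step(job_metrics, counter_name):
    """Rank steps by the total value of one counter, largest first.

    `job_metrics` is assumed to be a dict shaped like a Dataflow job
    metrics response; the exact shape used here is an assumption.
    """
    totals = defaultdict(int)
    for update in job_metrics.get("metrics", []):
        name = update.get("name", {})
        context = name.get("context", {})
        if name.get("name") != counter_name:
            continue  # a different metric
        if context.get("tentative") == "true":
            continue  # keep committed values only
        step = context.get("step")
        if step is None:
            continue  # metric is not attached to a step
        totals[step] += int(update.get("scalar", 0))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data, for illustration only.
sample = {
    "metrics": [
        {"name": {"origin": "user", "name": "approx_shuffle_bytes",
                  "context": {"step": "JoinUsers"}},
         "scalar": 5_000_000_000},
        {"name": {"origin": "user", "name": "approx_shuffle_bytes",
                  "context": {"step": "JoinUsers", "tentative": "true"}},
         "scalar": 5_100_000_000},
        {"name": {"origin": "user", "name": "approx_shuffle_bytes",
                  "context": {"step": "GroupByCountry"}},
         "scalar": 120_000_000},
        {"name": {"origin": "dataflow/v1b3", "name": "ElementCount",
                  "context": {}},
         "scalar": 42},
    ]
}

ranked = bytes_by_step(sample, "approx_shuffle_bytes")
for step, total in ranked:
    print(f"{step}: {total} bytes")
```

As noted above, any such per-step number would be approximate (no accounting for caching or re-iterations), so it is better suited to spotting outlier steps than to reconciling against billing.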
Description
Problem you have encountered: Unable to see ShuffleBytesRead and ShuffleBytesWritten metrics when querying for metrics via the Dataflow API.
What you expected to happen: See the metrics again
Steps to reproduce: Not reproducible
Other information (workarounds you have tried, documentation consulted, etc): This issue was created to track and get updates on the larger metrics improvement project that the Dataflow engineering team has planned for later this quarter, with delivery probably some time in October. That project will allow the customer to see those metrics again using the Dataflow API.