Description
Problem you have encountered:
Dataproc Serverless Spark jobs launched at effectively the same time (two jobs submitted back to back, milliseconds apart) are assigned the same application ID.
What you expected to happen:
Each job should receive a unique application ID regardless of launch time or the cluster it runs on, or there should be a way to customize or deduplicate application IDs.
Steps to reproduce:
Create a Persistent History Server (PHS) event log location in GCS, e.g. "gs://dataproc-phs-bucket/phs/event/spark-job-history", then launch multiple Dataproc Serverless Spark jobs within milliseconds of each other. Jobs launched at the same time end up with duplicate application IDs.
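A minimal sketch of the race using the google-cloud-dataproc Python client; the project ID, region, bucket path, and example jar are placeholders, and PHS wiring details (e.g. history server peripherals config) are omitted. Two batches are created back to back so their submission times differ only by milliseconds:

```python
from google.cloud import dataproc_v1

REGION = "us-central1"  # assumed region
PARENT = f"projects/my-project/locations/{REGION}"  # hypothetical project

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

def submit(batch_id: str):
    # Both batches write event logs to the same shared PHS directory.
    batch = dataproc_v1.Batch(
        spark_batch=dataproc_v1.SparkBatch(
            main_class="org.apache.spark.examples.SparkPi",
            jar_file_uris=["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        ),
        runtime_config=dataproc_v1.RuntimeConfig(
            properties={
                "spark.eventLog.enabled": "true",
                "spark.eventLog.dir": "gs://dataproc-phs-bucket/phs/event/spark-job-history",
            }
        ),
    )
    # create_batch returns a long-running operation immediately, so the
    # second submission lands milliseconds after the first.
    return client.create_batch(parent=PARENT, batch=batch, batch_id=batch_id)

ops = [submit("repro-job-1"), submit("repro-job-2")]
for op in ops:
    op.result()  # both applications may end up reporting the same ID
```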
Workarounds:
Add a short delay between job launches.
Use a glob in the GCS path, e.g. gs://dataproc-phs-bucket/*/spark-job-history, so that each job can log to its own subdirectory (see the sketch below).
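A sketch of the glob workaround under the same assumptions as above: give each batch its own event-log subdirectory keyed by its batch ID, and point the PHS at the parent path with a glob (e.g. spark.history.fs.logDirectory=gs://dataproc-phs-bucket/phs/*/spark-job-history) so simultaneous jobs no longer share one log directory:

```python
from google.cloud import dataproc_v1

def submit_isolated(client: dataproc_v1.BatchControllerClient,
                    parent: str, batch_id: str):
    # Each batch writes to its own subdirectory, so two jobs submitted
    # milliseconds apart cannot collide in a single shared directory.
    event_log_dir = f"gs://dataproc-phs-bucket/phs/{batch_id}/spark-job-history"
    batch = dataproc_v1.Batch(
        spark_batch=dataproc_v1.SparkBatch(
            main_class="org.apache.spark.examples.SparkPi",
            jar_file_uris=["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        ),
        runtime_config=dataproc_v1.RuntimeConfig(
            properties={
                "spark.eventLog.enabled": "true",
                "spark.eventLog.dir": event_log_dir,
            }
        ),
    )
    return client.create_batch(parent=parent, batch=batch, batch_id=batch_id)
```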