Description
Problem you have encountered:
Dataproc Serverless Spark jobs launched at the same time (two jobs submitted back to back, only milliseconds apart) are assigned the same application ID.
What you expected to happen:
Each job should receive a unique application ID regardless of submission time or the cluster it is launched on, or there should be a way to customize the application ID.
Steps to reproduce: Create a Persistent History Server (PHS) event log location in GCS, e.g. "gs://dataproc-phs-bucket/phs/event/spark-job-history", then launch multiple Dataproc Serverless Spark jobs within milliseconds of each other. Jobs submitted at the same time end up with duplicate application IDs. A minimal reproduction sketch follows below.
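A minimal reproduction sketch using the google-cloud-dataproc Python client, assuming placeholder project, region, batch IDs, and the example SparkPi jar shipped with Spark; the bucket path is the one from the report above:

```python
# Reproduction sketch: submit two Dataproc Serverless Spark batches back to
# back and then inspect the application IDs written to the shared PHS event
# log location. Project/region/batch IDs are placeholders.
from google.cloud import dataproc_v1

PROJECT = "my-project"   # placeholder
REGION = "us-central1"   # placeholder
PARENT = f"projects/{PROJECT}/locations/{REGION}"

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

def submit(batch_id: str):
    batch = dataproc_v1.Batch(
        spark_batch=dataproc_v1.SparkBatch(
            main_class="org.apache.spark.examples.SparkPi",  # placeholder job
            jar_file_uris=["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        ),
        runtime_config=dataproc_v1.RuntimeConfig(
            properties={
                "spark.eventLog.enabled": "true",
                # Shared PHS event log location from the report above.
                "spark.eventLog.dir": "gs://dataproc-phs-bucket/phs/event/spark-job-history",
            }
        ),
    )
    # create_batch returns a long-running operation; we deliberately do not
    # wait here, so the two submissions land milliseconds apart.
    return client.create_batch(parent=PARENT, batch=batch, batch_id=batch_id)

op_a = submit("dup-appid-test-a")
op_b = submit("dup-appid-test-b")
op_a.result()
op_b.result()
# Then list gs://dataproc-phs-bucket/phs/event/spark-job-history and check
# whether both batches wrote event logs under the same application ID.
```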
Other information (workarounds you have tried, documentation consulted, etc):
Workarounds:
Configuring the PHS log directory with a wildcard so each job can write its event logs to its own subdirectory (see the sketch below):
gs://dataproc-phs-bucket/*/spark-job-history
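A hedged sketch of that workaround, assuming the PHS reads from the wildcard location via spark.history.fs.logDirectory and each batch overrides spark.eventLog.dir to a unique subdirectory; the helper and the suffix scheme are hypothetical, not a confirmed fix:

```python
# Hypothetical workaround sketch: give each batch its own event log
# directory under the bucket. A PHS configured with
#   spark.history.fs.logDirectory=gs://dataproc-phs-bucket/*/spark-job-history
# picks up every per-batch directory, so two jobs that happen to share an
# application ID no longer write into the same directory.
import uuid

def per_batch_properties() -> dict:
    batch_suffix = uuid.uuid4().hex[:8]  # unique per submission
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": (
            f"gs://dataproc-phs-bucket/{batch_suffix}/spark-job-history"
        ),
    }

# Pass per_batch_properties() as RuntimeConfig.properties when submitting
# each batch (see the reproduction sketch above).
```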