Description
Please provide as much information as possible. At least, this should include a description of your issue and steps to reproduce the problem. If possible please provide a summary of what steps or workarounds you have already tried, and any docs or articles you found (un)helpful.
Problem you have encountered:
The user gets the following error message when running a training job in preemptible mode:
"The replica master 0 exited with a non-zero status of 1. Termination reason: Error "
Please investigate the root cause of the error.
What you expected to happen:
The training job completes without errors.
Steps to reproduce:
Run the training job in preemptible mode. The root cause is likely preemptible mode itself, because the error occurs only when preemptible mode is used.
Other information (workarounds you have tried, documentation consulted, etc):
The workaround is to retry the training job.
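For reference, the retry workaround could be automated along the following lines. This is only a minimal sketch, not the service's own retry mechanism: `run_with_retries` and the `submit_job` callable are hypothetical names, and `submit_job` is assumed to start the training job, wait for it to finish, and return the replica's exit status (0 on success, non-zero as in the error above).

```python
import time
from typing import Callable


def run_with_retries(
    submit_job: Callable[[], int],   # hypothetical: runs the job, returns its exit status
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run the job until it exits with status 0 or attempts run out.

    Returns the number of the attempt that succeeded.
    Raises RuntimeError if every attempt ends with a non-zero status.
    """
    last_status = None
    for attempt in range(1, max_attempts + 1):
        last_status = submit_job()
        if last_status == 0:
            return attempt
        if attempt < max_attempts:
            # Exponential backoff between attempts: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(
        f"job still failing after {max_attempts} attempts "
        f"(last exit status: {last_status})"
    )
```

A caller would pass its own job-submission function as `submit_job`. The backoff delay is an illustrative choice; whether a delay helps with preemption is not established in this report.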