Job States
Commands like `squeue`, `sacct` and `scontrol show job` show the state of each job. All job states are explained in the Slurm documentation.
Here is a table with the most common ones:
Name | Long name | Description |
---|---|---|
PD | Pending | Job is waiting to be started |
CF | Configuring | The queue system is starting up the job |
R | Running | Job is running |
CG | Completing | Job is finishing |
CD | Completed | Job has finished |
CA | Cancelled | Job has been cancelled, either before or after it started |
F | Failed | Job has exited with a non-zero exit status |
TO | Timeout | Job didn't finish in time, and was cancelled |
PR | Preempted | Job was requeued because a higher priority job needed the resources |
NF | Node_fail | Job was requeued because of a problem with one of its compute nodes |
OOM | Out_of_memory | Job was cancelled because it tried to use too much memory |
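
When post-processing output from these commands in a script, it can help to have the table above available as a lookup. The following is a minimal sketch, not part of the queue system itself: the names `JOB_STATES`, `ACTIVE_CODES`, `long_name` and `is_active` are hypothetical, and treating only PD, CF, R and CG as "active" is an assumption made for this example.

```python
# Compact state codes from the table above, mapped to their long names.
JOB_STATES = {
    "PD": "Pending",
    "CF": "Configuring",
    "R": "Running",
    "CG": "Completing",
    "CD": "Completed",
    "CA": "Cancelled",
    "F": "Failed",
    "TO": "Timeout",
    "PR": "Preempted",
    "NF": "Node_fail",
    "OOM": "Out_of_memory",
}

# Assumption for this sketch: a job in one of these states is still
# waiting, starting up, running, or finishing.
ACTIVE_CODES = {"PD", "CF", "R", "CG"}


def long_name(code: str) -> str:
    """Return the long name for a compact code, or the input if it is unknown."""
    return JOB_STATES.get(code.strip().upper(), code)


def is_active(code: str) -> bool:
    """Return True if the code is one of the states assumed to be active."""
    return code.strip().upper() in ACTIVE_CODES


print(long_name("TO"), is_active("TO"))
print(long_name("R"), is_active("R"))
```

Codes that are not in the table are returned unchanged, so less common states are not silently mislabelled.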
The commands can also report the reason why a job is in its current state. This is most useful for pending jobs. All these reasons are explained in the Slurm documentation.
Here is a table with the most common ones:
Name | Description |
---|---|
Resources | The job is waiting for resources to become idle |
Priority | There are jobs with higher priority than this job. The job might still be started if doing so does not delay any of those jobs |
ReqNodeNotAvail | One or more of the job's required nodes is currently not available, typically because it is down or reserved |
Dependency | The job is waiting for jobs it depends on to start or finish. |
JobHeldUser | The job has been put on hold by the user |
JobHeldAdmin | The job has been put on hold by an admin. Please contact support if you don't know why it is being held. |
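
To see the state and the reason side by side, `squeue` can be given a custom output format, for example `squeue -u $USER --noheader --format="%i|%t|%r"`, where `%i` is the job ID, `%t` the compact state and `%r` the reason. The sketch below shows one way such output could be filtered for pending jobs. The sample text is made up for illustration, and `pending_reasons` is a hypothetical helper, not a Slurm tool.

```python
# Made-up example of what the squeue command above might print.
SAMPLE_OUTPUT = """\
1001|R|None
1002|PD|Priority
1003|PD|Dependency
1004|PD|JobHeldUser
"""


def pending_reasons(squeue_output: str) -> dict:
    """Map job ID to reason for every pending (PD) job in the given text."""
    reasons = {}
    for line in squeue_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split into exactly three fields: job ID, compact state, reason.
        job_id, state, reason = (field.strip() for field in line.split("|", 2))
        if state == "PD":
            reasons[job_id] = reason
    return reasons


for job_id, reason in pending_reasons(SAMPLE_OUTPUT).items():
    print(f"{job_id}: {reason}")
```

On a real system, the text passed to `pending_reasons` would come from running the `squeue` command rather than from a hard-coded string.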
CC Attribution: This page is maintained by the University of Oslo IT FFU-BT group. It has either been modified from, or is a derivative of, "" by NRIS under .