Reportedly, some VM jobs (and possibly others) get into a "stuck" state where they
don't make progress: the fraction done doesn't change, and there is little CPU usage.
These jobs will eventually be aborted when their elapsed time reaches the limit derived from rsc_fpops_bound,
but this could take weeks or months depending on the limit.

Proposal: have the client try to figure out when a job is stuck.

ACTIVE_TASK new fields:
    double stuck_check_elapsed_time
    double stuck_check_fraction_done
    double stuck_check_cpu_time
    (initialize all to zero)

STUCK_CHECK_POLL_PERIOD = 3600

every STUCK_CHECK_POLL_PERIOD seconds
    for each active task atp
        if non_cpu_intensive: continue
        if sporadic: continue
        if atp->stuck_check_elapsed_time == 0
            atp->stuck_check_elapsed_time = atp->elapsed_time
            atp->stuck_check_fraction_done = atp->fraction_done
            atp->stuck_check_cpu_time = atp->current_cpu_time
            continue
        if atp->elapsed_time < atp->stuck_check_elapsed_time + STUCK_CHECK_POLL_PERIOD
            continue
        if atp->stuck_check_fraction_done == atp->fraction_done
            && (atp->current_cpu_time - atp->stuck_check_cpu_time < 10)
            (job is stuck - print warning)
        atp->stuck_check_elapsed_time = atp->elapsed_time
        atp->stuck_check_fraction_done = atp->fraction_done
        atp->stuck_check_cpu_time = atp->current_cpu_time

I.e., a job is considered stuck if, in the last hour of running, the fraction done hasn't changed
and the incremental CPU time is < 10s.
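
For illustration, the check could be sketched in C++ roughly as follows. This is not the client's actual code: `ActiveTask` is a hypothetical, stripped-down stand-in for ACTIVE_TASK, and `take_snapshot()`, `stuck_check()` and `STUCK_CHECK_MIN_CPU` are names made up for this sketch. It returns a flag instead of printing a warning, so the caller decides what to do.

```cpp
// Hypothetical, stripped-down stand-in for the client's ACTIVE_TASK.
struct ActiveTask {
    double elapsed_time = 0;
    double fraction_done = 0;
    double current_cpu_time = 0;
    bool non_cpu_intensive = false;
    bool sporadic = false;
    // Proposed new fields: state at the last stuck check (all start at zero).
    double stuck_check_elapsed_time = 0;
    double stuck_check_fraction_done = 0;
    double stuck_check_cpu_time = 0;
};

const double STUCK_CHECK_POLL_PERIOD = 3600;  // seconds
const double STUCK_CHECK_MIN_CPU = 10;        // seconds; name assumed for this sketch

// Record the task's current state as the baseline for the next check.
static void take_snapshot(ActiveTask& t) {
    t.stuck_check_elapsed_time = t.elapsed_time;
    t.stuck_check_fraction_done = t.fraction_done;
    t.stuck_check_cpu_time = t.current_cpu_time;
}

// Intended to be called for each active task every STUCK_CHECK_POLL_PERIOD
// seconds. Returns true if the task looks stuck.
bool stuck_check(ActiveTask& t) {
    if (t.non_cpu_intensive || t.sporadic) return false;

    // First check for this task: just record a baseline.
    if (t.stuck_check_elapsed_time == 0) {
        take_snapshot(t);
        return false;
    }

    // The task hasn't run for a full period since the baseline
    // (e.g. it was suspended or preempted), so don't judge it yet.
    if (t.elapsed_time < t.stuck_check_elapsed_time + STUCK_CHECK_POLL_PERIOD) {
        return false;
    }

    // Stuck: no change in fraction done and almost no CPU time used.
    bool stuck =
        t.fraction_done == t.stuck_check_fraction_done &&
        (t.current_cpu_time - t.stuck_check_cpu_time < STUCK_CHECK_MIN_CPU);

    take_snapshot(t);
    return stuck;
}
```

Returning a bool rather than logging directly keeps the detection logic separate from the response (warn vs. abort), which may make it easier to switch between the two later.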

At that point, the client could

1) notify the user, suggesting that they abort the job
2) abort the job

Let's do 1) for starters, to make sure that the logic is right,
then at some point do 2).

Metadata

Assignees

No one assigned

Status

In progress

Milestone

Client/Manager 9.2.0 (no due date)

Relationships

None yet

Development

No branches or pull requests

stuck jobs #5352