Bug #8454

Bridge sets status "Terminated" for properly failed jobs

Added by Evgeny Novikov over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Urgent
Category: Bridge
Target version:
Start date: 09/22/2017
Due date: 09/22/2017
% Done: 100%
Estimated time:
Detected in build: svn
Platform:
Published in build:

Description

It looks strange that the code for processing solved and failed jobs in service.utils.FinishJobDecision#_get_status_ differs so much. In particular, when Core fails without sending its finish report, Bridge sets the status "Terminated", which is intended for reporting critical issues with schedulers.

#1

Updated by Evgeny Novikov over 6 years ago

  • Priority changed from Normal to Urgent

This is a very unpleasant issue that should be fixed soon.

#2

Updated by Vladimir Gratinskiy over 6 years ago

Evgeny Novikov wrote:

> It looks strange that the code for processing solved and failed jobs in service.utils.FinishJobDecision#_get_status_ differs so much. In particular, when Core fails without sending its finish report, Bridge sets the status "Terminated", which is intended for reporting critical issues with schedulers.

If Core doesn't send its finish report, the job has NOT failed properly. The "Failed" status is intended only for jobs with an unknown report for Core. So either "Terminated" or "Corrupted" can be set for jobs without a Core finish report.

The status that service.utils.FinishJobDecision#_get_status_ receives as an argument is the one set by the scheduler; Bridge then checks everything and sets the proper status.

#3

Updated by Evgeny Novikov over 6 years ago

Vladimir Gratinskiy wrote:

> Evgeny Novikov wrote:
>
>> It looks strange that the code for processing solved and failed jobs in service.utils.FinishJobDecision#_get_status_ differs so much. In particular, when Core fails without sending its finish report, Bridge sets the status "Terminated", which is intended for reporting critical issues with schedulers.
>
> If Core doesn't send its finish report, the job has NOT failed properly. The "Failed" status is intended only for jobs with an unknown report for Core. So either "Terminated" or "Corrupted" can be set for jobs without a Core finish report.

"Corrupted" should be used, since I described above when "Terminated" is appropriate.

> The status that service.utils.FinishJobDecision#_get_status_ receives as an argument is the one set by the scheduler; Bridge then checks everything and sets the proper status.

Bridge will always be the final arbiter. Moreover, it knows more than the schedulers, since it also processes reports.
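The status rules agreed on in this thread can be summarized in a short Python sketch. It is not the actual Bridge code: `JobStatus`, `decide_final_status` and its boolean parameters are hypothetical names, and checking the scheduler's "Terminated" before anything else is an assumption about precedence. The point it illustrates is that the scheduler's status is only an input, and a job without a Core finish report ends up "Corrupted" rather than "Terminated".

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical enum mirroring the status names mentioned in this issue.
    SOLVED = "Solved"
    FAILED = "Failed"
    TERMINATED = "Terminated"
    CORRUPTED = "Corrupted"


def decide_final_status(scheduler_status: JobStatus,
                        has_core_finish_report: bool,
                        has_core_unknown_report: bool) -> JobStatus:
    """Hypothetical sketch: Bridge treats the scheduler's status as input
    and makes the final decision itself."""
    # Assumption: "Terminated" is kept only for critical scheduler issues.
    if scheduler_status is JobStatus.TERMINATED:
        return JobStatus.TERMINATED
    # Core never sent its finish report: the job did not fail properly.
    if not has_core_finish_report:
        return JobStatus.CORRUPTED
    # Core finished and has an unknown report: a properly failed job.
    if has_core_unknown_report:
        return JobStatus.FAILED
    # Core finished without an unknown report.
    return JobStatus.SOLVED


# The case from this issue: the scheduler says "Failed" and Core sent no finish report.
print(decide_final_status(JobStatus.FAILED,
                          has_core_finish_report=False,
                          has_core_unknown_report=False).value)  # prints "Corrupted"
```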

#4

Updated by Vladimir Gratinskiy over 6 years ago

  • Due date set to 09/22/2017
  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Fixed in fix_8454.

#5

Updated by Evgeny Novikov over 6 years ago

  • Status changed from Resolved to Closed

I merged this trivial bug fix to master in 06cb460c.