Feature #8149: Think on proper progress evaluation when several jobs are solved at once or/and much computational resources are available - Klever - Open-Source Projects

Feature #8149

closed

Think on proper progress evaluation when several jobs are solved at once or/and much computational resources are available

Added by Evgeny Novikov almost 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Urgent
Assignee: Ilja Zakharov
Category:
Target version: 1.0
Start date: 09/20/2017
Due date: 10/26/2017
% Done: 100%
Estimated time: (Total: 0.00 h)
Published in build:

Description

At the moment, the progress of solving verification tasks and the corresponding remaining times are shown properly only when a single job is being solved and the available computational resources are enough for solving just one verification task at a time. This makes the progress and time estimates almost useless when several jobs are solved at once and/or plenty of computational resources are available.

Subtasks 3 (0 open — 3 closed)

Related issues 3 (0 open — 3 closed)

Updated by Evgeny Novikov almost 7 years ago

BTW, the same problem applies to jobs with several sub-jobs.

Updated by Evgeny Novikov almost 7 years ago

Assignee deleted (~~Ilja Zakharov~~)
Priority changed from Urgent to High

There are too many very high priority issues. This one can be done after the really important features are supported.

Updated by Evgeny Novikov over 6 years ago

Assignee set to Ilja Zakharov
Priority changed from High to Urgent
Target version set to 1.0

In addition to the requested improvements, it will be necessary to fix progress calculation in general (as mentioned in #8444).

Updated by Evgeny Novikov over 6 years ago

Ilja suggests changing the progress API between Bridge and Core so that there are additional report fields intended just for progress calculation and there are no references to concrete Core component names (something similar to #8442).

Updated by Ilja Zakharov over 6 years ago

To restore progress evaluation, I propose making the following changes in Bridge first:

1) Implement a separate request to Bridge (not a report) with the following data:

{
   "total tasks to be generated": 11111,
   "tasks failed": 122,
   "tasks solved": 500,
   "average wall time to finish": 3766,
   "wall time spend on solution": 2300
}

Core can make this request several times while solving a job.
If Core has several sub-jobs, it will either send no data at all or send it as if there were only one sub-job.

2) Bridge should just store and visualize the data as is, calculating the percentage as (failed + solved) * 100 / total.

3) If Core has provided data to Bridge and the job has the PROCESSING status, Bridge should send the received data to the scheduler during the service/get_jobs_and_tasks request, adding the following section:

{
   ....
   "jobs progress": {
      "job_id": {data as is sent by Core}
   }
}

To save traffic, the section can be added only when the data has been updated.
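
Steps 2 and 3 of this proposal could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names `percentage` and `add_progress_section` and the `changed` flag are hypothetical and are not taken from Bridge; only the dictionary keys (with the solved counter spelled "tasks solved") and the PROCESSING status come from the comment.

```python
from typing import Optional


def percentage(progress: dict) -> Optional[float]:
    """Step 2: (failed + solved) * 100 / total; None if the total is unknown or zero."""
    total = progress.get("total tasks to be generated", 0)
    if total <= 0:
        return None
    done = progress.get("tasks failed", 0) + progress.get("tasks solved", 0)
    return done * 100 / total


def add_progress_section(response: dict, job_id: str, status: str,
                         progress: Optional[dict], changed: bool) -> dict:
    """Step 3: attach "jobs progress" only for PROCESSING jobs whose data was updated."""
    if status == "PROCESSING" and progress is not None and changed:
        # The data is forwarded exactly as it was received from Core.
        response.setdefault("jobs progress", {})[job_id] = progress
    return response


sample = {
    "total tasks to be generated": 11111,
    "tasks failed": 122,
    "tasks solved": 500,
    "average wall time to finish": 3766,
    "wall time spend on solution": 2300,
}
print(percentage(sample))
print(add_progress_section({}, "job_id", "PROCESSING", sample, changed=True))
```

Returning None instead of a number lets the caller hide the progress bar when Core has not reported a total yet.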

Updated by Evgeny Novikov over 6 years ago

Ilja Zakharov wrote:

> To restore progress evaluation, I propose making the following changes in Bridge first:
> 1) Implement a separate request to Bridge (not a report) with the following data: {
> "total tasks to be generated": 11111,
> "tasks failed": 122,
> "tasks solved": 500,
> "average wall time to finish": 3766,
> "wall time spend on solution": 2300
> }

Minor improvements:
"tasks failed" -> "failed tasks"
"tasks solved" -> "solved tasks"
"average wall time to finish" -> "expected time for solving tasks"
"wall time spend on solution" -> "elapsed time for solving tasks"

> Core can make this request several times while solving a job.

My suggestion is to send only changed values since, say, "total tasks to be generated" will never change.

> If Core has several sub-jobs, it will either send no data at all or send it as if there were only one sub-job.

So, Bridge should expect that there can be no progress reports for some verification jobs.

> 2) Bridge should just store and visualize the data as is, calculating the percentage as (failed + solved) * 100 / total.

I suggest the following formula: 100 * solved / (total - failed). This should be calculated only if failed < total. Otherwise, if failed = total, progress can be hidden because task generation/solution has finished.
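
The suggested formula and its guard can be written down as a small Python function. This is a sketch for illustration only; the name `task_progress` is hypothetical, and returning None stands for "hide the progress".

```python
from typing import Optional


def task_progress(total: int, solved: int, failed: int) -> Optional[float]:
    """Return 100 * solved / (total - failed), or None when progress should be hidden."""
    if failed >= total:
        # Every task failed (or nothing was generated), so there is nothing left to show.
        return None
    return 100 * solved / (total - failed)


print(task_progress(total=11111, solved=500, failed=122))
print(task_progress(total=10, solved=0, failed=10))
```

Compared with (failed + solved) * 100 / total, failed tasks here shrink the denominator instead of counting as completed work, so the percentage reflects only tasks that can still be solved.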

Updated by Ilja Zakharov over 6 years ago

> My suggestion is to send only changed values since, say, "total tasks to be generated" will never change.

I disagree with this point. Since Bridge does not need to do any complicated calculations, let's allow the numbers to change. I am not sure that we will actually change them, but such artificial restrictions are really not necessary anyway.

Updated by Evgeny Novikov over 6 years ago

Ilja Zakharov wrote:

> > My suggestion is to send only changed values since, say, "total tasks to be generated" will never change.
>
> I disagree with this point. Since Bridge does not need to do any complicated calculations, let's allow the numbers to change. I am not sure that we will actually change them, but such artificial restrictions are really not necessary anyway.

I didn't mean to restrict any changes, although that wasn't clear. I just suggested sending data incrementally, i.e. sending only changed values, and likely only after some configurable period of time. For instance, users could request progress updates only every 30 seconds or every 5 minutes.
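
Incremental sending with a configurable period might look like the following Python sketch. Everything here is an assumption made for illustration: the `ProgressReporter` class, its `send` callback, and the injectable `clock` are hypothetical and do not describe the actual Core code.

```python
import time
from typing import Callable, Optional


class ProgressReporter:
    """Sends only changed progress values, at most once per `period` seconds."""

    def __init__(self, send: Callable[[dict], None], period: float,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send            # hypothetical callback that delivers data to Bridge
        self._period = period        # e.g. 30 seconds or 300 seconds (5 minutes)
        self._clock = clock
        self._last_sent: dict = {}
        self._last_time: Optional[float] = None

    def update(self, current: dict) -> bool:
        """Send the changed values if the period has elapsed; return True if data was sent."""
        now = self._clock()
        if self._last_time is not None and now - self._last_time < self._period:
            return False
        # Keep only the values that differ from what was sent before.
        delta = {key: value for key, value in current.items()
                 if key not in self._last_sent or self._last_sent[key] != value}
        if not delta:
            return False
        self._send(delta)
        self._last_sent.update(delta)
        self._last_time = now
        return True
```

With this shape, a value such as the total number of tasks is transmitted once and then omitted until it changes, while nothing prevents it from changing later.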

Updated by Evgeny Novikov over 6 years ago

Ilja Zakharov wrote:

> If Core has several sub-jobs, it will either send no data at all or send it as if there were only one sub-job.

I suggest an approach to evaluating progress for jobs with sub-jobs that is quite simple to implement and useful for users. As with tasks, let's track the number of sub-jobs (total, solved, failed) and the average wall time spent on solving them. So, Bridge will show the progress of solving sub-jobs rather than tasks. As for the data, there can be the following additional fields:

{
   "sub-jobs to be solved": 50,
   "failed sub-jobs": 5,
   "solved sub-jobs": 25,
   "expected time for solving sub-jobs": 3766,
   "elapsed time for solving sub-jobs": 2300
}

Also, I suggest renaming "total tasks to be generated" to "tasks to be generated".

#10

Updated by Ilja Zakharov over 6 years ago

> As for the data, there can be the following additional fields

OK, let's do that as well. I would propose sending the data within the same request but making it possible to attach data about both tasks and sub-jobs, only about sub-jobs, or only about tasks, because Core will likely calculate the corresponding information in different places.
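
One way a receiver could accept such a combined request is sketched below in Python. This is an assumption-laden illustration, not the agreed specification: the function `split_progress` is hypothetical, and the key names are the ones suggested earlier in this thread.

```python
from typing import Tuple

# Field names as suggested in this thread; the final specification may differ.
TASK_KEYS = {
    "tasks to be generated",
    "failed tasks",
    "solved tasks",
    "expected time for solving tasks",
    "elapsed time for solving tasks",
}
SUBJOB_KEYS = {
    "sub-jobs to be solved",
    "failed sub-jobs",
    "solved sub-jobs",
    "expected time for solving sub-jobs",
    "elapsed time for solving sub-jobs",
}


def split_progress(request: dict) -> Tuple[dict, dict]:
    """Split one request into (task data, sub-job data); either part may be empty."""
    unknown = set(request) - TASK_KEYS - SUBJOB_KEYS
    if unknown:
        raise ValueError("unknown progress fields: {}".format(sorted(unknown)))
    tasks = {k: v for k, v in request.items() if k in TASK_KEYS}
    subjobs = {k: v for k, v in request.items() if k in SUBJOB_KEYS}
    if not tasks and not subjobs:
        raise ValueError("progress request carries no data")
    return tasks, subjobs


print(split_progress({"solved tasks": 500, "solved sub-jobs": 25}))
```

Treating the two groups independently lets different places in Core report task data and sub-job data without coordinating with each other.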

#11

Updated by Evgeny Novikov over 6 years ago

We need a specification that concludes this discussion and describes all new requests and their semantics in full detail.

#12

Updated by Ilja Zakharov over 6 years ago

Let's discuss it here: https://goo.gl/H2WFsw.

#13

Updated by Ilja Zakharov over 6 years ago

Added issue #8536 as blocking, since I am implementing progress calculation on top of a refactoring of Job.py, which I cannot finish without a separate total coverage request.

#14

Updated by Ilja Zakharov over 6 years ago

Status changed from New to Resolved

The updated progress implementation is available in branch 8149-new-progress. I will perform additional tests, but for simple examples everything works nicely for jobs both with and without sub-jobs.

#15

Updated by Ilja Zakharov over 6 years ago

Status changed from Resolved to Open

Decided to update the implementation.

#16

Updated by Ilja Zakharov over 6 years ago

Status changed from Open to Resolved

The final implementation is in branch "8149-new-progress". I don't see any obvious bugs there at the moment.

#17

Updated by Evgeny Novikov over 6 years ago

Status changed from Resolved to Closed

I merged the branch into master in 459f75e7. At last we have fairly proper progress evaluation and visualization, which is extremely valuable for large production jobs.
