Project

General

Profile

Actions

Bug #9059

open

Absence of log files can result to job error

Added by Evgeny Novikov almost 6 years ago. Updated almost 6 years ago.

Status:
New
Priority:
High
Assignee:
Category:
Scheduling
Target version:
-
Start date:
07/03/2018
Due date:
% Done:

0%

Estimated time:
Detected in build:
svn
Platform:
Published in build:

Description

This is a first time when I met this situation, so, I think that it is very seldom. The log looks like:

2018-07-03 15:15:28,726 (resource_scheduler.py:534) root  INFO> Going to check that deadlock are impossible
2018-07-03 15:15:28,726 (resource_scheduler.py:206) root DEBUG> Do not submit information about the workload to Bridge as nothing has been changed
2018-07-03 15:15:28,726 (__init__.py:426) root  INFO> Start scheduling new tasks
2018-07-03 15:15:28,726 (resource_scheduler.py:534) root  INFO> Going to check that job '012fcd235a27f9ed1e08f2ac1fd316d4' does not introduce deadlocks
2018-07-03 15:15:28,727 (__init__.py:436) root  INFO> Going to start 0 new tasks and 1 jobs
2018-07-03 15:15:28,727 (__init__.py:765) root  INFO> Try to start job 012fcd235a27f9ed1e08f2ac1fd316d4
2018-07-03 15:15:28,727 (native.py:194) root DEBUG> Start solution of job '012fcd235a27f9ed1e08f2ac1fd316d4'
2018-07-03 15:15:28,727 (__init__.py:475) root DEBUG> Flush submitted tasks and jobs if necessary
2018-07-03 15:15:28,727 (native.py:515) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': Going to start a new process which will start native scheduler client
2018-07-03 15:15:28,727 (__init__.py:483) root DEBUG> Scheduler iteration has finished
2018-07-03 15:15:28,729 (native.py:517) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': get pid of the started process.
2018-07-03 15:15:28,729 (__init__.py:488) root  INFO> Do not wait besause of statuses changing
2018-07-03 15:15:28,730 (native.py:519) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': the pid is 5243.
Executor 5243: execute: '/var/lib/klever/workspace/Branches-and-Tags-Processing/src/scheduler/bin/scheduler-client job --file scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client.json'
Executor 5243: Finished command: '/var/lib/klever/workspace/Branches-and-Tags-Processing/src/scheduler/bin/scheduler-client job --file scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client.json'
2018-07-03 15:15:29,388 (native.py:524) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': join method returned 'None'.
2018-07-03 15:15:29,388 (native.py:526) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': process 5243 joined, going to check its exit code
2018-07-03 15:15:29,388 (native.py:529) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': exit code of the process 5243 is '1'
2018-07-03 15:15:29,731 (__init__.py:130) root  INFO> Start scheduling iteration with statuses exchange with the server
2018-07-03 15:15:29,731 (__init__.py:147) root  INFO> Scheduler has 0 pending, 0 processing, 0 finished and 0 error tasks
2018-07-03 15:15:29,732 (__init__.py:165) root  INFO> Scheduler has 1 pending, 0 processing, 0 finished and 0 error jobs and 0 cancelled
2018-07-03 15:15:29,732 (__init__.py:170) root DEBUG> Add task 0 error descriptions
2018-07-03 15:15:29,732 (__init__.py:177) root DEBUG> Add job 0 error descriptions
2018-07-03 15:15:29,732 (bridge.py:081) root DEBUG> Send "POST" request to "http://localhost:8998/service/get_jobs_and_tasks/" 
2018-07-03 15:15:29,792 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8998 "POST /service/get_jobs_and_tasks/ HTTP/1.1" 200 3260
2018-07-03 15:15:29,794 (__init__.py:217) root DEBUG> Remove tasks with statuses FINISHED and ERROR which have been submitted
2018-07-03 15:15:29,794 (__init__.py:223) root DEBUG> Remove jobs with statuses FINISHED and ERROR
2018-07-03 15:15:29,794 (native.py:403) root  INFO> Going to prepare execution of the job 012fcd235a27f9ed1e08f2ac1fd316d4
2018-07-03 15:15:29,794 (native.py:462) root DEBUG> Yielding result of a future object of job 012fcd235a27f9ed1e08f2ac1fd316d4
2018-07-03 15:15:29,794 (native.py:495) root WARNING> Execution of job 012fcd235a27f9ed1e08f2ac1fd316d4 terminated with an exception: Cannot find Scheduler client file with logs: 'scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client-log.log'
2018-07-03 15:15:29,794 (__init__.py:746) root ERROR> Cannot process results of task 012fcd235a27f9ed1e08f2ac1fd316d4: Execution of job 012fcd235a27f9ed1e08f2ac1fd316d4 terminated with an exception: Cannot find Scheduler client file with logs: 'scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client-log.log'
2018-07-03 15:15:29,795 (__init__.py:408) root DEBUG> Update information about available verification tools
2018-07-03 15:15:29,795 (__init__.py:418) root DEBUG> Update information about connected nodes
2018-07-03 15:15:29,795 (resource_scheduler.py:074) root DEBUG> Try to receive information about resources from controller
2018-07-03 15:15:29,796 (connectionpool.py:208) urllib3.connectionpool DEBUG> Starting new HTTP connection (1): localhost
2018-07-03 15:15:29,798 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8500 "GET /v1/catalog/nodes HTTP/1.1" 200 202
2018-07-03 15:15:29,799 (adapters.py:078) consulate.adapters DEBUG> GET http://localhost:8500/v1/kv/states/ldvdev
2018-07-03 15:15:29,800 (connectionpool.py:208) urllib3.connectionpool DEBUG> Starting new HTTP connection (1): localhost
2018-07-03 15:15:29,801 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8500 "GET /v1/kv/states/ldvdev HTTP/1.1" 200 701
2018-07-03 15:15:29,802 (adapters.py:078) consulate.adapters DEBUG> GET http://localhost:8500/v1/kv/states/ldvdev
2018-07-03 15:15:29,803 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8500 "GET /v1/kv/states/ldvdev HTTP/1.1" 200 701
2018-07-03 15:15:29,804 (resource_scheduler.py:534) root  INFO> Going to check that deadlock are impossible
2018-07-03 15:15:29,804 (resource_scheduler.py:206) root DEBUG> Do not submit information about the workload to Bridge as nothing has been changed
2018-07-03 15:15:29,804 (__init__.py:426) root  INFO> Start scheduling new tasks
2018-07-03 15:15:29,805 (__init__.py:436) root  INFO> Going to start 0 new tasks and 0 jobs
2018-07-03 15:15:29,805 (__init__.py:475) root DEBUG> Flush submitted tasks and jobs if necessary
2018-07-03 15:15:29,805 (__init__.py:483) root DEBUG> Scheduler iteration has finished
2018-07-03 15:15:29,805 (__init__.py:488) root  INFO> Do not wait besause of statuses changing
2018-07-03 15:15:30,805 (__init__.py:130) root  INFO> Start scheduling iteration with statuses exchange with the server
2018-07-03 15:15:30,806 (__init__.py:147) root  INFO> Scheduler has 0 pending, 0 processing, 0 finished and 0 error tasks
2018-07-03 15:15:30,806 (__init__.py:165) root  INFO> Scheduler has 0 pending, 0 processing, 0 finished and 1 error jobs and 0 cancelled
2018-07-03 15:15:30,806 (__init__.py:170) root DEBUG> Add task 0 error descriptions
2018-07-03 15:15:30,806 (__init__.py:177) root DEBUG> Add job 1 error descriptions
2018-07-03 15:15:30,806 (bridge.py:081) root DEBUG> Send "POST" request to "http://localhost:8998/service/get_jobs_and_tasks/" 
2018-07-03 15:15:30,879 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8998 "POST /service/get_jobs_and_tasks/ HTTP/1.1" 200 494
2018-07-03 15:15:30,880 (__init__.py:217) root DEBUG> Remove tasks with statuses FINISHED and ERROR which have been submitted
2018-07-03 15:15:30,880 (__init__.py:223) root DEBUG> Remove jobs with statuses FINISHED and ERROR
2018-07-03 15:15:30,880 (__init__.py:225) root DEBUG> Delete job 012fcd235a27f9ed1e08f2ac1fd316d4 with status ERROR

Actions #1

Updated by Evgeny Novikov almost 6 years ago

Perhaps there is some race between canceling and starting job decisions. For instance, when one cancels a job decision some operations can be buffered, so that when starting a decision of the same job very soon operations can be intermixed.

Actions

Also available in: Atom PDF