Actions
Bug #9059
openAbsence of log files can result to job error
Start date:
07/03/2018
Due date:
% Done:
0%
Estimated time:
Detected in build:
svn
Platform:
Published in build:
Description
This is a first time when I met this situation, so, I think that it is very seldom. The log looks like:
2018-07-03 15:15:28,726 (resource_scheduler.py:534) root INFO> Going to check that deadlock are impossible 2018-07-03 15:15:28,726 (resource_scheduler.py:206) root DEBUG> Do not submit information about the workload to Bridge as nothing has been changed 2018-07-03 15:15:28,726 (__init__.py:426) root INFO> Start scheduling new tasks 2018-07-03 15:15:28,726 (resource_scheduler.py:534) root INFO> Going to check that job '012fcd235a27f9ed1e08f2ac1fd316d4' does not introduce deadlocks 2018-07-03 15:15:28,727 (__init__.py:436) root INFO> Going to start 0 new tasks and 1 jobs 2018-07-03 15:15:28,727 (__init__.py:765) root INFO> Try to start job 012fcd235a27f9ed1e08f2ac1fd316d4 2018-07-03 15:15:28,727 (native.py:194) root DEBUG> Start solution of job '012fcd235a27f9ed1e08f2ac1fd316d4' 2018-07-03 15:15:28,727 (__init__.py:475) root DEBUG> Flush submitted tasks and jobs if necessary 2018-07-03 15:15:28,727 (native.py:515) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': Going to start a new process which will start native scheduler client 2018-07-03 15:15:28,727 (__init__.py:483) root DEBUG> Scheduler iteration has finished 2018-07-03 15:15:28,729 (native.py:517) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': get pid of the started process. 2018-07-03 15:15:28,729 (__init__.py:488) root INFO> Do not wait besause of statuses changing 2018-07-03 15:15:28,730 (native.py:519) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': the pid is 5243. Executor 5243: execute: '/var/lib/klever/workspace/Branches-and-Tags-Processing/src/scheduler/bin/scheduler-client job --file scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client.json' Executor 5243: Finished command: '/var/lib/klever/workspace/Branches-and-Tags-Processing/src/scheduler/bin/scheduler-client job --file scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client.json' 2018-07-03 15:15:29,388 (native.py:524) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': join method returned 'None'. 2018-07-03 15:15:29,388 (native.py:526) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': process 5243 joined, going to check its exit code 2018-07-03 15:15:29,388 (native.py:529) root DEBUG> Future task '012fcd235a27f9ed1e08f2ac1fd316d4': exit code of the process 5243 is '1' 2018-07-03 15:15:29,731 (__init__.py:130) root INFO> Start scheduling iteration with statuses exchange with the server 2018-07-03 15:15:29,731 (__init__.py:147) root INFO> Scheduler has 0 pending, 0 processing, 0 finished and 0 error tasks 2018-07-03 15:15:29,732 (__init__.py:165) root INFO> Scheduler has 1 pending, 0 processing, 0 finished and 0 error jobs and 0 cancelled 2018-07-03 15:15:29,732 (__init__.py:170) root DEBUG> Add task 0 error descriptions 2018-07-03 15:15:29,732 (__init__.py:177) root DEBUG> Add job 0 error descriptions 2018-07-03 15:15:29,732 (bridge.py:081) root DEBUG> Send "POST" request to "http://localhost:8998/service/get_jobs_and_tasks/" 2018-07-03 15:15:29,792 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8998 "POST /service/get_jobs_and_tasks/ HTTP/1.1" 200 3260 2018-07-03 15:15:29,794 (__init__.py:217) root DEBUG> Remove tasks with statuses FINISHED and ERROR which have been submitted 2018-07-03 15:15:29,794 (__init__.py:223) root DEBUG> Remove jobs with statuses FINISHED and ERROR 2018-07-03 15:15:29,794 (native.py:403) root INFO> Going to prepare execution of the job 012fcd235a27f9ed1e08f2ac1fd316d4 2018-07-03 15:15:29,794 (native.py:462) root DEBUG> Yielding result of a future object of job 012fcd235a27f9ed1e08f2ac1fd316d4 2018-07-03 15:15:29,794 (native.py:495) root WARNING> Execution of job 012fcd235a27f9ed1e08f2ac1fd316d4 terminated with an exception: Cannot find Scheduler client file with logs: 'scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client-log.log' 2018-07-03 15:15:29,794 (__init__.py:746) root ERROR> Cannot process results of task 012fcd235a27f9ed1e08f2ac1fd316d4: Execution of job 012fcd235a27f9ed1e08f2ac1fd316d4 terminated with an exception: Cannot find Scheduler client file with logs: 'scheduler/jobs/012fcd235a27f9ed1e08f2ac1fd316d4/client-log.log' 2018-07-03 15:15:29,795 (__init__.py:408) root DEBUG> Update information about available verification tools 2018-07-03 15:15:29,795 (__init__.py:418) root DEBUG> Update information about connected nodes 2018-07-03 15:15:29,795 (resource_scheduler.py:074) root DEBUG> Try to receive information about resources from controller 2018-07-03 15:15:29,796 (connectionpool.py:208) urllib3.connectionpool DEBUG> Starting new HTTP connection (1): localhost 2018-07-03 15:15:29,798 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8500 "GET /v1/catalog/nodes HTTP/1.1" 200 202 2018-07-03 15:15:29,799 (adapters.py:078) consulate.adapters DEBUG> GET http://localhost:8500/v1/kv/states/ldvdev 2018-07-03 15:15:29,800 (connectionpool.py:208) urllib3.connectionpool DEBUG> Starting new HTTP connection (1): localhost 2018-07-03 15:15:29,801 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8500 "GET /v1/kv/states/ldvdev HTTP/1.1" 200 701 2018-07-03 15:15:29,802 (adapters.py:078) consulate.adapters DEBUG> GET http://localhost:8500/v1/kv/states/ldvdev 2018-07-03 15:15:29,803 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8500 "GET /v1/kv/states/ldvdev HTTP/1.1" 200 701 2018-07-03 15:15:29,804 (resource_scheduler.py:534) root INFO> Going to check that deadlock are impossible 2018-07-03 15:15:29,804 (resource_scheduler.py:206) root DEBUG> Do not submit information about the workload to Bridge as nothing has been changed 2018-07-03 15:15:29,804 (__init__.py:426) root INFO> Start scheduling new tasks 2018-07-03 15:15:29,805 (__init__.py:436) root INFO> Going to start 0 new tasks and 0 jobs 2018-07-03 15:15:29,805 (__init__.py:475) root DEBUG> Flush submitted tasks and jobs if necessary 2018-07-03 15:15:29,805 (__init__.py:483) root DEBUG> Scheduler iteration has finished 2018-07-03 15:15:29,805 (__init__.py:488) root INFO> Do not wait besause of statuses changing 2018-07-03 15:15:30,805 (__init__.py:130) root INFO> Start scheduling iteration with statuses exchange with the server 2018-07-03 15:15:30,806 (__init__.py:147) root INFO> Scheduler has 0 pending, 0 processing, 0 finished and 0 error tasks 2018-07-03 15:15:30,806 (__init__.py:165) root INFO> Scheduler has 0 pending, 0 processing, 0 finished and 1 error jobs and 0 cancelled 2018-07-03 15:15:30,806 (__init__.py:170) root DEBUG> Add task 0 error descriptions 2018-07-03 15:15:30,806 (__init__.py:177) root DEBUG> Add job 1 error descriptions 2018-07-03 15:15:30,806 (bridge.py:081) root DEBUG> Send "POST" request to "http://localhost:8998/service/get_jobs_and_tasks/" 2018-07-03 15:15:30,879 (connectionpool.py:396) urllib3.connectionpool DEBUG> http://localhost:8998 "POST /service/get_jobs_and_tasks/ HTTP/1.1" 200 494 2018-07-03 15:15:30,880 (__init__.py:217) root DEBUG> Remove tasks with statuses FINISHED and ERROR which have been submitted 2018-07-03 15:15:30,880 (__init__.py:223) root DEBUG> Remove jobs with statuses FINISHED and ERROR 2018-07-03 15:15:30,880 (__init__.py:225) root DEBUG> Delete job 012fcd235a27f9ed1e08f2ac1fd316d4 with status ERROR
Updated by Evgeny Novikov over 6 years ago
Perhaps there is some race between canceling and starting job decisions. For instance, when one cancels a job decision some operations can be buffered, so that when starting a decision of the same job very soon operations can be intermixed.
Actions