Bug #2727

Watcher reports a good verdict even though RI and DSCV failed

Added by Evgeny Novikov over 12 years ago. Updated over 12 years ago.

Status: Open
Priority: Normal
Assignee: -
Category: Infrastructure
Start date: 04/09/2012
Due date:
% Done: 0%
Estimated time:
Detected in build: svn
Platform:
Published in build:

Description

When one verifies a driver against the same kernel and rule several times in the same directory, RCV can still verify the driver even though RI fails, because it picks up garbage left from previous launches. This makes the report inconsistent: RCV reports status 'Ok' while RI reports 'Fail'.

Actions #1

Updated by Evgeny Novikov over 12 years ago

  • Priority changed from Normal to High

This issue confuses new researchers, especially when they develop new models (since the rule instrumentor often fails then). We should fix it so that we no longer have to explain that one must clean the work directory before launching the same task again (i.e. the same kernel, model and driver).

Actions #2

Updated by Evgeny Novikov over 12 years ago

  • Subject changed from When RI fails by some reason RCV verifies a driver nevertheless to LDV-Core reports a good verdict even though RI and DSCV failed
  • Status changed from New to Open
  • Assignee changed from Vadim Mutilin to Evgeny Novikov

I found that this happens when DSCV doesn't fail completely even though RI failed:

rule-instrumentor: DEBUG: The models directory specified through 'LDV_KERNEL_RULES' environment variable is '/aaa'.          
rule-instrumentor: TRACE: Check whether the models are installed properly.                                                   
Directory '/aaa' (kernel rules models) doesn't exist at /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/rule-instrumentor.pl line 1331.
dscv: WARNING: Exception occured: INTEGRATION ERROR.  Cmdline:  /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/rule-instrumentor.pl --basedir=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/dscv_tempdir/dscv/ri/08_1a --rule-model=08_1a --cmdfile=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/cmd_after_deg.xml --cmdfile-out=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/dscv_tempdir/dscv/cmdfiles/cmd08_1a.xml at /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/dscv line 643.   
dscv: WARNING: Fatal error.  Stopping services before reporting...                                                           
dscv: INFO: Shutting down watcher

But ldv-core doesn't analyze DSCV's return value: it picks up the DSCV report left from a previous launch and continues as if DSCV had worked well. I assume this may be related to watcher, since DSCV itself does die. So ldv-core should take DSCV's return value into account and also produce an INTEGRATION ERROR when RI and/or DSCV fail.
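
To illustrate the kind of check meant here, below is a minimal sketch in Python. It is not taken from ldv-core, and the names `run_component` and `IntegrationError` are hypothetical; it only shows the general pattern of inspecting a child's exit status instead of trusting whatever report file happens to exist.

```python
import subprocess
import sys


class IntegrationError(Exception):
    """Hypothetical error type: a child component exited with a failure."""


def run_component(cmd):
    """Run a child command; raise IntegrationError on a non-zero exit status.

    `cmd` is a list such as ["dscv", ...]; the concrete command line is an
    assumption and is not taken from ldv-core.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Fail loudly instead of continuing with a stale report.
        raise IntegrationError(
            f"{cmd[0]} exited with status {result.returncode}"
        )
    return result.returncode


if __name__ == "__main__":
    # Stand-in for a failing child: a Python one-liner that exits with 1.
    try:
        run_component([sys.executable, "-c", "raise SystemExit(1)"])
    except IntegrationError as err:
        print("INTEGRATION ERROR:", err)
```

The point of the sketch is only that the caller decides on the basis of the exit status, so a failed child can never be mistaken for a successful one.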

Actions #3

Updated by Evgeny Novikov over 12 years ago

  • Subject changed from LDV-Core reports a good verdict even though RI and DSCV failed to Watcher reports a good verdict even though RI and DSCV failed

After thorough debugging I have found that watcher is responsible for this bug. First of all, it is very difficult to understand how this program behaves. As far as I understand, it doesn't explicitly process the return values of its children (including DSCV). Instead, when DSCV fails, it "kills" the watcher that launched it. Most likely this "killing" neither actually kills that watcher nor makes it understand that something went wrong. In the end, watcher reports to ldv-core that everything is ok:

dscv: WARNING: Exception occured: INTEGRATION ERROR.  Cmdline:  /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/rule-instrumentor.pl --basedir=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/dscv_tempdir/dscv/ri/08_1a --rule-model=08_1a --cmdfile=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/cmd_after_deg.xml --cmdfile-out=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/dscv_tempdir/dscv/cmdfiles/cmd08_1a.xml at /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/dscv line 643.   
dscv: WARNING: Fatal error.  Stopping services before reporting...                                                           
dscv: INFO: Shutting down watcher                                                                                            
watcher: DEBUG: Called watcher: /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../watcher/ldv-watcher fail dscv ldv 18 dscv 11                                                                                                                      
watcher: TRACE: Checking if exists /home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/watcher/instance_pool...                                                                              
watcher: TRACE: Checking if exists /home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/watcher/key_pool...                                                                                   
watcher: TRACE: ldv,18,dscv,11: fail dscv ldv 18 dscv 11                                                                     
watcher:  INFO: Reported failure for ["ldv", "18", "dscv", "11"]                                                             
watcher: DEBUG: Watcher returns 0, waitpid: 0                                                                                
INTEGRATION ERROR.  Cmdline:  /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/rule-instrumentor.pl --basedir=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/dscv_tempdir/dscv/ri/08_1a --rule-model=08_1a --cmdfile=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/cmd_after_deg.xml --cmdfile-out=/home/joker/work/14_driver/big_launch/work/current--X--drivers/hwmon/gl520sm.ko--X--defaultlinux--X--08_1a/linux/csd_deg_dscv/11/dscv_tempdir/dscv/cmdfiles/cmd08_1a.xml at /home/joker/work/14_driver/test_ldv_tools_2/ldv-core/../bin/dscv line 643.                                     
watcher: DEBUG: Watcher returns 0, waitpid: 0                                                                                
ldv-core: TRACE: Set task status to "queued" 

(When DSCV says "Shutting down watcher", it tries to kill the watcher that launched DSCV.)
I wouldn't like to fix watcher internals, because they are too tangled and hard to debug. Instead I will implement the following workaround: ldv-core will not rely on watcher and will remove the corresponding DSCV report file before launching DSCV. Then, when DSCV fails, ldv-core will find an empty report file and can interpret this as a so-called integration error.
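
The workaround can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the actual ldv-core change: the function name `run_with_fresh_report`, the report path and the returned strings are all hypothetical.

```python
import os
import subprocess
import sys


def run_with_fresh_report(cmd, report_path):
    """Delete any stale report, run cmd, then classify the outcome.

    Returns "OK" if cmd left a non-empty report at report_path,
    otherwise "INTEGRATION ERROR". Both strings are illustrative.
    """
    # Remove the report left by a previous launch so it cannot be reused.
    try:
        os.remove(report_path)
    except FileNotFoundError:
        pass

    # The exit status is deliberately not used here: the workaround
    # relies only on the presence of a freshly written report.
    subprocess.run(cmd)

    # A missing or empty report means the child did not finish its work.
    if not os.path.isfile(report_path) or os.path.getsize(report_path) == 0:
        return "INTEGRATION ERROR"
    return "OK"


if __name__ == "__main__":
    # Stand-in for a failing DSCV run that writes no report at all.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_with_fresh_report(failing, "report.xml"))
```

Deleting the file first is what makes the later emptiness check meaningful: without it, a report from an earlier launch in the same work directory would still be found.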

Actions #4

Updated by Evgeny Novikov over 12 years ago

  • Assignee deleted (Evgeny Novikov)
  • Priority changed from High to Normal

Unfortunately, I could not fully implement even the workaround I proposed. Still, as of commit d1e68fa of the master branch, SAFE is at least no longer reported when RI and DSCV fail.
Again, a lot of the problems are related to watcher and its integration with ldv-tools. I spent a lot of time but didn't understand how to propagate errors properly to high-level components like ldv-core and ldv-task. Moreover, these components themselves process errors incorrectly. For instance, they sometimes ignore that children may fail, and sometimes they don't fail directly when their children fail.
Although the problem is less visible after my workaround, we still have to do something about it. I was terrified when I saw how watcher is integrated with ldv-tools components. IMHO it breaks our well-built architecture and complicates development and maintenance, so one has to spend many hours to understand how to fix even such simple bugs...
I guess a good diploma project could be devoted to ldv-tools parallelization. One of its primary tasks may be to considerably simplify the parallelization interface for components.

Actions #5

Updated by Evgeny Novikov over 12 years ago

BTW, after updating from the master branch, one no longer needs to remove the 'work' directory before launching ldv-tools.
