Feature #8335

Similarity management

Added by Pavel Andrianov over 2 years ago. Updated 4 months ago.

Status: Resolved
Priority: Urgent
Category: Bridge
Target version:
Start date: 08/11/2017
Due date: 07/11/2019
% Done: 100%
Estimated time:
Published in build:

Description

Currently, conclusions based on similarity are hard-coded: if a report is more than 0% similar to an existing mark, the mark is applied to the report, the report is shown as marked on the Job page, and so on. For example, one may not consider a report with 50% similarity to be a bug. There is no way to affect this: there is only one view on the report page, where similarity may be disabled or enabled. For a general user it would be useful to have an option specifying what percentage of similarity should be taken into account, and this percentage should be applied to the Job page, the Job tree page, etc. Views on report pages may also be useful: if I set marks to be applied only at 100% similarity by default, I may still look at 50% similar marks using a special view on some unsafe report.
I do not know yet whether the option should be an attribute of a user or, maybe, of a (user, job) pair.
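The hard-coded rule and the requested threshold option can be illustrated with a small Python sketch. The `MarkAssociation` record and the `applied_marks` function are hypothetical names invented for illustration, not Klever's actual data model; the only assumption carried over from the description is that similarity is a percentage and that today any value above 0% applies a mark.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MarkAssociation:
    """Hypothetical record: one mark associated with an unsafe report."""
    mark_id: int
    similarity: int  # percent, 0..100


def applied_marks(associations: List[MarkAssociation], threshold: int = 0) -> List[int]:
    """Return ids of the marks that count as applied to the report.

    threshold=0 reproduces the hard-coded rule (any similarity above 0%
    applies the mark); a higher threshold additionally requires at least
    that much similarity.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be a percentage in [0, 100]")
    return [
        a.mark_id
        for a in associations
        if a.similarity > 0 and a.similarity >= threshold
    ]


assocs = [MarkAssociation(1, 100), MarkAssociation(2, 50), MarkAssociation(3, 0)]
print(applied_marks(assocs))       # [1, 2]  (current behaviour)
print(applied_marks(assocs, 100))  # [1]     (user default of 100%)
print(applied_marks(assocs, 50))   # [1, 2]  (a view that also shows 50% marks)
```

In this sketch the per-user default and the per-report view differ only in which `threshold` is passed, which is the split the description asks for.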


Related issues

Related to Klever - Feature #8338: Add ability to specify similarity threshold (Resolved; start 08/11/2017, due 07/02/2019)
Related to Klever - Feature #9412: Support more advanced calculation of total verdicts and similarity (Resolved; start 12/12/2018, due 07/10/2019)

History

#1

Updated by Vadim Mutilin over 2 years ago

The priority should be high.
Now we can't tell trusted marks with 100% similarity from suspicious ones with 50% similarity.

#2

Updated by Evgeny Novikov over 2 years ago

  • Category set to Bridge
  • Assignee set to Vladimir Gratinskiy
  • Priority changed from Normal to High

Pavel Andrianov wrote:

Currently, conclusions based on similarity are hard-coded: if a report is more than 0% similar to an existing mark, the mark is applied to the report, the report is shown as marked on the Job page, and so on. For example, one may not consider a report with 50% similarity to be a bug. There is no way to affect this: there is only one view on the report page, where similarity may be disabled or enabled. For a general user it would be useful to have an option specifying what percentage of similarity should be taken into account, and this percentage should be applied to the Job page, the Job tree page, etc. Views on report pages may also be useful: if I set marks to be applied only at 100% similarity by default, I may still look at 50% similar marks using a special view on some unsafe report.

Vadim Mutilin wrote:

Now we can't tell trusted marks with 100% similarity from suspicious ones with 50% similarity.

Indeed, there is only one corner case that is likely correct: marks can be automatically treated as irrelevant when their similarity is 0% (they are also automatically treated as irrelevant when attributes like verification object or rule specification don't match). All other cases ideally need manual consideration and confirmation, since neither, say, 20% nor 100% guarantees anything: error trace comparison criteria usually catch (much) more than intended, and no one should rely on them blindly. So the proper solution, confirmation, was proposed and implemented.

Unfortunately, so-called "general" users are actually lazy users who don't want to review associated marks carefully to manage marks and reports properly (some marks can be duplicates, some require refinement, some similar reports can highlight tricky bugs in code, and so on). Instead, they want to do less and get nice reports faster. For such users these should be personal rather than global settings, perhaps even different for various jobs, since sometimes they can spend a bit more time on a more careful evaluation.

Pavel Andrianov wrote:

I do not know yet whether the option should be an attribute of a user or, maybe, of a (user, job) pair.

On the one hand, this matches the current concept of views (except for relations with particular jobs), where one sets up data representation in accordance with one's purposes. This is even partially implemented, since on the unsafe page you can hide all associated marks with similarity between 0% and 100%. On the other hand, the situation is much worse, since it isn't restricted to simple data representation: caches should be (re)calculated in accordance with user-specific settings. For instance, assume one changes one's personal mark similarity acceptance range from the default +0% to, say, 80%. Then for such a user all caches related to associated marks (e.g. total verdicts shown on the unsafes page and statistics on them shown on the job and jobs tree pages) should be recalculated, and the corresponding cached data should later be shown individually for the given user. Likely this can also be treated as a view, but a more complicated one than the existing ones.

Technically this is possible, although it looks quite hard to implement since it touches many things. Philosophically, from my point of view, this freedom will result in many difficulties, since different users will see different statistics. For instance, users with the default +0% can see many incompatible marks in statistics and more unconsidered marks on unsafe pages. So, at the least, it looks like the similarity rate should be specified for particular jobs rather than for particular users. Worse, after some time a selected rate can become bad. For instance, one selects 80% and suddenly gets good statistics (which does not necessarily happen either). Later, new marks that are in fact inappropriate can spoil these statistics, since they can have a higher similarity rate. So one will still need to do something with that mess, and this can be even harder than following the only proper way mentioned above (confirmation). Perhaps this doesn't matter much, since the requested functionality is most likely useful just to quickly analyze verification results of some job and to forget about these efforts soon, as well as all other efforts made during previous analyses.

Vadim Mutilin wrote:

The priority should be high.

The solution isn't obvious, and the implementation looks hard anyway, so I don't think this should or will be implemented soon.

#3

Updated by Vadim Mutilin over 2 years ago

20% nor 100% guarantees anything

100% guarantees that my criterion totally agrees with the trace, so I can trust it if I trust the criterion. That's what I want. I do not understand the meaning of 20%.

#4

Updated by Vladimir Gratinskiy over 2 years ago

Vadim Mutilin wrote:

20% nor 100% guarantees anything

100% guarantees that my criterion totally agrees with the trace, so I can trust it if I trust the criterion. That's what I want. I do not understand the meaning of 20%.

The solution is to add more error trace comparison functions that return either 0% or 100%.

#5

Updated by Vadim Mutilin over 2 years ago

The solution is to add more error trace comparison functions that return either 0% or 100%.

As far as I understand, the percentage is computed outside the criterion selected by the user (like all_forests_compare); it is calculated somehow later. So we can't just add new criteria.

#6

Updated by Evgeny Novikov over 2 years ago

Vadim Mutilin wrote:

20% nor 100% guarantees anything

100% guarantees that my criterion totally agrees with the trace, so I can trust it if I trust the criterion. That's what I want. I do not understand the meaning of 20%.

This doesn't match the original issue description, where an option with a similarity threshold was suggested. Anyway, 100% doesn't guarantee anything either, except that a given error trace matches a given pattern in accordance with a given comparison criterion. So either reformulate the issue or open a new one: they assume completely different solutions.

#7

Updated by Evgeny Novikov over 2 years ago

Vadim Mutilin wrote:

The solution is to add more error trace comparison functions that return either 0% or 100%.

As far as I understand, the percentage is computed outside the criterion selected by the user (like all_forests_compare); it is calculated somehow later. So we can't just add new criteria.

AFAIK percentages are returned by the corresponding functions. So it is trivial either to have more functions that additionally filter out all values other than 100%, or just to have a separate option to do that with the normal functions. Anyway, this is another issue.
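The wrapper idea can be sketched in Python, assuming a comparison function takes a pattern and an error trace and returns a similarity percentage. `toy_forest_similarity` is a toy stand-in that treats forests as plain strings, and `binarize` is a hypothetical name; this illustrates the idea only and is not the code of "call_forests_compare_simple" or "forests_cb_compare_simple".

```python
from typing import Callable, FrozenSet

# Assumed shape of a comparison function: (pattern, error trace) -> similarity in percent.
Compare = Callable[[FrozenSet[str], FrozenSet[str]], int]


def toy_forest_similarity(pattern: FrozenSet[str], trace: FrozenSet[str]) -> int:
    """Toy stand-in for a real comparison function: the share of the
    pattern's forests (plain strings here) that also occur in the trace."""
    if not pattern:
        return 0
    return 100 * len(pattern & trace) // len(pattern)


def binarize(compare: Compare, threshold: int = 100) -> Compare:
    """Wrap `compare` so it returns only 0% or 100%.

    With the default threshold=100 every value other than 100% becomes 0%;
    a lower threshold turns the wrapper into a parameterized criterion.
    """
    def wrapped(pattern: FrozenSet[str], trace: FrozenSet[str]) -> int:
        return 100 if compare(pattern, trace) >= threshold else 0
    return wrapped


exact = binarize(toy_forest_similarity)
pattern, trace = frozenset({"a", "b"}), frozenset({"a", "c"})
print(toy_forest_similarity(pattern, trace))                # 50
print(exact(pattern, trace))                                # 0
print(binarize(toy_forest_similarity, 50)(pattern, trace))  # 100
```

Wrapping keeps the normal comparison functions untouched, so the "more functions" and "separate option" variants reduce to calling `binarize` with or without a user-supplied threshold.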

#8

Updated by Vladimir Gratinskiy over 2 years ago

I've added new functions in branch feature_8335: "call_forests_compare_simple" and "forests_cb_compare_simple".

#9

Updated by Vadim Mutilin over 2 years ago

"call_forests_compare_simple" and "forests_cb_compare_simple"

How are they related to "all_forests_compare", which we use for races?

#10

Updated by Evgeny Novikov over 2 years ago

Vadim Mutilin wrote:

"call_forests_compare_simple" and "forests_cb_compare_simple"

How are they related to "all_forests_compare", which we use for races?

Before any further discussion, reformulate this issue or open a new one! A new one is preferable, since this one can also remain, although it will never be implemented.

#11

Updated by Vadim Mutilin over 2 years ago

Evgeny Novikov wrote:

Vadim Mutilin wrote:

20% nor 100% guarantees anything

100% guarantees that my criterion totally agrees with the trace, so I can trust it if I trust the criterion. That's what I want. I do not understand the meaning of 20%.

This doesn't match the original issue description, where an option with a similarity threshold was suggested.

Pavel described a generalization. I assume there may be a criterion for which you can be sure it is met once 80% is reached.

#12

Updated by Evgeny Novikov over 2 years ago

Vadim Mutilin wrote:

Evgeny Novikov wrote:

Vadim Mutilin wrote:

20% nor 100% guarantees anything

100% guarantees that my criterion totally agrees with the trace, so I can trust it if I trust the criterion. That's what I want. I do not understand the meaning of 20%.

This doesn't match the original issue description, where an option with a similarity threshold was suggested.

Pavel described a generalization. I assume there may be a criterion for which you can be sure it is met once 80% is reached.

Not at all. Pavel suggested having a new option specific to a user and/or a job to do this. You suggest changing criteria, which is a completely different thing. Both features can be implemented, but having other criteria is far simpler than having the new option.

#13

Updated by Vadim Mutilin over 2 years ago

You suggest changing criteria, which is a completely different thing.

To be precise, it was suggested by Vladimir.

I see two ways to implement similarity management:
  1. with the help of parameterized criteria
  2. new options for the view

Pavel, what do you think: do parameterized criteria fulfill the similarity management request?

#14

Updated by Vladimir Gratinskiy over 2 years ago

Vadim Mutilin wrote:

"call_forests_compare_simple" and "forests_cb_compare_simple"

How are they related to "all_forests_compare", which we use for races?

There is no such function in master. When there is, I will add a new comparison function.

#15

Updated by Evgeny Novikov over 2 years ago

  • Priority changed from High to Normal

Vadim Mutilin wrote:

You suggest changing criteria, which is a completely different thing.

To be precise, it was suggested by Vladimir.

This obviously follows from your note, which contradicts the original issue description:
Vadim Mutilin wrote:

100% guarantees that my criterion totally agrees with the trace, so I can trust it if I trust the criterion. That's what I want. I do not understand the meaning of 20%.

I see two ways to implement similarity management:
  1. with the help of parameterized criteria
  2. new options for the view

Pavel, what do you think: do parameterized criteria fulfill the similarity management request?

These ways are very different, and both can indeed be supported. I will open a new issue myself. This one shouldn't be implemented.

#16

Updated by Evgeny Novikov 11 months ago

Do you still need this if #9412 is implemented?

#17

Updated by Evgeny Novikov 11 months ago

  • Related to Feature #9412: Support more advanced calculation of total verdicts and similarity added

#18

Updated by Evgeny Novikov 11 months ago

  • Status changed from New to Feedback

#19

Updated by Evgeny Novikov 11 months ago

  • Target version set to 3.0
  • Priority changed from Normal to Urgent
  • Status changed from Feedback to Open

Let's do this together with #9412 and #8338.

#20

Updated by Vladimir Gratinskiy 4 months ago

  • Status changed from Open to Feedback

What exactly should be done here? I don't think calculating caches individually for each user is possible in the near future, so both alternatives, "an attribute of a user or, maybe, of a (user, job) pair", are impossible.

#21

Updated by Pavel Andrianov 4 months ago

This is just an optimization for #8338 to simplify manual setting of the threshold. So this is not a special view for every user, and you do not need to modify caches. While creating a mark, the corresponding threshold may be set manually (#8338) or copied from a default value: user settings or (user, job) settings. For example, the threshold value is first taken from user settings, but if it was adjusted in job settings, the job value takes precedence.
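The precedence described above can be sketched in Python. The dictionaries standing in for user and (user, job) settings, the 0% fallback, and the function names `default_threshold` and `mark_threshold` are all assumptions for illustration, not Bridge's actual settings model. Per the resolution comment, only the user-level default was implemented in bridge-3.0; the job layer here follows the proposal.

```python
from typing import Dict, Optional, Tuple

SYSTEM_DEFAULT = 0  # assumed fallback: the old "more than 0%" behaviour


def default_threshold(
    user: str,
    job: str,
    user_settings: Dict[str, int],
    job_settings: Dict[Tuple[str, str], int],
) -> int:
    """Threshold pre-filled when `user` creates a mark for a report of `job`.

    A (user, job) value, if present, wins over the user's own value,
    which in turn wins over the system default.
    """
    if (user, job) in job_settings:
        return job_settings[(user, job)]
    return user_settings.get(user, SYSTEM_DEFAULT)


def mark_threshold(
    explicit: Optional[int],
    user: str,
    job: str,
    user_settings: Dict[str, int],
    job_settings: Dict[Tuple[str, str], int],
) -> int:
    """A threshold set manually on mark creation (#8338) overrides any default."""
    if explicit is not None:
        return explicit
    return default_threshold(user, job, user_settings, job_settings)


users = {"alice": 80}
jobs = {("alice", "job-1"): 100}
print(mark_threshold(None, "alice", "job-1", users, jobs))  # 100 (job setting)
print(mark_threshold(None, "alice", "job-2", users, jobs))  # 80  (user setting)
print(mark_threshold(None, "bob", "job-1", users, jobs))    # 0   (system default)
print(mark_threshold(50, "alice", "job-1", users, jobs))    # 50  (set manually)
```

Because the resolved value is stored on the mark when it is created, nothing here depends on who later views the report, which is why no per-user cache recalculation is needed.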

#22

Updated by Vladimir Gratinskiy 4 months ago

  • % Done changed from 0 to 100
  • Status changed from Feedback to Resolved
  • Due date set to 07/11/2019

Implemented in bridge-3.0.

I've done it for "user" settings only, because:

Unfortunately, so-called "general" users are actually lazy users

Also, setting a default threshold for each job would require more actions.

The default threshold value will be used for inline-created unsafe marks and will be pre-filled on the full-weight mark creation page.
