Feature #9414: Reuse archive with source files between different error traces on report uploading - Klever - Open-Source Projects

Feature #9414

closed

Reuse archive with source files between different error traces on report uploading

Added by Vitaly Mordan over 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee: Evgeny Novikov
Category:
Target version: 3.0
Start date: 01/25/2019
Due date: 07/02/2019
% Done: 100%
Estimated time: (Total: 0.00 h)
Published in build:

Description

Feature #8704 made it possible to separate source files from error traces and put them into a single archive that can be used by any error trace in the report to be uploaded.
This reduces the report size several times over.
However, during uploading Klever duplicates the archive with source files for each error trace in bridge/media/Unsafes/Sources, so the original goal (reducing the amount of uploaded sources) is not met and feature #8704 does not work as intended.
To solve this problem, an archive with the same source files should be reused between different error traces not only inside the report to be uploaded, but also after uploading.
For example, here is a fragment of a report:

{
    "attrs": [
        {
            "name": "Id",
            "value": "1" 
        }
    ],
    "error traces": ["unsafe_1.zip"],
    "id": "/CPAchecker/unsafe_1",
    "parent id": "/CPAchecker",
    "sources": "sources.zip",
    "type": "unsafe" 
},
{
    "attrs": [
        {
            "name": "Id",
            "value": "2" 
        }
    ],
    "error traces": ["unsafe_2.zip"],
    "id": "/CPAchecker/unsafe_2",
    "parent id": "/CPAchecker",
    "sources": "sources.zip",
    "type": "unsafe" 
}

After uploading there should be only one archive with source files, which is reused by both error traces.
Note that we cannot merge those two nodes into

{
    "attrs": [
        {
            "name": "Id",
            "value": "1" 
        }
    ],
    "error traces": ["unsafe_1.zip", "unsafe_2.zip"],
    "id": "/CPAchecker/unsafe_1",
    "parent id": "/CPAchecker",
    "sources": "sources.zip",
    "type": "unsafe" 
}

because they have different attributes, some of which will be used when filtering error traces.
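
The requested behaviour can be sketched as follows. This is only an illustration, not Bridge code: the function name assign_source_archives and the storage key layout are hypothetical, and only the report keys ("id", "type", "sources") come from the fragment above. Each archive name is stored once per upload, and every unsafe report gets a link to that single stored copy.

```python
def assign_source_archives(reports):
    """Store each sources archive once and link every unsafe report to it."""
    stored = {}  # archive name -> storage key (hypothetical layout)
    links = {}   # unsafe report id -> storage key
    for report in reports:
        if report.get("type") != "unsafe":
            continue
        name = report["sources"]
        if name not in stored:
            # First occurrence: this is the only time the archive is saved.
            stored[name] = f"Unsafes/Sources/{len(stored) + 1}/{name}"
        links[report["id"]] = stored[name]
    return stored, links


reports = [
    {"id": "/CPAchecker/unsafe_1", "type": "unsafe", "sources": "sources.zip"},
    {"id": "/CPAchecker/unsafe_2", "type": "unsafe", "sources": "sources.zip"},
]
stored, links = assign_source_archives(reports)
```

Both unsafe reports stay separate nodes with their own attributes, but they point to the same stored archive.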

Subtasks 2 (0 open — 2 closed)

Related issues 2 (0 open — 2 closed)

Updated by Vitaly Mordan over 6 years ago

Related to Feature #8704: Support processing of multiple error traces per one unsafe report added

Updated by Evgeny Novikov over 6 years ago

Status changed from New to Rejected

There is no such issue, since you can specify different attributes for different error traces within one report. Klever already does this: you can see that Klever does not explicitly report any attributes for error traces in reports. The trick is to include Klever attributes within witnesses. Witnesses with data races started using this trick long ago, so you can learn the format from them.

Updated by Pavel Andrianov over 6 years ago

Some attributes can be related not to a witness, but to a whole launch. So the feature is only about not cloning the same "sources.zip" if it occurs several times.

Updated by Vitaly Mordan over 6 years ago

Currently each unsafe node is tied to a verification node (with resources, logs, etc.).
Unlike variable names in races, some attributes do not relate to a witness at all.
Do you suggest creating a single node for all error traces and losing all that information and the component hierarchy?
This is a very bad suggestion.

Updated by Evgeny Novikov over 6 years ago

Pavel Andrianov wrote:

Some attributes can be related not to a witness, but to a whole launch. So the feature is only about not cloning the same "sources.zip" if it occurs several times.

If you need the same attributes for all underlying reports, you need to upload them within a parent report, most likely within verification reports.

As far as I understand, very old Bridge did detect file duplicates, but it was quite hard to do.

Updated by Evgeny Novikov over 6 years ago

Vitaly Mordan wrote:

Currently each unsafe node is tied to a verification node (with resources, logs, etc.).
Unlike variable names in races, some attributes do not relate to a witness at all.
Do you suggest creating a single node for all error traces and losing all that information and the component hierarchy?
This is a very bad suggestion.

What is an unsafe node? What is a verification node?

If attributes are not related to witnesses, you should not upload them via unsafe reports.

Updated by Pavel Andrianov over 6 years ago

If you need the same attributes for all underlying reports, you need to upload them within a parent report, most likely within verification reports.

Yes, but in this case we have to create several unsafe reports with the same "sources.zip", which will be cloned by Bridge.

As far as I understand, very old Bridge did detect file duplicates, but it was quite hard to do.

We just ask to resolve file names, so that if "sources.zip" is contained in several unsafe reports, it is not cloned during uploading.

Updated by Pavel Andrianov over 6 years ago

What is an unsafe node? What is a verification node?

These are an unsafe report and a verification report.

Updated by Vitaly Mordan over 6 years ago

For example, we have the following hierarchy of nodes in a report:
-root
--Component1
--Component2
---CPAchecker_1
----unsafe (sources=sources.zip)
----unknown
---CPAchecker_N
----unsafe (sources=sources.zip)
----unknown

If you need the same attributes for all underlying reports, you need to upload them within a parent report, most likely within verification reports.

Here we need the attributes, resources, logs and so on of the verification report (CPAchecker_i) for each leaf, therefore I will not merge them.
At the same time, each unsafe report has a link to sources.zip, which should not be duplicated on uploading.

#10

Updated by Evgeny Novikov over 6 years ago

Pavel Andrianov wrote:

If you need the same attributes for all underlying reports, you need to upload them within a parent report, most likely within verification reports.

Yes, but in this case we have to create several unsafe reports with the same "sources.zip", which will be cloned by Bridge.

Can't you create intermediate reports of another type? E.g.:

verification report with common attributes:
    unsafe report with common sources and one or more error traces with specific attributes

If you can't, Bridge should support this rather than the strange thing suggested below.

As far as I understand, very old Bridge did detect file duplicates, but it was quite hard to do.

We just ask to resolve file names, so that if "sources.zip" is contained in several unsafe reports, it is not cloned during uploading.

But there can be files with the same names but different contents. So it seems you suggest a workaround rather than doing things the right way.
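
One way to address the concern about equal names with different contents is to identify archives by their contents rather than by file name. The following is a minimal sketch under that assumption; it is not how Bridge detects duplicates, and the helper names archive_digest and make_zip are hypothetical. The digest covers member names and member contents only, so two archives with the same files compare equal even if zip metadata such as timestamps differs.

```python
import hashlib
import io
import zipfile


def archive_digest(data: bytes) -> str:
    """Digest of member names and contents of a zip archive given as bytes."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            # Hash each member separately so member boundaries stay unambiguous.
            digest.update(hashlib.sha256(archive.read(name)).digest())
    return digest.hexdigest()


def make_zip(files: dict) -> bytes:
    """Build an in-memory zip archive from a {member name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


same_a = make_zip({"src/drv.c": "int x;"})
same_b = make_zip({"src/drv.c": "int x;"})
other = make_zip({"src/drv.c": "int y;"})
```

With such a digest as the storage key, two uploads of "sources.zip" share one stored copy only when their contents really match.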

#11

Updated by Vitaly Mordan over 6 years ago

Evgeny Novikov wrote:

But there can be files with the same names but different contents. So it seems you suggest a workaround rather than doing things the right way.

This is not possible, since each archive contains absolute paths to the source files.
If you need different sources, you can create several archives and add links to them:
-unsafe (sources=sources_1.zip, trace=t_1.zip)
-unsafe (sources=sources_1.zip, trace=t_2.zip)
-unsafe (sources=sources_2.zip, trace=t_3.zip)
Again, in this case you will need 2 archives with sources (not 3).

#12

Updated by Pavel Andrianov over 6 years ago

Can't you create intermediate reports of another type?

We do this already (see the example from Vitaly), but it does not solve the problem: we still need to create several different unsafe reports with the same "sources.zip".

But there can be files with the same names but different contents.

I can hardly imagine this. How can an archive contain two different files with the same name?

I can suggest another idea, if you like it more. Could Bridge get sources not only from the unsafe report, but also from its parent reports? Then, if sources are specified for a common parent, they are considered to be specified for all child reports. It is important that in this case the sources are not duplicated for all children.
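
A minimal sketch of this idea, assuming reports are kept in a dictionary keyed by report id and use the "parent id" and "sources" keys from the report fragment in the description. The function name find_sources and the sample ids are hypothetical. The lookup walks up the parent chain and returns the first archive it finds, so nothing is copied into the children.

```python
def find_sources(report_id, reports):
    """Return the sources archive of the nearest report up the parent chain."""
    current = reports.get(report_id)
    while current is not None:
        if "sources" in current:
            return current["sources"]
        # A missing or unknown parent id ends the walk.
        current = reports.get(current.get("parent id"))
    return None


reports = {
    "/Component2": {"sources": "sources.zip"},
    "/Component2/CPAchecker_1": {"parent id": "/Component2"},
    "/Component2/CPAchecker_1/unsafe": {"parent id": "/Component2/CPAchecker_1"},
    "/Component2/CPAchecker_N": {"parent id": "/Component2"},
    "/Component2/CPAchecker_N/unsafe": {"parent id": "/Component2/CPAchecker_N"},
    "/Component1": {},
}
```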

#13

Updated by Evgeny Novikov over 6 years ago

Status changed from Rejected to Open
Assignee set to Vladimir Gratinskiy
Priority changed from Normal to Urgent
Target version set to 3.0

Pavel Andrianov wrote:

Can't you create intermediate reports of another type?

We do this already (see the example from Vitaly), but it does not solve the problem: we still need to create several different unsafe reports with the same "sources.zip".

Now I get the issue at last. First of all, I am not sure that you prepare reports in the proper way. As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?). If you violate this assumption, you should expect various issues in Bridge, e.g. comparison of verification results will not work. I wonder why Bridge does not reject your reports; this should be fixed.

But even when you upload reports properly, this will not solve the issue. Below you suggested a nice way to fix it.

But there can be files with the same names but different contents.

I can hardly imagine this. How can an archive contain two different files with the same name?

I can suggest another idea, if you like it more. Could Bridge get sources not only from the unsafe report, but also from its parent reports? Then, if sources are specified for a common parent, they are considered to be specified for all child reports. It is important that in this case the sources are not duplicated for all children.

After all there should be something like:

Core report
    Component1 report
    Component2 report
        sources.zip # This is a new feature to be supported by Bridge. It should use this archive for all error traces in subtree below. 
        Component2-1 report
            Verification1 report
                Attributes1
                Log1
                Resources1
                Unsafe1 report
        Component2-2 report
            Verification2 report
                Attributes2
                Log2
                Resources2
                Unsafe2 report
                Unknown2 report

Please note that for comparison Bridge will consider unique tuples of values of cumulative verification report attributes marked to be used for comparison. For instance, if "Verification1 report" has attribute A with value A1 and one Unsafe report, and "Verification2 report" has attribute A with value A2, one Unsafe report and one Unknown report, and attribute A is marked to be used for comparison, then Bridge will compare these verification results with other ones by constructing tuples (A1) and (A2), for which the total verdicts will be "Found all unsafes" and "Found not all unsafes" respectively. If you report verification results using another approach to get some other features, you will lose the ability to compare verification results and perhaps something else.
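
The example above can be sketched roughly as follows. This is a simplified illustration, not Bridge's comparison code: the data layout and function names are hypothetical, and the verdict rules are assumptions that only reproduce the two cases named above (the "Unknown" fallback in particular is invented for the sketch).

```python
def total_verdict(children):
    """Simplified total verdict for the child report types of one verification report."""
    kinds = set(children)
    if kinds == {"safe"}:
        return "Safe"
    if kinds == {"unsafe"}:
        return "Found all unsafes"
    if "unsafe" in kinds:
        return "Found not all unsafes"
    return "Unknown"  # assumed fallback, not taken from Bridge


def comparison_table(verification_reports, compared_attrs):
    """Map tuples of compared attribute values to total verdicts."""
    table = {}
    for report in verification_reports:
        key = tuple(report["attrs"][name] for name in compared_attrs)
        table[key] = total_verdict(report["children"])
    return table


verification_reports = [
    {"attrs": {"A": "A1"}, "children": ["unsafe"]},
    {"attrs": {"A": "A2"}, "children": ["unsafe", "unknown"]},
]
table = comparison_table(verification_reports, ["A"])
```

Two jobs can then be compared by looking up the same tuple, such as ("A1",), in the table built for each of them.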

#14

Updated by Pavel Andrianov over 6 years ago

As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?)

Neither the specification format nor the implementation has such a restriction. Anyway, a single component (Verifier) can be launched several times within one launch, so it produces several verification reports. Thus I do not consider the restriction to be relevant.

After all there should be something like:

I am not against new features like the one in your example. I would just say that adding several lines to check the name and avoid cloning the archive is much easier than implementing such a large feature. For this particular problem it looks like using a sledgehammer to crack a nut.

#15

Updated by Evgeny Novikov over 6 years ago

Status changed from Open to Feedback

Pavel Andrianov wrote:

As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?)

Neither the specification format nor the implementation has such a restriction. Anyway, a single component (Verifier) can be launched several times within one launch, so it produces several verification reports. Thus I do not consider the restriction to be relevant.

Let's wait for an answer from Vladimir. To me it seems to be a bug in Bridge, since it does not mark the corresponding jobs as corrupted because of incorrect reports. Later, different operations may fail unexpectedly, in particular after updating Bridge.

Klever was never intended for the things you are talking about. Of course, you can do many different things, e.g. launch various verifiers using conditions and maybe loops, use different configurations and so on, but then you have to prepare a single verification report with all verification results as its children. Given this, it looks like the original issue is not relevant for Klever at all.

BTW, if you want to describe some useful information about your complex verification strategy, you can use advanced features of attributes, in particular data attributes, and upload many files rather than one. If you dislike this approach, do not expect to be able to use Bridge safely.

#16

Updated by Pavel Andrianov over 6 years ago

then you have to prepare a single verification report with all verification results as its children.

Then I have no way to describe all the verifiers I used, as I have to prepare only one verification report for all launches. For example, one of the launches failed with an exception, another found several unsafes and finished with a timeout, and the last one suddenly proved true. Currently there is a restriction that a safe verdict cannot be combined with unsafe or unknown, which is reasonable. Also, there are different resources, logs and so on. It is not correct to put all this data into different attributes of unsafe reports. Thus I still consider the restriction 'only one verification report' irrelevant.

#17

Updated by Vladimir Gratinskiy over 6 years ago

Evgeny Novikov wrote:

As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?).

There is no such restriction.

#18

Updated by Evgeny Novikov over 6 years ago

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?).

There is no such restriction.

Okay. Does Bridge compare verification results on the basis of verification reports? That is, does it calculate a total verdict for a given verification report and all its children?

#19

Updated by Evgeny Novikov over 6 years ago

Pavel Andrianov wrote:

then you have to prepare a single verification report with all verification results as its children.

Then I have no way to describe all the verifiers I used, as I have to prepare only one verification report for all launches. For example, one of the launches failed with an exception, another found several unsafes and finished with a timeout, and the last one suddenly proved true. Currently there is a restriction that a safe verdict cannot be combined with unsafe or unknown, which is reasonable. Also, there are different resources, logs and so on. It is not correct to put all this data into different attributes of unsafe reports. Thus I still consider the restriction 'only one verification report' irrelevant.

Of course, users are interested in verification results first of all. Thus, you need to prepare them in a standard way. For providing data that can be of interest to verifier developers, there are different means in Klever that I already described. I am not sure that you were aware of them.

#20

Updated by Pavel Andrianov over 6 years ago

For providing data that can be of interest to verifier developers, there are different means

I am not speaking about developers. This is a practical issue: a verification task is prepared once and then different configurations are used (memory check with SMG, race check with CPALockator). We do not include any developer-specific features in the report.

#21

Updated by Vladimir Gratinskiy over 6 years ago

Evgeny Novikov wrote:

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?).

There is no such restriction.

Okay. Does Bridge compare verification results on the basis of verification reports? That is, does it calculate a total verdict for a given verification report and all its children?

It calculates the total verdict on the basis of attribute values. Each leaf (unsafe/safe/unknown) has its own attributes, which are copied from its whole branch of parents (including the verification report).
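
The attribute copying described here can be illustrated with a short sketch. The nested-dictionary layout, the "name", "type", "attrs" and "children" keys, and the function name leaf_attributes are assumptions made for the illustration, not Bridge's data model. Attributes are accumulated from the root down, so each leaf ends up with its own attributes plus those of every parent on its branch.

```python
LEAF_TYPES = {"safe", "unsafe", "unknown"}


def leaf_attributes(report, inherited=None):
    """Yield (leaf name, attributes) with attributes accumulated along the branch."""
    attrs = dict(inherited or {})
    attrs.update(report.get("attrs", {}))
    if report["type"] in LEAF_TYPES:
        yield report["name"], attrs
    for child in report.get("children", []):
        # Each child receives a copy of everything collected so far.
        yield from leaf_attributes(child, attrs)


tree = {
    "name": "Component2", "type": "component", "children": [
        {"name": "Verification1", "type": "verification",
         "attrs": {"requirement": "memory safety"},
         "children": [{"name": "Unsafe1", "type": "unsafe", "attrs": {"Id": "1"}}]},
        {"name": "Verification2", "type": "verification",
         "attrs": {"requirement": "concurrency safety"},
         "children": [{"name": "Unsafe2", "type": "unsafe", "attrs": {"Id": "2"}},
                      {"name": "Unknown2", "type": "unknown"}]},
    ],
}
leaves = dict(leaf_attributes(tree))
```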

#22

Updated by Evgeny Novikov over 6 years ago

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

As far as I remember, there can be only one verification report for the same parent (Vladimir, am I right?).

There is no such restriction.

Okay. Does Bridge compare verification results on the basis of verification reports? That is, does it calculate a total verdict for a given verification report and all its children?

It calculates the total verdict on the basis of attribute values. Each leaf (unsafe/safe/unknown) has its own attributes, which are copied from its whole branch of parents (including the verification report).

I understand that. But I wanted to clarify how comparison works. If I am not mistaken, for each verification report Bridge calculates a total verdict on the basis of its child reports, e.g. "Safe" or "Found all unsafes". Then it compares total verdicts for equal values of the attributes marked to be used for comparison. Sometimes there may be non-matching attribute values.

#23

Updated by Evgeny Novikov over 6 years ago

Status changed from Feedback to Open

Pavel Andrianov wrote:

For providing data that can be of interest to verifier developers, there are different means

I am not speaking about developers. This is a practical issue: a verification task is prepared once and then different configurations are used (memory check with SMG, race check with CPALockator). We do not include any developer-specific features in the report.

It seems that you described the situation inaccurately. Indeed, you have various verification tasks, since you check various specifications for target programs. And you prepare different verification reports for different verification tasks, where different instances of a verifier are used for solving those tasks. This matches the common Klever workflow exactly. The only difference is that you always have the same sources, while Klever usually has different sources for each verification report. The latter is the case for verification of Linux loadable kernel modules, but in other use cases this can differ. That's why I will try to suggest a general solution to this problem.

Let's consider the following example:

Core report
    sources1.zip
    Component1 report
    Component2 report
        sources2.zip
        Verification1 report
            Attributes1 # For instance, "requirement" is "memory safety" in the proper form.
            Log1
            Resources1
            Unsafe1 report
        Verification2 report
            sources3.zip
            Attributes2 # For instance, "requirement" is "concurrency safety" in the proper form.
            Log2
            Resources2
            Unsafe2 report
            Unknown2 report

In this case Bridge should use sources1.zip and sources2.zip together for error traces of "Unsafe1 report", and it should use sources1.zip, sources2.zip and sources3.zip together for error traces of "Unsafe2 report". Using this feature one will be able to report common sources once. Possible caveats:

There can be the same source files with the same or different contents at various levels. In this case I suggest using the sources from the deepest level.
Sources can be uploaded only together with component or verification start reports. This rules out missing data and data races, where the sources required by a child report were not uploaded for its parent or are uploaded concurrently.
Maybe some more tricky issues.

Does everybody agree that this is a good feature to be supported?
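
To make the proposal concrete, here is a minimal sketch of the lookup it implies. It is an illustration under assumptions, not a design for Bridge: archives are modelled as plain dictionaries from file path to contents, reports as a dictionary with hypothetical "parent" and "sources" keys, and effective_sources is an invented name. Archives are applied from the root down, so a file found at a deeper level replaces the same file from a higher level.

```python
def effective_sources(report_id, reports):
    """Merge source files from all reports on the path from the root to report_id."""
    chain = []
    current = report_id
    while current is not None:
        chain.append(reports[current])
        current = reports[current].get("parent")
    merged = {}
    for report in reversed(chain):  # root first, deepest report last
        for archive_name, files in report.get("sources", {}).items():
            for path, content in files.items():
                # Later (deeper) archives overwrite files with the same path.
                merged[path] = (archive_name, content)
    return merged


reports = {
    "Core": {"parent": None,
             "sources": {"sources1.zip": {"a.c": "v1", "b.c": "v1"}}},
    "Component2": {"parent": "Core",
                   "sources": {"sources2.zip": {"b.c": "v2"}}},
    "Verification1": {"parent": "Component2"},
    "Unsafe1": {"parent": "Verification1"},
    "Verification2": {"parent": "Component2",
                      "sources": {"sources3.zip": {"c.c": "v3"}}},
    "Unsafe2": {"parent": "Verification2"},
}
unsafe1_files = effective_sources("Unsafe1", reports)
unsafe2_files = effective_sources("Unsafe2", reports)
```

In this example "Unsafe1" sees a.c from sources1.zip and b.c from sources2.zip, while "Unsafe2" additionally sees c.c from sources3.zip.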

#24