Project

General

Profile

Feature #7894

Bypass issues with badly uploaded ZIP archives

Added by Vitaly Mordan over 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Immediate
Category:
Bridge
Target version:
Start date:
01/24/2017
Due date:
% Done:

0%

Estimated time:
Published in build:

Description

Some time ago an additional check of report archives were introduced in Bridge. Indeed it just checks that archives are ZIP ones by using standard zipfile.is_zipfile(). Unfortunately, this check fails seldom and nondeterministically.

I suggest to rollback all changes made during processing of requests prior detecting of bad ZIP archives (it would be much better to check them before any changes) and return an error rather than to fail a whole job decision. Core and Native Scheduler Task Worker will be able to send corresponding requests more times in such the case.

This is or course a crucial bug existing for a long time (perhaps forever).


Related issues

Related to Klever - Bug #8387: Repeat requests resulting in bad ZIP archive error in Bridge or delivering bad ZIP archivesClosed08/25/2017

Actions
Related to Klever - Feature #8386: Rework tests for BridgeClosed08/25/201711/24/2017

Actions
Related to Klever - Feature #8662: Workaround authentification issuesClosed01/14/2018

Actions

History

#1

Updated by Evgeny Novikov over 2 years ago

  • Priority changed from High to Urgent

This is a payment for more optimal data exchange. Indeed quite many changes should be done to get rid of this issue since everybody has to resend requests with zip archives that are broken while all receivers also should take care about this (they should reset all changes made before broken archives are encountered and also they should report corresponding errors for senders).

#2

Updated by Evgeny Novikov over 2 years ago

  • Status changed from New to Rejected

I don't see this issue anymore.

#3

Updated by Evgeny Novikov almost 2 years ago

  • Subject changed from Reading zip archives is broken to Non-synchronized zip archives
  • Status changed from Rejected to Open
  • Assignee set to Evgeny Novikov
  • Priority changed from Urgent to Immediate

It looks like the issue happens very occasionally due to not all data is written to archives but they are opened and read by Bridge as streams. So we need to ensure before the latter that archives are written to a disk.

The issue exists for a very long time - it wasn't introduced recently.

#4

Updated by Evgeny Novikov almost 2 years ago

The fix suggested in 778b8b1 of branch fix-broken-archives didn't help. So, perhaps the issue is related with sending file streams over a network.

#5

Updated by Evgeny Novikov almost 2 years ago

  • Subject changed from Non-synchronized zip archives to Possible race when uploading reports
  • Description updated (diff)
  • Category set to Bridge
  • Assignee changed from Evgeny Novikov to Vladimir Gratinskiy

BTW, I removed the original description since we didn't encounter exactly those issues for a long time. So, most likely they were fixed.

#6

Updated by Evgeny Novikov almost 2 years ago

  • Priority changed from Immediate to Urgent

Because of this issue isn't a regression and in most cases it affects an incredibly minimal number of unimportant verification results and hopefully doesn't crash anything, we decide to postpone both its fixing and additional testing until next releases.

#7

Updated by Evgeny Novikov almost 2 years ago

Sometimes broken archives can prevent association of unknown reports with unknown marks:

[23.Aug.2017 11:43:43] The archive "/var/www/klever-bridge/media/Unknowns/2017/08/data_xgPdWEe.zip" of report "//AVTG/crypto/algif_skcipher.ko/generic:memory/EMG/unknown" is not a ZIP file
==================================================
[23.Aug.2017 11:43:43] Can't get problem desc for unknown '41703': File is not a zip file
Traceback (most recent call last):
  File "/var/www/klever-bridge/marks/UnknownUtils.py", line 220, in __connect
    problem_desc = ArchiveFileContent(self.report, self.report.problem_description).content.decode('utf8')
  File "/var/www/klever-bridge/bridge/utils.py", line 155, in __init__
    self.content = self.__extract_file_content()
  File "/var/www/klever-bridge/bridge/utils.py", line 161, in __extract_file_content
    with zipfile.ZipFile(fp, 'r') as zfp:
  File "/usr/lib/python3.5/zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.5/zipfile.py", line 1094, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

#8

Updated by Evgeny Novikov almost 2 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from Possible race when uploading reports to Bypass issues with badly uploaded ZIP archives
  • Description updated (diff)
#9

Updated by Evgeny Novikov almost 2 years ago

Please, continue fixing in branch fix-broken-archives.

#10

Updated by Evgeny Novikov almost 2 years ago

With the latest commit from the branch fix-broken-archives I got the following exception when tried to send a finish report second time:

2017-08-25 11:37:27 (core.py:261) CORE ERROR> Catch exception when sending reports to Klever Bridge
Traceback (most recent call last):
  File "/home/novikov/work/klever/core/core/core.py", line 258, in send_reports
    self.session.upload_report(report_file, report_file_archives)
  File "/home/novikov/work/klever/core/core/session.py", line 146, in upload_report
    if archives else {}
  File "/home/novikov/work/klever/core/core/session.py", line 79, in __request
    'Got error "{0}" when send "{1}" request to "{2}"'.format(self.error, method, url))
core.session.BridgeError: Got error "trying to finish the finished component" when send "POST" request to "http://localhost:8998/reports/upload/" 

Please, think about my note from the issue description:

I suggest to rollback all changes made during processing of requests prior detecting of bad ZIP archives (it would be much better to check them before any changes)...

#11

Updated by Vladimir Gratinskiy almost 2 years ago

Evgeny Novikov wrote:

it would be much better to check them before any changes

Fixed.

#12

Updated by Evgeny Novikov almost 2 years ago

  • Priority changed from Urgent to Immediate

I suppose to fix the issue before an upcoming release.

#13

Updated by Evgeny Novikov almost 2 years ago

Not yet again. I got the following exception in case of broken ZIP archives:

[25.Aug.2017 12:25:02] 'UploadReport' object has no attribute 'data'
Traceback (most recent call last):
  File "/home/novikov/work/klever/bridge/reports/UploadReport.py", line 54, in __init__
    self.__check_archive(self.archive)
  File "/home/novikov/work/klever/bridge/reports/UploadReport.py", line 626, in __check_archive
    raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))
AttributeError: 'UploadReport' object has no attribute 'data'

So, there are two issues:
  1. All exceptions even like above are reported as "ZIP error".
  2. Report identifiers should be initialized before usage.
#14

Updated by Evgeny Novikov almost 2 years ago

  • Target version set to 0.1
#15

Updated by Vladimir Gratinskiy almost 2 years ago

Evgeny Novikov wrote:

Not yet again. I got the following exception in case of broken ZIP archives:
[...]

So, there are two issues:
  1. All exceptions even like above are reported as "ZIP error".

It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.

  1. Report identifiers should be initialized before usage.

Fixed.

#16

Updated by Evgeny Novikov almost 2 years ago

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Not yet again. I got the following exception in case of broken ZIP archives:
[...]

So, there are two issues:
  1. All exceptions even like above are reported as "ZIP error".

It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.

Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.

#17

Updated by Vladimir Gratinskiy almost 2 years ago

Evgeny Novikov wrote:

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Not yet again. I got the following exception in case of broken ZIP archives:
[...]

So, there are two issues:
  1. All exceptions even like above are reported as "ZIP error".

It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.

Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.

I understood you, but that exception could be happened only in cases when archive is broken as it is exception inside another exception.

#18

Updated by Evgeny Novikov almost 2 years ago

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Not yet again. I got the following exception in case of broken ZIP archives:
[...]

So, there are two issues:
  1. All exceptions even like above are reported as "ZIP error".

It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.

Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.

I understood you, but that exception could be happened only in cases when archive is broken as it is exception inside another exception.

Yet again I was wrong. The proper words are "do not ignore internal Bridge errors when processing broken ZIP archives". Otherwise something can be missed, e.g. there can be no information in log files about broken archive paths and corresponding requests.

#19

Updated by Evgeny Novikov almost 2 years ago

Evgeny Novikov wrote:

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Vladimir Gratinskiy wrote:

Evgeny Novikov wrote:

Not yet again. I got the following exception in case of broken ZIP archives:
[...]

So, there are two issues:
  1. All exceptions even like above are reported as "ZIP error".

It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.

Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.

I understood you, but that exception could be happened only in cases when archive is broken as it is exception inside another exception.

Yet again I was wrong. The proper words are "do not ignore internal Bridge errors when processing broken ZIP archives". Otherwise something can be missed, e.g. there can be no information in log files about broken archive paths and corresponding requests.

Although at the moment everything works well, but you still can report some unexpected errors as ZIP errors. This isn't good since it can cause infinite loops when Core will send good archives with the same request infinitely. You can either introduce a specific exception or just use a normal function return statement to check bad archives. In this case you won't catch unexpected exceptions.

#20

Updated by Evgeny Novikov almost 2 years ago

Good news! The suggested fix does help to bypass the specified issues. So, you need to do the same things everywhere when Bridge reads file descriptors sent with help of requests. In addition, be ready to accept the same request twice when Bridge sends file descriptors to be read by Core or scheduler workers.

As I already mentioned, it's better to execute the more advanced check - zipfile.ZipFile.testzip. Although it takes more time, but we do need to ensure that everything works absolutely stable and deterministically.

Also unit tests are absolutely necessary to check these workarounds since it is almost impossible during integration testing. But perhaps they should be developed together with or after #8386.

#21

Updated by Evgeny Novikov almost 2 years ago

  • Status changed from Open to Closed

See notes in https://forge.ispras.ru/issues/8387#note-7.

Since there is no more issues for 0.1, 30e029a was marked with tag v0.1rc27. If nobody will detect new critical or regression issues during a week, I will release Klever 0.1.

#22

Updated by Evgeny Novikov over 1 year ago

  • Related to Feature #8662: Workaround authentification issues added

Also available in: Atom PDF