Feature #7894
closedBypass issues with badly uploaded ZIP archives
0%
Description
Some time ago an additional check of report archives were introduced in Bridge. Indeed it just checks that archives are ZIP ones by using standard zipfile.is_zipfile(). Unfortunately, this check fails seldom and nondeterministically.
I suggest to rollback all changes made during processing of requests prior detecting of bad ZIP archives (it would be much better to check them before any changes) and return an error rather than to fail a whole job decision. Core and Native Scheduler Task Worker will be able to send corresponding requests more times in such the case.
This is or course a crucial bug existing for a long time (perhaps forever).
Updated by Evgeny Novikov almost 8 years ago
- Priority changed from High to Urgent
This is a payment for more optimal data exchange. Indeed quite many changes should be done to get rid of this issue since everybody has to resend requests with zip archives that are broken while all receivers also should take care about this (they should reset all changes made before broken archives are encountered and also they should report corresponding errors for senders).
Updated by Evgeny Novikov over 7 years ago
- Status changed from New to Rejected
I don't see this issue anymore.
Updated by Evgeny Novikov over 7 years ago
- Subject changed from Reading zip archives is broken to Non-synchronized zip archives
- Status changed from Rejected to Open
- Assignee set to Evgeny Novikov
- Priority changed from Urgent to Immediate
It looks like the issue happens very occasionally due to not all data is written to archives but they are opened and read by Bridge as streams. So we need to ensure before the latter that archives are written to a disk.
The issue exists for a very long time - it wasn't introduced recently.
Updated by Evgeny Novikov over 7 years ago
The fix suggested in 778b8b1 of branch fix-broken-archives didn't help. So, perhaps the issue is related with sending file streams over a network.
Updated by Evgeny Novikov over 7 years ago
- Subject changed from Non-synchronized zip archives to Possible race when uploading reports
- Description updated (diff)
- Category set to Bridge
- Assignee changed from Evgeny Novikov to Vladimir Gratinskiy
BTW, I removed the original description since we didn't encounter exactly those issues for a long time. So, most likely they were fixed.
Updated by Evgeny Novikov over 7 years ago
- Priority changed from Immediate to Urgent
Because of this issue isn't a regression and in most cases it affects an incredibly minimal number of unimportant verification results and hopefully doesn't crash anything, we decide to postpone both its fixing and additional testing until next releases.
Updated by Evgeny Novikov over 7 years ago
Sometimes broken archives can prevent association of unknown reports with unknown marks:
[23.Aug.2017 11:43:43] The archive "/var/www/klever-bridge/media/Unknowns/2017/08/data_xgPdWEe.zip" of report "//AVTG/crypto/algif_skcipher.ko/generic:memory/EMG/unknown" is not a ZIP file ================================================== [23.Aug.2017 11:43:43] Can't get problem desc for unknown '41703': File is not a zip file Traceback (most recent call last): File "/var/www/klever-bridge/marks/UnknownUtils.py", line 220, in __connect problem_desc = ArchiveFileContent(self.report, self.report.problem_description).content.decode('utf8') File "/var/www/klever-bridge/bridge/utils.py", line 155, in __init__ self.content = self.__extract_file_content() File "/var/www/klever-bridge/bridge/utils.py", line 161, in __extract_file_content with zipfile.ZipFile(fp, 'r') as zfp: File "/usr/lib/python3.5/zipfile.py", line 1026, in __init__ self._RealGetContents() File "/usr/lib/python3.5/zipfile.py", line 1094, in _RealGetContents raise BadZipFile("File is not a zip file") zipfile.BadZipFile: File is not a zip file
Updated by Evgeny Novikov over 7 years ago
- Tracker changed from Bug to Feature
- Subject changed from Possible race when uploading reports to Bypass issues with badly uploaded ZIP archives
- Description updated (diff)
Updated by Evgeny Novikov over 7 years ago
Please, continue fixing in branch fix-broken-archives.
Updated by Evgeny Novikov over 7 years ago
With the latest commit from the branch fix-broken-archives I got the following exception when tried to send a finish report second time:
2017-08-25 11:37:27 (core.py:261) CORE ERROR> Catch exception when sending reports to Klever Bridge Traceback (most recent call last): File "/home/novikov/work/klever/core/core/core.py", line 258, in send_reports self.session.upload_report(report_file, report_file_archives) File "/home/novikov/work/klever/core/core/session.py", line 146, in upload_report if archives else {} File "/home/novikov/work/klever/core/core/session.py", line 79, in __request 'Got error "{0}" when send "{1}" request to "{2}"'.format(self.error, method, url)) core.session.BridgeError: Got error "trying to finish the finished component" when send "POST" request to "http://localhost:8998/reports/upload/"
Please, think about my note from the issue description:
I suggest to rollback all changes made during processing of requests prior detecting of bad ZIP archives (it would be much better to check them before any changes)...
Updated by Vladimir Gratinskiy over 7 years ago
Evgeny Novikov wrote:
it would be much better to check them before any changes
Fixed.
Updated by Evgeny Novikov over 7 years ago
- Priority changed from Urgent to Immediate
I suppose to fix the issue before an upcoming release.
Updated by Evgeny Novikov over 7 years ago
Not yet again. I got the following exception in case of broken ZIP archives:
[25.Aug.2017 12:25:02] 'UploadReport' object has no attribute 'data' Traceback (most recent call last): File "/home/novikov/work/klever/bridge/reports/UploadReport.py", line 54, in __init__ self.__check_archive(self.archive) File "/home/novikov/work/klever/bridge/reports/UploadReport.py", line 626, in __check_archive raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id'])) AttributeError: 'UploadReport' object has no attribute 'data'So, there are two issues:
- All exceptions even like above are reported as "ZIP error".
- Report identifiers should be initialized before usage.
Updated by Vladimir Gratinskiy over 7 years ago
Evgeny Novikov wrote:
Not yet again. I got the following exception in case of broken ZIP archives:
So, there are two issues:
[...]
- All exceptions even like above are reported as "ZIP error".
It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.
- Report identifiers should be initialized before usage.
Fixed.
Updated by Evgeny Novikov over 7 years ago
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
Not yet again. I got the following exception in case of broken ZIP archives:
So, there are two issues:
[...]
- All exceptions even like above are reported as "ZIP error".
It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.
Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.
Updated by Vladimir Gratinskiy over 7 years ago
Evgeny Novikov wrote:
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
Not yet again. I got the following exception in case of broken ZIP archives:
So, there are two issues:
[...]
- All exceptions even like above are reported as "ZIP error".
It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.
Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.
I understood you, but that exception could be happened only in cases when archive is broken as it is exception inside another exception.
Updated by Evgeny Novikov over 7 years ago
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
Not yet again. I got the following exception in case of broken ZIP archives:
So, there are two issues:
[...]
- All exceptions even like above are reported as "ZIP error".
It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.
Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.
I understood you, but that exception could be happened only in cases when archive is broken as it is exception inside another exception.
Yet again I was wrong. The proper words are "do not ignore internal Bridge errors when processing broken ZIP archives". Otherwise something can be missed, e.g. there can be no information in log files about broken archive paths and corresponding requests.
Updated by Evgeny Novikov over 7 years ago
Evgeny Novikov wrote:
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
Not yet again. I got the following exception in case of broken ZIP archives:
So, there are two issues:
[...]
- All exceptions even like above are reported as "ZIP error".
It's false as this exception can be happened only during exception "raise ValueError('The archive "%s" of report "%s" is not a ZIP file' % (arch, self.data['id']))" wich is happened if archive is broken.
Sorry, my note was formulated wrong. I supposed to report "ZIP error" just when archive is really broken and not, say, when Bridge internal errors happen. Otherwise we will fall into an infinite loop.
I understood you, but that exception could be happened only in cases when archive is broken as it is exception inside another exception.
Yet again I was wrong. The proper words are "do not ignore internal Bridge errors when processing broken ZIP archives". Otherwise something can be missed, e.g. there can be no information in log files about broken archive paths and corresponding requests.
Although at the moment everything works well, but you still can report some unexpected errors as ZIP errors. This isn't good since it can cause infinite loops when Core will send good archives with the same request infinitely. You can either introduce a specific exception or just use a normal function return statement to check bad archives. In this case you won't catch unexpected exceptions.
Updated by Evgeny Novikov over 7 years ago
Good news! The suggested fix does help to bypass the specified issues. So, you need to do the same things everywhere when Bridge reads file descriptors sent with help of requests. In addition, be ready to accept the same request twice when Bridge sends file descriptors to be read by Core or scheduler workers.
As I already mentioned, it's better to execute the more advanced check - zipfile.ZipFile.testzip. Although it takes more time, but we do need to ensure that everything works absolutely stable and deterministically.
Also unit tests are absolutely necessary to check these workarounds since it is almost impossible during integration testing. But perhaps they should be developed together with or after #8386.
Updated by Evgeny Novikov over 7 years ago
- Status changed from Open to Closed
See notes in https://forge.ispras.ru/issues/8387#note-7.
Since there is no more issues for 0.1, 30e029a was marked with tag v0.1rc27. If nobody will detect new critical or regression issues during a week, I will release Klever 0.1.
Updated by Evgeny Novikov almost 7 years ago
- Related to Feature #8662: Workaround authentification issues added