Bug #8015

Properly block parallel exclusive requests

Added by Evgeny Novikov about 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Immediate
Category: Bridge
Target version: -
Start date: 03/07/2017
Due date: 03/13/2017
% Done: 0%
Estimated time:
Detected in build: svn
Platform:
Published in build:

Description

At the moment Bridge most likely doesn't block parallel exclusive requests that are executed by different Gunicorn workers.

Vladimir already suggested a bug fix in branch bridge-logs.
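
Since Gunicorn workers are separate processes, an in-process lock cannot serialize them. One option is exclusive file creation, which the traceback later in this issue suggests Bridge relies on (`open(self.lockfile, mode='x')`). The following is only a minimal sketch under that assumption; the `try_lock` and `unlock` helpers and the demo lock path are hypothetical and are not the code from the bridge-logs branch.

```python
import os
import tempfile


def try_lock(lockfile):
    """Return True if this process created the lock file, False if it already exists."""
    try:
        # Mode 'x' raises FileExistsError if the file is already there.
        # On a local filesystem the create-if-absent step is atomic, so
        # only one of several racing workers can succeed.
        with open(lockfile, mode='x'):
            return True
    except FileExistsError:
        return False


def unlock(lockfile):
    """Release the lock by removing the lock file."""
    os.remove(lockfile)


# Hypothetical lock path, used only for this demonstration.
demo_lock = os.path.join(tempfile.mkdtemp(), '.lock')
first = try_lock(demo_lock)    # lock is free
second = try_lock(demo_lock)   # lock is already held
unlock(demo_lock)
third = try_lock(demo_lock)    # free again after unlock
unlock(demo_lock)
```

A caller that gets `False` would have to wait and retry; how long to wait and when to give up is a separate design decision.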

Actions #1

Updated by Evgeny Novikov about 7 years ago

It is worth noting that in the Bridge production mode, requests that cause unhandled exceptions will make the corresponding so-called parallel groups hang forever until administrators analyze and fix the issues.

Actions #2

Updated by Evgeny Novikov about 7 years ago

  • Status changed from Resolved to Open

I didn't find an administrator URL to unblock Bridge in case of failures. I suggest making it accessible only to users with the Administrator role.

Actions #3

Updated by Vladimir Gratinskiy about 7 years ago

  • Due date set to 03/13/2017
  • Status changed from Open to Resolved

I can't find the commit where I already added such a URL. I've implemented it again (/tools/manual_unlock/), but now only for administrators.

Actions #4

Updated by Evgeny Novikov about 7 years ago

  • Status changed from Resolved to Open
  • Priority changed from Immediate to High

Testing showed that there are some unexpected and unacceptable delays in requests processing.

Actions #5

Updated by Evgeny Novikov about 7 years ago

  • Priority changed from High to Urgent

I specified an incorrect priority. Of course this issue is the most important for Bridge now.

Actions #6

Updated by Vladimir Gratinskiy about 7 years ago

Evgeny Novikov wrote:

Testing showed that there are some unexpected and unacceptable delays in requests processing.

Sometimes I face such delays in other branches. So I consider this a problem of the Django server, not of request blocking.

Actions #7

Updated by Evgeny Novikov almost 7 years ago

  • Status changed from Open to Closed

For some reason I don't observe very long delays when editing jobs anymore. The only case is when a job has quite a lot of files; then about 5 seconds are required. That's why I have finally merged the branch to master in 3f0ad507. Don't forget to migrate your database and make sure that parallel requests are actually blocked where this is necessary.

BTW, very rarely I saw quite strange errors when I conducted a lot of huge experiments. I guess that they were caused by data races and shouldn't happen anymore.

Actions #8

Updated by Evgeny Novikov almost 7 years ago

  • Status changed from Closed to Open
  • Priority changed from Urgent to Immediate

It turned out that there are a couple of issues in the new implementation. First, when some request is terminated with an internal error and a lock file remains (this can also happen, say, when a process is forcibly killed), several requests can try to release this lock concurrently, causing exceptions like:

[02.May.2017 10:41:23] Internal Server Error: /service/update_nodes/
Traceback (most recent call last):
  File "/home/novikov/work/klever/bridge/tools/profiling.py", line 48, in lock
    with open(self.lockfile, mode='x'):
FileExistsError: [Errno 17] File exists: 'media/.lock'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/novikov/.pyenv/versions/virtual-env-3.6.1/lib/python3.6/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/home/novikov/.pyenv/versions/virtual-env-3.6.1/lib/python3.6/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/home/novikov/.pyenv/versions/virtual-env-3.6.1/lib/python3.6/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/novikov/.pyenv/versions/virtual-env-3.6.1/lib/python3.6/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/novikov/work/klever/bridge/tools/profiling.py", line 122, in wait
    locker.lock()
  File "/home/novikov/work/klever/bridge/tools/profiling.py", line 55, in lock
    os.remove(self.lockfile)
FileNotFoundError: [Errno 2] No such file or directory: 'media/.lock'
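
The `FileNotFoundError` above comes from `os.remove` being called on a lock file that another request has already removed. A minimal sketch of a removal that tolerates losing that race is shown below; the `remove_stale_lock` helper and the demo path are hypothetical and do not reproduce the actual code in `profiling.py`.

```python
import os
import tempfile


def remove_stale_lock(lockfile):
    """Remove a leftover lock file; return True if this caller removed it."""
    try:
        os.remove(lockfile)
        return True
    except FileNotFoundError:
        # Another request removed the stale lock first; nothing left to do.
        return False


# Hypothetical lock path, used only for this demonstration.
demo_lock = os.path.join(tempfile.mkdtemp(), '.lock')
open(demo_lock, mode='x').close()              # simulate a lock left by a killed process
removed_first = remove_stale_lock(demo_lock)   # the winning request
removed_second = remove_stale_lock(demo_lock)  # the losing request, no exception
```

This only silences the exception. After the removal each request still has to retry the exclusive creation, and the sketch does not protect against one request removing a fresh lock that another request created in between.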

Another issue is that when some request really needs much time (more than the hard-coded 30 seconds), other requests will be executed in parallel, which can cause races and most likely some issues with lock file creation/removal like the one above. I guess that you can easily check this.
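
One possible way to distinguish a slow but live request from a dead one is a heartbeat: the lock holder periodically refreshes the lock file's modification time, and waiters break the lock only if it has not been refreshed for longer than the limit. This is a different technique from a fixed waiting timeout and is not necessarily what Bridge does; `STALE_AFTER`, `is_stale` and `refresh` are hypothetical names used for illustration.

```python
import os
import tempfile
import time

STALE_AFTER = 30  # seconds; mirrors the hard-coded limit mentioned above


def is_stale(lockfile, now=None):
    """Treat the lock as stale only if its holder has not refreshed it recently."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(lockfile) > STALE_AFTER
    except FileNotFoundError:
        return False  # no lock file means there is nothing to break


def refresh(lockfile):
    """Called periodically by the holder to show that it is still alive."""
    os.utime(lockfile)  # sets access and modification times to the current time


# Hypothetical lock path, used only for this demonstration.
demo_lock = os.path.join(tempfile.mkdtemp(), '.lock')
open(demo_lock, mode='x').close()
old = time.time() - 60
os.utime(demo_lock, (old, old))   # pretend the lock was taken a minute ago
stale_before = is_stale(demo_lock)
refresh(demo_lock)                # the holder reports progress
stale_after = is_stale(demo_lock)
```

With this approach a request that runs for minutes keeps its lock as long as it keeps refreshing it, while a lock left by a killed process becomes breakable after `STALE_AFTER` seconds.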

As far as I understand, both these issues occur only in the development mode, where we try to recover from errors automatically.

Actions #9

Updated by Vladimir Gratinskiy almost 7 years ago

  • Status changed from Open to Resolved

Fixed in "fix_8015".

Actions #10

Updated by Evgeny Novikov almost 7 years ago

  • Status changed from Resolved to Closed

I hope that this will help at least when one encounters really time-consuming requests, but one has to be aware of such limitations. I merged the branch to master in 3b60303.
