Bug #8037

closed

Bridge should check better for error sizes provided by schedulers

Added by Evgeny Novikov about 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: High
Category: Bridge
Target version: -
Start date: 03/17/2017
Due date: 04/18/2017
% Done: 100%
Estimated time:
Detected in build: svn
Platform:
Published in build:

Description

Unknown error
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
psycopg2.DataError: ERROR:  value too long for type character varying(1024)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/novikov/work/klever/bridge/service/utils.py", line 241, in __init__
    self.__get_tasks(tasks)
  File "/home/novikov/work/klever/bridge/service/utils.py", line 283, in __get_tasks
    task.save()
  File "/usr/lib/python3.4/site-packages/django/db/models/base.py", line 708, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/lib/python3.4/site-packages/django/db/models/base.py", line 736, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/usr/lib/python3.4/site-packages/django/db/models/base.py", line 801, in _save_table
    forced_update)
  File "/usr/lib/python3.4/site-packages/django/db/models/base.py", line 851, in _do_update
    return filtered._update(values) > 0
  File "/usr/lib/python3.4/site-packages/django/db/models/query.py", line 645, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 1149, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 848, in execute_sql
    cursor.execute(sql, params)
  File "/usr/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/home/novikov/work/klever/bridge/bridge/__init__.py", line 39, in execute_wrapper
    return original(*args, **kwargs)
  File "/usr/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/lib/python3.4/site-packages/django/db/utils.py", line 95, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python3.4/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
django.db.utils.DataError: ERROR:  value too long for type character varying(1024)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/novikov/work/klever/bridge/service/views.py", line 148, in get_jobs_and_tasks
    jobs_and_tasks = GetTasks(request.session['scheduler'], request.POST['jobs and tasks status']).newtasks
  File "/home/novikov/work/klever/bridge/service/utils.py", line 245, in __init__
    raise ServiceError('Unknown error')
service.utils.ServiceError: Unknown error

Related issues 1 (0 open, 1 closed)

Blocked by Klever - Bug #8035: Bridge infinitely tries to decide a job if scheduler reports that it is finished but there was not any requests from Core | Closed | Vladimir Gratinskiy | 03/16/2017 | 04/13/2017

Actions #1

Updated by Vladimir Gratinskiy about 7 years ago

  • Due date set to 04/13/2017
  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Fixed in "some_fixes".

Actions #2

Updated by Evgeny Novikov about 7 years ago

  • Status changed from Resolved to Open

I probably got this exception after your fixes:

'job errors'
Traceback (most recent call last):
  File "/home/novikov/work/klever/bridge/service/views.py", line 148, in get_jobs_and_tasks
    jobs_and_tasks = GetTasks(request.session['scheduler'], request.POST['jobs and tasks status']).newtasks
  File "/home/novikov/work/klever/bridge/service/utils.py", line 278, in __init__
    self.__get_tasks(tasks)
  File "/home/novikov/work/klever/bridge/service/utils.py", line 288, in __get_tasks
    for j_id in data['job errors']:
KeyError: 'job errors'

Also, please do not put fixes for several unrelated issues into one branch. Now I can't merge the branch because of the exception above, even though the other issues may be properly fixed in it.

Actions #3

Updated by Vladimir Gratinskiy about 7 years ago

This exception revealed a problem with the scheduler. The keys 'job errors' and 'task errors' should always be present in the JSON data provided by the scheduler, as Bridge accepts only this format.

For example, when there are no 'finished' tasks, the scheduler provides an empty list data['tasks']['finished'], and Bridge doesn't check whether the key 'finished' is in data['tasks']. The scheduler should behave the same way for 'job errors' and 'task errors'.

Another example is a single 'error' task without an error description. Currently the scheduler would provide data without 'task errors', and the exception would be raised instead of finishing the task with the error "The scheduler hasn't given error description".

Bridge catches this exception successfully and no results are corrupted because of it. Still, I should check the JSON data before doing anything with tasks or jobs, and I will fix that soon (if the format is wrong, the request will return an error like "wrong format"). But the problem with the scheduler should also be fixed.
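
The format check mentioned above could look roughly like the sketch below. It is only an illustration, not Bridge's actual code: the function name, the list of task statuses (only 'finished' and 'error' are named in this thread) and the assumption that 'job errors' and 'task errors' map identifiers to descriptions are all hypothetical.

```python
# Hypothetical helper, not taken from the Bridge sources.
REQUIRED_KEYS = ('tasks', 'job errors', 'task errors')
# Assumed set of task statuses; the thread itself names only 'finished' and 'error'.
TASK_STATUSES = ('pending', 'processing', 'error', 'finished')


def validate_scheduler_data(data):
    """Return None if the payload has the expected shape, else an error string."""
    if not isinstance(data, dict):
        return 'wrong format: a JSON object is expected'
    for key in REQUIRED_KEYS:
        if key not in data:
            return "wrong format: key '{}' is missing".format(key)
    if not isinstance(data['tasks'], dict):
        return "wrong format: 'tasks' must be an object"
    for status in TASK_STATUSES:
        # Each status must be present, even if its list of tasks is empty.
        if not isinstance(data['tasks'].get(status), list):
            return "wrong format: 'tasks' needs a list for '{}'".format(status)
    for key in ('job errors', 'task errors'):
        # Assumption: errors are given as {identifier: description}.
        if not isinstance(data[key], dict):
            return "wrong format: '{}' must be an object".format(key)
    return None
```

Running such a check before touching any task or job would turn the KeyError from comment #2 into an ordinary "wrong format" response.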

Actions #4

Updated by Vladimir Gratinskiy about 7 years ago

  • Due date changed from 04/13/2017 to 04/18/2017
  • Status changed from Open to Resolved

Fixed, so no changes to the scheduler are needed.

Actions #5

Updated by Evgeny Novikov about 7 years ago

  • Status changed from Resolved to Open

Does Bridge change the status of such jobs? It looks like it leaves them in the processing state and sends them to schedulers again, and only the schedulers "finish" them (but due to bugs in schedulers this loop may never terminate).

Actions #6

Updated by Vladimir Gratinskiy about 7 years ago

If the scheduler says the job is finished (with an error or successfully), Bridge will finish it, unless something unexpected happens during finishing, in which case you'll get error messages in the log. There was a bug with pending->finished jobs, but it was fixed (#8035).

Actions #7

Updated by Evgeny Novikov about 7 years ago

Hmm, that bug was fixed in the same branch, but the fix doesn't work. When a scheduler reports a failed job with an extremely long error description (as in this issue), Bridge returns an error to the scheduler, which causes the scheduler to restart (in production mode). Bridge, however, leaves the job in the processing state and soon sends it to the restarted scheduler once more.

Actions #8

Updated by Vladimir Gratinskiy about 7 years ago

This is what I meant by "something unexpected happened during finishing". The scheduler should rather shut down and wait until the user checks the logs, not just restart.
Anyway, I've fixed this bug: if the message is too large, a different message is saved for the job and its status becomes "corrupted". For tasks, Bridge just saves the new message and does nothing else.
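
The behaviour described above for jobs can be sketched roughly as follows. This is only an illustration, not the code from the branch: the function name, the replacement text and the 'failed' status are assumptions; the 1024 limit comes from the character varying(1024) column in the traceback and "corrupted" from the comment above.

```python
# Limit taken from the "character varying(1024)" column in the traceback.
MAX_ERROR_LEN = 1024
# Hypothetical replacement text; the real message used by Bridge is not given here.
TOO_LONG_MESSAGE = 'The scheduler has given too long error description'


def job_error_and_status(error, max_len=MAX_ERROR_LEN):
    """Return (message to store, job status) for an error reported by a scheduler."""
    if len(error) > max_len:
        # Storing the original text would raise DataError, so replace it
        # and mark the job as corrupted instead of leaving it processing.
        return TOO_LONG_MESSAGE, 'corrupted'
    # 'failed' is an assumed name for the ordinary error status.
    return error, 'failed'
```

Checking the length before calling save() means the database never sees an oversized value, so the job cannot get stuck in the processing state because of it.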

Actions #9

Updated by Ilja Zakharov about 7 years ago

Vladimir Gratinskiy wrote:

The scheduler should rather shut down and wait until the user checks the logs, not just restart.

The restart happens only if production mode is turned on by the corresponding flag in the configuration options. If the flag is unset, the scheduler terminates.

Actions #10

Updated by Evgeny Novikov about 7 years ago

  • Status changed from Open to Closed

I tested the bug fix and merged the given branch to master in a4a7e508.
