Feature #8170
Closed
Set maximum restrictions for verification tasks at Bridge web interface
Added by Ilja Zakharov over 7 years ago. Updated over 7 years ago.
100%
Description
To implement more advanced scheduling, Native scheduler requires information about resource restrictions for both jobs and tasks. Moreover, it is more convenient to set the limits through the web interface. Thus, in addition to restrictions for Klever Core, a user starting a job should be able to set the same kinds of restrictions for a verifier (memory, number of cores, disk memory, CPU model, CPU time, wall time). The limits should then be passed to Native scheduler within the configuration. Native scheduler in turn generates core.json and adds the restrictions to the file. The Core component then proceeds as it does now.
Summarizing there are three subtasks:
1) Add fields to the page for starting a job and provide limitations to Native scheduler. (Vladimir)
2) Implement Native scheduler forwarding the data to Core. (Ilja)
3) Adjust Core to extract the limits from core.json instead of job.json. (Ilja)
Updated by Evgeny Novikov over 7 years ago
Ilja Zakharov wrote:
Summarizing there are three subtasks:
1) Add fields to the page for starting a job and provide limitations to Native scheduler. (Vladimir)
To match the principle that values specified when starting a job decision don't affect verification results, values for these fields (resources per verification task decision) should be specified on the job page via job editing. Initial values can come from the settings.json specified for all preset verification jobs. They should also be inherited when verification jobs are copied.
2) Implement Native scheduler forwarding the data to Core. (Ilja)
Indeed, Bridge can easily write these values into job.json when Core requests a verification job archive.
3) Adjust Core to extract the limits from core.json instead of job.json. (Ilja)
This doesn't match the old logic. Limits should be specified/read within/from job.json.
Updated by Alexey Khoroshilov over 7 years ago
I guess different schedulers have different sets of supported restrictions.
Should Bridge support just a hardcoded set for one scheduler?
Or can a scheduler communicate to Bridge which restrictions it is interested in?
Updated by Vladimir Gratinskiy over 7 years ago
Evgeny Novikov wrote:
Indeed, Bridge can easily write these values into job.json when Core requests a verification job archive.
Bridge doesn't rely on which files a job has. So I can't specify limits there. So the "start job" page is the only place where I can add limits for tasks.
Alexey Khoroshilov wrote:
I guess different schedulers have different sets of supported restrictions.
Should Bridge support just a hardcoded set for one scheduler?
Bridge doesn't know how many schedulers will be deciding tasks. When a user starts a job decision, he can choose only one type of scheduler that can get the job's tasks for decision. So task resource limits will be just one "hardcoded set".
Updated by Ilja Zakharov over 7 years ago
Bridge doesn't rely on which files a job has. So I can't specify limits there. So the "start job" page is the only place where I can add limits for tasks.
Yes, it is better to set limits at "start job" page and then provide them to the scheduler. Native scheduler can edit job.json and provide necessary restrictions to Core.
Updated by Evgeny Novikov over 7 years ago
These restrictions should be specified at the job page since they affect verification results.
Perhaps this isn't an ideal way, but let's introduce a special file within a job that both schedulers and Core can access. I suggest naming it tasks.json. Users will specify resource limits per verification task there (more shared options can be added later). Bridge will send its content to schedulers in addition to the other data it already sends. This file will also be available to Core as part of the job archive. The only question is who should check the file content. I suggest doing this in schedulers and Core, since only they know the semantics. Bridge can forbid job decision if this file doesn't exist (alternatively, it can send empty verification task options, causing scheduler failures).
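A minimal sketch of the kind of check schedulers and Core could run on the content of tasks.json. Everything here is an assumption made for illustration: the function name check_task_limits is hypothetical, and of the key names only "cpu time" appears in this ticket; the others are invented stand-ins for the limits listed in the description.

```python
import json

# Hypothetical key names and expected types. Only "cpu time" is taken
# from this ticket; the rest are placeholders for the listed limits.
KNOWN_LIMITS = {
    "memory size": int,
    "number of CPU cores": int,
    "disk memory size": int,
    "CPU model": str,
    "cpu time": int,
    "wall time": int,
}


def check_task_limits(text):
    """Parse tasks.json content and return a list of problems (empty if fine)."""
    try:
        limits = json.loads(text)
    except ValueError as err:
        return ["not valid JSON: {}".format(err)]
    if not isinstance(limits, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, value in limits.items():
        expected = KNOWN_LIMITS.get(key)
        if expected is None:
            problems.append("unknown limit: {!r}".format(key))
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append("{!r} must be of type {}".format(key, expected.__name__))
    return problems
```

Returning a list of problems instead of raising lets a scheduler report every issue in the file at once; whether the real components should fail hard instead is a separate design decision.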
Updated by Ilja Zakharov over 7 years ago
OK, let's make the file required when starting a job solution. Its content will be provided to the scheduler, but the scheduler will not pass it on to Core in any way.
Updated by Vladimir Gratinskiy over 7 years ago
- Due date set to 05/04/2017
- Status changed from New to Resolved
- % Done changed from 0 to 100
Evgeny Novikov wrote:
These restrictions should be specified at the job page since they affect verification results.
Perhaps this isn't an ideal way, but let's introduce a special file within a job that both schedulers and Core can access. I suggest naming it tasks.json. Users will specify resource limits per verification task there (more shared options can be added later). Bridge will send its content to schedulers in addition to the other data it already sends. This file will also be available to Core as part of the job archive. The only question is who should check the file content. I suggest doing this in schedulers and Core, since only they know the semantics. Bridge can forbid job decision if this file doesn't exist (alternatively, it can send empty verification task options, causing scheduler failures).
Bridge doesn't forbid job decision if the job doesn't have "job.json" (I guess all jobs without this and some other files will fail). So I don't see a reason to check whether "tasks.json" exists when starting a job. And the moment when a pending job becomes processing (when the scheduler gets jobs' configurations) is not a good place for it either. So I'll just add task limits to the job start configuration when the scheduler asks for it if the file exists, and ignore it if it doesn't. Let the scheduler and Core check existence themselves (it should be checked there anyway, as Bridge is not ideal).
Implemented in feature_8170. When tasks.json does not exist or is not valid JSON, an empty dictionary is attached for the scheduler (new key "task resource limits" in the job configuration). For example, if tasks.json contains {"cpu time": 1000} then:
configuration["task resource limits"] = {"cpu time": 1000}
If the file doesn't exist:
configuration["task resource limits"] = {}
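The behaviour described above can be sketched roughly as follows. This is not the code from feature_8170: the helper name task_resource_limits and its argument are hypothetical, and treating valid JSON that is not an object as "wrong" is an assumption. Only the key "task resource limits" and the empty-dictionary fallback come from this comment.

```python
import json


def task_resource_limits(tasks_json_text):
    """Return the value to attach under "task resource limits".

    tasks_json_text is the content of tasks.json, or None when the job
    has no such file (hypothetical calling convention).
    """
    if tasks_json_text is None:
        return {}
    try:
        limits = json.loads(tasks_json_text)
    except ValueError:
        # Wrong JSON: fall back to an empty dictionary.
        return {}
    # Assumption: anything other than a JSON object is also treated as wrong.
    return limits if isinstance(limits, dict) else {}


configuration = {}
configuration["task resource limits"] = task_resource_limits('{"cpu time": 1000}')
```

With this fallback Bridge never blocks a job start because of the file, which leaves the actual validation to the scheduler and Core as agreed above.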
Updated by Evgeny Novikov over 7 years ago
Vladimir Gratinskiy wrote:
Evgeny Novikov wrote:
These restrictions should be specified at the job page since they affect verification results.
Perhaps this isn't an ideal way, but let's introduce a special file within a job that both schedulers and Core can access. I suggest naming it tasks.json. Users will specify resource limits per verification task there (more shared options can be added later). Bridge will send its content to schedulers in addition to the other data it already sends. This file will also be available to Core as part of the job archive. The only question is who should check the file content. I suggest doing this in schedulers and Core, since only they know the semantics. Bridge can forbid job decision if this file doesn't exist (alternatively, it can send empty verification task options, causing scheduler failures).
Bridge doesn't forbid job decision if the job doesn't have "job.json" (I guess all jobs without this and some other files will fail). So I don't see a reason to check whether "tasks.json" exists when starting a job.
There is actually a considerable difference, since Bridge doesn't read job.json while it does try to read tasks.json. Nevertheless, I agree with the suggested solution.
Updated by Evgeny Novikov over 7 years ago
I suggest merging this branch once everything related in schedulers, Core and preset jobs is implemented.
Updated by Ilja Zakharov over 7 years ago
- Status changed from Resolved to Open
- Priority changed from Urgent to Immediate
After I added the file to two different jobs, Bridge stopped working. The JSON exchange step fails with exceptions:
Traceback (most recent call last):
  File "/work/zakharov/src/klever/bridge/service/views.py", line 150, in get_jobs_and_tasks
    jobs_and_tasks = GetTasks(request.session['scheduler'], request.POST['jobs and tasks status']).newtasks
  File "/work/zakharov/src/klever/bridge/service/utils.py", line 282, in __init__
    self.__get_tasks(tasks)
  File "/work/zakharov/src/klever/bridge/service/utils.py", line 316, in __get_tasks
    self._get_tasks_limits(progress.job_id)
  File "/work/zakharov/src/klever/bridge/service/utils.py", line 456, in _get_tasks_limits
    tasks = FileSystem.objects.get(job__job_id=job_id, name='tasks.json', parent=None)
  File "/home/zakharov/.local/lib/python3.5/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/zakharov/.local/lib/python3.5/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, num)
jobs.models.MultipleObjectsReturned: get() returned more than one FileSystem -- it returned 2!
get() returned more than one FileSystem -- it returned 2!
After that, the scheduler gets the error "You are not signing in". Moreover, deleting the second file through the web interface does not help, and neither does restarting any of the services. It seems that my database is corrupted now.
Updated by Vladimir Gratinskiy over 7 years ago
- Status changed from Open to Resolved
I don't know why this happened. The branch feature_8170 doesn't modify the database, it only reads it. Something went wrong while saving jobs with the new file "tasks.json": Bridge created two similar files. I've tried it myself and couldn't reproduce the bug. If you encounter it again, create another ticket explaining what you did, what Bridge said and what is in the logs.
Updated by Ilja Zakharov over 7 years ago
- Status changed from Resolved to Open
Yes, I have reproduced the problem and broken my database once more; it completely blocks my development.
The problem is connected with the branch, since the stack trace refers to the following line:
tasks = FileSystem.objects.get(job__job_id=job_id, name='tasks.json', parent=None)
Moreover, I have never met it before.
The crash happens immediately after starting the solution of a job. Based on my first try, it seems it occurred only after I had added tasks.json to the second job. When the database contained only one job with tasks.json, everything worked like a charm, but as soon as I added the file to the second job and tried to solve it, everything broke.
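For readers unfamiliar with the failure mode: Django's get() raises MultipleObjectsReturned whenever the lookup matches more than one row. The sketch below imitates only those semantics in plain Python over a list of dicts. It does not use Django, the row layout is invented, and the ticket never states the root cause, so the "lookup that matches files of both jobs" scenario is just one reading consistent with the symptoms reported here.

```python
class DoesNotExist(Exception):
    """Raised when no row matches (mirrors the Django exception name)."""


class MultipleObjectsReturned(Exception):
    """Raised when more than one row matches (mirrors the Django exception name)."""


def get(rows, **conditions):
    """Imitate get() semantics: exactly one matching row or an exception."""
    matches = [
        row for row in rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]
    if not matches:
        raise DoesNotExist()
    if len(matches) > 1:
        raise MultipleObjectsReturned(
            "get() returned more than one row -- it returned {}!".format(len(matches))
        )
    return matches[0]


# Invented rows: two jobs, each with a root-level tasks.json.
rows = [
    {"job_id": 1, "name": "tasks.json", "parent": None},
    {"job_id": 2, "name": "tasks.json", "parent": None},
]

# A lookup narrowed to one job finds exactly one file.
one_file = get(rows, job_id=1, name="tasks.json", parent=None)
```

If the job condition did not actually narrow the lookup, the same call would match both rows and raise, which fits the observation that everything worked while only one job had the file.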
Updated by Vladimir Gratinskiy over 7 years ago
- Status changed from Open to Resolved
Sorry, I've understood what was wrong. The bug was fixed.
Updated by Ilja Zakharov over 7 years ago
- Status changed from Resolved to Closed
Merged in caf859214.