Feature #817
open
Smart management of node capabilities
Added by Alexey Khoroshilov almost 14 years ago.
Updated over 13 years ago.
Description
A user should be able to run a distributed task and set TIME_LIMIT and MEMORY_LIMIT for it.
Nodes that lack the required capabilities should nevertheless be used to execute the task.
Of course, if the execution fails because of TIME_LIMIT or MEMORY_LIMIT, the task should be restarted on a more capable node.
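The restart behaviour described above could look roughly like the sketch below. It is only an illustration, not the cluster's actual code: the `Node` record, the `LimitExceeded` exception, the `execute` callback, and the ordering of nodes by RAM and then CPU speed are all hypothetical names and assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node description; the real cluster may track other data.
    name: str
    cpu_mhz: int
    ram_mb: int


class LimitExceeded(Exception):
    """Hypothetical error: a run hit TIME_LIMIT or MEMORY_LIMIT."""


def run_with_escalation(task, nodes, execute):
    """Try the task on each node, from least to most capable.

    `execute(task, node)` is an assumed callback that returns the result
    or raises LimitExceeded when a limit is hit on that node.
    """
    # Assumption: "more capable" means more RAM, then a faster CPU.
    ordered = sorted(nodes, key=lambda n: (n.ram_mb, n.cpu_mhz))
    for node in ordered:
        try:
            return execute(task, node)
        except LimitExceeded:
            continue  # restart on the next, more capable node
    raise LimitExceeded("task exceeded its limits on every node")
```

Starting from the least capable node keeps the larger nodes free for tasks that really need them, at the cost of some wasted runs.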
Open question: how to measure TIME_LIMIT correctly if we have different CPUs?
TIME_LIMIT could be expressed in "seconds on a 2k MHz CPU" and scaled proportionally for CPUs with a different clock speed.
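A minimal sketch of that scaling, assuming clock frequency alone is an acceptable measure of CPU speed (a rough proxy at best). The function name and the `node_mhz` parameter are made up for illustration.

```python
REFERENCE_MHZ = 2000.0  # the "2k MHz" reference CPU from the comment above


def scale_time_limit(reference_seconds, node_mhz):
    """Convert a limit given in reference-CPU seconds to seconds on a node.

    A slower node gets proportionally more time, a faster one less.
    """
    if node_mhz <= 0:
        raise ValueError("node_mhz must be positive")
    return reference_seconds * REFERENCE_MHZ / node_mhz
```

For example, a 60-second limit would become 120 seconds on a 1000 MHz node and 30 seconds on a 4000 MHz node.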
The RAM limit isn't scalable, so dynamically re-arranging tasks to more capable nodes is an interesting idea. Nothing in the current architecture prevents such a scenario. However, this is still a prototype, and we may postpone this bug until we design the final version of the cluster algorithms.
For example, if all "less capable" nodes are busy and we have a task to route, we might consider holding it instead of routing it to a "more capable" node outright. I guess we should first finish the straightforward scenario before diving into such matters.
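One possible shape for the "hold instead of routing" decision is sketched below. Everything here is an assumption: nodes are represented only by their idle RAM in megabytes, `small_limit_mb` is a made-up threshold separating "less capable" from "more capable" nodes, and the maximum hold time is not something the comment above proposes.

```python
def route_or_hold(idle_ram_mb, small_limit_mb, waited_s, max_hold_s):
    """Pick an idle node (by its RAM size) for a task, or None to hold it.

    idle_ram_mb: RAM sizes of the currently idle nodes.
    small_limit_mb: nodes at or below this size count as "less capable".
    waited_s / max_hold_s: how long the task has waited, and the assumed
    longest time we are willing to hold it.
    """
    small = [ram for ram in idle_ram_mb if ram <= small_limit_mb]
    if small:
        return min(small)  # a less capable node is free: use it
    if not idle_ram_mb or waited_s < max_hold_s:
        return None  # hold the task for now
    return min(idle_ram_mb)  # held long enough: use the smallest big node
```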
- Priority changed from Normal to High
- Assignee set to Pavel Shved
- Priority changed from High to Normal