From 59ab26026e8008ac15b9f98060ec33abad25cd95 Mon Sep 17 00:00:00 2001
From: Lin Yang
Date: Tue, 13 Jun 2017 21:37:12 -0700
Subject: [PATCH] Add scheduler to support multiple PODM

Change-Id: I37c48a80b1ca2de12a60b673d99b684129b1d3fe
---
 .../multiple-podmanager-scheduler.rst | 183 ++++++++++++++++++
 1 file changed, 183 insertions(+)
 create mode 100644 specs/pike/approved/multiple-podmanager-scheduler.rst

diff --git a/specs/pike/approved/multiple-podmanager-scheduler.rst b/specs/pike/approved/multiple-podmanager-scheduler.rst
new file mode 100644
index 0000000..60c3def
--- /dev/null
+++ b/specs/pike/approved/multiple-podmanager-scheduler.rst
@@ -0,0 +1,183 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================
+Multiple PodManager Scheduler
+=============================
+
+This proposal describes adding a new scheduler service to valence to determine
+how to dispatch each compose operation to the appropriate Pod manager.
+
+https://blueprints.launchpad.net/openstack-valence/+spec/valence-multipodm-scheduler
+
+Problem description
+===================
+
+Valence will support multiple Pod managers on the backend, instead of a single
+instance, to improve its scalability. This requires valence to provide a
+scheduling service that determines which Pod manager each compose operation is
+dispatched to. The scheduler should filter out Pod managers that lack the
+requested hardware resources and rank the remaining Pod managers using
+different algorithms. To support different scheduling goals, it should allow
+the admin to plug in new algorithms.
+
+Proposed change
+===============
+
+The valence scheduler runs as a separate process alongside the other valence
+services, such as the API server.
+Its interface to the API server accepts the request properties of each compose
+operation and posts to the controller to indicate where the composition
+should be scheduled.
+
+At a high level, the scheduler is divided into two layers:
+
+- Scheduler framework: the main() entry point, which initializes the service
+  and calls the scheduling algorithm.
+- Scheduling algorithm: assigns a target Pod manager to each compose
+  operation.
+
+The scheduler picks a Pod manager (PODM) for each compose operation in turn:
+
+- First, it applies a set of "filter functions" to filter out inappropriate
+  PODMs. If the compose operation specifies resource requests, the scheduler
+  filters out any PODM that does not have at least that much available.
+- Second, it applies a set of "priority functions" that rank the remaining
+  PODMs. The priority functions may vary between scenarios; for example, one
+  may try to spread composed nodes across all PODMs.
+- Finally, the PODM with the highest priority is chosen. If several PODMs
+  share the highest priority, one of them is chosen at random.
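The filter, rank, and select steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the `Podm` fields, the request keys (`cpus`, `memory_gb`), the function names, and the choice to sum priority scores are all hypothetical and do not come from the spec.

```python
import random
from dataclasses import dataclass


@dataclass
class Podm:
    """Hypothetical view of one Pod manager's free capacity."""
    name: str
    free_cpus: int
    free_memory_gb: int
    composed_nodes: int


def has_enough_resources(podm, request):
    """Filter function: keep PODMs that can satisfy the resource request."""
    return (podm.free_cpus >= request.get("cpus", 0)
            and podm.free_memory_gb >= request.get("memory_gb", 0))


def spread_priority(podm, request):
    """Priority function: prefer PODMs hosting fewer composed nodes."""
    return -podm.composed_nodes


def schedule(podms, request, filters, priorities, rng=random):
    """Return the chosen PODM, or None if every PODM is filtered out."""
    # Step 1: drop every PODM rejected by at least one filter function.
    candidates = [p for p in podms if all(f(p, request) for f in filters)]
    if not candidates:
        return None
    # Step 2: score each remaining PODM (assumption: scores are summed).
    scores = {p.name: sum(fn(p, request) for fn in priorities)
              for p in candidates}
    # Step 3: pick the highest score, breaking ties at random.
    best = max(scores.values())
    return rng.choice([p for p in candidates if scores[p.name] == best])


if __name__ == "__main__":
    podms = [Podm("PODM 1", 16, 64, 2),
             Podm("PODM 2", 8, 32, 5),
             Podm("PODM 3", 1, 4, 0)]
    request = {"cpus": 4, "memory_gb": 16}
    chosen = schedule(podms, request,
                      [has_enough_resources], [spread_priority])
    print(chosen.name)
```

Passing empty `filters` and `priorities` lists makes every PODM a candidate with an equal score, so the choice falls through to the random tie-break. That is one way to obtain the "let the scheduler choose at random" configuration.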
+
+For a given compose operation::
+
+    +---------------------------------------------+
+    | Schedulable PODMs:                          |
+    |                                             |
+    |    +--------+   +--------+   +--------+     |
+    |    | PODM 1 |   | PODM 2 |   | PODM 3 |     |
+    |    +--------+   +--------+   +--------+     |
+    |                                             |
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+    +-------------------+-------------------------+
+      Filter functions: PODM 3 doesn't have enough
+      resources
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+    +-------------------+-------------------------+
+    | Remaining PODMs:                            |
+    |    +--------+   +--------+                  |
+    |    | PODM 1 |   | PODM 2 |                  |
+    |    +--------+   +--------+                  |
+    |                                             |
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+    +-------------------+-------------------------+
+      Priority functions: PODM 1: p=5
+                          PODM 2: p=3
+    +-------------------+-------------------------+
+                        |
+                        |
+                        v
+      select max{PODM priority} = PODM 1
+
+Both the filter functions and the priority functions should be configurable so
+that the admin can choose a suitable algorithm for each scenario, for example
+disabling all algorithms and letting the scheduler choose a PODM at random.
+
+Alternatives
+------------
+
+Make the scheduler a valence module instead of a standalone service. This
+solution is simpler but tightly coupled with the other services, which adds
+overhead whenever the scheduler service needs to be upgraded or restarted.
+
+Data model impact
+-----------------
+None
+
+REST API impact
+---------------
+By default, the scheduler determines the target Pod manager for each compose
+operation. However, valence should also allow the user to specify the target
+Pod manager, so a new parameter is needed in the node composition request.
+
+```
+/v1/nodes/:
+POST: add a new parameter that lets the user specify the target Pod manager.
+```
+
+Driver API impact
+-----------------
+None
+
+Security impact
+---------------
+None
+
+Other end user impact
+---------------------
+The user can specify the target Pod manager for a compose operation if needed.
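One way the user-specified target could interact with the scheduler is sketched below. The spec does not name the new request parameter, so `podm_id` is a purely hypothetical name, as are `select_podm` and the shape of the `scheduler` callable. The sketch only shows the intended precedence: an explicit target bypasses scheduling, and otherwise the scheduler decides.

```python
def select_podm(request_body, known_podm_ids, scheduler):
    """Return the id of the Pod manager that should handle a compose request.

    `request_body` is the parsed POST body for /v1/nodes/.
    `podm_id` is a hypothetical parameter name, not defined by the spec.
    `scheduler` is any callable that maps a request body to a PODM id.
    """
    requested = request_body.get("podm_id")
    if requested is None:
        # No explicit target: fall back to the scheduler (default behaviour).
        return scheduler(request_body)
    if requested not in known_podm_ids:
        # Reject targets that do not match a registered Pod manager.
        raise ValueError("unknown Pod manager: %s" % requested)
    return requested
```

Validating the requested id before dispatching keeps a mistyped target from being silently replaced by a scheduler decision.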
+
+Scalability impact
+------------------
+Dispatching compose operations across multiple Pod managers improves valence
+scalability.
+
+Performance Impact
+------------------
+The scheduler adds complexity and overhead, which might add latency to the
+valence response to a compose operation. However, compose operations in a data
+center are much less frequent than VM or container launches, so the scheduler
+is not expected to be a performance bottleneck at the current stage.
+
+Other deployer impact
+---------------------
+The admin must deploy and start the scheduler process alongside the other
+valence services.
+
+Developer impact
+----------------
+None
+
+Valence GUI / Horizon impact
+----------------------------
+None
+
+Implementation
+==============
+Assignee(s)
+-----------
+Primary assignee:
+ Lin Yang
+
+Work Items
+----------
+* Implement the framework of the scheduler service.
+* Implement the default algorithms for both the filter and priority steps.
+* Add unit tests.
+
+Dependencies
+============
+None
+
+Testing
+=======
+* Add unit tests for the service framework and the scheduling algorithms.
+
+Documentation Impact
+====================
+None
+
+References
+==========
+None