..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

http://creativecommons.org/licenses/by/3.0/legalcode

=============================
Multiple PodManager Scheduler
=============================

This proposal describes adding new scheduler service into valence to determine
how to dispatch compose operation to the appropriate Pod manager.

https://blueprints.launchpad.net/openstack-valence/+spec/valence-multipodm-scheduler

Problem description
===================

Valence will support multiple Pod managers on the backend instead of one single
instance to improve its scalability. It requires valence to provide a scheduling
service to determine how to dispatch each compose operations on the appropriate
Pod manager. The scheduler should filter out the inappropriate Pod Manager
without requested hardware resource and rank the priority for the remaining Pod
manager with different algorithms. For different scheduling goal, it should
allow admin to plugin new algorithms.

Proposed change
===============

The valence scheduler runs as a separate process alongside the other valence
services such as the API server. Its interface to the API server is accepting
the request proprieties of each compose operation, and it does a posts to
controller to indicate where the composition should be scheduled.

The scheduler is divided into two layers from high level:
- Scheduler framework:
The main() entry that does service initialization and calls the scheduler
algorithm.
- Scheduling algorithm:
The scheduling algorithm that assigns target Pod manager for each compose
operation.

The Scheduler tries to find a PODM for each compose operation, one at a time.
- First it applies a set of "filter functions" to filter out inappropriate
nodes. If the compose operation specifies resource requests, then the scheduler
will filter out PODM that don't have at least that much resources available.
- Second, it applies a set of "priority functions" that rank the PODM that
weren't filtered out in the first step. The "priority functions" may vary for
different scenarios. For example, it tries to spread all composed node across
all PODM.
- Finally, the PODM with the highest priority is chosen. If there are multiple
such PODM, then one of them is chosen at random.

For given compose operations::

    +---------------------------------------------+
    |               Schedulable PODM:             |
    |                                             |
    | +--------+    +--------+      +--------+    |
    | | PODM 1 |    | PODM 2 |      | PODM 3 |    |
    | +--------+    +--------+      +--------+    |
    |                                             |
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+
     Filters function: PODM 3 doesn't have enough
                       resource
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+
    |             remaining PODM:                 |
    |   +--------+                 +--------+     |
    |   | PODM 1 |                 | PODM 2 |     |
    |   +--------+                 +--------+     |
    |                                             |
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+
     Priority function: PODM 1: p=5
                        PODM 2: p=3
    +-------------------+-------------------------+
                        |
                        |
                        v
           select max{PODM priority} = PODM 1

Both filters function and Priority function should be configurable to allow
admin to choose proper algorithm for different scenarios, like disable all
algorithms and let scheduler randomly choose one.

Alternatives
------------

Make scheduler as a valence module instead of standalone service. This solution
will be more simple but tight couple with other services, which will bring more
overhead if scheduler service need to be upgraded or restarted.

Data model impact
-----------------
None

REST API impact
---------------
Be default, scheduler will determine the target POD manager for each compose
operation. However, valence should also allow user to specify the target POD
manager. So a new parameter is needed for node composition request.

```
/v1/nodes/:
POST : add a new param to let user specify a POD manager for compose operation.
```

Driver API impact
-----------------
None

Security impact
---------------
None

Other end user impact
---------------------
User can specify the target POD manager for compose operation if needed.

Scalability impact
------------------
The valence scalability will be significantly improved by supporting dispatch
compose operations on multiple POD manager.

Performance Impact
------------------
The scheduler will bring more complexity and overhead, which might add
latency into valence response one compose operation. Given the compose
operations on the data center will not be so frequently as launch VM/continer,
so the scheduler will not be the performance bottleneck in the current stage.

Other deployer impact
---------------------
The admin should deploy and start scheduler process alongside other valence
services.

Developer impact
----------------
None

Valence GUI / Horizon impact
----------------------------
None

Implementation
==============
Assignee(s)
-----------
Primary assignee:
  Lin Yang

Work Items
----------
* Implement the framework of scheduler service.
* Implement the default algorithms for both filter and priority steps.
* Add unit tests.

Dependencies
============
None

Testing
=======
* Add unit tests for service framework and scheduling algorithms.

Documentation Impact
====================
None

References
==========
None