valence-specs/specs/pike/approved/multiple-podmanager-scheduler.rst
Lin Yang 59ab26026e Add scheduler to support multiple PODM
Change-Id: I37c48a80b1ca2de12a60b673d99b684129b1d3fe
2017-06-16 15:42:27 -07:00

http://creativecommons.org/licenses/by/3.0/legalcode

Multiple PodManager Scheduler

This proposal describes adding a new scheduler service to valence that determines how to dispatch each compose operation to the appropriate Pod Manager.

https://blueprints.launchpad.net/openstack-valence/+spec/valence-multipodm-scheduler

Problem description

Valence will support multiple Pod Managers on the backend, instead of a single instance, to improve its scalability. This requires valence to provide a scheduling service that determines how to dispatch each compose operation to the appropriate Pod Manager. The scheduler should filter out any Pod Manager that lacks the requested hardware resources and rank the remaining Pod Managers by priority using different algorithms. To support different scheduling goals, it should allow the admin to plug in new algorithms.

Proposed change

The valence scheduler runs as a separate process alongside the other valence services, such as the API server. It accepts the request properties of each compose operation from the API server and posts to the controller to indicate where the composition should be scheduled.

At a high level, the scheduler is divided into two layers:

  • Scheduler framework: the main() entry point that initializes the service and calls the scheduling algorithm.
  • Scheduling algorithm: the logic that assigns a target Pod Manager to each compose operation.
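The split between the two layers can be pictured with a minimal sketch. All names here (`SchedulingAlgorithm`, `RandomAlgorithm`, `SchedulerService`, `list_podms`) are hypothetical and chosen for illustration only; the spec does not define a concrete interface.

```python
import abc
import random


class SchedulingAlgorithm(abc.ABC):
    """Algorithm layer: picks a target Pod Manager for one compose operation."""

    @abc.abstractmethod
    def select(self, podms, request):
        """Return the chosen Pod Manager, or None if none is suitable."""


class RandomAlgorithm(SchedulingAlgorithm):
    """Simplest possible algorithm: pick any Pod Manager at random."""

    def __init__(self, rng=random):
        self._rng = rng

    def select(self, podms, request):
        return self._rng.choice(podms) if podms else None


class SchedulerService:
    """Framework layer: owns initialization and delegates the decision."""

    def __init__(self, algorithm, list_podms):
        self._algorithm = algorithm
        # Hypothetical callable returning the currently known Pod Managers.
        self._list_podms = list_podms

    def handle(self, request):
        return self._algorithm.select(self._list_podms(), request)


service = SchedulerService(RandomAlgorithm(), lambda: ["podm-1", "podm-2"])
target = service.handle({"cpus": 4})
```

Keeping the algorithm behind a small interface like this is one way the framework could stay unchanged while an admin swaps in a different algorithm.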

The scheduler tries to find a PODM for each compose operation, one at a time:

  • First, it applies a set of "filter functions" to filter out inappropriate PODMs. If the compose operation specifies resource requests, the scheduler filters out any PODM that doesn't have at least that many resources available.
  • Second, it applies a set of "priority functions" that rank the PODMs that weren't filtered out in the first step. The priority functions may vary between scenarios. For example, one may try to spread composed nodes across all PODMs.
  • Finally, the PODM with the highest priority is chosen. If several PODMs share the highest priority, one of them is chosen at random.

For a given compose operation:

+---------------------------------------------+
|               Schedulable PODM:             |
|                                             |
| +--------+    +--------+      +--------+    |
| | PODM 1 |    | PODM 2 |      | PODM 3 |    |
| +--------+    +--------+      +--------+    |
|                                             |
+-------------------+-------------------------+
                    |
                    |
                    v
+-------------------+-------------------------+
 Filter function: PODM 3 doesn't have enough
                  resources
+-------------------+-------------------------+
                    |
                    |
                    v
+-------------------+-------------------------+
|             remaining PODM:                 |
|   +--------+                 +--------+     |
|   | PODM 1 |                 | PODM 2 |     |
|   +--------+                 +--------+     |
|                                             |
+-------------------+-------------------------+
                    |
                    |
                    v
+-------------------+-------------------------+
 Priority function: PODM 1: p=5
                    PODM 2: p=3
+-------------------+-------------------------+
                    |
                    |
                    v
       select max{PODM priority} = PODM 1
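The flow in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proposed implementation: the PODM records, the `free` and `free_slots` fields, and the function names are all hypothetical.

```python
import random


def has_enough_resources(podm, request):
    """Filter: keep a PODM only if it can satisfy every requested resource."""
    return all(podm["free"].get(name, 0) >= amount
               for name, amount in request.items())


def spread_priority(podm, request):
    """Priority: prefer PODMs with more room left, which spreads nodes out."""
    return podm["free_slots"]


def schedule(podms, request, filters, priorities, rng=random):
    """Return the name of the chosen PODM, or None if every PODM is filtered out."""
    candidates = [p for p in podms if all(f(p, request) for f in filters)]
    if not candidates:
        return None
    # Sum the priority functions; with no priority functions every score is 0.
    scores = {p["name"]: sum(fn(p, request) for fn in priorities)
              for p in candidates}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return rng.choice(winners)  # ties are broken at random


# Hypothetical inventory mirroring the diagram above.
podms = [
    {"name": "PODM 1", "free": {"cpus": 16, "memory_gb": 64}, "free_slots": 5},
    {"name": "PODM 2", "free": {"cpus": 8, "memory_gb": 32}, "free_slots": 3},
    {"name": "PODM 3", "free": {"cpus": 2, "memory_gb": 4}, "free_slots": 9},
]
request = {"cpus": 4, "memory_gb": 16}

# PODM 3 is filtered out; PODM 1 wins with priority 5 over PODM 2's 3.
target = schedule(podms, request, [has_enough_resources], [spread_priority])
```

Passing empty lists for both `filters` and `priorities` makes every PODM tie at a score of 0, so the choice falls through to the random tie-break.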

Both the filter functions and the priority functions should be configurable, so that the admin can choose the proper algorithms for different scenarios, for example disabling all algorithms and letting the scheduler choose a PODM at random.
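One way to make the functions configurable is a name-based registry from which the admin's configuration selects. The sketch below assumes hypothetical option names (`filters`, `priorities`), registry names (`resource`, `spread`), and PODM fields; none of them come from valence itself.

```python
import random

FILTERS = {}     # name -> filter function
PRIORITIES = {}  # name -> priority function


def register(registry, name):
    """Decorator that records a function in a registry under a name."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


@register(FILTERS, "resource")
def resource_filter(podm, request):
    return all(podm["free"].get(k, 0) >= v for k, v in request.items())


@register(PRIORITIES, "spread")
def spread(podm, request):
    # Fewer composed nodes means a higher (less negative) score.
    return -podm["composed_nodes"]


def build_scheduler(config, rng=random):
    """Build a schedule() function from the names listed in the config."""
    filters = [FILTERS[n] for n in config.get("filters", [])]
    priorities = [PRIORITIES[n] for n in config.get("priorities", [])]

    def schedule(podms, request):
        candidates = [p for p in podms if all(f(p, request) for f in filters)]
        if not candidates:
            return None
        scored = [(sum(fn(p, request) for fn in priorities), p)
                  for p in candidates]
        best = max(score for score, _ in scored)
        return rng.choice([p for score, p in scored if score == best])

    return schedule


podms = [
    {"name": "podm-1", "free": {"cpus": 16}, "composed_nodes": 4},
    {"name": "podm-2", "free": {"cpus": 8}, "composed_nodes": 1},
    {"name": "podm-3", "free": {"cpus": 2}, "composed_nodes": 0},
]

configured = build_scheduler({"filters": ["resource"], "priorities": ["spread"]})
random_only = build_scheduler({})  # all algorithms disabled: random choice
chosen = configured(podms, {"cpus": 4})
```

In this sketch an unknown name in the configuration raises `KeyError` when the scheduler is built, so a typo surfaces at startup rather than at scheduling time.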

Alternatives

Make the scheduler a valence module instead of a standalone service. This solution would be simpler but tightly coupled with the other services, which would bring more overhead whenever the scheduler needs to be upgraded or restarted.

Data model impact

None

REST API impact

By default, the scheduler determines the target Pod Manager for each compose operation. However, valence should also allow the user to specify the target Pod Manager, so a new parameter is needed for the node composition request.

`/v1/nodes/: POST: add a new parameter that lets the user specify a Pod Manager for the compose operation.`
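A rough sketch of how the API side might honor such a parameter follows. The field name `podm_id` is an assumption made for illustration, since the spec does not name the parameter, and `select_podm` and its arguments are likewise hypothetical.

```python
def select_podm(request_body, known_podm_ids, scheduler):
    """Pick the target Pod Manager for a POST /v1/nodes/ request body.

    An explicit choice by the user wins; otherwise the scheduler decides.
    """
    requested = request_body.get("podm_id")  # hypothetical parameter name
    if requested is None:
        return scheduler(request_body)
    if requested not in known_podm_ids:
        raise ValueError(f"unknown Pod Manager: {requested!r}")
    return requested


known = {"podm-1", "podm-2"}


def fake_scheduler(body):
    # Stand-in for the real scheduler call.
    return "podm-1"


explicit = select_podm({"name": "node1", "podm_id": "podm-2"}, known, fake_scheduler)
defaulted = select_podm({"name": "node1"}, known, fake_scheduler)
```

Validating the requested identifier before bypassing the scheduler keeps a mistyped value from reaching the compose step.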

Driver API impact

None

Security impact

None

Other end user impact

Users can specify the target Pod Manager for a compose operation if needed.

Scalability impact

Valence scalability will be significantly improved by dispatching compose operations across multiple Pod Managers.

Performance Impact

The scheduler adds complexity and overhead, which might add latency to valence's response to a compose operation. However, compose operations in a data center are not as frequent as launching a VM or container, so the scheduler should not be a performance bottleneck at the current stage.

Other deployer impact

The admin should deploy and start the scheduler process alongside the other valence services.

Developer impact

None

Valence GUI / Horizon impact

None

Implementation

Assignee(s)

Primary assignee:

Lin Yang

Work Items

  • Implement the framework of the scheduler service.
  • Implement the default algorithms for both the filter and priority steps.
  • Add unit tests.

Dependencies

None

Testing

  • Add unit tests for the service framework and the scheduling algorithms.

Documentation Impact

None

References

None