
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

RUG HA and scaleout
===================

Problem Description
===================

The RUG is a multi-process, multi-worker service, but it cannot be
scaled out to multiple nodes for purposes of high availability and
distributed handling of load. The only current option for high
availability is an active/passive cluster using Pacemaker or similar,
which is less than ideal and does not address scale-out concerns.

Proposed Change
===============

This proposes allowing multiple RUG processes to be spawned across
many nodes. Each RUG process is responsible for a fraction of the
total running appliances. The mapping of RUG processes to appliances
will be managed by a consistent hash ring. An external coordination
service (e.g., zookeeper) will be leveraged to provide cluster membership
capabilities, and python-tooz will be used to manage cluster events.
When new members join or depart, the hash ring will be rebalanced and
appliances re-distributed across the RUG.

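As a rough illustration of the membership piece, a minimal sketch of a RUG
instance registering with the cluster through python-tooz (the backend URL,
member id, and group name below are placeholders, not settled configuration):

.. code-block:: python

    # Illustrative only: a RUG instance registering with the coordination
    # service via python-tooz. Backend URL, member id, and group name are
    # placeholders.
    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'rug-host-1')
    coordinator.start()

    try:
        coordinator.create_group(b'rug').get()
    except coordination.GroupAlreadyExist:
        pass  # another RUG instance already created the group

    coordinator.join_group(b'rug').get()

    # The current membership list is what seeds the hash ring.
    members = coordinator.get_members(b'rug').get()
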
This allows operators to scale out to many RUG instances, eliminating
the single point of failure and allowing appliances to be evenly
distributed across multiple worker processes.

Data Model Impact
-----------------

n/a

REST API Impact
---------------

n/a

Security Impact
---------------

None

Notifications Impact
--------------------

A new internal event of type REBALANCE will be placed on the RUG's
notifications queue whenever cluster membership changes.

Other End User Impact
---------------------

n/a

Performance Impact
------------------

There will be some new overhead introduced at the messaging layer, as Neutron
notifications and RPCs will need to be distributed to per-RUG message queues.

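One possible shape for those per-RUG queues, assuming the RUG consumes
notifications through oslo.messaging (a sketch, not a settled design):
giving each instance a unique listener pool causes every instance to receive
its own copy of each notification, which it can then filter by the hash ring.

.. code-block:: python

    # Sketch only: per-RUG notification queues via oslo.messaging listener
    # pools. Each RUG instance uses a unique pool name (here, its hostname)
    # so that every instance receives its own copy of each notification.
    import socket

    from oslo_config import cfg
    import oslo_messaging

    class NotificationEndpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            pass  # hand the event to the workers for hash ring filtering

    transport = oslo_messaging.get_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [NotificationEndpoint()],
        pool='akanda-rug.%s' % socket.gethostname())
    listener.start()
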
Other Deployer Impact
---------------------

Deployers will need to evaluate and choose an appropriate backend to be used
by tooz for cluster coordination. memcached is a simple but non-robust
option, while zookeeper is heavier-weight but proven. More info at [2].

Developer Impact
----------------

n/a

Community Impact
----------------

n/a

Alternatives
------------

One alternative to having each RUG instance declare its own messaging queue
and inspect all incoming messages would be to have the DHT master also serve
as a notification master. That is, the leader would be the only instance of
the RUG listening to and processing incoming Neutron notifications, and then
re-distributing them to specific RUG workers based on the state of the DHT.

Another option would be to do away with the use of Neutron notifications
entirely and hard-wire the akanda-neutron plugin to the RUG via a dedicated
message queue.

Implementation
==============

This proposes enabling operators to run multiple instances of the RUG.
Each instance of the RUG will be responsible for a subset of the managed
appliances. A distributed, consistent hash ring will be used to map appliances
to their respective RUG instance. The Ironic project is already doing
something similar and has a hash ring implementation we can likely leverage
to get started [1].

The RUG cluster is essentially leaderless. The hash ring is constructed
using the active node list, and each individual RUG instance is capable of
constructing a ring given a list of members. This ring is consistent
across nodes provided the coordination service is properly reporting
membership events and they are processed correctly. Using metadata attached
to incoming events (e.g., tenant_id), a consumer is able to check the hash
ring to determine which node in the ring the event is mapped to.

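To make the mapping concrete, here is a minimal, illustrative ring loosely
modeled on the Ironic implementation [1] (the replica count, hash function,
and host names are arbitrary choices for the sketch):

.. code-block:: python

    # Minimal consistent hash ring, loosely modeled on Ironic's [1].
    # Each host is hashed onto the ring several times (replicas) for
    # smoother balancing; a tenant_id maps to the next host clockwise.
    import bisect
    import hashlib

    class HashRing(object):
        def __init__(self, hosts, replicas=32):
            self._ring = {}
            self._sorted_keys = []
            for host in hosts:
                for r in range(replicas):
                    key = self._hash('%s-%d' % (host, r))
                    self._ring[key] = host
                    bisect.insort(self._sorted_keys, key)

        @staticmethod
        def _hash(data):
            return int(hashlib.md5(data.encode('utf-8')).hexdigest(), 16)

        def get_host(self, tenant_id):
            """Return the host responsible for this tenant's appliances."""
            position = bisect.bisect(self._sorted_keys,
                                     self._hash(tenant_id))
            position %= len(self._sorted_keys)  # wrap around the ring
            return self._ring[self._sorted_keys[position]]

    # Every node builds an identical ring from the same membership list:
    ring = HashRing(['rug-host-1', 'rug-host-2', 'rug-host-3'])
    owner = ring.get_host('some-tenant-id')

Because every instance derives the ring deterministically from the shared
membership list, no election or ring-state replication is required.
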
The RUG will spawn a new subprocess called the coordinator. Its sole purpose
is to listen for cluster membership events using python-tooz. When a member
joins or departs, the coordinator will create a new Event of type REBALANCE
and put it onto the notifications queue. This event's body will contain an
updated list of current cluster nodes.

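A sketch of what that coordinator loop might look like (``Event`` and
``notifications_queue`` stand in for the RUG's existing event abstractions;
their exact shapes here are hypothetical):

.. code-block:: python

    # Sketch only: the coordinator subprocess watching membership via
    # python-tooz. Event and notifications_queue stand in for the RUG's
    # existing abstractions; their shapes here are hypothetical.
    import time

    def coordinator_loop(coordinator, notifications_queue):
        def membership_changed(tooz_event):
            # tooz passes a MemberJoinedGroup/MemberLeftGroup object
            members = coordinator.get_members(b'rug').get()
            notifications_queue.put(
                Event(type='REBALANCE',
                      body={'members': sorted(members)}))

        coordinator.watch_join_group(b'rug', membership_changed)
        coordinator.watch_leave_group(b'rug', membership_changed)

        while True:
            coordinator.heartbeat()      # keep our own membership alive
            coordinator.run_watchers()   # fire callbacks on join/leave
            time.sleep(1)
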
Each RUG worker process will maintain a copy of the hash ring, which is
shared by its worker threads. When it receives a REBALANCE event, it will
rebalance the hash ring given the new membership list. When it receives
normal CRUD events for resources, it will first check the hash ring to see
if the event is mapped to its host based on the event's target tenant_id.
If it is, the event will be processed. If it is not, the event will be
ignored and serviced by another worker.

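Putting the pieces together on the worker side (reusing the illustrative
``HashRing`` from the earlier sketch; the event attributes are again
hypothetical):

.. code-block:: python

    # Sketch only: worker-side dispatch. HashRing is the illustrative ring
    # from the earlier sketch; the event attributes are hypothetical.
    import socket

    class WorkerDispatcher(object):
        def __init__(self, members):
            self.host = socket.gethostname()
            self.ring = HashRing(members)

        def handle_event(self, event):
            if event.type == 'REBALANCE':
                # Rebuild the ring from the updated membership list.
                self.ring = HashRing(event.body['members'])
                return
            # CRUD event: only service it if this host owns the tenant.
            if self.ring.get_host(event.tenant_id) != self.host:
                return  # another RUG instance will service this event
            self.process(event)

        def process(self, event):
            pass  # existing CRUD handling
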
Ideally, REBALANCE events should be serviced before CRUD events.

Assignee(s)
-----------

Work Items
----------

* Implement a distributed hash ring for managing worker:appliance
  assignment

* Add a new coordination sub-process to the RUG that publishes REBALANCE
  events to the notifications queue when membership changes

* Set up per-RUG message queues such that notifications are distributed to
  all RUG processes equally.

* Update the worker to manage its own copy of the hash ring

* Update the worker with the ability to respond to new REBALANCE events by
  rebalancing the ring with an updated membership list

* Update the worker to drop events for resources that are not mapped to its
  host in the hash ring.

Dependencies
============

Testing
=======

Tempest Tests
-------------

Functional Tests
----------------

If we cannot sufficiently test this using unit tests, we could potentially
spin up our devstack job with multiple copies of the akanda-rug-service
running on a single host, with multiple router appliances. This would allow
us to test ring rebalancing by killing off one of the akanda-rug-service
processes.

API Tests
---------

Documentation Impact
====================

User Documentation
------------------

Deployment docs need to be updated to mention this feature is dependent
on an external coordination service.

Developer Documentation
-----------------------

References
==========

[1] https://git.openstack.org/cgit/openstack/ironic/tree/ironic/common/hash_ring.py

[2] http://docs.openstack.org/developer/tooz/drivers.html