====================
Appendix L: Masakari
====================

Overview
--------

As of the 20.05 charm release, Masakari can be deployed to provide automated
instance recovery for clouds that use shared storage for their instances. The
following functionality is provided:

#. **Evacuation of instances** (supported since OpenStack Stein)

   In the event of hypervisor software failure the associated compute node is
   shut down and instance images are started on another hypervisor.

#. **Restarting of instances** (supported since OpenStack Ussuri)

   A failed instance can be restarted on its current hypervisor.

See the `Masakari charm`_ for an overview of the charms involved.

.. note::

   `MAAS`_ is required when enabling Masakari on Charmed OpenStack.

Software
--------

Install the software necessary for configuring Masakari:

.. code-block:: none

   sudo snap install openstackclients

Verify that the ``segment`` sub-command is available (this is provided by the
``python-masakariclient`` plugin):

.. code-block:: none

   openstack segment --help

.. important::

   If the ``segment`` sub-command is not available you will need a more recent
   version of the ``openstackclients`` snap. For example, you may need to use
   the 'edge' channel: :command:`sudo snap refresh openstackclients
   --channel=edge`.

Instance evacuation mechanics
-----------------------------

In order for an instance to be relocated to another hypervisor some form of
shared storage must be implemented. As a result, the scenario where a
hypervisor has lost network access to its peers yet continues to access that
shared storage must be considered.

The mechanics of instance evacuation are now described:

Masakari Monitors, on a hypervisor, detects that its peer is unavailable and
notifies the Masakari API server. This in turn triggers the Masakari engine to
initiate a failover of the instance via Nova. Assuming that Nova concurs that
the hypervisor is absent, it will attempt to start the instance on another
hypervisor. At this point there are two instances competing for the same disk
image, which can lead to data corruption.

The solution is to enable a STONITH Pacemaker plugin, which will power off the
compute node via the MAAS API when Pacemaker detects the hypervisor as being
offline.

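Once the overlay described later in this appendix has been deployed, the
STONITH setting can be inspected with Juju (the application name
``pacemaker-remote`` matches that overlay):

.. code-block:: none

   juju config pacemaker-remote enable-stonith
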
.. caution::

   Since nova-compute is typically deployed on bare metal, which may host
   containerised applications and possibly even applications alongside
   nova-compute (e.g. ceph-osd), care is advised when designing a cloud with
   Masakari to prevent a powered-off compute node from disrupting crucial
   non-HA cloud services.

   Ensure that Masakari functionality has been fully validated in a staging
   environment prior to using it in production.

Usage
-----

Configuration
~~~~~~~~~~~~~

.. warning::

   Throughout this guide make sure ``openstack-origin`` matches the value you
   used when `deploying OpenStack`_.

The below overlay bundle can be used to deploy Masakari when using a bundle to
deploy OpenStack.

Ensure that the ``machines`` section and the placement directives (i.e. the
``to`` option under the masakari application) can co-exist with your OpenStack
bundle.

Provide values for the hacluster charm's ``maas_url`` and ``maas_credentials``
options and the masakari charm's ``vip`` option. A VIP is a virtual IP needed
for Masakari to enable HA (a requirement when using the masakari charm). If
multiple networks are used, multiple (space-separated) VIPs should be
provided. See `OpenStack high availability`_ for HA guidance.

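As a sketch, with two networks the masakari section of the overlay might carry
two space-separated VIPs (the addresses below are placeholders for
illustration, not values from this guide):

.. code-block:: yaml

   masakari:
     options:
       vip: 10.0.0.120 10.10.0.120
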
Enable STONITH via the ``enable-stonith`` pacemaker-remote charm option.

Provide values for the masakari application's ``bindings`` (network spaces)
according to your local environment. For simplicity (or for testing), the same
network space can be used for all Masakari bindings.

.. code-block:: yaml

   machines:
     '0':
       series: bionic
     '1':
       series: bionic
     '2':
       series: bionic
     '3':
       series: bionic
   relations:
   - - nova-compute:juju-info
     - masakari-monitors:container
   - - masakari:ha
     - hacluster:ha
   - - keystone:identity-credentials
     - masakari-monitors:identity-credentials
   - - nova-compute:juju-info
     - pacemaker-remote:juju-info
   - - hacluster:pacemaker-remote
     - pacemaker-remote:pacemaker-remote
   - - masakari:identity-service
     - keystone:identity-service
   - - masakari:shared-db
     - mysql:shared-db
   - - masakari:amqp
     - rabbitmq-server:amqp
   series: bionic
   applications:
     masakari-monitors:
       charm: cs:masakari-monitors
     hacluster:
       charm: cs:hacluster
       options:
         maas_url: <INSERT MAAS URL>
         maas_credentials: <INSERT MAAS API KEY>
     pacemaker-remote:
       charm: cs:pacemaker-remote
       options:
         enable-stonith: True
         enable-resources: False
     masakari:
       charm: cs:masakari
       series: bionic
       num_units: 3
       options:
         openstack-origin: cloud:bionic-stein
         vip: <INSERT VIP(S)>
       bindings:
         public: public
         admin: admin
         internal: internal
         shared-db: internal
         amqp: internal
       to:
       - 'lxd:1'
       - 'lxd:2'
       - 'lxd:3'

Deployment
~~~~~~~~~~

To deploy Masakari during the deployment of a new cloud (e.g. via the
`openstack-base`_ bundle):

.. code-block:: none

   juju deploy ./bundle.yaml --overlay masakari-overlay.yaml

To add Masakari to an existing deployment (i.e. the Juju model has pre-existing
machines) the ``--map-machines`` option should be used.

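For example, assuming the four bundle machines map one-to-one onto machines
'0' through '3' of the existing model (adjust the pairs to suit your model):

.. code-block:: none

   juju deploy ./bundle.yaml --overlay masakari-overlay.yaml \
      --map-machines=existing,0=0,1=1,2=2,3=3
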
The cloud should then be configured for usage. See `Configure OpenStack`_ for
assistance.

For the purposes of this document the below hypervisors are presumed:

.. code-block:: console

   +-------------------+---------+-------+
   | Host              | Status  | State |
   +-------------------+---------+-------+
   | virt-node-01.maas | enabled | up    |
   | virt-node-10.maas | enabled | up    |
   | virt-node-02.maas | enabled | up    |
   +-------------------+---------+-------+

In addition let us assume that instance 'bionic-1' now resides on host
'virt-node-02.maas':

.. code-block:: console

   +----------------------+-------------------+
   | Field                | Value             |
   +----------------------+-------------------+
   | OS-EXT-SRV-ATTR:host | virt-node-02.maas |
   +----------------------+-------------------+

The above information was obtained by the following two commands,
respectively:

.. code-block:: none

   openstack compute service list -c Host -c Status -c State --service nova-compute
   openstack server show bionic-1 -c OS-EXT-SRV-ATTR:host

Instance evacuation recovery methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With Masakari, compute nodes are grouped into failover segments. In the event
of a compute node failure, that node's instances are moved onto another compute
node within the same segment.

The destination node is determined by the recovery method configured for the
affected segment. There are four methods:

* ``reserved_host``
* ``auto``
* ``rh_priority``
* ``auto_priority``

A compute node failure can be simulated by bringing down its primary network
interface. For example, to bring down a node that corresponds to unit
``nova-compute/2``:

.. code-block:: none

   juju run --unit nova-compute/2 sudo ip link set br-ens3 down

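To recover the node after such a test, the interface can be brought back up
again (assuming the same unit and interface as above):

.. code-block:: none

   juju run --unit nova-compute/2 sudo ip link set br-ens3 up
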
'reserved_host'
^^^^^^^^^^^^^^^

The ``reserved_host`` recovery method relocates instances to a subset of
non-active nodes. Because these nodes are not active and are typically
resourced adequately for failover duty, there is a guarantee that sufficient
resources will exist on a reserved node to accommodate migrated instances.

For example, to create segment 'S1', configure it to use the ``reserved_host``
method, and assign it three compute nodes, with one being tagged as a reserved
node:

.. code-block:: none

   openstack segment create S1 reserved_host COMPUTE
   openstack segment host create virt-node-10.maas COMPUTE SSH S1
   openstack segment host create virt-node-02.maas COMPUTE SSH S1
   openstack segment host create --reserved True virt-node-01.maas COMPUTE SSH S1

View the details of a segment:

.. code-block:: none

   openstack segment list

Sample output:

.. code-block:: console

   +--------------------------------------+------+-------------+--------------+-----------------+
   | uuid                                 | name | description | service_type | recovery_method |
   +--------------------------------------+------+-------------+--------------+-----------------+
   | 3af6dfe7-1619-486f-a2c6-8453488c6a66 | S1   | None        | COMPUTE      | reserved_host   |
   +--------------------------------------+------+-------------+--------------+-----------------+

A segment's hosts can be listed like this:

.. code-block:: none

   openstack segment host list -c name -c reserved -c on_maintenance S1

The output should show a value of 'True' in the 'reserved' column for the
appropriate node:

.. code-block:: console

   +-------------------+----------+----------------+
   | name              | reserved | on_maintenance |
   +-------------------+----------+----------------+
   | virt-node-01.maas | True     | False          |
   | virt-node-10.maas | False    | False          |
   | virt-node-02.maas | False    | False          |
   +-------------------+----------+----------------+

Finally, disable the reserved node in Nova so that it becomes non-active, and
thus available for failover:

.. code-block:: none

   openstack compute service set --disable virt-node-01.maas nova-compute

The cloud's compute node list should show a status of 'disabled' for the
appropriate node:

.. code-block:: console

   +-------------------+----------+-------+
   | Host              | Status   | State |
   +-------------------+----------+-------+
   | virt-node-01.maas | disabled | up    |
   | virt-node-10.maas | enabled  | up    |
   | virt-node-02.maas | enabled  | up    |
   +-------------------+----------+-------+

When a compute node failure is detected, Masakari will, in Nova, disable the
failed node and enable a reserved node. The state of the failed node should
also show as 'down'.

Presuming that node 'virt-node-02.maas' has failed, the cloud's compute node
list should become:

.. code-block:: console

   +-------------------+----------+-------+
   | Host              | Status   | State |
   +-------------------+----------+-------+
   | virt-node-01.maas | enabled  | up    |
   | virt-node-10.maas | enabled  | up    |
   | virt-node-02.maas | disabled | down  |
   +-------------------+----------+-------+

The reserved node will begin hosting evacuated instances and Masakari will
remove the reserved flag from it. It will also place the failed node in
maintenance mode.

The segment's host list should show:

.. code-block:: console

   +-------------------+----------+----------------+
   | name              | reserved | on_maintenance |
   +-------------------+----------+----------------+
   | virt-node-01.maas | False    | False          |
   | virt-node-10.maas | False    | False          |
   | virt-node-02.maas | False    | True           |
   +-------------------+----------+----------------+

The expectation is that instance 'bionic-1' has been moved from
'virt-node-02.maas' to the reserved node, host 'virt-node-01.maas':

.. code-block:: console

   +----------------------+-------------------+
   | Field                | Value             |
   +----------------------+-------------------+
   | OS-EXT-SRV-ATTR:host | virt-node-01.maas |
   +----------------------+-------------------+

'auto'
^^^^^^

The ``auto`` recovery method relocates instances to any available node in the
same segment. Because all the nodes are active, unlike with the
``reserved_host`` method, there is no guarantee that sufficient resources will
exist on the destination node to accommodate migrated instances.

For example, to create segment 'S2', configure it to use the ``auto`` method,
and assign it three compute nodes:

.. code-block:: none

   openstack segment create S2 auto COMPUTE
   openstack segment host create virt-node-01.maas COMPUTE SSH S2
   openstack segment host create virt-node-02.maas COMPUTE SSH S2
   openstack segment host create virt-node-10.maas COMPUTE SSH S2

In contrast to the ``reserved_host`` method, all the nodes show as active
(i.e. none are reserved):

.. code-block:: console

   +-------------------+----------+----------------+
   | name              | reserved | on_maintenance |
   +-------------------+----------+----------------+
   | virt-node-10.maas | False    | False          |
   | virt-node-02.maas | False    | False          |
   | virt-node-01.maas | False    | False          |
   +-------------------+----------+----------------+

It follows that, upon node failure, there are no hypervisors for Masakari to
enable in Nova. A failed node will however be put ``on_maintenance`` in
Masakari:

.. code-block:: console

   +-------------------+----------+----------------+
   | name              | reserved | on_maintenance |
   +-------------------+----------+----------------+
   | virt-node-10.maas | False    | False          |
   | virt-node-02.maas | False    | False          |
   | virt-node-01.maas | False    | True           |
   +-------------------+----------+----------------+

'rh_priority' and 'auto_priority'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The below recovery methods each begin with one of the previously described
methods and fall back to the other on failure.

* ``rh_priority``

  Attempts to evacuate instances using the ``reserved_host`` method. If that
  is unsuccessful the ``auto`` method will be used.

* ``auto_priority``

  Attempts to evacuate instances using the ``auto`` method. If that is
  unsuccessful the ``reserved_host`` method will be used.

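A segment using one of these methods is created in the same way as before. For
example (the segment name 'S3' is illustrative only):

.. code-block:: none

   openstack segment create S3 rh_priority COMPUTE
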
Instance restart
~~~~~~~~~~~~~~~~

The instance restart feature is enabled on a per-instance basis.

For example, tag instance 'bionic-1' as HA-enabled in order to have it
restarted automatically on its hypervisor:

.. code-block:: none

   openstack server set --property HA_Enabled=True bionic-1

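The property can be verified afterwards; the instance's properties field
should include the ``HA_Enabled`` setting:

.. code-block:: none

   openstack server show bionic-1 -c properties
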
.. important::

   Perhaps non-intuitively, even if the instance evacuation feature is not
   desired, a hypervisor must nonetheless be assigned to a failover segment in
   order for the restart feature to be available to its instances.

An instance failure can be simulated by killing its process. First determine
its hypervisor and ``qemu`` guest name:

.. code-block:: none

   openstack server show bionic-1 -c OS-EXT-SRV-ATTR:host -c OS-EXT-SRV-ATTR:instance_name

Output:

.. code-block:: console

   +-------------------------------+-------------------+
   | Field                         | Value             |
   +-------------------------------+-------------------+
   | OS-EXT-SRV-ATTR:host          | virt-node-02.maas |
   | OS-EXT-SRV-ATTR:instance_name | instance-00000001 |
   +-------------------------------+-------------------+

If you do not have admin rights in the cloud the above fields may not be
visible.

This hypervisor corresponds to unit ``nova-compute/2`` in this example cloud.

Check the current PID, kill the process, wait a minute, and verify that a new
process gets started:

.. code-block:: none

   juju run --unit nova-compute/2 'pgrep -f guest=instance-00000001'
   juju run --unit nova-compute/2 'sudo pkill -9 -f guest=instance-00000001'
   juju run --unit nova-compute/2 'pgrep -f guest=instance-00000001'

Supplementary information
-------------------------

This section contains information that can be useful when working with
Masakari.

* Once a failed node has been re-inserted into the cloud it will show, in
  Nova, as 'disabled' but 'up' and, in Masakari, as 'on_maintenance'. It can
  become an active hypervisor with:

  .. code-block:: none

     openstack compute service set --enable <host-name> nova-compute
     openstack segment host update --on_maintenance=False <segment-name> <host-name>

* A segment's recovery method can be updated with:

  .. code-block:: none

     openstack segment update --recovery_method <method> --service_type COMPUTE <segment-name>

* A node cannot be assigned to a segment while it is assigned to another
  segment. It must first be removed from the current segment with:

  .. code-block:: none

     openstack segment host delete <segment-name> <host-name>

* A node's reserved status can be updated with:

  .. code-block:: none

     openstack segment host update --reserved=<boolean> <segment-name> <host-name>

.. LINKS
.. _MAAS: https://maas.io
.. _Masakari charm: http://jaas.ai/masakari
.. _openstack-base: https://jaas.ai/openstack-base
.. _OpenStack high availability: app-ha.html#ha-applications
.. _Configure OpenStack: configure-openstack.html
.. _deploying OpenStack: install-openstack.html