Update RBD mirroring section

The team agreed to remove the single-model
scenario. A note was left behind describing it
as an optional topology.

Move over advanced usage from the ceph-rbd-mirror
charm README. The latter content will be removed
in an imminent PR.

Remove the reference to test bundles because those
bundles only cover the single-model scenario. An
overlay is also required for the two-model scenario.

Change-Id: I22e69ed67720dd98f2005b48e264ad05fa5c2573
==============================
Appendix K: Ceph RBD Mirroring
==============================
Overview
--------
RADOS Block Device (RBD) mirroring is a process of asynchronous replication of
Ceph block device images between two or more Ceph clusters. Mirroring ensures
point-in-time consistent replicas of all changes to an image, including reads
and writes, block device resizing, snapshots, clones, and flattening. RBD
mirroring is mainly used for disaster recovery (i.e. having a secondary site as
a failover). See `Upstream Ceph documentation on RBD mirroring`_ for complete
information.
This guide will show how to deploy two Ceph clusters with RBD mirroring between
them with the use of the ceph-rbd-mirror charm. See the `charm's
documentation`_ for basic information and charm limitations.
RBD mirroring is only one aspect of datacentre redundancy. Refer to `Ceph RADOS
Gateway Multisite Replication`_ and other work to arrive at a complete
solution.
.. note::
RBD mirroring makes use of the journaling feature of Ceph. This incurs an
overhead for write activity on an RBD image that will adversely affect
performance. See Florian Haas' performance analysis of `RBD mirror`_ from
Cephalocon Barcelona 2019.
Requirements
------------
The two Ceph clusters will correspond to sites 'a' and 'b' and each cluster
will reside within a separate model (models 'site-a' and 'site-b'). The
deployment will require the use of `Cross model relations`_.
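Both models in this guide are assumed to reside on the same controller (the
offer URLs shown later carry no controller prefix). The models can be listed
at any time with:
.. code-block:: none
juju models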
Deployment characteristics:
* each cluster will have 7 units:
* 3 x ceph-osd
* 3 x ceph-mon
* 1 x ceph-rbd-mirror
* application names will be used to distinguish applications in site 'a' from
those in site 'b' (e.g. ceph-mon-a and ceph-mon-b)
* the ceph-osd units will use block device ``/dev/vdd`` for their OSD volumes
.. note::
The two Ceph clusters can optionally be placed within the same model, and
thus obviate the need for cross model relations. This topology is not
generally considered to be a real world scenario.
Deployment
----------
For site 'a' the following configuration is placed into file ``site-a.yaml``:
.. code-block:: yaml
ceph-mon-a:
  monitor-count: 3
  expected-osd-count: 3
  source: distro
ceph-osd-a:
  osd-devices: /dev/vdd
  source: distro
ceph-rbd-mirror-a:
  source: distro
Create the model and deploy the software for each site:
* Site 'a'
.. code-block:: none
juju add-model site-a
juju deploy -n 3 --config site-a.yaml ceph-osd ceph-osd-a
juju deploy -n 3 --config site-a.yaml ceph-mon ceph-mon-a
juju deploy --config site-a.yaml ceph-rbd-mirror ceph-rbd-mirror-a
* Site 'b'
An analogous configuration file is used (i.e. replace 'a' with 'b'):
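Applying that substitution, the resulting ``site-b.yaml`` would contain:
.. code-block:: yaml
ceph-mon-b:
  monitor-count: 3
  expected-osd-count: 3
  source: distro
ceph-osd-b:
  osd-devices: /dev/vdd
  source: distro
ceph-rbd-mirror-b:
  source: distro
Create the model and deploy the software: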
.. code-block:: none
juju add-model site-b
juju deploy -n 3 --config site-b.yaml ceph-osd ceph-osd-b
juju deploy -n 3 --config site-b.yaml ceph-mon ceph-mon-b
juju deploy --config site-b.yaml ceph-rbd-mirror ceph-rbd-mirror-b
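If desired, deployment progress for each model can be followed with the
standard :command:`watch` utility (e.g. in separate terminals):
.. code-block:: none
watch -n 5 juju status -m site-a
watch -n 5 juju status -m site-b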
Add two local relations for each site:
* Site 'a'
.. code-block:: none
juju add-relation -m site-a ceph-mon-a:osd ceph-osd-a:mon
juju add-relation -m site-a ceph-mon-a:rbd-mirror ceph-rbd-mirror-a:ceph-local
* Site 'b'
.. code-block:: none
juju add-relation -m site-b ceph-mon-b:osd ceph-osd-b:mon
juju add-relation -m site-b ceph-mon-b:rbd-mirror ceph-rbd-mirror-b:ceph-local
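The relations just added can be inspected by including the relations section
in the status output:
.. code-block:: none
juju status --relations -m site-a
juju status --relations -m site-b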
Export a ceph-rbd-mirror endpoint (by means of an "offer") for each site. This
will enable us to create the inter-site (cross model) relations:
* Site 'a'
.. code-block:: none
juju switch site-a
juju offer ceph-rbd-mirror-a:ceph-remote
Output:
.. code-block:: console
Application "ceph-rbd-mirror-a" endpoints [ceph-remote] available at "admin/site-a.ceph-rbd-mirror-a"
* Site 'b'
.. code-block:: none
juju switch site-b
juju offer ceph-rbd-mirror-b:ceph-remote
Output:
.. code-block:: console
Application "ceph-rbd-mirror-b" endpoints [ceph-remote] available at "admin/site-b.ceph-rbd-mirror-b"
Add the two inter-site relations by referring to the offer URLs (included in
the output above) as if they were applications in the local model:
.. code-block:: none
juju add-relation -m site-a ceph-mon-a admin/site-b.ceph-rbd-mirror-b
juju add-relation -m site-b ceph-mon-b admin/site-a.ceph-rbd-mirror-a
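Alternatively, an offer can first be consumed into the local model (optionally
under a local alias) and then related to like any local application. A minimal
sketch of this approach for site 'a':
.. code-block:: none
juju consume -m site-a admin/site-b.ceph-rbd-mirror-b ceph-rbd-mirror-b
juju add-relation -m site-a ceph-mon-a ceph-rbd-mirror-b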
Verify the output of :command:`juju status` for each model:
.. code-block:: none
juju status -m site-a
Output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
site-a maas-prod-1 acme-1/default 2.8.1 unsupported 20:00:41Z
SAAS Status Store URL
ceph-rbd-mirror-b waiting icarus-maas admin/site-b.ceph-rbd-mirror-b
App Version Status Scale Charm Store Rev OS Notes
ceph-mon-a 15.2.3 active 3 ceph-mon jujucharms 49 ubuntu
ceph-osd-a 15.2.3 active 3 ceph-osd jujucharms 304 ubuntu
ceph-rbd-mirror-a 15.2.3 waiting 1 ceph-rbd-mirror jujucharms 12 ubuntu
Unit Workload Agent Machine Public address Ports Message
ceph-mon-a/0* active idle 0/lxd/0 10.0.0.57 Unit is ready and clustered
ceph-mon-a/1 active idle 1/lxd/0 10.0.0.58 Unit is ready and clustered
ceph-mon-a/2 active idle 2/lxd/0 10.0.0.59 Unit is ready and clustered
ceph-osd-a/0* active idle 0 10.0.0.69 Unit is ready (1 OSD)
ceph-osd-a/1 active idle 1 10.0.0.19 Unit is ready (1 OSD)
ceph-osd-a/2 active idle 2 10.0.0.20 Unit is ready (1 OSD)
ceph-rbd-mirror-a/0* waiting idle 3 10.0.0.22 Waiting for pools to be created
Machine State DNS Inst id Series AZ Message
0 started 10.0.0.69 virt-node-08 focal default Deployed
0/lxd/0 started 10.0.0.57 juju-bb0dc1-0-lxd-0 focal default Container started
1 started 10.0.0.19 virt-node-10 focal default Deployed
1/lxd/0 started 10.0.0.58 juju-bb0dc1-1-lxd-0 focal default Container started
2 started 10.0.0.20 virt-node-11 focal default Deployed
2/lxd/0 started 10.0.0.59 juju-bb0dc1-2-lxd-0 focal default Container started
3 started 10.0.0.22 virt-node-03 focal default Deployed
Offer Application Charm Rev Connected Endpoint Interface Role
ceph-rbd-mirror-a ceph-rbd-mirror-a ceph-rbd-mirror 12 1/1 ceph-remote ceph-rbd-mirror requirer
.. code-block:: none
juju status -m site-b
Output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
site-b maas-prod-1 acme-1/default 2.8.1 unsupported 20:02:58Z
SAAS Status Store URL
ceph-rbd-mirror-a waiting icarus-maas admin/site-a.ceph-rbd-mirror-a
App Version Status Scale Charm Store Rev OS Notes
ceph-mon-b 15.2.3 active 3 ceph-mon jujucharms 49 ubuntu
ceph-osd-b 15.2.3 active 3 ceph-osd jujucharms 304 ubuntu
ceph-rbd-mirror-b 15.2.3 waiting 1 ceph-rbd-mirror jujucharms 12 ubuntu
Unit Workload Agent Machine Public address Ports Message
ceph-mon-b/0* active idle 0/lxd/0 10.0.0.60 Unit is ready and clustered
ceph-mon-b/1 active idle 1/lxd/0 10.0.0.61 Unit is ready and clustered
ceph-mon-b/2 active idle 2/lxd/0 10.0.0.62 Unit is ready and clustered
ceph-osd-b/0* active idle 0 10.0.0.21 Unit is ready (1 OSD)
ceph-osd-b/1 active idle 1 10.0.0.54 Unit is ready (1 OSD)
ceph-osd-b/2 active idle 2 10.0.0.55 Unit is ready (1 OSD)
ceph-rbd-mirror-b/0* waiting idle 3 10.0.0.56 Waiting for pools to be created
Machine State DNS Inst id Series AZ Message
0 started 10.0.0.21 virt-node-02 focal default Deployed
0/lxd/0 started 10.0.0.60 juju-3ef7c5-0-lxd-0 focal default Container started
1 started 10.0.0.54 virt-node-04 focal default Deployed
1/lxd/0 started 10.0.0.61 juju-3ef7c5-1-lxd-0 focal default Container started
2 started 10.0.0.55 virt-node-05 focal default Deployed
2/lxd/0 started 10.0.0.62 juju-3ef7c5-2-lxd-0 focal default Container started
3 started 10.0.0.56 virt-node-06 focal default Deployed
Offer Application Charm Rev Connected Endpoint Interface Role
ceph-rbd-mirror-b ceph-rbd-mirror-b ceph-rbd-mirror 12 1/1 ceph-remote ceph-rbd-mirror requirer
There are no Ceph pools created by default. The next section ('Pool creation')
provides guidance.
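The current pool list can be inspected with the same ``list-pools`` action
that is used later in this guide:
.. code-block:: none
juju run-action --wait -m site-a ceph-mon-a/leader list-pools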
Pool creation
-------------
RBD pools can be created by either a supporting charm (through the Ceph broker
protocol) or manually by the operator:
#. A pool created by a supporting charm (e.g. glance or nova-compute) will
automatically be detected and acted upon (i.e. a remote pool will be set up
in the peer cluster). See the sketch following this list.
#. A manually-created pool, whether done via the ceph-mon application or
through Ceph directly, will require an action to be run on the
ceph-rbd-mirror application leader in order for the remote pool to come
online.
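As an illustration of the first case, assuming a glance application is already
present in site 'a' (as part of an OpenStack deployment), relating it to the
local ceph-mon should cause its pool to be created and then mirrored to site
'b' automatically:
.. code-block:: none
juju add-relation -m site-a glance:ceph ceph-mon-a:client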
For example, to create a pool manually in site 'a' and have ceph-rbd-mirror
(of site 'a') initialise a pool in site 'b':
.. code-block:: none
juju run-action --wait -m site-a ceph-mon-a/leader create-pool name=mypool app-name=rbd
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader refresh-pools
This can be verified by listing the pools in site 'b':
.. code-block:: none
juju run-action --wait -m site-b ceph-mon-b/leader list-pools
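The mirror status of the pools can also be queried at any time via the
``status`` action (the same action is used in the failover procedure below):
.. code-block:: none
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true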
.. note::
Automatic peer-pool creation (for a charm-created pool) is based on the
local pool being labelled with a Ceph 'rbd' tag. This Ceph-internal
labelling occurs when the newly-created local pool is associated with the
RBD application. This last feature is supported starting with Ceph Luminous
(OpenStack Queens).
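For a pool created directly with Ceph (rather than via the ceph-mon charm),
the RBD application can be associated using standard Ceph tooling before
refreshing pools. A sketch, assuming the admin keyring is available on the
ceph-mon unit and using a hypothetical pool named ``mypool2``:
.. code-block:: none
juju ssh -m site-a ceph-mon-a/0 "sudo ceph osd pool create mypool2 32 && sudo rbd pool init mypool2"
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader refresh-pools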
Failover and fallback
---------------------
To manage failover and fallback, the ``demote`` and ``promote`` actions are
applied to the ceph-rbd-mirror application leader.
For instance, to fail over from site 'a' to site 'b' the former is demoted and
the latter is promoted. The rest of the commands are status checks:
.. code-block:: none
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader status verbose=true
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader demote
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader status verbose=true
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader promote
To fall back to site 'a' the actions are reversed:
.. code-block:: none
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader demote
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader promote
.. note::
With Ceph Luminous (and greater), the mirror status information may not be
accurate. Specifically, the ``entries_behind_master`` counter may never get
to '0' even though the image has been fully synchronised.
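The counter in question appears in the per-image entries of the verbose status
output. The same information can also be inspected directly on a Ceph monitor;
a sketch, assuming pool ``mypool`` and the admin keyring on the ceph-mon unit:
.. code-block:: none
juju ssh -m site-b ceph-mon-b/0 "sudo rbd mirror pool status mypool --verbose"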
Recovering from abrupt shutdown
-------------------------------
It is possible that an abrupt shutdown and/or an interruption to communication
channels may lead to a "split-brain" condition. This may cause the mirroring
daemon in each cluster to claim to be the primary. In such cases, the operator
must make a call as to which daemon is correct. Generally speaking, this means
deciding which cluster has the most recent data.
Elect a primary by applying the ``demote`` and ``promote`` actions to the
appropriate ceph-rbd-mirror leader. After doing so, the ``resync-pools`` action
must be run on the secondary cluster leader. The ``promote`` action may require
a force option.
Here, we make site 'a' be the primary by demoting site 'b' and promoting site
'a':
.. code-block:: none
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader demote
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader promote force=true
juju run-action --wait -m site-a ceph-rbd-mirror-a/leader status verbose=true
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true
juju run-action --wait -m site-b ceph-rbd-mirror-b/leader resync-pools i-really-mean-it=true
.. note::
When using Ceph Luminous, the mirror state information will not be accurate
after recovering from unclean shutdown. Regardless of the output of the
status information, you will be able to write to images after a forced
promote.
.. LINKS
.. _charm's documentation: https://opendev.org/openstack/charm-ceph-rbd-mirror/src/branch/master/src/README.md
.. _Ceph RADOS Gateway Multisite replication: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-rgw-multisite.html
.. _Upstream Ceph documentation on RBD mirroring: https://docs.ceph.com/docs/mimic/rbd/rbd-mirroring/
.. _RBD mirror: https://fghaas.github.io/cephalocon2019-rbdmirror/#/7/6
.. _Cross model relations: https://juju.is/docs/cross-model-relations