
==================
Ceph RBD mirroring
==================

Overview
--------

RADOS Block Device (RBD) mirroring is a process of asynchronous replication of
Ceph block device images between two or more Ceph clusters. Mirroring ensures
point-in-time consistent replicas of all changes to an image, including reads
and writes, block device resizing, snapshots, clones, and flattening. RBD
mirroring is mainly used for disaster recovery (i.e. having a secondary site as
a failover). See `Upstream Ceph documentation on RBD mirroring`_ for complete
information.

This guide will show how to deploy two Ceph clusters with RBD mirroring between
them using the ceph-rbd-mirror charm. See the `charm's documentation`_ for
basic information and charm limitations.

RBD mirroring is only one aspect of datacentre redundancy. Refer to `Ceph RADOS
Gateway Multisite replication`_ and other work to arrive at a complete
solution.

.. note::

   RBD mirroring makes use of the journaling feature of Ceph. This incurs an
   overhead for write activity on an RBD image that will adversely affect
   performance. See Florian Haas' performance analysis of `RBD mirror`_ from
   Cephalocon Barcelona 2019.
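
As a point of reference, journaling is a per-image feature that can be
inspected and toggled with the :command:`rbd` CLI. A brief sketch, assuming a
hypothetical pool ``mypool`` and image ``myimage``:

.. code-block:: none

   # 'features' in the output lists 'journaling' when it is enabled
   rbd info mypool/myimage

   # enable journaling on an existing image
   rbd feature enable mypool/myimage journaling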

Requirements
------------

The two Ceph clusters will correspond to sites 'a' and 'b' and each cluster
will reside within a separate model (models 'site-a' and 'site-b'). The
deployment will require the use of `Cross model relations`_.

Deployment characteristics:

* each cluster will have 7 units:

  * 3 x ceph-osd
  * 3 x ceph-mon
  * 1 x ceph-rbd-mirror

* application names will be used to distinguish applications in site 'a' from
  those in site 'b' (e.g. ceph-mon-a and ceph-mon-b)

* the ceph-osd units will use block device ``/dev/vdd`` for their OSD volumes

.. note::

   The two Ceph clusters can optionally be placed within the same model, and
   thus obviate the need for cross model relations. This topology is not
   generally considered to be a real world scenario.

Deployment
----------

For site 'a' the following configuration is placed into file ``site-a.yaml``:

.. code-block:: yaml

   ceph-mon-a:
     monitor-count: 3
     expected-osd-count: 3
     source: distro

   ceph-osd-a:
     osd-devices: /dev/vdd
     source: distro

   ceph-rbd-mirror-a:
     source: distro

Create the model and deploy the software for each site:

* Site 'a'

  .. code-block:: none

     juju add-model site-a
     juju deploy -n 3 --config site-a.yaml ceph-osd ceph-osd-a
     juju deploy -n 3 --config site-a.yaml ceph-mon ceph-mon-a
     juju deploy --config site-a.yaml ceph-rbd-mirror ceph-rbd-mirror-a

* Site 'b'

  An analogous configuration file is used (i.e. replace 'a' with 'b'; it is
  spelled out after this list):

  .. code-block:: none

     juju add-model site-b
     juju deploy -n 3 --config site-b.yaml ceph-osd ceph-osd-b
     juju deploy -n 3 --config site-b.yaml ceph-mon ceph-mon-b
     juju deploy --config site-b.yaml ceph-rbd-mirror ceph-rbd-mirror-b
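
For completeness, the ``site-b.yaml`` file referred to above is simply the
site 'a' file with every 'a' suffix replaced by 'b':

.. code-block:: yaml

   ceph-mon-b:
     monitor-count: 3
     expected-osd-count: 3
     source: distro

   ceph-osd-b:
     osd-devices: /dev/vdd
     source: distro

   ceph-rbd-mirror-b:
     source: distro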

Add two local relations for each site:

* Site 'a'

  .. code-block:: none

     juju add-relation -m site-a ceph-mon-a:osd ceph-osd-a:mon
     juju add-relation -m site-a ceph-mon-a:rbd-mirror ceph-rbd-mirror-a:ceph-local

* Site 'b'

  .. code-block:: none

     juju add-relation -m site-b ceph-mon-b:osd ceph-osd-b:mon
     juju add-relation -m site-b ceph-mon-b:rbd-mirror ceph-rbd-mirror-b:ceph-local

Export a ceph-rbd-mirror endpoint (by means of an "offer") for each site. This
will enable us to create the inter-site (cross model) relations:

* Site 'a'

  .. code-block:: none

     juju switch site-a
     juju offer ceph-rbd-mirror-a:ceph-remote

  Output:

  .. code-block:: console

     Application "ceph-rbd-mirror-a" endpoints [ceph-remote] available at "admin/site-a.ceph-rbd-mirror-a"

* Site 'b'

  .. code-block:: none

     juju switch site-b
     juju offer ceph-rbd-mirror-b:ceph-remote

  Output:

  .. code-block:: console

     Application "ceph-rbd-mirror-b" endpoints [ceph-remote] available at "admin/site-b.ceph-rbd-mirror-b"

Add the two inter-site relations by referring to the offer URLs (included in
the output above) as if they were applications in the local model:

.. code-block:: none

   juju add-relation -m site-a ceph-mon-a admin/site-b.ceph-rbd-mirror-b
   juju add-relation -m site-b ceph-mon-b admin/site-a.ceph-rbd-mirror-a
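
Each offer, along with its connection count, can also be inspected from the
offering model. A quick check, assuming the models used in this guide:

.. code-block:: none

   juju offers -m site-a
   juju offers -m site-b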

Verify the output of :command:`juju status` for each model:

.. code-block:: none

   juju status -m site-a

Output:

.. code-block:: console

   Model   Controller   Cloud/Region    Version  SLA          Timestamp
   site-a  maas-prod-1  acme-1/default  2.8.1    unsupported  20:00:41Z

   SAAS               Status   Store        URL
   ceph-rbd-mirror-b  waiting  icarus-maas  admin/site-b.ceph-rbd-mirror-b

   App                Version  Status   Scale  Charm            Store       Rev  OS      Notes
   ceph-mon-a         15.2.3   active       3  ceph-mon         jujucharms   49  ubuntu
   ceph-osd-a         15.2.3   active       3  ceph-osd         jujucharms  304  ubuntu
   ceph-rbd-mirror-a  15.2.3   waiting      1  ceph-rbd-mirror  jujucharms   12  ubuntu

   Unit                  Workload  Agent  Machine  Public address  Ports  Message
   ceph-mon-a/0*         active    idle   0/lxd/0  10.0.0.57              Unit is ready and clustered
   ceph-mon-a/1          active    idle   1/lxd/0  10.0.0.58              Unit is ready and clustered
   ceph-mon-a/2          active    idle   2/lxd/0  10.0.0.59              Unit is ready and clustered
   ceph-osd-a/0*         active    idle   0        10.0.0.69              Unit is ready (1 OSD)
   ceph-osd-a/1          active    idle   1        10.0.0.19              Unit is ready (1 OSD)
   ceph-osd-a/2          active    idle   2        10.0.0.20              Unit is ready (1 OSD)
   ceph-rbd-mirror-a/0*  waiting   idle   3        10.0.0.22              Waiting for pools to be created

   Machine  State    DNS        Inst id              Series  AZ       Message
   0        started  10.0.0.69  virt-node-08         focal   default  Deployed
   0/lxd/0  started  10.0.0.57  juju-bb0dc1-0-lxd-0  focal   default  Container started
   1        started  10.0.0.19  virt-node-10         focal   default  Deployed
   1/lxd/0  started  10.0.0.58  juju-bb0dc1-1-lxd-0  focal   default  Container started
   2        started  10.0.0.20  virt-node-11         focal   default  Deployed
   2/lxd/0  started  10.0.0.59  juju-bb0dc1-2-lxd-0  focal   default  Container started
   3        started  10.0.0.22  virt-node-03         focal   default  Deployed

   Offer              Application        Charm            Rev  Connected  Endpoint     Interface        Role
   ceph-rbd-mirror-a  ceph-rbd-mirror-a  ceph-rbd-mirror  12   1/1        ceph-remote  ceph-rbd-mirror  requirer

.. code-block:: none

   juju status -m site-b

Output:

.. code-block:: console

   Model   Controller   Cloud/Region    Version  SLA          Timestamp
   site-b  maas-prod-1  acme-1/default  2.8.1    unsupported  20:02:58Z

   SAAS               Status   Store        URL
   ceph-rbd-mirror-a  waiting  icarus-maas  admin/site-a.ceph-rbd-mirror-a

   App                Version  Status   Scale  Charm            Store       Rev  OS      Notes
   ceph-mon-b         15.2.3   active       3  ceph-mon         jujucharms   49  ubuntu
   ceph-osd-b         15.2.3   active       3  ceph-osd         jujucharms  304  ubuntu
   ceph-rbd-mirror-b  15.2.3   waiting      1  ceph-rbd-mirror  jujucharms   12  ubuntu

   Unit                  Workload  Agent  Machine  Public address  Ports  Message
   ceph-mon-b/0*         active    idle   0/lxd/0  10.0.0.60              Unit is ready and clustered
   ceph-mon-b/1          active    idle   1/lxd/0  10.0.0.61              Unit is ready and clustered
   ceph-mon-b/2          active    idle   2/lxd/0  10.0.0.62              Unit is ready and clustered
   ceph-osd-b/0*         active    idle   0        10.0.0.21              Unit is ready (1 OSD)
   ceph-osd-b/1          active    idle   1        10.0.0.54              Unit is ready (1 OSD)
   ceph-osd-b/2          active    idle   2        10.0.0.55              Unit is ready (1 OSD)
   ceph-rbd-mirror-b/0*  waiting   idle   3        10.0.0.56              Waiting for pools to be created

   Machine  State    DNS        Inst id              Series  AZ       Message
   0        started  10.0.0.21  virt-node-02         focal   default  Deployed
   0/lxd/0  started  10.0.0.60  juju-3ef7c5-0-lxd-0  focal   default  Container started
   1        started  10.0.0.54  virt-node-04         focal   default  Deployed
   1/lxd/0  started  10.0.0.61  juju-3ef7c5-1-lxd-0  focal   default  Container started
   2        started  10.0.0.55  virt-node-05         focal   default  Deployed
   2/lxd/0  started  10.0.0.62  juju-3ef7c5-2-lxd-0  focal   default  Container started
   3        started  10.0.0.56  virt-node-06         focal   default  Deployed

   Offer              Application        Charm            Rev  Connected  Endpoint     Interface        Role
   ceph-rbd-mirror-b  ceph-rbd-mirror-b  ceph-rbd-mirror  12   1/1        ceph-remote  ceph-rbd-mirror  requirer

There are no Ceph pools created by default. The next section ('Pool creation')
provides guidance.

Pool creation
-------------

RBD pools can be created by either a supporting charm (through the Ceph broker
protocol) or manually by the operator:

#. A charm-created pool (e.g. one created by the glance or nova-compute
   charms) will automatically be detected and acted upon (i.e. a remote pool
   will be set up in the peer cluster). See the sketch after this list.

#. A manually-created pool, whether done via the ceph-mon application or
   through Ceph directly, will require an action to be run on the
   ceph-rbd-mirror application leader in order for the remote pool to come
   online.
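
To illustrate the first case, relating a charm that requests a pool over the
Ceph broker protocol is enough to trigger mirroring of that pool. A minimal
sketch using glance in site 'a' (note that glance would still need its usual
database and identity relations to become fully functional):

.. code-block:: none

   juju deploy -m site-a glance
   juju add-relation -m site-a glance:ceph ceph-mon-a:client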

For example, to create a pool manually in site 'a' and have ceph-rbd-mirror
(of site 'a') initialise a pool in site 'b':

.. code-block:: none

   juju run-action --wait -m site-a ceph-mon-a/leader create-pool name=mypool app-name=rbd
   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader refresh-pools

This can be verified by listing the pools in site 'b':

.. code-block:: none

   juju run-action --wait -m site-b ceph-mon-b/leader list-pools
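
Beyond pool existence, the replication state itself can be queried with the
:command:`rbd` CLI on a monitor unit. A sketch, assuming the ``mypool`` pool
from above and that the admin keyring on the unit permits the query:

.. code-block:: none

   juju ssh -m site-b ceph-mon-b/0 sudo rbd mirror pool status mypool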

.. note::

   Automatic peer-pool creation (for a charm-created pool) is based on the
   local pool being labelled with a Ceph 'rbd' tag. This Ceph-internal
   labelling occurs when the newly-created local pool is associated with the
   RBD application. This last feature is supported starting with Ceph Luminous
   (OpenStack Queens).
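
For a pool created outside of the charms, the same association can be made
manually (this is what the ``app-name=rbd`` parameter above arranges). A
sketch, again assuming pool ``mypool`` and a usable admin keyring on the unit:

.. code-block:: none

   juju ssh -m site-a ceph-mon-a/0 sudo ceph osd pool application enable mypool rbd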

Failover and fallback
---------------------

To manage failover and fallback, the ``demote`` and ``promote`` actions are
applied to the ceph-rbd-mirror application leader.

For instance, to fail over from site 'a' to site 'b' the former is demoted and
the latter is promoted. The rest of the commands are status checks:

.. code-block:: none

   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader status verbose=true
   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true

   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader demote

   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader status verbose=true
   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true

   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader promote

To fall back to site 'a' the actions are reversed:

.. code-block:: none

   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader demote
   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader promote
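
Which site currently holds the primary copy of a given image can also be
checked directly with the :command:`rbd` CLI. A sketch, assuming an image
``myimage`` in pool ``mypool`` and an admin keyring on the monitor unit:

.. code-block:: none

   juju ssh -m site-a ceph-mon-a/0 sudo rbd info mypool/myimage
   juju ssh -m site-a ceph-mon-a/0 sudo rbd mirror image status mypool/myimage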

.. note::

   With Ceph Luminous (and greater), the mirror status information may not be
   accurate. Specifically, the ``entries_behind_master`` counter may never get
   to '0' even though the image has been fully synchronised.

Recovering from abrupt shutdown
-------------------------------

It is possible that an abrupt shutdown and/or an interruption to communication
channels may lead to a "split-brain" condition. This may cause the mirroring
daemon in each cluster to claim to be the primary. In such cases, the operator
must make a call as to which daemon is correct. Generally speaking, this means
deciding which cluster has the most recent data.

Elect a primary by applying the ``demote`` and ``promote`` actions to the
appropriate ceph-rbd-mirror leader. After doing so, the ``resync-pools`` action
must be run on the secondary cluster leader. The ``promote`` action may require
a force option.

Here, we make site 'a' the primary by demoting site 'b' and promoting site
'a':

.. code-block:: none

   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader demote
   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader promote force=true

   juju run-action --wait -m site-a ceph-rbd-mirror-a/leader status verbose=true
   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader status verbose=true

   juju run-action --wait -m site-b ceph-rbd-mirror-b/leader resync-pools i-really-mean-it=true
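
The ``resync-pools`` action requests a resync of local images from the
primary. The same can be requested for a single image with the :command:`rbd`
CLI; a sketch, assuming image ``mypool/myimage`` on the demoted site:

.. code-block:: none

   juju ssh -m site-b ceph-mon-b/0 sudo rbd mirror image resync mypool/myimage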

.. note::

   When using Ceph Luminous, the mirror state information will not be accurate
   after recovering from unclean shutdown. Regardless of the output of the
   status information, you will be able to write to images after a forced
   promote.

.. LINKS
.. _charm's documentation: https://opendev.org/openstack/charm-ceph-rbd-mirror/src/branch/master/src/README.md
.. _Ceph RADOS Gateway Multisite replication: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-rgw-multisite.html
.. _Upstream Ceph documentation on RBD mirroring: https://docs.ceph.com/docs/mimic/rbd/rbd-mirroring/
.. _RBD mirror: https://fghaas.github.io/cephalocon2019-rbdmirror/#/7/6
.. _Cross model relations: https://juju.is/docs/cross-model-relations