Merge "RADOS Gateway: EC and Storage Classing"

commit 770a8c2943

deploy-guide/source/app-erasure-coding.rst (new file, 173 lines)

@@ -0,0 +1,173 @@

Appendix M: Ceph Erasure Coding and Device Classing
===================================================

Overview
++++++++

This appendix is intended as a post-deployment guide to reconfiguring RADOS
gateway pools to use erasure coding rather than replication. It also covers
use of a specific device class (NVMe, SSD or HDD) when creating the erasure
coding profile, as well as other configuration options that need to be
considered during deployment.

.. note::

   Any existing data is maintained by following this process; however,
   reconfiguration should take place immediately post deployment to avoid
   prolonged ‘copy-pool’ operations.

RADOS Gateway bucket weighting
++++++++++++++++++++++++++++++

The weighting of the various pools in a deployment drives the number of
placement groups (PGs) created to support each pool. In the ceph-radosgw
charm this is configured for the data bucket using:

.. code::

   juju config ceph-radosgw rgw-buckets-pool-weight=20
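
The value currently in effect can be read back by running the same command
without an assignment (standard juju config behaviour):

.. code::

   juju config ceph-radosgw rgw-buckets-pool-weight
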

Note the default of 20%. If the deployment is a pure ceph-radosgw
deployment, this value should be increased to the expected percentage use of
storage. The device class also needs to be taken into account (but
for erasure coding this needs to be specified post deployment via action
execution).
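
To see how the pool weights translate into placement groups once the pools
exist, the per-pool PG counts can be reviewed directly with Ceph (a
read-only check, assuming the default pool names):

.. code::

   sudo ceph osd pool ls detail
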

Ceph automatic device classing
++++++++++++++++++++++++++++++

Newer versions of Ceph perform automatic classing of OSD devices. Each OSD
will be placed into the ‘nvme’, ‘ssd’ or ‘hdd’ device class. These can
be used when creating erasure profiles or new CRUSH rules (see the following
sections).

The classes can be inspected using:

.. code::

   sudo ceph osd crush tree

   ID CLASS WEIGHT  TYPE NAME
   -1       8.18729 root default
   -5       2.72910     host node-laveran
    2  nvme 0.90970         osd.2
    5   ssd 0.90970         osd.5
    7   ssd 0.90970         osd.7
   -7       2.72910     host node-mees
    1  nvme 0.90970         osd.1
    6   ssd 0.90970         osd.6
    8   ssd 0.90970         osd.8
   -3       2.72910     host node-pytheas
    0  nvme 0.90970         osd.0
    3   ssd 0.90970         osd.3
    4   ssd 0.90970         osd.4
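
The device classes themselves, and the OSDs assigned to each, can also be
listed directly (the ‘nvme’ class is used here as an example):

.. code::

   sudo ceph osd crush class ls
   sudo ceph osd crush class ls-osd nvme
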

Configuring erasure coding
++++++++++++++++++++++++++

The RADOS gateway makes use of a number of pools, but the only pool
that should be converted to use erasure coding (EC) is the data pool:

.. code::

   default.rgw.buckets.data

All other pools should be replicated as they are by default.
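
The full set of pools created by the deployment can be listed to confirm the
name of the data pool before proceeding:

.. code::

   sudo ceph osd lspools
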

To create a new EC profile and pool:

.. code::

   juju run-action --wait ceph-mon/0 create-erasure-profile \
       name=nvme-ec device-class=nvme

   juju run-action --wait ceph-mon/0 create-pool \
       name=default.rgw.buckets.data.new \
       pool-type=erasure \
       erasure-profile-name=nvme-ec \
       percent-data=90

The percent-data option should be set based on the type of deployment,
but if the RADOS gateway is the only target for the NVMe storage class,
then 90% is appropriate (other RADOS gateway pools are tiny and use
between 0.10% and 3% of storage).
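
Once created, the erasure profile assigned to the new pool can be verified
(the pool and profile names are the ones used in the example above):

.. code::

   sudo ceph osd pool get default.rgw.buckets.data.new erasure_code_profile
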

.. note::

   The create-erasure-profile action has a number of other
   options including adjustment of the K/M values which affect the
   computational overhead and underlying storage consumed per MB stored.
   Sane defaults are provided but they require a minimum of five hosts
   with block devices of the right class.
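
The resulting profile, including the K/M values and the device class
restriction, can be displayed with:

.. code::

   sudo ceph osd erasure-code-profile get nvme-ec
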

To avoid any creation/mutation of stored data during migration,
shut down all RADOS gateway instances:

.. code::

   juju run --application ceph-radosgw \
       "sudo systemctl stop ceph-radosgw.target"
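
The ‘copy-pool’ operation mentioned in the overview refers to getting the
object data from the existing pool into the new EC pool before the names are
switched. As an illustrative sketch only (this is not a charm action, and it
is slow on large pools), the rados client can copy objects between pools:

.. code::

   # Sketch only: copy existing objects into the new EC pool before the
   # renames below. Run from a host with Ceph admin credentials.
   sudo rados cppool default.rgw.buckets.data default.rgw.buckets.data.new
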

The existing buckets.data pool can then be copied and switched:

.. code::

   juju run-action --wait ceph-mon/0 rename-pool \
       name=default.rgw.buckets.data \
       new-name=default.rgw.buckets.data.old

   juju run-action --wait ceph-mon/0 rename-pool \
       name=default.rgw.buckets.data.new \
       new-name=default.rgw.buckets.data

At this point the RADOS gateway instances can be restarted:

.. code::

   juju run --application ceph-radosgw \
       "sudo systemctl start ceph-radosgw.target"
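
Before deleting anything, overall usage can be compared to confirm that the
data now resides in the erasure coded pool (a suggested check, not part of
the charm workflow):

.. code::

   sudo ceph df
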

Once successful operation of the deployment has been confirmed,
the old pool can be deleted:

.. code::

   juju run-action --wait ceph-mon/0 delete-pool \
       name=default.rgw.buckets.data.old

Moving other RADOS gateway pools to NVMe storage
++++++++++++++++++++++++++++++++++++++++++++++++

The buckets.data pool is the largest pool and the one that can make
use of EC; other pools could also be migrated to the same storage
class for consistent performance:

.. code::

   juju run-action --wait ceph-mon/0 create-crush-rule \
       name=replicated_nvme device-class=nvme
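
The new rule can be inspected before it is applied (using the name created
above):

.. code::

   sudo ceph osd crush rule ls
   sudo ceph osd crush rule dump replicated_nvme
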

The CRUSH rule for the other RADOS gateway pools can then be updated:

.. code::

   pools=".rgw.root
   default.rgw.control
   default.rgw.data.root
   default.rgw.gc
   default.rgw.log
   default.rgw.intent-log
   default.rgw.meta
   default.rgw.usage
   default.rgw.users.keys
   default.rgw.users.uid
   default.rgw.buckets.extra
   default.rgw.buckets.index
   default.rgw.users.email
   default.rgw.users.swift"

   for pool in $pools; do
       juju run-action --wait ceph-mon/0 pool-set \
           name=$pool key=crush_rule value=replicated_nvme
   done
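
Whether the rule change has taken effect can be spot-checked on any of the
pools (default.rgw.meta is used here purely as an example):

.. code::

   sudo ceph osd pool get default.rgw.meta crush_rule
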

@@ -17,3 +17,4 @@ Appendices

   app-rgw-multisite.rst
   app-ceph-rbd-mirror.rst
   app-masakari.rst
   app-erasure-coding.rst