charm-deployment-guide/deploy-guide/source/app-series-upgrade.rst
Peter Matulis 67151f74b3 Miscellaneous doc improvements
Change-Id: Ie4425ac60bfc649c48a08308ede478b1fd6e7c9c
2020-08-05 14:02:21 -04:00

359 lines
14 KiB
ReStructuredText

===========================
Appendix F1: Series upgrade
===========================
Overview
--------
The purpose of this document is to provide foundational knowledge for preparing
an administrator to perform a series upgrade across a Charmed OpenStack cloud.
This translates to upgrading the operating system of every cloud node to an
entirely new version.
.. note::
A series upgrade, a charm upgrade, and an OpenStack upgrade are all
conceptually different and involve separate operations.
Once this document has been studied the administrator will be ready to graduate
to the :doc:`Series upgrade OpenStack <app-series-upgrade-openstack>` guide
that describes the process in more detail.
Concerning the cloud being operated upon, the following is assumed:
* It is being upgraded from one LTS series to another (e.g. xenial to
bionic, bionic to focal, etc.).
* Its nodes are backed by MAAS.
* Its services are highly available.
* It is being upgraded with minimal downtime.
.. warning::
Upgrading a single production machine from one LTS to another is a serious
task. Doing so for every cloud node can be that much harder. Attempting to
do this with minimal cloud downtime is an order of magnitude more complex.
Such an undertaking should be executed by persons who are intimately
familiar with Juju and the currently deployed charms (and their related
applications). It should first be tested on a non-production cloud that
closely resembles the production environment.
The Juju :command:`upgrade-series` command
------------------------------------------
The Juju :command:`upgrade-series` command is the cornerstone of the entire
procedure. This command manages an operating system upgrade of a targeted
machine and operates on every application unit hosted on that machine. The
command works in conjunction with either the :command:`prepare` or the
:command:`complete` sub-command.
The basic process is to inform the units on a machine that a series upgrade
is about to commence, to perform the upgrade, and then inform the units that
the upgrade has finished. In most cases with the OpenStack charms, units will
first be paused and be left with a workload status of "blocked" and a message
of "Ready for do-release-upgrade and reboot."
For example, to inform units on machine '0' that an upgrade (to series
'bionic') is about to occur:
.. code-block:: none
juju upgrade-series 0 prepare bionic
The :command:`prepare` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``pre-series-upgrade`` hook.
The administrator must then perform the traditional steps involved in upgrading
the OS on the targeted machine (in this example, machine '0'). For example,
update/upgrade packages with :command:`apt update && apt full-upgrade`; invoke
the :command:`do-release-upgrade` command; and reboot the machine once
complete.
The :command:`complete` sub-command causes **all** the charms (including
subordinates) on the machine to run their ``post-series-upgrade`` hook. In most
cases with the OpenStack charms, configuration files will be re-written, units
will be resumed automatically (if paused), and be left with a workload status
of "active" and a message of "Unit is ready":
.. code-block:: none
juju upgrade-series 0 complete
At this point the series upgrade on the machine and its charms is now done. In
the :command:`juju status` output the machine's entry under the Series column
will have changed from 'xenial' to 'bionic'.
.. note::
Charms are not obliged to support the two series upgrade hooks but they do
make for a more intelligent and a less error-prone series upgrade.
Containers (and their charms) hosted on the target machine remain unaffected by
this command. However, during the required post-upgrade reboot of the host all
containerised services will naturally be unavailable.
See the Juju documentation to learn more about the `series upgrade`_ feature.
.. _pre-upgrade_requirements:
Pre-upgrade requirements
------------------------
This is a list of requirements that apply to any cloud. They must be met before
making any changes.
* All the cloud nodes should be using the same series, be in good working
order, and be updated with the latest stable software packages (APT
upgrades).
* The cloud should be running the latest OpenStack release supported by the
current series (e.g. Mitaka for trusty, Queens for xenial, etc.). See `Ubuntu
OpenStack release cycle`_ and `OpenStack upgrades`_.
* The cloud should be fully operational and error-free.
* All currently deployed charms should be upgraded to the latest stable charm
revision. See `Charm upgrades`_.
* The Juju model comprising the cloud should be error-free (e.g. there should
be no charm hook errors).
* `Automatic package updates`_ should be disabled on the nodes to avoid
potential conflicts with the manual (or scripted) APT steps.
.. _series_specific_procedures:
Specific series upgrade procedures
----------------------------------
Charms belonging to the OpenStack Charms project are designed to accommodate
the next LTS target series wherever possible. However, a new series may
occasionally introduce unavoidable challenges for a deployed charm. For
instance, it could be that a charm is replaced by an entirely new charm on the
new series. This can happen due to development policy concerning the charms
themselves (e.g. the ceph charm is replaced by the ceph-mon and ceph-osd
charms) or due to reasons independent of the charms (e.g. the workload software
is no longer supported on the new operating system). Any core OpenStack charms
affected in this way will be documented below.
* :ref:`percona-cluster charm: series upgrade to Focal <percona_series_upgrade_to_focal>`
Known series-related issues
---------------------------
Ensure that your deployment will not be adversely affected by known
series-related problems when upgrading. The following issues have been flagged
for consideration.
DNS HA with the focal series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DNS HA has been reported to not work on the focal series. See `LP #1882508`_
for more information.
.. _workload_specific_preparations:
Workload specific preparations
------------------------------
These are preparations that are specific to the current cloud deployment.
Completing them in advance is an integral part of the upgrade.
Charm upgradability
~~~~~~~~~~~~~~~~~~~
Verify the documented series upgrade processes for all currently deployed
charms. Some charms, especially third-party charms, may either not have
implemented series upgrade yet or simply may not work with the target series.
Pay particular attention to SDN (software defined networking) and storage
charms as these play a crucial role in cloud operations.
Workload maintenance
~~~~~~~~~~~~~~~~~~~~
Any workload-specific pre and post series upgrade maintenance tasks should be
readied in advance. For example, if a node's workload requires a database then
a pre-upgrade backup plan should be drawn up. Similarly, if a workload requires
settings to be adjusted post-upgrade then those changes should be prepared
ahead of time. Pay particular attention to stateful services due to their
importance in cloud operations. Examples include evacuating a compute node,
switching an HA router to another node, and storage rebalancing.
Pre-upgrade tasks are performed before issuing the :command:`prepare`
subcommand, and post-upgrade tasks are done immediately prior to issuing the
:command:`complete` subcommand.
Workflow: sequential vs. concurrent
-----------------------------------
In terms of the workflow there are two approaches:
* Sequential - upgrading one machine at a time
* Concurrent - upgrading a group of machines simultaneously
Normally, it is best to upgrade sequentially as this ensures data reliability
and availability (we've assumed an HA cloud). This approach also minimises
adverse effects to the deployment if something goes wrong.
However, for even moderately sized clouds, an intervention based purely on a
sequential approach can take a very long time to complete. This is where the
concurrent method becomes attractive.
In general, a concurrent approach is a viable option for API applications but
is not an option for stateful applications. During the course of the cloud-wide
series upgrade a hybrid strategy is a reasonable choice.
To be clear, the above pertains to upgrading the series on machines associated
with a single application. It is also possible however to employ similar
thinking to multiple applications.
Application leadership
----------------------
`Application leadership`_ plays an important role in determining the order in
which machines (and their applications) will have their series upgraded. The
guiding principle is that an application's unit leader is acted upon by a
series upgrade before its non-leaders are (the leader is typically used to
coordinate aspects with other services over relations).
.. note::
Juju will not transfer the leadership of an application (and any
subordinate) to another unit while the application is undergoing a series
upgrade. This allows a charm to make assumptions that will lead to a more
reliable outcome.
Assuming that a cloud is intended to eventually undergo a series upgrade, this
guideline will generally influence the cloud's topology. Containerisation is an
effective response to this.
.. important::
Applications should be co-located on the same machine only if leadership
plays a negligible role. Applications deployed with the compute and storage
charms fall into this category.
.. _generic_series_upgrade:
Generic series upgrade
----------------------
This section contains a generic overview of a series upgrade for three
machines, each hosting a unit of the `ubuntu`_ application. The initial and
target series are xenial and bionic, respectively.
This scenario is represented by the following :command:`juju status` command
output:
.. code-block:: console
Model Controller Cloud/Region Version SLA Timestamp
upgrade maas-controller mymaas/default 2.7.6 unsupported 18:33:49Z
App Version Status Scale Charm Store Rev OS Notes
ubuntu1 16.04 active 3 ubuntu jujucharms 15 ubuntu
Unit Workload Agent Machine Public address Ports Message
ubuntu1/0* active idle 0 10.0.0.241 ready
ubuntu1/1 active idle 1 10.0.0.242 ready
ubuntu1/2 active idle 2 10.0.0.243 ready
Machine State DNS Inst id Series AZ Message
0 started 10.0.0.241 node2 xenial zone3 Deployed
1 started 10.0.0.242 node3 xenial zone4 Deployed
2 started 10.0.0.243 node1 xenial zone5 Deployed
First ensure that any new applications will (by default) use the new series, in
this case bionic. This is done by configuring at the model level:
.. code-block:: none
juju model-config default-series=bionic
Now do the same at the application level. This will affect any new units of the
existing application, in this case 'ubuntu1':
.. code-block:: none
juju set-series ubuntu1 bionic
Perform the actual series upgrade. We begin with the machine that houses the
application unit leader, machine 0 (see the asterisk in the Unit column). Note
that :command:`juju run` is preferred over :command:`juju ssh` but the latter
should be used for sessions requiring user interaction:
.. code-block:: none
:linenos:
# Perform any workload maintenance pre-upgrade steps here
juju upgrade-series 0 prepare bionic
juju run --machine=0 -- sudo apt update
juju ssh 0 sudo apt full-upgrade
juju ssh 0 sudo do-release-upgrade
# Perform any workload maintenance post-upgrade steps here
# Reboot the machine (if not already done)
juju upgrade-series 0 complete
In this generic example there are no `workload maintenance`_ steps to perform.
If there were post-upgrade steps then the prompt to reboot the machine at the
end of :command:`do-release-upgrade` should be answered in the negative and the
reboot will be initiated manually on line 7 (i.e. :command:`sudo reboot`).
It is possible to invoke the :command:`complete` sub-command before the
upgraded machine is ready to process it. Juju will block until the unit is
ready after being restarted.
In lines 4 and 5 the upgrade proceeds in the usual interactive fashion. If a
non-interactive mode is preferred, those two lines can be replaced with:
.. code-block:: none
juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive apt-get --assume-yes \
-o "Dpkg::Options::=--force-confdef" \
-o "Dpkg::Options::=--force-confold" dist-upgrade
juju run --machine=0 --timeout=30m -- sudo DEBIAN_FRONTEND=noninteractive \
do-release-upgrade -f DistUpgradeViewNonInteractive
The :command:`apt-get` command is preferred while in non-interactive mode (or
with scripting).
By default, an LTS release will not have an upgrade candidate until the "point
release" of the next LTS is published. You can override this policy by using
the ``-d`` (development) option with the :command:`do-release-upgrade` command.
.. caution::
Performing a series upgrade non-interactively can be risky so the decision
to do so should be made only after careful deliberation.
Machines 1 and 2 should now be upgraded in the same way (in no particular
order).
.. note::
It has been reported that a trusty:xenial series upgrade may require an
additional step to ensure a purely non-interactive mode. A file under
``/etc/apt/apt.conf.d`` with a single line as its contents needs to be added
to the target machine pre-upgrade and be removed post-upgrade. It can be
created (here on machine 0) in this way:
juju run --machine=0 -- "echo 'DPkg::options { "--force-confdef"; "--force-confnew"; }' | sudo tee /etc/apt/apt.conf.d/local"
Next steps
----------
When you are ready to perform a series upgrade across your cloud proceed to
appendix :doc:`Series upgrade OpenStack <app-series-upgrade-openstack>`.
.. LINKS
.. _Charm upgrades: app-upgrade-openstack#charm-upgrades
.. _OpenStack upgrades: app-series-upgrade-openstack
.. _series upgrade: https://juju.is/docs/upgrading-series
.. _automatic package updates: https://help.ubuntu.com/lts/serverguide/automatic-updates.html.en
.. _Ubuntu OpenStack release cycle: https://ubuntu.com/about/release-cycle#ubuntu-openstack-release-cycle
.. _Application leadership: https://juju.is/docs/implementing-leadership
.. _ubuntu: https://jaas.ai/ubuntu
.. BUGS
.. _LP #1882508: https://bugs.launchpad.net/charm-deployment-guide/+bug/1882508