diff --git a/deploy-guide/source/app-managing-power-events-juju-status.rst b/deploy-guide/source/app-managing-power-events-juju-status.rst new file mode 100644 index 0000000..7dc1c1c --- /dev/null +++ b/deploy-guide/source/app-managing-power-events-juju-status.rst @@ -0,0 +1,603 @@ +:orphan: + +.. _reference_cloud: + +Reference cloud +=============== + +.. note:: + + The information on this page is associated with the topic of :ref:`Managing + Power Events `. See that page for background + information. + +The cloud is represented in the form of ``juju status`` output. + +.. code:: + + Model Controller Cloud/Region Version SLA Timestamp + openstack foundations-maas maas_cloud 2.6.2 unsupported 16:26:29Z + + App Version Status Scale Charm Store Rev OS Notes + aodh 7.0.0 active 3 aodh jujucharms 83 ubuntu + bcache-tuning active 9 bcache-tuning jujucharms 10 ubuntu + canonical-livepatch active 22 canonical-livepatch jujucharms 32 ubuntu + ceilometer 11.0.1 blocked 3 ceilometer jujucharms 339 ubuntu + ceilometer-agent 11.0.1 active 7 ceilometer-agent jujucharms 302 ubuntu + ceph-mon 13.2.4+dfsg1 active 3 ceph-mon jujucharms 390 ubuntu + ceph-osd 13.2.4+dfsg1 active 9 ceph-osd jujucharms 411 ubuntu + ceph-radosgw 13.2.4+dfsg1 active 3 ceph-radosgw jujucharms 334 ubuntu + cinder 13.0.3 active 3 cinder jujucharms 375 ubuntu + cinder-ceph 13.0.3 active 3 cinder-ceph jujucharms 300 ubuntu + designate 7.0.0 active 3 designate jujucharms 122 ubuntu + designate-bind 9.11.3+dfsg active 2 designate-bind jujucharms 65 ubuntu + elasticsearch active 2 elasticsearch jujucharms 37 ubuntu + filebeat 5.6.16 active 74 filebeat jujucharms 24 ubuntu + glance 17.0.0 active 3 glance jujucharms 372 ubuntu + gnocchi 4.3.2 active 3 gnocchi jujucharms 60 ubuntu + grafana active 1 grafana jujucharms 29 ubuntu + graylog 2.5.1 active 1 graylog jujucharms 31 ubuntu + graylog-mongodb 3.6.3 active 1 mongodb jujucharms 52 ubuntu + hacluster-aodh active 3 hacluster jujucharms 102 ubuntu + hacluster-ceilometer active 3 hacluster jujucharms 102 ubuntu + hacluster-cinder active 3 hacluster jujucharms 102 ubuntu + hacluster-designate active 3 hacluster jujucharms 102 ubuntu + hacluster-glance active 3 hacluster jujucharms 102 ubuntu + hacluster-gnocchi active 3 hacluster jujucharms 102 ubuntu + hacluster-heat active 3 hacluster jujucharms 102 ubuntu + hacluster-horizon active 3 hacluster jujucharms 102 ubuntu + hacluster-keystone active 3 hacluster jujucharms 102 ubuntu + hacluster-mysql active 3 hacluster jujucharms 102 ubuntu + hacluster-neutron active 3 hacluster jujucharms 102 ubuntu + hacluster-nova active 3 hacluster jujucharms 102 ubuntu + hacluster-radosgw active 3 hacluster jujucharms 102 ubuntu + heat 11.0.0 active 3 heat jujucharms 326 ubuntu + keystone 14.0.1 active 3 keystone jujucharms 445 ubuntu + keystone-ldap 14.0.1 active 3 keystone-ldap jujucharms 17 ubuntu + landscape-haproxy unknown 1 haproxy jujucharms 50 ubuntu + landscape-postgresql 10.8 maintenance 2 postgresql jujucharms 199 ubuntu + landscape-rabbitmq-server 3.6.10 active 3 rabbitmq-server jujucharms 89 ubuntu + landscape-server active 3 landscape-server jujucharms 32 ubuntu + lldpd active 9 lldpd jujucharms 5 ubuntu + memcached active 2 memcached jujucharms 23 ubuntu + mysql 5.7.20-29.24 active 3 percona-cluster jujucharms 340 ubuntu + nagios active 1 nagios jujucharms 32 ubuntu + neutron-api 13.0.2 active 3 neutron-api jujucharms 401 ubuntu + neutron-gateway 13.0.2 active 2 neutron-gateway jujucharms 371 ubuntu + neutron-openvswitch 13.0.2 active 7 
neutron-openvswitch jujucharms 358 ubuntu + nova-cloud-controller 18.1.0 active 3 nova-cloud-controller jujucharms 424 ubuntu + nova-compute-kvm 18.1.0 active 5 nova-compute jujucharms 448 ubuntu + nova-compute-lxd 18.1.0 active 2 nova-compute jujucharms 448 ubuntu + nrpe-container active 51 nrpe jujucharms 57 ubuntu + nrpe-host active 32 nrpe jujucharms 57 ubuntu + ntp 3.2 active 24 ntp jujucharms 32 ubuntu + openstack-dashboard 14.0.2 active 3 openstack-dashboard jujucharms 425 ubuntu + openstack-service-checks active 1 openstack-service-checks jujucharms 18 ubuntu + prometheus active 1 prometheus2 jujucharms 10 ubuntu + prometheus-ceph-exporter active 1 prometheus-ceph-exporter jujucharms 5 ubuntu + prometheus-openstack-exporter active 1 prometheus-openstack-exporter jujucharms 7 ubuntu + rabbitmq-server 3.6.10 active 3 rabbitmq-server jujucharms 344 ubuntu + telegraf active 74 telegraf jujucharms 29 ubuntu + telegraf-prometheus active 1 telegraf jujucharms 29 ubuntu + thruk-agent unknown 1 thruk-agent jujucharms 6 ubuntu + + Unit Workload Agent Machine Public address Ports Message + aodh/0* active idle 18/lxd/0 10.244.40.236 8042/tcp Unit is ready + filebeat/46 active idle 10.244.40.236 Filebeat ready + hacluster-aodh/0* active idle 10.244.40.236 Unit is ready and clustered + nrpe-container/24 active idle 10.244.40.236 icmp,5666/tcp ready + telegraf/46 active idle 10.244.40.236 9103/tcp Monitoring aodh/0 + aodh/1 active idle 20/lxd/0 10.244.41.74 8042/tcp Unit is ready + filebeat/61 active idle 10.244.41.74 Filebeat ready + hacluster-aodh/1 active idle 10.244.41.74 Unit is ready and clustered + nrpe-container/38 active idle 10.244.41.74 icmp,5666/tcp ready + telegraf/61 active idle 10.244.41.74 9103/tcp Monitoring aodh/1 + aodh/2 active idle 21/lxd/0 10.244.41.66 8042/tcp Unit is ready + filebeat/65 active idle 10.244.41.66 Filebeat ready + hacluster-aodh/2 active idle 10.244.41.66 Unit is ready and clustered + nrpe-container/42 active idle 10.244.41.66 icmp,5666/tcp ready + telegraf/65 active idle 10.244.41.66 9103/tcp Monitoring aodh/2 + ceilometer/0 blocked idle 18/lxd/1 10.244.40.239 Run the ceilometer-upgrade action on the leader to initialize ceilometer and gnocchi + filebeat/51 active idle 10.244.40.239 Filebeat ready + hacluster-ceilometer/1 active idle 10.244.40.239 Unit is ready and clustered + nrpe-container/28 active idle 10.244.40.239 icmp,5666/tcp ready + telegraf/51 active idle 10.244.40.239 9103/tcp Monitoring ceilometer/0 + ceilometer/1 blocked idle 20/lxd/1 10.244.41.77 Run the ceilometer-upgrade action on the leader to initialize ceilometer and gnocchi + filebeat/70 active idle 10.244.41.77 Filebeat ready + hacluster-ceilometer/2 active idle 10.244.41.77 Unit is ready and clustered + nrpe-container/47 active idle 10.244.41.77 icmp,5666/tcp ready + telegraf/70 active idle 10.244.41.77 9103/tcp Monitoring ceilometer/1 + ceilometer/2* blocked idle 21/lxd/1 10.244.40.229 Run the ceilometer-upgrade action on the leader to initialize ceilometer and gnocchi + filebeat/22 active idle 10.244.40.229 Filebeat ready + hacluster-ceilometer/0* active idle 10.244.40.229 Unit is ready and clustered + nrpe-container/4 active idle 10.244.40.229 icmp,5666/tcp ready + telegraf/22 active idle 10.244.40.229 9103/tcp Monitoring ceilometer/2 + ceph-mon/0* active idle 15/lxd/0 10.244.40.227 Unit is ready and clustered + filebeat/17 active idle 10.244.40.227 Filebeat ready + nrpe-container/2 active idle 10.244.40.227 icmp,5666/tcp ready + telegraf/17 active idle 10.244.40.227 9103/tcp 
Monitoring ceph-mon/0 + ceph-mon/1 active idle 16/lxd/0 10.244.40.253 Unit is ready and clustered + filebeat/47 active idle 10.244.40.253 Filebeat ready + nrpe-container/25 active idle 10.244.40.253 icmp,5666/tcp ready + telegraf/47 active idle 10.244.40.253 9103/tcp Monitoring ceph-mon/1 + ceph-mon/2 active idle 17/lxd/0 10.244.41.78 Unit is ready and clustered + filebeat/71 active idle 10.244.41.78 Filebeat ready + nrpe-container/48 active idle 10.244.41.78 icmp,5666/tcp ready + telegraf/71 active idle 10.244.41.78 9103/tcp Monitoring ceph-mon/2 + ceph-osd/0* active idle 15 10.244.40.206 Unit is ready (1 OSD) + bcache-tuning/1 active idle 10.244.40.206 bcache devices tuned + nrpe-host/16 active idle 10.244.40.206 icmp,5666/tcp ready + ceph-osd/1 active idle 16 10.244.40.213 Unit is ready (1 OSD) + bcache-tuning/8 active idle 10.244.40.213 bcache devices tuned + nrpe-host/30 active idle 10.244.40.213 icmp,5666/tcp ready + ceph-osd/2 active idle 17 10.244.40.220 Unit is ready (1 OSD) + bcache-tuning/4 active idle 10.244.40.220 bcache devices tuned + nrpe-host/23 active idle 10.244.40.220 ready + ceph-osd/3 active idle 18 10.244.40.225 Unit is ready (1 OSD) + bcache-tuning/5 active idle 10.244.40.225 bcache devices tuned + nrpe-host/25 active idle 10.244.40.225 icmp,5666/tcp ready + ceph-osd/4 active idle 19 10.244.40.221 Unit is ready (1 OSD) + bcache-tuning/2 active idle 10.244.40.221 bcache devices tuned + nrpe-host/18 active idle 10.244.40.221 icmp,5666/tcp ready + ceph-osd/5 active idle 20 10.244.40.224 Unit is ready (1 OSD) + bcache-tuning/6 active idle 10.244.40.224 bcache devices tuned + nrpe-host/27 active idle 10.244.40.224 icmp,5666/tcp ready + ceph-osd/6 active idle 21 10.244.40.222 Unit is ready (1 OSD) + bcache-tuning/7 active idle 10.244.40.222 bcache devices tuned + nrpe-host/29 active idle 10.244.40.222 ready + ceph-osd/7 active idle 22 10.244.40.223 Unit is ready (1 OSD) + bcache-tuning/3 active idle 10.244.40.223 bcache devices tuned + nrpe-host/20 active idle 10.244.40.223 icmp,5666/tcp ready + ceph-osd/8 active idle 23 10.244.40.219 Unit is ready (1 OSD) + bcache-tuning/0* active idle 10.244.40.219 bcache devices tuned + nrpe-host/14 active idle 10.244.40.219 ready + ceph-radosgw/0* active idle 15/lxd/1 10.244.40.228 80/tcp Unit is ready + filebeat/15 active idle 10.244.40.228 Filebeat ready + hacluster-radosgw/0* active idle 10.244.40.228 Unit is ready and clustered + nrpe-container/1 active idle 10.244.40.228 icmp,5666/tcp ready + telegraf/15 active idle 10.244.40.228 9103/tcp Monitoring ceph-radosgw/0 + ceph-radosgw/1 active idle 16/lxd/1 10.244.40.241 80/tcp Unit is ready + filebeat/35 active idle 10.244.40.241 Filebeat ready + hacluster-radosgw/2 active idle 10.244.40.241 Unit is ready and clustered + nrpe-container/15 active idle 10.244.40.241 icmp,5666/tcp ready + telegraf/35 active idle 10.244.40.241 9103/tcp Monitoring ceph-radosgw/1 + ceph-radosgw/2 active idle 17/lxd/1 10.244.40.233 80/tcp Unit is ready + filebeat/21 active idle 10.244.40.233 Filebeat ready + hacluster-radosgw/1 active idle 10.244.40.233 Unit is ready and clustered + nrpe-container/3 active idle 10.244.40.233 icmp,5666/tcp ready + telegraf/21 active idle 10.244.40.233 9103/tcp Monitoring ceph-radosgw/2 + cinder/0* active idle 15/lxd/2 10.244.40.249 8776/tcp Unit is ready + cinder-ceph/0* active idle 10.244.40.249 Unit is ready + filebeat/29 active idle 10.244.40.249 Filebeat ready + hacluster-cinder/0* active idle 10.244.40.249 Unit is ready and clustered + nrpe-container/9 active idle 
10.244.40.249 icmp,5666/tcp ready + telegraf/29 active idle 10.244.40.249 9103/tcp Monitoring cinder/0 + cinder/1 active idle 16/lxd/2 10.244.40.248 8776/tcp Unit is ready + cinder-ceph/2 active idle 10.244.40.248 Unit is ready + filebeat/59 active idle 10.244.40.248 Filebeat ready + hacluster-cinder/2 active idle 10.244.40.248 Unit is ready and clustered + nrpe-container/36 active idle 10.244.40.248 icmp,5666/tcp ready + telegraf/59 active idle 10.244.40.248 9103/tcp Monitoring cinder/1 + cinder/2 active idle 17/lxd/2 10.244.41.2 8776/tcp Unit is ready + cinder-ceph/1 active idle 10.244.41.2 Unit is ready + filebeat/42 active idle 10.244.41.2 Filebeat ready + hacluster-cinder/1 active idle 10.244.41.2 Unit is ready and clustered + nrpe-container/21 active idle 10.244.41.2 icmp,5666/tcp ready + telegraf/42 active idle 10.244.41.2 9103/tcp Monitoring cinder/2 + designate-bind/0* active idle 16/lxd/3 10.244.40.250 Unit is ready + filebeat/45 active idle 10.244.40.250 Filebeat ready + nrpe-container/23 active idle 10.244.40.250 icmp,5666/tcp ready + telegraf/45 active idle 10.244.40.250 9103/tcp Monitoring designate-bind/0 + designate-bind/1 active idle 17/lxd/3 10.244.40.255 Unit is ready + filebeat/40 active idle 10.244.40.255 Filebeat ready + nrpe-container/20 active idle 10.244.40.255 icmp,5666/tcp ready + telegraf/40 active idle 10.244.40.255 9103/tcp Monitoring designate-bind/1 + designate/0* active idle 18/lxd/2 10.244.41.70 9001/tcp Unit is ready + filebeat/57 active idle 10.244.41.70 Filebeat ready + hacluster-designate/0* active idle 10.244.41.70 Unit is ready and clustered + nrpe-container/34 active idle 10.244.41.70 icmp,5666/tcp ready + telegraf/57 active idle 10.244.41.70 9103/tcp Monitoring designate/0 + designate/1 active idle 20/lxd/2 10.244.41.72 9001/tcp Unit is ready + filebeat/63 active idle 10.244.41.72 Filebeat ready + hacluster-designate/1 active idle 10.244.41.72 Unit is ready and clustered + nrpe-container/40 active idle 10.244.41.72 icmp,5666/tcp ready + telegraf/63 active idle 10.244.41.72 9103/tcp Monitoring designate/1 + designate/2 active idle 21/lxd/2 10.244.41.71 9001/tcp Unit is ready + filebeat/69 active idle 10.244.41.71 Filebeat ready + hacluster-designate/2 active idle 10.244.41.71 Unit is ready and clustered + nrpe-container/46 active idle 10.244.41.71 icmp,5666/tcp ready + telegraf/69 active idle 10.244.41.71 9103/tcp Monitoring designate/2 + elasticsearch/0 active idle 5 10.244.40.217 9200/tcp Unit is ready + canonical-livepatch/3 active idle 10.244.40.217 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/4 active idle 10.244.40.217 Filebeat ready + nrpe-host/3 active idle 10.244.40.217 icmp,5666/tcp ready + ntp/4 active idle 10.244.40.217 123/udp chrony: Ready + telegraf/4 active idle 10.244.40.217 9103/tcp Monitoring elasticsearch/0 + elasticsearch/1* active idle 13 10.244.40.209 9200/tcp Unit is ready + canonical-livepatch/2 active idle 10.244.40.209 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/3 active idle 10.244.40.209 Filebeat ready + nrpe-host/2 active idle 10.244.40.209 icmp,5666/tcp ready + ntp/3 active idle 10.244.40.209 123/udp chrony: Ready + telegraf/3 active idle 10.244.40.209 9103/tcp Monitoring elasticsearch/1 + glance/0 active idle 15/lxd/3 10.244.40.237 9292/tcp Unit is ready + filebeat/36 active idle 10.244.40.237 Filebeat ready + hacluster-glance/0* active idle 10.244.40.237 Unit is ready and clustered + nrpe-container/16 active idle 10.244.40.237 icmp,5666/tcp ready + 
telegraf/36 active idle 10.244.40.237 9103/tcp Monitoring glance/0 + glance/1 active idle 16/lxd/4 10.244.41.5 9292/tcp Unit is ready + filebeat/67 active idle 10.244.41.5 Filebeat ready + hacluster-glance/2 active idle 10.244.41.5 Unit is ready and clustered + nrpe-container/44 active idle 10.244.41.5 icmp,5666/tcp ready + telegraf/66 active idle 10.244.41.5 9103/tcp Monitoring glance/1 + glance/2* active idle 17/lxd/4 10.244.40.234 9292/tcp Unit is ready + filebeat/37 active idle 10.244.40.234 Filebeat ready + hacluster-glance/1 active idle 10.244.40.234 Unit is ready and clustered + nrpe-container/17 active idle 10.244.40.234 icmp,5666/tcp ready + telegraf/37 active idle 10.244.40.234 9103/tcp Monitoring glance/2 + gnocchi/0 active idle 18/lxd/3 10.244.40.231 8041/tcp Unit is ready + filebeat/24 active idle 10.244.40.231 Filebeat ready + hacluster-gnocchi/0* active idle 10.244.40.231 Unit is ready and clustered + nrpe-container/5 active idle 10.244.40.231 icmp,5666/tcp ready + telegraf/24 active idle 10.244.40.231 9103/tcp Monitoring gnocchi/0 + gnocchi/1 active idle 20/lxd/3 10.244.40.244 8041/tcp Unit is ready + filebeat/55 active idle 10.244.40.244 Filebeat ready + hacluster-gnocchi/2 active idle 10.244.40.244 Unit is ready and clustered + nrpe-container/32 active idle 10.244.40.244 icmp,5666/tcp ready + telegraf/55 active idle 10.244.40.244 9103/tcp Monitoring gnocchi/1 + gnocchi/2* active idle 21/lxd/3 10.244.40.230 8041/tcp Unit is ready + filebeat/27 active idle 10.244.40.230 Filebeat ready + hacluster-gnocchi/1 active idle 10.244.40.230 Unit is ready and clustered + nrpe-container/7 active idle 10.244.40.230 icmp,5666/tcp ready + telegraf/27 active idle 10.244.40.230 9103/tcp Monitoring gnocchi/2 + grafana/0* active idle 1 10.244.40.202 3000/tcp Started snap.grafana.grafana + canonical-livepatch/1 active idle 10.244.40.202 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/2 active idle 10.244.40.202 Filebeat ready + nrpe-host/1 active idle 10.244.40.202 icmp,5666/tcp ready + ntp/2 active idle 10.244.40.202 123/udp chrony: Ready + telegraf/2 active idle 10.244.40.202 9103/tcp Monitoring grafana/0 + graylog-mongodb/0* active idle 10/lxd/0 10.244.40.226 27017/tcp,27019/tcp,27021/tcp,28017/tcp Unit is ready + filebeat/14 active idle 10.244.40.226 Filebeat ready + nrpe-container/0* active idle 10.244.40.226 icmp,5666/tcp ready + telegraf/14 active idle 10.244.40.226 9103/tcp Monitoring graylog-mongodb/0 + graylog/0* active idle 10 10.244.40.218 5044/tcp Ready with: filebeat, elasticsearch, mongodb + canonical-livepatch/12 active idle 10.244.40.218 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + nrpe-host/12 active idle 10.244.40.218 icmp,5666/tcp ready + ntp/13 active idle 10.244.40.218 123/udp chrony: Ready + telegraf/13 active idle 10.244.40.218 9103/tcp Monitoring graylog/0 + heat/0 active idle 15/lxd/4 10.244.40.246 8000/tcp,8004/tcp Unit is ready + filebeat/34 active idle 10.244.40.246 Filebeat ready + hacluster-heat/0* active idle 10.244.40.246 Unit is ready and clustered + nrpe-container/14 active idle 10.244.40.246 icmp,5666/tcp ready + telegraf/34 active idle 10.244.40.246 9103/tcp Monitoring heat/0 + heat/1* active idle 16/lxd/5 10.244.40.238 8000/tcp,8004/tcp Unit is ready + filebeat/56 active idle 10.244.40.238 Filebeat ready. 
+ hacluster-heat/2 active idle 10.244.40.238 Unit is ready and clustered + nrpe-container/33 active idle 10.244.40.238 icmp,5666/tcp ready + telegraf/56 active idle 10.244.40.238 9103/tcp Monitoring heat/1 + heat/2 active idle 17/lxd/5 10.244.41.0 8000/tcp,8004/tcp Unit is ready + filebeat/43 active idle 10.244.41.0 Filebeat ready. + hacluster-heat/1 active idle 10.244.41.0 Unit is ready and clustered + nrpe-container/22 active idle 10.244.41.0 icmp,5666/tcp ready + telegraf/43 active idle 10.244.41.0 9103/tcp Monitoring heat/2 + keystone/0* active idle 15/lxd/5 10.244.40.243 5000/tcp Unit is ready + filebeat/33 active idle 10.244.40.243 Filebeat ready + hacluster-keystone/0* active idle 10.244.40.243 Unit is ready and clustered + keystone-ldap/0* active idle 10.244.40.243 Unit is ready + nrpe-container/13 active idle 10.244.40.243 icmp,5666/tcp ready + telegraf/33 active idle 10.244.40.243 9103/tcp Monitoring keystone/0 + keystone/1 active idle 16/lxd/6 10.244.40.254 5000/tcp Unit is ready + filebeat/60 active idle 10.244.40.254 Filebeat ready + hacluster-keystone/2 active idle 10.244.40.254 Unit is ready and clustered + keystone-ldap/2 active idle 10.244.40.254 Unit is ready + nrpe-container/37 active idle 10.244.40.254 icmp,5666/tcp ready + telegraf/60 active idle 10.244.40.254 9103/tcp Monitoring keystone/1 + keystone/2 active idle 17/lxd/6 10.244.41.3 5000/tcp Unit is ready + filebeat/48 active idle 10.244.41.3 Filebeat ready + hacluster-keystone/1 active idle 10.244.41.3 Unit is ready and clustered + keystone-ldap/1 active idle 10.244.41.3 Unit is ready + nrpe-container/26 active idle 10.244.41.3 icmp,5666/tcp ready + telegraf/48 active idle 10.244.41.3 9103/tcp Monitoring keystone/2 + landscape-haproxy/0* unknown idle 2 10.244.40.203 80/tcp,443/tcp + filebeat/1 active idle 10.244.40.203 Filebeat ready + nrpe-host/0* active idle 10.244.40.203 icmp,5666/tcp ready + ntp/1 active idle 10.244.40.203 123/udp chrony: Ready + telegraf/1 active idle 10.244.40.203 9103/tcp Monitoring landscape-haproxy/0 + landscape-postgresql/0* maintenance idle 3 10.244.40.215 5432/tcp Installing postgresql-.*-debversion,postgresql-plpython-.* + canonical-livepatch/9 active idle 10.244.40.215 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/10 active idle 10.244.40.215 Filebeat ready + nrpe-host/9 active idle 10.244.40.215 icmp,5666/tcp ready + ntp/10 active idle 10.244.40.215 123/udp chrony: Ready + telegraf/10 active idle 10.244.40.215 9103/tcp Monitoring landscape-postgresql/0 + landscape-postgresql/1 active idle 8 10.244.40.214 5432/tcp Live secondary (10.8) + canonical-livepatch/10 active idle 10.244.40.214 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/11 active idle 10.244.40.214 Filebeat ready + nrpe-host/10 active idle 10.244.40.214 icmp,5666/tcp ready + ntp/11 active idle 10.244.40.214 123/udp chrony: Ready + telegraf/11 active idle 10.244.40.214 9103/tcp Monitoring landscape-postgresql/1 + landscape-rabbitmq-server/0* active idle 4 10.244.40.211 5672/tcp Unit is ready and clustered + canonical-livepatch/8 active idle 10.244.40.211 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/9 active idle 10.244.40.211 Filebeat ready + nrpe-host/8 active idle 10.244.40.211 icmp,5666/tcp ready + ntp/9 active idle 10.244.40.211 123/udp chrony: Ready + telegraf/9 active idle 10.244.40.211 9103/tcp Monitoring landscape-rabbitmq-server/0 + landscape-rabbitmq-server/1 active idle 7 10.244.40.208 5672/tcp Unit is ready and clustered 
+ canonical-livepatch/11 active idle 10.244.40.208 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/12 active idle 10.244.40.208 Filebeat ready + nrpe-host/11 active idle 10.244.40.208 icmp,5666/tcp ready + ntp/12 active idle 10.244.40.208 123/udp chrony: Ready + telegraf/12 active idle 10.244.40.208 9103/tcp Monitoring landscape-rabbitmq-server/1 + landscape-rabbitmq-server/2 active idle 12 10.244.40.207 5672/tcp Unit is ready and clustered + canonical-livepatch/7 active idle 10.244.40.207 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/8 active idle 10.244.40.207 Filebeat ready + nrpe-host/7 active idle 10.244.40.207 icmp,5666/tcp ready + ntp/8 active idle 10.244.40.207 123/udp chrony: Ready + telegraf/8 active idle 10.244.40.207 9103/tcp Monitoring landscape-rabbitmq-server/2 + landscape-server/0* active idle 6 10.244.40.210 + canonical-livepatch/4 active idle 10.244.40.210 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/5 active idle 10.244.40.210 Filebeat ready + nrpe-host/4 active idle 10.244.40.210 icmp,5666/tcp ready + ntp/5 active idle 10.244.40.210 123/udp chrony: Ready + telegraf/5 active idle 10.244.40.210 9103/tcp Monitoring landscape-server/0 + landscape-server/1 active idle 11 10.244.40.212 + canonical-livepatch/5 active idle 10.244.40.212 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/6 active idle 10.244.40.212 Filebeat ready + nrpe-host/5 active idle 10.244.40.212 icmp,5666/tcp ready + ntp/6 active idle 10.244.40.212 123/udp chrony: Ready + telegraf/6 active idle 10.244.40.212 9103/tcp Monitoring landscape-server/1 + landscape-server/2 active idle 14 10.244.40.204 + canonical-livepatch/6 active idle 10.244.40.204 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/7 active idle 10.244.40.204 Filebeat ready + nrpe-host/6 active idle 10.244.40.204 icmp,5666/tcp ready + ntp/7 active idle 10.244.40.204 123/udp chrony: Ready + telegraf/7 active idle 10.244.40.204 9103/tcp Monitoring landscape-server/2 + memcached/0* active idle 16/lxd/3 10.244.40.250 11211/tcp Unit is ready and clustered + memcached/1 active idle 17/lxd/3 10.244.40.255 11211/tcp Unit is ready and clustered + mysql/0* active idle 15/lxd/6 10.244.40.251 3306/tcp Unit is ready + filebeat/28 active idle 10.244.40.251 Filebeat ready + hacluster-mysql/1 active idle 10.244.40.251 Unit is ready and clustered + nrpe-container/8 active idle 10.244.40.251 icmp,5666/tcp ready + telegraf/28 active idle 10.244.40.251 9103/tcp Monitoring mysql/0 + mysql/1 active idle 16/lxd/7 10.244.40.252 3306/tcp Unit is ready + filebeat/25 active idle 10.244.40.252 Filebeat ready + hacluster-mysql/0* active idle 10.244.40.252 Unit is ready and clustered + nrpe-container/6 active idle 10.244.40.252 icmp,5666/tcp ready + telegraf/25 active idle 10.244.40.252 9103/tcp Monitoring mysql/1 + mysql/2 active idle 17/lxd/7 10.244.41.68 3306/tcp Unit is ready + filebeat/50 active idle 10.244.41.68 Filebeat ready + hacluster-mysql/2 active idle 10.244.41.68 Unit is ready and clustered + nrpe-container/27 active idle 10.244.41.68 icmp,5666/tcp ready + telegraf/50 active idle 10.244.41.68 9103/tcp Monitoring mysql/2 + nagios/0* active idle 0 10.244.40.201 80/tcp ready + canonical-livepatch/0* active idle 10.244.40.201 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/0* active idle 10.244.40.201 Filebeat ready + ntp/0* active idle 10.244.40.201 123/udp chrony: Ready + telegraf/0* active 
idle 10.244.40.201 9103/tcp Monitoring nagios/0 + thruk-agent/0* unknown idle 10.244.40.201 + neutron-api/0 active idle 18/lxd/4 10.244.41.67 9696/tcp Unit is ready + filebeat/53 active idle 10.244.41.67 Filebeat ready + hacluster-neutron/0* active idle 10.244.41.67 Unit is ready and clustered + nrpe-container/30 active idle 10.244.41.67 icmp,5666/tcp ready + telegraf/53 active idle 10.244.41.67 9103/tcp Monitoring neutron-api/0 + neutron-api/1 active idle 20/lxd/4 10.244.41.73 9696/tcp Unit is ready + filebeat/58 active idle 10.244.41.73 Filebeat ready + hacluster-neutron/1 active idle 10.244.41.73 Unit is ready and clustered + nrpe-container/35 active idle 10.244.41.73 icmp,5666/tcp ready + telegraf/58 active idle 10.244.41.73 9103/tcp Monitoring neutron-api/1 + neutron-api/2* active idle 21/lxd/4 10.244.41.6 9696/tcp Unit is ready + filebeat/64 active idle 10.244.41.6 Filebeat ready + hacluster-neutron/2 active idle 10.244.41.6 Unit is ready and clustered + nrpe-container/41 active idle 10.244.41.6 icmp,5666/tcp ready + telegraf/64 active idle 10.244.41.6 9103/tcp Monitoring neutron-api/2 + neutron-gateway/0 active idle 20 10.244.40.224 Unit is ready + canonical-livepatch/21 active idle 10.244.40.224 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/49 active idle 10.244.40.224 Filebeat ready + lldpd/8 active idle 10.244.40.224 LLDP daemon running + nrpe-host/31 active idle 10.244.40.224 ready + ntp/23 active idle 10.244.40.224 123/udp chrony: Ready + telegraf/49 active idle 10.244.40.224 9103/tcp Monitoring neutron-gateway/0 + neutron-gateway/1* active idle 21 10.244.40.222 Unit is ready + canonical-livepatch/20 active idle 10.244.40.222 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + filebeat/44 active idle 10.244.40.222 Filebeat ready + lldpd/7 active idle 10.244.40.222 LLDP daemon running + nrpe-host/28 active idle 10.244.40.222 icmp,5666/tcp ready + ntp/22 active idle 10.244.40.222 123/udp chrony: Ready + telegraf/44 active idle 10.244.40.222 9103/tcp Monitoring neutron-gateway/1 + nova-cloud-controller/0 active idle 18/lxd/5 10.244.40.242 8774/tcp,8775/tcp,8778/tcp Unit is ready + filebeat/54 active idle 10.244.40.242 Filebeat ready + hacluster-nova/1 active idle 10.244.40.242 Unit is ready and clustered + nrpe-container/31 active idle 10.244.40.242 icmp,5666/tcp ready + telegraf/54 active idle 10.244.40.242 9103/tcp Monitoring nova-cloud-controller/0 + nova-cloud-controller/1 active idle 20/lxd/5 10.244.41.76 8774/tcp,8775/tcp,8778/tcp Unit is ready + filebeat/68 active idle 10.244.41.76 Filebeat ready + hacluster-nova/2 active idle 10.244.41.76 Unit is ready and clustered + nrpe-container/45 active idle 10.244.41.76 icmp,5666/tcp ready + telegraf/68 active idle 10.244.41.76 9103/tcp Monitoring nova-cloud-controller/1 + nova-cloud-controller/2* active idle 21/lxd/5 10.244.40.235 8774/tcp,8775/tcp,8778/tcp Unit is ready + filebeat/52 active idle 10.244.40.235 Filebeat ready + hacluster-nova/0* active idle 10.244.40.235 Unit is ready and clustered + nrpe-container/29 active idle 10.244.40.235 icmp,5666/tcp ready + telegraf/52 active idle 10.244.40.235 9103/tcp Monitoring nova-cloud-controller/2 + nova-compute-kvm/0* active idle 15 10.244.40.206 Unit is ready + canonical-livepatch/17 active idle 10.244.40.206 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/4 active idle 10.244.40.206 Unit is ready + filebeat/23 active idle 10.244.40.206 Filebeat ready + lldpd/4 active idle 10.244.40.206 LLDP daemon 
running + neutron-openvswitch/4 active idle 10.244.40.206 Unit is ready + nrpe-host/22 active idle 10.244.40.206 ready + ntp/19 active idle 10.244.40.206 123/udp chrony: Ready + telegraf/23 active idle 10.244.40.206 9103/tcp Monitoring nova-compute-kvm/0 + nova-compute-kvm/1 active idle 16 10.244.40.213 Unit is ready + canonical-livepatch/14 active idle 10.244.40.213 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/1 active idle 10.244.40.213 Unit is ready + filebeat/18 active idle 10.244.40.213 Filebeat ready + lldpd/1 active idle 10.244.40.213 LLDP daemon running + neutron-openvswitch/1 active idle 10.244.40.213 Unit is ready + nrpe-host/17 active idle 10.244.40.213 ready + ntp/16 active idle 10.244.40.213 123/udp chrony: Ready + telegraf/18 active idle 10.244.40.213 9103/tcp Monitoring nova-compute-kvm/1 + nova-compute-kvm/2 active idle 17 10.244.40.220 Unit is ready + canonical-livepatch/18 active idle 10.244.40.220 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/5 active idle 10.244.40.220 Unit is ready + filebeat/26 active idle 10.244.40.220 Filebeat ready + lldpd/5 active idle 10.244.40.220 LLDP daemon running + neutron-openvswitch/5 active idle 10.244.40.220 Unit is ready + nrpe-host/24 active idle 10.244.40.220 icmp,5666/tcp ready + ntp/20 active idle 10.244.40.220 123/udp chrony: Ready + telegraf/26 active idle 10.244.40.220 9103/tcp Monitoring nova-compute-kvm/2 + nova-compute-kvm/3 active idle 18 10.244.40.225 Unit is ready + canonical-livepatch/19 active idle 10.244.40.225 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/6 active idle 10.244.40.225 Unit is ready + filebeat/41 active idle 10.244.40.225 Filebeat ready + lldpd/6 active idle 10.244.40.225 LLDP daemon running + neutron-openvswitch/6 active idle 10.244.40.225 Unit is ready + nrpe-host/26 active idle 10.244.40.225 ready + ntp/21 active idle 10.244.40.225 123/udp chrony: Ready + telegraf/41 active idle 10.244.40.225 9103/tcp Monitoring nova-compute-kvm/3 + nova-compute-kvm/4 active idle 19 10.244.40.221 Unit is ready + canonical-livepatch/15 active idle 10.244.40.221 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/2 active idle 10.244.40.221 Unit is ready + filebeat/19 active idle 10.244.40.221 Filebeat ready + lldpd/2 active idle 10.244.40.221 LLDP daemon running + neutron-openvswitch/2 active idle 10.244.40.221 Unit is ready + nrpe-host/19 active idle 10.244.40.221 ready + ntp/17 active idle 10.244.40.221 123/udp chrony: Ready + telegraf/19 active idle 10.244.40.221 9103/tcp Monitoring nova-compute-kvm/4 + nova-compute-lxd/0 active idle 22 10.244.40.223 Unit is ready + canonical-livepatch/16 active idle 10.244.40.223 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/3 active idle 10.244.40.223 Unit is ready + filebeat/20 active idle 10.244.40.223 Filebeat ready + lldpd/3 active idle 10.244.40.223 LLDP daemon running + neutron-openvswitch/3 active idle 10.244.40.223 Unit is ready + nrpe-host/21 active idle 10.244.40.223 ready + ntp/18 active idle 10.244.40.223 123/udp chrony: Ready + telegraf/20 active idle 10.244.40.223 9103/tcp Monitoring nova-compute-lxd/0 + nova-compute-lxd/1* active idle 23 10.244.40.219 Unit is ready + canonical-livepatch/13 active idle 10.244.40.219 Running kernel 4.15.0-50.54-generic, patchState: nothing-to-apply + ceilometer-agent/0* active idle 10.244.40.219 Unit is ready + filebeat/16 active idle 10.244.40.219 
Filebeat ready + lldpd/0* active idle 10.244.40.219 LLDP daemon running + neutron-openvswitch/0* active idle 10.244.40.219 Unit is ready + nrpe-host/15 active idle 10.244.40.219 icmp,5666/tcp ready + ntp/15 active idle 10.244.40.219 123/udp chrony: Ready + telegraf/16 active idle 10.244.40.219 9103/tcp Monitoring nova-compute-lxd/1 + openstack-dashboard/0* active idle 18/lxd/6 10.244.40.232 80/tcp,443/tcp Unit is ready + filebeat/30 active idle 10.244.40.232 Filebeat ready + hacluster-horizon/0* active idle 10.244.40.232 Unit is ready and clustered + nrpe-container/10 active idle 10.244.40.232 icmp,5666/tcp ready + telegraf/30 active idle 10.244.40.232 9103/tcp Monitoring openstack-dashboard/0 + openstack-dashboard/1 active idle 20/lxd/6 10.244.41.75 80/tcp,443/tcp Unit is ready + filebeat/73 active idle 10.244.41.75 Filebeat ready + hacluster-horizon/2 active idle 10.244.41.75 Unit is ready and clustered + nrpe-container/50 active idle 10.244.41.75 icmp,5666/tcp ready + telegraf/73 active idle 10.244.41.75 9103/tcp Monitoring openstack-dashboard/1 + openstack-dashboard/2 active idle 21/lxd/6 10.244.41.69 80/tcp,443/tcp Unit is ready + filebeat/72 active idle 10.244.41.69 Filebeat ready + hacluster-horizon/1 active idle 10.244.41.69 Unit is ready and clustered + nrpe-container/49 active idle 10.244.41.69 icmp,5666/tcp ready + telegraf/72 active idle 10.244.41.69 9103/tcp Monitoring openstack-dashboard/2 + openstack-service-checks/0* active idle 15/lxd/7 10.244.40.240 Unit is ready + filebeat/31 active idle 10.244.40.240 Filebeat ready + nrpe-container/11 active idle 10.244.40.240 icmp,5666/tcp ready + telegraf/31 active idle 10.244.40.240 9103/tcp Monitoring openstack-service-checks/0 + prometheus-ceph-exporter/0* active idle 16/lxd/8 10.244.40.245 9128/tcp Running + filebeat/38 active idle 10.244.40.245 Filebeat ready + nrpe-container/18 active idle 10.244.40.245 icmp,5666/tcp ready + telegraf/38 active idle 10.244.40.245 9103/tcp Monitoring prometheus-ceph-exporter/0 + prometheus-openstack-exporter/0* active idle 17/lxd/8 10.244.41.1 Ready + filebeat/39 active idle 10.244.41.1 Filebeat ready + nrpe-container/19 active idle 10.244.41.1 icmp,5666/tcp ready + telegraf/39 active idle 10.244.41.1 9103/tcp Monitoring prometheus-openstack-exporter/0 + prometheus/0* active idle 9 10.244.40.216 9090/tcp,12321/tcp Ready + filebeat/13 active idle 10.244.40.216 Filebeat ready + nrpe-host/13 active idle 10.244.40.216 icmp,5666/tcp ready + ntp/14 active idle 10.244.40.216 123/udp chrony: Ready + telegraf-prometheus/0* active idle 10.244.40.216 9103/tcp Monitoring prometheus/0 + rabbitmq-server/0 active idle 18/lxd/7 10.244.41.65 5672/tcp Unit is ready and clustered + filebeat/62 active idle 10.244.41.65 Filebeat ready + nrpe-container/39 active idle 10.244.41.65 icmp,5666/tcp ready + telegraf/62 active idle 10.244.41.65 9103/tcp Monitoring rabbitmq-server/0 + rabbitmq-server/1* active idle 20/lxd/7 10.244.40.247 5672/tcp Unit is ready and clustered + filebeat/32 active idle 10.244.40.247 Filebeat ready + nrpe-container/12 active idle 10.244.40.247 icmp,5666/tcp ready + telegraf/32 active idle 10.244.40.247 9103/tcp Monitoring rabbitmq-server/1 + rabbitmq-server/2 active idle 21/lxd/7 10.244.41.4 5672/tcp Unit is ready and clustered + filebeat/66 active idle 10.244.41.4 Filebeat ready + nrpe-container/43 active idle 10.244.41.4 icmp,5666/tcp ready + telegraf/67 active idle 10.244.41.4 9103/tcp Monitoring rabbitmq-server/2 + + Machine State DNS Inst id Series AZ Message + 0 started 10.244.40.201 
nagios-1 bionic default Deployed + 1 started 10.244.40.202 grafana-1 bionic default Deployed + 2 started 10.244.40.203 landscapeha-1 bionic default Deployed + 3 started 10.244.40.215 landscapesql-1 bionic default Deployed + 4 started 10.244.40.211 landscapeamqp-1 bionic default Deployed + 5 started 10.244.40.217 elastic-3 bionic zone3 Deployed + 6 started 10.244.40.210 landscape-2 bionic zone2 Deployed + 7 started 10.244.40.208 landscapeamqp-3 bionic zone3 Deployed + 8 started 10.244.40.214 landscapesql-2 bionic zone2 Deployed + 9 started 10.244.40.216 prometheus-3 bionic zone3 Deployed + 10 started 10.244.40.218 graylog-3 bionic zone3 Deployed + 10/lxd/0 started 10.244.40.226 juju-5aed61-10-lxd-0 bionic zone3 Container started + 11 started 10.244.40.212 landscape-3 bionic zone3 Deployed + 12 started 10.244.40.207 landscapeamqp-2 bionic zone2 Deployed + 13 started 10.244.40.209 elastic-2 bionic zone2 Deployed + 14 started 10.244.40.204 landscape-1 bionic default Deployed + 15 started 10.244.40.206 suicune bionic zone2 Deployed + 15/lxd/0 started 10.244.40.227 juju-5aed61-15-lxd-0 bionic zone2 Container started + 15/lxd/1 started 10.244.40.228 juju-5aed61-15-lxd-1 bionic zone2 Container started + 15/lxd/2 started 10.244.40.249 juju-5aed61-15-lxd-2 bionic zone2 Container started + 15/lxd/3 started 10.244.40.237 juju-5aed61-15-lxd-3 bionic zone2 Container started + 15/lxd/4 started 10.244.40.246 juju-5aed61-15-lxd-4 bionic zone2 Container started + 15/lxd/5 started 10.244.40.243 juju-5aed61-15-lxd-5 bionic zone2 Container started + 15/lxd/6 started 10.244.40.251 juju-5aed61-15-lxd-6 bionic zone2 Container started + 15/lxd/7 started 10.244.40.240 juju-5aed61-15-lxd-7 bionic zone2 Container started + 16 started 10.244.40.213 geodude bionic default Deployed + 16/lxd/0 started 10.244.40.253 juju-5aed61-16-lxd-0 bionic default Container started + 16/lxd/1 started 10.244.40.241 juju-5aed61-16-lxd-1 bionic default Container started + 16/lxd/2 started 10.244.40.248 juju-5aed61-16-lxd-2 bionic default Container started + 16/lxd/3 started 10.244.40.250 juju-5aed61-16-lxd-3 bionic default Container started + 16/lxd/4 started 10.244.41.5 juju-5aed61-16-lxd-4 bionic default Container started + 16/lxd/5 started 10.244.40.238 juju-5aed61-16-lxd-5 bionic default Container started + 16/lxd/6 started 10.244.40.254 juju-5aed61-16-lxd-6 bionic default Container started + 16/lxd/7 started 10.244.40.252 juju-5aed61-16-lxd-7 bionic default Container started + 16/lxd/8 started 10.244.40.245 juju-5aed61-16-lxd-8 bionic default Container started + 17 started 10.244.40.220 armaldo bionic default Deployed + 17/lxd/0 started 10.244.41.78 juju-5aed61-17-lxd-0 bionic default Container started + 17/lxd/1 started 10.244.40.233 juju-5aed61-17-lxd-1 bionic default Container started + 17/lxd/2 started 10.244.41.2 juju-5aed61-17-lxd-2 bionic default Container started + 17/lxd/3 started 10.244.40.255 juju-5aed61-17-lxd-3 bionic default Container started + 17/lxd/4 started 10.244.40.234 juju-5aed61-17-lxd-4 bionic default Container started + 17/lxd/5 started 10.244.41.0 juju-5aed61-17-lxd-5 bionic default Container started + 17/lxd/6 started 10.244.41.3 juju-5aed61-17-lxd-6 bionic default Container started + 17/lxd/7 started 10.244.41.68 juju-5aed61-17-lxd-7 bionic default Container started + 17/lxd/8 started 10.244.41.1 juju-5aed61-17-lxd-8 bionic default Container started + 18 started 10.244.40.225 elgyem bionic zone3 Deployed + 18/lxd/0 started 10.244.40.236 juju-5aed61-18-lxd-0 bionic zone3 Container started + 18/lxd/1 started 
10.244.40.239 juju-5aed61-18-lxd-1 bionic zone3 Container started + 18/lxd/2 started 10.244.41.70 juju-5aed61-18-lxd-2 bionic zone3 Container started + 18/lxd/3 started 10.244.40.231 juju-5aed61-18-lxd-3 bionic zone3 Container started + 18/lxd/4 started 10.244.41.67 juju-5aed61-18-lxd-4 bionic zone3 Container started + 18/lxd/5 started 10.244.40.242 juju-5aed61-18-lxd-5 bionic zone3 Container started + 18/lxd/6 started 10.244.40.232 juju-5aed61-18-lxd-6 bionic zone3 Container started + 18/lxd/7 started 10.244.41.65 juju-5aed61-18-lxd-7 bionic zone3 Container started + 19 started 10.244.40.221 spearow bionic zone2 Deployed + 20 started 10.244.40.224 quilava bionic default Deployed + 20/lxd/0 started 10.244.41.74 juju-5aed61-20-lxd-0 bionic default Container started + 20/lxd/1 started 10.244.41.77 juju-5aed61-20-lxd-1 bionic default Container started + 20/lxd/2 started 10.244.41.72 juju-5aed61-20-lxd-2 bionic default Container started + 20/lxd/3 started 10.244.40.244 juju-5aed61-20-lxd-3 bionic default Container started + 20/lxd/4 started 10.244.41.73 juju-5aed61-20-lxd-4 bionic default Container started + 20/lxd/5 started 10.244.41.76 juju-5aed61-20-lxd-5 bionic default Container started + 20/lxd/6 started 10.244.41.75 juju-5aed61-20-lxd-6 bionic default Container started + 20/lxd/7 started 10.244.40.247 juju-5aed61-20-lxd-7 bionic default Container started + 21 started 10.244.40.222 rufflet bionic zone3 Deployed + 21/lxd/0 started 10.244.41.66 juju-5aed61-21-lxd-0 bionic zone3 Container started + 21/lxd/1 started 10.244.40.229 juju-5aed61-21-lxd-1 bionic zone3 Container started + 21/lxd/2 started 10.244.41.71 juju-5aed61-21-lxd-2 bionic zone3 Container started + 21/lxd/3 started 10.244.40.230 juju-5aed61-21-lxd-3 bionic zone3 Container started + 21/lxd/4 started 10.244.41.6 juju-5aed61-21-lxd-4 bionic zone3 Container started + 21/lxd/5 started 10.244.40.235 juju-5aed61-21-lxd-5 bionic zone3 Container started + 21/lxd/6 started 10.244.41.69 juju-5aed61-21-lxd-6 bionic zone3 Container started + 21/lxd/7 started 10.244.41.4 juju-5aed61-21-lxd-7 bionic zone3 Container started + 22 started 10.244.40.223 ralts bionic zone2 Deployed + 23 started 10.244.40.219 beartic bionic zone3 Deployed diff --git a/deploy-guide/source/app-managing-power-events-topology.rst b/deploy-guide/source/app-managing-power-events-topology.rst new file mode 100644 index 0000000..d78c828 --- /dev/null +++ b/deploy-guide/source/app-managing-power-events-topology.rst @@ -0,0 +1,334 @@ +:orphan: + +.. _cloud_topology_example: + +Cloud topology example +====================== + +.. note:: + + The information on this page is associated with the topic of :ref:`Managing + Power Events <managing_power_events>`. See that page for background + information. + +This page contains an analysis of cloud machines. The ideal is to do this for +every machine in a cloud in order to determine the *cloud topology*. Six +machines are featured here. They represent a good cross-section of an *Ubuntu +OpenStack* cloud. See :ref:`Reference cloud <reference_cloud>` for the cloud +upon which this exercise is based. + +Generally speaking, the cloud nodes are hyperconverged and this is the case for +three of the chosen machines, numbered **17**, **18**, and **20**. Yet this +analysis also looks at a trio of nodes dedicated to the `Landscape project`_: +machines **3**, **11**, and **12**, none of which is hyperconverged. + +.. note:: + + Juju applications can be given custom names at deployment time (see + `Application groups`_ in the Juju documentation). 
This document will call + out these `named applications` wherever they occur. + +**machine 17** + +This is what's on machine 17: + +.. code:: + + Unit Workload Agent Machine + nova-compute-kvm/2 active idle 17 + canonical-livepatch/18 active idle + ceilometer-agent/5 active idle + filebeat/26 active idle + lldpd/5 active idle + neutron-openvswitch/5 active idle + nrpe-host/24 active idle + ntp/20 active idle + telegraf/26 active idle + ceph-osd/2 active idle 17 + bcache-tuning/4 active idle + nrpe-host/23 active idle + ceph-mon/2 active idle 17/lxd/0 + filebeat/71 active idle + nrpe-container/48 active idle + telegraf/71 active idle + ceph-radosgw/2 active idle 17/lxd/1 + filebeat/21 active idle + hacluster-radosgw/1 active idle + nrpe-container/3 active idle + telegraf/21 active idle + cinder/2 active idle 17/lxd/2 + cinder-ceph/1 active idle + filebeat/42 active idle + hacluster-cinder/1 active idle + nrpe-container/21 active idle + telegraf/42 active idle + designate-bind/1 active idle 17/lxd/3 + filebeat/40 active idle + nrpe-container/20 active idle + telegraf/40 active idle + glance/2* active idle 17/lxd/4 + filebeat/37 active idle + hacluster-glance/1 active idle + nrpe-container/17 active idle + telegraf/37 active idle + heat/2 active idle 17/lxd/5 + filebeat/43 active idle + hacluster-heat/1 active idle + nrpe-container/22 active idle + telegraf/43 active idle + keystone/2 active idle 17/lxd/6 + filebeat/48 active idle + hacluster-keystone/1 active idle + keystone-ldap/1 active idle + nrpe-container/26 active idle + telegraf/48 active idle + mysql/2 active idle 17/lxd/7 + filebeat/50 active idle + hacluster-mysql/2 active idle + nrpe-container/27 active idle + telegraf/50 active idle + prometheus-openstack-exporter/0* active idle 17/lxd/8 + filebeat/39 active idle + nrpe-container/19 active idle + telegraf/39 active idle + +.. attention:: + + In this example, ``mysql`` and ``nova-compute-kvm`` are `named + applications`. The rest of this section will use their real names of + ``percona-cluster`` and ``nova-compute``, respectively. + +The main applications (principal charms) for this machine are listed below +along with their HA status and machine type: + +- ``nova-compute`` (metal) +- ``ceph-osd`` (natively HA; metal) +- ``ceph-mon`` (natively HA; lxd) +- ``ceph-radosgw`` (natively HA; lxd) +- ``cinder`` (HA; lxd) +- ``designate-bind`` (HA; lxd) +- ``glance`` (HA; lxd) +- ``heat`` (HA; lxd) +- ``keystone`` (HA; lxd) +- ``percona-cluster`` (HA; lxd) +- ``prometheus-openstack-exporter`` (lxd) + +**machine 18** + +This is what's on machine 18: + +.. 
code:: + + Unit Workload Agent Machine + nova-compute-kvm/3 active idle 18 + canonical-livepatch/19 active idle + ceilometer-agent/6 active idle + filebeat/41 active idle + lldpd/6 active idle + neutron-openvswitch/6 active idle + nrpe-host/26 active idle + ntp/21 active idle + telegraf/41 active idle + ceph-osd/3 active idle 18 + bcache-tuning/5 active idle + nrpe-host/25 active idle + aodh/0* active idle 18/lxd/0 + filebeat/46 active idle + hacluster-aodh/0* active idle + nrpe-container/24 active idle + telegraf/46 active idle + ceilometer/0 blocked idle 18/lxd/1 + filebeat/51 active idle + hacluster-ceilometer/1 active idle + nrpe-container/28 active idle + telegraf/51 active idle + designate/0* active idle 18/lxd/2 + filebeat/57 active idle + hacluster-designate/0* active idle + nrpe-container/34 active idle + telegraf/57 active idle + gnocchi/0 active idle 18/lxd/3 + filebeat/24 active idle + hacluster-gnocchi/0* active idle + nrpe-container/5 active idle + telegraf/24 active idle + neutron-api/0 active idle 18/lxd/4 + filebeat/53 active idle + hacluster-neutron/0* active idle + nrpe-container/30 active idle + telegraf/53 active idle + nova-cloud-controller/0 active idle 18/lxd/5 + filebeat/54 active idle + hacluster-nova/1 active idle + nrpe-container/31 active idle + telegraf/54 active idle + openstack-dashboard/0* active idle 18/lxd/6 + filebeat/30 active idle + hacluster-horizon/0* active idle + nrpe-container/10 active idle + telegraf/30 active idle + rabbitmq-server/0 active idle 18/lxd/7 + filebeat/62 active idle + nrpe-container/39 active idle + telegraf/62 active idle + +.. attention:: + + In this example, ``nova-compute-kvm`` is a `named application`. The rest of + this section will use its real name of ``nova-compute``. + +The main applications (principal charms) for this machine are listed below +along with their HA status and machine type: + +- ``nova-compute`` (metal) +- ``ceph-osd`` (natively HA; metal) +- ``aodh`` (HA; lxd) +- ``ceilometer`` (HA; lxd) +- ``designate`` (HA; lxd) +- ``gnocchi`` (HA; lxd) +- ``neutron-api`` (HA; lxd) +- ``nova-cloud-controller`` (HA; lxd) +- ``openstack-dashboard`` (HA; lxd) +- ``rabbitmq-server`` (natively HA; lxd) + +**machine 20** + +This is what's on machine 20: + +.. 
code:: + + Unit Workload Agent Machine + neutron-gateway/0 active idle 20 + canonical-livepatch/21 active idle + filebeat/49 active idle + lldpd/8 active idle + nrpe-host/31 active idle + ntp/23 active idle + telegraf/49 active idle + ceph-osd/5 active idle 20 + bcache-tuning/6 active idle + nrpe-host/27 active idle + aodh/1 active idle 20/lxd/0 + filebeat/61 active idle + hacluster-aodh/1 active idle + nrpe-container/38 active idle + telegraf/61 active idle + ceilometer/1 blocked idle 20/lxd/1 + filebeat/70 active idle + hacluster-ceilometer/2 active idle + nrpe-container/47 active idle + telegraf/70 active idle + designate/1 active idle 20/lxd/2 + filebeat/63 active idle + hacluster-designate/1 active idle + nrpe-container/40 active idle + telegraf/63 active idle + gnocchi/1 active idle 20/lxd/3 + filebeat/55 active idle + hacluster-gnocchi/2 active idle + nrpe-container/32 active idle + telegraf/55 active idle + neutron-api/1 active idle 20/lxd/4 + filebeat/58 active idle + hacluster-neutron/1 active idle + nrpe-container/35 active idle + telegraf/58 active idle + nova-cloud-controller/1 active idle 20/lxd/5 + filebeat/68 active idle + hacluster-nova/2 active idle + nrpe-container/45 active idle + telegraf/68 active idle + openstack-dashboard/1 active idle 20/lxd/6 + filebeat/73 active idle + hacluster-horizon/2 active idle + nrpe-container/50 active idle + telegraf/73 active idle + rabbitmq-server/1* active idle 20/lxd/7 + filebeat/32 active idle + nrpe-container/12 active idle + telegraf/32 active idle + +The main applications (principal charms) for this machine are listed below +along with their HA status and machine type: + +- ``neutron-gateway`` (natively HA; metal) +- ``ceph-osd`` (natively HA; metal) +- ``aodh`` (HA; lxd) +- ``ceilometer`` (HA; lxd) +- ``designate`` (HA; lxd) +- ``gnocchi`` (HA; lxd) +- ``neutron-api`` (HA; lxd) +- ``nova-cloud-controller`` (HA; lxd) +- ``openstack-dashboard`` (HA; lxd) +- ``rabbitmq-server`` (natively HA; lxd) + +**machine 3** + +This is what's on machine 3: + +.. code:: + + Unit Workload Agent Machine + landscape-postgresql/0* maintenance idle 3 + canonical-livepatch/9 active idle + filebeat/10 active idle + nrpe-host/9 active idle + ntp/10 active idle + telegraf/10 active idle + +.. attention:: + + In this example, ``landscape-postgresql`` is a `named application`. The rest + of this section will use its real name of ``postgresql``. + +The main application (principal charm) for this machine is listed below along +with its HA status and machine type: + +- ``postgresql`` (natively HA; metal) + +**machine 11** + +This is what's on machine 11: + +.. code:: + + Unit Workload Agent Machine + landscape-server/1 active idle 11 + canonical-livepatch/5 active idle + filebeat/6 active idle + nrpe-host/5 active idle + ntp/6 active idle + telegraf/6 active idle + +The main application (principal charm) for this machine is listed below along +with its HA status and machine type: + +- ``landscape-server`` (natively HA; metal) + +**machine 12** + +This is what's on machine 12: + +.. code:: + + Unit Workload Agent Machine + landscape-rabbitmq-server/2 active idle 12 + canonical-livepatch/7 active idle + filebeat/8 active idle + nrpe-host/7 active idle + ntp/8 active idle + telegraf/8 active idle + +.. attention:: + + In this example, ``landscape-rabbitmq-server`` is a `named application`. + The rest of this section will use its real name of ``rabbitmq-server``. 
+ +The main application (principal charm) for this machine is listed below along +with its HA status and machine type: + +- ``rabbitmq-server`` (natively HA; metal) + +.. LINKS +.. _Application groups: https://jaas.ai/docs/application-groups +.. _Landscape project: https://landscape.canonical.com diff --git a/deploy-guide/source/app-managing-power-events.rst b/deploy-guide/source/app-managing-power-events.rst new file mode 100644 index 0000000..ccc8d8b --- /dev/null +++ b/deploy-guide/source/app-managing-power-events.rst @@ -0,0 +1,1029 @@ +.. _managing_power_events: + +Appendix P: Managing Power Events +================================= + +Overview +++++++++ + +Once your OpenStack cloud is deployed and in production, you will need to +consider how to manage applications in terms of shutting them down and starting +them up. Examples of situations where this knowledge would be useful include +controlled power events such as node reboots and restarting an AZ (or an entire +cloud). You will also be better able to counter uncontrolled power events like +a power outage. This guide covers how to manage these kinds of power events in +your cloud successfully. + +For the purposes of this document, a *node* is any non-containerised system +that houses at least one cloud service. In practice, this typically constitutes +a physical host. + +In addition, any `known issues`_ affecting the restarting of parts of the cloud +stack are documented. Although they are presented last, it is highly +recommended to review them prior to attempting to apply any of the information +shown here. + +An important assumption made in this document is that the cloud is +*hyperconverged*. That is, multiple applications cohabit each cloud node. This +aspect makes a power event especially significant as it can potentially affect +the entire cloud. + +.. note:: + + This document may help influence a cloud's initial design. Once it is + understood how an application should be treated in the context of a power + event, the cloud architect will be able to make better-informed decisions. + +Section `Notable applications`_ contains valuable information on stopping and +starting services. It is used here in the context of power events but its +contents can also be applied during the normal operation of a cloud. + +General guidelines +++++++++++++++++++ + +As each cloud is unique, this section provides general guidelines on how to +prepare for and manage power events in your cloud. + +.. important:: + + It is recommended that every deployed cloud have a list of detailed + procedures that cover the uniqueness of that cloud. The guidelines in the + current document can act as a starting point for such a resource. + +HA applications +~~~~~~~~~~~~~~~ + +Theoretically, an application with high availability is impervious to a power +event, meaning that such an event would have no impact on either client +requests to the application or on the application itself. However, depending on +the situation, some such applications may still require attention when starting +back up. The `percona-cluster`_ application is a good example of this. + +Cloud applications are typically made highly available through the use of the +`hacluster`_ subordinate charm. Some applications, though, achieve HA at the +software layer (outside of Juju), and can be called *natively HA*. One such +application is ``rabbitmq-server``. + +Cloud topology +~~~~~~~~~~~~~~ + +The very first step is to map out the topology of your cloud. In other words, +you need to know what application units are running on what machines, and +whether those machines are physical (metal), virtual (kvm), or container (lxd) +in nature. Each application's HA status should also be indicated. + +A natural way for Juju operators to map out their cloud is by inspecting the +output of the ``juju status`` command. For a demonstration see :ref:`Cloud +topology example <cloud_topology_example>`. It is based on this production +:ref:`Reference cloud <reference_cloud>`.
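 + +For example, the machine-readable output of ``juju status`` can be queried to +show which principal units are hosted on a given machine. The following is a +minimal sketch only; it assumes the ``jq`` utility is installed and uses +machine ``17`` from the reference cloud (units placed in that machine's LXD +containers have machine IDs of the form ``17/lxd/N``): + +.. code:: + + # List the principal units placed directly on machine 17. Subordinate + # units appear beneath their principal in regular juju status output. + juju status --format=json | jq -r '.applications[].units // {} | to_entries[] | select(.value.machine == "17") | .key'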
+ +Control plane, data plane, and shutdown order +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +*Data plane* services involve networking, storage, and virtualisation, whereas +*control plane* services are necessary to administer and operate the cloud. +See `High availability`_ and `Control plane architecture`_ for more details. + +When a cloud is in production, the priority for the administrator is to ensure +that instances and their associated workloads continue to run. This means that +in terms of the impact a power event may have, the data plane has priority +over the control plane. + +Generally, data plane services (DP) are stopped prior to control plane (CP) +services. Also, services within a plane will typically depend upon another +service within that same plane. The conclusion is that the dependent service +should be brought down before the service being depended upon (e.g. stop Nova +before stopping Ceph). + +In terms of core applications, then, an approximate ordered service shutdown +list can be built to act as a general guideline. Some services, such as API +services, have little, if any, impact on other services and can therefore be +turned off in any order. + +In the list below, the most notable aspects are the extremes: nova-compute and +Ceph should be stopped first and keystone, rabbitmq-server, and percona-cluster +should be stopped last: + +#. ``nova-compute`` (DP) +#. ``ceph-osd`` (DP) +#. ``ceph-mon`` (DP) +#. ``ceph-radosgw`` (DP) +#. ``neutron-gateway`` (DP) +#. ``neutron-openvswitch`` (DP) +#. ``glance`` (CP) +#. ``cinder`` (CP) +#. ``neutron-api`` (CP) +#. ``placement`` (CP) +#. ``nova-cloud-controller`` (CP) +#. ``keystone`` (CP) +#. ``rabbitmq-server`` (CP) +#. ``percona-cluster`` (CP) + +Each node can now be analysed to see what applications it hosts and in what +order they should be stopped. + +Stopping and starting services +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When **stopping** a service (not an entire application and not a unit agent) on +a hyperconverged cloud node, it is safer to act on each unit and stop the +service individually. The alternative is to power down the node hosting the +service, which will, of course, stop every other service hosted on that node. +**Ensure that you understand the consequences of powering down a node**. + +In addition, whenever a service is stopped on a node, you need to know what +impact that will have on the cloud. For instance, the default effect of turning +off a Ceph OSD is that data will be re-distributed among the other OSDs, +resulting in high disk and network activity. Most services should be in HA mode +but you should be aware of the quorum that must be maintained in order for HA +to function as designed. For example, turning off two out of three Keystone +cluster members is not advisable. + +Wherever possible, this document shows how to manage services with Juju +`actions`_. Apart from their intrinsic benefits (i.e. they are sanctioned by +experts), actions are not hampered by SSH-restricted environments. Note, +however, that a charm may not implement every desired command in the form of an +action. In that case, the only alternative is to interact directly with the +unit's operating system via `SSH`_.
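 + +As an illustration, the following sketch shows how to check which actions a +charm exposes and how to invoke one of them on a unit. The ``pause`` and +``resume`` actions shown here are implemented by many, but not all, of the +charms in this cloud, so always confirm their availability first: + +.. code:: + + # Show the actions exposed by the nova-compute charm. + juju actions nova-compute + + # Invoke an action on a specific unit and wait for it to complete. + juju run-action nova-compute/0 pause --wait + juju run-action nova-compute/0 resume --wait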
Note that a charm may +not implement every desired command in the form of an action however. In that +case, the only alternative is to interact directly with the unit's operating +system via `SSH`_. + +.. important:: + + When an action is used the resulting state persists within Juju, and, in + particular, will **survive a node reboot**. This can be very advantageous in + the context of controlled shutdown and startup procedures, but it does + demand tracking on the part of the operator. To assist with this, some + charms expose action information in the output of the ``juju status`` + command . + +When actions are **not** used, in terms of **starting** services on a single +node or across a cloud, it may not be possible to do so in a prescribed order +unless the services were explicitly configured to *not* start automatically +during the bootup of a node. + +.. QUESTION + pmatulis: It is possible to start (and stop) LXD containers in a certain + order. Is adding this element to bundles a viable response to the above for + LXD-based workloads?` + +Regardless of whether a service is started with a Juju action, via SSH, or by +booting the corresponding node, it is vital that you verify afterwards that the +service is actually running and functioning properly. + +Controlled power events ++++++++++++++++++++++++ + +The heart of managing your cloud in terms of controlled power events is the +power-cycling of an individual cloud node. Once you're able to make decisions +on a per-node basis extending the power event to a group of nodes, such as an +AZ or even an entire cloud, will become less daunting. + +Power-cycling a cloud node +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a hyperconverged cloud node requires to be power-cycled begin by +considering the cloud topology, at least for the machine in question. + +To illustrate, machines **17**, **18**, **20** from the :ref:`Cloud topology +example ` will be used. Note that only fundamental +applications will be included (i.e. applications such as openstack-dashboard, +ceilometer, etc. will be omitted). + +The main issue behind power-cycling a node is to come up with a **shutdown** +list of services, as the startup list is typically just the shutdown list in +reverse. This is what is shown below for each machine. Information regarding HA +status and machine type has been retained (from the source topology example). + +The shutdown lists are based on section `Control plane, data plane, and +shutdown order`_. + +machine 17 +^^^^^^^^^^ + +#. ``nova-compute`` (metal) +#. ``ceph-osd`` (natively HA; metal) +#. ``ceph-mon`` (natively HA; lxd) +#. ``ceph-radosgw`` (natively HA; lxd) +#. ``glance`` (HA; lxd) +#. ``cinder`` (HA; lxd) +#. ``keystone`` (HA; lxd) +#. ``percona-cluster`` (HA; lxd) + +machine 18 +^^^^^^^^^^ + +#. ``nova-compute`` (metal) +#. ``ceph-osd`` (natively HA; metal) +#. ``neutron-api`` (HA; lxd) +#. ``nova-cloud-controller`` (HA; lxd) +#. ``rabbitmq-server`` (natively HA; lxd) + +machine 20 +^^^^^^^^^^ + +#. ``ceph-osd`` (natively HA; metal) +#. ``neutron-gateway`` (natively HA; metal) +#. ``neutron-api`` (HA; lxd) +#. ``nova-cloud-controller`` (HA; lxd) +#. ``rabbitmq-server`` (natively HA; lxd) + +See section `Notable applications`_ for instructions on stopping individual +services. 
+ +Power-cycling an AZ or an entire cloud +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Apart from the difference in scale of the service outage, stopping and starting +an AZ (availability zone) or an entire cloud is a superset of the case of +power-cycling an individual node. You just need to identify the group of nodes +that are involved. An AZ or cloud would consist of all of the core services +listed in section `Control plane, data plane, and shutdown order`_. + +Uncontrolled power events ++++++++++++++++++++++++++ + +In the context of this document, an uncontrolled power event is an unintended +power outage. The result of such an event is that one or many physical cloud +hosts have turned off non-gracefully. Since we now know that some cloud +services should be stopped in a particular order and in a particular way the +task now is to ascertain what services could have been negatively impacted and +how to proceed in getting such services back in working order. + +Begin as was done in the case of `Power-cycling a cloud node`_ by determining +the topology of the affected nodes. See whether any corresponding services have +special shutdown procedures as documented in section `Notable applications`_. +Any such services will require special scrutiny when they are eventually +started. Determine an ordered startup list for the affected services. As was +shown in `Power-cycling a cloud node`_, this list is the reverse of the +shutdown list. Finally, once the nodes are powered on, by abiding as much as +possible to the startup list, act on any verification steps found in section +`Notable applications`_ for all cloud services. + +Notable applications +++++++++++++++++++++ + +This section contains application-specific shutdown/restart procedures, +well-known caveats, or just valuable tips. + +As noted under `Stopping and starting services`_, this document encourages the +use of actions for managing application services. The general syntax is:: + + juju run-action --wait + +In the procedures that follow, will be replaced by an example only (e.g. +``nova-compute/0``). You will need to substitute in the actual unit for your +cloud. + +For convenience, the applications are listed here (you can also use the table +of contents in the upper left-hand-side): + ++-----------------+-----------+--------------------+--------------------------+--------------------+ +| `ceph-osd`_ | `cinder`_ | `keystone`_ | `neutron-openvswitch`_ | `percona-cluster`_ | ++-----------------+-----------+--------------------+--------------------------+--------------------+ +| `ceph-mon`_ | `etcd`_ | `landscape`_ | `nova-compute`_ | `rabbitmq-server`_ | ++-----------------+-----------+--------------------+--------------------------+--------------------+ +| `ceph-radosgw`_ | `glance`_ | `neutron-gateway`_ | `nova-cloud-controller`_ | `vault`_ | ++-----------------+-----------+--------------------+--------------------------+--------------------+ + +------------------------------------------------------------------------------- + +.. _ceph-osd: +.. _ceph-mon: +.. _ceph-radosgw: + +ceph +~~~~ + +All Ceph services are grouped under this one heading. + +.. note:: + + Some ceph-related charms are lacking in actions. Some procedures will + involve direct intervention. See bugs `LP #1846049`_, `LP #1846050`_, `LP + #1849222`_, and `LP #1849224`_. + +shutdown +^^^^^^^^ + +With respect to powering down a node that hosts an OSD, by default, the Ceph +CRUSH map is configured to treat each cluster machine as a failure domain. 
The +default pool behaviour is to replicate data across three failure domains, and +require at least two of them to be present to accept writes. Shutting down +multiple machines too quickly may cause two of three copies of a particular +placement group to become temporarily unavailable, which would cause consuming +applications to block on writes. The CRUSH map can be configured to spread +replicas over a failure domain other than machines. See `CRUSH maps`_ in the +Ceph documentation. + +The shutdown procedures for Ceph are provided for both a **cluster** and for +individual **components** (e.g. ``ceph-mon``). + +cluster +""""""" + +1. Ensure that the cluster is in a healthy state. From a Juju client, run a + status check on any MON unit:: + + juju ssh ceph-mon/1 sudo ceph status + +2. Shut down all components/clients consuming Ceph before shutting down Ceph + components to avoid application-level data loss. + +3. Set the ``noout`` option on the cluster a single MON unit, to prevent data + rebalancing from occurring when OSDs start disappearing from the network:: + + juju run-action --wait ceph-mon/1 set-noout + + Query status again to ensure that the option is set:: + + juju ssh ceph-mon/1 sudo ceph status + + Expected partial output is:: + + health: HEALTH_WARN + noout flag(s) set + +4. Stop the RADOS Gateway service on **each** ``ceph-radosgw`` unit. + + First get the current status:: + + juju ssh ceph-radosgw/0 systemctl status ceph-radosgw@\* + + Example partial output is:: + + ● ceph-radosgw@rgw.ip-172-31-93-254.service - Ceph rados gateway + Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; indirect; vendor + preset: enabled) + Active: active (running) since Mon 2019-09-30 21:33:53 UTC; 9min ago + + Now pause the service:: + + juju run-action --wait ceph-radosgw/0 pause + + Verify that the service has stopped:: + + juju ssh ceph-radosgw/0 systemctl status ceph-radosgw@\* + + Expected output is null (no output). + +5. Remove all of a unit's OSDs from the cluster. Do this on **each** + ``ceph-osd`` unit:: + + juju run-action --wait ceph-osd/1 osd-out + + Once done, verify that all of the cluster's OSDs are *out*:: + + juju ssh ceph-mon/1 sudo ceph status + + Assuming a total of six OSDs, expected partial output ("0 in") is:: + + osd: 6 osds: 6 up, 0 in; 66 remapped pgs + +6. Stop the MON service on **each** ``ceph-mon`` unit:: + + juju ssh ceph-mon/0 sudo systemctl stop ceph-mon.service + + Verify that the MON service has stopped on each unit:: + + juju ssh ceph-mon/0 systemctl status ceph-mon.service + + Expected partial output is:: + + Active: inactive (dead) since Mon 2019-09-30 19:46:09 UTC; 1h 1min ago + +.. important:: + + Once the MON units have lost quorum you will lose the ability to query the + cluster. + +component +""""""""" + +1. Ensure that the cluster is in a healthy state. On any MON:: + + juju ssh ceph-mon/1 sudo ceph status + +2. **ceph-mon** - To bring down a single MON service: + + a. Stop the MON service on the ``ceph-mon`` unit:: + + juju ssh ceph-mon/0 sudo systemctl stop ceph-mon.service + + b. Do not bring down another MON until the cluster has recovered from the + loss of the current one (run a status check). + +3. **ceph-osd** - To bring down all the OSDs on a single unit: + + a. Remove all the OSDs on the ``ceph-osd`` unit:: + + juju run-action --wait ceph-osd/2 osd-out + + b. Do not remove OSDs on another unit until the cluster has recovered from + the loss of the current one (run a status check). 
+ +startup +^^^^^^^ + +The startup procedures for Ceph are provided for both a **cluster** and for +individual **components** (e.g. ``ceph-mon``). + +cluster +""""""" + +Nodes hosting Ceph services should be powered on such that the services are +started in this order: + +1. ``ceph-mon`` +2. ``ceph-osd`` +3. ``ceph-radosgw`` + +**Important**: If during cluster shutdown, + +a. a unit's OSDs were removed from the cluster then you must re-insert them. Do + this for **each** ``ceph-osd`` unit:: + + juju run-action --wait ceph-osd/0 osd-in + +b. the ``noout`` option was set, you will need to unset it. On any MON unit:: + + juju run-action --wait ceph-mon/0 unset-noout + +c. a RADOS Gateway service was paused, you will need to resume it. Do this for + **each** ``ceph-radosgw`` unit:: + + juju run-action --wait ceph-radosgw/0 resume + +Finally, ensure that the cluster is in a healthy state by running a status +check on any MON unit:: + + juju ssh ceph-mon/0 sudo ceph status + +component +""""""""" + +1. Ensure that the cluster is in a healthy state. On any MON:: + + juju ssh ceph-mon/0 sudo ceph status + +2. **ceph-mon** - To bring up a single MON service: + + a. Start the MON service on the ``ceph-mon`` unit:: + + juju ssh ceph-mon/1 sudo systemctl start ceph-mon.service + + b. Do not bring up another MON until the cluster has recovered from the + addition of the current one (run a status check). + +3. **ceph-osd** - To bring up all the OSDs on a unit: + + a. Re-insert the OSDs on the ``ceph-osd`` unit:: + + juju run-action --wait ceph-osd/1 osd-in + + b. Do not re-insert OSDs on another unit until the cluster has recovered + from the addition of the current ones (run a status check). + +.. important:: + + Individual OSDs on a unit cannot be started or stopped using actions. They + are managed as a collective. + +------------------------------------------------------------------------------- + +cinder +~~~~~~ + +shutdown +^^^^^^^^ + +To pause the Cinder service:: + + juju run-action --wait cinder/0 pause + +startup +^^^^^^^ + +To resume the Cinder service:: + + juju run-action --wait cinder/0 resume + +------------------------------------------------------------------------------- + +etcd +~~~~ + +.. note:: + + The ``etcd`` charm is lacking in actions. Some procedures will involve + direct intervention. See bug `LP #1846257`_. + +shutdown +^^^^^^^^ + +To stop the Etcd service:: + + juju ssh etcd/0 sudo systemctl stop snap.etcd.etcd + +startup +^^^^^^^ + +To start the Etcd service:: + + juju ssh etcd/0 sudo systemctl start snap.etcd.etcd + +read queries +^^^^^^^^^^^^ + +To see the etcd cluster status. On any ``etcd`` unit:: + + juju run-action --wait etcd/0 health + +loss of etcd quorum +^^^^^^^^^^^^^^^^^^^ + +If the majority of the etcd units fail (e.g. 2 out of 3) you can scale down the +cluster (e.g. 3 to 1). However, if all hooks have not had a chance to run (e.g. +you may have to force remove and redeploy faulty units) the surviving master +will not accept new cluster members/units. In that case, do the following: + +1. Scale down the cluster to 1 unit any way you can (remove faulty units / stop + the etcd service / delete the database on the slave units). + +2. Force the surviving master to become a 1-node cluster. On the appropriate + unit: + + a. Stop the service:: + + juju ssh etcd/0 sudo systemctl stop snap.etcd.etcd + + b. Connect to the unit via SSH and edit + `/var/snap/etcd/common/etcd.conf.yml` by setting `force-new-cluster` to + 'true'. + + c. 
Start the service:: + + juju ssh etcd/0 sudo systemctl start snap.etcd.etcd + + d. Connect to the unit via SSH and edit + `/var/snap/etcd/common/etcd.conf.yml` by setting `force-new-cluster` to + 'false'. + +3. Scale up the cluster by adding new etcd units. + +------------------------------------------------------------------------------- + +glance +~~~~~~ + +shutdown +^^^^^^^^ + +To pause the Glance service:: + + juju run-action --wait glance/0 pause + +.. important:: + + If Glance is clustered using the 'hacluster' charm, first **pause** + hacluster and then **pause** Glance. + +startup +^^^^^^^ + +To resume the Glance service:: + + juju run-action --wait glance/0 resume + +.. important:: + + If Glance is clustered using the 'hacluster' charm, first **resume** + Glance and then **resume** hacluster. + +------------------------------------------------------------------------------- + +keystone +~~~~~~~~ + +shutdown +^^^^^^^^ + +To pause the Keystone service:: + + juju run-action --wait keystone/0 pause + +.. important:: + + If Keystone is clustered using the 'hacluster' charm, first **pause** + hacluster and then **pause** Keystone. + +startup +^^^^^^^ + +To resume the Keystone service:: + + juju run-action --wait keystone/0 resume + +.. important:: + + If Keystone is clustered using the 'hacluster' charm, first **resume** + Keystone and then **resume** hacluster. + +------------------------------------------------------------------------------- + +landscape +~~~~~~~~~ + +.. note:: + + The ``postgresql`` charm, needed by Landscape, is lacking in actions. Some + procedures will involve direct intervention. See bug `LP #1846279`_. + +shutdown +^^^^^^^^ + +1. Pause the Landscape service:: + + juju run-action --wait landscape-server/0 pause + +2. Stop the PostgreSQL service:: + + juju ssh postgresql/0 sudo systemctl stop postgresql + +3. Pause the RabbitMQ service:: + + juju run-action --wait rabbitmq-server/0 pause + +.. caution:: + + Services other than Landscape may also be using either of the PostgreSQL or + RabbitMQ services. + +startup +^^^^^^^ + +The startup of Landscape should be done in the reverse order. + +1. Ensure the RabbitMQ service is started:: + + juju run-action --wait rabbitmq-server/0 pause + +2. Ensure the PostgreSQL service is started:: + + juju ssh postgresql/0 sudo systemctl start postgresql + +3. Resume the Landscape service:: + + juju run-action --wait landscape-server/0 pause + +------------------------------------------------------------------------------- + +neutron-gateway +~~~~~~~~~~~~~~~ + +neutron agents +^^^^^^^^^^^^^^ + +A cloud outage will occur if a node hosting a non-HA ``neutron-gateway`` is +power cycled due to the lack of neutron agents. + +Before stopping the service you can manually check for HA status of neutron +agents on the node using the commands below. HA is confirmed by the presence of +more than one agent per **router**, in the case of L3 agents, and more than one +per **network**, in the case of DHCP agents. + +To return the list of **L3 agents** serving each of the routers connected to a +node: + +.. code:: + + for i in `openstack network agent list | grep L3 | awk '/$NODE/ {print $2}'` ; \ + do printf "\nAgent $i serves:" ; \ + for f in `neutron router-list-on-l3-agent $i | awk '/network_id/ {print$2}'` ; \ + do printf "\n Router $f served by these agents:\n" ; \ + neutron l3-agent-list-hosting-router $f ; \ + done ; done + +To return the list of **DHCP agents** serving each of the networks connected to +a node: + +.. 
code:: + + for i in `openstack network agent list| grep -i dhcp | awk '/$NODE/ {print $2}'` ; \ + do printf "\nAgent $i serves:" ; \ + for f in `neutron net-list-on-dhcp-agent $i | awk '!/+/ {print$2}'` ; \ + do printf "\nNetwork $f served by these agents:\n" ; \ + neutron dhcp-agent-list-hosting-net $f ; \ + done ; done + +.. note:: + + Replace ``$NODE`` with the node hostname as known to OpenStack (i.e. + ``openstack host list``). + +shutdown +^^^^^^^^ + +To pause a Neutron gateway service:: + + juju run-action --wait neutron-gateway/0 pause + +startup +^^^^^^^ + +To resume a Neutron gateway service:: + + juju run-action --wait neutron-gateway/0 resume + +------------------------------------------------------------------------------- + +neutron-openvswitch +~~~~~~~~~~~~~~~~~~~ + +shutdown +^^^^^^^^ + +To pause the Open vSwitch service:: + + juju run-action --wait neutron-openvswitch/0 pause + +startup +^^^^^^^ + +To resume the Open vSwitch service:: + + juju run-action --wait neutron-openvswitch/0 resume + +------------------------------------------------------------------------------- + +nova-cloud-controller +~~~~~~~~~~~~~~~~~~~~~ + +shutdown +^^^^^^^^ + +To pause Nova controller services (Nova scheduler, Nova api, Nova network, Nova +objectstore):: + + juju run-action --wait nova-cloud-controller/0 pause + +startup +^^^^^^^ + +To resume Nova controller services:: + + juju run-action --wait nova-cloud-controller/0 resume + +------------------------------------------------------------------------------- + +nova-compute +~~~~~~~~~~~~ + +.. _nova-compute-shutdown: + +shutdown +^^^^^^^^ + +True HA is not possible for ``nova-compute`` nor its instances. If a node +hosting this service is power-cycled the corresponding hypervisor is removed +from the pool of available hypervisors, and its instances will become +inaccessible. Generally speaking, individual hypervisors are fallible +components in a cloud. The standard response to this is to implement HA on the +instance workloads. Provided shared storage is set up, you can also move +instances to another compute node and boot them anew (state is lost) - see +`Evacuate instances`_. + +To stop a Nova service: + +1. Some affected nova instances may require a special shutdown sequence (e.g. + an instance may host a workload that demands particular care when turning it + off). Invoke them now. + +2. Gracefully stop all remaining affected nova instances. + +3. Pause the Nova service:: + + juju run-action --wait nova-compute/0 pause + +.. tip:: + + If shared storage is implemented, instead of shutting down instances you + may consider moving ("evacuating") them to another compute node. See + `Evacuate instances`_. + +startup +^^^^^^^ + +To resume a Nova service:: + + juju run-action --wait nova-compute/0 resume + +Instances that fail to come up properly can be moved to another compute host +(see `Evacuate instances`_). + +------------------------------------------------------------------------------- + +percona-cluster +~~~~~~~~~~~~~~~ + +shutdown +^^^^^^^^ + +To pause the MySQL service for a ``percona-cluster`` unit:: + + juju run-action --wait percona-cluster/0 pause + +To gracefully shut down the cluster repeat the above for every unit. + +startup +^^^^^^^ + +A special startup procedure is necessary regardless of whether services were +shut down gracefully or not (power outage or hard shutdown): + +1. Run action ``bootstrap-pxc`` on any percona-cluster unit. 
+ +If the MySQL sequence numbers (obtained with command ``juju status +percona-cluster``) vary across units then the action `must` be run on the unit +with the highest sequence number:: + + juju run-action --wait percona-cluster/? bootstrap-pxc + +2. Run action ``notify-bootstrapped`` on a percona-cluster unit. + + There are two possibilities: + + - If the ``bootstrap-pxc`` action was run on a leader then run + ``notify-bootstrapped`` on a non-leader. + - If the ``bootstrap-pxc`` action was run on a non-leader then run + ``notify-bootstrapped`` on the leader. + +Run the appropriate command now:: + + juju run-action --wait percona-cluster/? notify-bootstrapped + +For details see the `percona-cluster charm`_. + +------------------------------------------------------------------------------- + +rabbitmq-server +~~~~~~~~~~~~~~~ + +shutdown +^^^^^^^^ + +To pause a RabbitMQ service:: + + juju run-action --wait rabbitmq-server/0 pause + +startup +^^^^^^^ + +To resume a RabbitMQ service:: + + juju run-action --wait rabbitmq-server/0 resume + +read queries +^^^^^^^^^^^^ + +Provided rabbitmq is running on a ``rabbitmq-server`` unit, you can perform a +status check:: + + juju run-action --wait rabbitmq-server/1 cluster-status + +Example partial output is: + +.. code:: + + Cluster status of node 'rabbit@ip-172-31-13-243' + [{nodes,[{disc,['rabbit@ip-172-31-13-243']}]}, + {running_nodes,['rabbit@ip-172-31-13-243']}, + {cluster_name,<<"rabbit@ip-172-31-13-243.ec2.internal">>}, + {partitions,[]}, + {alarms,[{'rabbit@ip-172-31-13-243',[]}]}] + +It is expected that there are no objects listed on the partitions line (as +above). + +To list unconsumed queues (those with pending messages):: + + juju run-action --wait rabbitmq-server/1 list-unconsumed-queues + +See `Partitions`_ and `Queues`_ in the RabbitMQ documentation. + +partitions +^^^^^^^^^^ + +Any partitioned units will need to be attended to. Stop and start the +rabbitmq-server service for each ``rabbitmq-server`` unit, checking for status +along the way: + +.. code:: + + juju run-action --wait rabbitmq-server/0 pause + juju run-action --wait rabbitmq-server/1 cluster-status + juju run-action --wait rabbitmq-server/0 pause + juju run-action --wait rabbitmq-server/1 cluster-status + +If errors persist, the mnesia database will need to be removed from the +affected unit so it can be resynced from the other units. Do this by removing +the contents of the ``/var/lib/rabbitmq/mnesia`` directory between the stop and +start commands. + +.. note:: + + The network partitioning handling mode configured by the + ``rabbitmq-server`` charm is ``autoheal``. + +------------------------------------------------------------------------------- + +vault +~~~~~ + +.. note:: + + The ``vault`` charm is lacking in actions. Some procedures will involve + direct intervention. See bugs `LP #1846282`_ and `LP #1846375`_. + +shutdown +^^^^^^^^ + +To stop a Vault service:: + + juju ssh vault/0 sudo systemctl stop vault + +startup +^^^^^^^ + +To start a Vault service:: + + juju ssh vault/0 sudo systemctl start vault + +read queries +^^^^^^^^^^^^ + +To see Vault service status:: + + juju ssh vault/0 /snap/bin/vault status + +Expected output is:: + + Cluster is sealed + +unsealing units +^^^^^^^^^^^^^^^ + +When Vault is clustered, each unit will manually (and locally) need to be +unsealed with its respective ``VAULT_ADDR`` environment variable and with the +minimum number of unseal keys (three here): + +.. 
code:: + + export VAULT_ADDR="https://:8200" + vault operator unseal + vault operator unseal + vault operator unseal + +See `Vault`_ in the Charms Deployment Guide for more details. + +Known issues +++++++++++++ + +- `LP #1804261`_ : ceph-osds will need to be restarted if they start before Vault is ready and unsealed +- `LP #1818260`_ : forget cluster node failed during cluster-relation-changed hook +- `LP #1818680`_ : booting should succeed even if vault is unavailable +- `LP #1818973`_ : vault fails to start when MySQL backend down +- `LP #1827690`_ : barbican-worker is down: Requested revision 1a0c2cdafb38 overlaps with other requested revisions 39cf2e645cba +- `LP #1840706`_ : install hook fails with psycopg2 ImportError + +Consult each charm's bug tracker for full bug listings. See the `OpenStack +Charms`_ project group. + +.. LINKS +.. _percona-cluster charm: https://opendev.org/openstack/charm-percona-cluster/src/branch/master/README.md#cold-boot +.. _Vault: https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-vault.html +.. _High availability: https://docs.openstack.org/arch-design/arch-requirements/arch-requirements-ha.html +.. _Control plane architecture: https://docs.openstack.org/arch-design/design-control-plane.html +.. _Evacuate instances: https://docs.openstack.org/nova/latest/admin/evacuate.html +.. _hacluster: https://jaas.ai/hacluster +.. _OpenStack Charms: https://launchpad.net/openstack-charms +.. _SSH: https://jaas.ai/docs/machine-auth +.. _CRUSH maps: https://docs.ceph.com/docs/master/rados/operations/crush-map +.. _actions: https://jaas.ai/docs/working-with-actions +.. _Partitions: https://www.rabbitmq.com/partitions.html +.. _Queues: https://www.rabbitmq.com/queues.html + +.. BUGS +.. _LP #1804261: https://bugs.launchpad.net/charm-ceph-osd/+bug/1804261 +.. _LP #1818260: https://bugs.launchpad.net/charm-rabbitmq-server/+bug/1818260 +.. _LP #1818680: https://bugs.launchpad.net/charm-ceph-osd/+bug/1818680 +.. _LP #1818973: https://bugs.launchpad.net/vault-charm/+bug/1818973 +.. _LP #1827690: https://bugs.launchpad.net/charm-barbican/+bug/1827690 +.. _LP #1840706: https://bugs.launchpad.net/vault-charm/+bug/1840706 +.. _LP #1846049: https://bugs.launchpad.net/charm-ceph-mon/+bug/1846049 +.. _LP #1846050: https://bugs.launchpad.net/charm-ceph-mon/+bug/1846050 +.. _LP #1846257: https://bugs.launchpad.net/charm-etcd/+bug/1846257 +.. _LP #1846279: https://bugs.launchpad.net/postgresql-charm/+bug/1846279 +.. _LP #1846282: https://bugs.launchpad.net/vault-charm/+bug/1846282 +.. _LP #1846375: https://bugs.launchpad.net/vault-charm/+bug/1846375 +.. _LP #1849222: https://bugs.launchpad.net/charm-ceph-mon/+bug/1849222 +.. _LP #1849224: https://bugs.launchpad.net/charm-ceph-radosgw/+bug/1849224 diff --git a/deploy-guide/source/app.rst b/deploy-guide/source/app.rst index 07a6710..9ab8310 100644 --- a/deploy-guide/source/app.rst +++ b/deploy-guide/source/app.rst @@ -20,3 +20,4 @@ Appendices app-erasure-coding.rst app-policy-overrides.rst app-ovn.rst + app-managing-power-events.rst