106 lines
3.4 KiB
ReStructuredText
106 lines
3.4 KiB
ReStructuredText
==============================
|
|
OPENSTACK DIAGNOSTICS PROPOSAL
|
|
==============================
|
|
|
|
.. contents::
|
|
|
|
Project Name
|
|
============
|
|
|
|
**Official:** OpenStack Diagnostics
|
|
|
|
**Codename:** Rubick
|
|
|
|
OVERVIEW
|
|
========
|
|
|
|
The typical OpenStack cloud life cycle consists of 2 phases:
|
|
|
|
- initial deployment and
|
|
- operation maintenance
|
|
|
|
OpenStack cloud operators usually rely on deploymnet tools to configure all the
|
|
platform components correctly and efficiently in *initial deployment* phase.
|
|
Multiple OpenStack projects cover that area: TripleO/Tuskar, Fuel and Devstack,
|
|
to name a few.
|
|
|
|
However, once you installed and kicked off the cloud, platform configurations
|
|
and operational conditions begin to change. These changes could break
|
|
consistency and integration of cloud platform components. Keeping cloud up and
|
|
running is the essense of *operation maintenance* phase.
|
|
|
|
Cloud operator must quickly and efficiently identify and respond to the root
|
|
cause of such failures. To do so, he must check if his OpenStack configuration
|
|
is sane and consistent. These checks could be thought of as rules of diagnostic
|
|
production system.
|
|
|
|
Currently OpenStack ecosystem lacks projects aimed to increase reliability and
|
|
resilience of the cloud. With this proposal we want to introduce a project which
|
|
will help operators to diagnose their OpenStack platform, reduce response time
|
|
to known and unknown failures and effectively support the desired SLA.
|
|
|
|
Mission
|
|
-------
|
|
|
|
Diagnostics' mission is to **provide OpenStack cloud operators with tools which
|
|
minimize time and effort needed to identify and fix errors in operations
|
|
maintenance phase of cloud life cycle.**
|
|
|
|
User Stories
|
|
------------
|
|
|
|
- As a **cloud operator**, I want to make sure that my OpenStack architecture
|
|
and configuration is sane and consistent across all platform components and
|
|
services.
|
|
- As a **cloud architect**, I want to make sure that my OpenStack architecture
|
|
and configuration are compliant to best practices.
|
|
- As a **cloud architect**, I need a knowledge base of sanity checks and best
|
|
practices for troubleshooting my OpenStack cloud which I can reuse and update
|
|
with my own checks and rules.
|
|
- As a **cloud operator**, I want to be able to automatically extract
|
|
configuration parameters from all OpenStack components to verify their
|
|
correctness, consistency and integrity.
|
|
- As a **cloud operator**, I want automatic diagnostics tool which can inspect
|
|
configuration of my OpenStack cloud and report if it is sane and/or compliant
|
|
toc community-defined best practices.
|
|
- As a **cloud operator**, I want to be able to define rules used to inspect
|
|
and verify configuration of OpenStack components and store them to use for
|
|
verification of future configuration changes.
|
|
|
|
Requirements
|
|
------------
|
|
|
|
TBD
|
|
|
|
Scope
|
|
-----
|
|
|
|
As an MVP1, we create a service that includes:
|
|
|
|
#. Rules engine with grammatic analysis capabilities
|
|
#. Extensible implementation of rules
|
|
#. REST API for running inspections
|
|
#. Storage back-end implementation for OpenStack platform architecture and
|
|
configuration data
|
|
|
|
Assumptions
|
|
-----------
|
|
|
|
We assume that we must reuse as much as possible from OpenStack Deployment
|
|
program in terms of platform configuration and architecture definitions (i.e.
|
|
TripleO Heat and configuration files templates).
|
|
|
|
Dependencies
|
|
------------
|
|
|
|
DESIGN
|
|
======
|
|
|
|
.. include:: service_architecture.rst
|
|
|
|
.. include:: rules_engine.rst
|
|
|
|
.. include:: openstack_integration.rst
|
|
|
|
.. include:: openstack_architecture_model.rst
|