diff --git a/doc/images/openstack_cloud_lifecycle.png b/doc/images/openstack_cloud_lifecycle.png
new file mode 100644
index 0000000..33494ca
Binary files /dev/null and b/doc/images/openstack_cloud_lifecycle.png differ
diff --git a/doc/images/src/openstack_cloud_lifecycle.txt b/doc/images/src/openstack_cloud_lifecycle.txt
new file mode 100644
index 0000000..4581544
--- /dev/null
+++ b/doc/images/src/openstack_cloud_lifecycle.txt
@@ -0,0 +1,7 @@
+@startuml
+
+(*) -right-> [OpenStack Services\nNova, Keystone, Neutron,\nGlance, Heat, Swift] "Deployment"
+"Deployment" -right-> [OpenStack Deployment\nFuel, TripleO, Devstack] "Operation\nMaintenance"
+"Operation\nMaintenance" -right-> [DRAGONS?\nTuskar, Rubick] (*)
+
+@enduml
diff --git a/doc/openstack_diagnostics_proposal.rst b/doc/openstack_diagnostics_proposal.rst
index 7174cdf..500cf40 100644
--- a/doc/openstack_diagnostics_proposal.rst
+++ b/doc/openstack_diagnostics_proposal.rst
@@ -11,21 +11,30 @@ Project Name
 Overview
 --------
 
+The typical OpenStack cloud life cycle consists of 2 phases:
+
+- initial deployment and
+- operation maintenance
+
 OpenStack cloud operators usually rely on deploymnet tools to configure all the
-platform components correctly and efficiently upfront. However, after initial
-deployment platform configurations and operational conditions start to change.
-These changes could break consistency and integration of cloud platform and its
-components, and ultimately cause cloud service failures of different kinds.
+platform components correctly and efficiently in the *initial deployment* phase.
+Multiple OpenStack projects cover that area: TripleO/Tuskar, Fuel and Devstack,
+to name a few.
+
+However, once you have installed and kicked off the cloud, platform
+configurations and operational conditions begin to change. These changes could
+break consistency and integration of cloud platform components. Keeping the
+cloud up and running is the essence of the *operation maintenance* phase.

 Cloud operator must quickly and efficiently identify and respond to the root
 cause of such failures. To do so, he must check if his OpenStack configuration
 is sane and consistent. These checks could be thought of as rules of diagnostic
 production system.
 
-Currently OpenStack ecosystem does not provide tools which specifically help to
-diagnose platform configuration. We propose a project which will help operators
-to diagnose their OpenStack platform and reduce response time to known and
-unknown failures.
+Currently OpenStack ecosystem lacks projects aimed to increase reliability and
+resilience of the cloud. With this proposal we want to introduce a project which
+will help operators to diagnose their OpenStack platform, reduce response time
+to known and unknown failures and effectively support the desired SLA.
 
 Mission
 -------
diff --git a/doc/openstack_integration.rst b/doc/openstack_integration.rst
index 780c9b2..a7ee495 100644
--- a/doc/openstack_integration.rst
+++ b/doc/openstack_integration.rst
@@ -1,5 +1,5 @@
-VALIDATOR INTEGRATION WITH OPENSTACK
-====================================
+DIAGNOSTICS INTEGRATION WITH OPENSTACK
+======================================
 
 --------
 Overview
@@ -50,8 +50,29 @@ and inconsistencies. This engine will provide hints and best practices to
 increase reliability and operational resilience of the cloud.
 
 
-Rules engine
-------------
+#FIXME: move this part to document rules_engine.rst
+
+Rules-based approach to diagnostics
+-----------------------------------
+
+The consistent configuration across all components is essential to OpenStack
+cloud operation. If something is wrong with configuration, you as an operator
+will know this immediately either from monitoring or clients complaining. But
+diagnosing the exact problem is always a challenge, given the number of
+components and configuration options per component.
+
+You could think about troubleshooting OpenStack as going through some scenarios
+which can be expressed as sets of rules. Your configuration must comply with all
+those rules to be operational. On the other hand, if you know which rules your
+configuration breaks, you can identify incorrect parameters reliably and easily.
+That is how production rules or diagnostic systems work.
+
+An example production rule for an OpenStack system could be::
+
+    if (condition_parameter) is (value) then (check_parameter_1) must be (value) and
+    (check_parameter_2) must be (value)
+
+
 
 ------------------
 Integration Points
diff --git a/doc/rules_engine.rst b/doc/rules_engine.rst
index e69de29..62e6db3 100644
--- a/doc/rules_engine.rst
+++ b/doc/rules_engine.rst
@@ -0,0 +1,5 @@
+PRODUCTION RULES ENGINE
+=======================
+
+This document describes the rules engine used for inspection and diagnostics of
+OpenStack configuration.