..
    This work is licensed under a Creative Commons Attribution 3.0 Unported
    License.

    http://creativecommons.org/licenses/by/3.0/legalcode

    Sections of this template were taken directly from the Nova spec
    template at:
    https://github.com/openstack/nova-specs/blob/master/specs/juno-template.rst

..
    This template should be in ReSTructured text. The filename in the git
    repository should match the launchpad URL, for example a URL of
    https://blueprints.launchpad.net/trove/+spec/awesome-thing should be named
    awesome-thing.rst.

    Please do not delete any of the sections in this template.  If you
    have nothing to say for a whole section, just write: None

    Note: This comment may be removed if desired, however the license notice
    above should remain.


==============================
Refactor the Datastore Manager
==============================

.. If section numbers are desired, unindent this
    .. sectnum::

.. If a TOC is desired, unindent this
    .. contents::

There is a large amount of boiler-plate and/or copy-and-paste code in each
datastore manager.  As more managers are added, the time involved in
maintaining all this code will continue to grow. To alleviate this and the bugs
it promotes, a base manager class needs to be created where all common code can
reside.

Launchpad Blueprint:
https://blueprints.launchpad.net/trove/+spec/datastore-manager-refactor


Problem Description
===================

When bugs are discovered and fixed in the manager code of one datastore, these
changes are seldom transferred to all the other datastores.  This causes a
'drift' in the stability of each experimental datastore as currently there are
no third-party CI's.  In addition, when 'new improved' ways to solve a problem
are implemented they are done differently across each manager, if at all.  It
is also difficult to transfer features (and coding knowledge) from one
datastore to another, and the problem is exacerbated as code is
copied-and-pasted when a new datastore manager is implemented.

A current example of this problem is the issue where instances flip from BUILD
to ACTIVE and back again [1]_, or from BUILD->SHUTDOWN->ACTIVE. [2]_  Fixing
this means changing how prepare works, and currently this change would be
required to be made in each datastore manager.  Implementing this in a uniform
way would be almost impossible with the current architecture of the manager
code base.


Proposed Change
===============

A new 'Manager' class will be created that will be the base class for all
datastore managers.  In order to keep the scope reasonable, only the barest
minimum of functionality will be pulled back into the base class in the initial
implementation - this will include methods that are currently 'boiler-plate'
code (such as the response to the rpc_ping and get_fileststem_stats calls, and
the periodic task, update_status).

A mechanism for encapsulating functionality will also be needed in order for
the base manager to be able to execute datastore-specific instructions.  This
will be accomplished through the use of properties that can be overridden by
each descendant.  Some required ones (such as the 'status' object) will be
declared abstract and must be present; others (such as the new configuration
manager, and potentially a future dictionary of strategies) will be optional.

The new Manager class will reside in the datastore module:

.. code-block:: bash

    trove/guestagent/datastore/manager.py

This is alongside the existing service.py file (which contains the existing
BaseDbStatus class).  The directory structure of the MySQL derived classes will
also be tidied a bit and will end up looking like the following:

.. code-block:: bash

    trove
    +-- guestagent
        +-- datastore
            +-- __init__.py
            +-- manager.py          <-- new 'base' manager
            +-- service.py
            +-- experimental
             +-- __init__.py
             +-- cassandra
              +-- __init__.py
              +-- manager.py
              +-- service.py
              +-- system.py

            <other experimental datastore modules>

             +-- redis
              +-- __init__.py
              +-- manager.py
              +-- service.py
              +-- system.py
             +-- vertica
                 +-- __init__.py
                 +-- manager.py
                 +-- service.py
                 +-- system.py
            +-- mysql_common        <-- new module
             +-- __init__.py
             +-- manager.py      <-- renamed from mysql/manager_base.py
             +-- service.py      <-- renamed from mysql/service_base.py
            +-- mysql
             +-- __init__.py
             +-- manager.py
             +-- service.py
            +-- technical-preview
                +-- __init__.py


Within the context of this refactor (as a proof-of-concept), the issue with
having instances flip in-and-out of BUILD state will be addressed properly.
The prepare method will be moved into the base class, which will
seamlessly implement the code required to ensure that no notifications are
sent until it is guaranteed that prepare has finished successfully.  The
existing prepare methods will be renamed 'do_prepare' and will be called from
within the base prepare method.

This manner of determining whether prepare has finished will be accomplished by
having the manager write a file at the start and at the successful conclusion
of the prepare operation.  Any errors (exceptions) will be trapped and logged
and the instance set into the FAILED state.

A sample of what this will look like in the base manager is as follows,
although additional properties may be defined as they are deemed necessary (and
others can be added in future cleanup work):

.. code-block:: python

    class Manager(periodic_task.PeriodicTasks):
        """This is the base class for all datastore managers.  Over time, common
        functionality should be pulled back here from the existing managers.
        """

        def __init__(self, manager_name):

            super(Manager, self).__init__(CONF)

            # Manager properties
            self.__manager_name = manager_name
            self.__manager = None
            self.__prepare_error = False

        @property
        def manager_name(self):
            """This returns the passed-in name of the manager."""
            return self.__manager_name

        @property
        def manager(self):
            """This returns the name of the manager."""
            if not self.__manager:
                self.__manager = CONF.datastore_manager or self.__manager_name
            return self.__manager

        @property
        def prepare_error(self):
            return self.__prepare_error

        @prepare_error.setter
        def prepare_error(self, prepare_error):
            self.__prepare_error = prepare_error

        @property
        def configuration_manager(self):
            """If the datastore supports the new-style configuration manager,
            it should override this to return it.
            """
            return None

        @abc.abstractproperty
        def status(self):
            """This should return an instance of a status class that has been
            inherited from datastore.service.BaseDbStatus.  Each datastore
            must implement this property.
            """
            return None

        ################
        # Status related
        ################
        @periodic_task.periodic_task
        def update_status(self, context):
            """Update the status of the trove instance. It is decorated with
            perodic_task so it is called automatically.
            """
            LOG.debug("Update status called.")
            self.status.update()

        def rpc_ping(self, context):
            LOG.debug("Responding to RPC ping.")
            return True

        #################
        # Prepare related
        #################
        def prepare(self, context, packages, databases, memory_mb, users,
                    device_path=None, mount_point=None, backup_info=None,
                    config_contents=None, root_password=None, overrides=None,
                    cluster_config=None, snapshot=None):
            """Set up datastore on a Guest Instance."""
            LOG.info(_("Starting datastore prepare for '%s'.") % self.manager)
            self.status.begin_install()
            post_processing = True if cluster_config else False
            try:
                self.do_prepare(context, packages, databases, memory_mb,
                                users, device_path, mount_point, backup_info,
                                config_contents, root_password, overrides,
                                cluster_config, snapshot)
            except Exception as ex:
                self.prepare_error = True
                LOG.exception(_("An error occurred preparing datastore: %s") %
                              ex.message)
                raise
            finally:
                LOG.info(_("Ending datastore prepare for '%s'.") % self.manager)
                self.status.end_install(error_occurred=self.prepare_error,
                                        post_processing=post_processing)
            # At this point critical 'prepare' work is done and the instance
            # is now in the correct 'ACTIVE' 'INSTANCE_READY' or 'ERROR' state.
            # Of cource if an error has occurred, none of the code that follows
            # will run.
            LOG.info(_('Completed setup of datastore successfully.'))

            # We only create databases and users automatically for non-cluster
            # instances.
            if not cluster_config:
                try:
                    if databases:
                        LOG.debug('Calling add databases.')
                        self.create_database(context, databases)
                    if users:
                        LOG.debug('Calling add users.')
                        self.create_user(context, users)
                except Exception as ex:
                    LOG.exception(_("An error occurred creating databases/users: "
                                    "%s") % ex.message)
                    raise

            try:
                LOG.debug('Calling post_prepare.')
                self.post_prepare(context, packages, databases, memory_mb,
                                  users, device_path, mount_point, backup_info,
                                  config_contents, root_password, overrides,
                                  cluster_config, snapshot)
            except Exception as ex:
                LOG.exception(_("An error occurred in post prepare: %s") %
                              ex.message)
                raise

        @abc.abstractmethod
        def do_prepare(self, context, packages, databases, memory_mb, users,
                       device_path, mount_point, backup_info, config_contents,
                       root_password, overrides, cluster_config, snapshot):
            """This is called from prepare when the Trove instance first comes
            online.  'Prepare' is the first rpc message passed from the
            task manager.  do_prepare handles all the base configuration of
            the instance and is where the actual work is done.  Once this method
            completes, the datastore is considered either 'ready' for use (or
            for final connections to other datastores) or in an 'error' state,
            and the status is changed accordingly.  Each datastore must
            implement this method.
            """
            pass

        def post_prepare(self, context, packages, databases, memory_mb, users,
                         device_path, mount_point, backup_info, config_contents,
                         root_password, overrides, cluster_config, snapshot):
            """This is called after prepare has completed successfully.
            Processing done here should be limited to things that will not
            affect the actual 'running' status of the datastore (for example,
            creating databases and users, although these are now handled
            automatically).  Any exceptions are caught, logged and rethrown,
            however no status changes are made and the end-user will not be
            informed of the error.
            """
            LOG.debug('No post_prepare work has been defined.')
            pass

        #####################
        # File System related
        #####################
        def get_filesystem_stats(self, context, fs_path):
            """Gets the filesystem stats for the path given."""
            # TODO(peterstac) - note that fs_path is not used in this method.
            mount_point = CONF.get(self.manager).mount_point
            LOG.debug("Getting file system stats for '%s'" % mount_point)
            return dbaas.get_filesystem_volume_stats(mount_point)

        #################
        # Cluster related
        #################
        def cluster_complete(self, context):
            LOG.debug("Cluster creation complete, starting status checks.")
            self.status.end_install()


The base service class will be enhanced to contain the necessary methods to
set a flag denoting whether prepare has finished of not.  This will look
something like the following (only new code is shown):

.. code-block:: python

    class BaseDbStatus(object):

        GUESTAGENT_DIR = '~'
        PREPARE_START_FILENAME = '.guestagent.prepare.start'
        PREPARE_END_FILENAME = '.guestagent.prepare.end'

        def __init__(self):
            self._prepare_completed = None

        @property
        def prepare_completed(self):
            if self._prepare_completed is None:
                # Force the file check
                self.prepare_completed = None
            return self._prepare_completed

        @prepare_completed.setter
        def prepare_completed(self, value):
            # Set the value based on the existence of the file; 'value' is
            # ignored
            # This is required as the value of prepare_completed is cached,
            # so this must be referenced any time the existence of the
            # file changes
            self._prepare_completed = os.path.isfile(
                guestagent_utils.build_file_path(
                    self.GUESTAGENT_DIR, self.PREPARE_END_FILENAME))

        def begin_install(self):
            """Called right before DB is prepared."""
            prepare_start_file = guestagent_utils.build_file_path(
                self.GUESTAGENT_DIR, self.PREPARE_START_FILENAME)
            operating_system.write_file(prepare_start_file, '')
            self.prepare_completed = False

            self.set_status(instance.ServiceStatuses.BUILDING, True)

        def end_install(self, error_occurred=False, post_processing=False):
            """Called after prepare completes."""

            # Set the "we're done" flag if there's no error and
            # no post_processing is necessary
            if not (error_occurred or post_processing):
                prepare_end_file = guestagent_utils.build_file_path(
                    self.GUESTAGENT_DIR, self.PREPARE_END_FILENAME)
                operating_system.write_file(prepare_end_file, '')
                self.prepare_completed = True

            final_status = None
            if error_occurred:
                final_status = instance.ServiceStatuses.FAILED
            elif post_processing:
                final_status = instance.ServiceStatuses.INSTANCE_READY

            if final_status:
                LOG.info(_("Set final status to %s.") % final_status)
                self.set_status(final_status, force=True)
            else:
                self._end_install_or_restart(True)

        def end_restart(self):
            self.restart_mode = False
            LOG.info(_("Ending restart."))
            self._end_install_or_restart(False)

        def _end_install_or_restart(self, force):
            """Called after DB is installed or restarted.
            Updates the database with the actual DB server status.
            """
            real_status = self._get_actual_db_status()
            LOG.info(_("Current database status is '%s'.") % real_status)
            self.set_status(real_status, force=force)

        @property
        def is_installed(self):
            """This is for compatibility - it may be removed during further
            cleanup.
            """
            return self.prepare_completed

        def set_status(self, status, force=False):
            """Use conductor to update the DB app status."""

            if force or self.is_installed:
                LOG.debug("Casting set_status message to conductor "
                          "(status is '%s')." % status.description)
                context = trove_context.TroveContext()

                heartbeat = {'service_status': status.description}
                conductor_api.API(context).heartbeat(
                    CONF.guest_id, heartbeat, sent=timeutils.float_utcnow())
                LOG.debug("Successfully cast set_status.")
                self.status = status
            else:
                LOG.debug("Prepare has not completed yet, skipping heartbeat.")


Configuration
-------------

No configuration changes are anticipated.


Database
--------

None


Public API
----------

None


Public API Security
-------------------

None


Python API
----------

None


CLI (python-troveclient)
------------------------

None


Internal API
------------

The ServiceStatuses.BUILD_PENDING state has been renamed to
ServiceStatuses.INSTANCE_READY to better reflect the instance's actual state.
The value displayed will remain 'BUILD' so that there should be no outward
differences and thus backwards compatibility will be maintained.


Guest Agent
-----------

This change should not affect any behavior on the Guest Agent.  The current
tests should be adequate to ensure that the change is fully compatible with the
rest of the code base.


Alternatives
------------

One suggestion as to how to fix the prepare issue was to use Nova metadata,
which is available on the guest instance.  If it is decided that this would be
useful, it could be added in addition to the proposed method as a means of
notification only.


Dashboard Impact (UX)
=====================

TBD (section added after approval)


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  <peterstac>


Milestones
----------

Target Milestone for completion:
  eg. Mitaka-1


Work Items
----------

All changes will be done in the context of a single task.


Upgrade Implications
====================

None are anticipated.


Dependencies
============

None


Testing
=======

The unittests will be modified as necessary, however minimal changes will be
made in this regard as well.  In order to keep the changes as small as
possible, refactoring the tests will also be done in stages with only the bare
minimum done to start and the remainder being left to a future date.  The
future work would include testing the base class thoroughly and then removing
all the corresponding tests from the derived classes.

The int-tests should continue to run as always, and will be used to determine
that no fundamental changes have been made to the implementation, with the
exception of the bug fixes (and they should just lead to greater stability in
the test infrastructure).


Documentation Impact
====================

Since all the changes are implementation related, no documentation changes are
foreseen.


References
==========

.. [1] https://bugs.launchpad.net/trove/+bug/1482795

.. [2] https://bugs.launchpad.net/trove/+bug/1487233

Appendix
========

None