[DOCS]: Updates to include auto-failover and log archiving

Change-Id: Ib438fe7a97f8807aec6eaed8285785894e3dd4e0
This commit is contained in:
Andrew Hutchings 2013-09-06 16:38:11 +01:00
parent fc8d9ca1d0
commit 443ac95f37
16 changed files with 349 additions and 213 deletions

View File

@ -4,8 +4,11 @@ Description
Purpose
-------
The Admin API server listens for REST+JSON connections to interface various
parts of the LBaaS system and other scripts with the LBaaS database state.
The Admin API server listens for REST+JSON connections to provide information
about the state of Libra to external systems.
Additionally the Admin API has several schedulers which automatically maintain
the health of the Libra system and the connected Load Balancer devices.
Design
------
@ -14,3 +17,6 @@ Similar to the main API server it uses an Eventlet WSGI web server frontend
with Pecan+WSME to process requests. SQLAlchemy+MySQL is used to access the
data store. The main internal difference (apart from the API itself) is the
Admin API server doesn't use keystone or gearman.
It spawns several scheduled threads to run tasks such as building new devices
for the pool, monitoring load balancer devices and maintaining IP addresses.
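A minimal sketch of how such a scheduled thread can be kept alive with
``threading.Timer``; the helper and interval here are illustrative, not the
actual libra code:

.. code-block:: python

    import threading

    def run_periodically(task, interval_seconds):
        # Re-arm a Timer after each run so the task keeps firing; the real
        # Admin API scheduler code may differ from this sketch.
        def _wrapper():
            try:
                task()
            finally:
                run_periodically(task, interval_seconds)

        timer = threading.Timer(interval_seconds, _wrapper)
        timer.daemon = True  # do not block process shutdown
        timer.start()
        return timer

    # Example: run a (hypothetical) stats check once a minute
    # run_periodically(check_stats, 60)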

View File

@ -16,6 +16,7 @@ Configuration File
db_section=mysql1
ssl_certfile=/opt/server.crt
ssl_keyfile=/opt/server.key
gearman=127.0.0.1:4730
[mysql1]
host=localhost
@ -39,7 +40,7 @@ Command Line Options
The port number to listen on, default is 8889
.. option:: --db_secions <SECTIONNAME>
.. option:: --db_sections <SECTIONNAME>
Config file sections that describe the MySQL servers. This option can
be specified multiple times for Galera or NDB clusters.
@ -90,11 +91,6 @@ Command Line Options
How long to wait until we consider the second and final ping check
failed. Default is 30 seconds.
.. option:: --stats_repair_timer <REPAIR_INTERVAL>
How often to run a check to see if damaged load balancers have been
repaired (in seconds), default 180
.. option:: --number_of_servers <NUMBER_OF_SERVER>
The number of Admin API servers in the system.
@ -123,3 +119,17 @@ Command Line Options
A list of tags to be used for the datadog driver
.. option:: --node_pool_size <SIZE>
The number of hot spare load balancer devices to keep in the pool,
default 10
.. option:: --vip_pool_size <SIZE>
The number of hot spare floating IPs to keep in the pool, default 10
.. option:: --expire_days <DAYS>
The number of days before DELETED load balancers are purged from the
database. The purge is run every 24 hours. Purge is not run if no
value is provided.

View File

@ -6,4 +6,5 @@ Libra Admin API Server
about
config
schedulers
api

View File

@ -0,0 +1,66 @@
Admin Schedulers
================
The Admin API has several schedulers to maintain the health of the Libra
system. This section of the document goes into detail about each one.
Each Admin API server takes its turn to run these tasks. Which server runs
next is determined by the :option:`--number_of_servers` and
:option:`--server_id` options (a minimal sketch of this follows).
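A minimal sketch of this round-robin turn taking, assuming the turn is
derived from the number of elapsed scheduler intervals; the real selection
logic in the Admin API may differ:

.. code-block:: python

    import time

    def is_my_turn(server_id, number_of_servers, interval_seconds=60):
        # Derive the current cycle from wall-clock time so every Admin API
        # server computes the same value, then pick one server per cycle.
        cycle = int(time.time()) // interval_seconds
        return cycle % number_of_servers == server_id

    # Example: server 0 of 2 runs the task only on even-numbered cycles
    # if is_my_turn(server_id=0, number_of_servers=2):
    #     run_stats_scheduler()   # hypothetical task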
Stats Scheduler
---------------
Despite its name, this scheduler currently performs monitoring; at a later
date it will also gather statistics for billing purposes. It is executed once
a minute.
It sends a gearman message to each active Load Balancer device. There are
three possible outcomes from the results:
#. If all is good, no action is taken
#. If a node connected to a load balancer has failed, the node is marked as
   ERROR and the load balancer is marked as DEGRADED
#. If a device has failed, the load balancer is automatically rebuilt on a new
   device and the associated floating IP is re-pointed to the new device. The
   old device is marked for deletion (a minimal sketch of this logic follows)
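A minimal sketch of the three outcomes, using hypothetical helper names
rather than the actual Admin API code:

.. code-block:: python

    def check_device(gearman, db, device):
        reply = gearman.ping(device)            # hypothetical health check
        if reply.device_ok and not reply.failed_nodes:
            return                              # 1. all good, nothing to do
        if reply.device_ok:
            for node in reply.failed_nodes:     # 2. node failure
                db.set_node_status(node, 'ERROR')
            db.set_loadbalancer_status(device.loadbalancer, 'DEGRADED')
        else:                                   # 3. device failure: auto-failover
            new_device = rebuild_on_new_device(device)
            repoint_floating_ip(device.floating_ip, new_device)
            db.mark_device_for_deletion(device)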
Delete Scheduler
----------------
This scheduler looks out for any devices marked for deletion after use or after
an error state. It is executed once a minute.
It sends a gearman message to the Pool Manager to delete any devices that are
to be deleted and removes them from the database.
Create Scheduler
----------------
This scheduler takes a look at the number of hot spare devices available. It
is executed once a minute (after the delete scheduler).
If the number of available hot spare devices falls below the value specified by
:option:`--node_pool_size` it will request that new devices are built and those
devices will be added to the database. It records how many are currently being
built so that long build times do not result in multiple Admin API servers
trying to fulfil the same quota (a minimal sketch follows).
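A minimal sketch of the top-up decision, with hypothetical helpers standing
in for the database and gearman calls; the VIP scheduler below follows the
same pattern using :option:`--vip_pool_size`:

.. code-block:: python

    def top_up_device_pool(db, pool_manager, node_pool_size):
        free = db.count_free_devices()           # hot spares ready for use
        building = db.count_devices_building()   # already requested, avoids double-building
        shortfall = node_pool_size - (free + building)
        for _ in range(max(shortfall, 0)):
            db.record_pending_build()
            pool_manager.submit_build_device()   # e.g. a BUILD_DEVICE gearman job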
VIP Scheduler
-------------
This scheduler takes a look at the number of hot spare floating IPs available.
It is executed once a minute.
If the number of available floating IPs falls below the value specified by
:option:`--vip_pool_size` it will request that new IPs are built and those
will be added to the database.
Expunge Scheduler
-----------------
This scheduler removes logical Load Balancers marked as DELETED from the
database. It is executed once a day.
The DELETED logical Load Balancers remain in the database mainly for billing
purposes. This clears out any that have been deleted for longer than the
number of days specified by :option:`--expire_days`.
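A minimal sketch of the purge, assuming hypothetical table and column names
(the real schema lives in libra's SQLAlchemy models):

.. code-block:: python

    from datetime import datetime, timedelta
    from sqlalchemy import text

    def expunge_deleted(session, expire_days):
        cutoff = datetime.utcnow() - timedelta(days=expire_days)
        # Remove logical load balancers DELETED for longer than the cutoff
        session.execute(
            text("DELETE FROM loadbalancers "
                 "WHERE status = 'DELETED' AND updated < :cutoff"),
            {"cutoff": cutoff},
        )
        session.commit()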

View File

@ -319,6 +319,8 @@ by the LBaaS service.
+-----------------+------------------------------------------------------------+----------+-----------------------------------------------------------------+
| Virtual IP | :ref:`Get list of virtual IPs <api-vips>` | GET | {baseURI}/{ver}/loadbalancers/{loadbalancerId}/virtualips |
+-----------------+------------------------------------------------------------+----------+-----------------------------------------------------------------+
| Logs | :ref:`Archive log file to Object Storage <api-logs>` | POST | {baseURI}/{ver}/loadbalancers/{loadbalancerId}/logs |
+-----------------+------------------------------------------------------------+----------+-----------------------------------------------------------------+
5.2 Common Request Headers
~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -2270,6 +2272,74 @@ if not found.
]
}
.. _api-logs:
22. Archive log file to Object Storage
--------------------------------------
22.1 Operation
~~~~~~~~~~~~~~
+----------+------------------------------------+--------+-----------------------------------------------------+
| Resource | Operation | Method | Path |
+==========+====================================+========+=====================================================+
| Logs | Archive log file to Object Storage | POST | {baseURI}/{ver}/loadbalancers/{loadbalancerId}/logs |
+----------+------------------------------------+--------+-----------------------------------------------------+
22.2 Description
~~~~~~~~~~~~~~~~
The operation tells the load balancer to push the current log file into an HP Cloud Object Storage container. The status of the load balancer will be set to 'PENDING_UPDATE' during the operation and back to 'ACTIVE' upon success or failure. A success/failure message can be found in the 'statusDescription' field when getting the load balancer details.
**Load Balancer Status Values**
+----------------+----------------------------------------------+
| Status         | Description                                  |
+================+==============================================+
| ACTIVE         | Load balancer is in an operational state     |
+----------------+----------------------------------------------+
| PENDING_UPDATE | Load balancer is in the process of an update |
+----------------+----------------------------------------------+
By default, with an empty POST body, the load balancer uploads to the Swift account owned by the same tenant as the load balancer, in a container called 'lbaaslogs'. To change this, provide the following optional parameters in the POST body:
**objectStoreBasePath** : the object store container to use
**objectStoreEndpoint** : the object store endpoint to use including tenantID, for example: https://region-b.geo-1.objects.hpcloudsvc.com:443/v1/1234567890123
**authToken** : an authentication token to the object store for the load balancer to use
22.3 Request Data
~~~~~~~~~~~~~~~~~
The caller is required to provide request data with the POST which includes the appropriate information to upload logs.
22.4 Query Parameters Supported
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
None required.
22.5 Required HTTP Header Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**X-Auth-Token**
22.6 Request Body
~~~~~~~~~~~~~~~~~
The request body must follow the correct JSON format. For example, a request
that uploads the logs to a different object store:
::
{
"objectStoreBasePath": "mylblogs",
"objectStoreEndpoint": "https://region-b.geo-1.objects.hpcloudsvc.com:443/v1/1234567890123",
"authToken": "HPAuth_d17efd"
}
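A minimal client-side sketch of the same request using the third-party
``requests`` library; the endpoint URL and tokens are illustrative only:

.. code-block:: python

    import json
    import requests

    token = "HPAuth_d17efd"   # keystone token for the LBaaS API (illustrative)
    url = "https://lbaas.example.com/v1.1/loadbalancers/2000/logs"  # hypothetical

    body = {
        "objectStoreBasePath": "mylblogs",
        "objectStoreEndpoint": "https://region-b.geo-1.objects.hpcloudsvc.com:443/v1/1234567890123",
        "authToken": token,   # token the device uses against the object store
    }

    resp = requests.post(
        url,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        data=json.dumps(body),
    )
    resp.raise_for_status()   # load balancer moves to PENDING_UPDATE while archiving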
Features Currently Not Implemented or Supported
-----------------------------------------------
@ -2281,7 +2351,3 @@ The following features are not supported.
advertised in /protocols request. Instead TCP will be used for port 443
and the HTTPS connections will be passed through the load balancer with no
termination at the load balancer.
3. The ability to list deleted load balancers is not yet supported.

View File

@ -97,11 +97,6 @@ Command Line Options
The path for the SSL key file to be used for the frontend of the API
server
.. option:: --expire_days <DAYS>
Deleted Load Balancers older than this number of days will be expunged
from the database using a scheduler that is executed every 24 hours.
.. option:: --ip_filters <FILTERS>
A mask of IP addresses to filter for backend nodes in the form

View File

@ -7,7 +7,6 @@ Load Balancer as a Service Device Tools
introduction
worker/index
pool_mgm/index
statsd/index
api/index
admin_api/index
config

Binary image file changed, not shown (75 KiB before, 64 KiB after).

View File

@ -4,15 +4,16 @@ Description
Purpose
-------
The Libra Node Pool manager is designed to keep a constant pool of spare load
balancer nodes so that when a new one is needed it simply needs configuring.
This saves on time needed to spin up new nodes upon customer request and extra
delays due to new nodes failing.
The Libra Node Pool manager is designed to communicate with OpenStack Nova or
any other compute API to provide nodes and floating IPs to the Libra system
for use. It does this by providing a gearman worker interface to the Nova
API. This means you can have multiple pool managers running and gearman will
decide on the next available pool manager to take a job.
Design
------
It is designed to probe the API server every X minutes (5 by default) to find
out how many free nodes there are. If this falls below a certain defined level
the pool manager will spin up new nodes and supply their details to the
API server.
It is designed to accept requests from the Libra components to manipulate Nova
instances and floating IPs. It is a daemon which is a gearman worker. Any
commands sent to that worker are converted into Nova commands and the results
are sent back to the client.
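A minimal sketch of that worker loop using the ``python-gearman`` library,
with a hypothetical dispatch helper standing in for the real Nova calls:

.. code-block:: python

    import json
    import gearman

    def handle_job(worker, job):
        request = json.loads(job.data)
        reply = dict(request)                       # echo the original message back
        try:
            reply.update(dispatch_to_nova(request)) # hypothetical: boot/delete/IP calls
            reply['response'] = 'PASS'
        except Exception:
            reply['response'] = 'FAIL'
        return json.dumps(reply)

    worker = gearman.GearmanWorker(['127.0.0.1:4730'])
    worker.register_task('libra_pool_mgm', handle_job)
    worker.work()   # blocks, handing each incoming job to handle_job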

View File

@ -1,131 +0,0 @@
Code Walkthrough
================
Here we'll highlight some of the more important code aspects.
Server Class
------------
.. py:module:: libra.mgm.mgm
.. py:class:: Server(logger, args)
This class is the main server activity once it has started in either
daemon or non-daemon mode.
:param logger: An instance of :py:class:`logging.logger`
:param args: An instance of :py:class:`libra.common.options.Options`
.. py:method:: main()
Sets the signal handler and then calls :py:meth:`check_nodes`
.. py:method:: check_nodes()
Runs a check to see if new nodes are needed. Called once by
:py:meth:`main` at start and then called by the scheduler.
It also restarts the scheduler at the end of execution
.. py:method:: reset_scheduler()
Uses :py:class:`threading.Timer` to set the next scheduled execution of
:py:meth:`check_nodes`
.. py:method:: build_nodes(count, api)
Builds the required number of nodes determined by
:py:meth:`check_nodes`.
:param count: The number of nodes to build
:param api: A driver derived from the :py:class:`MgmDriver` parent class
.. py:method:: exit_handler(signum, frame)
The signal handler function. Clears the signal handler and calls
:py:meth:`shutdown`
:param signum: The signal number
:param frame: The stack frame
.. py:method:: shutdown(error)
Causes the application to exit
:param error: set to True if an error caused shutdown
:type error: boolean
Node Class
----------
.. py:module:: libra.mgm.node
.. py:class:: Node(username, password, tenant, auth_url, region, keyname, secgroup, image, node_type)
This class uses :py:class:`novaclient.client` to manipulate Nova nodes
:param username: The Nova username
:param password: The Nova password
:param tenant: The Nova tenant
:param auth_url: The Nova authentication URL
:param region: The Nova region
:param keyname: The Nova key name for new nodes
:param secgroup: The Nova security group for new nodes
:param image: The Nova image ID or name for new nodes
:param node_type: The flavor ID or name for new nodes
.. py:method:: build()
Creates a new Nova node and tests that it is running. It will poll
every 3 seconds for 2 minutes to check if the node is running.
:return: True and status dictionary for success, False and error for fail
MgmDriver Class
---------------
.. py:module:: libra.mgm.drivers.base
.. py:class:: MgmDriver
This defines the API for interacting with various API servers. Drivers for
these API servers should inherit from this class and implement the relevant
API methods that it can support.
`This is an abstract class and is not meant to be instantiated directly.`
.. py:method:: get_free_count()
Gets the number of free nodes. This is used to calculate if more nodes
are needed
:return: the number of free nodes
.. py:method:: add_node(name, address)
Adds the node details for a new device to the API server.
:param name: the new name for the node
:param address: the new public IP address for the node
:return: True or False and the JSON response (if any)
.. py:method:: is_online()
Check to see if the driver has access to a valid API server
:return: True or False
.. py:method:: get_url()
Gets the URL for the current API server
:return: the URL for the current API server
Known Drivers Dictionary
------------------------
.. py:data:: known_drivers
This is the dictionary that maps values for the
:option:`--driver <libra_pool_mgm.py --driver>` option
to a class implementing the driver :py:class:`~MgmDriver` API
for that API server. After implementing a new driver class, you simply add
a new entry to this dictionary to plug in the new driver.

doc/pool_mgm/commands.rst (new file, 154 lines)
View File

@ -0,0 +1,154 @@
Gearman Commands
================
The Pool Manager registers as the worker name ``libra_pool_mgm`` on the gearman
servers. Using this it accepts the JSON requests outlined in this document.
In all cases it will return the original message along with the following for
success:
.. code-block:: json
{
"response": "PASS"
}
And this for failure:
.. code-block:: json
{
"response": "FAIL"
}
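A minimal client-side sketch using the ``python-gearman`` library to submit
one of the commands below and check the response; the host and port are
illustrative:

.. code-block:: python

    import json
    import gearman

    client = gearman.GearmanClient(['127.0.0.1:4730'])
    job = client.submit_job('libra_pool_mgm',
                            json.dumps({"action": "BUILD_DEVICE"}))
    reply = json.loads(job.result)
    if reply.get('response') == 'PASS':
        print(reply['name'], reply['addr'])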
BUILD_DEVICE
------------
This command sends the Nova ``boot`` command using the Nova API and returns
details about the resulting new Nova instance. Details about which image and
other Nova settings to use are configured using the options or config file for
Pool Manager.
Example:
.. code-block:: json
{
"action": "BUILD_DEVICE"
}
Response:
.. code-block:: json
{
"action": "BUILD_DEVICE",
"response": "PASS",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9",
"addr": "15.185.175.81",
"type": "basename: libra-stg-haproxy, image: 12345",
"az": "3"
}
DELETE_DEVICE
-------------
This command requests that a Nova instance be deleted.
Example:
.. code-block:: json
{
"action": "DELETE_DEVICE",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9"
}
Response:
.. code-block:: json
{
"action": "DELETE_DEVICE",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9",
"response": "PASS"
}
BUILD_IP
--------
This command requests a floating IP from Nova.
Example:
.. code-block:: json
{
"action": "BUILD_IP",
}
Response:
.. code-block:: json
{
"action": "BUILD_IP",
"response": "PASS",
"id": "12345",
"ip": "15.185.234.125"
}
ASSIGN_IP
---------
This command assigns floating IP addresses to Nova instances (by name of
instance).
Example:
.. code-block:: json
{
"action": "ASSIGN_IP",
"ip": "15.185.234.125",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9"
}
Response:
.. code-block:: json
{
"action": "ASSIGN_IP",
"ip": "15.185.234.125",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9",
"response": "PASS"
}
REMOVE_IP
---------
This command removes a floating IP address from a Nova instance, preserving
the IP address so it can be reused later.
Example:
.. code-block:: json
{
"action": "REMOVE_IP",
"ip": "15.185.234.125",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9"
}
Response:
.. code-block:: json
{
"action": "REMOVE_IP",
"ip": "15.185.234.125",
"name": "libra-stg-haproxy-eaf1fef0-1584-11e3-b42b-02163e192df9",
"response": "PASS"
}

View File

@ -25,40 +25,13 @@ Configuration File
nova_secgroup = default
nova_image = 12345
nova_image_size = standard.medium
api_server = 10.0.0.1:8889 10.0.0.2:8889
nodes = 10
check_interval = 5
submit_interval = 15
gearman=127.0.0.1:4730
node_basename = 'libra'
Command Line Options
--------------------
.. program:: libra_pool_mgm
.. option:: --api_server <HOST:PORT>
The hostname/IP and port colon separated pointed to an Admin API server
for use with the HP REST API driver. Can be specified multiple times for
multiple servers
.. option:: --check_interval <CHECK_INTERVAL>
How often to check the API server to see if new nodes are needed
(value is minutes)
.. option:: --submit_interval <SUBMIT_INTERVAL>
How often to check the list of nodes to see if the nodes
are now in a good state (value is in minutes)
.. option:: --driver <DRIVER>
API driver to use. Valid driver options are:
* *hp_rest* - HP REST API, talks to the HP Cloud API server (based
on Atlas API)
This is the default driver.
.. option:: --datadir <DATADIR>
The data directory used to store things such as the failed node list.
@ -73,10 +46,6 @@ Command Line Options
A name to prefix the UUID name given to the nodes the pool manager
generates.
.. option:: --nodes <NODES>
The size of the pool of spare nodes the pool manager should keep.
.. option:: --nova_auth_url <NOVA_AUTH_URL>
The URL used to authenticate for the Nova API
@ -114,3 +83,20 @@ Command Line Options
The flavor ID (image size ID) or name to use for new nodes spun up in
the Nova API
.. option:: --gearman_ssl_ca <PATH>
The path for the Gearman SSL Certificate Authority.
.. option:: --gearman_ssl_cert <PATH>
The path for the Gearman SSL certificate.
.. option:: --gearman_ssl_key <PATH>
The path for the Gearman SSL key.
.. option:: --gearman <HOST:PORT>
Used to specify the Gearman job server hostname and port. This option
can be used multiple times to specify multiple job servers.
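For example, these options might appear together in the pool manager's
configuration file section like this (the section name and paths are
illustrative):

::

    [mgm]
    gearman = 127.0.0.1:4730 127.0.0.2:4730
    gearman_ssl_ca = /opt/gearman_ca.crt
    gearman_ssl_cert = /opt/gearman.crt
    gearman_ssl_key = /opt/gearman.key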

View File

@ -6,4 +6,4 @@ Libra Node Pool Manager
about
config
code
commands

Binary file not shown.

View File

@ -51,33 +51,20 @@ nova_keyname = default
nova_secgroup = default
nova_image = 12345
nova_image_size = standard.medium
api_server = 10.0.0.1:8889 10.0.0.2:8889
nodes = 10
check_interval = 5
submit_interval = 15
node_basename = 'libra'
az = 1
[statsd]
api_server=127.0.0.1:8889
server=127.0.0.1:4730
logfile=/tmp/statsd.log
pid=/tmp/statsd.pid
driver=dummy datadog hp_rest
datadog_api_key=0987654321
datadog_app_key=1234567890
datadog_message_tail="@user@domain.com"
datadog_tags=service:lbaas
datadog_env=prod
ping_interval = 60
poll_timeout = 5
poll_timeout_retry = 30
gearman=127.0.0.1:4730
[admin_api]
db_sections=mysql1
ssl_certfile=certfile.crt
ssl_keyfile=keyfile.key
expire_days=7
stats_driver=dummy datadog database
datadog_api_key=KEY
datadog_app_key=KEY2
datadog_tags=service:lbaas
node_pool_size=50
[api]
host=0.0.0.0

View File

@ -44,10 +44,6 @@ class Server(object):
def main():
options = Options('mgm', 'Node Management Daemon')
options.parser.add_argument(
'--api_server', action='append', metavar='HOST:PORT', default=[],
help='a list of API servers to connect to (for HP REST API driver)'
)
options.parser.add_argument(
'--az', type=int,
help='The az number the node will reside in (to be passed to the API'