Merge pull request #47 from adobdin/master

1.5.1
This commit is contained in:
Alexander Dobdin 2016-06-14 10:49:47 +04:00 committed by GitHub
commit ad9d1fea0b
6 changed files with 187 additions and 71 deletions

build-rpm.sh Executable file

@@ -0,0 +1,3 @@
+#!/usr/bin/bash
+python ./setup.py bdist_rpm


@@ -1,31 +1,120 @@
=====================
General configuration
=====================
There is a default configuration file, ``config.yaml``, which can be used by the scripts. All default configuration values are defined in ``timmy/conf.py``; Timmy works with these values if no configuration file is provided.
If you wish to keep several configuration files, that is possible: just copy ``config.yaml`` and explicitly provide the name of the copy when you launch a script (the ``--config`` option). A configuration file provided via the ``-c | --config`` option overlays the default configuration.
An example of a configuration file is ``config.yaml``.
Some of the parameters available in the configuration file:

* **ssh_opts** parameters to pass to the ssh command directly (recommended to leave at default), such as connection timeout, etc. See ``timmy/conf.py`` to review the defaults.
* **env_vars** environment variables to pass to the commands and scripts - you can use this to expand variables in commands or scripts
* **fuel_ip** the IP address of the master node in the environment
* **fuel_user** username to use for accessing the Nailgun API
* **fuel_pass** password for accessing the Nailgun API
* **rqdir** the path of *rqdir*, the directory containing scripts to execute and filelists to pass to rsync
* **out_dir** directory to store output data
* **archive_dir** directory to put the resulting archives into
* **timeout** timeout for SSH commands and scripts, in seconds

===================
Configuring actions
===================
Actions can be configured in a separate yaml file (by default ``rq.yaml`` is used) and / or defined in the main config file, or passed via the command line options ``-P``, ``-C``, ``-S``, ``-G``.
The following actions are available for definition:

* **put** - a list of tuples / 2-element lists: [source, destination]. Passed to ``scp`` like so: ``scp source <node-ip>:destination``. Wildcards are supported for the source.
* **cmds** - a list of dicts: {'command-name': 'command-string'}. Example: {'command-1': 'uptime'}. The command string is a bash string. Commands are executed in sorted order of their names.
* **scripts** - a list of script filenames located on the local system. If the filename does not contain a path separator, the script is expected to be located inside ``rqdir/scripts``; otherwise the provided path is used to read the script.
* **files** - a list of filenames to collect. Passed to ``scp``. Supports wildcards.
* **filelists** - a list of filelist filenames located on the local system. A filelist is a text file containing files and directories to collect, passed to rsync; it does not support wildcards. If the filename does not contain a path separator, the filelist is expected to be located inside ``rqdir/filelists``; otherwise the provided path is used to read the filelist.
* **log_files**

  * **path** - base path to scan for logs
  * **include** - regexp string to match log files against for inclusion (if not set, include all)
  * **exclude** - regexp string to match log files against; matched files are excluded from collection
  * **start** - date or datetime string to collect only files modified on or after the specified time. Format: ``YYYY-MM-DD`` or ``YYYY-MM-DD HH:MM:SS``
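As a rough illustration of an actions definition, here is the Python dict a YAML loader might produce for it (all file and command names below are invented, not taken from any shipped ``rq.yaml``), along with the sorted execution order of **cmds**:

```python
# Hypothetical actions definition; names are illustrative only.
actions = {
    'put': [['tools/report.sh', '/tmp/report.sh']],
    'cmds': [{'02-uptime': 'uptime'}, {'01-date': 'date'}],
    'scripts': ['diag.sh'],            # looked up in rqdir/scripts
    'filelists': ['openstack-logs'],   # looked up in rqdir/filelists
}

# Commands are executed in sorted order of their names,
# so '01-date' runs before '02-uptime'.
ordered = sorted(name for d in actions['cmds'] for name in d)
print(ordered)  # ['01-date', '02-uptime']
```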

===============
Filtering nodes
===============
* **soft_filter** - use to skip any operations on non-matching nodes
* **hard_filter** - same as above, but also removes non-matching nodes from the NodeManager.nodes dict - useful when using Timmy as a module

Nodes can be filtered by the following parameters, defined inside **soft_filter** and/or **hard_filter**:

* **roles** - the list of roles, ex. **['controller','compute']**
* **online** - enabled by default to skip non-accessible nodes
* **status** - the list of statuses. Default: **['ready', 'discover']**
* **ids** - the list of ids, ex. **[0,5,6]**
* any other attribute of a Node object which is of a simple type (int, float, str, etc.) or a list containing simple types

Lists match **any**: if any element of the filter list matches the node's value (or, if the value is a list, any element of it), the node passes.
Negative filters are possible by prepending **no_** to the filter parameter; for example, **no_id = [0]** will filter out Fuel.
Negative lists also match **any** - if any match / collision is found, the node is skipped.
You can combine any number of positive and negative filters as long as their names differ (since this is a dict).
You can use both positive and negative parameters to match the same node parameter (though it does not make much sense):

**roles = ['controller', 'compute']**

**no_roles = ['compute']**

This will skip computes and run only on controllers. As stated above, it does not make much sense :)
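The matching rules above can be sketched in a few lines of Python. This is a simplified assumption about the semantics, not the real NodeManager filter code:

```python
def node_passes(node, flt):
    """Sketch of the filter semantics: list parameters match *any*
    element, and keys prefixed with 'no_' are negative filters
    (any collision skips the node). Illustrative only."""
    for key, wanted in flt.items():
        negative = key.startswith('no_')
        attr = key[3:] if negative else key
        value = node.get(attr)
        values = value if isinstance(value, list) else [value]
        hit = any(v in wanted for v in values)
        if hit == negative:  # negative+hit or positive+miss -> skip
            return False
    return True

ctrl = {'id': 1, 'roles': ['controller', 'mongo']}
comp = {'id': 2, 'roles': ['compute']}
flt = {'roles': ['controller', 'compute'], 'no_roles': ['compute']}
print(node_passes(ctrl, flt), node_passes(comp, flt))  # True False
```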

=============================
Parameter-based configuration
=============================
It is possible to define special **by_<parameter-name>** dicts in the config to (re)define node parameters based on other parameters. For example:

::

  by_roles:
    controller:
      cmds: {'check-uptime': 'uptime'}

In this example, for any controller node the **cmds** setting will be reset to the value above; for nodes without the controller role, the default (none) values will be used.
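A minimal sketch of how such a **by_<parameter-name>** section could be applied to a node (an assumed simplification - the real application logic can also append/accumulate values):

```python
def apply_by_sections(node_conf, node_attrs, conf):
    """Assumed sketch: for each 'by_<attr>' dict in the config, if the
    node's attribute (or any element of it, for lists) matches a key,
    that key's settings override the node's current configuration."""
    for section, mapping in conf.items():
        if not section.startswith('by_'):
            continue
        attr = section[len('by_'):]
        values = node_attrs.get(attr, [])
        values = values if isinstance(values, list) else [values]
        for v in values:
            if v in mapping:
                node_conf.update(mapping[v])
    return node_conf

conf = {'by_roles': {'controller': {'cmds': {'check-uptime': 'uptime'}}}}
node = apply_by_sections({}, {'roles': ['controller']}, conf)
print(node)  # {'cmds': {'check-uptime': 'uptime'}}
```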

=============
rqfile format
=============
The ``rqfile`` format is a bit different from the config format. The basic difference:

**config:**

::

  scripts: [a, b, c]
  by_roles:
    compute:
      scripts: [d, e, f]

**rqfile:**

::

  scripts:
    __default: [a, b, c]
    by_roles:
      compute: [d, e, f]

The **config** and **rqfile** definitions presented above are equivalent. It is possible to define settings in a config file using the **config** format, or in an **rqfile** using the **rqfile** format, linking to the **rqfile** from the config via the ``rqfile`` setting. It is also possible to define part here and part there. Mixing identical parameters in both places is not recommended - results may be unpredictable (such a scenario was not thoroughly tested). In general, an **rqfile** is good for fewer settings with more parameter-based variations (``by_``), and the main config for more different settings with fewer such variations.
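The equivalence of the two formats can be illustrated with a small conversion sketch (an assumed simplification of how the rqfile is regrouped into config form, not Timmy's actual conversion code):

```python
def convert_rqfile(rq):
    """Sketch of rqfile -> config conversion: '__default' becomes the
    top-level value of the parameter, and 'by_*' subkeys are regrouped
    into top-level 'by_*' sections. Illustrative assumption only."""
    conf = {}
    for param, spec in rq.items():
        for key, value in spec.items():
            if key == '__default':
                conf[param] = value
            else:  # e.g. 'by_roles'
                for match, sub in value.items():
                    conf.setdefault(key, {}).setdefault(match, {})[param] = sub
    return conf

rq = {'scripts': {'__default': ['a', 'b', 'c'],
                  'by_roles': {'compute': ['d', 'e', 'f']}}}
print(convert_rqfile(rq))
# {'scripts': ['a', 'b', 'c'], 'by_roles': {'compute': {'scripts': ['d', 'e', 'f']}}}
```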

===============================
Configuration application order
===============================
Configuration is assembled and applied in a specific order:

1. The default configuration is initialized. See ``timmy/conf.py`` for details.
2. Command line parameters, if defined, are used to modify the configuration.
3. The **rqfile**, if defined (default - ``rq.yaml``), is converted and injected into the configuration. At this stage the configuration is in its final form.
4. For every node, the configuration is applied, except ``once_by_`` directives:

   4.1. first the top-level attributes are set
   4.2. then ``by_<attribute-name>`` parameters, except ``by_id``, are iterated to override or append (accumulate) the attributes
   4.3. then ``by_id`` is iterated to override any matching attributes, redefining what was set before

5. Finally, ``once_by_<attribute-name>`` parameters are applied - only for one matching node per set of matching values. This is useful, for example, if you want a specific file or command from only a single node matching a specific role, such as running ``nova list`` on only one controller.

Once you are done with the configuration, you might want to familiarize yourself with :doc:`Usage </usage>`.
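The layering described in steps 1-4 can be condensed into a toy example (all names and values below are invented, and step 4.2's append/accumulate behavior is omitted for brevity):

```python
# Toy illustration of the configuration application order.
conf = {'timeout': 15, 'scripts': ['general']}         # 1. defaults + 3. rqfile
conf.update({'timeout': 30})                           # 2. command line options

node = {'id': 1, 'roles': ['controller']}
by_roles = {'controller': {'scripts': ['ctrl-diag']}}  # 4.2 by_<attribute-name>
by_id = {1: {'timeout': 60}}                           # 4.3 by_id wins last

node_conf = dict(conf)                                 # 4.1 top-level attributes
for role in node['roles']:
    node_conf.update(by_roles.get(role, {}))
node_conf.update(by_id.get(node['id'], {}))
print(node_conf)  # {'timeout': 60, 'scripts': ['ctrl-diag']}
```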


@@ -5,3 +5,6 @@ all_files = 1
 [upload_sphinx]
 upload-dir = doc/_build/html
+[bdist_rpm]
+requires = python >= 2.6


@@ -1,5 +1,5 @@
 project_name = 'timmy'
-version = '1.4.0'
+version = '1.5.1'
 if __name__ == '__main__':
     exit(0)


@@ -46,10 +46,10 @@ class Node(object):
     conf_match_prefix = 'by_'
     conf_default_key = '__default'
     conf_priority_section = conf_match_prefix + 'id'
-    print_template = '{0:<14} {1:<3} {2:<16} {3:<18} {4:<10} {5:<30}'
-    print_template += ' {6:<6} {7}'
+    header = ['node-id', 'env', 'ip', 'mac', 'os',
+              'roles', 'online', 'status', 'name', 'fqdn']

-    def __init__(self, id, mac, cluster, roles, os_platform,
+    def __init__(self, id, name, fqdn, mac, cluster, roles, os_platform,
                  online, status, ip, conf, logger=None):
         self.id = id
         self.mac = mac
@@ -70,6 +70,8 @@ class Node(object):
         self.logsize = 0
         self.mapcmds = {}
         self.mapscr = {}
+        self.name = name
+        self.fqdn = fqdn
         self.filtered_out = False
         self.outputs_timestamp = False
         self.outputs_timestamp_dir = None
@@ -77,14 +79,18 @@ class Node(object):
         self.logger = logger or logging.getLogger(__name__)

     def __str__(self):
+        fields = self.print_table()
+        return self.pt.format(*fields)
+
+    def print_table(self):
         if not self.filtered_out:
             my_id = self.id
         else:
             my_id = str(self.id) + ' [skipped]'
-        pt = self.print_template
-        return pt.format(my_id, self.cluster, self.ip, self.mac,
-                         self.os_platform, ','.join(self.roles),
-                         str(self.online), self.status)
+        return [str(my_id), str(self.cluster), str(self.ip), str(self.mac),
+                self.os_platform, ','.join(self.roles),
+                str(self.online), str(self.status),
+                str(self.name), str(self.fqdn)]

     def apply_conf(self, conf, clean=True):
@@ -190,7 +196,7 @@ class Node(object):
                                             env_vars=self.env_vars,
                                             timeout=self.timeout,
                                             prefix=self.prefix)
-            self.check_code(code, 'exec_cmd', c[cmd], ok_codes)
+            self.check_code(code, 'exec_cmd', c[cmd], errs, ok_codes)
             try:
                 with open(dfile, 'w') as df:
                     df.write(outs.encode('utf-8'))
@@ -202,7 +208,10 @@ class Node(object):
         self.scripts = sorted(self.scripts)
         mapscr = {}
         for scr in self.scripts:
-            f = os.path.join(self.rqdir, Node.skey, scr)
+            if os.path.sep in scr:
+                f = scr
+            else:
+                f = os.path.join(self.rqdir, Node.skey, scr)
             self.logger.info('node:%s(%s), exec: %s' % (self.id, self.ip, f))
             dfile = os.path.join(ddir, 'node-%s-%s-%s' %
                                  (self.id, self.ip, os.path.basename(f)))
@@ -217,7 +226,8 @@ class Node(object):
                                             env_vars=self.env_vars,
                                             timeout=self.timeout,
                                             prefix=self.prefix)
-            self.check_code(code, 'exec_cmd', 'script %s' % f, ok_codes)
+            self.check_code(code, 'exec_cmd', 'script %s' % f, errs,
+                            ok_codes)
             try:
                 with open(dfile, 'w') as df:
                     df.write(outs.encode('utf-8'))
@@ -238,7 +248,7 @@ class Node(object):
                                          ok_codes=ok_codes,
                                          input=input,
                                          prefix=self.prefix)
-        self.check_code(code, 'exec_simple_cmd', cmd, ok_codes)
+        self.check_code(code, 'exec_simple_cmd', cmd, errs, ok_codes)

     def get_files(self, timeout=15):
         self.logger.info('node: %s, IP: %s' % (self.id, self.ip))
@@ -253,11 +263,14 @@ class Node(object):
                                                file=f,
                                                ddir=ddir,
                                                recursive=True)
-                self.check_code(code, 'get_files', 'tools.get_file_scp')
+                self.check_code(code, 'get_files', 'tools.get_file_scp', errs)
         else:
             data = ''
             for f in self.filelists:
-                fname = os.path.join(self.rqdir, Node.flkey, f)
+                if os.path.sep in f:
+                    fname = f
+                else:
+                    fname = os.path.join(self.rqdir, Node.flkey, f)
                 try:
                     with open(fname, 'r') as df:
                         for line in df:
@@ -273,7 +286,7 @@ class Node(object):
                                              ssh_opts=self.ssh_opts,
                                              dpath=ddir,
                                              timeout=self.timeout)
-            self.check_code(c, 'get_files', 'tools.get_files_rsync')
+            self.check_code(c, 'get_files', 'tools.get_files_rsync', e)

     def put_files(self):
         self.logger.info('node: %s, IP: %s' % (self.id, self.ip))
@@ -332,12 +345,13 @@ class Node(object):
             result[f] = s
         return result

-    def check_code(self, code, func_name, cmd, ok_codes=None):
+    def check_code(self, code, func_name, cmd, err, ok_codes=None):
         if code:
             if not ok_codes or code not in ok_codes:
-                self.logger.warning("%s: got bad exit code %s,"
-                                    " node: %s, ip: %s, cmd: %s" %
-                                    (func_name, code, self.id, self.ip, cmd))
+                self.logger.warning("id: %s, fqdn: %s, ip: %s, func: %s, "
+                                    "cmd: '%s' exited %d, error: %s" %
+                                    (self.id, self.fqdn, self.ip,
+                                     func_name, cmd, code, err))

     def print_results(self, result_map):
         # result_map should be either mapcmds or mapscr
@@ -395,15 +409,30 @@ class NodeManager(object):
             pass

     def __str__(self):
-        pt = Node.print_template
-        header = pt.format('node-id', 'env', 'ip/hostname', 'mac', 'os',
-                           'roles', 'online', 'status') + '\n'
-        nodestrings = []
-        # f3flight: I only did this to not print Fuel when it is hard-filtered
+        def ml_column(matrix, i):
+            a = [row[i] for row in matrix]
+            mc = 0
+            for word in a:
+                lw = len(word)
+                mc = lw if (lw > mc) else mc
+            return mc + 2
+
+        header = Node.header
+        nodestrings = [header]
         for n in self.sorted_nodes():
             if self.filter(n, self.conf['hard_filter']):
+                nodestrings.append(n.print_table())
+        colwidth = []
+        for i in range(len(header)):
+            colwidth.append(ml_column(nodestrings, i))
+        pt = ''
+        for i in range(len(colwidth)):
+            pt += '{%s:<%s}' % (i, str(colwidth[i]))
+        nodestrings = [(pt.format(*header))]
+        for n in self.sorted_nodes():
+            if self.filter(n, self.conf['hard_filter']):
+                n.pt = pt
                 nodestrings.append(str(n))
-        return header + '\n'.join(nodestrings)
+        return '\n'.join(nodestrings)

     def sorted_nodes(self):
         s = [n for n in sorted(self.nodes.values(), key=lambda x: x.id)]
@@ -462,6 +491,8 @@ class NodeManager(object):
             sys.exit(7)
         fuelnode = Node(id=0,
                        cluster=0,
+                       name='fuel',
+                       fqdn='n/a',
                        mac='n/a',
                        os_platform='centos',
                        roles=['fuel'],
@@ -499,11 +530,12 @@ class NodeManager(object):
                 roles = node_roles
             else:
                 roles = str(node_roles).split(', ')
-            keys = "mac os_platform status online ip".split()
+            keys = "fqdn name mac os_platform status online ip".split()
+            cl = int(node_data['cluster']) if node_data['cluster'] else None
             params = {'id': int(node_data['id']),
                       # please do NOT convert cluster id to int type
                       # because None can be valid
-                      'cluster': node_data['cluster'],
+                      'cluster': cl,
                       'roles': roles,
                       'conf': self.conf}
             for key in keys:


@@ -218,16 +218,6 @@ def mdir(directory):

 def launch_cmd(cmd, timeout, input=None, ok_codes=None):
-    def _log_msg(cmd, stderr, code, debug=False, stdin=None, stdout=None):
-        message = ('launch_cmd:\n'
-                   '___command: %s\n'
-                   '______code: %s\n'
-                   '____stderr: %s' % (cmd, code, stderr))
-        if debug:
-            message += '\n_____stdin: %s\n' % stdin
-            message += '____stdout: %s' % stdout
-        return message
-
     def _timeout_terminate(pid):
         try:
             os.kill(pid, 15)
@@ -235,7 +225,7 @@ def launch_cmd(cmd, timeout, input=None, ok_codes=None):
         except:
             pass

-    logger.info('cmd %s' % cmd)
+    logger.info('launching cmd %s' % cmd)
     p = subprocess.Popen(cmd,
                          shell=True,
                          stdin=subprocess.PIPE,
@@ -259,17 +249,16 @@ def launch_cmd(cmd, timeout, input=None, ok_codes=None):
             outs = outs.decode('utf-8')
             errs = errs.decode('utf-8')
             errs = errs.rstrip('\n')
-            logger.error(_log_msg(cmd, errs, p.returncode))
     finally:
         if timeout_killer:
             timeout_killer.cancel()
-    logger.info(_log_msg(cmd, errs, p.returncode))
-    input = input.decode('utf-8') if input else None
-    logger.debug(_log_msg(cmd, errs, p.returncode, debug=True,
-                          stdin=input, stdout=outs))
-    if p.returncode:
-        if not ok_codes or p.returncode not in ok_codes:
-            logger.warning(_log_msg(cmd, errs, p.returncode))
+    input = input.decode('utf-8') if input else None
+    logger.debug(('___command: %s\n'
+                  '_exit_code: %s\n'
+                  '_____stdin: %s\n'
+                  '____stdout: %s\n'
+                  '____stderr: %s') % (cmd, p.returncode, input, outs,
+                                       errs))
     return outs, errs, p.returncode
@@ -334,13 +323,13 @@ def get_file_scp(ip, file, ddir, timeout=600, recursive=False):
     ddir = os.path.join(os.path.normpath(ddir), dest)
     mdir(ddir)
     r = '-r ' if recursive else ''
-    cmd = "timeout '%s' scp %s'%s':'%s' '%s'" % (timeout, r, ip, file, ddir)
+    cmd = "timeout '%s' scp -q %s'%s':'%s' '%s'" % (timeout, r, ip, file, ddir)
     return launch_cmd(cmd, timeout)


 def put_file_scp(ip, file, dest, timeout=600, recursive=True):
     r = '-r ' if recursive else ''
-    cmd = "timeout '%s' scp %s'%s' '%s':'%s'" % (timeout, r, file, ip, dest)
+    cmd = "timeout '%s' scp -q %s'%s' '%s':'%s'" % (timeout, r, file, ip, dest)
     return launch_cmd(cmd, timeout)