Merge pull request #47 from adobdin/master

1.5.1
This commit is contained in:
Alexander Dobdin 2016-06-14 10:49:47 +04:00 committed by GitHub
commit ad9d1fea0b
6 changed files with 187 additions and 71 deletions

build-rpm.sh Executable file

@@ -0,0 +1,3 @@
+#!/usr/bin/bash
+python ./setup.py bdist_rpm


@@ -1,31 +1,120 @@
=====================
General configuration
=====================
There is a default configuration file, ``config.yaml``, which can be used by the scripts. All default configuration values are defined in ``timmy/conf.py``; Timmy works with these values if no configuration file is provided.
If you wish to keep several configuration files, that is possible: just copy ``config.yaml`` and explicitly provide the name of the copy when you launch a script (the ``--config`` option). A configuration file provided via the ``-c | --config`` option overlays the default configuration.
An example of a configuration file is ``config.yaml``.
Some of the parameters available in the configuration file:

* **ssh_opts** parameters to pass to the ssh command directly (recommended to leave at default), such as connection timeout, etc. See ``timmy/conf.py`` to review the defaults.
* **env_vars** environment variables to pass to the commands and scripts - you can use this to expand variables in commands or scripts
* **fuel_ip** the IP address of the master node in the environment
* **fuel_user** username to use for accessing the Nailgun API
* **fuel_pass** password for accessing the Nailgun API
* **rqdir** the path of *rqdir*, the directory containing scripts to execute and filelists to pass to rsync
* **out_dir** directory to store output data
* **archive_dir** directory to put the resulting archives into
* **timeout** timeout for SSH commands and scripts, in seconds

===================
Configuring actions
===================
Actions can be configured in a separate yaml file (by default ``rq.yaml`` is used) and / or defined in the main config file, or passed via the command line options ``-P``, ``-C``, ``-S``, ``-G``.
The following actions are available for definition:

* **put** - a list of tuples / 2-element lists: [source, destination]. Passed to ``scp`` like so: ``scp source <node-ip>:destination``. Wildcards are supported for the source.
* **cmds** - a list of dicts: {'command-name': 'command-string'}. Example: {'command-1': 'uptime'}. The command string is a bash string. Commands are executed in sorted order of their names.
* **scripts** - a list of script filenames located on the local system. If the filename does not contain a path separator, the script is expected to be located inside ``rqdir/scripts``; otherwise the provided path is used to read the script.
* **files** - a list of filenames to collect. Passed to ``scp``. Supports wildcards.
* **filelists** - a list of filelist filenames located on the local system. A filelist is a text file containing files and directories to collect, passed to rsync; it does not support wildcards. If the filename does not contain a path separator, the filelist is expected to be located inside ``rqdir/filelists``; otherwise the provided path is used to read the filelist.
* **log_files**

  * **path** - base path to scan for logs
  * **include** - regexp string to match log files against for inclusion (if not set, include all)
  * **exclude** - regexp string to match log files against; matched files are excluded from collection
  * **start** - date or datetime string to collect only files modified on or after the specified time. Format: ``YYYY-MM-DD`` or ``YYYY-MM-DD HH:MM:SS``
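As a rough illustration of an actions definition, here is the Python dict a YAML loader might produce for it (all file and command names below are invented, not taken from any shipped ``rq.yaml``), along with the sorted execution order of **cmds**:

```python
# Hypothetical actions definition; names are illustrative only.
actions = {
    'put': [['tools/report.sh', '/tmp/report.sh']],
    'cmds': [{'02-uptime': 'uptime'}, {'01-date': 'date'}],
    'scripts': ['diag.sh'],            # looked up in rqdir/scripts
    'filelists': ['openstack-logs'],   # looked up in rqdir/filelists
}

# Commands are executed in sorted order of their names,
# so '01-date' runs before '02-uptime'.
ordered = sorted(name for d in actions['cmds'] for name in d)
print(ordered)  # ['01-date', '02-uptime']
```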

===============
Filtering nodes
===============
* **soft_filter** - use to skip any operations on non-matching nodes
* **hard_filter** - same as above, but also removes non-matching nodes from the NodeManager.nodes dict - useful when using Timmy as a module

Nodes can be filtered by the following parameters, defined inside **soft_filter** and/or **hard_filter**:

* **roles** - the list of roles, ex. **['controller','compute']**
* **online** - enabled by default to skip non-accessible nodes
* **status** - the list of statuses. Default: **['ready', 'discover']**
* **ids** - the list of ids, ex. **[0,5,6]**
* any other attribute of a Node object which is of a simple type (int, float, str, etc.) or a list containing simple types

Lists match **any**: if any element of the filter list matches the node's value (or, if the value is a list, any element of it), the node passes.
Negative filters are possible by prepending **no_** to the filter parameter; for example, **no_id = [0]** will filter out Fuel.
Negative lists also match **any** - if any match / collision is found, the node is skipped.
You can combine any number of positive and negative filters as long as their names differ (since this is a dict).
You can use both positive and negative parameters to match the same node parameter (though it does not make much sense):

**roles = ['controller', 'compute']**

**no_roles = ['compute']**

This will skip computes and run only on controllers. As stated above, it does not make much sense :)
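The matching rules above can be sketched in a few lines of Python. This is a simplified assumption about the semantics, not the real NodeManager filter code:

```python
def node_passes(node, flt):
    """Sketch of the filter semantics: list parameters match *any*
    element, and keys prefixed with 'no_' are negative filters
    (any collision skips the node). Illustrative only."""
    for key, wanted in flt.items():
        negative = key.startswith('no_')
        attr = key[3:] if negative else key
        value = node.get(attr)
        values = value if isinstance(value, list) else [value]
        hit = any(v in wanted for v in values)
        if hit == negative:  # negative+hit or positive+miss -> skip
            return False
    return True

ctrl = {'id': 1, 'roles': ['controller', 'mongo']}
comp = {'id': 2, 'roles': ['compute']}
flt = {'roles': ['controller', 'compute'], 'no_roles': ['compute']}
print(node_passes(ctrl, flt), node_passes(comp, flt))  # True False
```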

=============================
Parameter-based configuration
=============================
It is possible to define special **by_<parameter-name>** dicts in the config to (re)define node parameters based on other parameters. For example:

::

  by_roles:
    controller:
      cmds: {'check-uptime': 'uptime'}

In this example, for any controller node the **cmds** setting will be reset to the value above; for nodes without the controller role, the default (none) values will be used.
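A minimal sketch of how such a **by_<parameter-name>** section could be applied to a node (an assumed simplification - the real application logic can also append/accumulate values):

```python
def apply_by_sections(node_conf, node_attrs, conf):
    """Assumed sketch: for each 'by_<attr>' dict in the config, if the
    node's attribute (or any element of it, for lists) matches a key,
    that key's settings override the node's current configuration."""
    for section, mapping in conf.items():
        if not section.startswith('by_'):
            continue
        attr = section[len('by_'):]
        values = node_attrs.get(attr, [])
        values = values if isinstance(values, list) else [values]
        for v in values:
            if v in mapping:
                node_conf.update(mapping[v])
    return node_conf

conf = {'by_roles': {'controller': {'cmds': {'check-uptime': 'uptime'}}}}
node = apply_by_sections({}, {'roles': ['controller']}, conf)
print(node)  # {'cmds': {'check-uptime': 'uptime'}}
```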

=============
rqfile format
=============
The ``rqfile`` format is a bit different from the config format. The basic difference:

**config:**

::

  scripts: [a, b, c]
  by_roles:
    compute:
      scripts: [d, e, f]

**rqfile:**

::

  scripts:
    __default: [a, b, c]
    by_roles:
      compute: [d, e, f]

The **config** and **rqfile** definitions presented above are equivalent. It is possible to define settings in a config file using the **config** format, or in an **rqfile** using the **rqfile** format, linking to the **rqfile** from the config via the ``rqfile`` setting. It is also possible to define part here and part there. Mixing identical parameters in both places is not recommended - results may be unpredictable (such a scenario was not thoroughly tested). In general, an **rqfile** is good for fewer settings with more parameter-based variations (``by_``), and the main config for more different settings with fewer such variations.
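The equivalence of the two formats can be illustrated with a small conversion sketch (an assumed simplification of how the rqfile is regrouped into config form, not Timmy's actual conversion code):

```python
def convert_rqfile(rq):
    """Sketch of rqfile -> config conversion: '__default' becomes the
    top-level value of the parameter, and 'by_*' subkeys are regrouped
    into top-level 'by_*' sections. Illustrative assumption only."""
    conf = {}
    for param, spec in rq.items():
        for key, value in spec.items():
            if key == '__default':
                conf[param] = value
            else:  # e.g. 'by_roles'
                for match, sub in value.items():
                    conf.setdefault(key, {}).setdefault(match, {})[param] = sub
    return conf

rq = {'scripts': {'__default': ['a', 'b', 'c'],
                  'by_roles': {'compute': ['d', 'e', 'f']}}}
print(convert_rqfile(rq))
# {'scripts': ['a', 'b', 'c'], 'by_roles': {'compute': {'scripts': ['d', 'e', 'f']}}}
```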

===============================
Configuration application order
===============================
Configuration is assembled and applied in a specific order:

1. The default configuration is initialized. See ``timmy/conf.py`` for details.
2. Command line parameters, if defined, are used to modify the configuration.
3. The **rqfile**, if defined (default - ``rq.yaml``), is converted and injected into the configuration. At this stage the configuration is in its final form.
4. For every node, the configuration is applied, except ``once_by_`` directives:

   4.1. first the top-level attributes are set
   4.2. then ``by_<attribute-name>`` parameters, except ``by_id``, are iterated to override or append (accumulate) the attributes
   4.3. then ``by_id`` is iterated to override any matching attributes, redefining what was set before

5. Finally, ``once_by_<attribute-name>`` parameters are applied - only for one matching node per set of matching values. This is useful, for example, if you want a specific file or command from only a single node matching a specific role, such as running ``nova list`` on only one controller.

Once you are done with the configuration, you might want to familiarize yourself with :doc:`Usage </usage>`.
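The layering described in steps 1-4 can be condensed into a toy example (all names and values below are invented, and step 4.2's append/accumulate behavior is omitted for brevity):

```python
# Toy illustration of the configuration application order.
conf = {'timeout': 15, 'scripts': ['general']}         # 1. defaults + 3. rqfile
conf.update({'timeout': 30})                           # 2. command line options

node = {'id': 1, 'roles': ['controller']}
by_roles = {'controller': {'scripts': ['ctrl-diag']}}  # 4.2 by_<attribute-name>
by_id = {1: {'timeout': 60}}                           # 4.3 by_id wins last

node_conf = dict(conf)                                 # 4.1 top-level attributes
for role in node['roles']:
    node_conf.update(by_roles.get(role, {}))
node_conf.update(by_id.get(node['id'], {}))
print(node_conf)  # {'timeout': 60, 'scripts': ['ctrl-diag']}
```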


@@ -5,3 +5,6 @@ all_files = 1
 [upload_sphinx]
 upload-dir = doc/_build/html
+[bdist_rpm]
+requires = python >= 2.6


@@ -1,5 +1,5 @@
 project_name = 'timmy'
-version = '1.4.0'
+version = '1.5.1'
 if __name__ == '__main__':
     exit(0)


@@ -46,10 +46,10 @@ class Node(object):
     conf_match_prefix = 'by_'
     conf_default_key = '__default'
     conf_priority_section = conf_match_prefix + 'id'
-    print_template = '{0:<14} {1:<3} {2:<16} {3:<18} {4:<10} {5:<30}'
-    print_template += ' {6:<6} {7}'
+    header = ['node-id', 'env', 'ip', 'mac', 'os',
+              'roles', 'online', 'status', 'name', 'fqdn']

-    def __init__(self, id, mac, cluster, roles, os_platform,
+    def __init__(self, id, name, fqdn, mac, cluster, roles, os_platform,
                  online, status, ip, conf, logger=None):
         self.id = id
         self.mac = mac
@@ -70,6 +70,8 @@ class Node(object):
         self.logsize = 0
         self.mapcmds = {}
         self.mapscr = {}
+        self.name = name
+        self.fqdn = fqdn
         self.filtered_out = False
         self.outputs_timestamp = False
         self.outputs_timestamp_dir = None
@@ -77,14 +79,18 @@ class Node(object):
         self.logger = logger or logging.getLogger(__name__)

     def __str__(self):
+        fields = self.print_table()
+        return self.pt.format(*fields)
+
+    def print_table(self):
         if not self.filtered_out:
             my_id = self.id
         else:
             my_id = str(self.id) + ' [skipped]'
-        pt = self.print_template
-        return pt.format(my_id, self.cluster, self.ip, self.mac,
-                         self.os_platform, ','.join(self.roles),
-                         str(self.online), self.status)
+        return [str(my_id), str(self.cluster), str(self.ip), str(self.mac),
+                self.os_platform, ','.join(self.roles),
+                str(self.online), str(self.status),
+                str(self.name), str(self.fqdn)]

     def apply_conf(self, conf, clean=True):
@@ -190,7 +196,7 @@ class Node(object):
                                             env_vars=self.env_vars,
                                             timeout=self.timeout,
                                             prefix=self.prefix)
-            self.check_code(code, 'exec_cmd', c[cmd], ok_codes)
+            self.check_code(code, 'exec_cmd', c[cmd], errs, ok_codes)
             try:
                 with open(dfile, 'w') as df:
                     df.write(outs.encode('utf-8'))
@@ -202,7 +208,10 @@ class Node(object):
         self.scripts = sorted(self.scripts)
         mapscr = {}
         for scr in self.scripts:
-            f = os.path.join(self.rqdir, Node.skey, scr)
+            if os.path.sep in scr:
+                f = scr
+            else:
+                f = os.path.join(self.rqdir, Node.skey, scr)
             self.logger.info('node:%s(%s), exec: %s' % (self.id, self.ip, f))
             dfile = os.path.join(ddir, 'node-%s-%s-%s' %
                                  (self.id, self.ip, os.path.basename(f)))
@@ -217,7 +226,8 @@ class Node(object):
                                             env_vars=self.env_vars,
                                             timeout=self.timeout,
                                             prefix=self.prefix)
-            self.check_code(code, 'exec_cmd', 'script %s' % f, ok_codes)
+            self.check_code(code, 'exec_cmd', 'script %s' % f, errs,
+                            ok_codes)
             try:
                 with open(dfile, 'w') as df:
                     df.write(outs.encode('utf-8'))
@@ -238,7 +248,7 @@ class Node(object):
                                          ok_codes=ok_codes,
                                          input=input,
                                          prefix=self.prefix)
-        self.check_code(code, 'exec_simple_cmd', cmd, ok_codes)
+        self.check_code(code, 'exec_simple_cmd', cmd, errs, ok_codes)

     def get_files(self, timeout=15):
         self.logger.info('node: %s, IP: %s' % (self.id, self.ip))
@@ -253,11 +263,14 @@ class Node(object):
                                                file=f,
                                                ddir=ddir,
                                                recursive=True)
-                self.check_code(code, 'get_files', 'tools.get_file_scp')
+                self.check_code(code, 'get_files', 'tools.get_file_scp', errs)
         else:
             data = ''
             for f in self.filelists:
-                fname = os.path.join(self.rqdir, Node.flkey, f)
+                if os.path.sep in f:
+                    fname = f
+                else:
+                    fname = os.path.join(self.rqdir, Node.flkey, f)
                 try:
                     with open(fname, 'r') as df:
                         for line in df:
@@ -273,7 +286,7 @@ class Node(object):
                                              ssh_opts=self.ssh_opts,
                                              dpath=ddir,
                                              timeout=self.timeout)
-            self.check_code(c, 'get_files', 'tools.get_files_rsync')
+            self.check_code(c, 'get_files', 'tools.get_files_rsync', e)

     def put_files(self):
         self.logger.info('node: %s, IP: %s' % (self.id, self.ip))
@@ -332,12 +345,13 @@ class Node(object):
             result[f] = s
         return result

-    def check_code(self, code, func_name, cmd, ok_codes=None):
+    def check_code(self, code, func_name, cmd, err, ok_codes=None):
         if code:
             if not ok_codes or code not in ok_codes:
-                self.logger.warning("%s: got bad exit code %s,"
-                                    " node: %s, ip: %s, cmd: %s" %
-                                    (func_name, code, self.id, self.ip, cmd))
+                self.logger.warning("id: %s, fqdn: %s, ip: %s, func: %s, "
+                                    "cmd: '%s' exited %d, error: %s" %
+                                    (self.id, self.fqdn, self.ip,
+                                     func_name, cmd, code, err))

     def print_results(self, result_map):
         # result_map should be either mapcmds or mapscr
@@ -395,15 +409,30 @@ class NodeManager(object):
             pass

     def __str__(self):
-        pt = Node.print_template
-        header = pt.format('node-id', 'env', 'ip/hostname', 'mac', 'os',
-                           'roles', 'online', 'status') + '\n'
-        nodestrings = []
-        # f3flight: I only did this to not print Fuel when it is hard-filtered
+        def ml_column(matrix, i):
+            a = [row[i] for row in matrix]
+            mc = 0
+            for word in a:
+                lw = len(word)
+                mc = lw if (lw > mc) else mc
+            return mc + 2
+
+        header = Node.header
+        nodestrings = [header]
         for n in self.sorted_nodes():
             if self.filter(n, self.conf['hard_filter']):
+                nodestrings.append(n.print_table())
+        colwidth = []
+        for i in range(len(header)):
+            colwidth.append(ml_column(nodestrings, i))
+        pt = ''
+        for i in range(len(colwidth)):
+            pt += '{%s:<%s}' % (i, str(colwidth[i]))
+        nodestrings = [(pt.format(*header))]
+        for n in self.sorted_nodes():
+            if self.filter(n, self.conf['hard_filter']):
+                n.pt = pt
                 nodestrings.append(str(n))
-        return header + '\n'.join(nodestrings)
+        return '\n'.join(nodestrings)

     def sorted_nodes(self):
         s = [n for n in sorted(self.nodes.values(), key=lambda x: x.id)]
@@ -462,6 +491,8 @@ class NodeManager(object):
             sys.exit(7)
         fuelnode = Node(id=0,
                        cluster=0,
+                       name='fuel',
+                       fqdn='n/a',
                        mac='n/a',
                        os_platform='centos',
                        roles=['fuel'],
@@ -499,11 +530,12 @@ class NodeManager(object):
                 roles = node_roles
             else:
                 roles = str(node_roles).split(', ')
-            keys = "mac os_platform status online ip".split()
+            keys = "fqdn name mac os_platform status online ip".split()
+            cl = int(node_data['cluster']) if node_data['cluster'] else None
             params = {'id': int(node_data['id']),
                       # please do NOT convert cluster id to int type
                       # because None can be valid
-                      'cluster': node_data['cluster'],
+                      'cluster': cl,
                       'roles': roles,
                       'conf': self.conf}
             for key in keys:


@@ -218,16 +218,6 @@ def mdir(directory):

 def launch_cmd(cmd, timeout, input=None, ok_codes=None):
-    def _log_msg(cmd, stderr, code, debug=False, stdin=None, stdout=None):
-        message = ('launch_cmd:\n'
-                   '___command: %s\n'
-                   '______code: %s\n'
-                   '____stderr: %s' % (cmd, code, stderr))
-        if debug:
-            message += '\n_____stdin: %s\n' % stdin
-            message += '____stdout: %s' % stdout
-        return message
-
     def _timeout_terminate(pid):
         try:
             os.kill(pid, 15)
@@ -235,7 +225,7 @@ def launch_cmd(cmd, timeout, input=None, ok_codes=None):
         except:
             pass

-    logger.info('cmd %s' % cmd)
+    logger.info('launching cmd %s' % cmd)
     p = subprocess.Popen(cmd,
                          shell=True,
                          stdin=subprocess.PIPE,
@@ -259,17 +249,16 @@ def launch_cmd(cmd, timeout, input=None, ok_codes=None):
             outs = outs.decode('utf-8')
             errs = errs.decode('utf-8')
             errs = errs.rstrip('\n')
-            logger.error(_log_msg(cmd, errs, p.returncode))
     finally:
         if timeout_killer:
             timeout_killer.cancel()
-    logger.info(_log_msg(cmd, errs, p.returncode))
-    input = input.decode('utf-8') if input else None
-    logger.debug(_log_msg(cmd, errs, p.returncode, debug=True,
-                          stdin=input, stdout=outs))
-    if p.returncode:
-        if not ok_codes or p.returncode not in ok_codes:
-            logger.warning(_log_msg(cmd, errs, p.returncode))
+    input = input.decode('utf-8') if input else None
+    logger.debug(('___command: %s\n'
+                  '_exit_code: %s\n'
+                  '_____stdin: %s\n'
+                  '____stdout: %s\n'
+                  '____stderr: %s') % (cmd, p.returncode, input, outs,
+                                       errs))
     return outs, errs, p.returncode
@@ -334,13 +323,13 @@ def get_file_scp(ip, file, ddir, timeout=600, recursive=False):
     ddir = os.path.join(os.path.normpath(ddir), dest)
     mdir(ddir)
     r = '-r ' if recursive else ''
-    cmd = "timeout '%s' scp %s'%s':'%s' '%s'" % (timeout, r, ip, file, ddir)
+    cmd = "timeout '%s' scp -q %s'%s':'%s' '%s'" % (timeout, r, ip, file, ddir)
     return launch_cmd(cmd, timeout)


 def put_file_scp(ip, file, dest, timeout=600, recursive=True):
     r = '-r ' if recursive else ''
-    cmd = "timeout '%s' scp %s'%s' '%s':'%s'" % (timeout, r, file, ip, dest)
+    cmd = "timeout '%s' scp -q %s'%s' '%s':'%s'" % (timeout, r, file, ip, dest)
     return launch_cmd(cmd, timeout)