Add some nice docs on what this is.

2013-03-06 22:24:29 -08:00 · 2013-03-06 22:24:29 -08:00 · a72b7b99cd
commit a72b7b99cd
parent 73fc7534b7
2 changed files with 180 additions and 1 deletions
--- a/cloudinit/mergers/__init__.py
+++ b/cloudinit/mergers/__init__.py
@@ -85,7 +85,7 @@ class LookupMerger(UnknownMerger):
 def dict_extract_mergers(config):
    parsed_mergers = []
-    raw_mergers = config.get('merger_how')
+    raw_mergers = config.get('merge_how')
    if raw_mergers is None:
        raw_mergers = config.get('merge_type')
    if raw_mergers is None:
--- a/doc/merging.txt
+++ b/doc/merging.txt
@@ -0,0 +1,179 @@
 Arriving in 0.7.2 is a new way to handle dictionary merging in cloud-init.
 ---
 Overview
 --------
 This was done because it has been a common feature request that there be a
 way to specify how cloud-config yaml "dictionaries" are merged together when
 there are multiple yamls to merge together (say when performing an #include).
 Previously the merging algorithm was very simple: it would only overwrite,
 and would not append lists, strings, and so on. It was therefore decided to
 create a new and improved way to merge dictionaries (and their contained
 objects) together in a way that is customizable, thus allowing users who
 provide cloud-config data to determine exactly how their objects will be merged.
 For example:
 #cloud-config (1)
 run_cmd:
  - bash1
  - bash2
 #cloud-config (2)
 run_cmd:
  - bash3
  - bash4
 The previous way of merging the above 2 objects would result in a final
 cloud-config object that contains the following.
 #cloud-config (merged)
 run_cmd:
  - bash3
  - bash4
 Typically this is not what users want; instead they would likely prefer:
 #cloud-config (merged)
 run_cmd:
  - bash1
  - bash2
  - bash3
  - bash4
 This way makes it easier to combine the various cloud-config objects you have
 into a more useful list, thus reducing duplication that would have had to
 occur in the previous method to accomplish the same result.
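The difference between the two results above can be sketched in a few lines of Python. This is only an illustration: `merge_overwrite` and `merge_extend_lists` are hypothetical helper names, not functions from cloud-init, and the second one only handles the list case shown in the example.

```python
def merge_overwrite(first, second):
    """Old behaviour (sketch): keys in `second` replace keys in `first`."""
    merged = dict(first)
    merged.update(second)
    return merged


def merge_extend_lists(first, second):
    """New default behaviour (sketch): lists under the same key are joined.

    Anything that is not a pair of lists is simply overwritten here, which
    is a simplification made for this example.
    """
    merged = dict(first)
    for key, value in second.items():
        if isinstance(merged.get(key), list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


config_1 = {'run_cmd': ['bash1', 'bash2']}
config_2 = {'run_cmd': ['bash3', 'bash4']}

old_result = merge_overwrite(config_1, config_2)
new_result = merge_extend_lists(config_1, config_2)
```

Here `old_result` keeps only the commands from the second config, while `new_result` keeps all four in order.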
 Customizability
 ---------------
 Since the above merging algorithm may not always be the desired merging
 algorithm (just as the merging algorithm in < 0.7.2 was not always the preferred
 one), the ability to customize how merging is done was introduced through
 a new concept called 'merge classes'.
 A merge class is a class definition which provides functions that can be used
 to merge a given type with another given type.
 An example of one of these merging classes is the following:
 class Merger(object):
    def __init__(self, merger, opts):
        self._merger = merger
        self._overwrite = 'overwrite' in opts
    # This merging algorithm will attempt to merge with
    # another dictionary, on encountering any other type of object
    # it will not merge with said object, but will instead return
    # the original value
    #
    # On encountering a dictionary, it will create a new dictionary
    # composed of the original and the one to merge with, if 'overwrite'
    # is enabled then keys that exist in the original will be overwritten
    # by keys in the one to merge with (and associated values). Otherwise
    # if not in overwrite mode the 2 conflicting keys themselves will
    # be merged.
    def _on_dict(self, value, merge_with):
        if not isinstance(merge_with, (dict)):
            return value
        merged = dict(value)
        for (k, v) in merge_with.items():
            if k in merged:
                if not self._overwrite:
                    merged[k] = self._merger.merge(merged[k], v)
                else:
                    merged[k] = v
            else:
                merged[k] = v
        return merged
 As you can see there is an '_on_dict' method here that will be given a source value
 and a value to merge with. The result will be the merged object. This code itself
 is called by another merging class which 'directs' the merging by analyzing
 the types of the objects to merge and attempting to find a known object
 that will merge that type. That class is not reproduced here, but it can be found
 in the mergers/__init__.py file (see LookupMerger and UnknownMerger).
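To make the 'directing' idea concrete, here is a minimal sketch of how such a class could dispatch on type. It is not the LookupMerger code: `SketchDirector`, `SketchDictMerger` and `SketchListMerger` are hypothetical names, and the rule of looking up a method called `_on_<typename>` is an assumption modelled on the `_on_dict` method shown above.

```python
class SketchDictMerger(object):
    """Same logic as the dictionary merge class shown in the text."""

    def __init__(self, merger, opts):
        self._merger = merger
        self._overwrite = 'overwrite' in opts

    def _on_dict(self, value, merge_with):
        if not isinstance(merge_with, dict):
            return value
        merged = dict(value)
        for (k, v) in merge_with.items():
            if k in merged and not self._overwrite:
                # Conflicting keys are handed back to the director.
                merged[k] = self._merger.merge(merged[k], v)
            else:
                merged[k] = v
        return merged


class SketchListMerger(object):
    """Hypothetical list merger: joins lists when 'extend' is set."""

    def __init__(self, merger, opts):
        self._merger = merger
        self._extend = 'extend' in opts

    def _on_list(self, value, merge_with):
        if not isinstance(merge_with, list) or not self._extend:
            return value
        return list(value) + list(merge_with)


class SketchDirector(object):
    """Finds a merge class that knows the type of the source value."""

    def __init__(self):
        self._mergers = []

    def add(self, merger):
        self._mergers.append(merger)

    def merge(self, source, merge_with):
        method_name = '_on_%s' % type(source).__name__
        for merger in self._mergers:
            handler = getattr(merger, method_name, None)
            if handler is not None:
                return handler(source, merge_with)
        # No merge class handles this type: keep the original value.
        return source


root = SketchDirector()
root.add(SketchDictMerger(root, []))
root.add(SketchListMerger(root, ['extend']))
merged = root.merge({'a': [1], 'b': 'x'}, {'a': [2], 'c': 3})
```

In this sketch the conflicting key 'a' is passed back to the director, which finds the list merger and joins the two lists. A type with no registered merge class (a string here) is returned unchanged.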
 Following the typical cloud-init way of allowing source code to be downloaded
 and used dynamically, it is possible for users to inject their own merging files
 to handle specific types of merging as they choose (the basic ones included will
 handle lists, dicts, and strings). Note how each merger can have options associated
 with it which affect how the merging is performed. For example, a dictionary merger
 can be told to overwrite instead of attempting to merge, or a string merger can be
 told to append strings instead of discarding the other strings to merge with.
 How to activate
 ---------------
 There are a few ways to activate the merging algorithms, and to customize them
 for your own usage.
 1. The first way involves the usage of MIME messages in cloud-init to specify
   multipart documents (this is one way in which multiple cloud-configs are joined
   together into a single cloud-config). Two new headers are looked for, both
   of which can define the way merging is done (the first header to exist wins).
   These new headers (in lookup order) are 'Merge-Type' and 'X-Merge-Type'. The value
   should be a string which satisfies the new merging format definition (see
   below for this format).
 2. The second way is to specify the merge type in the body of the
   cloud-config dictionary. There are 2 ways to specify this, either as a string
   or as a dictionary (see format below). The keys that are looked up for this
   definition are the following (in order): 'merge_how', 'merge_type'.
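As an illustration of the first way, the sketch below builds a two-part MIME message with Python's standard `email` package and reads the header back in the lookup order described above. `build_part` and `lookup_merge_type` are hypothetical helper names, and the `text/cloud-config` content type is assumed for the parts; how cloud-init itself reads these headers is not shown here.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_part(yaml_text, merge_type=None):
    """Wrap one cloud-config document as a MIME part."""
    part = MIMEText(yaml_text, 'cloud-config')
    if merge_type is not None:
        part['Merge-Type'] = merge_type
    return part


def lookup_merge_type(part):
    """Return the first merge header that exists, or None."""
    for header in ('Merge-Type', 'X-Merge-Type'):
        value = part.get(header)
        if value is not None:
            return value
    return None


message = MIMEMultipart()
message.attach(build_part("run_cmd:\n - bash1\n - bash2\n",
                          'list(extend)+dict()+str(append)'))
message.attach(build_part("run_cmd:\n - bash3\n - bash4\n"))

# One entry per attached part; None means the part did not set a header.
found = [lookup_merge_type(part) for part in message.get_payload()]
```

The second way needs no headers at all: the 'merge_how' (or 'merge_type') key is simply placed in the cloud-config body alongside the other keys.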
 *String format*
 The string format that is expected is the following.
 "classname(option1,option2)+classname2(option3,option4)" (and so on)
 Each class name is matched against the names of the available merging classes
 to find the class that will perform the merge, and the options provided are
 given to that class when it is constructed.
 For example, the default string that is used when none is provided is the following:
 "list(extend)+dict()+str(append)"
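A minimal sketch of how a string in this format could be split into class names and options follows. `parse_merge_string` is a hypothetical helper written for this example, not the parser used by cloud-init, and the returned name/settings layout is just one convenient choice.

```python
import re

# One piece looks like "classname" or "classname(opt1,opt2)".
_PIECE_RE = re.compile(r'^([A-Za-z_]\w*)(?:\((.*)\))?$')


def parse_merge_string(raw):
    """Split "name(opt,opt)+name2(opt)" into a list of name/settings dicts."""
    parsed = []
    for piece in raw.split('+'):
        piece = piece.strip()
        if not piece:
            continue
        match = _PIECE_RE.match(piece)
        if match is None:
            raise ValueError("Unparseable merger definition: %r" % piece)
        name = match.group(1)
        raw_opts = match.group(2) or ''
        opts = [opt.strip() for opt in raw_opts.split(',') if opt.strip()]
        parsed.append({'name': name, 'settings': opts})
    return parsed


default_mergers = parse_merge_string("list(extend)+dict()+str(append)")
```

With the default string, `default_mergers` holds three entries (list, dict, str) in the order they were written, each with its own list of options.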
 *Dictionary format*
 Where a dictionary can be used (i.e. option #2 above), the same information
 as the string format can be given in dictionary form, for example:
 merge_how:
 - name: list
   settings: [extend]
 - name: dict
   settings: []
 - name: str
   settings: [append]
 This is the equivalent of the default string format, expressed in dictionary
 form instead of string form.
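The equivalence of the two forms can be checked with a short sketch. The key lookup order ('merge_how', then 'merge_type') follows the text above; `find_raw_mergers` and `mergers_to_string` are hypothetical helper names, and the config is written as an already-parsed Python dictionary rather than yaml.

```python
def find_raw_mergers(config):
    """Return the merge definition from a config, checking keys in order."""
    for key in ('merge_how', 'merge_type'):
        raw = config.get(key)
        if raw is not None:
            return raw
    return None


def mergers_to_string(mergers):
    """Render the dictionary form as the equivalent string form."""
    pieces = []
    for entry in mergers:
        settings = ','.join(entry.get('settings', []))
        pieces.append('%s(%s)' % (entry['name'], settings))
    return '+'.join(pieces)


config = {
    'merge_how': [
        {'name': 'list', 'settings': ['extend']},
        {'name': 'dict', 'settings': []},
        {'name': 'str', 'settings': ['append']},
    ],
    'run_cmd': ['bash1', 'bash2'],
}

as_string = mergers_to_string(find_raw_mergers(config))
```

Here `as_string` comes out as the default string shown earlier, "list(extend)+dict()+str(append)".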
 Specifying multiple types and its effect
 ----------------------------------------
 Now you may be asking yourself, if I specify a merge-type header or dictionary
 for every cloud-config that I provide, what exactly happens?
 The answer is that when merging, a stack of 'merging classes' is kept. The
 first entry on that stack is the default set of merging classes; this set of mergers
 will be used when the first cloud-config is merged with the initial empty
 cloud-config dictionary. If the cloud-config that was just merged provided a
 set of merging classes (via the above formats) then those merging classes will
 be pushed onto the stack. If there is then a second cloud-config to be merged,
 the merging classes from the first cloud-config will be used (not the
 default), and so on. This way a cloud-config can decide how it will merge with the
 cloud-config dictionary coming after it.
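The stack behaviour described above can be sketched as follows. This only tracks which merge definition would be used at each step and does not perform any merging; `plan_merges` is a hypothetical helper, and the 'dict(overwrite)' definition is just an example value in the string format.

```python
DEFAULT_MERGERS = 'list(extend)+dict()+str(append)'


def plan_merges(configs):
    """Return, for each config in order, the merge definition used for it."""
    stack = [DEFAULT_MERGERS]
    plan = []
    for config in configs:
        # The config is merged using whatever is on top of the stack...
        plan.append(stack[-1])
        # ...and only afterwards are its own mergers (if any) pushed.
        own = config.get('merge_how')
        if own is None:
            own = config.get('merge_type')
        if own is not None:
            stack.append(own)
    return plan


configs = [
    {'run_cmd': ['bash1'], 'merge_how': 'list()+dict(overwrite)+str()'},
    {'run_cmd': ['bash2']},
    {'run_cmd': ['bash3']},
]

plan = plan_merges(configs)
```

In this sketch the first config is merged with the defaults, and the definition it supplies is then used for the second and third configs, since neither of those pushes a definition of its own.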
 Other uses
 ----------
 The default merging algorithm for merging conf.d yaml files (which form an initial
 yaml config for cloud-init) was also changed to use this mechanism so that its full
 benefits (and customization) can be used there as well. Other places that
 used the previous merging are similarly now extensible (metadata merging, for
 example).