diff --git a/guidelines/etags.rst b/guidelines/etags.rst
new file mode 100644
index 0000000..a326125
--- /dev/null
+++ b/guidelines/etags.rst
@@ -0,0 +1,199 @@
+
+ETags
+=====
+
+ETags_ are "opaque validator[s] for differentiating between
+multiple representations of the same resource". They are used in a
+variety of ways in HTTP to determine the outcome of conditional
+requests as described in :rfc:`7232`. Understanding the full breadth
+of ETags requires a very complete understanding of HTTP and the
+nuances of resources and their representations. This document does
+not attempt to address all applications of ETags at once; instead it
+addresses specific use cases that have arisen in response to other
+guidelines. It will evolve over time.
+
+ETags and the lost update problem
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Problem
+-----------
+
+HTTP is fundamentally a system for sending representations of
+resources back and forth across a network connection. A common
+interaction is to ``GET /some/resource``, modify the representation,
+and then ``PUT /some/resource`` to update the resource on the server.
+This is an extremely useful and superficially simple pattern that drives
+many APIs in OpenStack and beyond.
+
+That apparent simplicity is misleading: If there are two or more
+clients performing operations on ``/some/resource`` in the same time
+frame they can experience the `lost update problem`_:
+
+* Client A and client B both ``GET /some/resource`` and make changes to
+  their local representation.
+* Client B does a ``PUT /some/resource`` at time 1.
+* Client A does a ``PUT /some/resource`` at time 2.
+
+Client B's changes have been lost. Neither client is made aware of this.
+This is a problem.
+
+A Solution
+----------
+
+HTTP/1.1 and beyond has a solution for this problem called ETags_.
+These provide a validator for different representations of a
+resource that makes it straightforward to determine if the
+representation provided by a request or a response is the same as
+one already in hand. This is very useful when validating cached GET
+requests (the ETag answers the question "is what I have in my cache
+the same as what the server would give me?") but is also useful for
+avoiding the lost update problem.
+
+If the scenario described above is modified to use ETags it would
+work like this:
+
+* Client A and client B both ``GET /some/resource``. Each response
+  includes a header named ``ETag`` that is the same for both clients
+  (let's make the ETag 'red57'). Details on ETag generation can be
+  found below.
+* They both make changes to their local representation.
+* Client B does a ``PUT /some/resource`` and includes a header
+  named If-Match_ with a value of ``red57``. The request is
+  successful because the ETag sent in the request is the same as the
+  ETag generated by the server for its current state of the resource.
+* Client A does a ``PUT /some/resource`` and includes the If-Match_
+  header with value ``red57``. This request fails (with a 412_
+  response code) because ``red57`` no longer matches the ETag
+  generated by the server: Its current state has been updated by the
+  request from client B.
+
+
+Client B's changes have not been lost and client A has not
+inadvertently changed something that is not in the form they
+expected. Client A is made aware of this by the response code.
+At this stage, client A can choose to GET the resource again and compare
+their local representation with that just retrieved and choose a course
+of action.
+
+Details
+-------
+
+If a service accepts PUT requests and needs to avoid lost updates it
+can do so by:
+
+* Sending responses to GET requests with an ETag **header** (see
+  below for some discussion on ETag-like attributes in
+  representations).
+* Requiring clients to send an If-Match header with a valid ETag when
+  processing PUT requests.
+* Processing the If-Match header on the server side to compare the
+  ETag provided in the request with the generated ETag of the
+  currently stored representation. If there is a match, carry on
+  with the request action; if not, respond with a 412 status code.
+
+.. note:: An ETag value is a double-quoted string: ``"the etag"``.
+
+.. note:: The If-Match header may contain multiple ETags (separated
+          by commas). If it does, at least one must match for the
+          request to proceed.
+
+.. note:: What section of a codebase takes the responsibility of
+          managing the ETag and If-Match headers is greatly dependent on
+          the architecture of the service. In general the handler or
+          controller for each resource should be the locus of
+          responsibility. It may be there are decorators or libraries
+          that can be shared but such things are beyond the scope of
+          this document. Early implementors are encouraged to write code
+          that is transparent and easy to inspect, allowing easier
+          future extraction.
+
+.. note:: ETags_ can be either strong or weak. See :rfc:`7232` for
+          discussion on how weak ETags may be used. They are not
+          addressed in this document as their import is primarily
+          related to cache handling. Strong ETags signify
+          byte-for-byte equivalence between representations of the
+          same resource. Weak ETags indicate only semantic equivalence.
+
+Each of the steps listed above requires functionality to generate ETags
+for representations. Whenever the representation is different the ETag
+should be different. :rfc:`7232#section-2.3.1` has advice on how to
+generate good ETags. In practice they should be:
+
+* Different for different forms of the same resource. For example, the
+  XML and JSON representations of the same version of a resource
+  should have different ETags.
+* Different from version to version.
+* Not based on something that will change when the system restarts.
+  For example, not based on inodes or database keys that are ints
+  or other non-universal identifiers.
+* Not based on hashes of strings that do not have reliable
+  ordering. For example it can be tempting to make md5 or sha hashes
+  of the JSON string that represents a resource. If the ordering in
+  that JSON is not guaranteed, the ETag is not useful.
+
+Ideally they should be fast to calculate or if not fast then easy
+to store (when the representation is written). A hash of a last
+updated timestamp and the content-type can work, but only if updates
+are less frequent than clock updates.
+
+.. note:: Many details of how ETags can be useful are left out of this
+          document. It is worth reading :rfc:`7232` in its entirety to
+          understand their purpose, how they work, edge cases and
+          how they interact with other modes of conditional request
+          handling.
+
+Special Cases
+-------------
+
+For simple resources that represent a single unified entity the
+above handling works well. For more complex resources the situation
+becomes more complicated. Some scenarios worth considering:
+
+* When there is a resource which represents a collection of
+  resources (e.g. ``GET /resources`` versus ``GET
+  /resources/some-id``) the strict process for updating one of the
+  resources in that collection when using ETags would be:
+
+  * ``GET /resources`` to get the list of resources.
+  * Do some client side processing to choose a single resource's id.
+  * ``GET /resources/that-id`` to get the resource and its ``ETag``
+    header.
+  * Modify the local representation.
+  * ``PUT /resources/that-id`` with an ``If-Match`` header
+    containing the ETag.
+
+  This may be considered cumbersome. One way to optimize this is to
+  include an attribute whose value is the ETag in the individual
+  representations of the singular resources in the collection
+  resource. Then the second GET above can be skipped as the ETag is
+  already available.
+
+* When a resource has sub resources (e.g. an ``/image/id`` resource
+  contains a metadata attribute whose content is also available at
+  ``/image/id/metadata``) it can be desirable to retrieve the image
+  resource and then PUT to the metadata resource. Strictly speaking
+  this would require a GET of the metadata resource to determine the
+  ETag.
+
+  If this is a problem, an optimization to work around this is to
+  allow the ETag of the image resource to be an acceptable ETag of
+  the metadata resource when provided in an ``If-Match`` header.
+  If this is done, then it is important that the reverse not be
+  true: The ETag sent with the metadata resource should not be valid
+  in an ``If-Match`` header sent to the image resource.
+
+.. note:: In both of the above scenarios the semantics of ETags are being
+          violated. An ETag is not a magic key to unlock a resource and
+          make it writable. It is a value used to determine if two
+          representations of the same resource are in fact the same. In
+          the situations above they are comparing different resources.
+          Services should only do so if they must. Either because the
+          performance benefit is huge (in which case consider fixing the
+          performance of the API) or the user experience improvement is
+          significant. The latter is far more important and legitimate
+          than the former.
+
+.. _lost update problem: https://www.w3.org/1999/04/Editing/
+.. _ETags: https://tools.ietf.org/html/rfc7232#section-2.3
+.. _412: https://tools.ietf.org/html/rfc7232#section-4.2
+.. _If-Match: https://tools.ietf.org/html/rfc7232#section-3.1
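
The steps in the Details section of the guideline (send an ``ETag`` header
on GET, compare ``If-Match`` on PUT, respond 412 on mismatch) can be sketched
in a few lines of Python. This is only an illustrative sketch and is not part
of the patch: ``make_etag``, ``if_match_allows`` and ``ResourceStore`` are
hypothetical names, the in-memory dict stands in for a real service, and
hashing a key-sorted JSON encoding together with the content type is just one
assumed way to satisfy the "reliable ordering" and "different per form"
advice. It also assumes ETags never contain commas and that ``If-Match`` is
always supplied.

```python
import hashlib
import json

JSON = "application/json"


def make_etag(representation, content_type=JSON):
    """Return a strong ETag (a double-quoted string) for a representation.

    Hypothetical helper: hashes the content type plus a JSON encoding with
    sorted keys, so dict ordering cannot change the resulting ETag.
    """
    canonical = json.dumps(representation, sort_keys=True, separators=(",", ":"))
    payload = f"{content_type}\n{canonical}".encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def if_match_allows(if_match, current_etag):
    """Return True if at least one ETag in the If-Match value matches.

    Simplification: splits on commas, so ETags containing commas are not
    handled. Weak ETags (W/"...") never equal a strong ETag here.
    """
    candidates = [tag.strip() for tag in if_match.split(",")]
    return current_etag in candidates


class ResourceStore:
    """Toy in-memory stand-in for a service with GET and PUT handlers."""

    def __init__(self, resources):
        self._resources = dict(resources)

    def get(self, path):
        if path not in self._resources:
            return 404, {}, None
        body = self._resources[path]
        return 200, {"ETag": make_etag(body), "Content-Type": JSON}, dict(body)

    def put(self, path, body, if_match):
        if path not in self._resources:
            return 404
        # Compare against the ETag of the currently stored representation.
        if not if_match_allows(if_match, make_etag(self._resources[path])):
            return 412  # Precondition Failed: the client's copy is stale.
        self._resources[path] = dict(body)
        return 204


# The lost update scenario from the guideline, with If-Match in place.
store = ResourceStore({"/some/resource": {"name": "original", "size": 1}})
_, headers_a, body_a = store.get("/some/resource")
_, headers_b, body_b = store.get("/some/resource")
body_b["name"] = "changed by B"
body_a["size"] = 2
status_b = store.put("/some/resource", body_b, headers_b["ETag"])
status_a = store.put("/some/resource", body_a, headers_a["ETag"])
print(status_b, status_a)  # prints: 204 412
```

Client B's update is kept, and client A learns from the 412 that it must GET
the resource again before retrying. A real service would more likely compute
or store the ETag when the representation is written rather than hashing on
every request.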