The Need For Staccato
In this document we describe why a transfer service like Staccato is needed. This need breaks down into three major areas:
Robustness
Efficiency
Workload
Robust Transfers
Data transfers fail. Transmitting large amounts of data can be expensive. Ideally a transfer is checkpointed along the way so that when the inevitable failure occurs, it can be restarted from the last known checkpoint, minimizing the amount of data that must be resent.
Unfortunately, performing such checkpointing is non-trivial. The checkpoint information needs to be stored consistently, in a way that survives the termination of both the source and destination transfer endpoints. Locking mechanisms and consistency measures must be in place to ensure that only one transfer takes place at a time. Some protocols do not allow partial transfers at all, in which case a cache is needed to minimize the amount of data that must be retransmitted.
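To make the idea concrete, the following is a minimal sketch of checkpointed, resumable download logic, not Staccato's actual implementation. It assumes the source speaks HTTP and honors Range requests, and it persists the checkpoint as a simple byte offset in a side file next to the destination.

# Minimal sketch of checkpointed, resumable transfer logic (illustrative only).
# Assumes the source is an HTTP server that supports Range requests; the
# checkpoint is just a byte offset persisted to a side file.
import os
import urllib.request

CHUNK = 1024 * 1024  # 1 MiB

def resumable_download(url, dest_path):
    ckpt_path = dest_path + ".ckpt"
    # Recover the last known checkpoint, if any, so a restarted transfer
    # only fetches bytes that were never confirmed as written.
    offset = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            offset = int(f.read().strip() or 0)

    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % offset})
    with urllib.request.urlopen(req) as resp, open(dest_path, "ab") as out:
        out.truncate(offset)          # discard any bytes past the checkpoint
        while True:
            chunk = resp.read(CHUNK)
            if not chunk:
                break
            out.write(chunk)
            out.flush()
            os.fsync(out.fileno())    # make the data durable before...
            offset += len(chunk)
            with open(ckpt_path, "w") as f:   # ...advancing the checkpoint
                f.write(str(offset))
    if os.path.exists(ckpt_path):
        os.remove(ckpt_path)          # transfer complete; drop the checkpoint

The important property is the ordering: the data is made durable before the checkpoint is advanced, so a failure can only cause a small amount of redundant retransmission, never a gap in the destination file.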
There are many more complications that make the job of monitoring a transfer difficult. Rather than trying to embed all of this complicated logic in a traditional client, it is best implemented as a service.
Efficient Transfers
Commonly the protocol used for a data transfer is dictated by the storage system in which the source data lives. This is not always the best choice, and it conflates an access protocol with a transfer protocol. Often the best protocol to use is determined not only by the architecture and workload of the source storage system, but also by those of the destination and of the network.
A service like Staccato is in an architectural position to know more about what is happening on all three of these components: the source storage system, the destination storage system, and the network. It can avoid thrashing and overloading resources by scheduling transfers at optimal times, select better protocols (think of BitTorrent when a single source is requested for download to many destinations), and set more appropriate protocol parameters for the transfer at hand (think of TCP buffer sizes).
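As an illustration of the parameter-tuning point, the sketch below sizes TCP buffers from an estimated bandwidth-delay product of the path. The bandwidth and round-trip-time figures are assumed inputs here; in practice they would come from whatever the service knows about the network.

# Illustrative sketch of per-transfer protocol tuning (not actual Staccato
# code): sizing TCP buffers from an estimated bandwidth-delay product for
# the path between source and destination.
import socket

def tuned_socket(bandwidth_bps, rtt_seconds):
    # Bandwidth-delay product: the amount of data "in flight" needed to keep
    # the pipe full. Both inputs are assumptions supplied by the caller.
    bdp_bytes = int(bandwidth_bps / 8 * rtt_seconds)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The kernel may clamp these values to its configured limits.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
    return sock

# e.g. a 1 Gbit/s path with an 80 ms round-trip time -> ~10 MB of buffer
sock = tuned_socket(1_000_000_000, 0.080)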
Having this knowledge and functionality in a traditional client would be overly complicated, if not impossible.
Workload
Clients often wish to download a data set to a local file, or upload a local file to a more well-managed storage system. Such clients are the target users for this service. As it commonly stands today, a client downloads a file by connecting to a remote storage system, speaking its protocol, and marshaling every byte of that protocol (including security signing and other potentially processor-intensive work). The workload placed on the client scales with the size of the data set and the protocol in use. Rarely does the client plan its resources and timeouts around these things. In these cases the client really just wants the file; it does not want to do the work or contribute the resources (CPU, NIC, memory) needed to get it.
Because of this, a service that offloads this burden makes sense.
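To illustrate the offload model, the hypothetical sketch below shows a client describing a transfer to such a service instead of moving the bytes itself. The endpoint URL, source and destination URIs, and request fields are invented for this example and do not reflect Staccato's real API.

# Hypothetical illustration of the offload model: the client describes the
# transfer and hands it to the service. Everything below (URL, fields) is
# invented for this sketch.
import json
import urllib.request

request_body = json.dumps({
    "source": "swift://images/ubuntu-22.04.qcow2",
    "destination": "file:///var/lib/images/ubuntu-22.04.qcow2",
}).encode("utf-8")

req = urllib.request.Request(
    "http://staccato.example.com/v1/transfers",   # hypothetical service URL
    data=request_body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    transfer = json.loads(resp.read())
# The client can now poll the transfer's status (or simply exit); the CPU,
# NIC, and memory cost of moving the data is borne by the service.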