As part of our PARDA research, we examined how IO latency varies as overall load (queue length) at the array increases, using one to five hosts accessing the same storage array. The attached image (Figure 6 from the paper) shows the aggregate throughput and average latency observed in the system with increasing contention at the array. The generated workload consists of uniform 16 KB IOs, 67% reads and 70% random, with 32 IOs kept outstanding from each host. For this experiment, throughput peaked at three hosts, while overall latency continued to increase with load. In fact, in some cases, throughput can even drop beyond a certain level of workload parallelism.
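One way to see why latency keeps climbing once throughput has flattened is Little's Law: average latency equals the number of outstanding IOs divided by throughput. The sketch below is only an illustration of that relationship. The function name and the IOPS figures are hypothetical and are not measurements from the paper; only the 32 outstanding IOs per host comes from the experiment described above.

```python
def average_latency_ms(hosts, ios_per_host, throughput_iops):
    """Little's Law: latency = outstanding IOs / throughput.

    All arguments are illustrative inputs, not values taken from the paper.
    """
    outstanding = hosts * ios_per_host
    # Convert seconds to milliseconds before dividing to keep the math simple.
    return outstanding * 1000.0 / throughput_iops


if __name__ == "__main__":
    # Hypothetical array whose throughput saturates at 4000 IOPS.
    # Each host keeps 32 IOs outstanding, as in the experiment.
    for hosts in (3, 4, 5):
        latency = average_latency_ms(hosts, 32, 4000)
        print(f"{hosts} hosts: {latency:.1f} ms average latency")
```

With throughput held flat, every additional host adds 32 outstanding IOs and nothing else, so the extra queue depth shows up entirely as added latency.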
An important question for application performance is whether bandwidth or latency matters more. If bandwidth matters more, then pushing the outstanding IOs higher might make sense up to a point. For latency-sensitive workloads, however, it is better to set a target latency and to stop increasing the load (outstanding IOs) on the array beyond that point. This is the key observation that PARDA is built around. We use a control equation that takes a target latency as input; beyond that latency, the array is considered overloaded. Using our equation, we adjust the outstanding IO count across VMware ESX hosts in a distributed fashion to stay close to the target IO latency. In the paper, we also detail how our equation incorporates proportional sharing and fairness. Our experimental results show the technique to be effective.
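To make the idea of a latency-driven control equation more concrete, here is a minimal sketch of one plausible update rule for a host's outstanding IO limit (its "window"). The form used here, a smoothed update that scales the window by the ratio of target to observed latency and adds a per-host term, is an assumption for illustration; the names next_window, gamma, beta, w_min, and w_max and all numeric values are hypothetical. Please refer to the paper for the actual equation and its parameters.

```python
def next_window(window, observed_latency_ms, target_latency_ms,
                gamma=0.2, beta=1.0, w_min=1.0, w_max=64.0):
    """Return the next outstanding IO limit for one host.

    Assumed form (illustrative, not quoted from the paper):
        w(t+1) = (1 - gamma) * w(t) + gamma * (target / observed * w(t) + beta)
    gamma smooths the adjustment; beta is a hypothetical per-host term that
    could encode the host's proportional share.
    """
    proposed = (target_latency_ms / observed_latency_ms) * window + beta
    updated = (1.0 - gamma) * window + gamma * proposed
    # Clamp so the window never drops to zero or grows without bound.
    return max(w_min, min(w_max, updated))


if __name__ == "__main__":
    window = 32.0
    # Observed latency is double the target, so the window should shrink.
    print(next_window(window, observed_latency_ms=50.0, target_latency_ms=25.0))
    # Observed latency is half the target, so the window should grow.
    print(next_window(window, observed_latency_ms=12.5, target_latency_ms=25.0))
```

Each host can run such an update using only the latency it observes, which is what allows the adjustment to be made in a distributed fashion without a central coordinator.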
The PARDA paper has been accepted at the 7th USENIX Conference on File and Storage Technologies (FAST ’09). Also check out the technical program for a really interesting lineup of paper presentations.