Not many people know that spindle consolidation in storage devices can actually increase performance at no additional cost. In this article, I’ll show you how to improve virtual storage performance, get better storage utilization and lower your costs using some of the virtual storage consolidation techniques that CloudPhysics uses to help our customers.
Here’s an empirical example. We ran two workloads separately, each on an isolated 3-disk RAID 5 group. In repeatable experiments, we then ran them together on a single 6-disk RAID 5 group to evaluate the performance gain or loss from sharing. Notice that this comparison keeps the total number of physical disks identical: the isolated tests used 3 disks each (6 in total) and the shared test used 6. For anyone trying this at home, this is a critical factor in the experiment, since it ensures that the baseline cost, power and reliability profiles don’t change dramatically, and the same applies to raw performance from sheer spindle count. So, what does this virtual storage performance experimental bonanza look like? Read on.
We tested two workloads that both exhibit random IO seek patterns: DVDStore and Swingbench OLTP. To evaluate the effect of sharing, we ran their data disks in isolation and then together. See below for a comparison in terms of average IOPS and 90th percentile latency for the two cases. The IOPS achieved by each application remains largely the same in the isolated and shared cases; the small differences are within the margin of experimental error.
Improve virtual storage performance: Conventional thinking #fails
The application metric (transactions per minute, TPM), however, shows something remarkable. By conventional thinking, the two workloads should compete, resulting in either a net loss or at best a net wash. Instead, we see that DVDStore actually gets a higher application benchmark score without reducing the score of Swingbench OLTP.
Even more remarkably, DVDStore’s storage IO latency is reduced when spindles are shared: its 90th percentile latency drops from 100 ms to 30 ms! All the while, there is no increase in latency for the “competing” Swingbench OLTP workload.
Let’s take a moment to take stock. Without spending an extra dime, we have improved virtual storage performance: IO latency dropped by 70%, purely from smarter placement of virtual disks. You might be wondering why this happens and whether we can take advantage of this effect in other scenarios.
Additive randomness explained
In the shared spindle case, more disks are available to absorb the bursts. Random workloads are additive in their load and do not destructively interfere with each other. This data clearly shows that as long as the underlying spindle count is scaled up to match, overall achieved performance will also scale as workloads are added. The improvement in latency alone makes the case for consolidating random IO streams onto a unified set of spindles. Another significant benefit of moving away from hard partitioning is work conservation: if one workload is experiencing a transient burst of IO, it can spread that burst across a much larger number of disks, borrowing spare capacity from the other workloads and achieving much higher burst scaling.
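The pooling intuition can be checked with basic queueing theory. The sketch below is a hedged illustration with made-up arrival and service rates (not measurements from our experiment): it models each disk group as an M/M/c queue and uses the Erlang C formula to compare the mean queueing delay of two workloads on partitioned 3-disk groups versus one pooled 6-disk group at identical per-disk utilization.

```python
import math

def erlang_c_wait(lam, mu, c):
    """Mean queueing delay Wq for an M/M/c queue (Erlang C formula).
    lam: arrival rate, mu: per-server service rate, c: number of servers."""
    a = lam / mu              # offered load in Erlangs
    rho = a / c               # per-server utilization; must be < 1 for stability
    assert rho < 1, "queue is unstable"
    # Probability an arriving request must wait (Erlang C)
    num = a**c / (math.factorial(c) * (1 - rho))
    den = sum(a**k / math.factorial(k) for k in range(c)) + num
    p_wait = num / den
    return p_wait / (c * mu - lam)  # Wq, in the same time units as 1/mu

mu = 100.0   # hypothetical per-disk service rate: 100 IOPS
lam = 200.0  # hypothetical per-workload arrival rate: 200 IOPS

wq_split = erlang_c_wait(lam, mu, 3)       # each workload on its own 3 disks
wq_pooled = erlang_c_wait(2 * lam, mu, 6)  # both workloads sharing 6 disks

print(f"partitioned Wq: {wq_split * 1000:.2f} ms")
print(f"pooled      Wq: {wq_pooled * 1000:.2f} ms")
```

Even though per-disk utilization is identical in both cases, the pooled group queues less: a burst from either workload can be served by any of the six disks, which is exactly the work-conservation effect described above.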
Here’s how you can improve your virtual storage performance today:
- Use vscsiStats to find VMs which issue exclusively random IO
- Figure out which spindle or RAID groups they are running on and the number of disks attached to them
- Set up a RAID group that has the same number of spindles as the sum of the originals
- Migrate the VMs onto the larger shared RAID group
- Instantly enjoy more performance for each workload
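For the first step, vscsiStats on the ESXi host can classify a VM’s IO pattern via its seek-distance histogram. A rough sketch of a session follows; the world group ID 1234 is a placeholder, and you should confirm the exact flags with `vscsiStats -h` on your host:

```shell
# List running VMs with their world group IDs and virtual disk handles
vscsiStats -l

# Start collecting IO histograms for the VM with world group ID 1234 (placeholder)
vscsiStats -s -w 1234

# After the workload has run for a while, print the seek-distance histogram:
# a wide, flat distribution indicates random IO; a large spike at small
# distances indicates sequential IO
vscsiStats -p seekDistance -w 1234

# Stop collection when done
vscsiStats -x -w 1234
```

VMs whose histograms show exclusively random IO are the candidates for consolidation onto the shared RAID group.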
As a result of the higher random IO performance capabilities of the new storage configuration, you’ll be able to fit more workloads onto datastores before pushing latencies too high. This trick can save you money by deferring purchases of additional storage, caches or other expensive performance tiers. However, in a future post, I’ll describe the anti-pattern for this: don’t mix critical sequential workloads with random IO. Doing so will only increase your cost in terms of $/IOPS. I always recommend enabling Storage I/O Control to manage noisy neighbors.
CloudPhysics customers have benefitted tremendously in CapEx and OpEx from advanced analytics capabilities of our platform. Ping me to find out how we can dramatically lower the CapEx profile of your storage infrastructure and increase performance by doing all of this and much more analysis automatically.