Would you like de-dupe and auto-tiering with that?

For a while now I’ve noticed that customers always ask for these features, and whilst they are great technologies, I see little relevance for them in today’s SDS environments. Why is that, you ask? Well, take auto-tiering: it was introduced into storage arrays 10 years ago, when SSDs were really expensive. The SSDs in the array stored the hot blocks of data, and over time those blocks were auto-tiered down to magnetic media as the data got colder. To keep track of those blocks, the storage array has to consume extra CPU and memory to continuously run algorithms that decide which blocks to move and which blocks not to move…yet. Now that SSDs are reaching the crossover point of being cost competitive with magnetic media, the notion of auto-tiering will be irrelevant in the very near future, if you ask me.
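To make that bookkeeping concrete, here is a minimal sketch in Python of the kind of heat tracking an auto-tiering array has to do. It is a toy under stated assumptions, not how any particular vendor implements it: the `TieringTracker` class, the `hot_threshold` cutoff and the fixed evaluation interval are all hypothetical names I made up for illustration.

```python
from collections import Counter


class TieringTracker:
    """Toy heat tracker: counts accesses per block and picks demotion candidates.

    Hypothetical illustration only; real arrays use far more elaborate policies.
    """

    def __init__(self, hot_threshold: int):
        # Assumed cutoff: minimum accesses per interval for a block to stay "hot".
        self.hot_threshold = hot_threshold
        self.access_counts = Counter()  # per-block metadata the array must hold in memory
        self.ssd_blocks = set()         # blocks currently on the SSD tier

    def write(self, block_id):
        # In this sketch, new data always lands on the SSD tier first.
        self.ssd_blocks.add(block_id)
        self.access_counts[block_id] += 1

    def read(self, block_id):
        self.access_counts[block_id] += 1

    def end_interval(self):
        """Return the blocks to demote to magnetic media, then reset the counters."""
        cold = {b for b in self.ssd_blocks
                if self.access_counts[b] < self.hot_threshold}
        self.ssd_blocks -= cold
        self.access_counts.clear()
        return cold
```

Even this toy keeps a counter for every block and scans the whole SSD tier at the end of each interval, which is the CPU and memory overhead in question.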

One huge benefit of going all SSD is that you don’t have to worry about which type of volume to assign or whether there is enough performance for the application to consume. Just assign an all-SSD volume and these operational tasks are gone. You’re probably thinking, well, what about moving blocks of data from NVMe to SSD? I don’t know of any single application that requires 100K IOPS, let alone millions, and if the SDS system is capable of scaling to millions of IOPS with sub-millisecond latencies on SSD media alone, then why do you need to auto-tier? Also, with NVMe media only being introduced today, adoption will be slow, as it will be used for specific applications that require extremely low latencies (microseconds rather than milliseconds), not necessarily for IOPS. These are usually high-compute financial trading applications or complex scientific calculations: just a handful compared to the majority of applications that run in your datacenter today.

Next is de-duplication; again, fantastic technology, but it was designed solely for the backup world, not for primary storage. De-duplication is very, very CPU and memory heavy, and it is usually performed as a post-process rather than inline, because it needs to compare every block of data against the data already written to disk to find out what is duplicate. Imagine the performance impact of doing everything at once, storage and backup. Yes, some may say that CPUs are getting faster so it shouldn’t matter, but remember that the most expensive components in a server are the CPU and memory, so shouldn’t you maximize your investment by running as many VMs as possible? If you have to reserve CPU and memory to perform auto-tiering and de-duplication, which if you ask me will save you anywhere between 5-10% of total storage, you might as well buy more storage, since it is cheaper than buying more CPU and memory. Reducing the VM count per server ultimately means more hardware and more real estate in your datacenter.
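For anyone wondering where the CPU and memory actually go, here is a rough sketch in Python of block-level de-duplication. It assumes fixed-size blocks and SHA-256 fingerprints held in an in-memory index; the `DedupStore` class and its method names are hypothetical, and real products differ in chunking, hashing and index design.

```python
import hashlib


class DedupStore:
    """Toy block-level de-duplication store (hypothetical, for illustration only)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}        # fingerprint -> block bytes; this index lives in memory
        self.logical_bytes = 0  # bytes written by clients, before de-duplication

    def write(self, data: bytes) -> list:
        """Store data and return the list of fingerprints needed to rebuild it."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            # Every block is hashed (CPU cost) and looked up in the index (memory cost).
            fingerprint = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fingerprint, block)  # keep only the first copy
            recipe.append(fingerprint)
            self.logical_bytes += len(block)
        return recipe

    def read(self, recipe: list) -> bytes:
        return b"".join(self.blocks[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        return sum(len(block) for block in self.blocks.values())
```

Comparing `logical_bytes` with `physical_bytes()` after a run of writes gives the space saved, which is the number to weigh against the CPU and memory the hashing and the index take away from your VMs.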

Remember, SSDs are getting cheaper and fatter, so buying more storage is far more cost effective than buying more servers to run your VMs. Vendors who offload these data services onto a proprietary card (HW lock-in!) or put both primary storage and backup on the same appliance are, to me, creating a single point of failure. It’s like asking your backup admin to also be the storage admin, which in most organisations will violate compliance, as that single person will have too much control of your environment. Segregation of duties is very important. Imagine collapsing the storage, backup, virtualisation, network and server teams into a single person in your datacenter. Might be perfect for ROBO use cases, but never for the enterprise datacenter!
