Why one Software Defined Storage solution will rule them all!!!
In my travels across Asia I’ve spoken to many customers to
showcase the merits of running their datacenter with software defined storage.
If webscalers like Google and Facebook can run their datacenters using commodity
x86 servers along with software defined storage, why can’t you? When was the
last time you saw a 404 in your browser when accessing Google or Facebook? In
this blog post I’ll describe why you should consider Dell EMC ScaleIO as the
ONLY software defined block storage solution that not only replaces traditional
SAN but can also run in hyperconverged environments. No other solution in the
market today can run both at the same time in the same cluster for any
application at any scale!
Software defined storage has only been around for a relatively short period of time, yet it has already garnered a lot of attention. There are many solutions out there in the market that claim to be the leader, and that’s ok. What you need to understand first and foremost is the architecture of how these software defined storage systems/hyperconverged systems operate (moving forward I’ll dub them both as SDS). Only once you understand how the architecture works can you decide what’s right for you.
Firstly, solutions that have to rely on one or two SSDs as caching devices are not a very smart approach. Having that SSD cache other SSDs sitting behind it is, to me, a true waste of SSD resources – you should be able to use all the SSDs together as a single pool of storage for both reads and writes and leverage their true combined potential. Have you thought about what happens when you have bursty traffic and the SSD fills up and cannot de-stage the data quickly enough? How do you think that will affect your VMs sitting on that volume? What happens if one of the VMs in that same volume is busier than the rest? That will undoubtedly cause hot spots and drain performance. Yes, you can set QoS and so forth, but you’re still limiting yourself to those one or two SSDs, and that is just not wise. How are you going to predict bursty traffic? The answer is you can’t, and typically admins will need to move that VM to another host to give it the performance it needs, which then causes more unnecessary backend network traffic. ScaleIO, on the other hand, abstracts all the devices from all the servers and puts them together in a single storage pool. Imagine the performance you’ll get if you had 100 SSDs serving every volume you create from that pool – and all of this processing is done in parallel! Who needs data locality?! That is why we are seeing insane performance figures, and no wonder storagereview.com has said it is the fastest thing they’ve ever tested. You can read up on the review, and be sure to read the first sentence of the 2nd paragraph in the Conclusion section ;-)
http://www.storagereview.com/emc_vxrack_node_powered_by_scaleio_review. You can also see the other test results they’ve performed, such as Sysbench OLTP, VMmark and SQL. Btw, can you find performance reports for the other leaders on there?
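To make the burst/de-stage concern a bit more concrete, here’s a rough back-of-the-envelope sketch in Python. Every number in it (cache size, write rates, SSD count) is made up purely for illustration – these are not vendor figures – but it shows why a single caching SSD with a limited de-stage rate can leave writes queued up, while the same burst striped across a wide, flat pool of SSDs is absorbed easily.

```python
# Illustrative only: hypothetical cache sizes and write rates, not vendor figures.

def cache_tier_burst(burst_gb, burst_secs, cache_gb, cache_write_gbps, destage_gbps):
    """Return the GB of burst writes left queued behind the cache tier.

    Writes arrive at burst_gb / burst_secs and the cache drains at destage_gbps.
    Once the cache is full, incoming writes are throttled to the de-stage rate.
    """
    arrival_gbps = burst_gb / burst_secs
    ingest_gbps = min(arrival_gbps, cache_write_gbps)
    fill_gbps = ingest_gbps - destage_gbps            # net growth of dirty data
    if fill_gbps <= 0:
        return 0.0                                    # cache never fills up
    secs_to_fill = cache_gb / fill_gbps
    if secs_to_fill >= burst_secs:
        return 0.0
    # After the cache fills, only the de-stage rate can be absorbed.
    remaining_gb = (burst_secs - secs_to_fill) * arrival_gbps
    absorbed_gb = (burst_secs - secs_to_fill) * destage_gbps
    return max(remaining_gb - absorbed_gb, 0.0)

def pooled_burst(burst_gb, burst_secs, ssd_count, per_ssd_write_gbps):
    """With a flat pool, every SSD takes writes directly; there is no de-stage step."""
    pool_gbps = ssd_count * per_ssd_write_gbps
    return (burst_gb / burst_secs) <= pool_gbps       # True if the pool keeps up

if __name__ == "__main__":
    # 500 GB burst over 5 minutes against a 100 GB write cache that de-stages at 0.5 GB/s.
    backlog = cache_tier_burst(500, 300, 100, 2.0, 0.5)
    print(f"writes stuck behind the cache tier: {backlog:.0f} GB")
    # Same burst against 100 SSDs at 0.4 GB/s each = 40 GB/s of aggregate write bandwidth.
    print("wide pool keeps up:", pooled_burst(500, 300, 100, 0.4))
```

Swap in your own device counts and rates; the shape of the answer stays the same.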
Secondly, there is the software defined storage appliance approach. This one bothers me all the time as we see these other leaders position themselves for the datacenter. An appliance-based approach = one cluster. Multiple clusters = multiple separate appliances. Btw, the average size of a cluster is between 12 and 16 nodes, and I’ve never seen or heard of a customer go up to 64 nodes. With this appliance approach there is no sharing of resources from a performance and storage perspective. It’s like managing multiple arrays again, so imagine the pain of having to manage multiple appliances! Oh, and don’t forget data migrations from one appliance to another, or the fact that you cannot add storage and compute independently. Yes, these leaders say you can, but if you actually read their documentation, they say upgrades should be symmetric so that performance stays predictable; otherwise they cannot guarantee performance. Well, what happens if I don’t need the extra compute and just need storage? That means my CapEx has just gone up due to their rigid design. And I can’t even add an all-flash node to an existing hybrid cluster – I have to start a new cluster!? This siloed approach should never be considered for the datacenter.
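As a purely illustrative sketch of that CapEx point – the node prices and capacities below are hypothetical, not quotes for any real product – here’s roughly how the cost of adding 100 TB differs when every expansion must be a full compute-plus-storage node versus when storage-only nodes are allowed:

```python
# Hypothetical prices and capacities, for illustration only.
import math

def expansion_cost(extra_tb_needed, tb_per_node, cost_per_node):
    """Nodes come in whole units, so round up to the next node."""
    nodes = math.ceil(extra_tb_needed / tb_per_node)
    return nodes, nodes * cost_per_node

if __name__ == "__main__":
    # Symmetric appliance: every node ships with compute and licences you may not need.
    nodes, cost = expansion_cost(100, tb_per_node=20, cost_per_node=45_000)
    print(f"symmetric upgrade: {nodes} nodes, ${cost:,}")

    # Storage-only expansion: same 20 TB per node, no extra compute to pay for.
    nodes, cost = expansion_cost(100, tb_per_node=20, cost_per_node=25_000)
    print(f"storage-only:      {nodes} nodes, ${cost:,}")
```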
At Dell EMC World 2017 it was announced that ScaleIO is now part of the Dell EMC Enterprise Storage Family. What that means is that ScaleIO is classified as a datacenter solution along with VMAX and XtremIO. Not bad for a software defined storage solution that is only 6 years old (https://en.wikipedia.org/wiki/EMC_ScaleIO) compared to VMAX, which has been around for 25 years (https://en.wikipedia.org/wiki/EMC_Symmetrix). You start with a minimum of 3 nodes and you can scale compute and storage independently all the way up to 1,024 servers in a single cluster! It has all the features one would expect from an enterprise-grade solution, like multi-tenancy, snapshots, thin provisioning, QoS, etc. Yes, it doesn’t yet have certain data services such as native replication or data reduction, but these features are coming. Once ScaleIO has its own native replication (btw, why not let the application do the replication? Shouldn’t the application itself know it has multiple copies of the data? ScaleIO supports any application-level replication solution today, such as Oracle Data Guard) and compression, what else is left? Yes, there is the notion of active-active, but is the application itself aware that it is active-active? And one thing that makes me laugh is when the leaders perform POCs with their metro-clustering ability but with only one VM – that doesn’t really simulate a real-life workload! Or when they do performance testing, they only test for a short period of time and see phenomenal results. Little do customers realise that the workload is still hitting the cache! A performance test should run for a minimum of 30 minutes, not 5.
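On that last point, here’s a small, illustrative Python harness for keeping a test running until the numbers actually stabilise. The run_interval() function below is a synthetic stand-in (so the script runs anywhere) for whatever benchmark tool you really drive; the point is simply that a rolling average which hasn’t settled yet is a warm-cache number, not a steady-state one.

```python
# Sketch of a "don't stop while you're still in cache" test harness.
import statistics

def run_interval(minute):
    """Fake per-minute MB/s: looks great while the cache absorbs writes,
    then settles to the backend's real rate. Replace with real measurements."""
    return 2000 if minute < 6 else 400

def steady_state_mbps(max_minutes=60, window=5, tolerance=0.05):
    """Run minute-long intervals until two consecutive windows agree within tolerance."""
    samples = []
    for minute in range(max_minutes):
        samples.append(run_interval(minute))
        if len(samples) >= 2 * window:
            prev = statistics.mean(samples[-2 * window:-window])
            curr = statistics.mean(samples[-window:])
            if abs(curr - prev) / prev <= tolerance:
                return minute + 1, curr
    return max_minutes, statistics.mean(samples[-window:])

if __name__ == "__main__":
    minutes, mbps = steady_state_mbps()
    print(f"stabilised after {minutes} min at ~{mbps:.0f} MB/s")
    short_run = statistics.mean([run_interval(m) for m in range(5)])
    print(f"a 5-minute test would have reported ~{short_run:.0f} MB/s")
```

With these made-up numbers the 5-minute run reports five times the throughput the system can actually sustain – which is exactly the trap a short POC falls into.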
Lastly, I like it when these leaders say that they can do everything from storage to backup to disaster recovery. One thing you have to remember is that by doing everything in one appliance you’re literally putting all your eggs in one basket. Yes, it is one throat to choke, but in my mind you need solutions that were designed to solve a particular problem – not a band-aid approach. For example, when it comes to backup, these leaders actually perform a snapshot. A snapshot is a point-in-time copy of the data; it is not a true secondary copy of the data, which is what a backup is. Also, you must keep primary data and backup data separate, i.e. they must be isolated, redundant and resilient. Think of a photo and a cassette tape: if I take multiple photos, that’s multiple point-in-time copies, and usually the storage system won’t be able to handle that many snaps kept for weeks or months on end. What happens when data corruption occurs after the 9am snap? When it is time to take the next snap, say 10 minutes later, you’re actually snapshotting the corruption as well. The right strategy here is to define what your actual RPO/RTO requirements are for each application. Doing so will allow you to select the right data protection strategy. Take Dell EMC’s RecoverPoint (RP) and the snapshot example above. With RP I can actually roll back in time to just prior to the corruption and recover from there. With the snapshot, though, I would have lost up to 10 minutes’ worth of data when I restored to the previous snapshot. Not an ideal situation to be in. A backup strategy should use dedicated backup software and a backup appliance, like Avamar and Data Domain, designed to back up your most important asset, i.e. your data.
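To put some numbers on the snapshot-versus-journal point, here’s a toy Python illustration. The timestamps and the 10-minute snapshot interval are just the example from above, not a model of any specific product: restoring to the last good snapshot loses whatever was written since that snapshot, whereas a continuous journal (RecoverPoint-style) lets you rewind to just before the corrupting write.

```python
# Toy illustration of snapshot RPO vs journal-based rollback (example times only).
from datetime import datetime, timedelta

def last_good_snapshot(snapshots, corruption_time):
    """Newest snapshot taken strictly before the corruption."""
    good = [s for s in snapshots if s < corruption_time]
    return max(good) if good else None

if __name__ == "__main__":
    start = datetime(2017, 6, 1, 9, 0)
    # Snapshots every 10 minutes from 09:00.
    snapshots = [start + timedelta(minutes=10 * i) for i in range(7)]
    corruption = datetime(2017, 6, 1, 9, 27)

    snap = last_good_snapshot(snapshots, corruption)
    lost_minutes = (corruption - snap).seconds // 60
    print(f"corruption at {corruption:%H:%M}")
    print(f"restore snapshot {snap:%H:%M} -> lose {lost_minutes} min of writes")

    # A continuous journal can rewind to just before the corrupting write,
    # so the loss window shrinks towards zero.
    rewind = corruption - timedelta(seconds=1)
    print(f"journal rewind to {rewind:%H:%M:%S} -> lose ~0 min")
```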
Well, that was my first blog post. I hope you found it educational and interesting, and I look forward to your comments below and any suggestions to make this blog more interesting for you.
Next blog post: Would you like de-dupe and auto-tiering with
that?