Cohesity: a short intro
Cohesity, since its foundation in 2013, has become a popular name in the Enterprise Storage vendor landscape. Although Cohesity might initially have been labeled as “just another backup vendor”, this simplistic description has been very unfair to them. Cohesity’s completeness of vision goes way beyond that of just another backup solution provider, putting them instead at the forefront of the “Battle for Secondary Storage”.
The problem that Cohesity is trying to solve is one that is unfortunately very common: the sprawl of unmanaged, uncorrelated and often unused secondary copies of data endlessly generated by organizations. Multiple copies of the same data are created for backups, archives, test and dev, analytics and DR purposes, resulting in unmanageable, inefficient and complex data silos. Cohesity can ingest all this data, consolidate it efficiently into one single logical container and make it available for any possible use you might think of. Cohesity is a true DataPlatform meant to enable efficient use of secondary storage. While this goal was initially achieved with software-defined, hyper-converged, scalable appliances, the next inevitable step for Cohesity was to abstract the platform’s capabilities from “the iron” and to develop a Virtual Edition of DataPlatform to address ROBO and IoT use cases and, lastly, a Cloud Edition capable of running on AWS, Azure and Google Cloud. All of these implementations share the same distinctive SpanFS File System and the same API-driven, policy-based management interface, enabling Cohesity’s capabilities to extend to any location where your data lives.
Cohesity at Cloud Field Day 4: aiming at the Clouds
Cohesity’s presentation at Cloud Field Day 4 focused mainly on Cloud use cases for their technology. Extending an organization’s reach into the Cloud creates new opportunities and new challenges, but it also poses new risks, the biggest one being turning the Cloud into another data silo, and a very expensive one. Cohesity’s objective is to defuse this risk and turn it into an opportunity to enable better and smarter uses of data stored in Cloud services.
Let’s then go briefly over some of the possible use cases enabled by Cohesity’s Cloud integration. First of all, let’s clarify that all three main Public Cloud Providers (AWS, Google and Azure) are supported, as well as any other provider offering S3-compatible services, and that they are all easily configurable as external targets.
With this in mind, the first and most obvious use case is the use of the Cloud as a mere extension of an organization’s Data Center. Even if, at first glance, this could be considered a trivial use case, the benefits of having Cohesity handle it are immediately evident. An on-prem Cohesity cluster can be connected to as many external targets as needed, and policies can be created (through the GUI or programmatically) to perform a specific type of task (Archive in this case) on a certain schedule and with the appropriate retention settings. Data, which can be files, backups, or VMware, Hyper-V or AHV VMs, will be archived on object storage in the selected Cloud(s) as per policy and can be retrieved in a granular way when needed. Archived data can also be retrieved to a destination other than its source Cohesity platform, in another Cloud Provider or in another Data Center. Cohesity refers to these two capabilities as CloudArchive and CloudRetrieve, respectively.
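To make the idea of a policy a bit more concrete, here is a minimal Python sketch of what an archive policy boils down to: one or more external targets, a schedule and a retention period. This is not Cohesity’s API or data model; the class name, the field names and the target labels are all hypothetical and only illustrate the concept described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy record; every name here is illustrative only."""
    name: str
    external_targets: Tuple[str, ...]  # labels of the registered cloud targets
    every_days: int                    # how often the archive task runs
    retention_days: int                # how long each archived copy is kept

    def next_run(self, last_run: datetime) -> datetime:
        # The schedule: the next archive runs a fixed interval after the last.
        return last_run + timedelta(days=self.every_days)

    def expires_at(self, archived_at: datetime) -> datetime:
        # The retention: a copy may be purged once this moment has passed.
        return archived_at + timedelta(days=self.retention_days)

    def is_expired(self, archived_at: datetime, now: datetime) -> bool:
        return now >= self.expires_at(archived_at)


# One policy, two clouds: the same schedule and retention apply to both.
policy = ArchivePolicy(
    name="weekly-archive",
    external_targets=("aws-s3", "azure-blob"),
    every_days=7,
    retention_days=90,
)

taken = datetime(2018, 8, 1)
print("next run:", policy.next_run(taken))
print("expires: ", policy.expires_at(taken))
```

The point of the sketch is that the policy, not the individual job, carries the schedule and retention, so attaching the same policy to several targets keeps them consistent.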
The strength of Cohesity’s solution is to make the most efficient use of preventive de-duplication and metadata indexing to reduce the amount of data stored in S3 (and similar cloud storage) and also optimize egress traffic during the retrieval process, with evident savings on your Cloud bills.
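Why de-duplication translates into a smaller Cloud bill is easier to see with a toy example. The Python sketch below stores two “backups” as chunks addressed by their SHA-256 digest, so that a chunk already present is never stored twice. It is a generic illustration of content-addressed de-duplication, not Cohesity’s actual algorithm: the `DedupStore` class is hypothetical, and fixed-size 4-byte chunks are used purely for brevity.

```python
import hashlib
from typing import Dict, List


def split_chunks(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # digest -> chunk payload

    def put(self, data: bytes, chunk_size: int = 4) -> List[str]:
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for piece in split_chunks(data, chunk_size):
            digest = hashlib.sha256(piece).hexdigest()
            # Only chunks not seen before consume additional space.
            self.chunks.setdefault(digest, piece)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"   # only the last chunk changed since Monday

monday_recipe = store.put(monday)
tuesday_recipe = store.put(tuesday)

logical = len(monday) + len(tuesday)
print("logical bytes:", logical)
print("stored bytes: ", store.stored_bytes())
```

Here 24 logical bytes occupy 16 stored bytes, because the two unchanged chunks are shared between the two backups. The same reasoning applies to retrieval: only the chunks missing at the destination need to leave the Cloud.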
The next interesting use case highlighted by Cohesity at CFD4 is the export of VMs from on-prem to Cloud through a feature called CloudSpin. Basically, the process is as simple as creating a CloudSpin policy that includes the destination target and the schedule/retention settings, and then applying it to the desired VM. When the job is executed, a local backup of the VM is created, the CloudSpin conversion process to the destination format (AMI, VHD etc.) is performed, and the resulting VM image is copied to the target Cloud object storage bucket, from where it is ready to be imported and launched when needed. Possible use cases of this solution are, of course, VM migration, DR and Test/Dev.
Cohesity’s CFD4 appearance was light on slides and heavy on demos, and they all speak better than a thousand words. In the demo shown below, Jon Hildebrand demonstrates first how to archive a VMware VM to multiple Clouds simultaneously, then how to migrate a VM from on-prem to any of the supported CSPs using CloudSpin. Yes, it is that easy.
Cohesity also demonstrated how CloudSpin can be leveraged to enable Application Mobility between on-prem and Public Cloud, by showcasing how to easily clone an entire application stack (Web + SQL servers) to a supported CSP without interfering with the production application. This is, of course, a perfect use case for Test/Dev scenarios.
You can watch the whole demo of the process here:
Unfortunately, at the moment it is not yet possible to fully re-configure the application at destination (re-IPing of instances, creation of the necessary networking constructs in the Cloud etc.) natively from within Cohesity, but the CFD4 audience was reassured that this much-needed feature is not only on the roadmap, but is indeed being developed as I write this post.
Having a Cloud Edition instance of DataPlatform running in your Cloud of choice would also enable Cloud Native Backup for applications just migrated or natively deployed into the Cloud. Let’s see how this works on AWS: first an AWS snapshot of your EBS volume is created and stored in an S3 bucket, then an EBS volume is created from the snapshot and sent to the Cohesity CE instance. Once the instance backup is ingested, it becomes “useful” data ready to receive the same Cohesity treatment it would receive on-prem, but in a native Cloud context. All of this is of course done leveraging APIs and the familiar Cohesity processes and interface, demonstrating the strength of Cohesity’s unified DataPlatform.
The example just described is focused on AWS but a similar backup process works with the same logic in Azure and GCP.
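For readers who want to see what the AWS side of that flow looks like in terms of API calls, here is a minimal Python sketch of the first two steps: snapshot an EBS volume, then create a new volume from that snapshot. The function expects an EC2 client such as the one returned by `boto3.client("ec2")` and uses only the `create_snapshot`, `create_volume` and waiter calls of that client. The function name, the `FakeEC2` stand-in and all the IDs are hypothetical; how the staged volume is then handed over to the Cohesity CE instance was not detailed in the presentation, so that step is deliberately left out.

```python
def stage_volume_from_snapshot(ec2, source_volume_id, availability_zone):
    """Snapshot an EBS volume and create a fresh volume from that snapshot.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns the tuple (snapshot_id, new_volume_id).
    """
    snapshot = ec2.create_snapshot(
        VolumeId=source_volume_id,
        Description="staging snapshot for backup ingest",
    )
    snapshot_id = snapshot["SnapshotId"]
    # Block until the snapshot is complete before building a volume on it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return snapshot_id, volume_id


class FakeEC2:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    class _Waiter:
        def wait(self, **kwargs):
            return None  # a real waiter polls AWS until the state is reached

    def __init__(self):
        self.calls = []

    def create_snapshot(self, VolumeId, Description):
        self.calls.append(("create_snapshot", VolumeId))
        return {"SnapshotId": "snap-0001"}

    def create_volume(self, SnapshotId, AvailabilityZone):
        self.calls.append(("create_volume", SnapshotId, AvailabilityZone))
        return {"VolumeId": "vol-0001"}

    def get_waiter(self, name):
        self.calls.append(("get_waiter", name))
        return self._Waiter()


# Dry run against the stand-in; with real credentials you would pass
# boto3.client("ec2") instead of FakeEC2().
fake = FakeEC2()
snap_id, vol_id = stage_volume_from_snapshot(fake, "vol-source", "eu-west-1a")
print(snap_id, vol_id)
```

Passing the client in as a parameter keeps the sketch testable offline and makes it obvious which AWS calls are involved.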
Finally, the last Cloud use case demonstrated at CFD4 was related to Multi-Cloud Data Mobility: just because you have your data on a certain Cloud Platform, this does not mean that you must be locked into it. You might decide to move your data from Cloud Provider A to Provider B for cost reasons, for consolidation needs or, after an M&A, to align with a common corporate strategy. Whatever your driver is, it is very simple to move your archived data from one provider to another and repoint your on-prem Cohesity cluster to the new location. Cross-cloud data migration can be performed using a 3rd party tool like Flexify.io, while repointing is a matter of issuing a one-liner command on your on-prem platform. As soon as these steps are completed, your Cohesity cluster will automatically re-program your “to the cloud” jobs to point to the new destination and they will be resumed with no interruption of service.
Although not exactly smooth (it requires a few manual steps), the cross-cloud migration process as shown in the video below is simple and, most of all, it works.
To wrap up the CFD4 presentations, Rawlinson Rivera demonstrated how easy it is to incorporate Cohesity data services into the service catalog of any Private Cloud Management Platform, thereby bridging the Private and Public Cloud Data domains. The CMP of choice for the demo was VMware vRealize Automation, but the point there was to show that all the Cohesity capabilities can be leveraged via APIs and plugged into the CMP that best suits your needs, to expand the portfolio of services associated with your workloads. Cohesity also hinted that they are setting up a dedicated team to address Service Providers’ specific requirements, and what Rawlinson showed at CFD4 clearly goes in that direction.
Cohesity is not just another player in the backup appliances niche; instead, they are positioning themselves as a strong contender for the management of secondary data, regardless of their customers’ data size, complexity and footprint. Coincidentally, just a few days after the CFD4 showcase, Cohesity announced a new product that reinforces my strong opinion of them: Cohesity Helios is a SaaS add-on for DataPlatform users, aimed at simplifying and unifying Data Management across wide Cohesity installations. This deserves a dedicated analysis and a new blog post in the coming weeks… it would also be interesting to compare them to their competitors at some point… but I am digressing here…
- Cohesity DataPlatform – https://www.cohesity.com/products/data-platform/
- Cohesity SpanFS – https://www.cohesity.com/what-we-do/spanfs/
- Cohesity Helios – https://www.cohesity.com/products/helios/
Disclaimer: travel, hotel and meal expenses for my participation in Cloud Field Day 4 have been kindly paid for by Gestalt IT, who invited me as a delegate. I am not under any obligation to either Gestalt IT or any of the vendors who participated in CFD4 to write any review or recommend any of the products and solutions presented at the event. I have not received any compensation for writing the above post and its content only represents my personal opinions.