When it was launched, vRealize Operations Manager was immediately perceived by its user base as a complete rework of its predecessor, vCenter Operations Manager. Changes were introduced not only in features and capabilities, but also in the product’s architecture. Now at version 6.2 and incorporating some functionality inherited from Hyperic, vROps is definitely a mature product, which makes it an essential component of any modern VMware virtualization infrastructure.
In this article I will try to cover most of the design considerations involved in a vROps implementation scenario; I don’t intend to cover every facet of the “vROps Design Dilemma”, nor will I go into great depth on every possible design consideration. Nevertheless, I hope to give you enough food for thought to succeed with your vROps implementation.
First things first: forget what you learned from vCOps. With vROps there are no more separate UI and Analytics VMs, which means the smallest deployment now consists of a single VM that includes all the necessary services. This leads to at least two design choices that must be made at a very early stage: the first concerns the OS of choice for your vROps VMs, the second concerns sizing.
vROps is available in three flavors: a “ready to deploy” OVA appliance (based on SUSE Linux), a Linux installable binary certified to work with RHEL and – finally – a Windows installer package. The choice is up to you and may depend on factors such as your familiarity with a certain OS or your company’s policies. What matters is that once you choose a flavor, you will have to stick with it for all the VMs of your “Analytics Cluster” (more about this soon). Personally, I prefer the OVA: it is very fast to deploy and set up, and it doesn’t come with the burden of maintaining a complete OS stack. When vROps is upgraded to a new build, the OS powering the OVA is usually updated as well, and that’s all you need to do to look after your vROps VMs.
We can’t discuss sizing without first talking about the different roles a VM can hold in a vROps cluster. Interestingly, even when deployed as a single VM, your vROps implementation is already considered a “cluster”; the reason is that you can easily add more nodes to scale out your deployment whenever you hit the supported object limit (more on that when we talk about node sizes).
What matters is that the first VM deployed in a cluster is always a “Master Node”, and this is the only VM that holds, at any given time, all four vROps databases (source: VMware):
- Global xDB: Contains user configuration data, such as alert and symptom definitions, dashboard configurations, and super metric formulas. The global xDB is found on the master node and master replica nodes only
- Alerts/symptoms vPostgres: Contains alerts and symptoms information
- Historical inventory service xDB: Contains a historical view of all object properties and relationships
- FSDB: Contains all the statistical, time series data. This raw metric data is used to provide live status displays in the user interface
To scale out a vROps cluster you add one or more additional nodes, known as “Data Nodes”: these are essentially similar to the Master Node except that they don’t hold the Global xDB database. So what can you do to protect the Master Node and keep the cluster operable even in case of a Master failure? You can enable High Availability, which means choosing one of your existing Data Nodes and turning it into a “Master Replica” node. When you do this, the Master Replica Node immediately begins to sync with the Master: if the latter becomes unavailable, the Replica Node takes over its role in a matter of minutes. This is certainly cool, but it comes at a price: a reduction in the number of objects and collected metrics the cluster can manage, which may require deploying additional Data Nodes to compensate.
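The HA behaviour described above can be modelled in a few lines. This is a purely illustrative sketch of the takeover logic, not a vROps API; the node names are hypothetical:

```python
# Minimal model of the Master / Master Replica takeover described above.
# Illustrative only: vROps performs this internally, there is no such API.
class ClusterModel:
    def __init__(self, master: str):
        self.master = master
        self.replica = None  # set when HA is enabled

    def enable_ha(self, data_node: str):
        """Turn an existing Data Node into a Master Replica."""
        self.replica = data_node

    def master_failed(self):
        """Without HA the cluster needs a restore; with HA the replica takes over."""
        if self.replica is None:
            raise RuntimeError("No replica: restore the Master from backup")
        self.master, self.replica = self.replica, None
```

The point the model makes explicit: without a replica, a Master failure leaves you with nothing to promote, which is exactly why the backup discussion below matters.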
The need for HA can be mitigated with a proper backup strategy: if you are able to quickly and consistently restore a failed Master Node, then you can spare yourself the need for HA. Just keep this in mind when designing for backup: a vROps cluster is a distributed system that, in VMware’s own words, “uses VMware vFabric® GemFire® to connect nodes into a shared memory cluster and map the work across the nodes.” Because of this distributed architecture, to guarantee data consistency after a restore, all the nodes in the vROps cluster must be backed up and restored at the same time. The vROps documentation provides more details about how to set up a vROps backup/restore job, but the point here is to highlight why and how to include backup in your design.
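The “all nodes at the same time” requirement is easy to verify after the fact. Here is a hedged sketch of such a sanity check: the timestamps and the skew threshold are illustrative assumptions, not something vROps exposes or mandates:

```python
from datetime import datetime, timedelta

# Illustrative check: were all node backups taken close enough together
# to be restored consistently? The 60-second threshold is an assumption.
def backups_consistent(snapshot_times: dict, max_skew_seconds: int = 60) -> bool:
    """True if all node snapshot timestamps fall within max_skew_seconds."""
    times = list(snapshot_times.values())
    skew = max(times) - min(times)
    return skew <= timedelta(seconds=max_skew_seconds)
```

In practice this is what a backup product’s “consistency group” feature does for you: it quiesces and snapshots all VMs of the group together.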
In a nutshell, this is an “Analytics Cluster”, i.e. the portion of the cluster that ingests, analyzes, correlates and presents the data. One strict requirement is that all nodes in an Analytics Cluster must run the same OS flavor (OVA, Linux or Windows) and must have exactly the same HW specs (CPU, RAM and disk).
But there’s more: let’s not forget “Remote Collector Nodes”. As their name suggests, these nodes do not perform any analytics work (they are not part of the GemFire distributed fabric); their only role is to collect data from a remote endpoint. So you can deploy a Remote Collector at another location, have it fetch data from the far endpoint and send it to the Analytics Cluster at the primary site. One peculiarity of Remote Collector Nodes is that they don’t have to match either the OS or the HW specs of the Analytics Cluster nodes.
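The node roles and databases discussed so far can be summarized in a small sketch. The role and database names follow the article; the data structure itself is purely illustrative:

```python
# Which vROps databases each node role holds, per the description above.
# Illustrative summary only, not a vROps data structure.
ALL_DBS = {"Global xDB", "Alerts/Symptoms vPostgres",
           "Historical Inventory Service xDB", "FSDB"}

DATABASES_BY_ROLE = {
    "master": set(ALL_DBS),          # holds all four databases
    "master_replica": set(ALL_DBS),  # synced copy of the Master
    "data": ALL_DBS - {"Global xDB"},  # everything except the Global xDB
    "remote_collector": set(),       # collects only, no analytics databases
}

def holds_global_xdb(role: str) -> bool:
    """Only the Master and Master Replica hold the Global xDB."""
    return "Global xDB" in DATABASES_BY_ROLE.get(role, set())
```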
So, how do we size a vROps cluster? vROps sizing is a matter of choosing the number (and role) of the nodes and their size. vROps Analytics Nodes come in different form factors (Small, Medium, Large, Extra Large) depending on the maximum number of objects (vCenters, ESXi hosts, VMs, DataStores etc.) monitored and metrics collected by a single node.
So one could do some simple math and come up with the list and specs of the VMs to be deployed, but luckily VMware has developed a sizing calculator (downloadable from here) which, given the number of VMs, hosts and datastores in scope, the data retention period and the choice for HA, generates a recommendation for the number of nodes needed in each of the supported form factors, including disk I/O and disk size.
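The “simple math” amounts to dividing the object count by the per-node capacity and rounding up, with HA roughly halving usable capacity because data is kept on two nodes. The per-node limits below are ILLUSTRATIVE placeholders, not official VMware figures; always use the official sizing calculator for real designs:

```python
import math

# ILLUSTRATIVE per-node object limits (not official VMware numbers).
OBJECTS_PER_NODE = {"small": 2_000, "medium": 6_000, "large": 10_000}

def nodes_needed(total_objects: int, form_factor: str, ha: bool = False) -> int:
    """Back-of-the-envelope node count: objects / per-node capacity, rounded up."""
    capacity = OBJECTS_PER_NODE[form_factor]
    if ha:
        capacity /= 2  # each piece of data lives on two nodes under HA
    return math.ceil(total_objects / capacity)
```

For example, with these assumed limits, 10,000 objects on medium nodes would need 2 nodes without HA and 4 with HA, which illustrates the HA cost mentioned earlier.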
The same tool also allows for more detailed calculations accounting for deployed EndPoint agents (remember I mentioned that vROps now also incorporates some Hyperic functionality), NSX integration, VDI objects and even storage arrays.
A multi-node cluster comes with an interesting bonus: every node in the Analytics Cluster can serve the Web UI, so why not make the best use of this and put them behind a Load Balancer? In that case, prepare a PEM certificate with the name of your cluster and SAN entries for each of the nodes. All it takes is loading the certificate onto the Master Node as part of its installation procedure; it will then be automatically imported by any new node added to the cluster. There is a whitepaper from VMware explaining how to configure some of the most common Load Balancers, so be sure to download it from here.
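A common stumbling block is getting the SAN list right when requesting that certificate. As a hedged sketch, this helper generates the OpenSSL request-config snippet for a cluster name plus per-node SAN entries; the host names are hypothetical examples:

```python
# Build an OpenSSL req config with the cluster FQDN as CN and SAN entries
# for the cluster name and every node, as suggested above. Host names
# are hypothetical; adapt the DN section to your CA's requirements.
def openssl_san_config(cluster_fqdn: str, node_fqdns: list) -> str:
    sans = [cluster_fqdn] + list(node_fqdns)
    alt_names = "\n".join(f"DNS.{i} = {name}" for i, name in enumerate(sans, 1))
    return (
        "[req]\n"
        "distinguished_name = dn\n"
        "req_extensions = san\n"
        "[dn]\n"
        f"CN = {cluster_fqdn}\n"
        "[san]\n"
        "subjectAltName = @alt_names\n"
        "[alt_names]\n"
        f"{alt_names}\n"
    )
```

You would feed the resulting file to `openssl req -new -config <file>` when generating the CSR, so the issued certificate is valid both for the load-balanced name and for each node accessed directly.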
One last word before wrapping up: vROps comes out of the box with two “Solutions” which allow it to collect data from different sources. EndPoint Operations (EPOPs) collects OS-level metrics on those VMs where the EPOPs Agent has been installed, while the vCenter Solution has two adapters: one to collect metrics from vCenters (the “vCenter Adapter”) and one to execute remediation tasks on vCenter objects straight from vROps (the “Python Adapter”). The basic functionality of vROps is provided by these two adapters, but vROps can be extended to do much more: dozens of Management Packs can be downloaded from VMware’s Solutions Exchange website to extend the collection and analysis capabilities of vROps to almost any kind of object. Some of these packs are free, some are paid; some are developed by VMware, some by vendors or partners. There are Management Packs to monitor storage arrays, databases, servers, network devices, UPSs and so on. You name it, chances are it’ll be there.
Given all of this, it’s easy to understand how vROps has evolved from its vCOps early days into a complete solution capable of monitoring and analyzing complex, heterogeneous virtualization infrastructures running on top of VMware technologies. No vSphere design should come without a vROps implementation, and no vROps should be deployed without first addressing its design and architectural peculiarities.
This article has been selected and published as part of the PernixData vSphere Design Pocketbook v3.0. The book, featuring articles from many IT experts and community personalities, is available for free download from the PernixData website.