Category Archives: Virtualization

VMware Instant Recovery

When a virtual machine crashes, there are two ways to recover it quickly: by reverting to a VMware snapshot, or by restoring an image-level backup. Most VMware environments, though, do not keep snapshots on their virtual machines (VMs) because snapshots increase usage of primary storage, which can be costly. On the other hand, restoring an image-level backup the traditional way can take longer, since the image has to be copied back from the protection storage to the primary storage.

However, most backup solutions nowadays, including NetBackup, Avamar/Data Domain, and Veeam, support VMware instant recovery, where you can immediately restore VMs by running them directly from backup files. The way it works is that the virtual machine image backup is staged to a temporary NFS share on the protection storage system (e.g. Data Domain). You can then use the vSphere Client to power on the virtual machine (which sits on an NFS datastore mounted on the ESXi host), and then initiate a vMotion of the virtual machine to the primary datastore within vCenter.
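The same workflow can also be scripted. Below is a minimal, illustrative pyVmomi (Python) sketch of the three steps described above: mount the staged NFS export as a datastore, power on the recovered virtual machine, and Storage vMotion it to a production datastore. The vCenter address, credentials, share path, and VM/datastore names are placeholders, and the sketch assumes the backup application has already staged the image to the NFS export and registered the VM.

# Illustrative instant-recovery steps driven through pyVmomi.
# All hostnames, credentials, paths, and object names below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

def find_obj(vimtype, name):
    # Look up a managed object (host, VM, datastore) by name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

esxi_host = find_obj(vim.HostSystem, "esxi01.example.com")

# 1. Mount the temporary NFS export staged by the backup application (e.g. on Data Domain).
nfs_spec = vim.host.NasVolume.Specification(
    remoteHost="datadomain.example.com",
    remotePath="/backup/instant-recovery",
    localPath="dd-instant-recovery",
    accessMode="readWrite",
)
esxi_host.configManager.datastoreSystem.CreateNasDatastore(nfs_spec)

# 2. Power on the recovered VM running from the mounted backup image.
vm = find_obj(vim.VirtualMachine, "recovered-vm")
WaitForTask(vm.PowerOnVM_Task())

# 3. Storage vMotion the VM to a production datastore.
prod_ds = find_obj(vim.Datastore, "prod-datastore")
WaitForTask(vm.RelocateVM_Task(vim.vm.RelocateSpec(datastore=prod_ds)))

Disconnect(si)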

Since there is no need to extract the virtual machine from the backup file and copy it to production storage, you can perform a restore from any restore point in a matter of minutes. VMware instant recovery helps improve recovery time objectives (RTO) and minimizes disruption and downtime of critical workloads.

There are also other uses for instant recovery. You can use it to verify a backup image, verify an application, test a patch on a restored virtual machine before applying it to production systems, and perform granular restores of individual files and folders.

Unlike primary storage, protection storage such as Data Domain is usually slower. However, newer Data Domain releases have improved random I/O (thanks to additional flash SSDs), higher IOPS, and lower latency, enabling faster instant access and restore of VMs.

Selecting the Right HCI Solution

The popularity of hyper-converged infrastructure (HCI) systems is fueled not only by better, faster, and cheaper CPUs and flash storage, but also by better orchestration of compute and storage resources, horizontal scaling, and the elasticity to adjust to changing workloads.

Hyper-converged infrastructures are scale-out systems with nodes that are added and aggregated into a single platform. Each node provides compute, storage, and networking functions and runs virtualization software. HCI enables the software-defined data center.

But what are the considerations in buying the right solution for your use case? Here are some guidelines:

1. Closely inspect how the system implements reliability and resiliency. How does it protect system configuration and data? Implementations include replication, erasure coding, distribution of state information across multiple nodes to enable automatic failover, etc.

2. Does it have self-healing capabilities?

3. Can it perform non-disruptive upgrades?

4. Does it support VMware vSphere, as well as Microsoft Hyper-V and open source hypervisors like KVM?

5. Does the storage support auto-tiering?

6. Since migrations affect virtual machine performance, how does the system maintain data locality as virtual machines move from one host to another?

7. What are the network configuration options? How is the network managed? Are there self-optimizing network capabilities?

8. How is performance affected when backups and restores are performed?

9. What is the performance impact if nodes are deployed across multiple geographic regions?

10. What are the data protection and recovery capabilities, such as snapshots and replication of workloads locally, to remote data centers, and to the cloud?

11. Does it deduplicate the data, which minimizes the amount of data stored?

12. Does it have the option to extend to public clouds?

13. What are its management capabilities? Does it provide a single intuitive console for managing the HCI, or does it include a plug-in to a hypervisor management tool such as vCenter to perform management tasks?

14. Does it have APIs that third-party tools and custom scripts can interface with to enable automation? (A minimal sketch of this kind of automation follows this list.)

15. Does it have a monitoring, alerting, and reporting system that covers performance, errors, and capacity planning?
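To illustrate item 14, here is a minimal, hypothetical sketch (Python, using the requests library) of the kind of automation an HCI REST API enables. The endpoint, authentication scheme, and JSON fields are invented for illustration and do not correspond to any particular vendor's API; consult your vendor's documentation for the real interface.

# Hypothetical automation against an HCI management REST API.
# Endpoint, token, and response fields are placeholders, not a real vendor API.
import requests

HCI_API = "https://hci-mgmt.example.com/api/v1"
session = requests.Session()
session.headers.update({"Authorization": "Bearer <api-token>"})
session.verify = False  # many appliances ship with self-signed certificates

# Pull cluster capacity and flag high utilization for capacity planning.
resp = session.get(f"{HCI_API}/cluster/capacity", timeout=30)
resp.raise_for_status()
capacity = resp.json()  # assumed shape: {"used_tb": 42.0, "total_tb": 60.0}

utilization = capacity["used_tb"] / capacity["total_tb"]
if utilization > 0.80:
    print(f"WARNING: cluster at {utilization:.0%} of usable capacity")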

Finally, you should evaluate the vendor itself: its future in the HCI space, its product roadmap, its support policies, and its cost model (lease, outright purchase, pay as you go, etc.).

Hyper Converged Infrastructure (HCI)

Companies that want to retain control of their infrastructure and data (due to regulatory, security, application requirements, and other reasons), but still want the benefits of the public cloud – such as unlimited scalability, efficient resource utilization, the cost-effectiveness of pooled compute and storage resources, and easy provisioning of resources on demand – would benefit tremendously from using hyperconverged infrastructure (HCI) on their premises.

Hyperconverged infrastructure consolidates compute, storage, and networking in a box. It creates a modular system that can be scaled linearly. HCI takes advantage of commodity hardware (i.e. x86 systems) and advances in storage and networking technologies (e.g. flash storage providing high IOPS, and 10 Gb / 40 Gb high-speed Ethernet).

HCI uses virtualization technology (such as VMware) to aggregate compute, storage and network. It eliminates the need for dedicated SAN and storage arrays by pooling the storage of each node and defining it via software. In addition, HCI usually offers unified management which eliminates the different management silos between compute, storage and network.

There are a variety of HCI solutions to choose from. You can build one yourself using commodity hardware, virtualization software (e.g. VMware vSphere), and software-defined storage (e.g. VMware vSAN). You can also buy hyperconverged appliances from vendors such as Nutanix and Dell EMC (VxRail). Hyperconverged rack-scale systems for large enterprises, such as Dell EMC VxRack, are available as well.

There are numerous advantages to using HCI:

1. Faster time to deploy – you can easily add compute, storage and network, and scale it up and out to meet business demands. This in turn reduces development cycles for new apps and services.

2. Simplified management and operations – compute, storage, and network provisioning can be done by a unified team, eliminating the network, compute, and storage silos. Many provisioning and configuration tasks can now be scripted and automated.

3. Cost savings – the initial investment is usually lower. Your company can start small and scale incrementally as it grows, adding smaller amounts of compute or storage capacity as required instead of buying large bundles of software and storage arrays. Operational expenses are also much lower, since there is no SAN to manage.

4. Reduced data center footprint – less power and less cooling are required. HCI can consolidate as many as 16 data center racks into one.

HCI is the ideal infrastructure solution for on-premise data centers.

Ensuring Reliability of Your Apps on the Amazon Cloud

On February 28, 2017, the Amazon Simple Storage Service (S3) located in the Northern Virginia (US-EAST-1) Region went down due to an incorrect command issued by a technician. A lot of websites and applications that rely on the S3 service went down with it. The full information about the outage can be found here: https://aws.amazon.com/message/41926/

While Amazon Web Services (AWS) could have prevented this outage, a well-architected site should not have been affected by it. Amazon allows subscribers to use multiple availability zones (and even redundancy across multiple regions), so that when one goes down, applications can still run on the others.

It is very important to have a well-architected framework when using the cloud. AWS provides one that is based on five pillars:

  • Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
  • Reliability – The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
  • Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
  • Cost Optimization – The ability to avoid or eliminate unneeded cost or suboptimal resources.
  • Operational Excellence – The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

For those companies that were affected by the outage, applying the “reliability” principle (by utilizing multiple availability zones, or by replicating to different regions) could have shielded them from it.
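As a concrete illustration of the reliability pillar, here is a minimal boto3 (Python) sketch that enables S3 cross-region replication from a bucket in US-EAST-1 to a bucket in another region. The bucket names and the IAM role ARN are placeholders; both buckets must already exist, and the role must grant S3 permission to replicate objects on your behalf.

# Enable S3 cross-region replication (bucket names and role ARN are placeholders).
import boto3

# Replication requires versioning on both the source and destination buckets.
for region, bucket in (("us-east-1", "my-app-data-us-east-1"),
                       ("us-west-2", "my-app-data-us-west-2")):
    boto3.client("s3", region_name=region).put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Replicate every new object from the source bucket to the bucket in the other region.
boto3.client("s3", region_name="us-east-1").put_bucket_replication(
    Bucket="my-app-data-us-east-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-all",
                "Status": "Enabled",
                "Prefix": "",
                "Destination": {"Bucket": "arn:aws:s3:::my-app-data-us-west-2"},
            }
        ],
    },
)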

Building an Enterprise Private Cloud

Businesses are using public clouds such as Amazon AWS, VMware vCloud, or Microsoft Azure because they are relatively easy to use, they are fast to deploy, businesses can buy resources on demand, and most importantly, they are relatively cheap (because there is no operational overhead in building, managing, and refreshing an on-premise infrastructure). But there are downsides to using the public cloud, such as security and compliance, diminished control of data, data locality issues, and network latency and bandwidth. On-premise infrastructure is still the most cost-effective option for regulated data and for applications with predictable workloads (such as ERP, local databases, end-user productivity tools, etc.).

However, businesses and end users are expecting and demanding cloud-like services from their IT departments for those applications that are best suited on-premise. So IT departments should build and deliver an infrastructure that has the characteristics of a public cloud (fast, easy, on-demand, elastic, etc.) and the reliability and security of on-premise infrastructure – an enterprise private cloud.

Building an enterprise cloud is now possible because of the following technology advancements:

  1. hyper-converged solution
  2. orchestration tools
  3. flash storage

When building an enterprise cloud, keep in mind the following:

  1. It should be 100% virtualized.
  2. There should be a mechanism for self-service provisioning, monitoring, billing, and chargeback.
  3. Most operational functions should be automated.
  4. Compute and storage should be able to scale out.
  5. It should be resilient – no single point of failure.
  6. Security should be integrated in the infrastructure.
  7. There should be a single management platform.
  8. Data protection and disaster recovery should be integrated in the infrastructure.
  9. It should be application-centric instead of infrastructure-centric.
  10. Finally, it should be able to support legacy applications as well as modern apps.

Hyper-converged Infrastructure: Hype or For Real?

One of the hottest emerging technologies in IT is hyper-converged infrastructure (HCI). What is the hype all about? Is it here to stay?

As defined by TechTarget, hyper-converged infrastructure (HCI) is a system with a software-centric architecture that tightly integrates compute, storage, networking, and virtualization resources (hypervisor, virtual storage, virtual networking), along with other technologies (such as data protection and deduplication), in a commodity hardware box (usually x86) supported by a single vendor.

Hyper-convergence grew out of the concept of converged infrastructure; engineers took it a little further with a very small hardware footprint, tight integration of components, and simplified management. It is a relatively new technology. On the technology adoption curve, it is still at the early-adopter stage.

Nutanix was the first vendor to offer a hyper-converged solution, followed by SimpliVity and Scale Computing. Not to be outdone, VMware developed EVO:RAIL, then opened it up for hardware vendors to OEM the product. Major vendors, including EMC, NetApp, Dell, HP, and Hitachi, began selling EVO:RAIL products.

One of the best HCI products that I’ve seen is VxRail. Jointly engineered by VMware and EMC, the “VxRail appliance family takes full advantage of VMware Hyper-Converged Software capabilities and provides additional hardware and lifecycle management features and rich EMC data services, delivered in a turnkey appliance with integrated support.”

What are the advantages of HCI and where can it be used? Customers who are looking to start small and scale out over time will find the HCI solution very attractive. It is a perfect fit for small to medium-sized companies that want to build their own data center without spending a huge amount of money. It is simple (because it eliminates a lot of hardware clutter) and highly scalable (because it can be scaled very easily by adding small, standardized x86 nodes). Since it is scalable, it eases the burden of growth. Finally, its performance is comparable to big infrastructures, because leveraging SSD storage and bringing the data close to the compute enables high IOPS at very low latencies.

References:

1. TechTarget
2. VMware Hyper-Converged Infrastructure: What’s All the Fuss About?

Backing Up Virtual Machines Using Avamar Image-Level Backup

Avamar can back up virtual machines using either guest-level backup or image-level backup.

The advantages of VMware guest backup are that it allows backup administrators to leverage identical backup methods for physical and virtual machines, which reduces administrative complexity, and it provides the highest level of data deduplication, which reduces the amount of backup data across the virtual machines.

The second way to back up virtual machines is via the Avamar image-level backup. It is faster and more efficient, and it also supports file-level restores.

Avamar integrates with VMware VADP (vStorage APIs for Data Protection) to provide image-level backups. Integration is achieved through the Avamar VMware Image plug-in. Simply put, the VMware Image backup creates a temporary snapshot of the virtual machine and uses a virtual machine proxy to perform the image backup.

Backup can occur while the virtual machines are powered on or off. Since the backup is handled by a proxy, CPU cycles of the target virtual machines are not used.
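For illustration only, the pyVmomi (Python) sketch below shows the vSphere snapshot operation that underlies this kind of VADP image backup: a temporary, quiesced snapshot is created, the proxy reads the frozen virtual disks, and the snapshot is removed. This is not Avamar’s own code, and the vCenter details and VM name are placeholders.

# Illustrative snapshot step behind a VADP-style image backup (placeholder names).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Locate the VM to protect (name is a placeholder).
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "app-server-01")
view.DestroyView()

# 1. Create a temporary, quiesced snapshot (memory state is not needed for an image backup).
WaitForTask(vm.CreateSnapshot_Task(name="backup-temp",
                                   description="temporary backup snapshot",
                                   memory=False, quiesce=True))

# ... a backup proxy would read the frozen virtual disks here ...

# 2. Remove the temporary snapshot once the image has been captured.
#    (Assumes "backup-temp" is the only snapshot on this VM.)
snap = vm.snapshot.rootSnapshotList[0].snapshot
WaitForTask(snap.RemoveSnapshot_Task(removeChildren=False))

Disconnect(si)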

Avamar provides two ways for restoring virtual machine data: image restores, which can restore an entire image or selected drives; and file-level restores, which can restore specific folders or files.

However, file-level restores are only supported on Windows and Linux. In addition, they have the following limitations:

1. File-level restores are more resource intensive and are best used to restore a relatively small amount of data. In fact, you cannot restore more than 5,000 folders or files.

2. The latest VMware Tools must be installed on the target virtual machine in order to successfully restore files and folders.

3. Dynamic disks, GPT disks, deduplicated NTFS, ReFS, extended partitions, bootloaders, and encrypted or compressed partitions are not supported virtual disk configurations.

4. ACLs are not restored.

5. Symbolic links cannot be restored.

6. When restoring files or folders to the original virtual machine, only SCSI disks are supported; IDE disks are not supported.

If you must restore folders or files and you run into the limitations mentioned above, you can restore an entire image or selected drives to a temporary location (for example, a new temporary virtual machine), then copy those files and folders to the desired location after the restore.

Integrating Riverbed SteelFusion with EMC VNX

SteelFusion is an appliance-based IT infrastructure solution for remote offices. SteelFusion eliminates the need for physical servers, storage, and backup infrastructure at remote offices by consolidating them into the data center. Virtual servers located at the data center are projected to the branch offices, giving branch office users access to servers and data with LAN-like performance.

SteelFusion uses VMware to project virtual servers and data to the branch office. A robust VMware infrastructure usually sits on Fibre Channel block-based storage such as EMC VNX. The advantage of using EMC VNX, or any robust storage platform, is its data protection features, such as redundancy and snapshots.

To protect data via EMC VNX array-based snapshots, and so that data can be backed up and restored using third-party backup software, follow these guidelines:

1. When configuring storage and LUNs, use RAID Groups instead of Storage Pools. Storage Pool snapshots do not currently integrate well with SteelFusion.

2. Create reserved LUNs to be used for snapshots.

3. When adding the VNX storage array information to the SteelFusion Core appliance, make sure to select ‘Type: EMC CLARiiON’, not EMC VNX.

For more information, consult the Riverbed documentation.

Using Isilon as VMware Datastore

I recently implemented a VMware farm utilizing Isilon as a backend datastore. Although Isilon’s specialty is sequential-access I/O workloads such as file services, it can also be used as storage for random-access I/O workloads, such as a datastore for VMware farms. I only recommend it, though, for low- to mid-tier VMware farms.

Isilon scale-out storage supports both iSCSI and NFS implementations. However, the NFS implementation is far superior to iSCSI. The advantages of NFS are:

1. simplicity – managing virtual machines at the file level is simpler than managing LUNs,
2. rapid storage provisioning – instead of managing LUNs, all VMDK files can be stored on a single file export, eliminating the need to balance workloads across multiple LUNs, and
3. higher storage utilization rates – VMDK files are thin-provisioned by default when using a NAS-based datastore.

In addition, Isilon only supports software iSCSI initiators.

Isilon supports VAAI (vStorage APIs for Array Integration), which offloads I/O-intensive tasks from the ESXi host directly to the Isilon storage cluster (such as Storage vMotion, virtual disk cloning, NAS-based VM snapshots, and instant VM provisioning), resulting in overall faster completion times. Isilon also supports VASA (vStorage APIs for Storage Awareness), which presents the underlying storage capabilities to vCenter.

When using an NFS datastore, it is very important to follow the implementation best practices, which can be found here. Some of the important best practices are:

1. Connect the Isilon nodes and ESXi hosts to the same physical switches on the same subnet. The underlying network infrastructure should also be redundant (e.g. redundant switches).
2. Use 10 GbE connectivity to achieve optimal performance.
3. Segment NFS traffic so that other traffic, such as virtual machine network traffic or management network traffic, does not share bandwidth with NFS traffic.
4. Use separate vSwitches for NFS traffic on the VMware hosts and use dedicated NICs for NFS storage.
5. Use SmartConnect zones to load balance across the Isilon nodes, as well as to provide dynamic failover and failback of client connections across the Isilon storage nodes.
6. Enable the VASA features and functions to simplify and automate storage resource management.
7. To achieve higher aggregate I/O, create multiple datastores, with each datastore mounted via a separate FQDN/SmartConnect pool and network interface on the Isilon cluster (a minimal sketch follows this list).
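Here is a minimal pyVmomi (Python) sketch of best practice 7: mounting several NFS datastores on every ESXi host, each through its own SmartConnect FQDN, so that I/O is spread across multiple Isilon interfaces. The SmartConnect zone names, export paths, and vCenter details are placeholders.

# Mount multiple Isilon NFS datastores, one per SmartConnect FQDN, on each ESXi host.
# Zone names, export paths, and vCenter details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# One SmartConnect zone (and IP pool) per datastore spreads connections across Isilon nodes.
smartconnect_zones = ["vmds1.isilon.example.com",
                      "vmds2.isilon.example.com",
                      "vmds3.isilon.example.com"]

host_view = content.viewManager.CreateContainerView(content.rootFolder,
                                                    [vim.HostSystem], True)
for host in host_view.view:
    for i, fqdn in enumerate(smartconnect_zones, start=1):
        spec = vim.host.NasVolume.Specification(
            remoteHost=fqdn,
            remotePath=f"/ifs/vmware/datastore{i}",
            localPath=f"isilon-ds{i}",
            accessMode="readWrite",
        )
        host.configManager.datastoreSystem.CreateNasDatastore(spec)
host_view.DestroyView()

Disconnect(si)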

Delivering Centralized Data with Local Performance to Remote Offices

One of the challenges large companies face is how to build and support IT infrastructure (network, server, and storage) for remote offices. Remote offices usually do not have full-time IT employees, because it is cost-prohibitive to employ full-time IT personnel to support a small IT infrastructure. In addition, large companies are very protective of their data and want it centralized at their data centers, sitting on a reliable and well-protected infrastructure.

However, centralizing the infrastructure and data may lead to poor performance at the local site, especially if WAN bandwidth and latency are poor.

Enter Riverbed SteelFusion. Riverbed SteelFusion is a branch converged infrastructure solution that centralizes data in the data center and delivers local performance and nearly instant recovery at the branch. It does this by consolidating branch servers, storage, networking, and virtualization infrastructure into a single solution.

With SteelFusion, a virtual machine that will act as a branch file or application server is provisioned at the data center, where a SteelFusion Core is located, and is projected to the branch via the SteelFusion Edge located at the branch office.

SteelFusion has the following advantages:

1. No more costly installation and maintenance of servers and storage at the branch office.
2. LAN performance in the branch, which makes end users happy.
3. Centralized management of storage, servers, and data at the data center.
4. No more costly branch backup (such as backup hardware and software, tape media, backup management, off-site handling, etc.).
5. Improved recovery of servers and applications.
6. Quick provisioning of servers.
7. Data is secure in the data center, in case the branch office suffers a disaster or theft.

Delivering data and applications located at the data center to branch and remote offices while maintaining LAN performance can be accomplished by using Riverbed SteelFusion.