Category Archives: Virtualization

Hyper Converged Infrastructure (HCI)

Companies that want to retain control of their infrastructure and data (for regulatory, security, application-requirement, and other reasons), but still want the benefits of the public cloud – such as virtually unlimited scalability, efficient resource utilization, the cost-effectiveness of pooled compute and storage resources, and easy on-demand provisioning – can benefit tremendously from using hyperconverged infrastructure (HCI) on their premises.

Hyperconverged infrastructure consolidates compute, storage, and networking in a single box, creating a modular system that can be scaled linearly. HCI takes advantage of commodity hardware (e.g. x86 systems) and advances in storage and networking technologies (e.g. flash storage providing high IOPS, and 10 GbE/40 GbE high-speed Ethernet).

HCI uses virtualization technology (such as VMware) to aggregate compute, storage and network. It eliminates the need for dedicated SAN and storage arrays by pooling the storage of each node and defining it via software. In addition, HCI usually offers unified management which eliminates the different management silos between compute, storage and network.

There are a variety of HCI solutions to choose from. You can build one yourself from commodity hardware, virtualization software (e.g. VMware vSphere), and software-defined storage (e.g. VMware vSAN). You can also buy hyperconverged appliances from vendors such as Nutanix and Dell EMC (VxRail). Hyperconverged rack-scale systems for large enterprises, such as the Dell EMC VxRack, are available as well.

There are numerous advantages to using HCI:

1. Faster time to deploy – you can easily add compute, storage and network, and scale it up and out to meet business demands. This in turn reduces development cycles for new apps and services.

2. Simplified management and operations – compute, storage and network provisioning can be done by a unified team eliminating the network, compute or storage silos. Many provisioning and configuration tasks can now be scripted and automated.

3. Cost savings – the initial investment is usually lower. Your company can start small and scale incrementally as you grow, adding smaller amounts of compute or storage capacity as required, versus buying large bundles of software and storage arrays. Operational expenses are also much lower, since there is no longer a SAN to manage.

4. Smaller data center footprint – which means lower power and cooling requirements. HCI can often consolidate 16 data center racks into one.

HCI is the ideal infrastructure solution for on-premise data centers.

Ensuring Reliability of Your Apps on the Amazon Cloud

On February 28, 2017, the Amazon Simple Storage Service (S3) in the Northern Virginia (US-EAST-1) Region went down due to an incorrect command issued by a technician. Many websites and applications that rely on S3 went down with it. Amazon published the full details of the outage in its post-event summary.

While Amazon Web Services (AWS) could have prevented this outage, a well-architected site should not have been affected by it. AWS allows subscribers to use multiple Availability Zones (and even redundancy across multiple Regions), so that when one goes down, applications can continue running on the others.

It is very important to have a well-architected framework when using the cloud. AWS provides one that is based on five pillars:

  • Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
  • Reliability – The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
  • Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
  • Cost Optimization – The ability to avoid or eliminate unneeded cost or suboptimal resources.
  • Operational Excellence – The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

For the companies affected by the outage, applying the “reliability” pillar (by utilizing multiple Availability Zones, or by replicating to different Regions) could have shielded them from it.
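As a concrete illustration of the reliability pillar, the sketch below builds the configuration dict that S3 cross-region replication expects. The role ARN and bucket names are hypothetical placeholders; in practice you would pass the result to boto3's s3.put_bucket_replication, and versioning must already be enabled on both buckets.

```python
def replication_config(role_arn: str, dest_bucket: str) -> dict:
    """Build an S3 cross-region replication configuration dict.

    This is the shape accepted by boto3's
    s3.put_bucket_replication(Bucket=..., ReplicationConfiguration=cfg).
    """
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [{
            "ID": "replicate-all",
            "Status": "Enabled",
            "Prefix": "",  # empty prefix = replicate every object
            "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
        }],
    }

# Hypothetical role ARN and destination bucket in another region.
cfg = replication_config(
    "arn:aws:iam::123456789012:role/s3-replication",
    "my-app-backup-us-west-2",
)
```

With a rule like this in place, objects written to the source bucket are asynchronously copied to the destination region, so a regional S3 disruption does not take the data offline.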

Building an Enterprise Private Cloud

Businesses use public clouds such as Amazon AWS, VMware vCloud, and Microsoft Azure because they are relatively easy to use, fast to deploy, allow buying resources on demand, and, most importantly, are relatively cheap (there is no operational overhead in building, managing, and refreshing an on-premise infrastructure). But there are downsides to the public cloud, such as security and compliance concerns, diminished control of data, data locality issues, and network latency and bandwidth constraints. On-premise infrastructure is still the most cost-effective option for regulated data and for applications with predictable workloads (such as ERP, local databases, and end-user productivity tools).

However, businesses and end users expect and demand cloud-like services from their IT departments for the applications that are best suited to run on-premise. IT departments should therefore build and deliver an infrastructure that has the characteristics of a public cloud (fast, easy, on-demand, elastic, etc.) and the reliability and security of on-premise infrastructure – an enterprise private cloud.

Building an enterprise cloud is now possible because of the following technology advancements:

  1. Hyper-converged infrastructure
  2. Orchestration tools
  3. Flash storage

When building an enterprise cloud, keep in mind the following:

  1. It should be 100% virtualized.
  2. There should be a mechanism for self-service provisioning, monitoring, billing, and chargeback.
  3. A lot of operational functions should be automated.
  4. Compute and storage can be scaled-out.
  5. It should be resilient – no single point of failure.
  6. Security should be integrated in the infrastructure.
  7. There should be a single management platform.
  8. Data protection and disaster recovery should be integrated in the infrastructure.
  9. It should be application-centric instead of infrastructure-centric.
  10. Finally, it should be able to support legacy applications as well as modern apps.

Hyper-converged Infrastructure: Hype or For Real?

One of the hottest emerging technologies in IT is hyper-converged infrastructure (HCI). What is the hype all about? Is it here to stay?

As defined by TechTarget, hyper-converged infrastructure (HCI) is a system with a software-centric architecture that tightly integrates compute, storage, networking, and virtualization resources (hypervisor, virtual storage, virtual networking), along with other technologies (such as data protection and deduplication), in a commodity hardware box (usually x86) supported by a single vendor.

Hyper-convergence grew out of the concept of converged infrastructure, with engineers taking it a little further – a very small hardware footprint, tight integration of components, and simplified management. It is a relatively new technology; on the technology adoption curve, it is still at the early-adopter stage.

Nutanix was the first vendor to offer a hyper-converged solution, followed by SimpliVity and Scale Computing. Not to be outdone, VMware developed EVO:RAIL, then opened it up for hardware vendors to OEM the product. Major vendors, including EMC, NetApp, Dell, HP, and Hitachi, began selling EVO:RAIL products.

One of the best HCI products that I’ve seen is VxRail. Jointly engineered by VMware and EMC, the “VxRail appliance family takes full advantage of VMware Hyper-Converged Software capabilities and provides additional hardware and lifecycle management features and rich EMC data services, delivered in a turnkey appliance with integrated support.”

What are the advantages of HCI and where can it be used? Customers who want to start small and scale out over time will find HCI very attractive. It is a perfect fit for small to medium-sized companies that want to build their own data center without spending huge amounts of money. It is simple (it eliminates a lot of hardware clutter) and highly scalable (it grows easily by adding small, standardized x86 nodes). Since it is scalable, it eases the burden of growth. Finally, its performance is comparable to that of big infrastructures, because leveraging SSD storage and bringing the data close to the compute enables high IOPS at very low latencies.


References:
1. TechTarget
2. VMware Hyper-Converged Infrastructure: What’s All the Fuss About?

Backing Up Virtual Machines Using Avamar Image-Level Backup

Avamar can back up virtual machines using either guest-level backup or image-level backup.

The advantages of VMware guest backup are that it allows backup administrators to leverage identical backup methods for physical and virtual machines, which reduces administrative complexity, and it provides the highest level of data deduplication, which reduces the amount of backup data across the virtual machines.

The second way to back up virtual machines is via Avamar image-level backup, which is faster and more efficient, and also supports file-level restores.

Avamar integrates with VMware VADP (vStorage API for Data Protection) to provide image level backups. Integration is achieved through the use of the Avamar VMware Image plug-in. Simply put, the VMware Image backup creates a temporary snapshot of the virtual machine, and uses a virtual machine proxy to perform the image backup.

Backup can occur while the virtual machines are powered on or off. Since the backup is handled by a proxy, CPU cycles of the target virtual machines are not used.
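The snapshot-then-proxy flow described above can be sketched as follows. The classes and method names are hypothetical stubs for illustration, not the Avamar or vSphere API; the point is the ordering: take a temporary snapshot, have the proxy read the disks, then always remove the snapshot.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachine:
    """Hypothetical stand-in for a vSphere VM."""
    name: str
    disks: list = field(default_factory=lambda: ["disk0.vmdk"])
    snapshots: list = field(default_factory=list)

    def create_snapshot(self, label: str) -> str:
        # A temporary snapshot gives the proxy a stable image to read.
        self.snapshots.append(label)
        return label

    def remove_snapshot(self, label: str) -> None:
        self.snapshots.remove(label)

def image_backup(vm: VirtualMachine, backup_store: list) -> None:
    """Sketch of the image-backup flow: snapshot, proxy copy, cleanup."""
    snap = vm.create_snapshot("avamar-temp")
    try:
        # The proxy (not the guest) reads each virtual disk from the
        # snapshot, so the target VM spends no CPU cycles on the backup.
        for disk in vm.disks:
            backup_store.append((vm.name, snap, disk))
    finally:
        vm.remove_snapshot(snap)  # the temporary snapshot is always deleted

store: list = []
vm = VirtualMachine("app01")
image_backup(vm, store)
```

Because the copy is driven from the snapshot, the flow works the same whether the VM is powered on or off.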

Avamar provides two ways for restoring virtual machine data: image restores, which can restore an entire image or selected drives; and file-level restores, which can restore specific folders or files.

However, file-level restores are only supported on Windows and Linux. In addition, file-level restore has the following limitations:

1. File-level restores are more resource-intensive and are best used to restore relatively small amounts of data. In fact, you cannot restore more than 5,000 folders or files.

2. The latest VMware Tools must be installed on the target virtual machine in order to successfully restore files and folders.

3. The following virtual disk configurations are not supported: dynamic disks, GPT disks, deduplicated NTFS, ReFS, extended partitions, bootloaders, and encrypted or compressed partitions.

4. ACLs are not restored.

5. Symbolic links cannot be restored.

6. When restoring files or folders to the original virtual machine, only SCSI disks are supported; IDE disks are not supported.

If you must restore folders or files and you run into the limitations mentioned above, you can restore an entire image or selected drives to a temporary location (for example, a new temporary virtual machine), then copy those files and folders to the desired location after the restore.

Integrating Riverbed Steelfusion with EMC VNX

SteelFusion is an appliance-based IT infrastructure for remote offices. It eliminates the need for physical servers, storage, and backup infrastructure at remote offices by consolidating them in the data center. Virtual servers located at the data center are projected to the branch offices, giving branch office users access to servers and data with LAN-like performance.

SteelFusion uses VMware to project virtual servers and data to the branch office. A robust VMware infrastructure usually sits on Fibre Channel block-based storage such as EMC VNX. The advantage of using EMC VNX, or any robust storage platform, is its data protection features, such as redundancy and snapshots.

To protect data via EMC VNX array-based snapshots, and to allow data to be backed up and restored using third-party backup software, follow these guidelines:

1. When configuring storage and LUNs, use RAID Groups instead of Storage Pools. Storage Pool snapshots do not currently integrate well with SteelFusion.

2. Create reserved LUNs to be used for snapshots.

3. When adding the VNX storage array information to the SteelFusion Core appliance, make sure to select ‘Type: EMC CLARiiON’, not EMC VNX.

For more information, consult the Riverbed documentation.

Using Isilon as VMware Datastore

I recently implemented a VMware farm utilizing Isilon as the backend datastore. Although Isilon’s specialty is sequential-access I/O workloads such as file services, it can also be used as storage for random-access I/O workloads, such as datastores for VMware farms. I only recommend it, though, for low- to mid-tier VMware farms.

Isilon scale-out storage supports both iSCSI and NFS implementations. However, the NFS implementation is far superior to iSCSI. The advantages of NFS are:

1. simplicity – managing virtual machines at the file level is simpler than managing LUNs,
2. rapid storage provisioning – instead of managing LUNs, all VMDK files may be stored on a single file export, eliminating the need to balance workloads across multiple LUNs,
3. higher storage utilization rates – VMDK files are thin-provisioned by default when using NAS-based datastore.

In addition, for iSCSI, Isilon supports only software initiators.

Isilon supports VAAI (vStorage APIs for Array Integration), which offloads I/O-intensive tasks from the ESXi host directly to the Isilon storage cluster (such as storage vMotion, virtual disk cloning, NAS-based VM snapshots, and instant VM provisioning), resulting in faster overall completion times. Isilon also supports VASA (vStorage APIs for Storage Awareness), which presents the underlying storage capabilities to vCenter.

When using an NFS datastore, it is very important to follow the implementation best practices published by the vendor. Some of the important ones are:

1. Connect the Isilon nodes and ESXi hosts to the same physical switches on the same subnet. The underlying network infrastructure should also be redundant (e.g. redundant switches).
2. Use 10 GbE connectivity to achieve optimal performance.
3. Segment NFS traffic so that other traffic, such as virtual machine network traffic or management traffic, does not share bandwidth with it.
4. Use separate vSwitches for NFS traffic on the VMware hosts, with dedicated NICs for NFS storage.
5. Use SmartConnect zones to load balance across multiple Isilon nodes, and to provide dynamic failover and failback of client connections across the Isilon storage nodes.
6. Enable the VASA features and functions to simplify and automate storage resource management.
7. To achieve higher aggregate I/O, create multiple datastores, with each datastore mounted via a separate FQDN/SmartConnect pool and network interface on the Isilon cluster.
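To make the last point concrete, the sketch below generates one NFS mount command per SmartConnect FQDN/datastore pair, so each datastore rides a separate network interface on the cluster. The FQDNs, export paths, and datastore names are hypothetical; `esxcli storage nfs add` is the standard ESXi command for mounting an NFS datastore.

```python
def nfs_mount_commands(pools: dict) -> list:
    """Generate one esxcli NFS mount per SmartConnect FQDN.

    Keys are (hypothetical) SmartConnect zone FQDNs on the Isilon
    cluster; values are (export path, datastore name) pairs. Giving
    each datastore its own FQDN spreads I/O across interfaces.
    """
    return [
        f"esxcli storage nfs add --host {fqdn} --share {share} --volume-name {name}"
        for fqdn, (share, name) in sorted(pools.items())
    ]

cmds = nfs_mount_commands({
    "ds1.isilon.example.com": ("/ifs/vmware/ds1", "isilon-ds1"),
    "ds2.isilon.example.com": ("/ifs/vmware/ds2", "isilon-ds2"),
})
for c in cmds:
    print(c)
```

The same commands would be run on every ESXi host in the cluster, keeping the datastore names identical across hosts.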

Delivering Centralized Data with Local Performance to Remote Offices

One of the challenges large companies face is how to build and support IT infrastructure (network, server, and storage) for remote offices. Remote offices usually do not have full-time IT employees, because it is cost-prohibitive to employ full-time IT personnel to support a small IT infrastructure. In addition, large companies are very protective of their data, and want it centralized at their data centers on a reliable and well-protected infrastructure.

However, centralizing the infrastructure and data may lead to poor performance at the local site, especially if WAN bandwidth is limited or latency is high.

Enter Riverbed SteelFusion. SteelFusion is a branch converged infrastructure solution that centralizes data in the data center while delivering local performance and near-instant recovery at the branch. It does this by consolidating branch servers, storage, networking, and virtualization infrastructure into a single solution.

With SteelFusion, a virtual machine that will act as a branch file or application server is provisioned at the data center, where a SteelFusion Core is located, and is projected to the branch via the SteelFusion Edge located at the branch office.

SteelFusion has the following advantages:

1. No more costly installation and maintenance of servers and storage at the branch office.
2. LAN performance in the branch, which makes end users happy.
3. Centralized management of storage, servers, and data at the data center.
4. No more costly branch backup (backup hardware and software, tape media, backup management, off-site handling, etc).
5. Improved recovery of servers and applications.
6. Quick provisioning of servers.
7. Data stays secure in the data center, in case the branch office suffers a disaster or theft.

Delivering data and applications located at the data center to branch and remote offices, while maintaining local-area performance, can be accomplished using Riverbed SteelFusion.

Redefining Data Center In A Box

Data center in a box is traditionally defined as a “type of data center in which portable, mobile, and modular information nodes are self-contained within a cargo container. It is designed and packaged for quick deployment and acquisition of data center solutions in organizations or facilities, including remote off-site locations.” A data center in a box usually contains equipment from large storage, compute, and network vendors such as EMC, NetApp, Dell, and Cisco, pieced together to form the IT infrastructure. VCE (the Virtual Computing Environment Company), for instance, offers Vblock, a bundled product containing EMC storage, Cisco servers, and VMware software. NetApp has a similar offering called FlexPod.

But innovative newer companies such as SimpliVity, Nutanix, and Scale Computing are changing the definition of the data center in a box. They are creating purpose-built products from the ground up that incorporate not just compute, storage, and networking, but additional services such as data deduplication, WAN optimization, and backup in a box.

For instance, SimpliVity’s product, OmniCube, is “a powerful data center building block that assimilates the core functions of server, storage and networking in addition to a wide range of advanced functionality including: native VM-level backup, WAN optimization, bandwidth efficient replication for DR, cache accelerated performance, and cloud integration.”

These products will further simplify the design, implementation, and operation of IT infrastructure. With these boxes, there is no storage area network (SAN) to manage, nor additional appliances such as WAN accelerators to deploy. A few virtual machine (VM) administrators can manage all the boxes in a cluster from the VMware server virtualization management user interface.

Data center in a box will continue to evolve and will change how we view and manage IT infrastructure for years to come.

Best Practices for Using NFS Datastore on VMware

More companies are now deploying VMware with IP-based shared storage (NAS). NAS storage is cheaper than Fibre Channel (SAN) storage because there is no separate Fibre Channel (FC) network to maintain. More importantly, IP-based storage performance and stability are now comparable to FC-based storage.

Other advantages of using IP-based storage, specifically NFS, are thin provisioning, deduplication, and the ease of backing up and restoring virtual machines and files on a virtual disk via array-based snapshots. In addition, IP-based storage is easier to maintain.

VMware published a whitepaper on the best practices for running VMware vSphere on network-attached storage (NAS) using NFS. Following these best practices when deploying NFS-based storage is very important for a stable and optimized VMware environment. Here are the important things to consider:

On the network side, the local area network (LAN) carrying the NFS traffic needs to be designed for availability, downtime avoidance, isolation, and failover:

1. NFS traffic should be on a separate physical LAN, or at least on a separate VLAN.
2. Use private (non-routable) IP addresses. This also addresses a security concern, since NFS traffic is not encrypted and NFS exports are mounted with root privileges on the VMware host.
3. Build in redundancy by teaming the NICs on the VMware host, configuring LACP, and using two LAN switches.
4. Use jumbo frames.
5. Use 10 GbE.

On the storage array side, the storage controllers must be redundant in case the primary one fails. In addition:

1. Configure the NFS exports to be persistent (e.g. exportfs -p).
2. Install the VAAI and other plug-in tools from the storage vendor. For instance, NetApp has the Virtual Storage Console (VSC) plug-in that can be installed on the vCenter.
3. Refer to the storage vendor best practices guide. For instance, NetApp and EMC published their own best practice whitepapers for using NFS on VMware.

On the VMware hosts, the following configuration should be implemented:

1. Use the same datastore name across all hosts.
2. Set the NIC teaming failback option to “No”. If there is intermittent behavior in the network, this prevents the flip-flopping of which NIC card is in use.
3. If you increase the maximum number of concurrent mount points (from the default of 8), increase Net.TcpipHeapSize as well. For instance, if 32 mount points are used, increase Net.TcpipHeapSize to 30 MB.
4. Set the following VMware High Availability option (NFS heartbeats are used to determine whether an NFS volume is still available):
NFS.HeartbeatFrequency=12
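The host-side tuning above can be captured as a simple key/value checklist. The setting names are real ESXi advanced options, but treat the values as illustrative: they follow the figures given in this post (32 mount points, a 30 MB TCP/IP heap, a 12-second heartbeat), and exact defaults vary by vSphere version.

```python
# Advanced settings to apply per ESXi host (via the vSphere client or
# esxcfg-advcfg). Values mirror the guidance in the text above.
nfs_advanced_settings = {
    "NFS.MaxVolumes": 32,          # concurrent NFS mounts (default is 8)
    "Net.TcpipHeapSize": 30,       # MB; raise together with the mount count
    "NFS.HeartbeatFrequency": 12,  # seconds between NFS liveness heartbeats
}

def heap_raised_with_mounts(settings: dict, default_volumes: int = 8) -> bool:
    """Sanity check: if mounts exceed the default, the heap must be raised."""
    return (settings["NFS.MaxVolumes"] <= default_volumes
            or settings["Net.TcpipHeapSize"] >= 30)
```

Keeping the settings in one place like this makes it easy to script them identically across every host in the cluster, which matters because mismatched values can cause datastores to behave differently from host to host.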

When configured properly, IP-based storage, specifically NFS, provides a very solid storage platform for VMware.