Category Archives: Data Storage

Integrating Riverbed Steelfusion with EMC VNX

SteelFusion is an appliance-based IT infrastructure solution for remote offices. It eliminates the need for physical servers, storage, and backup infrastructure at remote offices by consolidating them into the data center. Virtual servers located at the data center are projected to the branch offices, giving branch office users access to servers and data with LAN-like performance.

SteelFusion uses VMware to project virtual servers and data to the branch office. A robust VMware infrastructure usually consists of Fibre Channel block-based storage such as EMC VNX. The advantage of using EMC VNX, or any robust storage platform, is its data protection features, such as redundancy and snapshots.

To protect data using EMC VNX array-based snapshots, and to allow data to be backed up and restored using third-party backup software, the following guidelines must be followed:

1. When configuring storage and LUNs, use RAID Groups instead of Storage Pools. Storage Pool snapshots do not currently integrate well with SteelFusion (see the example commands after this list).

2. Create reserved LUNs (the Reserved LUN Pool) to be used for snapshots.

3. When adding the VNX storage array information to the SteelFusion Core appliance, make sure to select ‘Type: EMC CLARiiON’, not EMC VNX.
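As a minimal sketch of item 1, the RAID Group and its LUN can also be created from the command line with naviseccli instead of Unisphere. The SP address, RAID group ID, disk IDs (in Bus_Enclosure_Disk format), LUN ID, and size below are placeholders; verify the exact syntax against the VNX CLI reference for your release:

naviseccli -h <SP_A_IP> createrg 10 0_0_4 0_0_5 0_0_6 0_0_7 0_0_8
naviseccli -h <SP_A_IP> bind r5 20 -rg 10 -sq gb -cap 500

The first command creates RAID group 10 from five disks; the second binds a 500 GB RAID 5 LUN with LUN ID 20 on that RAID group.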

For more information, consult the Riverbed documentation.

Migrating Data to Isilon NAS

Isilon has made it easy to migrate data from NetApp filers to Isilon clusters. It provides a utility called isi_vol_copy that copies files, including their metadata and ACL (access control list) information, via the NDMP protocol. The utility runs on the Isilon command line interface, so there is no need for a separate host running migration tools such as robocopy, which can be slower and more difficult to manage.

isi_vol_copy is versatile enough to do a full baseline copy of the data and then, using the incremental switch, copy only the deltas on a daily basis until the day of the cutover. Since Isilon’s operating system is BSD-based, the incremental copy jobs can be scheduled via cron.

The load can also be distributed by running the isi_vol_copy utility on multiple nodes on the Isilon cluster.

The syntax of the command is:

isi_vol_copy <source_filer>:<directory> -full|incr -sa username:password <destination_directory_on_Isilon>
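For example, a first full baseline copy followed by nightly incremental runs might look like the commands below. The filer name, source path, NDMP credentials, and target path are placeholders, and in a crontab entry you may need the full path to the utility:

isi_vol_copy netapp01:/vol/vol1/home -full -sa ndmpuser:password /ifs/data/home
isi_vol_copy netapp01:/vol/vol1/home -incr -sa ndmpuser:password /ifs/data/home

0 23 * * * isi_vol_copy netapp01:/vol/vol1/home -incr -sa ndmpuser:password /ifs/data/home

The last line is a sample crontab entry that runs the incremental copy nightly at 11 PM until the cutover.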

Using Isilon as VMware Datastore

I recently implemented a VMware farm using Isilon as the backend datastore. Although Isilon’s specialty is sequential-access I/O workloads such as file services, it can also be used as storage for random-access I/O workloads, such as a datastore for a VMware farm. I only recommend it, though, for low- to mid-tier VMware farms.

Isilon scale-out storage supports both iSCSI and NFS implementations. However, the NFS implementation is far superior to iSCSI. The advantages of NFS are:

1. simplicity – managing virtual machines at the file level is simpler than managing LUNs,
2. rapid storage provisioning – instead of managing LUNs, all VMDK files may be stored on a single file export, eliminating the need to balance workloads across multiple LUNs,
3. higher storage utilization rates – VMDK files are thin-provisioned by default when using NAS-based datastore.

In addition, Isilon only supports software iSCSI initiators.

Isilon supports VAAI (vStorage APIs for Array Integration), which offloads I/O-intensive tasks (such as Storage vMotion, virtual disk cloning, NAS-based VM snapshots, and instant VM provisioning) from the ESXi host directly to the Isilon storage cluster, resulting in faster overall completion times. Isilon also supports VASA (vStorage APIs for Storage Awareness), which presents the underlying storage capabilities to vCenter.
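For NFS datastores, the VAAI offloads require a NAS VAAI plugin to be installed on each ESXi host. As a quick sanity check (the exact plugin name varies by vendor and version, so treat this as a sketch), you can list the installed VIBs and look for the NAS plugin:

esxcli software vib list | grep -i nas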

When using an NFS datastore, it is very important to follow the implementation best practices published by EMC Isilon. Some of the important best practices are:

1. Connect the Isilon nodes and ESXi hosts to the same physical switches on the same subnet. The underlying network infrastructure should also be redundant, for example with redundant switches.
2. Use 10GbE connectivity to achieve optimal performance.
3. Segment NFS traffic so that other traffic, such as virtual machine or management network traffic, does not share bandwidth with NFS traffic.
4. Use separate vSwitches for NFS traffic on the VMware hosts and use dedicated NICs for NFS storage.
5. Use SmartConnect zones to load balance across multiple Isilon nodes, as well as to provide dynamic failover and failback of client connections across the Isilon storage nodes.
6. Enable the VASA features and functions to simplify and automate storage resource management.
7. To achieve higher aggregate I/O, create multiple datastores, with each datastore mounted via a separate FQDN/SmartConnect pool and network interface on the Isilon cluster (see the example commands below).
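As a minimal sketch of item 7, each datastore can be mounted from the ESXi command line against a different SmartConnect zone name. The zone names, export paths, and datastore names below are hypothetical placeholders:

esxcli storage nfs add --host=vmware-ds1.isilon.example.com --share=/ifs/vmware/ds1 --volume-name=isilon-ds1
esxcli storage nfs add --host=vmware-ds2.isilon.example.com --share=/ifs/vmware/ds2 --volume-name=isilon-ds2

Each mount then appears as its own datastore in vCenter, and I/O to the two datastores is spread across different Isilon network interfaces.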

2015 Storage Trends

The world of data storage has seen significant innovation over the years. This year, companies will continue to adopt these storage technologies and storage vendors will continue to innovate and develop exciting products and services. Here are my top 5 storage trends for this year:

1. Software-defined storage (SDS), or storage virtualization, will start to see wide adoption for tier-2 and tier-3 storage. Hyperconverged appliances such as Nutanix and software-based offerings such as VMware Virtual SAN will find their way into companies looking for simple converged solutions.

2. The cost of flash storage will continue to drop, driving its deployment for tier-1, I/O-intensive applications such as VDI. Flash will also continue to be used as server-side flash and in hybrid or tiered appliances.

3. Small and medium companies will make headway in utilizing the cloud for storage, but mostly as backup and sync-and-share applications.

4. Storage vendors will release products with integrated data protection including encryption, archiving, replication, backup, and disaster recovery.

5. Finally, the demand for storage will continue to grow because of the explosion of big data, the “internet of things”, and large enterprises building redundant data centers.

Delivering Centralized Data with Local Performance to Remote Offices

One of the challenges large companies face is how to build and support IT infrastructure (network, server, and storage) for remote offices. Remote offices usually do not have full-time IT employees because it is typically cost prohibitive to employ full-time IT personnel to support a small IT infrastructure. In addition, large companies are very protective of their data and want it centralized at their data centers, sitting on reliable, well-protected infrastructure.

However, centralizing the infrastructure and data may lead to poor performance at the local site, especially if the WAN bandwidth and latency are poor.

Enter Riverbed SteelFusion. Riverbed SteelFusion is a branch converged infrastructure solution that centralizes data in the data center and delivers local performance and nearly instant recovery at the branch. It does this by consolidating branch servers, storage, networking, and virtualization infrastructure into a single solution.

With SteelFusion, a virtual machine that will act as a branch file or application server is provisioned at the data center, where the SteelFusion Core is located, and is projected to the branch via the SteelFusion Edge located at the branch office.

SteelFusion has the following advantages:

1. No more costly installation and maintenance of servers and storage at the branch office.
2. LAN performance in the branch, which will make end users happy.
3. Centralized management of storage, servers, and data at the data center.
4. No more costly branch backup (such as backup hardware and software, tape media, backup management, off-site handling, etc.).
5. Improved recovery of servers and applications.
6. Quick provisioning of servers.
7. Data is secure in the data center in case the branch office suffers a disaster or theft.

Delivering data and applications located at the data center to branch and remote offices, while maintaining local-area performance, can be accomplished with Riverbed SteelFusion.

Installing High Performance Computing Cluster

A high performance computing (HPC) cluster is usually needed to analyze data from scientific instruments. For instance, I recently set up an HPC cluster running Red Hat Enterprise Linux 6.5, consisting of several nodes, to analyze data generated by a gene sequencer.

Basically, to build the cluster, you need several machines with fast multi-core processors, lots of memory, a high-speed network to connect the nodes, and large, fast data storage. You also need to install an operating system – such as Red Hat or CentOS Linux – and configure tools and utilities such as kickstart, SSH, NFS, and NIS. Finally, cluster or queueing software is needed to manage jobs and fully utilize the compute resources. One commonly used open source option is Son of Grid Engine (SGE), an offshoot of the popular Sun Grid Engine.

An excellent write-up on setting up an HPC cluster can be found in this ADMIN Magazine article.

The latest Son of Grid Engine version (as of this writing) is 8.1.7 and can be downloaded from the Son of Grid Engine Project Site.

Since the environment I set up runs Red Hat Enterprise Linux 6.5, I downloaded and installed the following RPMs:

gridengine-8.1.7-1.el6.x86_64.rpm
gridengine-execd-8.1.7-1.el6.x86_64.rpm
gridengine-qmaster-8.1.7-1.el6.x86_64.rpm
gridengine-qmon-8.1.7-1.el6.x86_64.rpm
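These can be installed with yum. This is a sketch assuming the RPMs were downloaded to the current directory; the first command applies to the qmaster host, the second to each execution node:

yum localinstall gridengine-8.1.7-1.el6.x86_64.rpm gridengine-qmaster-8.1.7-1.el6.x86_64.rpm gridengine-qmon-8.1.7-1.el6.x86_64.rpm
yum localinstall gridengine-8.1.7-1.el6.x86_64.rpm gridengine-execd-8.1.7-1.el6.x86_64.rpm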

After installing the RPMs, I installed and configured the qmaster, then installed the execution daemon (execd) on all the nodes. I also ran a simple test to verify that the cluster was working by issuing the following commands:

$ qsub /opt/sge/examples/jobs/simple.sh
$ qstat
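Beyond the bundled example script, a minimal job script of your own might look like the following. The script name, job name, and output file names are hypothetical:

#!/bin/bash
#$ -N test_job          # job name shown in qstat
#$ -cwd                 # run the job from the current working directory
#$ -o test_job.out      # standard output file
#$ -e test_job.err      # standard error file
echo "Running on $(hostname)"
sleep 30

Submit it with qsub test_job.sh and watch it move from the queued (qw) state to running (r) in the qstat output.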

Redefining Data Center In A Box

Data center in a box is traditionally defined as a “type of data center in which portable, mobile, and modular information nodes are self-contained within a cargo container. It is designed and packaged for quick deployment and acquisition of data center solutions in organizations or facilities, including remote off-site locations.” A data center in a box usually contains equipment from large storage, compute, and network vendors such as EMC, NetApp, Dell, and Cisco, pieced together to form the IT infrastructure. VCE (the Virtual Computing Environment Company), for instance, offers Vblock, a bundled product containing EMC storage, Cisco servers, and VMware. NetApp has a similar offering called FlexPod.

But new, innovative companies such as SimpliVity, Nutanix, and Scale Computing are changing the definition of data center in a box. They are building purpose-built products from the ground up that incorporate not just compute, storage, and networking, but also additional services such as data deduplication, WAN optimization, and backup, all in one box.

For instance, SimpliVity’s product, called OmniCube, is “a powerful data center building block that assimilates the core functions of server, storage and networking in addition to a wide range of advanced functionality including: native VM-level backup, WAN optimization, bandwidth efficient replication for DR, cache accelerated performance, and cloud integration.”

These products will further simplify the design, implementation, and operation of IT infrastructure. With these boxes, there is no more storage area network (SAN) to manage, nor additional appliances such as WAN accelerator to deploy. A few virtual machine (VM) administrators can manage all the boxes in a cluster from the VMware server virtualization management user interface.

Data center in a box will continue to evolve and will change how we view and manage IT infrastructure for years to come.

Data Migration Using PowerPath Migration Enabler

One project I recently led was the migration of data from an old EMC CLARiiON to a new EMC VNX. There are a couple of strategies for migrating block data on a storage area network (SAN): storage-based migration (the migration runs between the two storage arrays) or host-based migration (the migration is done on the host). EMC provides several tools for accomplishing these tasks. SAN Copy, for instance, is an excellent storage-based migration tool.

There are many factors to consider when choosing a migration strategy – size of data, cost, SAN bandwidth, complexity of the setup, application downtime, among many others. One strategy that is relatively simple and requires no downtime is to use the host-based migration tool PowerPath Migration Enabler Hostcopy.

This tool is included when you install the full PowerPath software. In version 5.7 SP2, as long as PowerPath is licensed, there is no additional license needed for Hostcopy (unlike in older versions).

The migration process is non-disruptive: it does not require shutting down the application, and the host remains operational while the migration is in progress. In general, the steps for migrating data are:

1. On the Windows or Linux host, make sure PowerPath 5.7 SP2 is installed and licensed.

powermt check_registration

2. Check the source disk and record its pseudo name.

powermt display dev=all

3. On the new storage, present the target LUN to the host.

4. On the host, rescan for and initialize the target disk.

5. Check that the target disk is present and record its pseudo name.

powermt display dev=all

6. Set up the PowerPath Migration Enabler session

powermig setup -src harddiskXX -tgt harddiskYY -techType hostcopy

7. Perform initial synchronization

powermig sync -handle 1

8. Monitor status of the session

powermig query -handle 1

9. The data transfer rate can also be throttled

powermig throttle -throttleValue 0 -handle 1

10. When ready to switch over to the new storage, enter the following command:

powermig selectTarget -handle 1

11. Commit the changes

powermig commit -handle 1

12. Clean up/delete the session

powermig cleanup -handle 1

13. Remove the old storage by removing the LUN from the old storage group.

14. On the host, rescan the HBAs for hardware changes, then remove the old LUNs from PowerPath

powermt display dev=all
powermt remove dev=all
powermt display dev=all

For more information about PowerPath Migration Enabler, visit the EMC website.

EMC VNX2 Storage Array Review

VNX is EMC’s unified enterprise storage solution for block and file. The latest release, called VNX2, uses the newer Intel Sandy Bridge processors with more cores. It also has more memory (RAM).

Its FAST VP technology, which dynamically moves data between the SSD (flash), SAS, and NL-SAS tiers, has been improved by decreasing the data “chunk” size from 1 GB to 256 MB, allowing more efficient data placement. Using SSD as the top tier is also new in VNX2.

Its FAST Cache technology has also been improved. Per EMC, “the warm up time has been improved by changing the behavior that when the capacity of FAST Cache is less than 80% utilized, any read or write will promote the data to FAST Cache.”

VNX2 boasts an active/active LUN configuration. However, active/active LUNs only work when the LUN is provisioned from a RAID Group; it does not work with Storage Pools. Hopefully, active/active LUNs will be supported for Storage Pools in the future, because more and more LUNs are being configured on Storage Pools instead of RAID Groups.

Another improvement is that in Unisphere, storage administrators no longer need to set the storage processor (SP) cache settings – the read and write cache sizes and the high and low water marks. The cache only needs to be turned on or off; the system now adjusts the cache settings automatically.

There are also no dedicated hot spare drives anymore. You simply leave some drives unprovisioned, and an unbound drive becomes a hot spare. You can set the hot spare policy for each type of drive; the recommended policy is one hot spare per 30 drives.

I noticed a couple of shortcomings in this release. I do not like the fact that when creating a LUN in a pool, the “thin” option is now checked by default. I believe thick LUNs should be the default because of performance considerations. In addition, if storage administrators are not careful, they may end up over-provisioning the pool with thin LUNs.

On the file side, there is really no major improvement. I believe there are no updates to the data movers, which still function in active/passive mode. One change, though, is that you can now use a VDM (Virtual Data Mover) for NFS, although to configure this you need to use the CLI (see the sketch below).
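As a rough sketch only (based on the Celerra-heritage Control Station CLI, with hypothetical names, and worth verifying against the VNX documentation for your release), creating a VDM and attaching a network interface to it for NFS might look like this:

nas_server -name vdm_nfs -type vdm -create server_2
nas_server -vdm vdm_nfs -attach <interface_name>

The first command creates a VDM named vdm_nfs on physical Data Mover server_2; the second attaches an existing network interface to the VDM so NFS exports can be served from it.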

Overall, VNX2 is one of the best enterprise storage arrays in terms of performance and functionality.

Data At Rest Encryption

When the Internet was invented several decades ago, security was not on the minds of its pioneers. TCP/IP, the protocol used to send data from one point to the next, was inherently insecure: data was sent over the wire in clear text. Today, advances in encryption technologies enable data to be secured while in transit. When you shop at reputable websites, for instance, you can be sure that the credit card number you send over the Internet is encrypted (you will see https in the URL instead of http). Most web applications (such as Gmail, Facebook, etc.) are now encrypted.

However, much of this data, when stored on servers (data at rest), is still not encrypted. That is why hackers are still able to get hold of precious data such as personally identifiable information (PII) – credit card numbers, social security numbers, etc. – as well as trade secrets and other proprietary company information. There are many ways to secure data at rest without encrypting it (better authentication, better physical security, firewalls, secured applications, better deterrents to social engineering attacks, etc.), but encrypting data at rest adds another layer of security to ensure the data is not readable when hackers get hold of it.

The demand for encrypting data at rest is growing, especially now that more data are being moved to the cloud. Enterprise data centers are also being required to encrypt data on their storage systems, either by business or compliance need.

Luckily, storage companies such as EMC, NetApp, and many others now offer encryption for data at rest on their appliances. However, encrypting data is still expensive: encrypting and decrypting data requires a lot of processing power, and adding encryption may slow down access to the data. Better key management systems are also needed. For instance, when using the cloud for storage, data owners (as opposed to service providers) should solely possess the keys and should be able to manage them easily.

The Internet will be more secure if data is encrypted not only during transit but also during storage.