I have recently been working on configuring VMware SRM using Hitachi AMS 2000 arrays. This has not been the most straightforward or well-documented process, so I hope this post will save someone else a little time.
In this post I will focus on the configuration and the required prerequisites for the HDS SRA 2.0 for VMware SRM. During testing I used ESX 4.0 Update 1, vCenter 4.0 Update 1, and SRM 4.0.1.
When working with this solution there are some obvious pieces of documentation such as the following:
Hitachi Storage Replication Adapter Software VMware vCenter Site Recovery Manager Deployment Guide
Site Recovery Manager Administration Guide
There are also some other helpful pieces of documentation which may not be as obvious, such as:
Hitachi AMS 2000 Family Command Control Interface (CCI) Installation, Reference, and User’s Guides. These can be found on the HDS Support portal.
Implementing VMware Site Recovery Manager with Hitachi Enterprise Storage Systems Whitepaper
How To Debug CCI Issues 1.3 article on the HDS GSC website
Also, the sample horcm.conf file installed with the CCI has some good information in the comments.
Now on to the configuration.
Prerequisites
Before configuration of the HDS SRA is possible the following requirements must be in place:
- Two HDS AMS 2000 arrays connected by WAN (FC or iSCSI)
- One VMware vCenter installation at the primary/protected site
- One VMware vCenter installation at the secondary/recovery site
- One VMware SRM installation at the primary/protected site
- One VMware SRM installation at the secondary/recovery site
- TrueCopy replication in place for the LUNs backing protected datastores
- SRM sites paired
Test Environment
Our test environment consists of one cluster at the primary site and one at the secondary site. We have a test SharePoint environment which is stored across four VMFS datastores. See the diagram below.
Our goal for this test configuration will be to fail over the SharePoint environment to the recovery site. We are replicating the LUNs containing all of the SharePoint system data using TrueCopy Extended Distance. SRM is also installed at both sites, and the sites have been paired.
After the items above are in place, we can move on to configuring the SRA.
HDS SRA 2.0 Configuration
When configuring the storage replication adapter, the primary documentation is the deployment guide referenced above, but I found a few things missing from the document.
The first step in getting the SRA configured is making sure you have a copy of the proper HDS CCI for your array firmware. The HDS SRA relies on the Hitachi Command Control Interface, which must be installed on the SRM servers. I installed the CCI in the default C:\HORCM directory on both SRM servers. This is a straightforward install and is documented in the Hitachi AMS 2000 Family Command Control Interface (CCI) Installation Guide.
The portions of the CCI install that were tough for me were determining that it needed to be installed as a service and creating the horcm.conf files. Our example above will only require two instances of the HORCM service, one on each SRM server: HORCM0 and HORCM1.
To create the services, we create the horcmX_run.txt files and issue the following commands.
On the first SRM server, create the C:\HORCM\Tool\horcm0_run.txt file and then execute the following command:

```
C:\HORCM\Tool\svcexe /S=HORCM0 /A=C:\HORCM\Tool\svcexe.exe
```
On the second SRM server, create the C:\HORCM\Tool\horcm1_run.txt file and then execute the following command:

```
C:\HORCM\Tool\svcexe /S=HORCM1 /A=C:\HORCM\Tool\svcexe.exe
```
The horcmX_run.txt file is created by making a copy of the sample file, naming it appropriately, and setting the HORCMINST variable to the correct instance number. This is documented in the sample file located in C:\HORCM\Tool.
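For reference, the key lines in the copied script look something like the following. This is a sketch based on the sample HORCM0_run.txt shipped in C:\HORCM\Tool, so verify it against the sample that came with your CCI version; the only edit normally required is the HORCMINST value.

```
REM horcm0_run.txt - startup script run by svcexe for the HORCM0 service
set HORCM_EVERYCLI=1
set HORCMINST=0
```

The horcm1_run.txt on the second SRM server would be identical except for HORCMINST=1.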
After running these commands you should see the services appear in the Windows Services MMC.
Then add the following lines to the %systemroot%\system32\drivers\etc\services file on each SRM server.
```
horcm0    11000/udp    #horcm0 CCI service
horcm1    11001/udp    #horcm1 CCI service
```
The file should appear as below, with one blank line below the last horcm service entry.
```
rasadv          9753/udp
horcm0          11000/udp    #horcm0 CCI service
horcm1          11001/udp    #horcm1 CCI service

imip-channels   11320/tcp    #IMIP Channels Port
```
Once the services are installed, the next step is to create the horcm.conf files. The first thing we need for this is a command device, a step the SRA deployment guide left out. It is documented in the VMware SRM with Hitachi Enterprise Storage whitepaper mentioned earlier. Basically, you create a small LUN and present it to the SRM server as a physical compatibility mode RDM. Then you initialize the disk and create a basic primary partition, but do not assign a drive letter or format it. I found one HDS document that said this LUN should be 33MB and one that said 36MB, so I made it 40MB. Once this is done we have all that is needed to create the horcm.conf files.
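The initialize-but-don't-format step can be done from the SRM server with diskpart. A sketch, assuming the command-device RDM shows up as disk 1 (check the list disk output on your own server before selecting anything):

```
C:\>diskpart
DISKPART> list disk
DISKPART> select disk 1
DISKPART> create partition primary
DISKPART> exit
```

Note that we stop there: no assign and no format, so the partition stays raw for the CCI to use.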
HORCM0.conf on the SRM server at the primary site:
```
#/************************* For HORCM_MON *************************************/
HORCM_MON
#ip_address        service    poll(10ms)    timeout(10ms)
srm1.test.local    horcm0     8000          3000

#/************************** For HORCM_CMD ************************************/
HORCM_CMD
#dev_name              dev_name    dev_name
\\.\CMD-11111111-64

#/************************** For HORCM_LDEV ***********************************/
HORCM_LDEV
#dev_group         dev_name           Serial#     CU:LDEV(LDEV#)    MU#
DRLAB_SRM_TEST     DRLAB_AP_OS        11111111    0x003C
DRLAB_SRM_TEST     DRLAB_DB_OS        11111111    0x003D
DRLAB_SRM_TEST     DRLAB_OLTP_LOG     11111111    0x003E
DRLAB_SRM_TEST     DRLAB_OLTP_DATA    11111111    0x003F

#/************************* For HORCM_INST ************************************/
HORCM_INST
#dev_group         ip_address         service
DRLAB_SRM_TEST     srm2.test.local    horcm1
```
HORCM1.conf on the SRM server at the secondary site:
```
#/************************* For HORCM_MON *************************************/
HORCM_MON
#ip_address        service    poll(10ms)    timeout(10ms)
srm2.test.local    horcm1     8000          3000

#/************************** For HORCM_CMD ************************************/
HORCM_CMD
#dev_name              dev_name    dev_name
\\.\CMD-11111112-4

#/************************** For HORCM_LDEV ***********************************/
HORCM_LDEV
#dev_group         dev_name           Serial#     CU:LDEV(LDEV#)    MU#
DRLAB_SRM_TEST     DRLAB_AP_OS        11111112    0x0001
DRLAB_SRM_TEST     DRLAB_DB_OS        11111112    0x0005
DRLAB_SRM_TEST     DRLAB_OLTP_LOG     11111112    0x0006
DRLAB_SRM_TEST     DRLAB_OLTP_DATA    11111112    0x0007

#/************************* For HORCM_INST ************************************/
HORCM_INST
#dev_group         ip_address         service
DRLAB_SRM_TEST     srm1.test.local    horcm0
```
A couple of points to note about these files. The first is the relationship between the hosts and devices in the group: the HORCM_LDEV section in each instance references the half of the pair that instance controls, while the HORCM_INST section references the opposite instance. The second is the command device naming format: it consists of "\\.\CMD-" followed by the array serial number and the LUN number, for example "\\.\CMD-11111112-4". Now that we have the configuration files, we copy them into %windir% on their respective SRM servers and start the services.
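With both instances running, the configuration can be sanity-checked from the CCI before touching SRM. A sketch using the group name from the horcm.conf files above, run from the primary SRM server (the -fcx flags add copy percentage and hex LDEV numbers to the output; expect the pairs to show a PAIR status if TCE replication is healthy):

```
C:\>net start HORCM0
C:\>set HORCMINST=0
C:\>pairdisplay -g DRLAB_SRM_TEST -fcx
```

If pairdisplay errors out here, fix the CCI configuration first; the SRA will not work until this does.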
After this is complete we can install the SRA. This is downloaded from the VMware website, and the executable is named RMHTCSRA.exe. It is a simple, no-option install. After this, there are some environment variables which need to be set.
```
setx SplitReplication true /m
setx RMSRATMU 1 /m
```
Then reboot the SRM servers. We are now ready to configure the SRA using the SRM plug-in in vCenter.
Here we see the paired sites in site recovery.
Click on configure array managers and we see the following dialog.
Click Add to add a protected site array manager and we see the following configuration dialog.
We enter the name and HORCMINST=0 for the first instance on the primary server in the protected site. Then we use a different name and HORCMINST=1 for the next instance on the secondary server in the recovery site. Here we see both sides configured.
The last step in the wizard allows us to confirm the SRA sees the replicated datastores properly.
We see the LUN numbers match the devices in the horcm<x>.conf files. The datastore group in this diagram consists of four LUNs which also belong to the same TCE consistency group. These are the LUNs being used by our test SharePoint application and database servers.
At this point we are now ready to complete configuration of the protection groups and recovery plans in Site Recovery Manager. The process for configuring these is documented in the SRM administration guide. Protection groups are configured at the protected site and recovery plans are configured at the recovery site. Here is a screenshot of the test recovery plan for our SharePoint environment.
When we run a test on this recovery plan we can see the test runs successfully and waits for us to complete testing before clicking continue to return to a ready state.
During this phase we can look at a couple of things to confirm what is happening in the process. One is the new datastores we will see in the configuration tab of the DR ESX hosts.
There was no need to change the LVM.EnableResignature or LVM.DisallowSnapshotLun settings at the host level in ESX 4 as this is enabled at the volume level and SRM handles this at the time of testing or failover. Another part of the process we can confirm at this time is the status of the TrueCopy pairs. Here we see the pairs are in split status.
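If you want to see what the recovery hosts are doing with the replica LUNs during the test, the ESX 4 service console can list VMFS volumes detected as snapshots or replicas with esxcfg-volume, which is the per-volume mechanism that replaced the old host-level LVM settings. A sketch, run on one of the DR ESX hosts during the test:

```
# esxcfg-volume -l
```

The replicated datastores presented for the test should appear in this list while the recovery plan test is running.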
Now we can complete any other testing to confirm success of the test and then click continue in the recovery plan to return to a ready state. After the test completes we can see the datastores are removed from the recovery ESX hosts and the TCE pairs are returned to a paired status after resynchronization.
After the testing process is completed we can review some of the steps in the SRM logs. The logs are located under %allusersprofile%\VMware\VMware vCenter Site Recovery Manager\Logs. These log entries and the HORCM logs under C:\HORCM\log are the primary sources of information in troubleshooting problems with this process.
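When something does go wrong, findstr is a quick way to pull the relevant entries out of those logs. A sketch; the "RMHTC" search term is an assumption based on the SRA executable name, so adjust the strings to whatever actually appears in your logs:

```
C:\>findstr /s /i "RMHTC horcm" "%allusersprofile%\VMware\VMware vCenter Site Recovery Manager\Logs\*.log"
```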
I hope someone finds this post useful. Next I am going to be testing this with secondary copies at the recovery site using ShadowImage and Copy-on-Write.
Regards,
Dave