November 19, 2013 · Systems · 8 minutes
I recently had a need to deploy quite a few ESXi hosts on top of Cisco UCS B-Series blades (60+) back-ended by Netapp storage. I needed some kind of method to do this quickly so that I didn’t have to spend days just installing ESXi.
Here were some of the design guidelines:
Needed an ESXi 5.5 installation with the Cisco enic and fnic drivers installed, as well as the Cisco 1000v VEM module
Needed to install on a large number of hosts (50+)
Boot from SAN was a required component of the design; no local storage.
Because of the number of hosts, vSphere Auto Deploy was a no-brainer for me. However, this particular design required boot from SAN using the Netapp storage on the back-end, so a stateful configuration (introduced in vSphere 5.1) was needed. Auto Deploy would be removed later, so this was strictly a quick deployment mechanism.
I also had a few challenges to deal with:
Provisioning of boot LUNs, LUN masking, and FC switch zoning would be incredibly tedious and prone to errors (120 WWPNs…not exactly a small task). I also needed to put in place a structure that allowed for flexible configuration as the environment grew.
Assigning IP addresses was also tedious. It could be done with answer files and host profiles, but each address needed to correlate with its UCS Service Profile name.
So when you’re doing boot from SAN, there are a few things you need to worry about that you wouldn’t if you were running on local disk. The boot LUNs have to be provisioned in the first place, and all of a server’s WWPNs (2 for me) have to be given access to that server’s LUN. You have to be careful not to give more than one server access to the same boot LUN, otherwise you’ll really mess up the installation.
In addition, the FC switches need to have their zoning configuration updated so that each of the servers can access the storage.
So clearly, an automated approach was needed. I created a nice little snippet of PowerShell that looks at the service profiles in Cisco UCS (using the PowerTool module Cisco provides for PowerShell) and makes sure that the boot LUN and masking configuration on the Netapp (using the Netapp PowerShell Toolkit) reflects them.
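The gist of it looks something like this - a condensed sketch rather than the full production script. It assumes the UCS PowerTool and Netapp DataONTAP modules are installed, and the controller addresses, boot volume path, igroup naming, and LUN size are all placeholders for your own environment:

```powershell
# Sketch: true-up Netapp boot LUNs and masking from Cisco UCS service profiles.
Import-Module CiscoUcsPs    # Cisco UCS PowerTool
Import-Module DataONTAP     # Netapp PowerShell Toolkit

# Placeholder addresses; you'll be prompted for credentials
Connect-Ucs -Name ucs-vip -Credential (Get-Credential) | Out-Null
Connect-NaController -Name netapp01 -Credential (Get-Credential) | Out-Null

$volPath = "/vol/esx_boot"    # placeholder boot volume

foreach ($sp in Get-UcsServiceProfile -Type instance) {
    $lunPath = "$volPath/$($sp.Name)"
    $igroup  = "esx_$($sp.Name)"

    # One boot LUN per service profile, created only if it's missing
    if (-not (Get-NaLun $lunPath -ErrorAction SilentlyContinue)) {
        New-NaLun -Path $lunPath -Size 10g -Type vmware | Out-Null
    }

    # One igroup per profile, so no two servers can see the same boot LUN
    if (-not (Get-NaIgroup $igroup -ErrorAction SilentlyContinue)) {
        New-NaIgroup -Name $igroup -Protocol fcp -Type vmware | Out-Null
    }

    # Add both of the profile's vHBA WWPNs to its igroup
    foreach ($vhba in ($sp | Get-UcsVhba)) {
        Add-NaIgroupInitiator -Name $igroup -Initiator $vhba.Addr -ErrorAction SilentlyContinue
    }

    # Map the LUN to the igroup as LUN 0 to match the UCS boot policy
    Add-NaLunMap -Path $lunPath -InitiatorGroup $igroup -ID 0 -ErrorAction SilentlyContinue
}
```

Creating one igroup per service profile is what enforces the warning above - no two servers are ever masked to the same boot LUN.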
I also wrote another snippet to produce a zoning configuration on a pair of Nexus 5596UP switches. It’s a little long (NX-OS still doesn’t have a legitimate API yet) so I won’t post it here. I’ll be integrating it into my Flexpod Toolkit though.
If you have an existing Cisco UCS configuration (which I also create via PowerShell) using Netapp storage, this is a handy script to true-up your boot LUN configuration, even if only a few LUNs are missing (great for testing).
Auto Deploy involves quite a few components and external prerequisites. I recommend you follow Duncan’s walkthrough if you’ve never done it - it’s what I first used back when I learned it in my own home lab.
As the walkthrough states, you first need to:
Set up a TFTP server to serve up the Auto Deploy files you can download from the Auto Deploy server. (tftpd works GREAT)
Set up a DHCP server with options 66 and 67 filled out to point to your TFTP server and boot file (believe it or not, tftpd can do this too; great in a pinch). See the sketch after this list for the Windows DHCP equivalent.
Create a host profile
Create an image with the drivers I needed
Set up a deployment rule
Ensure the boot order for the servers was correct
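If you'd rather use a Windows DHCP server than tftpd's built-in one, the same options can be set with the DhcpServer PowerShell module. A rough sketch; the scope, addresses, and TFTP server IP here are illustrative:

```powershell
# Sketch: create a PXE-enabled scope for Auto Deploy (illustrative values).
Add-DhcpServerv4Scope -Name "AutoDeploy" -StartRange 10.102.40.1 -EndRange 10.102.40.60 `
    -SubnetMask 255.255.255.0

# Option 66 = TFTP server address, option 67 = the gPXE boot file from the
# Auto Deploy TFTP zip (undionly.kpxe.vmw-hardwired).
Set-DhcpServerv4OptionValue -ScopeId 10.102.40.0 -OptionId 66 -Value "10.102.40.250"
Set-DhcpServerv4OptionValue -ScopeId 10.102.40.0 -OptionId 67 -Value "undionly.kpxe.vmw-hardwired"
```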
My host profile was designed JUST to enable stateful installations. Everything else was configured via PowerCLI so this was all I needed it to do.
Note that the “arguments for first disk” field contains the word “remote”. This is key if you want to install to a SAN LUN - the documentation doesn’t mention this keyword.
WARNING - I used quite a few means to ensure that only each host’s boot LUNs were available at the time they started up. It is really easy to overwrite an important LUN with the above settings if you’re not careful. Make sure you’re aware of how you have things configured.
To create the image I needed, I ran a snippet of PowerCLI along these lines (a sketch: the depot file names, package names, and the cluster and host profile names in the deploy rule are placeholders for your own environment):
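```powershell
# Build a custom ESXi 5.5 image profile with the Cisco enic/fnic drivers and
# the 1000v VEM, then tie it to a deploy rule. File names below are placeholders.
Add-EsxSoftwareDepot C:\depot\VMware-ESXi-5.5.0-1331820-depot.zip
Add-EsxSoftwareDepot C:\depot\enic-driver-offline_bundle.zip
Add-EsxSoftwareDepot C:\depot\fnic-driver-offline_bundle.zip
Add-EsxSoftwareDepot C:\depot\VEM550-offline_bundle.zip

# Clone the stock profile and add the extra packages to the clone
# (check Get-EsxSoftwarePackage for the exact package names in your depots)
New-EsxImageProfile -CloneProfile "ESXi-5.5.0-1331820-standard" -Name "ESXi55-UCS" -Vendor "Custom"
Add-EsxSoftwarePackage -ImageProfile "ESXi55-UCS" -SoftwarePackage net-enic, scsi-fnic, cisco-vem-v164-esx

# The deploy rule: image, target cluster, and the stateful-install host profile.
# -AllHosts means there is no pattern filter (more on that below).
New-DeployRule -Name "UCS-Deploy" -Item "ESXi55-UCS", "MyCluster", "MyHostProfile" -AllHosts
Add-DeployRule -DeployRule "UCS-Deploy"
```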
Again, it’s important to remember that I didn’t provide any filter on oemstring, so any server that had access to the DHCP-enabled VLAN would, upon reboot, receive the Auto Deploy treatment. This was a controlled environment where I knew the scope of the change. Be careful not to bork your servers. :)
This snippet of PowerShell (which requires Cisco’s PowerTool module) cycles through exactly the service profiles we want and powers them on, with a nice wait timer (5 minutes) between hosts. It’s a sketch; the UCS address and the service profile name filter are placeholders for your own naming convention. Be careful with this: a few changes to the code could just as easily shut down every blade.
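```powershell
# Power on the target service profiles one at a time, pausing between each so
# the hosts boot (and pull their DHCP leases) in a predictable order.
Import-Module CiscoUcsPs
Connect-Ucs -Name ucs-vip -Credential (Get-Credential)    # placeholder UCS address

Get-UcsServiceProfile -Type instance |
    Where-Object { $_.Name -like "esx-host*" } |          # placeholder naming filter
    Sort-Object Name |
    ForEach-Object {
        Write-Host "Powering on $($_.Name)"
        $_ | Set-UcsServerPower -State "up" -Force | Out-Null
        Start-Sleep -Seconds 300                          # 5 minute wait per host
    }
```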
As you know, DHCP is required when running Auto Deploy, but I wanted my hosts to be statically addressed. So I used a simple strategy: I built my DHCP scope such that if I rebooted the hosts in the right order, timed so that each one got a sequential address, they’d all end up with the addresses I wanted them to have - it would just have happened via DHCP.
At this point I could go back in with a script and simply re-use the same address, but make it static instead. There are other ways to get this done, but this was relatively simple and reliable. This snippet of PowerCLI will get into each host and make the address change; with a simple tweak you could also do it through vCenter, assuming the hosts have been added there.
```powershell
# Re-apply each host's DHCP-assigned address to vmk0 as a static address
for ($i = 1; $i -le 60; $i++) {
    Write-Host "Connecting to $i"
    Connect-VIServer 10.102.40.$i -User "root" -Password "" -Force
    Get-VMHostNetwork | Set-VMHostNetwork -VMKernelGateway 10.102.40.1
    Get-VMHostNetworkAdapter -Name "vmk0" |
        Set-VMHostNetworkAdapter -Dhcp:$false -IP 10.102.40.$i -SubnetMask 255.255.255.0 -Confirm:$false
    Disconnect-VIServer 10.102.40.$i -Confirm:$false
}
```
After a while, all of your hosts will show up in the cluster specified in the Auto Deploy rule configuration.
Just trying to put some tools together that will help me the next time a big deployment like this comes up. My Flexpod Toolkit is going to receive ALL of these snippets, and more. These tools are far more valuable when run in the context of the data center solution as a whole: if a central automation toolkit can drive configurations from information already available in the infrastructure itself, the value of the solution goes up exponentially.
I welcome any suggestions on anything I did here - I want to provide and improve upon these tools as a single unit as part of my Flexpod Toolkit project so that you can use them in your own environment.