(SOLVED) Problem upgrading from NX-OS 7.0(3)I7(5a) to 7.0(3)I7(7) on Nexus 3000 switches for a NTAP Cluster

Last week I heard a lot about the big Cisco CDP bug. I offered to help a customer upgrade their Nexus 3132Q-V switches from NX-OS 7.0(3)I7(5a) to 7.0(3)I7(7) and subsequently apply the Software Maintenance Update (SMU) to fix the CDP bug. I have done plenty of NX-OS upgrades, but this was the first time I got stuck, and all because of a bug. Here we go:

Before upgrading anything, you should always consult the relevant locations for upgrade advice. These include NetApp’s Active IQ (Login required, for ONTAP upgrade advice) and the (old) Software Download Page (Login required, the Cluster Network switch details are not on the new location as of this post) where you would select your Switch Brand (Broadcom, Cisco, NetApp) in the pull-down menu next to the Cluster Network/Management Switches. These pages show appropriate compatibility matrices for ONTAP, switch OS versions and Reference Configuration File (RCF) versions.

In our case, we are using ONTAP 9.6P5 on the Cluster and the Nexus 3132Q-V switches were running 7.0(3)I7(5a). According to Cisco, the CDP bug above is fixed by upgrading to 7.0(3)I7(7) and then applying the SMU.

We followed directions from NetApp to do the upgrade. There are a number of verification steps and actions that are done from ONTAP to essentially force the Cluster Network to use one switch. At that point the unused switch is upgraded and brought back into service. This is where we are picking up in the docs.

First we copy the new code to the switch:

ntapclus-sw1# copy http://laptop:8123/nxos.7.0.3.I7.7.bin bootflash:///nxos.7.0.3.I7.7.bin vrf management

This completed in about 3-4 minutes (nearly a 1GB file). The next step is to simply install the new NX-OS code:

ntapclus-sw1# install all nxos bootflash:///nxos.7.0.3.I7.7.bin
Installer will perform compatibility check first. Please wait.
Installer is forced disruptive

Verifying image bootflash:/nxos.7.0.3.I7.7.bin for boot variable "nxos".
[################## ] 86% -- FAIL.
Return code 0x40450030 (Digital signature verification failed).
Pre-upgrade check failed. Return code 0x40930011 (Image verification failed).
ntapclus-sw1#

Huh? What is that “Digital signature verification failed” about? I tried multiple times, and it always failed at either 85% or 86%. So for due diligence, I decided to check the MD5 checksum:

ntapclus-sw1# sho file bootflash:///nxos.7.0.3.I7.7.bin md5sum
a9d40fbfaf43c214c3d97cb290788d06

Well, that exactly matches the value on Cisco’s website, so I know the code downloaded to my laptop fine and subsequently transferred to the switch without error. Off to search on Google. In a few minutes, while not exactly what I was hoping for, I found a Cisco defect (CSCvm37015, Cisco Login required!) that indicated this exists in NX-OS 7.0(3)I7(5a). The solution is to disable digital image signature verification. This is easily done:

ntapclus-sw1# configure
Enter configuration commands, one per line. End with CNTL/Z.
ntapclus-sw1(config)# no feature signature-verification
WARNING: This will disable digital image signature verification for all NxOS software attempted to be installed using any install method.
Are you sure you want to continue? (y/n) : [n] y
WARNING: Image Signature Verification has been Disabled!
ntapclus-sw1(config)# end
ntapclus-sw1# copy run st
[########################################] 100%
Copy complete, now saving to disk (please wait)…
Copy complete.

Ok, I tried installing NX-OS 7.0(3)I7(7) again to see what happens:

ntapclus-sw1# install all nxos bootflash:nxos.7.0.3.I7.7.bin
Installer will perform compatibility check first. Please wait.
Installer is forced disruptive
Verifying image bootflash:/nxos.7.0.3.I7.7.bin for boot variable "nxos".
[####################] 100% -- SUCCESS

Verifying image type.
[####################] 100% -- SUCCESS -- SUCCESS

Preparing "nxos" version info using image bootflash:/nxos.7.0.3.I7.7.bin.
[####################] 100% -- SUCCESS

Preparing "bios" version info using image bootflash:/nxos.7.0.3.I7.7.bin.
[####################] 100% -- SUCCESS -- SUCCESS

Performing module support checks. -- SUCCESS

Notifying services about system upgrade. -- SUCCESS

Compatibility check is done:
Module bootable         Impact Install-type Reason
------ -------- -------------- ------------ ------
     1      yes     disruptive        reset default upgrade is not hitless

Images will be upgraded according to following table:
Module      Image Running-Version(pri:alt)        New-Version Upg-Required
------ ---------- ------------------------ ------------------ ------------
     1       nxos             7.0(3)I7(5a)        7.0(3)I7(7)          yes
     1       bios       v04.24(04/21/2016) v04.24(04/21/2016)           no

Switch will be reloaded for disruptive upgrade.    
Do you want to continue with the installation (y/n)? [n] y

Install is in progress, please wait.
Performing runtime checks. -- SUCCESS

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 1: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Finishing the upgrade, switch will reboot in 10 seconds.
ntapclus-sw1#
Network error: Software caused connection abort

Excellent! Looks like that solved the issue! I waited a few minutes for the switch to come back online. Then I logged in to undo the digital image signature verification modification:

ntapclus-sw1# conf
Enter configuration commands, one per line. End with CNTL/Z.
ntapclus-sw1(config)# feature signature-verification
ntapclus-sw1(config)# end
ntapclus-sw1# copy run startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)…

And apply the SMU:

ntapclus-sw1# install add bootflash:nxos.CSCvr09175-n9k_ALL-1.0.0-7.0.3.I7.7.lib32_n9000.rpm activate
Adding the patch (/nxos.CSCvr09175-n9k_ALL-1.0.0-7.0.3.I7.7.lib32_n9000.rpm)
[####################] 100%
Install operation 1 completed successfully at Fri Feb 14 15:27:40 2020

Activating the patch (/nxos.CSCvr09175-n9k_ALL-1.0.0-7.0.3.I7.7.lib32_n9000.rpm)
[####################] 100%
Install operation 2 completed successfully at Fri Feb 14 15:27:50 2020

ntapclus-sw1# install commit
[####################] 100%
Install operation 3 completed successfully at Fri Feb 14 15:27:55 2020

ntapclus-sw1#

After that finished, we repeated the process (using NetApp Docs from above) to revert back to normal operations and then shift the Cluster LIFs to the other switch. This allows for the second switch to be updated also.
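Before copying the image to the second switch, the checksum can also be confirmed on the laptop. The following is a minimal Python sketch of that check, not part of the NetApp or Cisco procedure; the function names are my own, and the published MD5 is whatever Cisco lists for the image on its download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare against a published checksum, ignoring case and stray whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()


# Example call (path is hypothetical; the checksum is the one shown earlier):
# matches_published_md5("nxos.7.0.3.I7.7.bin", "a9d40fbfaf43c214c3d97cb290788d06")
```

A match here only rules out a bad download. As this post shows, the signature verification failure can still occur with a perfectly good image.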

Modifying the destination SnapMirror Volume when using XDP

References: 
How to modify Destination SnapMirror volume size and other attributes

SnapMirror Configuration Best Practice ONTAP 9.1-9.2

NetApp has been using a replication technology called SnapMirror for a very long time. I have been around so long, I remember when we called it BMM or Bare Metal Migrate. It was used to help customers replicate (or migrate) data from an old Network Appliance, Inc filer to a new model. It was barbaric! You actually had to create some floppies. The source and the destination had to boot from the special media. A special connection was made over IP (usually a point-to-point Gigabit link, but sometimes 100BaseT). The transfer started and finished. This was the beginning of SnapMirror. Over the following years it evolved into a product that was available in the operating system (again, before ONTAP, before 7-mode!) and was dubbed SnapMirror.

The Types

Ever since that time, the underlying replication engine was known as DP or Data Protection. This intelligently moved blocks of data from the source to the destination. When SnapVault was introduced, a new engine was used to move data around. This engine was called XDP or eXtended Data Protection. This engine is more of a logical block transfer that was updated to be used in regular SnapMirror transfers as well. In ONTAP 9.1, it became the default for SnapMirror and in 9.3 it became the default for SVM-DR. In the next couple of releases, I suspect DP will be removed completely from ONTAP.

Forwards and Backwards

One of the fantastic benefits of an XDP mirror is the ability to replicate to releases of ONTAP forwards or backwards! Historically, a DP mirror could only mirror to the current version or newer and never backwards. As ONTAP rapidly advances, this means the customer no longer has to keep source and destination ONTAP releases in lockstep. They can let them change and still maintain the ability to migrate forwards and backwards.

Changes Stick…

When a DP SnapMirror was configured, the destination needed to be the same size or larger than the source. After the relationship was established, nearly any change on the source was made automatically on the destination. Resizing the source volume, adding more inodes, changing the Snapshot schedule or Snapshot reserve space…all of these things and more triggered the same change to happen on the destination.

…But not with XDP

This is no longer the case with an XDP SnapMirror, especially in an SVM-DR relationship. In this setup, the relationship will mirror all volumes from one SVM to another and all volumes in the SVM are replicated using XDP. If a change is made on the source and you want that change to be applied to the destination, the storage administrator needs to make that happen. This process cannot be done normally on the destination. Why? When an XDP mirror has been established, the destination becomes read-only and the storage admin is no longer able to run commands on the destination volume.

The Long Way

From experience, I can tell you there are two ways to make changes to the destination. The first way is the long way. There are a few steps here:

  1. Update your mirror
  2. Quiesce your mirror
  3. Break your mirror (which now creates a RW destination)
  4. Make any changes to the volume
    1. vol size or vol modify commands to manipulate inodes, snapshot reserve space, etc.
  5. Re-sync the mirror
  6. Resume the mirror
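The steps above can be sketched as an ordered list of ONTAP commands. This is only an illustration that builds the command strings in Python; the function and parameter names are hypothetical, the destination path assumes a volume-level relationship in the form svm:volume, and the exact options should be checked against the documentation for your ONTAP release.

```python
def long_way_commands(dest_path, dest_svm, dest_vol, new_size):
    """Return the break/modify/resync sequence as command strings, in order.

    All commands are meant to be run on the destination cluster. In practice
    each step should finish (the relationship goes idle) before the next one.
    """
    return [
        # 1. Bring the destination up to date
        f"snapmirror update -destination-path {dest_path}",
        # 2. Stop future transfers
        f"snapmirror quiesce -destination-path {dest_path}",
        # 3. Break the mirror, making the destination read-write
        f"snapmirror break -destination-path {dest_path}",
        # 4. Make the change (a resize is used here as the example)
        f"volume size -vserver {dest_svm} -volume {dest_vol} -new-size {new_size}",
        # 5. Re-establish the relationship from the common Snapshot copy
        f"snapmirror resync -destination-path {dest_path}",
        # 6. Allow scheduled transfers again
        f"snapmirror resume -destination-path {dest_path}",
    ]


# Example with made-up names:
# for cmd in long_way_commands("svm_dr:vol1", "svm_dr", "vol1", "2TB"):
#     print(cmd)
```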

This is a lot of work. Resyncing can take time. There is an alternative as I recently found out!

The Short Way

The short way is actually very simple and it uses a command I recently became aware of.

Note: This is a warning! Use of “diag” mode on the ONTAP CLI can be dangerous and result in loss of data.

The short way is also very quick. First, access the ONTAP command line using SSH, the console, the Service Processor (SP) or the Baseboard Management Controller (BMC). Then, enter “diag” mode:

::> set -privilege diag
::*>

Your normal prompt will change and display the asterisk as indicated above. All that’s left is to modify the destination volume(s):

::*> vserver config override -command "volume size -vserver <dest_SVM> -volume <dest_vol> -new-size <new_size>"
::*> vserver config override -command "volume modify -vserver <dest_SVM> -volume <dest_vol> -autosize-mode {off|grow|grow_shrink}"

Make the changes that are required (volume size, autogrow mode, autogrow thresholds, etc) and then just set the mode back to admin:

::*> set -privilege admin
::>
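If several destination volumes need the same change, the command strings can be generated rather than typed. Below is a small Python sketch that only builds the text of the commands shown above; the helper names are my own, and nothing here talks to a cluster.

```python
def override(command):
    """Wrap a command in 'vserver config override', as used in this post."""
    if '"' in command:
        # The wrapped command is itself quoted, so embedded quotes would break it
        raise ValueError("command must not contain double quotes")
    return f'vserver config override -command "{command}"'


def short_way_commands(dest_svm, dest_vols, new_size):
    """Return the diag-mode session for resizing several destination volumes."""
    commands = ["set -privilege diag"]
    for vol in dest_vols:
        commands.append(
            override(f"volume size -vserver {dest_svm} -volume {vol} -new-size {new_size}")
        )
    # Always drop back to admin privilege at the end
    commands.append("set -privilege admin")
    return commands
```

Reviewing the generated list before pasting it into the CLI is a cheap safeguard, given the warning above about diag mode.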

Ever wish you could SORT on the ONTAP CLI?

ONTAP has a very good command line interface (CLI). There is an online document titled ONTAP 9 System Administration Reference. This is a PDF documenting the ONTAP CLI. There are three privilege levels:

  • admin (default)
    • Most commands and parameters are available at this level. They are used for common or routine tasks
  • advanced
    • Commands and parameters at this level are used infrequently, require advanced knowledge, and can cause problems if used inappropriately. You use advanced commands or parameters only with the advice of support personnel
  • diagnostic
    • Diagnostic commands and parameters are potentially disruptive. They are used only by support personnel to diagnose and fix problems
Moving forward, this blog post will presume that the reader is knowledgeable about the ONTAP CLI. If you would like to know more about the ONTAP CLI, please refer to the link above.

Why Sort?

There are many times when I am looking through ONTAP CLI output and find it difficult to read as a human. I always wished there was a way I could sort the output on the CLI without having to script from Linux and use sort. For example, I might have a large number of data LIFs on multiple SVMs and want to see easily which addresses are in use, or I might be looking through a large number of flexvols and want the output in a particular order.

SSSSHHHHH!

To enable sorting on the ONTAP CLI requires the use of diagnostic mode as indicated earlier. Even more, the sort option is truly hidden and the argument itself will not obey the ONTAP CLI tab-completion! You must type out the argument completely.

How to make it work

From the ONTAP CLI, enter diagnostic mode:

cluster::> set -privilege diagnostic

Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}

cluster::*>

Notice the prompt changed to indicate you are no longer in the default/standard admin mode. Be extra careful in diagnostic (or even advanced) mode as entering the wrong command here could result in data loss. The sorting option I am about to show you is for you to piggy-back on to regular commands you already use.

Let’s Sort!

Now that we are in diagnostic mode, it is very easy to sort output. Anywhere in your command, simply add in -sort-by field1 <,field2,field3,etc>. The argument does not need to be at the end, but it is usually most convenient there.

Tip: to get rid of the ONTAP CLI pagination, enter: rows 0
This turns off the per-screen paging and usually tightens up output

I was able to access a customer’s system for some sample output. The names have been changed to protect the innocent. Let’s look at some regular output, noting all the Network Addresses and how they are not sorted. The normal output is sorted by vserver then by LIF name. Note, the output has had only the right-most two fields removed for readability!

            Logical    Status     Network            Current       
Vserver     Interface  Admin/Oper Address/Mask       Node          
----------- ---------- ---------- ------------------ ------------- 
Cluster
            CustCluster1_clus1 up/up 169.254.5.203/16 CustCluster1
            CustCluster1_clus2 up/up 169.254.91.160/16 CustCluster1
            CustCluster2_clus1 up/up 169.254.196.87/16 CustCluster2
            CustCluster2_clus2 up/up 169.254.18.84/16 CustCluster2
            CustCluster3_clus1 up/up 169.254.158.204/16 CustCluster3
            CustCluster3_clus2 up/up 169.254.243.163/16 CustCluster3
            CustCluster4_clus1 up/up 169.254.33.135/16 CustCluster4
            CustCluster4_clus2 up/up 169.254.142.203/16 CustCluster4
CustSVM01
            cifs_ediscovery up/up 192.172.76.192/24    CustCluster4 
            mgmt_svm     up/up    192.172.239.205/24   CustCluster4 
CustSVM02
            cifs_eaprofile up/up  192.172.76.207/24    CustCluster1 
            cifs_cust     up/up    192.172.76.193/24    CustCluster3 
            cifs_cust_COPS_G up/up 192.172.76.195/24    CustCluster3 
            cifs_cust_PST_archive up/up 192.172.76.196/24 CustCluster3
            cifs_cust_archive up/up 192.172.76.197/24   CustCluster3 
            cifs_cust_auditlogs up/up 192.172.76.198/24 CustCluster3 
            cifs_cust_citrix up/up 192.172.76.199/24    CustCluster3 
            cifs_cust_g_drive up/up 192.172.76.200/24   CustCluster3 
            cifs_cust_nya_public up/up 192.172.76.201/24 CustCluster3
            cifs_cust_ovw up/up    192.172.76.202/24    CustCluster3 
            cifs_cust_software up/up 192.172.76.203/24  CustCluster3 
            cifs_cust_sqlbackups up/up 192.172.76.204/24 CustCluster3
            cifs_cust_support up/up 192.172.76.205/24   CustCluster3 
            cifs_cust_utility up/up 192.172.76.194/24   CustCluster3 
            cifs_cust_wtr up/up    192.172.76.206/24    CustCluster3 
            mgmt_svm     up/up    192.172.239.206/24   CustCluster4 
            nfs_ISO      up/up    192.172.73.199/24    CustCluster3 
            tmp_nbu      up/up    192.172.76.209/24    CustCluster1 
CustZCluster
            mgmt_D1      up/up    192.172.239.193/24   CustCluster1 
            mgmt_D2      up/up    192.172.239.194/24   CustCluster2 
            mgmt_D3      up/up    192.172.239.195/24   CustCluster3 
            mgmt_D4      up/up    192.172.239.196/24   CustCluster4 
            mgmt_cluster up/up    192.172.239.192/24   CustCluster1 
            snapmirror_D1 up/up   192.172.239.240/24   CustCluster1 
            snapmirror_D2 up/up   192.172.239.241/24   CustCluster2 
            snapmirror_D3 up/up   192.172.239.242/24   CustCluster3 
            snapmirror_D4 up/up   192.172.239.243/24   CustCluster4 
CustZServSVM01
            cifs_AT_PAASDB_DC00 up/up 192.172.76.210/24 CustCluster4
            cifs_fsw_gp_evclu_dc01 up/up 192.172.76.208/24 CustCluster4
            cifs_fsw_jd1sw_ssawcl up/up 192.172.76.214/24 CustCluster4
            iscsi1_D1_e0e up/up   192.172.74.192/24    CustCluster1 
            iscsi1_D2_e0e up/up   192.172.74.193/24    CustCluster2 
            iscsi1_D3_e0e up/up   192.172.74.194/24    CustCluster3 
            iscsi1_D4_e0e up/up   192.172.74.195/24    CustCluster4 
            iscsi2_D1_e0f up/up   192.172.75.192/24    CustCluster1 
            iscsi2_D2_e0f up/up   192.172.75.193/24    CustCluster2 
            iscsi2_D3_e0f up/up   192.172.75.194/24    CustCluster3 
            iscsi2_D4_e0f up/up   192.172.75.195/24    CustCluster4 
            mgmt_svm     up/up    192.172.239.207/24   CustCluster4 
            nfs_D1       up/up    192.172.73.192/24    CustCluster1 
            nfs_D2       up/up    192.172.73.193/24    CustCluster2 
            nfs_D3       up/up    192.172.73.194/24    CustCluster3 
            nfs_D4       up/up    192.172.73.195/24    CustCluster4 
            nfs_vm_gp_ev_dc01_os_index up/up 192.172.73.203/24 CustCluster1
            nfs_vm_gp_ev_dc01_vault up/up 192.172.73.204/24 CustCluster4
            nfs_vm_gp_ev_dc02_os_index up/up 192.172.73.201/24 CustCluster1
            nfs_vm_gp_ev_dc02_vault up/up 192.172.73.207/24 CustCluster4
            nfs_vm_gp_ev_dc03_os_index up/up 192.172.73.202/24 CustCluster2
            nfs_vm_gp_ev_dc03_vault up/up 192.172.73.205/24 CustCluster4
            nfs_vm_gp_ev_dc04_os_index up/up 192.172.73.197/24 CustCluster2
            nfs_vm_gp_ev_dc04_vault up/up 192.172.73.206/24 CustCluster4
            nfs_vm_msdp_cc_nbd_02 up/up 192.172.73.198/24 CustCluster4
            nfs_vm_msdp_cc_nbd_03 up/up 192.172.73.200/24 CustCluster4
            nfs_vm_nfs_wa up/up   192.172.73.196/24    CustCluster2 
64 entries were displayed.

CustCluster::>

I would like to know across all vservers, what addresses are actually used. It would be great to sort this list by the address field. The command is simply this: net int show -sort-by address

CustCluster::*> network interface show -sort-by address

            Logical    Status     Network            Current       
Vserver     Interface  Admin/Oper Address/Mask       Node          
----------- ---------- ---------- ------------------ ------------- 
CustZServSVM01
            nfs_D1       up/up    192.172.73.192/24    CustCluster1 
            nfs_D2       up/up    192.172.73.193/24    CustCluster2 
            nfs_D3       up/up    192.172.73.194/24    CustCluster3 
            nfs_D4       up/up    192.172.73.195/24    CustCluster4 
            nfs_vm_nfs_wa up/up   192.172.73.196/24    CustCluster2 
            nfs_vm_gp_ev_dc04_os_index up/up 192.172.73.197/24 CustCluster2
            nfs_vm_msdp_cc_nbd_02 up/up 192.172.73.198/24 CustCluster4
CustSVM02
            nfs_ISO      up/up    192.172.73.199/24    CustCluster3 
CustZServSVM01
            nfs_vm_msdp_cc_nbd_03 up/up 192.172.73.200/24 CustCluster4
            nfs_vm_gp_ev_dc02_os_index up/up 192.172.73.201/24 CustCluster1
            nfs_vm_gp_ev_dc03_os_index up/up 192.172.73.202/24 CustCluster2
            nfs_vm_gp_ev_dc01_os_index up/up 192.172.73.203/24 CustCluster1
            nfs_vm_gp_ev_dc01_vault up/up 192.172.73.204/24 CustCluster4
            nfs_vm_gp_ev_dc03_vault up/up 192.172.73.205/24 CustCluster4
            nfs_vm_gp_ev_dc04_vault up/up 192.172.73.206/24 CustCluster4
            nfs_vm_gp_ev_dc02_vault up/up 192.172.73.207/24 CustCluster4
            iscsi1_D1_e0e up/up   192.172.74.192/24    CustCluster1 
            iscsi1_D2_e0e up/up   192.172.74.193/24    CustCluster2 
            iscsi1_D3_e0e up/up   192.172.74.194/24    CustCluster3 
            iscsi1_D4_e0e up/up   192.172.74.195/24    CustCluster4 
            iscsi2_D1_e0f up/up   192.172.75.192/24    CustCluster1 
            iscsi2_D2_e0f up/up   192.172.75.193/24    CustCluster2 
            iscsi2_D3_e0f up/up   192.172.75.194/24    CustCluster3 
            iscsi2_D4_e0f up/up   192.172.75.195/24    CustCluster4 
CustSVM01
            cifs_ediscovery up/up 192.172.76.192/24    CustCluster4 
CustSVM02
            cifs_cust     up/up    192.172.76.193/24    CustCluster3 
            cifs_cust_utility up/up 192.172.76.194/24   CustCluster3 
            cifs_cust_COPS_G up/up 192.172.76.195/24    CustCluster3 
            cifs_cust_PST_archive up/up 192.172.76.196/24 CustCluster3
            cifs_cust_archive up/up 192.172.76.197/24   CustCluster3 
            cifs_cust_auditlogs up/up 192.172.76.198/24 CustCluster3 
            cifs_cust_citrix up/up 192.172.76.199/24    CustCluster3 
            cifs_cust_g_drive up/up 192.172.76.200/24   CustCluster3 
            cifs_cust_nya_public up/up 192.172.76.201/24 CustCluster3
            cifs_cust_ovw up/up    192.172.76.202/24    CustCluster3 
            cifs_cust_software up/up 192.172.76.203/24  CustCluster3 
            cifs_cust_sqlbackups up/up 192.172.76.204/24 CustCluster3
            cifs_cust_support up/up 192.172.76.205/24   CustCluster3 
            cifs_cust_wtr up/up    192.172.76.206/24    CustCluster3 
            cifs_eaprofile up/up  192.172.76.207/24    CustCluster1 
CustZServSVM01
            cifs_fsw_gp_evclu_dc01 up/up 192.172.76.208/24 CustCluster4
CustSVM02
            tmp_nbu      up/up    192.172.76.209/24    CustCluster1 
CustZServSVM01
            cifs_AT_PAASDB_DC00 up/up 192.172.76.210/24 CustCluster4
            cifs_fsw_jd1sw_ssawcl up/up 192.172.76.214/24 CustCluster4
CustZCluster
            mgmt_cluster up/up    192.172.239.192/24   CustCluster1 
            mgmt_D1      up/up    192.172.239.193/24   CustCluster1 
            mgmt_D2      up/up    192.172.239.194/24   CustCluster2 
            mgmt_D3      up/up    192.172.239.195/24   CustCluster3 
            mgmt_D4      up/up    192.172.239.196/24   CustCluster4 
CustSVM01
            mgmt_svm     up/up    192.172.239.205/24   CustCluster4 
CustSVM02
            mgmt_svm     up/up    192.172.239.206/24   CustCluster4 
CustZServSVM01
            mgmt_svm     up/up    192.172.239.207/24   CustCluster4 
CustZCluster
            snapmirror_D1 up/up   192.172.239.240/24   CustCluster1 
            snapmirror_D2 up/up   192.172.239.241/24   CustCluster2 
            snapmirror_D3 up/up   192.172.239.242/24   CustCluster3 
            snapmirror_D4 up/up   192.172.239.243/24   CustCluster4 
Cluster
            CustCluster1_clus1 up/up 169.254.5.203/16 CustCluster1
            CustCluster2_clus2 up/up 169.254.18.84/16 CustCluster2
            CustCluster4_clus1 up/up 169.254.33.135/16 CustCluster4
            CustCluster1_clus2 up/up 169.254.91.160/16 CustCluster1
            CustCluster4_clus2 up/up 169.254.142.203/16 CustCluster4
            CustCluster3_clus1 up/up 169.254.158.204/16 CustCluster3
            CustCluster2_clus1 up/up 169.254.196.87/16 CustCluster2
            CustCluster3_clus2 up/up 169.254.243.163/16 CustCluster3
64 entries were displayed.

CustCluster::*>

Notice the LIFs are now sorted in Address order! SVM names may be listed multiple times as the list, sorted by address, jockeys back and forth between SVMs. Well, let’s clean this up a little more. I want the same data but sorted by SVM first, then by address. That command would be: network interface show -sort-by vserver,address . Here is the actual output with the same data:

CustCluster::*> network interface show -sort-by vserver,address

            Logical    Status     Network            Current       
Vserver     Interface  Admin/Oper Address/Mask       Node          
----------- ---------- ---------- ------------------ ------------- 
Cluster
            CustCluster1_clus1 up/up 169.254.5.203/16 CustCluster1
            CustCluster2_clus2 up/up 169.254.18.84/16 CustCluster2
            CustCluster4_clus1 up/up 169.254.33.135/16 CustCluster4
            CustCluster1_clus2 up/up 169.254.91.160/16 CustCluster1
            CustCluster4_clus2 up/up 169.254.142.203/16 CustCluster4
            CustCluster3_clus1 up/up 169.254.158.204/16 CustCluster3
            CustCluster2_clus1 up/up 169.254.196.87/16 CustCluster2
            CustCluster3_clus2 up/up 169.254.243.163/16 CustCluster3
CustSVM01
            cifs_ediscovery up/up 192.172.76.192/24    CustCluster4 
            mgmt_svm     up/up    192.172.239.205/24   CustCluster4 
CustSVM02
            nfs_ISO      up/up    192.172.73.199/24    CustCluster3 
            cifs_cust     up/up    192.172.76.193/24    CustCluster3 
            cifs_cust_utility up/up 192.172.76.194/24   CustCluster3 
            cifs_cust_COPS_G up/up 192.172.76.195/24    CustCluster3 
            cifs_cust_PST_archive up/up 192.172.76.196/24 CustCluster3
            cifs_cust_archive up/up 192.172.76.197/24   CustCluster3 
            cifs_cust_auditlogs up/up 192.172.76.198/24 CustCluster3 
            cifs_cust_citrix up/up 192.172.76.199/24    CustCluster3 
            cifs_cust_g_drive up/up 192.172.76.200/24   CustCluster3 
            cifs_cust_nya_public up/up 192.172.76.201/24 CustCluster3
            cifs_cust_ovw up/up    192.172.76.202/24    CustCluster3 
            cifs_cust_software up/up 192.172.76.203/24  CustCluster3 
            cifs_cust_sqlbackups up/up 192.172.76.204/24 CustCluster3
            cifs_cust_support up/up 192.172.76.205/24   CustCluster3 
            cifs_cust_wtr up/up    192.172.76.206/24    CustCluster3 
            cifs_eaprofile up/up  192.172.76.207/24    CustCluster1 
            tmp_nbu      up/up    192.172.76.209/24    CustCluster1 
            mgmt_svm     up/up    192.172.239.206/24   CustCluster4 
CustZCluster
            mgmt_cluster up/up    192.172.239.192/24   CustCluster1 
            mgmt_D1      up/up    192.172.239.193/24   CustCluster1 
            mgmt_D2      up/up    192.172.239.194/24   CustCluster2 
            mgmt_D3      up/up    192.172.239.195/24   CustCluster3 
            mgmt_D4      up/up    192.172.239.196/24   CustCluster4 
            snapmirror_D1 up/up   192.172.239.240/24   CustCluster1 
            snapmirror_D2 up/up   192.172.239.241/24   CustCluster2 
            snapmirror_D3 up/up   192.172.239.242/24   CustCluster3 
            snapmirror_D4 up/up   192.172.239.243/24   CustCluster4 
CustZServSVM01
            nfs_D1       up/up    192.172.73.192/24    CustCluster1 
            nfs_D2       up/up    192.172.73.193/24    CustCluster2 
            nfs_D3       up/up    192.172.73.194/24    CustCluster3 
            nfs_D4       up/up    192.172.73.195/24    CustCluster4 
            nfs_vm_nfs_wa up/up   192.172.73.196/24    CustCluster2 
            nfs_vm_gp_ev_dc04_os_index up/up 192.172.73.197/24 CustCluster2
            nfs_vm_msdp_cc_nbd_02 up/up 192.172.73.198/24 CustCluster4
            nfs_vm_msdp_cc_nbd_03 up/up 192.172.73.200/24 CustCluster4
            nfs_vm_gp_ev_dc02_os_index up/up 192.172.73.201/24 CustCluster1
            nfs_vm_gp_ev_dc03_os_index up/up 192.172.73.202/24 CustCluster2
            nfs_vm_gp_ev_dc01_os_index up/up 192.172.73.203/24 CustCluster1
            nfs_vm_gp_ev_dc01_vault up/up 192.172.73.204/24 CustCluster4
            nfs_vm_gp_ev_dc03_vault up/up 192.172.73.205/24 CustCluster4
            nfs_vm_gp_ev_dc04_vault up/up 192.172.73.206/24 CustCluster4
            nfs_vm_gp_ev_dc02_vault up/up 192.172.73.207/24 CustCluster4
            iscsi1_D1_e0e up/up   192.172.74.192/24    CustCluster1 
            iscsi1_D2_e0e up/up   192.172.74.193/24    CustCluster2 
            iscsi1_D3_e0e up/up   192.172.74.194/24    CustCluster3 
            iscsi1_D4_e0e up/up   192.172.74.195/24    CustCluster4 
            iscsi2_D1_e0f up/up   192.172.75.192/24    CustCluster1 
            iscsi2_D2_e0f up/up   192.172.75.193/24    CustCluster2 
            iscsi2_D3_e0f up/up   192.172.75.194/24    CustCluster3 
            iscsi2_D4_e0f up/up   192.172.75.195/24    CustCluster4 
            cifs_fsw_gp_evclu_dc01 up/up 192.172.76.208/24 CustCluster4
            cifs_AT_PAASDB_DC00 up/up 192.172.76.210/24 CustCluster4
            cifs_fsw_jd1sw_ssawcl up/up 192.172.76.214/24 CustCluster4
            mgmt_svm     up/up    192.172.239.207/24   CustCluster4  
64 entries were displayed.

CustCluster::*>
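For comparison, here is roughly what the off-box approach looks like. This Python sketch sorts a handful of rows taken from the output above by vserver and then by address, using the standard ipaddress module so that addresses compare numerically rather than as text. It is only an illustration and says nothing about how ONTAP implements -sort-by internally.

```python
import ipaddress

# (vserver, LIF, address/mask) rows copied from the sample output above
LIFS = [
    ("CustSVM01", "mgmt_svm", "192.172.239.205/24"),
    ("Cluster", "CustCluster1_clus2", "169.254.91.160/16"),
    ("CustSVM01", "cifs_ediscovery", "192.172.76.192/24"),
    ("Cluster", "CustCluster1_clus1", "169.254.5.203/16"),
    ("Cluster", "CustCluster2_clus2", "169.254.18.84/16"),
]


def sort_key(row):
    """Sort by vserver name first, then by the numeric value of the address."""
    vserver, _lif, address = row
    # ip_interface() accepts "a.b.c.d/len"; .ip is the address without the mask
    return (vserver, ipaddress.ip_interface(address).ip)


def sort_lifs(rows):
    return sorted(rows, key=sort_key)


for vserver, lif, address in sort_lifs(LIFS):
    print(f"{vserver:<12} {lif:<20} {address}")
```

A plain text sort would place 169.254.18.84 ahead of 169.254.5.203 and 192.172.239.205 ahead of 192.172.76.192, which is why the key converts each address first.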

All done?

When you are finished, do not forget to either logout, or set your privilege mode back to admin: set -privilege admin

Setting up an easy TFTP Server

Configuration management of network switches is usually done with TFTP (Trivial File Transfer Protocol) or SCP (Secure CoPy, a copy mechanism running over SSH). This short post will show how to easily set up a TFTP server that can be used to transfer files to or from network devices supporting TFTP.

One of the easiest TFTP servers I have found to set up and use is made and distributed by SolarWinds. The Free software is located HERE.

Download the software. Install it. It only takes a few seconds. This software *is not* a Windows Service like many others I have seen. My use cases typically need it on rare occasions and I do not need the service running in the background. Just run the application when it is needed.

One important item to note: If you are using the Windows Firewall, you will either need to disable it or allow port 69 UDP through. That is the port TFTP communicates on.
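To check from another machine that UDP port 69 is actually reachable, a read request can be sent by hand. The sketch below builds a TFTP read request (RRQ) as described in RFC 1350: a two-byte opcode of 1, the filename, a zero byte, the transfer mode and a final zero byte. The function names, the timeout and the example address are my own assumptions, and this is a probe rather than a full TFTP client.

```python
import socket
import struct

TFTP_PORT = 69   # UDP port the server listens on
OPCODE_RRQ = 1   # read request


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request packet (RFC 1350)."""
    return (
        struct.pack("!H", OPCODE_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def probe_tftp(server, filename, timeout=3.0):
    """Send one RRQ and return the opcode of the first reply.

    3 means DATA (the file exists), 5 means ERROR (server answered, but the
    request was refused). None means nothing usable came back, which often
    points at a firewall or a stopped server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (server, TFTP_PORT))
            reply, _addr = sock.recvfrom(516)  # 4-byte header + 512 data bytes
        except OSError:
            # Covers timeouts, resolution failures and port-unreachable resets
            return None
    if len(reply) < 2:
        return None
    return struct.unpack("!H", reply[:2])[0]


# Example (address is hypothetical): probe_tftp("192.168.99.20", "image.stk")
```

The probe never acknowledges the data it receives, so the server simply gives up on that transfer after its own time-out.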

After the software is installed (and you have made any changes to Windows Firewall), start the software. It will start-up a small window indicating the base path to the TFTP file directory in the lower left corner. If the Server started correctly and bound to port 69, it will be indicated in the main window and at the lower right corner:

[Screenshot: TFTP1]

To change the configuration of the TFTP Server, click File -> Configure. This will pop open the configuration for the TFTP server. Here you can start or stop the server, allow/not allow the TFTP server in the Windows System Tray, modify time-outs and point to where the CHROOT-ed storage is. On the other tabs, you can also manipulate the IP address bindings for TFTP, whether clients can send files, receive files or both and you can set restrictions on what IP addresses are allowed to send/receive (or just allow all). You can also change the default language of the Application if you wish.

[Screenshots: TFTP3, TFTP4, TFTP5]

As files are sent or received, they will be logged on the main window. To stop the server, either use the configuration tool or simply kill the application.

Configuring a NetApp Branded CN1610 ClusterNet Switch (FASTPATH 1.2.0.7 / RCF 1.2)

NetApp has lots of Knowledge Base articles to help configure these switches. I wanted to put a blog post together that arranges all the info in one place that is easy to read. As delivered, the switch login is “admin” with an empty password (just hit enter!)

First, we need to get the switch on the network. Connect to the serial port (9600/N/8/1).

Login with username admin followed by “enter” twice (no password yet).

Enter privileged mode by typing “enable” followed by “enter” twice (no password yet).

Set up the “Service” port:

serviceport ip <ipaddr> <netmask> <gateway>

Example:

(CN1610) #serviceport ip 192.168.99.10 255.255.255.0 192.168.99.1

Verify the service port:

(CN1610) #show serviceport
Interface Status............................... Up
IP Address..................................... 192.168.99.10
Subnet Mask.................................... 255.255.255.0
Default Gateway................................ 192.168.99.1
IPv6 Administrative Mode....................... Enabled
IPv6 Prefix is ................................ <masked>
Configured IPv4 Protocol....................... None
Configured IPv6 Protocol....................... None
IPv6 AutoConfig Mode........................... Disabled
Burned In MAC Address.......................... <masked>

Ping the Gateway:

(CN1610) #ping 192.168.99.1
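
If the ping fails, one thing worth ruling out is a gateway that does not sit inside the service port subnet. Here is a small sketch in Python using the standard ipaddress module; the function name is my own, and the addresses are the example values from above.

```python
import ipaddress


def gateway_in_subnet(ip: str, netmask: str, gateway: str) -> bool:
    """Return True if the gateway lies in the network defined by ip/netmask."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway) in network


if __name__ == "__main__":
    print(gateway_in_subnet("192.168.99.10", "255.255.255.0", "192.168.99.1"))
```

This only checks the addressing arithmetic. It says nothing about cabling or whether the gateway itself answers ICMP.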

More than likely, you will need to update the FASTPATH code. To do that you need an SCP or TFTP server (see another post about this).

It is best to copy the current running firmware to the backup on the switch (although, if needed, the software can be downloaded from the NetApp Support Page for the CN1610):

(CN1610) #copy active backup

You will need to confirm by pressing “y” and nothing else. Once that finishes, copy the image from your TFTP server to the active image:

(CN1610) #copy tftp:///image.stk active

The current images are 1.2.0.7 (with RCF 1.2) and 1.1.0.8 (with RCF 1.1), located on this NetApp Support Page. Always verify version information with the NetApp Interoperability Matrix Tool (IMT).

Verify the boot image:

(CN1610) #show bootvar

Image Descriptions

 active :
 backup :


 Images currently available on Flash

 ---- ---------- ---------- ----------------- -----------------
 unit     active     backup    current-active       next-active
 ---- ---------- ---------- ----------------- -----------------

    1    1.2.0.7    1.1.0.8           1.1.0.8           1.2.0.7
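
In the output above, current-active is still 1.1.0.8 while next-active is 1.2.0.7, so the new image only takes effect after the reload. If you capture this output from many switches, a check like the following can flag the ones still waiting for a reboot. This is a rough sketch that assumes the table layout shown above; the function names are my own.

```python
COLUMNS = ("unit", "active", "backup", "current-active", "next-active")


def parse_bootvar(output: str) -> dict:
    """Pick out the first image row (unit number plus four versions)."""
    for line in output.splitlines():
        fields = line.split()
        # The data row is the only 5-field line that starts with a unit number.
        if len(fields) == 5 and fields[0].isdigit():
            return dict(zip(COLUMNS, fields))
    raise ValueError("no image row found in 'show bootvar' output")


def needs_reload(row: dict) -> bool:
    """True when the running image differs from the one selected for next boot."""
    return row["current-active"] != row["next-active"]
```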

Reboot the switch:

(CN1610) #reload

When the switch finishes rebooting, create a “running-config.scr” file:

(CN1610) #show running-config running-config.scr

Place a backup copy off the switch and on the TFTP server. I like to add more to the off-switch name to make it easy to identify:

(CN1610) #copy nvram:script running-config.scr tftp:///switch01-running-config.scr

Copy the appropriate RCF to your switch:

(CN1610) #copy tftp:///CN1610_CS_RCF_v1.2.scr nvram:script CN1610_CS_RCF_v1.2.scr

Verify it made it on the switch:

(CN1610) #script list

Configuration Script Name        Size(Bytes)
-------------------------------- -----------
CN1610_CS_RCF_v1.0.scr                  2149
CN1610_CS_RCF_v1.1.scr                  2169
CN1610_CS_RCF_v1.2.scr                  2225
running-config.scr                      3648
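
The name you pass to the validate and apply steps should match a name in this listing exactly, as far as I can tell. A small Python sketch for checking captured output follows; it assumes the two-column layout shown above, and the function name is my own.

```python
def parse_script_list(output: str) -> dict:
    """Map each script name from 'script list' output to its size in bytes."""
    scripts = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows are "<name>.scr <size>"; header and separator rows are skipped.
        if len(fields) == 2 and fields[0].endswith(".scr") and fields[1].isdigit():
            scripts[fields[0]] = int(fields[1])
    return scripts
```

With the listing above, `"CN1610_CS_RCF_v1.2.scr" in parse_script_list(text)` is true, while a name without the `_CS` part would not be found.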

Validate the script:

(CN1610) #script validate CN1610_CS_RCF_v1.2.scr

That will print each line and validate the script. If any commands are wrong or do not apply to the current FASTPATH version, the validation will indicate the line number where the issue(s) occurred.

Apply the script:

(CN1610) #script apply CN1610_CS_RCF_v1.2.scr

This will also print out each line in the script and report whether it was successful. Save the in-memory running configuration:

(CN1610) #write mem

Check out the running configuration:

(CN1610) #show running-config

Set the passwords for standard and privileged mode:

(CN1610) #password

If this is a new switch, there is no password; just hit enter. If you already assigned a password, enter the password at the prompt. Follow up with the new password and then confirm the new password.

Enter enable mode and set the password:

(CN1610) #enable

(The enable password should be empty, so press enter. If not, enter the current password.)

(CN1610) #enable password

If this is a new switch, there is no enable password; just hit enter. If you already assigned an enable password, enter the password at the prompt. Follow up with the new password and then confirm the new password.

Save the running configuration:

(CN1610) #write mem

Reboot the switch:

(CN1610) #reload

Here are the commands to customize your configuration. All lines beginning with the “!” will be ignored by the switch. It is safe to copy/paste those lines without worry of error. Modify to fit your site as needed:

 

!Set the switch Hostname:
 hostname "clusterswitch01"
!Setup and configure the Serviceport for external IP access:
 serviceport protocol none
 serviceport ip 192.168.99.10 255.255.255.0 192.168.99.1
!Setup SSH version 2, generating RSA/DSA keys
 ip ssh protocol 2
 configure
   crypto key generate dsa
   crypto key generate rsa
 exit
!Enable the SSH Server
 ip ssh server enable
!Setup and configure the date and time
!set today's date
 configure 
   clock set 08/08/2016
!Set today's Time in UTC!
   clock set 09:30:00
!Set the clock Timezone and Summer time
   clock summer-time recurring USA offset 60 zone "EDT"
   clock timezone -5 minutes 0 zone "EST"
!Setup NTP to client mode
   sntp client mode unicast
   sntp client port 123
   sntp server "ntp1.example.com"
   sntp server "ntp2.example.com"
!Setup DNS
   ip domain name "my.example.com"
   ip name server 192.168.99.200 192.168.99.202
   ip domain lookup
!Setup Logging and email
!Persistent logging to WARNING(4)
   logging persistent 4
!Send logs to email
   logging email
!Non-urgent email logging configuration really
!    indicates a digest email notification. The 
!    frequency of the email digest is determined
!    with the "Email Alert Notification Period". 
!    Since this email type is not necessarily an 
!    alert type email, consider setting the 
!    frequency to the highest interval of 1440 
!    minutes (every 24 hours). In other words, the
!    non-urgent digest style combines all the 
!    non-urgent switch events in a single email.
   logging email logtime 1440
!Send Severity type ERROR(3) to email
   logging email 3
!This command sets the lowest severity level at which log messages are
!    emailed immediately in a single email message.
!Setting to CRITICAL(2)
   logging email urgent 2
!Where is this email coming from?
   logging email from-addr clusterswitch01@example.com
!Where to send URGENT emails?
   logging email message-type urgent to-addr pager@example.com
!where to send non-urgent emails?
   logging email message-type non-urgent to-addr pager@example.com
!Subjects for Urgent and Non-Urgent emails
   logging email message-type urgent subject "Urgent NetAppCluster in Site XY Cluster-Interconnect Switch 01 Notification"
   logging email message-type non-urgent subject "NetAppCluster in Site XY Cluster-Interconnect Switch 01 Error Log Digest"
!What is the name or IP of my mailserver
   mail-server "smtp.example.com"
   exit
!Log all CLI commands
   logging cli-command
!Turn off pagination for Console
   line console
   length 0
   exit
!Turn off pagination for SSH
   line ssh
   length 0
   exit
 exit
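
With two (or more) cluster switches, only a few of these lines differ per switch. One way to avoid copy/paste mistakes is to render the site-specific lines from a template. The Python below is just a sketch covering the hostname, service port and email sender lines from the block above; the function name and the second switch's hostname and IP address are made up for illustration.

```python
from string import Template

# Only the per-switch lines; the rest of the block above is the same everywhere.
SWITCH_TEMPLATE = Template(
    '!Set the switch Hostname:\n'
    ' hostname "${hostname}"\n'
    '!Setup and configure the Serviceport for external IP access:\n'
    ' serviceport protocol none\n'
    ' serviceport ip ${ip} ${netmask} ${gateway}\n'
    '!Where is this email coming from?\n'
    ' logging email from-addr ${hostname}@${mail_domain}\n'
)


def render_switch_config(hostname, ip, netmask, gateway, mail_domain="example.com"):
    """Fill in the per-switch values; raises KeyError if one is missing."""
    return SWITCH_TEMPLATE.substitute(
        hostname=hostname, ip=ip, netmask=netmask,
        gateway=gateway, mail_domain=mail_domain,
    )


if __name__ == "__main__":
    # Hypothetical second switch at the same site.
    print(render_switch_config("clusterswitch02", "192.168.99.11",
                               "255.255.255.0", "192.168.99.1"))
```

Note that in the full block the logging email line lives inside configure mode, so paste the rendered lines into the matching places rather than as one chunk.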

Now that you have the configuration in place, save it and upload it to your TFTP server:

(clusterswitch01) #write mem

This operation may take a few minutes.
Management interfaces will not be available during this time.

Are you sure you want to save? (y/n) y

Config file 'startup-config' created successfully .


Configuration Saved!

(clusterswitch01) #show running-config running-config.scr

Config script created successfully.

(clusterswitch01) #copy nvram:script running-config.scr tftp://tftpserver/clusterswitch01.scr

Mode........................................... TFTP
Set Server IP.................................. 192.168.99.9
Path........................................... ./
Filename....................................... clusterswitch01.scr
Data Type...................................... Config Script
Source Filename................................ running-config.scr

Management access will be blocked for the duration of the transfer
Are you sure you want to start? (y/n) y

File transfer operation completed successfully.

Reference Links (warning, some links are only accessible to NetApp and Partners)

Unable to locate how to enable SSH for the CN1610 switches – 2018779
Unable to ping CN1610; however, ‘show network’ displays the correct IP address
OEM: How to configure the 10Gb NetApp CN1610 clustered Data ONTAP switch
How to configure e-mail alerts for CN1610 and CN1601
How to configure NTP services on the cluster interconnect switch CN1610
How to transfer firmware or script files to a NetApp CN1610 firmware using SCP
INTERNAL: How to disable SSH V1 on CN1610 cluster switches
How to configure SNMP Community String in Cluster Interconnect Switch CN1601/CN1610
How to disable telnet on a NetApp CN1610 switch
INTERNAL: How to configure TACACS

 

OnCommand Unified Manager and OnCommand Performance Manager -> Fully Integrated? Mostly.

I am working at a customer site on a residency just outside of Baltimore, MD. We have installed and implemented OnCommand Unified Manager 6.4RC1 (OCUM) and OnCommand Performance Manager 2.1RC1 (OCPM), utilizing the Full Integration feature found in these two products at this release and moving forward. The vApp/ESXi versions were used here, but I suspect other variants will produce similar results.

After the installation, it was determined that the email address for the “admin” account needed to change. I figured I would just go into the GUI and modify the admin email address.

After doing this, anything that was OCUM related got the update. This was also verified on the maintenance console of OCUM:

root@OCUM:/home/diag# mysql -e " select id,name,emailAddress from ocum.authorizationunit;"
+------+--------------+----------------------+
| id   | name         | emailAddress         |
+------+--------------+----------------------+
|    1 | admin        | goodemail99@cust.com |
|    2 | ocpm         | nowhere@cust.com     |
|    3 | Cloud-Admins | NULL                 |
|    4 | RAD-NetOps   | NULL                 |
|    6 | RAD-Archive  | NULL                 |
|    7 | tmccar14     | tmac@netapp.com      |
|  100 | cliadmin     | cliadmin@netapp.com  |
| 1001 | tmac         | NULL                 |
+------+--------------+----------------------+

When we looked on OCPM for something similar, we found this:

root@OCPM:/home/diag# mysql -e " select id,name,emailAddress from ocf.authorizationunit;"
+------+-------+-------------------+
| id   | name  | emailAddress      |
+------+-------+-------------------+
|    1 | admin | bademail@cust.com |
| 1002 | tmac  | NULL              |
+------+-------+-------------------+
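
Spotting the mismatch by eye is easy with two accounts, less so with dozens. The Python sketch below parses the two mysql table outputs shown above and lists the accounts whose email address differs between OCUM and OCPM. It only works on captured text and the function names are my own; it does not talk to either database.

```python
def parse_mysql_table(output: str) -> list:
    """Turn mysql's boxed table output into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def email_mismatches(ocum_rows: list, ocpm_rows: list) -> dict:
    """Return {name: (ocum_email, ocpm_email)} for accounts present in both."""
    ocum = {row["name"]: row["emailAddress"] for row in ocum_rows}
    mismatches = {}
    for row in ocpm_rows:
        name = row["name"]
        if name in ocum and ocum[name] != row["emailAddress"]:
            mismatches[name] = (ocum[name], row["emailAddress"])
    return mismatches
```

Accounts that exist on only one side (such as the ocpm account above) are ignored on purpose, since there is nothing to compare them against.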

Currently, the only way to *fix* this is by enabling the diagnostic user and logging into the maintenance console. (I will not be explaining how to do that here; consult NetApp Tech Support if you really need to do this!) Once on the maintenance console, I was instructed to use this command to fix the database:

root@OCPM:/home/diag# mysql -e "update ocf.authorizationunit set emailAddress='goodemail99@cust.com' where id=1;"

Re-running the command above showed the updated info:

root@OCPM:/home/diag# mysql -e " select id,name,emailAddress from ocf.authorizationunit;"
+------+-------+----------------------+
| id   | name  | emailAddress         |
+------+-------+----------------------+
|    1 | admin | goodemail99@cust.com |
| 1002 | tmac  | NULL                 |
+------+-------+----------------------+


A bug has been opened to investigate this behavior. Hopefully, they will be able to fix this minor issue soon.

Full Integration of OnCommand Unified Manager and Performance Manager

NetApp has recently released a “full integration” of the two core Clustered Data ONTAP monitoring products, OnCommand Unified Manager (vsphere version link) and OnCommand Performance Manager (vsphere version link).

What does this mean?

Historically, when using these two products, you would need to set up each individually and manage each individually. With the “full integration” release, you still perform a basic setup on both. If using HTTPS certificates generated by your own Certificate Authority, generate the signing requests, get and install the certificates and then, following the documentation, configure the “full integration” on the maintenance console of the Performance Manager. After a few minutes, you are presented with an updated single management pane through OnCommand Unified Manager. Nearly all configuration options that apply to one will apply to the other as needed. In fact, the GUI to OnCommand Performance Manager is now gone as a standalone product (hitting the OCPM IP address with a browser no longer works) when full integration is used.

Partial Integration is what the application used in prior releases and is still a viable option. The preferred method moving forward is the Full Integration.

 

 

 

Power Supplies causing other issues? Really!

So, I have recently been involved in a couple of cases regarding power supplies. Back in October I was asked to come to a site during a maintenance window to see about fixing a problem that would not go away.

Case #1:

This first case had the following symptoms:

  • The IOM3-B module appeared quasi-online. It was there, but not quite.
    • Firmware updates did not work. Resetting/re-seating did not do much.
  • The DS4246 shelf would not allow the shelf ID to be set.
  • I am sure there were other un-diagnosed issues, but these two were most obvious

NetApp was baffled. I asked for and received a whole new shelf, two Power Supply modules and two IOM3 modules to basically have everything on hand to fix whatever the problem could be. This had been festering for a few weeks. The customer and NetApp Support simply wanted this fixed.

During our outage, the first thing we did was eliminate the shelf. We moved all disks, Power Supplies and IOMs over to the new shelf and powered it on. The Shelf ID LED would not come on….at all. Mmm? Ok. Swap the IOM3s for the new ones. Still nothing! Swap the Power Supplies. Ah HA! The Shelf ID light came on.

To further isolate, we ended up shuffling the Power Supplies around further finding that there was one bad Power Supply that was causing significant problems. When it was in *any* shelf, problems followed. Remove the Power Supply and the problems disappear.

After looking at older ASUPs, it is likely we could have deduced a bad power supply, but the details were in a less commonly used section of the environment output.

Case #2:

This second case had the following symptoms:

  • Upon performing A-side / B-side power testing, according to the NetApp environment command, both power supplies were now unknown!
  • Some / most of the drives powered down
  • After power-cycling the shelf (both power supplies), NONE of the drives would power up!

Here we tried a few things, power-cycling a few times, resetting the IOM6 modules. For this case, we removed ONE power supply (PSU #4, lower right from the back of the shelf perspective). As soon as that ONE power supply was removed, the drives started powering on.

This was very odd. Fortunately for me, after I got this rectified and that power supply replaced, my NetApp case owner just happened to be an Electrical Engineer! He was able to dive into the many AutoSupport (ASUP) messages and further determine that power supply #1 in the same shelf was also on the fritz and it should be replaced also.

He was able to deduce that the voltages and amperages were not quite right and strongly recommended replacing power supply #1…which we did.

The takeaway

Never discount the power supplies. Also, be careful when you pull them out if you suspect them. In case number two, we did the A-side test and all appeared OK when power was restored. It was after the B-side test that everything went nuts, so I figured that was the place to start. In hindsight, I would also use the environmental commands to verify amperage and voltage, among other items, before pulling a power supply.