Thursday, May 15, 2014

Nexus5K - DCBX Troubleshooting between Nexus5K and NetApp over a Direct FCoE Connection

Problem Description


The VFC interface status shows the interface is stuck at initializing for the VSAN. All other configuration appears to be correct: the VSAN is bound to the VLAN, and the VLAN is allowed on the link. The fabric interconnects appear to have FLOGI'd in appropriately.
"sh flogi database" shows FLOGIs from all ports connected to the FI, but no NetApp ports. The ports from the FI are NOT port-channeled, nor are the ports to the NetApp.


System Info 

N5K-C5596UP
5.1(3)N2(1)


Simplified Topology 

UCS --- Nexus5K --vfc33 (Eth2/8) -- trunk -- Netapp


Background of FCoE

Fibre Channel over Ethernet (FCoE) provides a method of transporting Fibre Channel traffic over a physical Ethernet connection. FCoE requires that the underlying Ethernet be full duplex and provide lossless behavior for Fibre Channel traffic. The FIP login flow below shows the stages we need to verify:

fcoe_FIP-ladder.png
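
Each rung of this ladder maps to an N5K CLI check; as a quick reference, these are the same commands used in the steps that follow:

show vlan fcoe                 - VSAN-to-VLAN mapping / VLAN discovery
show lldp dcbx interface e2/8  - DCBX negotiation with the CNA
show interface vfc33           - VFC and trunk VSAN state
show flogi database            - fabric login (FLOGI) entries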

Troubleshooting Steps

The customer reports two symptoms: the VSAN is stuck in the initializing phase, and there is no FLOGI from the NetApp CNA.


1.  VSAN is stuck in initialization phase

n5k_sh_int_vfc.png


2. No FLOGI learning for Netapp CNA adapter

- Below is the WWPN for the NetApp CNA adapter. The NetApp filer has FC port name 50:0a:09:83:8f:e7:7b:99.
netapp_cna.png

- run "sh flogi database" from N5K and it shows flogi's in from all ports connected to the FI, but no Netapp ports are shown here :

n5k_sh_flogi.png

3. FCF Discovery or VLAN Discovery 
- Go back to the "FIP - Login Flow ladder" diagram above. There is no FLOGI from the NetApp CNA, so the failure must be either in FCF discovery or in VLAN discovery.

- Check the configuration again. The general steps to configure FCoE are: enable the FCoE feature, map a VSAN onto a VLAN, then create the virtual Fibre Channel (vfc) interface, as sketched below.
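
A minimal sketch of that order, using the VLAN/VSAN numbers from this case (assuming a dedicated VLAN 402 mapped to VSAN 402, which matches the config screenshot that follows):

feature fcoe

vlan 402
  fcoe vsan 402

vsan database
  vsan 402

interface vfc33
  bind interface Ethernet2/8
  switchport trunk allowed vsan 402
  no shutdown

vsan database
  vsan 402 interface vfc33

interface Ethernet2/8
  switchport mode trunk
  switchport trunk allowed vlan 402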

- Below is what we see from the Nexus5K.
n5k_config_vfc_ethernet.png

- The config looks good: vfc33 allows vsan 402 and is bound to Ethernet2/8. However, interface vfc33 complains that the trunk VSAN is not yet up.
n5k_sh_int_vfc.png


4. Trunk VSANs stuck at Initializing phase

- This indicates there is some issue in the DCBX negotiation, but it is still not clear what actually went wrong.

- Verify the FCoE VLAN:
sh_vlan_fcoe.png

5. Check DCBX negotiation


- DCBX is the Data Center Bridging eXchange protocol. The FCoE switch and the CNA adapter exchange capability information and configuration values via DCBX.
- DCBX runs on the physical Ethernet link between the Nexus5K and the CNA.
- It uses LLDP as its transport to carry DCBX TLVs between DCBX-capable devices.

- In the DCBX spec, the PFC feature is described on pages 29-30. It is a 16-bit structure: 8 bits for the priority bitmap, followed by 8 bits for the number of traffic classes supported. For the N5K, the expected value is 0808.
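
For example, decoding 0808 with that layout (reading the priority byte as a per-CoS bitmap, which is how the N5K uses it):

0x0808
  high byte 0x08 = 0000 1000 -> PFC enabled for priority (CoS) 3, the FCoE class
  low byte  0x08 = 8         -> eight traffic classes supported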


sh_lldp_dcbx.png

- In the output above we do not see 0808 for Type 003 on the local device (i.e., the N5K). The NetApp CNA adapter, however, advertises Type 003 correctly. So at this point we know this is not a NetApp CNA issue; something is not right on the Nexus5K.



6. Further deep dive into the DCBX output to understand what is really failing

- Run "sh sys int dcbx info interface e2/8" and look for an error. This is fairly lengthy output so you have to read it carefully. Especially around "error" line. So try to use | grep error and quickly verify if there is any error. Then review the complete output.

- In this sample case, we do see an error (the words in RED in the output below). This tells us there is an issue with the PFC configuration.


sh_sys_int_dcbx_info.png

7. Check QoS config for PFC
- As the error indicates, there is a problem with PFC. Generally speaking, the Nexus 5000 Series uses PFC to establish a lossless medium for the Fibre Channel payload in the FCoE implementation.

- Type "show ipqos" to verify PFC config part in Nexus5K.

sh_ipqos_new.png

- We can see that system qos (yellow box) does not include the FCoE policies needed for PFC.

- Add the following commands for FCoE:
add_PFC_qos_new.png
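
The screenshot shows the commands entered in this case; re-applying the built-in FCoE default policies under system qos on an N5K typically looks like this sketch:

system qos
  service-policy type qos input fcoe-default-in-policy
  service-policy type queuing input fcoe-default-in-policy
  service-policy type queuing output fcoe-default-out-policy
  service-policy type network-qos fcoe-default-nq-policy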

- As soon as we add the above commands, we can see a change in the interface status.

vfc_up_log.png

- Check "sh lldp dcbx interface e2/8" again and vrify "type 003" in local device (Nexus5k)

sh_lldp_dcbx_2_new.png

Note : Refer back to the Step 5 explanation. Per the DCBX spec (pages 29-30), the PFC feature is a 16-bit structure: 8 bits for the priority bitmap followed by 8 bits for the number of traffic classes supported. For the N5K, the expected value is 0808, which we now see.


- Check the VFC interface again. The trunk VSAN state looks good now.

sh_int_vfc_2_new.png

- Lastly, check the FLOGI database. We now see the NetApp CNA port WWPN.

sh_flogi_data_2_new.png



Thursday, May 8, 2014

VSG deployment and Integration with VSM VNMC and vCenter

Introduction


Cisco Virtual Security Gateway (VSG) is a virtual firewall for Cisco Nexus 1000V Switches that delivers security and compliance for virtual computing environments. Cisco VSG uses virtual network service data path (vPath) technology embedded in the Cisco Nexus 1000V Series Virtual Ethernet Module (VEM). However, when you deploy VSG, it can be overwhelming to understand which element is meant to interact with which, and that can be a huge obstacle when you troubleshoot any type of VSG issue.

So the purpose of this document is to explain the core components of a VSG deployment and how they relate to each other: what needs to be configured, and where it should be applied.

vsg_high_architecture.png




Solution Components

Virtual Network Management Center (VNMC)
- Cisco VNMC is a virtual appliance that provides centralized device and security policy management of the Cisco VSG.

Virtual Security Gateway (VSG)
- VSG operates with the Cisco Nexus 1000V Series distributed virtual switch in VMware vSphere hypervisor, and it uses the vPath embedded in the Nexus 1000V Series VEM.

Nexus1000V Switches
- Nexus 1000V Series Switches are virtual machine access switches: an intelligent software switch implementation for VMware vSphere environments, running the Cisco NX-OS operating system.

VMware vCenter
- VMware vCenter Server manages the vSphere environment and provides unified management of all the hosts and VMs in the data center from a single console.




Understanding the Communication between the Devices

VNMC-to-vCenter Communication
- VNMC registers to vCenter to gain visibility into the VMware environment. This allows the security administrator to define policies based on VMware VM attributes. VNMC integrates via an XML plug-in, in a process similar to the way the Cisco Nexus 1000V VSM integrates with vCenter. The communication between VNMC and vCenter takes place over a Secure Sockets Layer (SSL) connection on port 443.


VNMC-to-VSG Communication
- VSG registers to VNMC via the policy agent configuration done on VSG. Once registered, VNMC pushes the security and device policies to VSG. No policy configuration is done via the VSG command-line interface (CLI) once it is registered to VNMC; the CLI remains available to the administrator for monitoring and troubleshooting purposes. Communication between VSG and VNMC takes place over an SSL connection on port 443.


VNMC-to-VSM Communication
- VSM registers to VNMC via the policy agent configuration done on VSM. The steps to register are similar to those for VSG-to-VNMC registration. Once registered, VSM is able to send IP-to-VM bindings to VNMC. IP-to-VM mapping is required by the VSG for evaluating policies that are based on VM attributes. VSM also resolves the security-profile-id using VNMC; this security-profile-id is sent in every vPath packet to VSG and is used to identify the policy for evaluation. The communication between VSM and VNMC takes place over an SSL connection on port 443.


VSG-to-VEM (vPATH) Communication
- VSG receives traffic from the VEM when protection is enabled on a port profile. The redirection of the traffic occurs via vPath: vPath encapsulates the original packet with the VSG's MAC address and sends it to VSG. VSG has a dedicated interface (Data 0), and the VEM obtains the VSG's MAC address by performing Address Resolution Protocol (ARP) resolution of that interface's IP address. Cisco VSG is required to be Layer 2 adjacent to vPath. The mechanism used for communication between vPath and VSG is similar to that used between the VEM and the Cisco Nexus 1000V Series on a packet VLAN. VSG evaluates policies on the first packet of each flow redirected by vPath, then transmits the policy evaluation result to vPath. vPath maintains the result in its flow table, and subsequent packets of the flow are permitted or denied based on the cached result.



VSG Setup requirements
VSG uses three vNICs:
- Management : VNMC talks to vCenter, VSM, and VSG via the management VLAN.
- HA : A dedicated VLAN is recommended for HA.
- Data : N1K vPath and VSG communicate over this VLAN.

Installation and Initial Setup
1. Install the VNMC as a virtual appliance
2. Install the VSG as a virtual appliance
3. Register VSG to VNMC
vsg_vnm-pa.png
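
For reference, VSG-side registration is done under the policy agent; a minimal sketch (the registration IP is the VNMC address, and the IP, shared secret, and image filename below are placeholders for your environment):

vsg(config)# vnm-policy-agent
vsg(config-vnm-policy-agent)# registration-ip 192.168.1.10
vsg(config-vnm-policy-agent)# shared-secret Example123
vsg(config-vnm-policy-agent)# policy-agent-image bootflash:vnmc-vsgpa.2.0.1a.bin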

4. Register VSM to VNMC
vsm_vnm-pa.png
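
The VSM side mirrors the VSG registration, differing only in the policy-agent image (again, placeholder values):

vsm(config)# vnm-policy-agent
vsm(config-vnm-policy-agent)# registration-ip 192.168.1.10
vsm(config-vnm-policy-agent)# shared-secret Example123
vsm(config-vnm-policy-agent)# policy-agent-image bootflash:vnmc-vsmpa.2.0.1a.bin

On either device, "show vnm-pa status" confirms whether the policy agent installed and registered successfully.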

5. Register VNMC to vCenter


vCenter_vnmc_vsm.png


At VSM

1. Login to the VSM

2. Configure "port-profile". In this example, vsg_pp_tenant-anam" is the new port-profile we will use traffic redirection to VN service. This new port-profile should be seen from vCenter when you configure "Network Connection".

port-profile.png
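
A minimal sketch of such a port-profile (the access VLAN, org, and security-profile names here are placeholders; "an-vsg" is the vservice node defined in step 3):

port-profile type vethernet vsg_pp_tenant-anam
  vmware port-group
  switchport mode access
  switchport access vlan 200
  org root/Tenant-anam
  vservice node an-vsg profile sp-web
  no shutdown
  state enabled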

3. Configure "vservice node". In this example "an-vsg" is the vservice mode name and service type is "VSG".

vservice.png
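
A minimal sketch of the vservice node definition (the data IP and VLAN are placeholders; the VLAN is the Data VLAN on which VSG must be Layer 2 adjacent to vPath):

vservice node an-vsg type vsg
  ip address 192.168.10.10
  adjacency l2 vlan 300
  fail-mode close

Note that fail-mode close drops protected traffic if the VSG becomes unreachable, while fail-mode open forwards it without inspection; choose based on whether security or availability wins in your environment.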


At vCenter

1. Login to vCenter and verify that the new port-profile is visible.

vCenter_port-profile.png


At VNMC

1. Login to VNMC


2. If your VSG is properly configured to talk to VNMC, you should be able to see the VSG under "Resource Management > Resources > Firewalls > All VSG". Confirm that the VSG shows up in this list; if it does not, resolve this by properly registering your VSG. In this example, the VSG is shown as "an-vsg".





vnmc_an-vsg.png


Once the VSG is properly registered as above, you are ready to configure the security policies that control VM traffic.

UCS B - Disk failure Troubleshooting via IPMI

Introduction

This document describes several command-line interface (CLI) commands, as well as other troubleshooting techniques, that can help troubleshoot hard disk drive (HDD) issues on UCS B-Series. We discuss a fast and accurate method for troubleshooting HDD issues using the IPMI sensor CLI output, plus post-checks using the show tech output.



Troubleshooting Steps


UCSM
- Start from UCSM and check what the Faults tab says about the failure.
- Go to Equipment > Chassis > Server : Faults tab.
- In the example below, local disk 1 on server 4/1 reports a drive fault.

ucsm_faults_tab.png



IPMI (Intelligent Platform Management Interface) sensor reading
- You can check the status of the HDD from the IPMI sensor reading output. This method is very quick and useful for live troubleshooting. Here are the steps.

Step 1. Connect to the CIMC Debug Firmware Utility shell:

connect cimc <chassis/blade number>

 conn_cimc.png


Step 2. Type "sensors fault" as above, and you will see the disk status.

There are two hard disks in this case, each with a different status: one is 0x2202 and the other 0x0101.
What does this mean?


Code Interpretation :

Bit[15:10] - Unused
Bit[9:8]   - Fault
Bit[7:4]   - LED Color
Bit[3:0]   - LED State

Fault:
0x100 - Online
0x200 - Degraded

LED Color:
0x10 - GREEN
0x20 - AMBER
0x40 - BLUE
0x80 - RED

LED State:
0x01 - OFF
0x02 - ON
0x04 - FAST BLINK
0x08 - SLOW BLINK


Example :

1. 0x0101
Fault : 0x100 indicates Online
LED State : 0x01 indicates OFF

So HDD1 is in a normal state.

2. 0x2202
Fault : 0x200 indicates Degraded
LED State : 0x02 indicates ON

So HDD2 is degraded and should be replaced.


If you already have a show tech, then look for the following outputs.

1. "show fault" from sam_techsupportinfo

Severity: Major
Code: F0181
Last Transition Time: 2014-03-08T14:45:06.209
ID: 1263592
Status: None
Description: Local disk 1 on server 4/1 operability: inoperable. Reason: Firmware Detected Drive Fault    <<<<<----
Affected Object: sys/chassis-4/blade-1/board/storage-SAS-1/disk-1
Name: Storage Local Disk Inoperable
Cause: Equipment Inoperable
Type: Equipment
Acknowledged: No
Occurrences: 1
Creation Time: 2014-03-08T14:45:06.209
Original Severity: Major
Previous Severity: Major
Highest Severity: Major





2. obfl-log 

5:2014 Mar  8 14:44:59:2.1(1a):selparser:-: selparser.c:667: # 38 02 00 00 01 02 00 00 3A 92 1A 53 20 00 04 0D 97 00 00 00 7F 01 FF FF # 238 | 03/08/2014 14:44:58 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | LED is on | Asserted
5:2014 Mar  8 14:44:59:2.1(1a):selparser:-: selparser.c:667: # 39 02 00 00 01 02 00 00 3B 92 1A 53 20 00 04 0D 97 00 00 00 7F 05 FF FF # 239 | 03/08/2014 14:44:59 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | LED color is amber | Asserted
5:2014 Mar  8 14:44:59:2.1(1a):selparser:-: selparser.c:667: # 3A 02 00 00 01 02 00 00 3B 92 1A 53 20 00 04 0D 97 00 00 00 7F 09 FF FF # 23a | 03/08/2014 14:44:59 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | Degraded | Asserted



3. sel log

 # 38 02 00 00 01 02 00 00 3A 92 1A 53 20 00 04 0D 97 00 00 00 7F 01 FF FF # 238 | 03/08/2014 14:44:58 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | LED is on | Asserted
 # 39 02 00 00 01 02 00 00 3B 92 1A 53 20 00 04 0D 97 00 00 00 7F 05 FF FF # 239 | 03/08/2014 14:44:59 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | LED color is amber | Asserted
 # 3A 02 00 00 01 02 00 00 3B 92 1A 53 20 00 04 0D 97 00 00 00 7F 09 FF FF # 23a | 03/08/2014 14:44:59 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | Degraded | Asserted



4. /var/log/messages

5:2014 Mar  8 14:44:59:2.1(1a):selparser:-: selparser.c:667: # 38 02 00 00 01 02 00 00 3A 92 1A 53 20 00 04 0D 97 00 00 00 7F 01 FF FF # 238 | 03/08/2014 14:44:58 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | LED is on | Asserted
5:2014 Mar  8 14:44:59:2.1(1a):selparser:-: selparser.c:667: # 39 02 00 00 01 02 00 00 3B 92 1A 53 20 00 04 0D 97 00 00 00 7F 05 FF FF # 239 | 03/08/2014 14:44:59 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | LED color is amber | Asserted
5:2014 Mar  8 14:44:59:2.1(1a):selparser:-: selparser.c:667: # 3A 02 00 00 01 02 00 00 3B 92 1A 53 20 00 04 0D 97 00 00 00 7F 09 FF FF # 23a | 03/08/2014 14:44:59 | CIMC | Drive slot(Bay) HDD0_INFO #0x97 | Degraded | Asserted