Thursday, May 15, 2014

Nexus5K - DCBX Troubleshooting on a direct FCoE connection between Nexus5K and Netapp

Problem Description


The VFC interface status shows the VSAN stuck at initializing. The rest of the config appears to be right: the VSAN is bound to a VLAN, and the VLAN is allowed on the link. The fabric interconnects (FI) appear to have logged in (FLOGI) correctly.
"sh flogi database" shows FLOGIs from all ports connected to the FI, but none from the Netapp ports. Neither the ports to the FI nor the ports to the Netapp are port-channeled.


System Info 

N5K-C5596UP
5.1(3)N2(1)


Simplified Topology 

UCS --- Nexus5K --vfc33 (Eth2/8) -- trunk -- Netapp


Background of FCoE

Fibre Channel over Ethernet (FCoE) provides a method of transporting Fibre Channel traffic over a physical Ethernet connection. FCoE requires that the underlying Ethernet be full duplex and provide lossless behavior for Fibre Channel traffic. The stages we need to verify are shown in the FIP login flow ladder below:

fcoe_FIP-ladder.png

Troubleshooting Steps

The customer reports two symptoms: the VSAN is stuck in the initialization phase, and no FLOGI is learned from the Netapp CNA adapter.


1.  VSAN is stuck in initialization phase

n5k_sh_int_vfc.png


2. No FLOGI learning for Netapp CNA adapter

- Below is the WWPN of the Netapp CNA adapter. The Netapp filer's FC port name is 50:0a:09:83:8f:e7:7b:99.
netapp_cna.png

- Run "sh flogi database" on the N5K. It shows FLOGIs from all ports connected to the FI, but no Netapp ports:

n5k_sh_flogi.png

3. FCF Discovery or VLAN Discovery 
- Go back to the "FIP - Login Flow ladder" diagram above. There is no FLOGI from the Netapp CNA, so the failure must be in an earlier stage: either FCF discovery or VLAN discovery.

- Check the configuration again. The general steps to configure FCoE are: enable the FCoE feature, map a VSAN onto a VLAN, then create the virtual Fibre Channel (vfc) interfaces.

- Below is what we see from the Nexus5K.
n5k_config_vfc_ethernet.png

- The config looks good: vfc33 allows VSAN 402 and is bound to Ethernet2/8. However, interface vfc33 reports that the trunk VSAN is not up yet.
n5k_sh_int_vfc.png
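
The general configuration steps described in this step (enable FCoE, map the VSAN onto a VLAN, create the vfc) can be sketched as below. This is a minimal sketch and not a copy of the screenshot: vfc33, VSAN 402 and Ethernet2/8 come from this case, while VLAN 402 is an assumed ID used only for illustration.

```
! Enable the FCoE feature
feature fcoe

! Create the VSAN
vsan database
  vsan 402

! Map the VSAN onto an FCoE VLAN (VLAN 402 is an assumed ID)
vlan 402
  fcoe vsan 402

! Physical link to the CNA: trunk that carries the FCoE VLAN
! (also allow any data VLANs the link needs to carry)
interface Ethernet2/8
  switchport mode trunk
  switchport trunk allowed vlan 402
  spanning-tree port type edge trunk

! Virtual Fibre Channel interface bound to the physical port
interface vfc33
  bind interface Ethernet2/8
  switchport trunk allowed vsan 402
  no shutdown

! Place the vfc in the VSAN
vsan database
  vsan 402 interface vfc33
```

Note that a configuration like this can be entirely correct and the vfc will still stay at initializing if DCBX cannot negotiate PFC on the link, which is what the following steps investigate.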


4. Trunk VSANs stuck at Initializing phase

- This indicates an issue with DCBX negotiation, but it is still not clear what actually went wrong.

- Verify the FCoE VLAN-to-VSAN mapping with "sh vlan fcoe"
sh_vlan_fcoe.png

5. Check DCBX negotiation


- DCBX is the Data Center Bridging eXchange protocol. The FCoE switch and the CNA adapter exchange capability information and configuration values via DCBX.
- DCBX runs on the physical Ethernet link between the Nexus5K and the CNA.
- It uses LLDP as its transport layer to carry DCBX packets between DCBX-capable devices.

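- The DCBX spec describes the PFC feature on pages 29 to 30. It is a 16-bit structure: 8 bits for the priorities, followed by 8 bits for the number of traffic classes supported. For the N5K, the value is 0808. Reading the first byte as a per-priority bitmap, 08 sets bit 3, which is priority 3, the CoS that FCoE uses by default on the Nexus 5000; the second 08 means eight traffic classes.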


sh_lldp_dcbx.png

- In the output above, the local device (the N5K) does not show 0808 for Type 003, while the Netapp CNA adapter does advertise Type 003 correctly. So at least we know this is not a Netapp CNA issue. Something is not right on the Nexus5K.



6. Further deep dive into DCBX output to understand what is really failing.

- Run "sh sys int dcbx info interface e2/8" and look for errors. The output is fairly lengthy, so first append "| grep error" to quickly check whether any error is reported, then review the complete output carefully, especially around the "error" lines.

- In this case, we do see an error, highlighted in red in the output below. It points to an issue with the PFC configuration.


sh_sys_int_dcbx_info.png
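
For reference, the abbreviated command used in this step expands as shown below. The interface is the one from this case; the two-pass approach simply follows the advice above and is not a required procedure.

```
! Quick pass: show only the lines that mention an error
show system internal dcbx info interface ethernet 2/8 | grep error

! Full pass: review the complete output around those lines
show system internal dcbx info interface ethernet 2/8
```

The grep pass only confirms that an error exists. The surrounding lines in the full output are what identify the affected feature, which here is PFC.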

7. Check QoS config for PFC
- As the error indicates, there is a problem with PFC. The Nexus 5000 Series uses PFC to establish a lossless medium for the Fibre Channel payload in its FCoE implementation.

- Check the QoS configuration ("show running-config ipqos") to verify the PFC part of the Nexus5K config.

sh_ipqos_new.png

- The system qos section (yellow box) does not include the FCoE PFC configuration.

- Add the following commands for FCoE:
add_PFC_qos_new.png
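
The screenshot with the exact commands is not reproduced here. As a hedged sketch: on a Nexus 5500 running NX-OS 5.1(3), the usual way to enable the FCoE class with PFC (no-drop) is to attach the platform's predefined FCoE default policies under system qos. The policy names below are the predefined NX-OS ones; that these four lines match the author's fix exactly is an assumption.

```
! Attach the predefined FCoE policies system-wide
! (assumed to match the fix shown in the screenshot)
system qos
  service-policy type qos input fcoe-default-in-policy
  service-policy type queuing input fcoe-default-in-policy
  service-policy type queuing output fcoe-default-out-policy
  service-policy type network-qos fcoe-default-nq-policy
```

If custom QoS policies are already attached under system qos, the FCoE class would instead have to be added to those policies, since only one policy of each type can be attached at a time.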

- As soon as we add the above commands, the interface status changes.

vfc_up_log.png

- Check "sh lldp dcbx interface e2/8" again and verify Type 003 on the local device (the Nexus5K).

sh_lldp_dcbx_2_new.png

Note: As explained in Step 5, the PFC feature (Type 003) is a 16-bit structure, and the expected value for the N5K is 0808.


- Check the VFC interface again. The trunk VSAN state looks good now.

sh_int_vfc_2_new.png

- Lastly, check the FLOGI database. The Netapp CNA port WWPN is now present.

sh_flogi_data_2_new.png


