
[Openstack] DHCP not accessible on new compute node.


So I did further ping tests and explored the differences between my working compute nodes and my non-working compute node. Firstly, it seems that VXLAN is working between the non-working compute node and the controller node. After manually setting IP addresses, I can ping from an instance on the non-working node to 172.16.1.1 (the neutron gateway); when running tcpdump I can see the ICMP traffic on:
-compute's bridge interface
-compute's vxlan interface
-controller's vxlan interface
-controller's bridge interface
-controller's qrouter namespace
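
For reference, the trace was done along these lines. The interface names below are placeholders; the actual bridge, vxlan, and tap device names depend on the deployment (check `ip link` and `brctl show` on each node):

```shell
# On the compute node: watch ICMP on the Linux bridge and the VXLAN interface.
# brq12345678-9a and vxlan-54 are example names, not the real ones.
tcpdump -ni brq12345678-9a icmp
tcpdump -ni vxlan-54 icmp

# On the controller: same on the vxlan/bridge interfaces, then inside the
# router namespace (qrouter-<uuid> and qr-<port-id> are placeholders).
tcpdump -ni vxlan-54 icmp
ip netns exec qrouter-<router-uuid> tcpdump -ni qr-<port-id> icmp
```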

This behavior is expected and matches what I see for instances on the working compute nodes. However, if I try to ping 172.16.1.2 (neutron dhcp) from an instance on the non-working compute node, pings do not flow. Listening with tcpdump I cannot see them anywhere, even on the compute node itself: not on the vxlan interface, not on the bridge, and not on the tap device directly. But once I ping in the reverse direction, from the dhcp netns on the controller to the instance on the non-working compute node, pings begin to flow. The same is true for pings between the instance on the non-working compute node and an instance on a working compute node: pings do not flow until the working instance pings first. Once pings are flowing between the non-working instance and neutron DHCP, I run dhclient on the instance and listen for DHCP requests with tcpdump. I hear them on:
-compute's bridge interface
-compute's vxlan interface
They don't make it to the controller node.
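
The "works only after the other side pings first" pattern looks like what you'd see if the VXLAN forwarding database on the non-working node has no entry for the remote MACs until one is learned from incoming traffic (normally l2-population pre-seeds these as static entries). A way to check, assuming a linuxbridge setup with a vxlan device (the name vxlan-54 is an example):

```shell
# List the forwarding entries programmed on the VXLAN device.
# With l2-population working, you should see permanent entries whose dst
# is the tunnel IP of each remote node.
bridge fdb show dev vxlan-54

# The all-zeros entry controls where unknown-unicast/broadcast is flooded,
# which is what initial DHCP and ARP traffic relies on:
#   00:00:00:00:00:00 dst <remote-vtep-ip> self permanent
```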

I've re-enabled l2-population on the controllers and rebooted them just in case, but the problem persists. A diff of /etc/ across all compute nodes shows that all OpenStack- and networking-related configuration is effectively identical. The last difference between the non-working compute node and the working ones, as far as I can tell, is that the new node has a different network card. The working nodes use a "Broadcom Limited NetXtreme II BCM57712 10 Gigabit Ethernet" and the non-working node uses a "NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter".

Are there any known issues with neutron and this brand of network adapter? I compared the capabilities of both adapters; here are the differences:

 Feature                        Broadcom    NetXen
 tx-tcp-ecn-segmentation        on          off [fixed]
 rx-vlan-offload                on [fixed]  off [fixed]
 receive-hashing                on          off [fixed]
 rx-vlan-filter                 on          off [fixed]
 tx-gre-segmentation            on          off [fixed]
 tx-gre-csum-segmentation       on          off [fixed]
 tx-ipxip4-segmentation         on          off [fixed]
 tx-udp_tnl-segmentation        on          off [fixed]
 tx-udp_tnl-csum-segmentation   on          off [fixed]
 tx-gso-partial                 on          off [fixed]
 loopback                       off         off [fixed]
 rx-udp_tunnel-port-offload     on          off [fixed]
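
In case an offload on the NetXen card is mangling the encapsulated traffic, one quick experiment is to dump the full feature list and toggle the tunables that are not marked [fixed], then retest DHCP. This is only a sketch; eth0 is a placeholder for the physical interface carrying the VXLAN traffic:

```shell
# Show all offload settings for the NIC (eth0 is a placeholder name).
ethtool -k eth0

# Checksum and segmentation offloads are common culprits with tunneled
# traffic; anything not marked [fixed] can be turned off for a test, e.g.:
ethtool -K eth0 tx off rx off tso off gso off gro off
```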

