
Re: VXLAN and KVm experiences



On 11/14/18 6:25 PM, Simon Weller wrote:
> Wido,
> 
> 
> Here is the original design document for the VXLAN implementation in ACS:
> - https://cwiki.apache.org/confluence/display/CLOUDSTACK/Linux+native+VXLAN+support+on+KVM+hypervisor
> 
> It may shed some light on the reasons for the different multicast groups.
> 

Yes, I see now. It is to prevent a single multicast group from being
flooded with traffic for all VNIs.

Thanks!

Wido

>  
> - Si
> 
> ------------------------------------------------------------------------
> *From:* Wido den Hollander <wido@xxxxxxxxx>
> *Sent:* Tuesday, November 13, 2018 4:40 AM
> *To:* dev@xxxxxxxxxxxxxxxxxxxxx; Simon Weller
> *Subject:* Re: VXLAN and KVm experiences
>  
> 
> 
> On 10/23/18 2:34 PM, Simon Weller wrote:
>> Linux native VXLAN uses multicast, and each host has to participate in multicast in order to see the VXLAN networks. We haven't tried using PIM across an L3 boundary with ACS, although it will probably work fine.
>> 
>> Another option is to use an L3 VTEP, but right now there is no native support for that in CloudStack's VXLAN implementation, although we've thought about proposing it as a feature.
>> 
> 
> Getting back to this, I see CloudStack does this [0]:
> 
> local mcastGrp="239.$(( ($vxlanId >> 16) % 256 )).$(( ($vxlanId >> 8) % 256 )).$(( $vxlanId % 256 ))"
> 
> VNI 1000 would use group 239.0.3.232 and VNI 1001 would use 239.0.3.233.
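For reference, the arithmetic in that one-liner can be checked with a small standalone bash function. This is a minimal sketch, not the CloudStack script itself: the function name `vni_to_mcast_group` and the input validation are illustrative additions. Because a VNI is 24 bits wide and the formula spreads those bits over the last three octets of 239.0.0.0/8, every VNI maps to its own group:

```bash
#!/usr/bin/env bash
# Illustrative re-implementation of the mcastGrp one-liner quoted above.
# Maps a 24-bit VNI onto 239.<bits 23-16>.<bits 15-8>.<bits 7-0>.
vni_to_mcast_group() {
  # Accept decimal digits only; 10# stops bash from reading "010" as octal.
  [[ $1 =~ ^[0-9]+$ ]] || { echo "invalid VNI: $1" >&2; return 1; }
  local vni=$((10#$1))
  # VXLAN VNIs are 24 bits wide: the valid range is 0..16777215.
  (( vni <= 16777215 )) || { echo "VNI out of range: $1" >&2; return 1; }
  echo "239.$(( (vni >> 16) % 256 )).$(( (vni >> 8) % 256 )).$(( vni % 256 ))"
}

vni_to_mcast_group 1000   # 239.0.3.232
vni_to_mcast_group 1001   # 239.0.3.233
vni_to_mcast_group 867    # 239.0.3.99 (matches the ip -d link output in this thread)
```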
> 
> Why are we using a different mcast group for every VNI? As the VNI is
> encoded in the packet this should just work in one group, right?
> 
> This way you need to configure all those groups on your router(s), as
> each VNI uses a different multicast group.
> 
> I'm just looking for the reason why we have these different multicast groups.
> 
> I was thinking that we might want to add an option to agent.properties
> where we allow users to set a fixed multicast group for all traffic.
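A sketch of how such an override might be applied, assuming a hypothetical `FIXED_MCAST_GROUP` variable standing in for the proposed agent.properties setting (no such setting exists in CloudStack today; 239.1.1.1 is an arbitrary example address):

```bash
#!/usr/bin/env bash
# Hypothetical: FIXED_MCAST_GROUP stands in for the proposed agent.properties
# option. CloudStack does not read this variable; it only illustrates the idea.
select_mcast_group() {
  local vni=$((10#$1))
  if [[ -n ${FIXED_MCAST_GROUP:-} ]]; then
    # One shared group for every VNI; receivers demultiplex on the VNI
    # carried in each VXLAN header.
    echo "$FIXED_MCAST_GROUP"
  else
    # Default behaviour: one group per VNI, same arithmetic as mcastGrp.
    echo "239.$(( (vni >> 16) % 256 )).$(( (vni >> 8) % 256 )).$(( vni % 256 ))"
  fi
}

select_mcast_group 1000                               # 239.0.3.232
FIXED_MCAST_GROUP=239.1.1.1 select_mcast_group 1000   # 239.1.1.1
FIXED_MCAST_GROUP=239.1.1.1 select_mcast_group 1001   # 239.1.1.1
```

The trade-off: a single group means one group to configure on the routers and one IGMP membership per host, but every host then receives broadcast and unknown-unicast traffic for all VNIs, and the kernel discards frames for VNIs it has no VXLAN interface for.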
> 
> Wido
> 
> [0]:
> https://github.com/apache/cloudstack/blob/master/scripts/vm/network/vnet/modifyvxlan.sh#L33
> 
> 
> 
>> 
>> ________________________________
>> From: Wido den Hollander <wido@xxxxxxxxx>
>> Sent: Tuesday, October 23, 2018 7:17 AM
>> To: dev@xxxxxxxxxxxxxxxxxxxxx; Simon Weller
>> Subject: Re: VXLAN and KVm experiences
>> 
>> 
>> 
>> On 10/23/18 1:51 PM, Simon Weller wrote:
>>> We've also been using VXLAN on KVM for all of our isolated VPC guest networks for quite a long time now. As Andrija pointed out, make sure you increase the igmp_max_memberships param and also put an IP address on each host's VXLAN interface, in the same subnet for all hosts that will share networking, or multicast won't work.
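A minimal sketch of the underlay addressing described above, reading "each host's VXLAN interface" as the parent interface that carries the VXLAN traffic (bond0.950 later in this thread). The 10.95.0.0/24 subnet, the host-index scheme and the `run`/`configure_underlay_ip` helpers are hypothetical; by default the script only prints the commands (DRY_RUN=1):

```bash
#!/usr/bin/env bash
# Hypothetical placeholders: substitute your own underlay device and subnet.
UNDERLAY_DEV=${UNDERLAY_DEV:-bond0.950}
UNDERLAY_PREFIX=${UNDERLAY_PREFIX:-10.95.0}
PREFIX_LEN=24

# run: print the command when DRY_RUN=1 (the default), otherwise execute it (needs root).
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Give host N the address <prefix>.N so all hypervisors share one subnet.
configure_underlay_ip() {
  local host_index=$1
  run ip addr add "${UNDERLAY_PREFIX}.${host_index}/${PREFIX_LEN}" dev "$UNDERLAY_DEV"
  run ip link set dev "$UNDERLAY_DEV" up
}

configure_underlay_ip 11
# ip addr add 10.95.0.11/24 dev bond0.950
# ip link set dev bond0.950 up
```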
>>>
>> 
>> Thanks! So you are saying that all hypervisors need to be in the same L2
>> network or are you routing the multicast?
>> 
>> My idea was that each POD would be an isolated Layer 3 domain and that a
>> VNI would span over the different Layer 3 networks.
>> 
>> I don't like STP and other Layer 2 loop-prevention systems.
>> 
>> Wido
>> 
>>>
>>> - Si
>>>
>>>
>>> ________________________________
>>> From: Wido den Hollander <wido@xxxxxxxxx>
>>> Sent: Tuesday, October 23, 2018 5:21 AM
>>> To: dev@xxxxxxxxxxxxxxxxxxxxx
>>> Subject: Re: VXLAN and KVm experiences
>>>
>>>
>>>
>>> On 10/23/18 11:21 AM, Andrija Panic wrote:
>>>> Hi Wido,
>>>>
>>>> I have "pioneered" this one in production for the last 3 years (and suffered
>>>> the nasty pain of silent packet drops on kernel 3.X back in the day,
>>>> because I was unaware of the igmp_max_memberships kernel parameter, so I
>>>> updated the manual a long time ago).
>>>>
>>>> I never had any issues (besides the nasty one above...) and it works very well.
>>>
>>> That's what I want to hear!
>>>
>>>> To avoid the issue I described above, you should increase
>>>> igmp_max_memberships (/proc/sys/net/ipv4/igmp_max_memberships); otherwise,
>>>> with more than 20 vxlan interfaces, some of them will stay in the down state
>>>> and suffer a hard traffic drop (with a proper message in agent.log) on kernel > 4.0
>>>> (or silent, bitchy random packet drops on kernel 3.X...). Also pay attention
>>>> to MTU size. Anyway, everything is in the manual (I updated everything I
>>>> thought was missing), so please check it.
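The limit can be checked against the number of VXLAN devices on a host with a short bash sketch. It assumes iproute2's `ip -o link show type vxlan` is available and treats one VXLAN interface as one multicast membership, which holds when every VNI has its own group as in CloudStack's default scheme; the helper names are illustrative. To raise the limit, `sysctl -w net.ipv4.igmp_max_memberships=<N>` applies immediately and a file under /etc/sysctl.d/ makes it persistent; the right value depends on how many VNIs a host will carry.

```bash
#!/usr/bin/env bash
# Sketch: warn when a host has more VXLAN devices than the kernel's IGMP
# membership limit (net.ipv4.igmp_max_memberships, default 20).

# needs_more_memberships LIMIT COUNT: succeeds when COUNT exceeds LIMIT.
needs_more_memberships() {
  (( $2 > $1 ))
}

check_igmp_limit() {
  local limit count
  limit=$(< /proc/sys/net/ipv4/igmp_max_memberships)
  # One output line per link; assumes one multicast group per VXLAN device,
  # which is what the default per-VNI group scheme produces.
  count=$(ip -o link show type vxlan | wc -l)
  if needs_more_memberships "$limit" "$count"; then
    echo "WARNING: $count VXLAN devices, igmp_max_memberships=$limit" >&2
    return 1
  fi
  echo "OK: $count VXLAN devices, igmp_max_memberships=$limit"
}
```

Running `check_igmp_limit` on a hypervisor before adding more networks gives an early warning instead of interfaces silently staying down.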
>>>>
>>>
>>> Yes, the underlying network will all use a 9000-byte MTU.
>>>
>>>> Our example setup:
>>>>
>>>> We have e.g. bond0.950 as the main VLAN which carries all vxlan "tunnels",
>>>> so this is defined as the KVM traffic label. In our case it didn't make sense
>>>> to put a bridge on top of bond0.950 (as the traffic label), and you can
>>>> test this on your own: that bridge is only used to extract the child
>>>> bond0.950 interface name. Then, based on the vxlan ID, ACS will provision
>>>> vxlanYYY@xxxxxxxxx and join this new vxlan interface to a NEW bridge
>>>> (and then of course the vNIC goes to this new bridge), so the original bridge
>>>> (to which bond0.xxx belonged) is not used for anything.
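Roughly the same per-VNI plumbing can be reproduced by hand with iproute2, which helps when debugging a host. This is a sketch that mirrors the names in the output shown in this thread (vxlanNNN, brvx-NNN, ttl 10); it is not copied from modifyvxlan.sh, whose exact commands may differ. The `run` and `create_vxlan_bridge` helpers are illustrative, and the commands are only printed unless DRY_RUN=0:

```bash
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1 (the default), otherwise execute it (needs root).
run() { if [[ ${DRY_RUN:-1} == 1 ]]; then echo "$*"; else "$@"; fi; }

# create_vxlan_bridge VNI PARENT_IF
create_vxlan_bridge() {
  local vni=$1 pif=$2
  # Per-VNI group, same arithmetic as the mcastGrp one-liner in this thread.
  local group="239.$(( (vni >> 16) % 256 )).$(( (vni >> 8) % 256 )).$(( vni % 256 ))"
  # VXLAN device on top of the parent interface (e.g. bond0.950).
  run ip link add "vxlan${vni}" type vxlan id "$vni" group "$group" ttl 10 dev "$pif"
  # A new bridge per VNI; guest vNICs (vnetX) are attached to it later.
  run ip link add "brvx-${vni}" type bridge
  run ip link set "vxlan${vni}" master "brvx-${vni}"
  run ip link set "vxlan${vni}" up
  run ip link set "brvx-${vni}" up
}

create_vxlan_bridge 867 bond0.950
```

When no `dstport` is given, iproute2 prints a warning and the kernel uses the Linux default UDP port 8472 instead of the IANA-assigned 4789, so all hosts must agree on the same choice.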
>>>>
>>>
>>> Clear, I indeed thought something like that would happen.
>>>
>>>> Here is a sample of the above for vxlan 867, used for tenant isolation:
>>>>
>>>> root@hostname:~# brctl show brvx-867
>>>> bridge name     bridge id               STP enabled     interfaces
>>>> brvx-867        8000.2215cfce99ce       no              vnet6
>>>>                                                         vxlan867
>>>>
>>>> root@hostname:~# ip -d link show vxlan867
>>>>
>>>> 297: vxlan867: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8142 qdisc noqueue
>>>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
>>>>     link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>>>>     vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10 ageing 300
>>>>
>>>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>>>>
>>>> So note how the vxlan interface's MTU is 50 bytes smaller than that of the
>>>> bond0.950 parent interface (which affects traffic inside the VM), so
>>>> jumbo frames are needed on the parent interface anyway (bond0.950 in the
>>>> example above, with a minimum MTU of 1550).
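The 50 bytes are the VXLAN encapsulation overhead over an IPv4 underlay: 20 (outer IPv4) + 8 (UDP) + 8 (VXLAN header) + 14 (inner Ethernet header). A small bash sketch of the arithmetic, assuming an IPv4 underlay and untagged inner frames (the function names are illustrative):

```bash
#!/usr/bin/env bash
# VXLAN over IPv4: outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14 = 50 bytes.
# Assumes an IPv4 underlay and no VLAN tag on the inner frame.
VXLAN_OVERHEAD_V4=50

# Largest MTU the vxlan device (and therefore the guest) can use.
vxlan_mtu() { echo $(( $1 - VXLAN_OVERHEAD_V4 )); }

# Smallest parent MTU that still carries a given guest MTU.
parent_mtu_for() { echo $(( $1 + VXLAN_OVERHEAD_V4 )); }

vxlan_mtu 8192        # 8142, as in the ip -d link output above
parent_mtu_for 1500   # 1550
vxlan_mtu 9000        # 8950
```

With an IPv6 underlay the outer IP header is 40 bytes, so the overhead grows to 70.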
>>>>
>>>
>>> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
>>> networks underneath will be ~9k.
>>>
>>>> Ping me if more details needed, happy to help.
>>>>
>>>
>>> Awesome! We'll be doing a PoC rather soon. I'll come back with our
>>> experiences later.
>>>
>>> Wido
>>>
>>>> Cheers
>>>> Andrija
>>>>
>>>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander <wido@xxxxxxxxx> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I just wanted to know if there are people out there using KVM with
>>>>> Advanced Networking and using VXLAN for different networks.
>>>>>
>>>>> Our main goal would be to spawn a VM and, based on the network the NIC is
>>>>> in, attach it to a different VXLAN bridge on the KVM host.
>>>>>
>>>>> It seems to me that this should work, but I just wanted to check and see
>>>>> if people have experience with it.
>>>>>
>>>>> Wido
>>>>>
>>>>
>>>>
>>>
>>