[nova] NUMA live migration is ready for review and testing

On Thu, 2019-08-15 at 15:23 -0500, Dean Troyer wrote:
> On Thu, Aug 15, 2019 at 2:31 PM Matt Riedemann <mriedemos at gmail.com> wrote:
> > As I've said in IRC a few times, this feature was mentioned (at the last
> > summit/PTG in Denver) as being critical for the next StarlingX release
> > so I'd really hope the StarlingX community can help review and test
> > this. I know there was some help from WindRiver in Stein which uncovered
> > some issues, so it would be good to have that same kind of attention
> > here. Feature freeze for Train is less than a month away (Sept 12).
> StarlingX does have time built in for this testing, intending to be
> complete before the STX 2.0 release at the end of August.  I've
> suggested that we need to test both Train and our Stein backport but I
> am not the one with the resources to allocate.
I doubt you will be able to safely backport this to Stein, as it contains
RPC/object changes which would normally break things on upgrade.

e.g. if you backport this to Stein in STX 1.Y.Z, going from 1.0 to 1.Y.Z would
require you to treat it like a major upgrade and upgrade all your controllers first,
followed by the computes, to ensure you never generate a copy of the updated object
before the nodes that receive them are updated.

If you don't do that then services will start exploding.
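
To illustrate the ordering problem, here is a minimal sketch. It is a toy model
only, not the oslo.versionedobjects API: the Service class, the VersionTooNew
exception, the 1.0/1.1 version numbers and the simulate() helper are all
hypothetical names made up for this example. It assumes the compute sends an
object to the controller after every upgrade step.

```python
class VersionTooNew(Exception):
    """Raised when a receiver gets an object newer than it understands."""


def parse(version):
    # "1.1" -> (1, 1) so versions compare numerically, not as strings.
    return tuple(int(part) for part in version.split("."))


class Service:
    # Hypothetical stand-in for a controller or compute service.
    def __init__(self, name, object_version):
        self.name = name
        self.object_version = object_version

    def send(self):
        # A service always emits objects at its own (latest known) version.
        return {"version": self.object_version, "sender": self.name}

    def receive(self, payload):
        # A receiver can handle its own version or older, never newer.
        if parse(payload["version"]) > parse(self.object_version):
            raise VersionTooNew(
                "%s got %s from %s but only knows %s"
                % (self.name, payload["version"], payload["sender"],
                   self.object_version))


def simulate(order, old="1.0", new="1.1"):
    """Upgrade services one at a time in the given order.

    After each step the compute sends an object to the controller.
    Returns True if no step fails, False otherwise.
    """
    services = {
        "controller": Service("controller", old),
        "compute": Service("compute", new if False else old),
    }
    for name in order:
        services[name].object_version = new
        try:
            services["controller"].receive(services["compute"].send())
        except VersionTooNew:
            return False
    return True


if __name__ == "__main__":
    print(simulate(["controller", "compute"]))
    print(simulate(["compute", "controller"]))
```

In this model, upgrading the controller first keeps every intermediate state
safe, while upgrading the compute first lets it emit a 1.1 object that the 1.0
controller cannot handle.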

We did a partial backport of NUMA-aware vswitch internally and had to drop all the object
changes and scheduler changes and only backport the virt driver changes, as we could
not figure out a safe way to backport OVO changes that would not break deployments
if you didn't sequence the update like a major version upgrade, which we can't
assume for z releases (x.y.z).

But glad to hear that in either case ye do plan to test it in some capacity.
I have dual-NUMA hardware I own that I plan to test it on personally, but the
more the better.
> > Again the testing here with real hardware is key, and that's something
> > I'd hope Intel/WindRiver/StarlingX folk can help with since I personally
> > don't have a lab sitting around available for NUMA testing. Since we
> > won't have third party CI for this feature, it's going to be important
> > that at least someone is hitting this with a real environment, ideally
> > with mixed Stein and Train compute services as well to make sure it
> > behaves properly during rolling upgrades.
> Oddly enough, in my $OTHER_DAY_JOB Intel's new Third Party CI is at
> the top of my list and we are getting dangerously close there in
> general, but this testing is unfortunately not first in line.
Speaking of that, I see Igor rebased https://review.opendev.org/#/c/652197/
I haven't really looked at that since May, and it looks like some file permissions
have changed so it's currently broken. I'm not sure if he/ye planned on taking that
over or if he was just interested; either is fine.

My first party CI solution has kind of stalled since I just have not had time
to work on it (given it's not part of my $OTHER_DAY_JOB), so I'm looking forward to
the third party CI you are working on. If I find time to work on that again I will, but it
still didn't have full parity with what the Intel NFV CI was testing, as it was running
with the single NUMA node guest we have in the gate. It would be nice to have
even basic first party testing of pinning/hugepages at some point.
Even though I wrote it, I don't like the fact that I was forced to use Fedora with the virt
preview repos enabled to get a new enough qemu/libvirt to even do partial testing without
nested virt, so I would still guess the third party CI would be more reliable since it can
actually use nested virt, provided you replace the default Ubuntu kernel with something based on 4.19.

> dt