osdir.com



[nova][scheduler] - Stack VMs based on RAM


 OK! I have applied the patch and now the weights are shown!
 Furthermore, as you suggested, I have removed the "RamFilter", which
 was the only one of those filters present.


 And here is the new log, where the spawning of 2 VMs a few seconds
 apart can be seen: https://pastebin.com/Xy2FL2KL

 Initially both hosts have a weight of 1.0; then the host with one VM
 already running gets a negative weight, but the new VM is placed on the
 other host.

 It is really, really strange that this is happening...

 G.


> On Wed, 17 Apr 2019 12:55:45 -0700, Melanie Witt <melwittt at gmail.com> 
> wrote:
>> On Wed, 17 Apr 2019 22:45:45 +0300, Georgios Dimitrakakis
>> <giorgis at acmac.uoc.gr> wrote:
>>>    Hello again Melanie!
>>>
>>>    This is exactly what I am thinking... something is not working
>>>    correctly!
>>>
>>>    To answer your questions: there is one node acting as the
>>>    controller, where the scheduler is running, and I have pasted the
>>>    nova.conf file from there.
>>>
>>>    I have also noticed that I had "ram_weight_multiplier" twice (once
>>>    in [cells] and once in [filter_scheduler]), so I removed the one in
>>>    [cells] because I thought it might cause a problem, but the results
>>>    are still the same.
>>>
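
A duplicated option like this is easy to miss in a long nova.conf. A minimal Python sketch for spotting it, assuming the file parses as plain INI with the standard-library configparser (oslo.config uses its own parser, so treat this only as a quick check). The sample content and the helper name sections_setting are hypothetical:

```python
import configparser

# Hypothetical nova.conf excerpt that sets the option twice, as described
# in this thread; it is not the poster's actual file.
SAMPLE_NOVA_CONF = """
[DEFAULT]
debug = True

[cells]
ram_weight_multiplier = -1.0

[filter_scheduler]
ram_weight_multiplier = -1.0
"""


def sections_setting(conf_text, option):
    """Return every section that explicitly sets `option`, in file order."""
    # Renaming default_section makes [DEFAULT] an ordinary section, so its
    # keys are not inherited by (and reported for) every other section.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, default_section="__no_default__"
    )
    parser.read_string(conf_text)
    return [s for s in parser.sections() if parser.has_option(s, option)]


print(sections_setting(SAMPLE_NOVA_CONF, "ram_weight_multiplier"))
# ['cells', 'filter_scheduler']
```

Only the [filter_scheduler] entry is the one the filter scheduler reads, which matches the startup log line quoted below.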
>>>    The log for the scheduler has this entry:
>>>
>>>    2019-04-17 22:04:50.045 131723 DEBUG oslo_service.service
>>>    [req-7e548ecb-f3ed-4a4d-835f-b3a996e32534 - - - - -]
>>>    filter_scheduler.ram_weight_multiplier = -1.0 log_opt_values
>>>    /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3032
>>>
>>>    so it seems to be picked up correctly, but without any effect.
>> Agreed, that log shows that the -1.0 value is being picked up
>> properly by the scheduler service.
>>
>>>    What also worries me in the scheduler log that I sent you before
>>>    is an entry like this:
>>>
>>>    2019-04-17 19:53:07.298 98874 DEBUG nova.filters
>>>    [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
>>>    6a4c2e32919e4a6fa5c5d956beb68eef 
>>> 9f22e9bfa7974e14871d58bbb62242b2 -
>>>    default default] Filter RamFilter returned 2 host(s)
>>>    get_filtered_objects
>>>    /usr/lib/python2.7/site-packages/nova/filters.py:104
>>>
>>>    Shouldn't the RamFilter return 1 host, the one with less RAM? Why
>>>    does it return 2 hosts??
>> No, the RamFilter will return any hosts that meet the RAM requirement.
>> Filters do not weigh hosts. The RamFilter returns two hosts because
>> both hosts have enough RAM to fulfill the request. FYI though, as of
>> Pike [1], the (Core|Ram|Disk)Filter are redundant, as placement will do
>> the filtering for those resources before the nova scheduler filters
>> run. So you can safely remove (Core|Ram|Disk)Filter from your
>> enabled_filters.
>> [1]
>> 
>> https://docs.openstack.org/releasenotes/nova/pike.html#relnotes-16-0-0-stable-pike-upgrade-notes
>>
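
The difference between filtering and weighing can be sketched in a few lines of Python. This is a simplified model, not nova's RamFilter (which also applies ram_allocation_ratio); HostState here is a hypothetical stand-in, the free RAM values come from the scheduler log quoted in this thread, and the 2048 MB request is an assumed flavor size:

```python
from dataclasses import dataclass


@dataclass
class HostState:
    # Hypothetical, simplified stand-in for nova's host state.
    name: str
    free_ram_mb: int


def ram_filter(hosts, requested_ram_mb):
    """Keep every host with enough free RAM; a filter does not rank hosts."""
    return [h for h in hosts if h.free_ram_mb >= requested_ram_mb]


# Free RAM values taken from the scheduler log quoted in this thread.
hosts = [HostState("cpu1", 32153), HostState("cpu2", 30105)]

# Assumed flavor requesting 2048 MB: both hosts pass, so "2 host(s)".
print([h.name for h in ram_filter(hosts, 2048)])  # ['cpu1', 'cpu2']
```

Choosing between the surviving hosts is left entirely to the weighers.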
>>>    If you have any other ideas or would like me to do some more
>>>    checking, I am all ears!
>> At this point, you could take Matt's suggestion from his latest reply
>> on this thread and patch in the logging regression fix he linked. That
>> would allow you to see in the debug log what weights nova is giving to
>> the hosts.
>
> OK, so I just searched open nova bugs for "weigh" and found this
> issue, which isn't necessarily a defect:
>
> https://bugs.launchpad.net/nova/+bug/1818239
>
> but something that could be affecting the host weighing in your
> environment. There's something called the BuildFailureWeigher, which
> lowers the weight of hosts that have had VMs fail to build on them.
> That penalty is reset when the host has a successful VM build.
>
> If you apply the patch Matt suggested and take a look at the host
> weights, we should be able to see whether the BuildFailureWeigher is
> involved in the behavior you're seeing.
>
> -melanie
>
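
To see how a build-failure penalty could override the RAM preference, here is a simplified Python model, assuming nova's documented scheme of normalizing each weigher's values to [0, 1] across hosts and summing multiplier x normalized value. It treats the BuildFailureWeigher's raw value as the negated failed-build count and uses 1000000.0 for build_failure_weight_multiplier, which is believed to be the default but should be checked for your release. The failed_builds numbers and the helper names are hypothetical:

```python
def normalize(values):
    """Scale raw weigher values to [0.0, 1.0]; all zeros if they are equal."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weigh(hosts, weighers):
    """Sum multiplier * normalized value per weigher; highest total first."""
    totals = {h["name"]: 0.0 for h in hosts}
    for raw_fn, multiplier in weighers:
        norm = normalize([raw_fn(h) for h in hosts])
        for h, w in zip(hosts, norm):
            totals[h["name"]] += multiplier * w
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Free RAM from the thread's log; failed_builds values are hypothetical.
hosts = [
    {"name": "cpu1", "free_ram_mb": 32153, "failed_builds": 0},
    {"name": "cpu2", "free_ram_mb": 30105, "failed_builds": 1},
]
weighers = [
    # ram_weight_multiplier = -1.0 (stacking)
    (lambda h: h["free_ram_mb"], -1.0),
    # Assumed build_failure_weight_multiplier default.
    (lambda h: -h["failed_builds"], 1000000.0),
]

print(weigh(hosts, weighers))  # [('cpu1', 999999.0), ('cpu2', 0.0)]
```

With one recent failure on 'cpu2', the penalty dwarfs the RAM term and 'cpu1' wins; with no failures on either host, 'cpu2' (less free RAM) is ranked first, as the stacking configuration intends.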
>> Aside from that, it's looking like we/I would need to reproduce this
>> issue locally with devstack and try to figure out what's causing this
>> behavior.
>> -melanie
>>
>>>>> Thank you both, Melanie and Matt, for trying to assist me.
>>>>> I have double-checked the nova.conf on the controller and here is
>>>>> what I have (ignoring commented-out lines and obfuscating sensitive
>>>>> data): https://pastebin.com/hW1PE4U7
>>>>> As you can see, I have everything at default values, as discussed
>>>>> before with Melanie, except for the filters and the weight that I
>>>>> have applied, which should lead to VM stacking instead of spreading.
>>>>> My scenario has two compute hosts (let's call them "cpu1" and
>>>>> "cpu2"), and when an instance is already placed on "cpu2" I expect
>>>>> the next instance to be placed there as well. Instead, it is placed
>>>>> on "cpu1", as you can see from the scheduler log, which you can find
>>>>> here: https://pastebin.com/sCzB9L2e
>>>>> Do you see something strange that I fail to recognize?
>>>>
>>>> Thanks for providing the helpful data. It appears you have set your
>>>> nova.conf correctly (this is where your scheduler is running, yes?).
>>>> I notice you have duplicated the ram_weight_multiplier setting, but
>>>> that shouldn't hurt anything.
>>>>
>>>> The relevant scheduler log is this one:
>>>>
>>>> 2019-04-17 19:53:07.303 98874 DEBUG 
>>>> nova.scheduler.filter_scheduler
>>>> [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
>>>> 6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 
>>>> -
>>>> default default] Weighed [(cpu1, cpu1) ram: 32153MB disk: 
>>>> 1906688MB
>>>> io_ops: 0 instances: 0, (cpu2, cpu2) ram: 30105MB disk: 1886208MB
>>>> io_ops: 0 instances: 1] _get_sorted_hosts
>>>>
>>>> 
>>>> /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:455
>>>>
>>>> and here we see that host 'cpu1' is being weighed ahead of host
>>>> 'cpu2', which is the problem. I don't understand this, considering
>>>> the docs say that setting ram_weight_multiplier to a negative value
>>>> should result in the host with less free RAM being weighed
>>>> higher/first. According to your log, the opposite is happening:
>>>> 'cpu1' with 32153MB RAM is being weighed higher than 'cpu2' with
>>>> 30105MB RAM.
>>>>
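
Worked through with the numbers from the log above, assuming the normalize-then-multiply scheme described in the nova filter scheduler documentation (each weigher's raw values are scaled to [0, 1] across the candidate hosts, then multiplied by that weigher's multiplier) and considering only the RAM weigher with ram_weight_multiplier = -1.0:

```latex
\[
\operatorname{norm}(x_i) = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j},
\qquad
w_i = m_{\text{ram}} \cdot \operatorname{norm}(\text{free\_ram}_i)
\]
\[
w_{\text{cpu1}} = -1.0 \cdot \frac{32153 - 30105}{32153 - 30105} = -1.0,
\qquad
w_{\text{cpu2}} = -1.0 \cdot \frac{30105 - 30105}{32153 - 30105} = 0.0
\]
```

Since hosts are sorted by descending weight, 'cpu2' (0.0) should be ranked ahead of 'cpu1' (-1.0), which is the opposite of what the log shows.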
>>>> Either your ram_weight_multiplier setting is not being picked up, or
>>>> there's a bug causing the weight to be applied with reversed logic?
>>>>
>>>> Can you look at the scheduler debug log from when the service first
>>>> started up and verify which value of ram_weight_multiplier the
>>>> service is using?
>>>>
>>>> -melanie
>>>>
>>>>>> On 4/16/2019 7:03 PM, melanie witt wrote:
>>>>>>> To debug further, you should set debug to True in the nova.conf on
>>>>>>> your scheduler host and look for which filter is removing the
>>>>>>> desired host for the second VM. You can find where to start by
>>>>>>> looking for a message like "Starting with N host(s)". If you have
>>>>>>> two hosts with enough RAM, you should see "Starting with 2 host(s)"
>>>>>>> and then look for the log message that says "Filter <name> returned
>>>>>>> 1 host(s)"; that will be the filter that is removing the desired
>>>>>>> host. Once you know which filter is removing it, you can debug
>>>>>>> further.
>>>>>>
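
The log search described above can be automated. A rough Python sketch, assuming the debug lines contain the substrings "Starting with N host(s)" and "Filter <Name> returned N host(s)" in the form quoted in this thread; it does not separate concurrent requests by request ID, and the sample lines and the helper name find_dropping_filters are hypothetical:

```python
import re

START_RE = re.compile(r"Starting with (\d+) host\(s\)")
FILTER_RE = re.compile(r"Filter (\w+) returned (\d+) host\(s\)")


def find_dropping_filters(log_lines):
    """Return (filter_name, before, after) for each filter that shrank the host list."""
    drops = []
    count = None  # host count before the next filter; unknown until "Starting with"
    for line in log_lines:
        m = START_RE.search(line)
        if m:
            count = int(m.group(1))  # a new filtering run begins
            continue
        m = FILTER_RE.search(line)
        if m and count is not None:
            name, returned = m.group(1), int(m.group(2))
            if returned < count:
                drops.append((name, count, returned))
            count = returned
    return drops


# Hypothetical, abbreviated debug lines.
sample = [
    "DEBUG nova.filters [req-...] Starting with 2 host(s) get_filtered_objects",
    "DEBUG nova.filters [req-...] Filter RetryFilter returned 2 host(s) get_filtered_objects",
    "DEBUG nova.filters [req-...] Filter ComputeFilter returned 1 host(s) get_filtered_objects",
]
print(find_dropping_filters(sample))  # [('ComputeFilter', 2, 1)]
```

An empty result, as with the "Filter RamFilter returned 2 host(s)" line in this thread, means no filter removed a host and the weighers decided the placement.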
>>>>>> If the other host isn't getting filtered out, it could be the
>>>>>> weighers that aren't prioritizing the host you expect, but the debug
>>>>>> logs should dump the weighed hosts as well, which might give a clue.