[nova][scheduler] - Stack VMs based on RAM


On Wed, 17 Apr 2019 12:55:45 -0700, Melanie Witt <melwittt at gmail.com> wrote:
> On Wed, 17 Apr 2019 22:45:45 +0300, Georgios Dimitrakakis
> <giorgis at acmac.uoc.gr> wrote:
>>    Hello again Melanie!
>>
>>    Exactly this is what I am thinking...something is not working
>>    correctly!
>>
>>    To answer your questions there is one node acting as controller where
>>    the scheduler is running and I have pasted the nova.conf file from
>>    there.
>>
>>    I have also noticed that I have "ram_weight_multiplier" two times (one
>>    in [cells] and one in [filter_scheduler]) therefore I have removed the
>>    one in [cells] because I thought it might cause a problem but the results
>>    are still the same.
>>
>>    The log for the scheduler has this entry:
>>
>>    2019-04-17 22:04:50.045 131723 DEBUG oslo_service.service
>>    [req-7e548ecb-f3ed-4a4d-835f-b3a996e32534 - - - - -]
>>    filter_scheduler.ram_weight_multiplier = -1.0 log_opt_values
>>    /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3032
>>
>>    so it seems to be picked up correctly but without any influence.
> 
> Agreed, that log shows that the -1.0 value is being picked up properly
> by the scheduler service.
> 
>>    What also worries me from the scheduler log that I have sent to you
>>    before is that in there I see an entry like this:
>>
>>    2019-04-17 19:53:07.298 98874 DEBUG nova.filters
>>    [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
>>    6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 -
>>    default default] Filter RamFilter returned 2 host(s)
>>    get_filtered_objects
>>    /usr/lib/python2.7/site-packages/nova/filters.py:104
>>
>>    Shouldn't the RamFilter return 1 host, the one with less RAM? Why
>>    does it return 2 hosts?
> 
> No -- the RamFilter will return any hosts that meet the RAM requirement.
> Filters do not weigh hosts. The RamFilter returns two hosts because both
> hosts have enough RAM to fulfill the request. FYI though, as of Pike
> [1], the (Core|Ram|Disk)Filter are redundant, as placement will do the
> filtering for those resources before the nova scheduler filters run. So
> you can safely remove (Core|Ram|Disk)Filter from your enabled_filters.
> 
> [1]
> https://docs.openstack.org/releasenotes/nova/pike.html#relnotes-16-0-0-stable-pike-upgrade-notes
> 
>>    If you have any other ideas or would like me to do some more checking I
>>    am all ears!
> At this point, you could take Matt's suggestion from his latest reply on
> this thread and patch in the logging regression fix he linked. That
> would allow you to see in the debug log what weights nova is giving to
> the hosts.

OK, so I just searched open nova bugs for "weigh" and found this issue, 
which isn't necessarily a defect:

https://bugs.launchpad.net/nova/+bug/1818239

but it is something that could be affecting the host weighing in your
environment. There's a weigher called the BuildFailureWeigher which
gives a lower weight to hosts that have had VMs fail to build on them.
That penalty resets when a host experiences a successful VM build.

If you apply the patch Matt suggested and take a look at the host 
weights, we should be able to see whether the BuildFailureWeigher is 
involved in the behavior you're seeing.
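
To illustrate how that could explain what you're seeing, here is a
rough sketch of the weighing arithmetic. It is a simplified model, not
nova's actual code: it assumes each weigher's raw values are min-max
normalized across the candidate hosts, multiplied by that weigher's
multiplier, and summed. The helper names (normalize, total_weights,
best_host) and the "failed_builds" field are made up for illustration,
the 1000000.0 build failure multiplier is assumed to be the default,
and all other weighers (disk, io_ops, and so on) are ignored.

```python
def normalize(values):
    """Min-max normalize raw weights to [0, 1]; all zeros if they are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def total_weights(hosts, weighers):
    """Sum multiplier * normalized raw weight over all weighers, per host."""
    names = list(hosts)
    totals = {name: 0.0 for name in names}
    for multiplier, raw_fn in weighers:
        raw = [raw_fn(hosts[name]) for name in names]
        for name, norm in zip(names, normalize(raw)):
            totals[name] += multiplier * norm
    return totals


def best_host(hosts, weighers):
    """Return the name of the host with the highest total weight."""
    totals = total_weights(hosts, weighers)
    return max(totals, key=totals.get)


# Free RAM values taken from the "Weighed [...]" log line in this thread.
# "failed_builds" is a hypothetical field standing in for a host's count
# of recent build failures.
hosts = {
    "cpu1": {"free_ram_mb": 32153, "failed_builds": 0},
    "cpu2": {"free_ram_mb": 30105, "failed_builds": 0},
}

RAM_MULT = -1.0                 # ram_weight_multiplier from this thread
BUILD_FAILURE_MULT = 1000000.0  # assumed default, not taken from the thread

weighers = [
    (RAM_MULT, lambda h: h["free_ram_mb"]),
    (BUILD_FAILURE_MULT, lambda h: -h["failed_builds"]),
]

# No failed builds anywhere: the negative RAM multiplier stacks on cpu2.
print(best_host(hosts, weighers))

# One failed build on cpu2: the build failure term swamps the RAM term.
hosts["cpu2"]["failed_builds"] = 1
print(best_host(hosts, weighers))
```

Under these assumptions, the first call picks cpu2 (stacking, as
expected) and the second picks cpu1, because a single build failure
contributes a term about a million times larger than the RAM term.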

-melanie

> Aside from that, it's looking like we/I would need to reproduce this
> issue locally with a devstack and try to figure out what's causing this
> behavior.
> 
> -melanie
> 
>>>> Thank you both Melanie and Matt for trying to assist me.
>>>> I have double checked the nova.conf at the controller and here is
>>>> what I have (ignoring commented-out lines and obfuscating sensitive
>>>> data):
>>>>     https://pastebin.com/hW1PE4U7
>>>> As you can see I have everything with default values as discussed
>>>> before with Melanie, except the filters and the weight that I have
>>>> applied, which should lead to VM stacking instead of spreading.
>>>> My scenario has two compute hosts (let's call them "cpu1" and
>>>> "cpu2"), and when an instance is already placed on "cpu2" I expect
>>>> the next instance to be placed there as well. But instead it is
>>>> placed on "cpu1", as you can see from the scheduler log that you
>>>> can find here:
>>>>     https://pastebin.com/sCzB9L2e
>>>> Do you see something strange that I fail to recognize?
>>>
>>> Thanks for providing the helpful data. It appears you have set your
>>> nova.conf correctly (this is where your scheduler is running, yes?).
>>> I notice you have duplicated the ram_weight_multiplier setting but
>>> that shouldn't hurt anything.
>>>
>>> The relevant scheduler log is this one:
>>>
>>> 2019-04-17 19:53:07.303 98874 DEBUG nova.scheduler.filter_scheduler
>>> [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
>>> 6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 -
>>> default default] Weighed [(cpu1, cpu1) ram: 32153MB disk: 1906688MB
>>> io_ops: 0 instances: 0, (cpu2, cpu2) ram: 30105MB disk: 1886208MB
>>> io_ops: 0 instances: 1] _get_sorted_hosts
>>> /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:455
>>>
>>> and here we see that host 'cpu1' is being weighed ahead of host
>>> 'cpu2', which is the problem. I don't understand this considering the
>>> docs say that setting the ram_weight_multiplier to a negative value
>>> should result in the host with the lesser RAM being weighed
>>> higher/first. According to your log, the opposite is happening --
>>> 'cpu1' with 32153MB RAM is being weighed higher than 'cpu2' with
>>> 30105MB RAM.
>>>
>>> Either your ram_weight_multiplier setting is not being picked up or
>>> there's a bug causing weight to be applied with reverse logic?
>>>
>>> Can you look at the scheduler debug log when the service first
>>> started up and verify what value of ram_weight_multiplier the service
>>> is using?
>>>
>>> -melanie
>>>
>>>>> On 4/16/2019 7:03 PM, melanie witt wrote:
>>>>>> To debug further, you should set debug to True in the nova.conf on
>>>>>> your scheduler host and look for which filter is removing the
>>>>>> desired host for the second VM. You can find where to start by
>>>>>> looking for a message like, "Starting with N host(s)". If you have
>>>>>> two hosts with enough RAM, you should see "Starting with 2 host(s)"
>>>>>> and then look for the log message where it says "Filter returned 1
>>>>>> host(s)" and that will be the filter that is removing the desired
>>>>>> host. Once you know which filter is removing it, you can debug
>>>>>> further.
>>>>>
>>>>> If the other host isn't getting filtered out, it could be the
>>>>> weighers that aren't prioritizing the host you expect, but debug
>>>>> logs should dump the weighed hosts as well which might give a clue.