[nova][scheduler] - Stack VMs based on RAM


On Wed, 17 Apr 2019 22:45:45 +0300, Georgios Dimitrakakis 
<giorgis at acmac.uoc.gr> wrote:
>   Hello again Melanie!
> 
>   This is exactly what I am thinking... something is not working
>   correctly!
> 
>   To answer your questions there is one node acting as controller where
>   the scheduler is running and I have pasted the nova.conf file from
>   there.
> 
>   I have also noticed that I have "ram_weight_multiplier" twice (once
>   in [cells] and once in [filter_scheduler]), so I removed the one in
>   [cells] because I thought it might cause a problem, but the results
>   are still the same.
> 
>   The log for the scheduler has this entry:
> 
>   2019-04-17 22:04:50.045 131723 DEBUG oslo_service.service
>   [req-7e548ecb-f3ed-4a4d-835f-b3a996e32534 - - - - -]
>   filter_scheduler.ram_weight_multiplier = -1.0 log_opt_values
>   /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3032
> 
>   so it seems to be picked up correctly but without any influence.

Agreed, that log shows that the -1.0 value is being picked up properly 
by the scheduler service.
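
As a cross-check on the file itself (separate from what the service logs
at startup), here is a minimal sketch that lists every section of a
nova.conf-style file that sets a given option. It is not part of nova; the
helper name find_option and the sample text are made up for illustration,
and the [cells] value in the sample is invented. It only uses Python's
standard configparser module.

```python
import configparser

def find_option(conf_text, option):
    """Return {section: value} for every section that sets `option`."""
    parser = configparser.ConfigParser(
        interpolation=None,        # values may contain literal '%' characters
        strict=False,              # tolerate duplicate sections/options
        default_section="<none>",  # treat [DEFAULT] like any other section
    )
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        if parser.has_option(section, option):
            found[section] = parser.get(section, option)
    return found

# Hypothetical sample; only the -1.0 value matches the log quoted above.
SAMPLE = """
[DEFAULT]
debug = True

[cells]
ram_weight_multiplier = 10.0

[filter_scheduler]
ram_weight_multiplier = -1.0
"""

for section, value in find_option(SAMPLE, "ram_weight_multiplier").items():
    print("[%s] ram_weight_multiplier = %s" % (section, value))
```

To run it against a real file, read the file's text and pass it in place
of SAMPLE.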

>   What also worries me in the scheduler log that I sent you before is
>   that I see an entry like this:
> 
>   2019-04-17 19:53:07.298 98874 DEBUG nova.filters
>   [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
>   6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 -
>   default default] Filter RamFilter returned 2 host(s)
>   get_filtered_objects
>   /usr/lib/python2.7/site-packages/nova/filters.py:104
> 
>   Shouldn't the RamFilter return 1 host, the one with less RAM? Why
>   does it return 2 hosts?

No -- the RamFilter will return any hosts that meet the RAM requirement. 
Filters do not weigh hosts. The RamFilter returns two hosts because both 
hosts have enough RAM to fulfill the request. FYI though, as of Pike 
[1], the (Core|Ram|Disk)Filter are redundant, as placement will do the 
filtering for those resources before the nova scheduler filters run. So 
you can safely remove (Core|Ram|Disk)Filter from your enabled_filters.

[1] 
https://docs.openstack.org/releasenotes/nova/pike.html#relnotes-16-0-0-stable-pike-upgrade-notes
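
To illustrate the filter/weigher distinction, here is a simplified model,
not nova's actual code. It assumes a single RAM weigher whose raw values
are min-max normalized and then multiplied by the multiplier. The real
scheduler sums the results of several weighers, so this does not
reproduce its final ordering. The Host tuple, the function names, and the
2048MB request are invented for the example; the free-RAM figures are the
ones from your log.

```python
from collections import namedtuple

Host = namedtuple("Host", ["name", "free_ram_mb"])

def ram_filter(hosts, requested_ram_mb):
    """Filtering: keep every host with enough free RAM, in no special order."""
    return [h for h in hosts if h.free_ram_mb >= requested_ram_mb]

def weigh_by_ram(hosts, multiplier):
    """Weighing: order hosts by multiplier * normalized free RAM, best first."""
    if not hosts:
        return []
    values = [h.free_ram_mb for h in hosts]
    lowest, span = min(values), max(values) - min(values)

    def weight(host):
        # Min-max normalization to [0, 1]; all-equal values normalize to 0.
        normalized = (host.free_ram_mb - lowest) / span if span else 0.0
        return multiplier * normalized

    return sorted(hosts, key=weight, reverse=True)

hosts = [Host("cpu1", 32153), Host("cpu2", 30105)]

# Both hosts have room for a (hypothetical) 2048MB request, so both pass.
passed = ram_filter(hosts, requested_ram_mb=2048)

# A negative multiplier puts the host with less free RAM first (stacking);
# a positive one puts the host with more free RAM first (spreading).
stacked = weigh_by_ram(passed, multiplier=-1.0)
spread = weigh_by_ram(passed, multiplier=1.0)
print([h.name for h in stacked])
print([h.name for h in spread])
```

In this model the filter step never reorders or narrows beyond the RAM
requirement; only the weighing step decides which host comes first.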

>   If you have any other ideas or would like me to do some more checking I
>   am all ears!

At this point, you could take Matt's suggestion from his latest reply on 
this thread and patch in the logging regression fix he linked. That 
would allow you to see in the debug log what weights nova is giving to 
the hosts.
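
In the meantime, the existing "Weighed [...]" debug line already shows
the order in which hosts came out of the weighing step, just not the
weights themselves. Here is a rough sketch for pulling that order out of
a pasted log entry. The regular expression is my assumption, based only
on the one entry quoted in this thread, so it may not match other nova
versions; weighed_hosts is a made-up helper name, and it expects a single
log entry without mail quote markers.

```python
import re

# Matches one "(host, node) ram: ...MB disk: ...MB io_ops: N instances: N" item.
HOST_RE = re.compile(
    r"\((?P<host>[^,()]+), (?P<node>[^()]+)\) "
    r"ram: (?P<ram_mb>-?\d+)MB disk: (?P<disk_mb>-?\d+)MB "
    r"io_ops: (?P<io_ops>\d+) instances: (?P<instances>\d+)"
)

def weighed_hosts(log_text):
    """Return the hosts of a 'Weighed [...]' entry in their logged order."""
    # Collapse line wrapping so a pasted, wrapped entry still matches.
    flat = " ".join(log_text.split())
    start = flat.find("Weighed [")
    if start == -1:
        return []
    hosts = []
    for match in HOST_RE.finditer(flat, start):
        item = match.groupdict()
        for key in ("ram_mb", "disk_mb", "io_ops", "instances"):
            item[key] = int(item[key])
        hosts.append(item)
    return hosts

ENTRY = """
2019-04-17 19:53:07.303 98874 DEBUG nova.scheduler.filter_scheduler
[req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 -
default default] Weighed [(cpu1, cpu1) ram: 32153MB disk: 1906688MB
io_ops: 0 instances: 0, (cpu2, cpu2) ram: 30105MB disk: 1886208MB
io_ops: 0 instances: 1] _get_sorted_hosts
"""

for position, item in enumerate(weighed_hosts(ENTRY), start=1):
    print(position, item["host"], item["ram_mb"], item["instances"])
```

The first host printed is the one the scheduler ranked first for that
request.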

Aside from that, it's looking like we/I would need to reproduce this 
issue locally with a devstack and try to figure out what's causing this 
behavior.

-melanie

>>> Thank you both Melanie and Matt for trying to assist me.
>>> I have double-checked the nova.conf on the controller and here is
>>> what I have (ignoring commented-out lines and obfuscating sensitive
>>> data):
>>>    https://pastebin.com/hW1PE4U7
>>> As you can see, I have everything at default values as discussed
>>> before with Melanie, except the filters and the weight that I have
>>> applied, which should lead to VM stacking instead of spreading.
>>> My scenario has two compute hosts (let's call them "cpu1" and
>>> "cpu2"). When an instance is already placed on "cpu2", I expect the
>>> next instance to be placed there as well. Instead it is placed on
>>> "cpu1", as you can see from the scheduler log here:
>>>    https://pastebin.com/sCzB9L2e
>>> Do you see something strange that I fail to recognize?
>>
>> Thanks for providing the helpful data. It appears you have set your
>> nova.conf correctly (this is where your scheduler is running, yes?).
>> I notice you have duplicated the ram_weight_multiplier setting but
>> that shouldn't hurt anything.
>>
>> The relevant scheduler log is this one:
>>
>> 2019-04-17 19:53:07.303 98874 DEBUG nova.scheduler.filter_scheduler
>> [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e
>> 6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 -
>> default default] Weighed [(cpu1, cpu1) ram: 32153MB disk: 1906688MB
>> io_ops: 0 instances: 0, (cpu2, cpu2) ram: 30105MB disk: 1886208MB
>> io_ops: 0 instances: 1] _get_sorted_hosts
>> /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:455
>>
>> and here we see that host 'cpu1' is being weighed ahead of host
>> 'cpu2', which is the problem. I don't understand this considering the
>> docs say that setting the ram_weight_multiplier to a negative value
>> should result in the host with the lesser RAM being weighed
>> higher/first. According to your log, the opposite is happening --
>> 'cpu1' with 32153MB RAM is being weighed higher than 'cpu2' with
>> 30105MB RAM.
>>
>> Either your ram_weight_multiplier setting is not being picked up or
>> there's a bug causing weight to be applied with reverse logic?
>>
>> Can you look at the scheduler debug log when the service first
>> started up and verify what value of ram_weight_multiplier the service
>> is using?
>>
>> -melanie
>>
>>>> On 4/16/2019 7:03 PM, melanie witt wrote:
>>>>> To debug further, you should set debug to True in the nova.conf on
>>>>> your scheduler host and look for which filter is removing the
>>>>> desired host for the second VM. You can find where to start by
>>>>> looking for a message like, "Starting with N host(s)". If you have
>>>>> two hosts with enough RAM, you should see "Starting with 2 host(s)"
>>>>> and then look for the log message where it says "Filter returned 1
>>>>> host(s)" and that will be the filter that is removing the desired
>>>>> host. Once you know which filter is removing it, you can debug
>>>>> further.
>>>>
>>>> If the other host isn't getting filtered out, it could be the
>>>> weighers that aren't prioritizing the host you expect, but debug
>>>> logs should dump the weighed hosts as well which might give a clue.