[nova] NUMA scheduling


> What is the error thrown by Openstack when NUMA0 is full?

 

The kernel OOM killer is actually killing the QEMU process, which causes Nova to report:

 

/var/log/kolla/nova/nova-compute.log.4:2020-08-25 12:31:19.812 6 WARNING nova.compute.manager [req-62bddc53-ca8b-4bdc-bf41-8690fc88076f - - - - -] [instance: 8d8a262a-6e60-4e8a-97f9-14462f09b9e5] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
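For anyone reading that warning, the numeric power states can be decoded with a small script. This is only a sketch: the state table mirrors the constants in nova.compute.power_state as I understand them, and the function name and regular expression are my own, not part of Nova.

```python
import re

# Power-state codes, mirroring nova.compute.power_state (assumed mapping).
POWER_STATES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}

# Hypothetical pattern written for the warning format shown above.
_WARNING_RE = re.compile(
    r"\[instance: (?P<uuid>[0-9a-f-]+)\].*?"
    r"original DB power_state: (?P<db>\d+), "
    r"current VM power_state: (?P<vm>\d+)"
)


def decode_shutdown_warning(line):
    """Return instance UUID and named power states, or None if no match."""
    match = _WARNING_RE.search(line)
    if match is None:
        return None
    db_state = int(match["db"])
    vm_state = int(match["vm"])
    return {
        "instance": match["uuid"],
        "db_power_state": POWER_STATES.get(db_state, f"UNKNOWN({db_state})"),
        "vm_power_state": POWER_STATES.get(vm_state, f"UNKNOWN({vm_state})"),
    }
```

For the line above, the database still had the instance as RUNNING (1) while the hypervisor reported SHUTDOWN (4), which is why Nova calls the stop API to bring its records in line.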

 

So there isn't a NUMA- or memory-specific error from Nova. Nova simply schedules the VM on a compute node that it thinks has enough memory, and libvirt (or Nova?) then configures the VM to use CPU cores on a NUMA node whose memory is already full.

 

NUMA node 1 had about 240GiB of free memory, with about 100GiB of buffer/cache space in use, so it had plenty of memory available, whereas NUMA node 0 was very tight on free memory.
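A quick way to see this kind of imbalance on a compute host is to read the per-node counters the kernel exposes under /sys/devices/system/node/. The following is a minimal sketch, assuming the usual "Node N Field: value kB" layout of those meminfo files; the function names are hypothetical.

```python
import glob
import re

# Matches lines such as "Node 0 MemFree:   720756 kB".
# Lines without a kB unit (e.g. HugePages_Total) are skipped on purpose.
_LINE_RE = re.compile(r"Node\s+(\d+)\s+([\w()]+):\s+(\d+)\s+kB")


def parse_node_meminfo(text):
    """Return {field_name: value_in_kB} for one node's meminfo text."""
    values = {}
    for line in text.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:
            values[match.group(2)] = int(match.group(3))
    return values


def free_memory_by_node(pattern="/sys/devices/system/node/node*/meminfo"):
    """Return {node_id: free memory in GiB} for every NUMA node found."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        node_id = int(re.search(r"node(\d+)/meminfo$", path).group(1))
        with open(path) as handle:
            fields = parse_node_meminfo(handle.read())
        result[node_id] = fields.get("MemFree", 0) / (1024 * 1024)
    return result
```

Calling free_memory_by_node() on the hypervisor and printing the result per node shows whether one node is close to exhaustion while the other still has headroom, which a host-wide "free" total would hide.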

 

These are some logs from /var/log/messages. They are not for the instance in the nova-compute.log entry above, but they show the same condition for another VM that was killed (the logs had rolled, so I had to pick a different VM):

 

Oct 10 15:17:01 <redacted hostname> kernel: CPU 0/KVM invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0

Oct 10 15:17:01 <redacted hostname> kernel: CPU: 15 PID: 30468 Comm: CPU 0/KVM Not tainted 5.3.8-1.el7.elrepo.x86_64 #1

Oct 10 15:17:01 <redacted hostname> kernel: Hardware name: <redacted hardware>

Oct 10 15:17:01 <redacted hostname> kernel: Call Trace:

Oct 10 15:17:01 <redacted hostname> kernel: dump_stack+0x63/0x88

Oct 10 15:17:01 <redacted hostname> kernel: dump_header+0x51/0x210

Oct 10 15:17:01 <redacted hostname> kernel: oom_kill_process+0x105/0x130

Oct 10 15:17:01 <redacted hostname> kernel: out_of_memory+0x105/0x4c0

...

Oct 10 15:17:01 <redacted hostname> kernel: active_anon:108933472 inactive_anon:174036 isolated_anon:0#012 active_file:21875969 inactive_file:2418794 isolated_file:32#012 unevictable:88113 dirty:0 writeback:4 unstable:0#012 slab_reclaimable:3056118 slab_unreclaimable:432301#012 mapped:71768 shmem:570159 pagetables:258264 bounce:0#012 free:58924792 free_pcp:326 free_cma:0

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 active_anon:382548916kB inactive_anon:173052kB active_file:0kB inactive_file:2272kB unevictable:289840kB isolated(anon):0kB isolated(file):128kB mapped:16696kB dirty:0kB writeback:0kB shmem:578812kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 286420992kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA free:15880kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

Oct 10 15:17:01 <redacted hostname> kernel: lowmem_reserve[]: 0 1589 385604 385604 385604

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA32 free:1535904kB min:180kB low:1780kB high:3380kB active_anon:90448kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1717888kB managed:1627512kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1008kB local_pcp:248kB free_cma:0kB

Oct 10 15:17:01 <redacted hostname> kernel: lowmem_reserve[]: 0 0 384015 384015 384015

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 Normal free:720756kB min:818928kB low:1212156kB high:1605384kB active_anon:382458300kB inactive_anon:173052kB active_file:0kB inactive_file:2272kB unevictable:289840kB writepending:0kB present:399507456kB managed:393231952kB mlocked:289840kB kernel_stack:58344kB pagetables:889796kB bounce:0kB free_pcp:296kB local_pcp:0kB free_cma:0kB

Oct 10 15:17:01 <redacted hostname> kernel: lowmem_reserve[]: 0 0 0 0 0

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA32: 1*4kB (U) 1*8kB (M) 0*16kB 9*32kB (UM) 11*64kB (UM) 12*128kB (UM) 12*256kB (UM) 11*512kB (UM) 11*1024kB (M) 1*2048kB (U) 369*4096kB (M) = 1535980kB

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 Normal: 76633*4kB (UME) 30442*8kB (UME) 7998*16kB (UME) 1401*32kB (UE) 6*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 723252kB

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB

Oct 10 15:17:01 <redacted hostname> kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Oct 10 15:17:01 <redacted hostname> kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB

Oct 10 15:17:01 <redacted hostname> kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Oct 10 15:17:01 <redacted hostname> kernel: 24866489 total pagecache pages

Oct 10 15:17:01 <redacted hostname> kernel: 0 pages in swap cache

Oct 10 15:17:01 <redacted hostname> kernel: Swap cache stats: add 0, delete 0, find 0/0

Oct 10 15:17:01 <redacted hostname> kernel: Free swap  = 0kB

Oct 10 15:17:01 <redacted hostname> kernel: Total swap = 0kB

Oct 10 15:17:01 <redacted hostname> kernel: 200973631 pages RAM

Oct 10 15:17:01 <redacted hostname> kernel: 0 pages HighMem/MovableOnly

Oct 10 15:17:01 <redacted hostname> kernel: 3165617 pages reserved

Oct 10 15:17:01 <redacted hostname> kernel: 0 pages hwpoisoned

Oct 10 15:17:01 <redacted hostname> kernel: Tasks state (memory values in pages):

Oct 10 15:17:01 <redacted hostname> kernel: [   2414]     0  2414    33478    20111   315392        0             0 systemd-journal

Oct 10 15:17:01 <redacted hostname> kernel: [   2438]     0  2438    31851      540   143360        0             0 lvmetad

Oct 10 15:17:01 <redacted hostname> kernel: [   2453]     0  2453    12284     1141   131072        0         -1000 systemd-udevd

Oct 10 15:17:01 <redacted hostname> kernel: [   4170]     0  4170    13885      446   131072        0         -1000 auditd

Oct 10 15:17:01 <redacted hostname> kernel: [   4393]     0  4393     5484      526    86016        0             0 irqbalance

Oct 10 15:17:01 <redacted hostname> kernel: [   4394]     0  4394     6623      624   102400        0             0 systemd-logind 

...

Oct 10 15:17:01 <redacted hostname> kernel: oom-kill:constraint=CONSTRAINT_MEMORY_POLICY,nodemask=0,cpuset=vcpu0,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d237\x2dinstance\x2d0000fda8.scope,task=qemu-kvm,pid=25496,uid=42436

Oct 10 15:17:01 <redacted hostname> kernel: Out of memory: Killed process 25496 (qemu-kvm) total-vm:67989512kB, anon-rss:66780940kB, file-rss:11052kB, shmem-rss:4kB

Oct 10 15:17:02 <redacted hostname> kernel: oom_reaper: reaped process 25496 (qemu-kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
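The two most telling lines in that dump are the "Node 0 Normal" zone line and the "oom-kill:" summary. The sketch below pulls the relevant numbers out of them; the function names and regular expression are my own and only assume the line formats shown above.

```python
import re

# Matches per-zone lines such as
# "Node 0 Normal free:720756kB min:818928kB low:1212156kB high:1605384kB ..."
_ZONE_RE = re.compile(
    r"Node (?P<node>\d+) (?P<zone>\w+) free:(?P<free>\d+)kB "
    r"min:(?P<min>\d+)kB low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def parse_zone_watermarks(line):
    """Return free/min/low/high (kB) for a zone line, or None if no match."""
    match = _ZONE_RE.search(line)
    if match is None:
        return None
    info = {key: int(match[key]) for key in ("node", "free", "min", "low", "high")}
    info["zone"] = match["zone"]
    # True when free memory has dropped under the zone's min watermark.
    info["below_min"] = info["free"] < info["min"]
    return info


def parse_oom_kill(line):
    """Split the 'oom-kill:' summary into a dict; bare flags map to True."""
    _, found, payload = line.partition("oom-kill:")
    if not found:
        return None
    fields = {}
    for item in payload.strip().split(","):
        key, has_value, value = item.partition("=")
        fields[key] = value if has_value else True
    return fields
```

Run against the lines above, the Normal zone of node 0 has 720756kB free against a min watermark of 818928kB, so it is below min, while the summary shows constraint=CONSTRAINT_MEMORY_POLICY with nodemask=0 and mems_allowed=0. In other words, the allocation was restricted to node 0, so the free memory on node 1 could not be used to satisfy it.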
