[Openstack] [NOVA][KOLLA] not able to configure pci pass-through on gpu devices


Dear Openstack community,

I am not able to configure PCI pass-through with GPUs successfully and was wondering if someone could give advice.

physical machine: Dell C4140

VT-d and SR-IOV are enabled in the BIOS

# hosts names/role:

openstack-deployment --> kolla deployment and openstack client
controller node --> TEST-openstack-controller
proto-gpu --> compute node with 4 GPUs installed

The latest NVIDIA CUDA software has been installed on the node.

The IOMMU has been enabled on the kernel command line through GRUB (intel_iommu=on), but I am not sure whether it is working.

# Check Grub config

[root@proto-gpu ~]# vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="nouveau.modeset=0 rd.driver.blacklist=nouveau crashkernel=auto rhgb quiet intel_iommu=on"
GRUB_DISABLE_RECOVERY="true"

### Check dmesg after system reboot

[root@proto-gpu ~]# dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 000000006f6c3000 001D0 (v01 DELL   PE_SC3   00000001 DELL 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.232060] DMAR: Host address width 46
[    0.232062] DMAR: DRHD base: 0x000000d37fc000 flags: 0x0
[    0.232068] DMAR: dmar0: reg_base_addr d37fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232069] DMAR: DRHD base: 0x000000e0ffc000 flags: 0x0
[    0.232074] DMAR: dmar1: reg_base_addr e0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232075] DMAR: DRHD base: 0x000000ee7fc000 flags: 0x0
[    0.232080] DMAR: dmar2: reg_base_addr ee7fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232081] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    0.232085] DMAR: dmar3: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232086] DMAR: DRHD base: 0x000000aaffc000 flags: 0x0
[    0.232090] DMAR: dmar4: reg_base_addr aaffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232091] DMAR: DRHD base: 0x000000b87fc000 flags: 0x0
[    0.232096] DMAR: dmar5: reg_base_addr b87fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232097] DMAR: DRHD base: 0x000000c5ffc000 flags: 0x0
[    0.232101] DMAR: dmar6: reg_base_addr c5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232102] DMAR: DRHD base: 0x0000009d7fc000 flags: 0x1
[    0.232106] DMAR: dmar7: reg_base_addr 9d7fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.232107] DMAR: RMRR base: 0x0000006e2cd000 end: 0x0000006e7ccfff
[    0.232108] DMAR: RMRR base: 0x0000006f362000 end: 0x0000006f364fff
[    0.232109] DMAR: ATSR flags: 0x0
[    0.232110] DMAR: ATSR flags: 0x0
[    0.232112] DMAR-IR: IOAPIC id 12 under DRHD base  0xc5ffc000 IOMMU 6
[    0.232113] DMAR-IR: IOAPIC id 11 under DRHD base  0xb87fc000 IOMMU 5
[    0.232114] DMAR-IR: IOAPIC id 10 under DRHD base  0xaaffc000 IOMMU 4
[    0.232115] DMAR-IR: IOAPIC id 18 under DRHD base  0xfbffc000 IOMMU 3
[    0.232116] DMAR-IR: IOAPIC id 17 under DRHD base  0xee7fc000 IOMMU 2
[    0.232117] DMAR-IR: IOAPIC id 16 under DRHD base  0xe0ffc000 IOMMU 1
[    0.232118] DMAR-IR: IOAPIC id 15 under DRHD base  0xd37fc000 IOMMU 0
[    0.232120] DMAR-IR: IOAPIC id 8 under DRHD base  0x9d7fc000 IOMMU 7
[    0.232121] DMAR-IR: IOAPIC id 9 under DRHD base  0x9d7fc000 IOMMU 7
[    0.232122] DMAR-IR: HPET id 0 under DRHD base 0x9d7fc000
[    0.232123] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[    0.232124] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[    0.234147] DMAR-IR: Enabled IRQ remapping in xapic mode
[    2.094974] DMAR: dmar6: Using Queued invalidation
[    2.094983] DMAR: dmar4: Using Queued invalidation
[    2.094990] DMAR: dmar2: Using Queued invalidation
[    2.094995] DMAR: dmar1: Using Queued invalidation
[    2.095000] DMAR: dmar7: Using Queued invalidation
[    2.095043] DMAR: Setting RMRR:
[    2.098832] DMAR: Setting identity map for device 0000:00:14.0 [0x6f362000 - 0x6f364fff]
[    2.098838] DMAR: Setting identity map for device 0000:00:14.0 [0x6e2cd000 - 0x6e7ccfff]
[    2.098846] DMAR: Prepare 0-16MiB unity mapping for LPC
[    2.102818] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    2.102831] DMAR: Intel(R) Virtualization Technology for Directed I/O
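
The dmesg output above already reports "DMAR: IOMMU enabled". A further check is whether the kernel actually created IOMMU groups. This is a minimal sketch, assuming the standard sysfs location /sys/kernel/iommu_groups; the function name list_iommu_groups is made up for illustration.

```shell
# List each IOMMU group and the PCI devices it contains.
# The optional first argument is the sysfs directory to scan. It defaults
# to the standard Linux location and is a parameter only to ease testing.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups exist
        group="${dev%/devices/*}"      # strip the trailing "/devices/<addr>"
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
}

# Usage on the compute node:
#   list_iommu_groups | grep -e 1a:00.0 -e 1c:00.0 -e 1d:00.0 -e 1e:00.0
```

If this prints nothing, the IOMMU is probably not active despite the GRUB setting. If each GPU shows up in a group, the kernel side is most likely fine.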

### List GPUs

[root@proto-gpu ~]# lspci -nn | egrep -i nvidia
1a:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2] [10de:1db1] (rev a1)
1c:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2] [10de:1db1] (rev a1)
1d:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2] [10de:1db1] (rev a1)
1e:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2] [10de:1db1] (rev a1)
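
Because the NVIDIA software is installed on the host, it may also be worth checking which kernel driver currently holds each GPU. A host driver bound to a device can get in the way of handing it to a guest. This is a sketch that summarizes `lspci -nnk` output; the helper name gpu_drivers is hypothetical.

```shell
# Read `lspci -nnk` output on stdin and print "<address> <driver>" per
# device, or "<address> none" when no kernel driver is bound.
gpu_drivers() {
    awk '
        # A device header line starts with the PCI address (hex digits).
        /^[0-9a-fA-F]/ {
            if (addr != "") print addr, drv
            addr = $1
            drv = "none"
        }
        # Indented detail line naming the bound driver.
        /Kernel driver in use:/ { drv = $NF }
        END { if (addr != "") print addr, drv }
    '
}

# Usage on the compute node:
#   lspci -nnk -d 10de:1db1 | gpu_drivers
```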

### Check pci configuration on nova-compute

[root@proto-gpu ~]# docker exec -it nova_compute cat /etc/nova/nova.conf
...
[pci]
passthrough_whitelist = { "vendor_id": "10de", "product_id": "1db1" }
alias = { "vendor_id":"10de", "product_id":"1db1", "device_type":"type-PF", "name":"nv_v100" }
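
The alias above requests "device_type": "type-PF". As far as I understand, Nova only matches an alias against devices it has classified with the same type, and a device without SR-IOV capability is normally recorded as type-PCI. The sketch below guesses the type from sysfs. It is only an approximation of what Nova and libvirt report, and the function name guess_nova_dev_type is made up for illustration.

```shell
# Guess how a PCI device would be classified, based on sysfs entries.
# Assumptions: a virtual function has a "physfn" link, a physical function
# with SR-IOV capability has "sriov_totalvfs", and anything else is a
# plain PCI device. Nova's own classification may differ.
guess_nova_dev_type() {
    addr="$1"                               # e.g. 0000:1a:00.0
    root="${2:-/sys/bus/pci/devices}"       # parameter only to ease testing
    if [ -e "$root/$addr/physfn" ]; then
        echo "type-VF"
    elif [ -e "$root/$addr/sriov_totalvfs" ]; then
        echo "type-PF"
    else
        echo "type-PCI"
    fi
}

# Usage on the compute node:
#   for a in 0000:1a:00.0 0000:1c:00.0 0000:1d:00.0 0000:1e:00.0; do
#       echo "$a $(guess_nova_dev_type "$a")"
#   done
```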

### Check pci configuration on nova-scheduler and nova-api

[root@TEST-openstack-controller ~]# docker exec -it nova_scheduler cat /etc/nova/nova.conf
...
[filter_scheduler]
enabled_filters = RetryFilter, AvailabilityZoneFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters

[root@TEST-openstack-controller ~]# docker exec -it nova_api cat /etc/nova/nova.conf
...
[pci]
alias = { "vendor_id":"10de", "product_id":"1db1", "device_type":"type-PF", "name":"nv_v100" }
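
What finally matters is the type Nova itself recorded for the devices, which it stores in the pci_devices table of the nova database. The sketch below filters the rows whose dev_type differs from the one the alias asks for. The helper name mismatched_dev_types is hypothetical, and the container name, credentials, and database name in the usage comment are assumptions about a typical Kolla deployment.

```shell
# Read whitespace-separated "address dev_type status" rows on stdin and
# print the rows whose dev_type differs from the requested one.
mismatched_dev_types() {
    want="$1"                               # e.g. type-PF, as in the alias
    awk -v want="$want" 'NF >= 2 && $2 != want { print }'
}

# Hypothetical usage on the controller (adjust names and credentials):
#   docker exec mariadb mysql -N -u root -p"$DB_PASSWORD" nova \
#     -e "SELECT address, dev_type, status FROM pci_devices WHERE deleted = 0" \
#     | mismatched_dev_types type-PF
```

Any row printed here is a device the alias cannot match as written. If the query returns no rows at all, the compute node never reported the devices, which points at the whitelist instead.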

### Create a flavor
[root@openstack-deployment ~]# openstack flavor create gpu.medium --ram 4096 --disk 40 --vcpus 2 --property "pci_passthrough:alias"="nv_v100:2"

### Nova scheduler logs

2018-11-01 22:07:20.832 32 INFO nova.filters [req-7c6bc2df-028a-4d5f-a025-1e9a677883a4 02132c31dafa4d1d939bd52e0420b975 abc29399b91d423088549d7446766573 - default default] Filtering removed all hosts for the request with instance ID '6fd33275-7e1f-4673-a905-1213d4eaa1b3'. Filter results: ['RetryFilter: (start: 2, end: 2)', 'AvailabilityZoneFilter: (start: 2, end: 2)', 'ComputeFilter: (start: 2, end: 2)', 'ComputeCapabilitiesFilter: (start: 2, end: 2)', 'ImagePropertiesFilter: (start: 2, end: 2)', 'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', 'ServerGroupAffinityFilter: (start: 2, end: 2)', 'PciPassthroughFilter: (start: 2, end: 0)']
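
For longer logs, the filter that eliminated the last hosts can be picked out mechanically. This is a small sketch that reads scheduler log lines in the format shown above; the function name first_empty_filter is made up for illustration, and it relies on `grep -o`, which GNU and BSD grep provide.

```shell
# Read nova-scheduler log lines on stdin and print the name of the first
# filter whose "end" host count is 0.
first_empty_filter() {
    grep -o "[A-Za-z]*: (start: [0-9]*, end: [0-9]*)" |
        awk '$NF == "0)" { sub(/:$/, "", $1); print $1; exit }'
}

# Usage (the log file path depends on the deployment):
#   first_empty_filter < nova-scheduler.log
```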

PciPassthroughFilter is not returning any hosts and I am not sure why. What am I doing wrong?

Please feel free to ask for more details in case I have missed something important.

Thank you very much,

Manuel
