Nmi_watchdog and x86_64 lockups: msg#00003 linux.smp
Having narrowly skirted death by allowing photographers near the lab today... I would like to enlist the open source community in debugging our Oracle problem. Online kernel docs recommend reporting issues with NMI (related to our lockup/dump issue) to the kernel-smp list. I have composed the following email, but want to make sure you are comfortable with me pursuing this. I do not mention any application details, but it is not possible to omit fairly detailed descriptions of the hardware when submitting to the kernel list. Not sure if that is kosher or not. Please let me know how I should proceed with this.

Domo Arigato.
Jeremy

Hypothetical email:
-------------------
I am looking for assistance with x86_64 SMP systems locking up. Under a heavy application workload, the system freezes and I am unable to send an alt-sysrq-d to trigger a dump. The systems are booted with nmi_watchdog=1, but the watchdog is not working. No oops events are logged in messages and I have observed nothing on the console (direct-attached KVM; I am working on setting up a term server and logging the serial console).

According to nmi_watchdog.txt, I should see non-zero counters in /proc/interrupts with this enabled, or "you probably have a processor that needs to be added to the nmi code". The lockups are occurring in two separate configurations (details below), both of which show all zeros for NMI in /proc/interrupts.

Any advice on whether these configurations are supported by the NMI code, or suggestions for how to successfully get a dump, would be most appreciated.
Thanks in advance,
Jeremy Ulstad

Config 1:
2 x AMD Opteron 240 (8 GB RAM)
SLES 9
Linux number6 2.6.5-7.111.19-smp #1 SMP Fri Dec 10 15:10:58 UTC 2004 x86_64 x86_64 x86_64 GNU/Linux

number6:~ # cat /proc/interrupts
           CPU0       CPU1
  0:     383170   23276745    IO-APIC-edge  timer
  1:          9        227    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 12:        207          0    IO-APIC-edge  i8042
 14:       4900      57432    IO-APIC-edge  ide0
 15:         54          0    IO-APIC-edge  ide1
 19:          0          0   IO-APIC-level  ohci_hcd, ohci_hcd
 27:  327047839          0   IO-APIC-level  eth0, eth1
NMI:          0          0
LOC:   23656684   23657709
ERR:          0
MIS:          0

Config 2:
4 x AMD Opteron 850 (8 GB RAM)
SLES 9
Linux riddick 2.6.5-7.145-smp #1 SMP Thu Jan 27 09:19:29 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux

riddick:~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   20317266   25048606   25048495   25048500    IO-APIC-edge  timer
  1:          9          0          0          0    IO-APIC-edge  i8042
  2:          0          0          0          0          XT-PIC  cascade
  4:        652         92          0          0    IO-APIC-edge  serial
  8:          0          0          0          0    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:         59          0          0          0    IO-APIC-edge  i8042
 15:         63          4          0          0    IO-APIC-edge  ide1
 19:          0          0          0          0   IO-APIC-level  ohci_hcd, ohci_hcd
 25:   93875682          0          1         81   IO-APIC-level  eth0
 27:          0     275078      99550       4603   IO-APIC-level  ioc0
NMI:          0          0          0          0
LOC:   95441672   95441724   95441724   95441606
ERR:          0
MIS:          0

I should also note that all the config 1 systems are being forced to 3.8 GB of memory with "mem=3800m" to compensate for a bug with lkcd which results in dumps (triggered manually with system up) failing with >= 4GB RAM.

-
To unsubscribe from this list: send the line "unsubscribe linux-smp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
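
For anyone repeating the check described in the message (looking for non-zero NMI counters in /proc/interrupts), here is a minimal Python sketch. It is not part of the original post. The function names `nmi_counts` and `nmi_firing_on_all_cpus` are hypothetical, the sample text is trimmed from the Config 1 output, and requiring a non-zero count on every CPU is an assumption about what a working watchdog should look like.

```python
# Sample trimmed from the Config 1 /proc/interrupts output in the post.
SAMPLE = """\
           CPU0       CPU1
  0:     383170   23276745    IO-APIC-edge  timer
NMI:          0          0
LOC:   23656684   23657709
ERR:          0
"""


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters as a list of ints, or None if no NMI row exists."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the first non-numeric field; some kernels
                # append a text description after the counters.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return None


def nmi_firing_on_all_cpus(interrupts_text):
    """Assumption: a working watchdog shows a non-zero NMI count on every CPU."""
    counts = nmi_counts(interrupts_text)
    return bool(counts) and all(count > 0 for count in counts)


print(nmi_counts(SAMPLE))              # [0, 0]
print(nmi_firing_on_all_cpus(SAMPLE))  # False
```

On a live system the same functions can be fed the contents of /proc/interrupts read as text. Taking two readings a few seconds apart and comparing them would also show whether the counters are advancing, not just whether they are non-zero.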