Re: debugging around machine-checks...: msg#00135 os.freebsd.devel.alpha
On Wed, 23 Oct 2002, Andrew Gallatin wrote:

> > that FreeBSD is instantaneously interrupted when a machine check happens
> > and that I don't get crash-dumps.
>
> Hmm.. I haven't used a machine check generating alpha in a while, but
> from the code in interrupt.c, it looks like it *should* give you a
> crashdump.

Perhaps I'm just clueless. I build my kernel with the option

  makeoptions DEBUG=-g

(install, reboot), and then by hand I run

  dumpon -v /dev/da0b

(which is my swap partition, twice the size of my RAM). Then I do my fiddling with XFree86 that triggers the machine check, and I end up at the SRM prompt. At this point I know that just booting will fail, so I have to power-cycle the box. When it comes back up, savecore either doesn't find anything or isn't being run by the rc scripts: once I get a chance to log in, /var/crash has only minfree in it. Should I be doing something else? I just looked in /var/log/messages and saw no evidence of crash dumps being written (i.e. "dumping to...." or "dump 254 253 252 251..." etc.).

> Can't you use the program counter from the panic output as a start?
> If it's in the X server, there should be a PC from userspace.
> (see disclaimer below)

So can you interpret this for me, then? Honestly, I just don't know what all the fields represent. I should probably just go read the source code and see :)

  Oct 8 06:42:24 liron /kernel: unexpected machine check:
  Oct 8 06:42:24 liron /kernel:
  Oct 8 06:42:24 liron /kernel: mces = 0x1
  Oct 8 06:42:24 liron /kernel: vector = 0x660
  Oct 8 06:42:24 liron /kernel: param = 0xfffffc0000006068
  Oct 8 06:42:24 liron /kernel: pc = 0x1604006ac
  Oct 8 06:42:24 liron /kernel: ra = 0x12006cb10
  Oct 8 06:42:24 liron /kernel: curproc = 0xfffffe0009910200
  Oct 8 06:42:24 liron /kernel: pid = 90765, comm = XFree86
  Oct 8 06:42:24 liron /kernel:
  Oct 8 06:42:24 liron /kernel: panic: machine check

The program counter is pc? So I should be able to figure out what code this is with gdb and a debug version of XFree86?
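As a reading aid for the log above, here is a small sketch that pulls the hex-valued fields out of the console messages and sorts each address into the user or kernel part of the Alpha address space. The segment boundaries (USEG_END, K0SEG_BASE, K1SEG_BASE) are assumed values for FreeBSD/alpha's 43-bit virtual address layout, and parse_mcheck() and segment() are hypothetical helpers, not part of any FreeBSD tool. Under those assumptions pc and ra both land in user space, which fits the idea of looking them up with gdb, keeping in mind that machine checks are reported asynchronously, so the PC may not be the instruction that actually caused the fault.

```python
import re

# Assumed FreeBSD/alpha virtual address layout (43-bit VAs); verify these
# against the headers of the kernel you are actually running.
USEG_END = 0x000003FFFFFFFFFF    # highest user-space (seg0) address
K0SEG_BASE = 0xFFFFFC0000000000  # start of the direct-mapped kernel segment
K1SEG_BASE = 0xFFFFFE0000000000  # start of the mapped kernel segment

# Matches lines such as "... /kernel: pc = 0x1604006ac"
FIELD_RE = re.compile(r"/kernel:\s+(\w+)\s+=\s+(0x[0-9a-fA-F]+)")

LOG = """\
Oct 8 06:42:24 liron /kernel: unexpected machine check:
Oct 8 06:42:24 liron /kernel:
Oct 8 06:42:24 liron /kernel: mces = 0x1
Oct 8 06:42:24 liron /kernel: vector = 0x660
Oct 8 06:42:24 liron /kernel: param = 0xfffffc0000006068
Oct 8 06:42:24 liron /kernel: pc = 0x1604006ac
Oct 8 06:42:24 liron /kernel: ra = 0x12006cb10
Oct 8 06:42:24 liron /kernel: curproc = 0xfffffe0009910200
Oct 8 06:42:24 liron /kernel: pid = 90765, comm = XFree86
"""


def parse_mcheck(log_text):
    """Return {field: int} for every hex-valued field in the log (hypothetical helper)."""
    fields = {}
    for line in log_text.splitlines():
        m = FIELD_RE.search(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields


def segment(addr):
    """Classify an address by Alpha segment, using the assumed boundaries above."""
    if addr <= USEG_END:
        return "user"
    if K0SEG_BASE <= addr < K1SEG_BASE:
        return "kernel (K0SEG)"
    if addr >= K1SEG_BASE:
        return "kernel (K1SEG)"
    return "invalid"


if __name__ == "__main__":
    fields = parse_mcheck(LOG)
    for name in ("pc", "ra", "param", "curproc"):
        print(f"{name:8} {fields[name]:#018x}  {segment(fields[name])}")
```

With a user-space pc in hand, gdb's `info symbol ADDR` or `list *ADDR` against an unstripped binary is the usual next step; if pc falls in a shared library or in a module the server loaded at run time, the main binary's symbols alone won't cover it.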
> Look at alpha/alpha/interrupt.c:badaddr_read().
>
> If you're feeling really lucky, you could add code to send the
> appropriate signal (sigbus?) if the PC is in a userland app.
>
> The problem with this is that machine checks are somewhat
> asynchronous, and I'm not sure the PC at the time of the fault
> corresponds to the PC that actually caused the fault.
> (that's why there are so many memory barriers all over the pci probing
> and badaddr code).

Your explanation is helpful, and perhaps I'll try your suggestion of turning userland machine checks into SIGBUS or something. I'm sure I'm just begging for trouble here, but at least this isn't a production machine that other people depend on :). To send a signal to a process from within the kernel, it seems I just call psignal(pid, signo). Is this right?

Thanks very much for the information. It looks like a little check in machine_check() in interrupt.c will do pretty much what I want; perhaps I'll make sure that my hack only applies to processes whose name starts with 'X' or something, just to be safe.

Fred

-- 
Fred Clift - fclift@xxxxxxxxx -- Remember: If brute
force doesn't work, you're just not using enough.
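The check proposed for machine_check() can be tried out as plain logic before touching the kernel. This sketch models only the decision: turn the machine check into SIGBUS when the reported PC is a user-space address and the process name starts with 'X', and otherwise keep the existing panic. mcheck_action() and USEG_END are hypothetical/assumed names, not kernel symbols. In the real handler the delivery would go through psignal(); as far as I know, in FreeBSD kernels of this period it takes a struct proc pointer (such as curproc) and a signal number, not a pid. Because machine checks are asynchronous, a user-space PC is only a hint that the process caused the fault.

```python
# Assumed highest user-space address on FreeBSD/alpha (43-bit VA layout).
USEG_END = 0x000003FFFFFFFFFF


def mcheck_action(pc: int, comm: str) -> str:
    """Hypothetical policy for an unexpected machine check.

    Returns "sigbus" when the reported PC is in user space and the process
    name starts with "X" (the safety restriction suggested in the post);
    otherwise returns "panic", i.e. the kernel's existing behaviour.
    """
    user_pc = pc <= USEG_END
    if user_pc and comm.startswith("X"):
        return "sigbus"
    return "panic"


if __name__ == "__main__":
    # pc and comm taken from the console log in the post
    print(mcheck_action(0x1604006ac, "XFree86"))         # sigbus
    # hypothetical kernel-space PC: still panics
    print(mcheck_action(0xFFFFFC0000400000, "XFree86"))  # panic
    # user-space PC, but not an X process: keep panicking
    print(mcheck_action(0x1604006ac, "sshd"))            # panic
```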