Re: debugging around machine-checks... (msg#00143, os.freebsd.devel.alpha)
Fred Clift writes:
 > by hand I run dumpon -v /dev/da0b (which is my swap partition, twice what
 > I have of ram in size)
 >
 > and then I do my fiddling with XFree86 that gives me the machine-check and
 > I end up at the SRM prompt. At this point, I know that just booting will
 > fail. I have to power-cycle the box and when it comes back up, savecore
 > either doesn't find anything, or isn't being run by the rc scripts. Once
 > I get a chance to log in /var/crash has only minfree in it...
 >

That *should* work..

 > Should I be doing something else?
 >
 > I just looked in /var/log/messages and saw no evidence of crashdumps being
 > written (ie dumping to.... or dump 254 253 252 251... etc).

If you power-cycle, the message buffer is lost.

When I would crash X on an old miata, 1/2 the time I'd get a
'machine check in pal mode' -- this doesn't even get caught by the OS.
However, if you're seeing the message below, I do not understand why
you're not getting a crashdump. In any case, since the problem is
probably with the X server (based on the message below), a crashdump
would not help you.

 >
 > >
 > > Can't you use the program counter from the panic output as a start?
 > > If it's in the X server, there should be a PC from userspace.
 > > (see disclaimer below)
 > >
 >
 > So can you interpret this for me then - honestly I just don't know what all
 > the fields represent -- I should probably just go read the source code and
 > see :)
 >
 > Oct 8 06:42:24 liron /kernel: unexpected machine check:
 > Oct 8 06:42:24 liron /kernel:
 > Oct 8 06:42:24 liron /kernel: mces = 0x1
 > Oct 8 06:42:24 liron /kernel: vector = 0x660
 > Oct 8 06:42:24 liron /kernel: param = 0xfffffc0000006068
 > Oct 8 06:42:24 liron /kernel: pc = 0x1604006ac
 > Oct 8 06:42:24 liron /kernel: ra = 0x12006cb10
 > Oct 8 06:42:24 liron /kernel: curproc = 0xfffffe0009910200
 > Oct 8 06:42:24 liron /kernel: pid = 90765, comm = XFree86
 > Oct 8 06:42:24 liron /kernel:
 > Oct 8 06:42:24 liron /kernel: panic: machine check
 >
 >
 > The program counter is pc? so I should be able to, with gdb and a
 > debug-version of XFree86, figure out what code this is?

Yes, except it's in a shared lib, or other dynamically loaded text.
I don't know how you could debug that without a coredump. The ra
(return address) is at least somewhere in the main text of the program
(not a shared lib).

<...>

 > Your explanation is helpful, and perhaps I'll try your suggestion of
 > turning userland machine checks into sigbus or something - I'm sure I'm
 > just begging for trouble here, but at least this isn't a production
 > machine that other people depend on :).
 >
 > To send a signal to a process from within the kernel, it seems I just call
 >
 > psignal(pid, signo)
 >
 > - is this right?
 >

More or less. I think trapsignal may be more correct.

 > Thanks very much for your information - looks like a little check in
 > machine_check() in interrupt.c will do pretty much what I want - perhaps
 > I'll make sure that my hack only works on processes whose name starts
 > with 'X' or something just to be safe....

Good luck to you!!

Drew

To Unsubscribe: send mail to majordomo@xxxxxxxxxxx
with "unsubscribe freebsd-alpha" in the body of the message
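
For readers following the thread, the check Fred describes (signal the process instead of panicking, but only for a user-mode machine check in a process whose name starts with 'X') can be modelled in plain userland C. This is only a sketch of the decision, not kernel code: `struct mc_report`, `mc_is_user_pc()` and `mc_decide()` are made-up names, and treating an address with the top bit clear as a user address is an assumption based on the kernel addresses in the log above (0xfffffc..., 0xfffffe...). A real patch in machine_check() would more likely test the saved processor status in the trap frame, and would then call psignal() or trapsignal() with whatever arguments the kernel headers of that release declare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields printed in the panic message. */
struct mc_report {
    uint64_t pc;        /* program counter when the machine check hit */
    uint64_t ra;        /* return address register */
    const char *comm;   /* process name, e.g. "XFree86" */
};

enum mc_action { MC_PANIC, MC_SIGNAL };

/*
 * Assumption: kernel virtual addresses have the top bit set
 * (0xfffffc... and 0xfffffe... in the log), user addresses do not.
 */
static bool mc_is_user_pc(uint64_t pc)
{
    return (pc >> 63) == 0;
}

/*
 * Decide whether to signal the offending process or panic as before.
 * Only user-mode checks in a process whose name starts with 'X'
 * are downgraded to a signal; everything else still panics.
 */
static enum mc_action mc_decide(const struct mc_report *r)
{
    assert(r != NULL && r->comm != NULL);

    if (!mc_is_user_pc(r->pc))
        return MC_PANIC;    /* check happened in kernel text */
    if (r->comm[0] != 'X')
        return MC_PANIC;    /* not an X server: keep old behaviour */
    return MC_SIGNAL;       /* candidate for SIGBUS or similar */
}
```

With the values from the log (pc = 0x1604006ac, comm = XFree86) this model picks the signal path; a pc in the 0xfffffc... range, or any process not starting with 'X', still ends in a panic.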