
Re: debugging around machine-checks...: msg#00135

os.freebsd.devel.alpha

Subject: Re: debugging around machine-checks...

On Wed, 23 Oct 2002, Andrew Gallatin wrote:


>
> > that FreeBSD is instantaneously interrupted when a machine check happens
> > and that I don't get crash dumps.
>
> Hmm.. I haven't used a machine check generating alpha in a while, but
> from the code in interrupt.c, it looks like it *should* give you a
> crashdump.


Perhaps I'm just clueless - I build my kernel with the option

makeoptions DEBUG=-g


(install, reboot)

By hand, I run dumpon -v /dev/da0b (my swap partition, which is twice the
size of my RAM).

Then I do my fiddling with XFree86 that gives me the machine check, and
I end up at the SRM prompt. At this point, I know that just booting will
fail. I have to power-cycle the box, and when it comes back up, savecore
either doesn't find anything or isn't being run by the rc scripts. Once
I get a chance to log in, /var/crash has only minfree in it...


Should I be doing something else?

I just looked in /var/log/messages and saw no evidence of crash dumps being
written (i.e. "dumping to...." or "dump 254 253 252 251...", etc.).



>
> Can't you use the program counter from the panic output as a start?
> If it's in the X server, there should be a PC from userspace.
> (see disclaimer below)
>

So can you interpret this for me, then? Honestly, I just don't know what all
the fields represent; I should probably just go read the source code and
see :)

Oct 8 06:42:24 liron /kernel: unexpected machine check:
Oct 8 06:42:24 liron /kernel:
Oct 8 06:42:24 liron /kernel: mces = 0x1
Oct 8 06:42:24 liron /kernel: vector = 0x660
Oct 8 06:42:24 liron /kernel: param = 0xfffffc0000006068
Oct 8 06:42:24 liron /kernel: pc = 0x1604006ac
Oct 8 06:42:24 liron /kernel: ra = 0x12006cb10
Oct 8 06:42:24 liron /kernel: curproc = 0xfffffe0009910200
Oct 8 06:42:24 liron /kernel: pid = 90765, comm = XFree86
Oct 8 06:42:24 liron /kernel:
Oct 8 06:42:24 liron /kernel: panic: machine check


The program counter is pc? So I should be able to figure out what code this
is with gdb and a debug version of XFree86?
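To make the fields above a little less opaque, here is a small user-space C sketch that pulls the hex value out of report lines such as "pc = 0x1604006ac" and guesses whether an address is a user or a kernel address. It assumes, as on Alpha generally, that kernel virtual addresses live in the sign-extended upper half of the address space (like the 0xfffffc... param and 0xfffffe... curproc values above), while user code sits in the lower half. The helper names parse_mc_field() and is_kernel_va() are hypothetical; nothing here is taken from interrupt.c.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "<field> = 0x..." in one log line and store
 * the value in *out. Returns 1 on success, 0 if the field is absent. */
static int parse_mc_field(const char *line, const char *field, uint64_t *out)
{
    char key[32];
    const char *p;
    char *end;
    int n;

    /* Build the "field = " text the report prints, e.g. "pc = ". */
    n = snprintf(key, sizeof key, "%s = ", field);
    if (n < 0 || (size_t)n >= sizeof key)
        return 0;

    /* Only accept a match at the start of a word, so "pc" cannot
     * match inside a longer field name. */
    for (p = strstr(line, key); p != NULL; p = strstr(p + 1, key))
        if (p == line || p[-1] == ' ')
            break;
    if (p == NULL)
        return 0;

    p += strlen(key);
    /* Base 16 lets strtoull() accept the optional "0x" prefix. */
    *out = (uint64_t)strtoull(p, &end, 16);
    return end != p;
}

/* Assumption: on Alpha, kernel virtual addresses have the top bit set
 * (the K0SEG/K1SEG ranges start at 0xfffffc0000000000), while user
 * addresses have it clear. */
static int is_kernel_va(uint64_t va)
{
    return (va >> 63) != 0;
}
```

For example, parse_mc_field() on the pc line above yields 0x1604006ac, which is_kernel_va() classifies as a user address (as is the ra value), while the curproc value is classified as a kernel address. That is consistent with the fault being reported while XFree86 was running user code, subject to the caveat below that the reported PC may not be the exact faulting instruction.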


> >
>
> Look at alpha/alpha/interrupt.c:badaddr_read().
>
> If you're feeling really lucky, you could add code to send the
> appropriate signal (sigbus?) if the PC is in a userland app.
>
> The problem with this is that machine checks are somewhat
> asynchronous, and I'm not sure the PC at the time of the fault
> corresponds to the PC that actually caused the fault.
> (that's why there are so many memory barriers all over the PCI probing
> and badaddr code).


Your explanation is helpful, and perhaps I'll try your suggestion of
turning userland machine checks into sigbus or something - I'm sure I'm
just begging for trouble here, but at least this isn't a production
machine that other people depend on :).

To send a signal to a process from within the kernel, it seems I just call

psignal(pid, signo)

- is this right?


Thanks very much for your information; it looks like a little check in
machine_check() in interrupt.c will do pretty much what I want. Perhaps
I'll make sure that my hack only works on processes whose name starts
with 'X' or something, just to be safe....


Fred


--
Fred Clift - fclift@xxxxxxxxx -- Remember: If brute
force doesn't work, you're just not using enough.




