Subject: Re: Sym53C8xx Driver Hardening

On Thu, 25 Jul 2002, Jeremy Higdon wrote:

> On Jul 25,  1:11am, Gérard Roudier wrote:
> >
> > By the way, the sym53c8xx_2 driver (and probably version 1 too) never
> > intentionally panics the system when it detects a hardware failure.
> > The couple of calls to panic you can see in the driver are related to
> > unexpected software situations.
> > In my opinion, serious high reliability requires special hardware support.
> >
> > When a serious hardware error is detected, the driver simply tries to reset
> > everything it can in order to have a chance to restart operations
> > properly. Maybe it should count such events and give up in some way after
> > a given number of retries. If you are just suggesting that such a 'give up'
> > operation disable the device, this should be doable, but it will not
> > make upper layers aware of the situation and is certainly not what most
> > users expect. This MEANS that serious high reliability ALSO requires
> > special SOFTWARE support and USER utilities, and certainly should NOT be
> > based ONLY on some questionable trivial tinkering in device drivers, in my
> > opinion.
> >
> >   Gérard.
>
>
> Also, with some sorts of errors, one might suspect that the device has
> DMA'd data into the wrong place in memory.
>
> Generally, I think a crash is preferable to data corruption (i.e. no
> answer is better than a wrong answer), so in such cases, the driver
> might want to panic.

The sym53c8xx driver ensures that all the path error checking it has under
its control is enabled. If the error reaches the driver, this ideally means
either that it is not a fatal system error, or that the user or O/S does
not want it to be handled as one. Otherwise, an NMI should occur and the
system should halt. So, my opinion is that a hardware error that could be
handled as a system error, but is not, should be considered an invitation
to try to recover. Hence, the driver's response is the appropriate one, in
my opinion.

> If hardware can place a fence around DMA (such that DMA to unintended
> locations is impossible), then you might not want to panic in such
> situations, assuming that you can retry.

You may be dreaming there. At least, I have never heard of such magic in
the PCI world.

Regards,
  Gérard.
