osdir.com
mailing list archive

Subject: Re: sbp2: sbp2util_node_write_no_wait failed - msg#00012

List: linux.kernel.firewire.user

On Saturday 05 November 2005 01:36, Michael Brade wrote:
> > > It doesn't seem justified to lock up for 30 seconds
> > > since a new label could be available much earlier. But that's just my
> > > guess.
> >
> > These pauses aren't spent locked-up in sbp2. It is the period that the
> > SCSI subsystem waits for completion of a task.
>
> I know... do you think I would put something at risk if I'd lower the
> timeout to, say, 5 seconds? 30 seconds is *really* annoying.

Well, I just did it now, I put
#define SD_TIMEOUT (7 * HZ)
in drivers/scsi/sd.c and I can tell you, finally it's fun again to work with
this hd :-) I'll see if it does any harm...

Cheers,
--
Michael Brade; KDE Developer, Student of Computer Science
|-mail: echo brade !#|tr -d "c oh"|s\e\d 's/e/\@/2;s/$/.org/;s/bra/k/2'
°--web: http://www.kde.org/people/michaelb.html

KDE 3: The Next Generation in Desktop Experience

Attachment: pgp2OuXkxLUex.pgp
Description: PGP signature


Previous message by date:

Re: sbp2: sbp2util_node_write_no_wait failed

On Friday 04 November 2005 23:13, Stefan Richter wrote:
> Michael Brade wrote:
> > Heh, good news, debugging finished! :-)
>
> Let's say, _nearly_ finished.

Ok, fair enough, however, you didn't tell me yet what to do next...

> There are 64 tlabels, therefore what you have seen is that sbp2 added
> ORBs and rang the doorbell 64 times while the target did not finish
> these 64 small transactions to the doorbell register with a response.

Yep, but what I don't quite understand yet is when exactly it happens. It
seems that I can copy one huge file almost without errors, maybe one or two,
rarely more. But when I write a lot of small files and try to do some reading
in between, the error happens every 10 seconds or worse. With the odd
exception to the rule.

> > Any idea what to fix?
>
> I am unsure.
>
> Idea 1:
>
> Maybe we could move the initiator's part of the protocol (in particular,
> writes to command block agent register and writes to doorbell register)
> out of atomic context, e.g. into an additional kthread or into a
> workqueue. That would let sbp2 wait for availability of a tlabel.

That sounds about good to me ;-)

> Moreover, as I said in an earlier reply, there are many reports about
> mysterious "aborting sbp2 command" mishaps but only few reports about
> "sbp2util_node_write_no_wait failed" along with command abortions.

Hm, I have no idea how many "aborting sbp2 command" reports there are, but I
found quite a few reports on Google with "sbp2util_node_write_no_wait failed",
none of them with a good answer though.

> Therefore, IMO, we should implement changes of such kind only after we
> understood the more common problem better and have an idea how such
> changes may affect it, ideally cure it.

Ok, is there anything I can do to help there? I mean, I have a system where
the error happens with 100% certainty every time, so as a test bed it's quite
perfect, eh?

> Idea 2:
>
> [...] It won't improve performance relative to what you are seeing now, but
> it would lower the risk of data loss.

Well, as far as I can see I haven't lost any data yet. Do you reckon that
could happen or is even likely to happen? 'Cause then I'd do some backups
pretty soon.

> > It doesn't seem justified to lock up for 30 seconds
> > since a new label could be available much earlier. But that's just my
> > guess.
>
> These pauses aren't spent locked-up in sbp2. It is the period that the
> SCSI subsystem waits for completion of a task.

I know... do you think I would put something at risk if I'd lower the
timeout to, say, 5 seconds? 30 seconds is *really* annoying.

Cheers,
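Stefan's explanation of the failure can be modelled with a minimal user-space sketch in C. It assumes a pool of 64 transaction labels tracked in a bitmap, with pthreads standing in for kernel locking; the names `tlabel_pool`, `tlabel_try_get`, `tlabel_get_wait` and `tlabel_put` are hypothetical and are not the ieee1394 or sbp2 API. `tlabel_try_get` models the only option a caller in atomic context has: once 64 writes are outstanding without a response from the target, it fails at once, which, per Stefan's explanation, is the situation behind the "sbp2util_node_write_no_wait failed" message. `tlabel_get_wait` models Idea 1, where a caller running in a kthread or workqueue may sleep until a label is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_TLABELS 64 /* number of transaction labels, as stated in the thread */

/* Hypothetical label pool: bit i set means label i is in flight. */
struct tlabel_pool {
    uint64_t in_use;
    pthread_mutex_t lock;
    pthread_cond_t freed;
};

void tlabel_pool_init(struct tlabel_pool *p)
{
    p->in_use = 0;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->freed, NULL);
}

/* Caller must hold p->lock. Claims and returns the lowest free label,
 * or returns -1 if all 64 are in flight. */
static int claim_free_label(struct tlabel_pool *p)
{
    for (int i = 0; i < NUM_TLABELS; i++) {
        if (!(p->in_use & (UINT64_C(1) << i))) {
            p->in_use |= UINT64_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Non-blocking allocation, analogous to what an atomic-context caller can
 * do: if the target has not answered 64 outstanding writes, this fails. */
int tlabel_try_get(struct tlabel_pool *p)
{
    pthread_mutex_lock(&p->lock);
    int label = claim_free_label(p);
    pthread_mutex_unlock(&p->lock);
    return label;
}

/* Blocking allocation (Idea 1): only usable where the caller may sleep. */
int tlabel_get_wait(struct tlabel_pool *p)
{
    int label;
    pthread_mutex_lock(&p->lock);
    while ((label = claim_free_label(p)) < 0)
        pthread_cond_wait(&p->freed, &p->lock);
    pthread_mutex_unlock(&p->lock);
    return label;
}

/* Called when the target's response for a transaction arrives. */
void tlabel_put(struct tlabel_pool *p, int label)
{
    assert(label >= 0 && label < NUM_TLABELS);
    pthread_mutex_lock(&p->lock);
    p->in_use &= ~(UINT64_C(1) << label);
    pthread_cond_signal(&p->freed); /* wake one waiter in tlabel_get_wait */
    pthread_mutex_unlock(&p->lock);
}
```

With all 64 labels taken, `tlabel_try_get` keeps returning -1 until `tlabel_put` releases one, whereas `tlabel_get_wait` would simply block in that state. The catch, as the thread notes, is that sleeping is only allowed outside atomic context.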

Next message by date:

Re: sbp2: sbp2util_node_write_no_wait failed

Michael Brade wrote:
> On Saturday 05 November 2005 01:36, Michael Brade wrote:
> > > > It doesn't seem justified to lock up for 30 seconds
> > > > since a new label could be available much earlier. But that's just my
> > > > guess.
> > >
> > > These pauses aren't spent locked-up in sbp2. It is the period that the
> > > SCSI subsystem waits for completion of a task.
> >
> > I know... do you think I would put something at risk if I'd lower the
> > timeout to, say, 5 seconds? 30 seconds is *really* annoying.
>
> Well, I just did it now, I put
> #define SD_TIMEOUT (7 * HZ)
> in drivers/scsi/sd.c and I can tell you, finally it's fun again to work with
> this hd :-) I'll see if it does any bad...

Perhaps you need to define this time-out specifically for normal I/O (that
which bothers you most; I think that would be sd_probe, perhaps
sd_prepare_flush too) and keep the standard time-out for the rest, like
spin-up, read capacity, cache sync on device removal, and so on.

Note to other readers: Don't do this at home. A shorter SCSI timeout is only
a hack, not a fix for sbp2's problems. It will just cause the command
abortions to happen at higher frequency.
--
Stefan Richter
-=====-=-=-= =-== --=-=
http://arcgraph.de/sr/
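Stefan's narrower suggestion can be sketched in C as a hypothetical helper that shortens the timeout only for ordinary reads and writes, leaving spin-up, READ CAPACITY and cache sync at the standard value. The enum, the helper and its parameters are illustrative assumptions, not code from drivers/scsi/sd.c; 30 and 7 seconds are the values mentioned in this thread. SCSI timeouts are counted in jiffies, so `n * HZ` means n seconds regardless of the configured HZ; here HZ is passed as a parameter so the sketch runs in user space. As Stefan stresses, even this remains a workaround, not a fix for sbp2.

```c
#include <assert.h>

/* Hypothetical command classes; real sd.c distinguishes commands differently. */
enum cmd_kind {
    CMD_READ_WRITE,    /* ordinary I/O: the case that bothered the poster */
    CMD_START_STOP,    /* spin-up can legitimately take long */
    CMD_READ_CAPACITY,
    CMD_SYNC_CACHE,    /* e.g. cache sync on device removal */
};

#define STD_TIMEOUT_SECS 30 /* default timeout discussed in the thread */
#define IO_TIMEOUT_SECS   7 /* shortened value tried in the thread */

/* Return a timeout in jiffies (seconds * hz) for the given command kind. */
unsigned long pick_timeout(enum cmd_kind kind, unsigned long hz)
{
    assert(hz > 0); /* HZ is a positive tick rate */
    switch (kind) {
    case CMD_READ_WRITE:
        return IO_TIMEOUT_SECS * hz;
    default:
        /* Everything else keeps the standard, conservative timeout. */
        return STD_TIMEOUT_SECS * hz;
    }
}
```

For example, with HZ=250 a read or write would get 1750 jiffies (7 s), while a cache sync keeps 7500 jiffies (30 s).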
