osdir.com
mailing list archive

Subject: Re: Re: bit error rates - msg#00038

List: linux.file-systems.yaffs

On 2/15/06, Peter Barada <peterb@xxxxxxxxxxx> wrote:

> Well my perspective would be BadThing if it's either 1% or 50% of the
> units suffering block losages like that.

If it's 1% and this is configurable, then there's no problem. I'd
leave the current semantics and live with a small return rate if I
felt data integrity was improved.

> In either case, you end up with 30-50% of your available space being lost.

True. But if this is configurable then people can decide. The fact
that nobody has patched the current behavior publicly suggests that
not enough people have problems with it.

> Imagine a 1GB iPod type device that after a year turns into a .5GB iPod.
> I can imagine customers would get pretty bent out of shape over that...

Actually, I bet they get 1% return rates or something anyway, so
that's not a big problem :-) I'm already on my second iPod (a horribly
designed thing - it's far too easy for the filesystem to get trashed -
but I got it because I wanted to run ipodlinux). In the case of an
iPod, most people care less and Apple certainly expects you to have
copies of all of your *uhum* paid music in iTunes anyway.

Jon.


Thread at a glance:

Previous Message by Date:

Re: Delayed list mail

On Wed, 15 Feb 2006, Wookey wrote:

OK, no problems. Sorry for a last email...

> There has been a small flood of pent-up mail arriving on the YAFFS list
> over the last few hours. That is my fault. I turned mailman off to do
> some server admin last week and forgot to turn it back on again until
> yesterday. The mail sent during that time has now come through. (That's
> why your patch was delayed Sergey - it's not a conspiracy).
>
> I don't think any mail has been lost. Anything you sent that hasn't
> turned up should be resent. Apologies for the delay.
>
> Wookey
> --
> Aleph One Ltd, Bottisham, CAMBRIDGE, CB5 9BA, UK Tel +44 (0) 1223 811679
> work: http://www.aleph1.co.uk/ play: http://www.chaos.org.uk/~wookey/
>
> _______________________________________________
> yaffs mailing list
> yaffs@xxxxxxxxxxxxxxxxxxxxxx
> http://stoneboat.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs

---
******************************************************************
* KSI@home KOI8 Net < > The impossible we do immediately. *
* Las Vegas NV, USA < > Miracles require 24-hour notice. *
******************************************************************

Next Message by Date:

Fwd: bit error rates]

---------- Forwarded message ----------
From: Jon Masters <jonmasters@xxxxxxxxx>
Date: Feb 15, 2006 11:53 PM
Subject: Re: [Yaffs] bit error rates]
To: William Watson <wjw1961@xxxxxxxxx>

On 2/15/06, William Watson <wjw1961@xxxxxxxxx> wrote:

> I will also note that a NAND vendor who paid us a visit at about that same
> time said that we should expect WORSE soft error behaviour with succeeding
> generations of NAND flash chips. The geometries would get smaller and
> smaller, the chip dies would get larger and larger, and the amount of time
> for production testing of each chip would not increase, or at least, not
> increase as fast as the total storage of a chip. Thus, the testing per page
> would only go down in subsequent generations of chips. These two statements
> seemed to say that we would see both (1) increased rates of ECC errors, and
> (2) an increase in the number of marginal blocks not marked bad by the chip
> vendor.

But this sounds like it might be better to be additionally cautious - I
agree that marking OOB data is a good idea, maybe I'll get to look at
that.

> Another obvious alternative strategy for preventing data loss due to
> accumulation of multiple bit errors would be to periodically read the entire
> data array, checking for ECC errors. You'd want to calculate the impact
> that such reading would have on the rate of appearance of errors, as well as
> the impact on system and NAND performance. For a standard file system, it
> might suffice to perform one additional data chunk read for every N read
> requests, incrementing the "scrub" page each time. This would ensure a
> complete read scrub at a fixed percentage overhead. One could also perform
> a read scrub every M write operations, if desired.

A low priority kernel thread which sat and got woken up about as much
as kswapd probably wouldn't have much impact but could do this - and
only run when nothing else is using the flash part. This would be
better in the MTD layer though and might necessitate some changes to
the locking currently used.

Jon.
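
A minimal sketch of the "one scrub read for every N read requests" scheduling quoted above, in plain userspace C. All names here (struct read_scrubber, scrubber_init, scrubber_on_read) are hypothetical and are not part of YAFFS or MTD; the sketch only decides which chunk to scrub next and leaves the actual ECC-checked read to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a round-robin read scrub. */
struct read_scrubber {
    uint32_t total_chunks; /* number of chunks in the data array */
    uint32_t interval;     /* N: one scrub read per N normal reads */
    uint32_t reads_since;  /* normal reads seen since the last scrub */
    uint32_t next_chunk;   /* the "scrub" page cursor */
};

void scrubber_init(struct read_scrubber *s, uint32_t total_chunks,
                   uint32_t interval)
{
    assert(total_chunks > 0 && interval > 0);
    s->total_chunks = total_chunks;
    s->interval = interval;
    s->reads_since = 0;
    s->next_chunk = 0;
}

/*
 * Call once per normal read request. Returns 1 and stores the chunk to
 * scrub in *chunk when a scrub read is due, otherwise returns 0 and
 * leaves *chunk untouched.
 */
int scrubber_on_read(struct read_scrubber *s, uint32_t *chunk)
{
    s->reads_since++;
    if (s->reads_since < s->interval)
        return 0;

    s->reads_since = 0;
    *chunk = s->next_chunk;
    /* Advance the cursor, wrapping so the whole array gets covered. */
    s->next_chunk = (s->next_chunk + 1) % s->total_chunks;
    return 1;
}
```

With this scheme the extra read load is fixed at 1/N of normal reads, and a full pass over the array takes N * total_chunks read requests. The same counter could be driven from write operations (the "every M writes" variant), or the cursor could be advanced from a low-priority thread instead.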

Previous Message by Thread:

Re: Re: bit error rates

On Thu, 2006-02-16 at 12:43 +1300, Charles Manning wrote:
> On Thursday 16 February 2006 02:25, Jon Masters wrote:
> > On 2/10/06, Charles Manning <manningc2@xxxxxxxxxxxxx> wrote:
> > > I think an interrupted erase is probably more likely to cause
> > > problems, but again this is just a hunch.
> >
> > I wonder how we could implement logic to detect this.
> >
> > > Dealing to an interrupted write is relatively straight forward. It
> > > will always be the last page written before the system went
> > > down. Most of the time (except for the last page written to a
> > > block), we can detect the last page because it is the last page
> > > in the currently allocated block.
> >
> > I don't think this is currently tested on mount though.
>
> That is correct, it is not being done at present. I was thinking as to
> how it might be done.
>
> > > It would be nice to improve this, but as Jon says, I think data
> > > integrity should always come first!
> >
> > Other people seem to disagree with my previous suggestions and I'm not
> > saying I can't be wrong in the matter :-) But I've not seen excessive
> > numbers of blocks being marked bad (except when fixing the OOB
> > code...) with read ECC failures. I accept though that this might just
> > be good old fashioned paranoia so if one of the vendor folks on this
> > list can comment, it would really help.
>
> Some people have reported seeing a large number of blocks (~30-50%) being
> retired on some devices. That's obviously not a GoodThing, but I'd like to
> see what % of units failed. Then, how does one measure and evaluate this?
>
> To my mind, if you ship 1000 units and half of them lose 30-50% of their
> blocks in a year of normal use, that's probably a BadThing. If this only
> happens on 1% of shipped units it might be an OKThing (depending on your
> perspective).

Well my perspective would be BadThing if it's either 1% or 50% of the
units suffering block losages like that.

In either case, you end up with 30-50% of your available space being
lost. Imagine a 1GB iPod type device that after a year turns into a
.5GB iPod. I can imagine customers would get pretty bent out of shape
over that...

> However, losing data is also a BadThing.
>
> It's one of those rock-and-hard-place sandwich choices. Any mods will be
> configurable to allow current semantics.
>
> -- Charles

Next Message by Thread:

Re: Re: bit error rates

Peter Barada wrote:
> On Thu, 2006-02-09 at 23:13 +0000, Sergei Sharonov wrote:
>
> Yes, I have. I use a YAFFS1 NOR-based system, and in the writes, we lay
> down the data chunk, and then the tag. In the unlikely event that a
> power-cycle occurs while writing the data, the tag is still empty, but
> some of the data chunk is not erased, and then next time a write occurs
> into that chunk, YAFFS sees that the write fails since the previous data
> was written (and retires the whole block), even though the tag indicated
> the chunk is empty.
>
> To fix this, I used two bits in the pageStatus byte in the tag, and
> write the tag first, then the data, and then update the tag. Assuming
> that the pageStatus starts out as 0xff, then the first tag write puts in
> the value of the tag, but writes a pageStatus byte of 0xfe to indicate
> that a write is in progress, then writes the chunk data, and then comes
> back and re-writes the tag with the same data, and a pageStatus of 0xfc.
>
> In the rest of the code, the chunk is assumed to be valid if the
> pageStatus is 0xff (and objectId is non-0xfffff) or if 0xfc, empty if
> the objectId is 0xfffff, and deleted if the pageStatus is either 0xfe,
> or 0x00 (the value written to delete a tag).
>
> This solved the problem for me. I assume an approach like this would
> work for NAND...

It works for NAND only if the additional partial page programming fits
the NAND specifications. I have looked at the Samsung datasheet and it
says:

"The number of consecutive partial page programming operation within
the same page without an intervening erase operation should not exceed
2 for main array and 3 for spare array."

So it should be OK for Samsung, since in this case it is 1 + 2. However,
Toshiba says:

"Multiple partial page programming attempts in a block can aggravate
this error symptom"

referring to the Program disturb soft error.

I think a better solution is to check the power fail flag before any
erase/programming cycle, as suggested by Charles. Unfortunately this
means modifying the MTD driver, something like this:

static int my_nand_erase(struct mtd_info *mtd, struct erase_info *instr)
{
        if (check_powerfail())
                return -EIO;
        else
                return nand_erase_nand(mtd, instr, 0);
}

and in nand_write_ecc():

        /* Check, if it is write protected or power fail */
        if (nand_check_wp(mtd) || check_powerfail())
                goto out;

Cheers,
Claudio Lanconelli
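
A minimal sketch of the pageStatus classification Peter describes above, as standalone C rather than actual YAFFS code. The status values (0xff, 0xfe, 0xfc, 0x00) and the 0xfffff objectId come from his message; every identifier below is hypothetical, and treating any other status value as deleted is an assumption of this sketch. The can_program() helper illustrates why the sequence 0xff -> 0xfe -> 0xfc -> 0x00 can be written without an erase: flash programming can only clear bits, never set them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the description above; the macro names are hypothetical. */
#define PAGE_STATUS_ERASED      0xffu /* tag never programmed */
#define PAGE_STATUS_IN_PROGRESS 0xfeu /* tag written, data write pending */
#define PAGE_STATUS_COMPLETE    0xfcu /* tag re-written after the data */
#define PAGE_STATUS_DELETED     0x00u /* value written to delete a tag */
#define OBJECT_ID_ERASED        0xfffffu

enum chunk_state { CHUNK_EMPTY, CHUNK_VALID, CHUNK_DELETED };

/* Hypothetical, simplified view of the tag fields that matter here. */
struct chunk_tag {
    uint32_t object_id;
    uint8_t page_status;
};

enum chunk_state classify_chunk(const struct chunk_tag *tag)
{
    assert(tag != NULL);

    if (tag->object_id == OBJECT_ID_ERASED)
        return CHUNK_EMPTY;
    /* 0xfc: completed two-phase write; 0xff: chunk written the old way. */
    if (tag->page_status == PAGE_STATUS_COMPLETE ||
        tag->page_status == PAGE_STATUS_ERASED)
        return CHUNK_VALID;
    /* 0xfe (interrupted write) and 0x00 (deleted); as an assumption,
     * any other value is also treated as deleted. */
    return CHUNK_DELETED;
}

/* Programming without an erase can only change 1 bits to 0 bits. */
int can_program(uint8_t old_val, uint8_t new_val)
{
    return (new_val & (uint8_t)~old_val) == 0;
}
```

A chunk whose write was interrupted after the first tag write is left at 0xfe, so it is skipped as deleted rather than mistaken for an empty chunk, which is what avoids the failed write and the block retirement described above.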